Tzesh/Forester

Forester

Forester is a modular Python class that wraps a Random Forest classifier and makes predictions without retraining the model on every run. It searches for the most suitable hyperparameters for the given dataset, evaluates the trained model and saves the statistics, and logs every action it takes.
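
The description does not say how the hyperparameter search is implemented. As a rough illustration only, a randomized search with scikit-learn could look like the sketch below. The synthetic data, the parameter ranges, and the choice of RandomizedSearchCV are assumptions made for the example, not Forester's actual code.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the user's dataset (assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical search space; the ranges Forester explores are not documented here.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 10),
    "min_samples_split": randint(2, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled parameter combinations
    cv=3,            # 3-fold cross-validation for each combination
    random_state=0,
)
search.fit(X, y)

# best_estimator_ is refit on the full data with the best parameters found.
best_model = search.best_estimator_
print(search.best_params_)
```

A randomized search keeps the cost bounded by n_iter, which is why it is used here; an exhaustive grid search would be an equally plausible choice.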

Features

  • No need to retrain the model before each prediction
  • No manual data preprocessing required
  • Predicts using the best hyperparameters found for the dataset
  • Saves evaluation statistics
  • Logs every action
  • Modular
  • Easy to use
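
To make the logging feature concrete, here is a minimal sketch of writing one timestamped log file per run with Python's standard logging module. The helper name make_logger, the timestamp format, and the message format are hypothetical; the file name only loosely follows the log file naming shown under General Project Structure.

```python
import logging
import tempfile
from datetime import datetime
from pathlib import Path


def make_logger(log_dir, data_name="data"):
    """Return (logger, log_path); the log goes to <log_dir>/<data_name>_<timestamp>.txt."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    log_path = log_dir / f"{data_name}_{stamp}.txt"

    # A unique logger name per run avoids attaching duplicate handlers.
    logger = logging.getLogger(f"forester_sketch.{stamp}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path


# A temporary directory keeps the sketch from touching a real project folder.
logger, log_path = make_logger(Path(tempfile.mkdtemp()) / "log")
logger.info("Loading data")
logger.info("Training model")
```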

General Project Structure

  • data
    • data.csv # your data that you want to use to train the model
  • log
    • data_unique_datetime_identifier.txt # simply a log file
  • model
    • data_model_encoders.pickle # encoders that are used to encode the data in the preprocessing step
    • data_model_feature_names.pickle # feature names
    • data_model_value_name.pickle # output name
  • statistics
    • data_model_datetime_identifier_confusion_matrix.png # confusion matrix
    • data_model_datetime_identifier_decision_tree.dot # Graphviz DOT export of the first tree in the forest
    • data_model_datetime_identifier_statistics.txt # statistics like accuracy, precision, recall, f1-score, etc.
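
The statistics files listed above could be produced along the following lines. This is a sketch under stated assumptions: the data is synthetic, the file names drop the datetime identifier for brevity, and the confusion matrix is only computed as an array (rendering it to a PNG would need a plotting library that is not listed in the requirements).

```python
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import export_graphviz

# Synthetic stand-in for data/data.csv (assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

stats_dir = Path(tempfile.mkdtemp()) / "statistics"
stats_dir.mkdir(parents=True)

# Precision, recall, f1-score and accuracy as a plain-text report.
report = classification_report(y_test, y_pred)
(stats_dir / "data_model_statistics.txt").write_text(report)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

# Export the first tree of the forest in Graphviz DOT format.
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names
dot_path = stats_dir / "data_model_decision_tree.dot"
export_graphviz(model.estimators_[0], out_file=str(dot_path), feature_names=feature_names)
```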

Usage

from forester import Forester

# Initialize Forester.
# By default, the data is assumed to be in './data/data.csv' with ',' as the delimiter.
# With train=True, Forester trains the model and saves the required files.
forester = Forester(train=True)

# Create the row you want a prediction for ('...' is a placeholder for your own feature values).
val = [0,...,'Example', 1]

# Call make_prediction; it returns the prediction.
prediction = forester.make_prediction(val)

# Print the prediction
print(prediction)
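
For intuition about what a call like make_prediction may involve, the sketch below encodes a raw row with saved label encoders, predicts, and decodes the result. It is not Forester's implementation: the tiny dataset, the column names, and the use of one LabelEncoder per non-numeric column are all assumptions chosen to mirror the encoders, feature names, and value name files listed in the project structure.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny illustrative dataset; all column names are hypothetical.
df = pd.DataFrame({
    "age": [25, 40, 31, 58, 22, 47],
    "city": ["Paris", "Rome", "Paris", "Oslo", "Rome", "Oslo"],
    "owns_car": [0, 1, 0, 1, 0, 1],
    "label": ["no", "yes", "no", "yes", "no", "yes"],
})
value_name = "label"
feature_names = [c for c in df.columns if c != value_name]

# Fit one encoder per non-numeric column and keep it for prediction time.
encoders = {}
encoded = df.copy()
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        encoders[col] = LabelEncoder().fit(df[col].tolist())
        encoded[col] = encoders[col].transform(df[col].tolist())

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(encoded[feature_names], encoded[value_name])


def make_prediction(raw_row):
    """Encode a raw feature row, predict, and decode the predicted class."""
    row = {}
    for name, value in zip(feature_names, raw_row):
        if name in encoders:
            value = encoders[name].transform([value])[0]
        row[name] = value
    X_new = pd.DataFrame([row], columns=feature_names)
    pred = model.predict(X_new)
    if value_name in encoders:
        pred = encoders[value_name].inverse_transform(pred)
    return pred[0]


print(make_prediction([30, "Paris", 0]))
```

Note that a LabelEncoder raises an error for a category it did not see during fitting, so a real implementation would need a policy for unseen values.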

Usage examples for different datasets can be found in Example.py.

The first time you run Example.py, it trains the model and saves the required files.

[Screenshot: first run of Example.py]

On subsequent runs, it loads the saved files and makes predictions without retraining the model.

[Screenshot: second run of Example.py]
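
The train-once, reuse-later behaviour can be sketched with pickle as follows. The helper load_or_train and the file name data_model.pickle are hypothetical, and the data is synthetic; the snippet only illustrates the general pattern, not how Forester decides when to train.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def load_or_train(model_path, X, y):
    """Return (model, trained_now); train only when no saved model exists."""
    model_path = Path(model_path)
    if model_path.exists():
        with model_path.open("rb") as f:
            return pickle.load(f), False
    model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    return model, True


X, y = make_classification(n_samples=100, n_features=5, random_state=0)
path = Path(tempfile.mkdtemp()) / "model" / "data_model.pickle"  # hypothetical file name

first_model, trained_first = load_or_train(path, X, y)    # first run: trains and saves
second_model, trained_second = load_or_train(path, X, y)  # later run: loads from disk
print(trained_first, trained_second)  # prints: True False
```

Pickle files can execute arbitrary code when loaded, so only load model files that you created yourself or otherwise trust.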

Requirements

  • Python 3.6+
  • Scikit-learn
  • Pandas
  • NumPy
  • SciPy
