Easy-to-use dataset generator for applying machine learning on financial markets
- It runs quickly and is easy to use.
- The project is simple and uses no database. It has only a few dependencies.
- It is easy to modify and customize.
- This project generates practical datasets for data scientists.
- You can read the code for educational purposes.
- Clone the repository.
- Run `pip3 install -r requirements.txt`.
- Put your Nasdaq Data Link API key in the `API_KEY` file.
- Run `python3 main.py`.
This generates a training set and a test set for you.
For the configuration, you can:
- Change `config.py` constants.
- Define new indicators in `indicators.py`.
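The interface that `indicators.py` expects is not shown here, so the following is only a sketch of the kind of computation an indicator performs. It assumes an indicator is a plain function over a list of closing prices. The name `simple_moving_average` and its signature are hypothetical and not taken from the project.

```python
def simple_moving_average(closes, length):
    """Hypothetical indicator: simple moving average of closing prices.

    Returns a list as long as `closes`. The first `length - 1` entries are
    None because a full window is not yet available at those positions.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")

    result = []
    window_sum = 0.0
    for i, price in enumerate(closes):
        window_sum += price
        if i >= length:
            # Drop the price that just left the window.
            window_sum -= closes[i - length]
        result.append(window_sum / length if i >= length - 1 else None)
    return result
```

A moving average like this is what `SMA_LENGTHS_LIST` refers to, with one series per configured length. How the project wires new indicators into the generated dataset should be checked against the existing code in `indicators.py`.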
- `PAIR_NAMES_LIST_WITH_SOURCE`: What's your machine learning model input?
- `TARGET_PAIR_NAME_WITH_SOURCE`: What's your machine learning model output?
- `SMA_LENGTHS_LIST`: Do you want to generate a dataset with some moving averages?
- `APPLY_FLIP_AUGMENTATION` and `APPLY_NOISE_AUGMENTATION`: Using data augmentations
- `AUGMENTATION_NOISE_INTERVAL`: Set the amount of augmentation noise
- `TRAIN_DATASET_NEW_SIZE_COEFFICIENT`: How much augmented data do you want?
- `START_TIME` and `END_TIME`: The time interval for the dataset
- `FORECAST_DAYS`: How many days is your target?
- `USE_WMA_FOR_FORECAST_DAYS`: Do you want to use linear weighted moving average for your target?
- `NUMBER_OF_CANDLES`: Number of candles your machine learning model needs as its input
- `TRAIN_CSV_FILE_PATH`, `TEST_CSV_FILE_PATH`, and `PREDICT_CSV_FILE_PATH`: Output CSV file paths
- `TEST_SET_SIZE_RATIO`: Test set size to whole dataset size ratio
- `CSV_DELIMITER`: The delimiter in every generated CSV file
- `API_KEY_FILE_PATH`: Path to the Nasdaq Data Link API key file
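Two of the options above name general techniques: a linear weighted moving average (`USE_WMA_FOR_FORECAST_DAYS`) and noise augmentation (`AUGMENTATION_NOISE_INTERVAL`). The sketch below shows one conventional form of each. It is not the project's implementation: the function names are hypothetical, the weight direction (later values weigh more) is an assumption, and treating the noise interval as a relative bound on a uniform perturbation is also an assumption.

```python
import random


def linear_weighted_average(values):
    """Linear weighted average with weights 1, 2, ..., n.

    Assumption: the last value gets the largest weight. The project may
    weight the forecast days differently.
    """
    if not values:
        raise ValueError("values must not be empty")
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def add_uniform_noise(values, interval, rng=None):
    """Return a copy of `values` with multiplicative uniform noise.

    Assumption: each value is scaled by a factor drawn uniformly from
    [1 - interval, 1 + interval].
    """
    rng = rng or random.Random()
    return [v * (1.0 + rng.uniform(-interval, interval)) for v in values]


if __name__ == "__main__":
    closes = [100.0, 101.0, 103.0]
    print(linear_weighted_average(closes))
    print(add_uniform_noise(closes, 0.01, random.Random(0)))
```

Passing a seeded `random.Random` keeps an augmented sample reproducible, which helps when comparing models trained on the same generated dataset.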