- Dataset Content
- Business Requirements
- Project hypothesis and validation
- Rationale to map the business requirements to the Data Visualizations and ML tasks
- ML Business Case
- Dashboard Design
- Fixed Bugs
- Unfixed Bugs
- Deployment to Heroku
- Main Data Analysis and Machine Learning Libraries
- Credits
- The dataset is sourced from Kaggle. This dataset provides historical Bitcoin data to facilitate predictive analytics and research in cryptocurrency trends.
- The dataset contains daily historical data, represented by various metrics for Bitcoin (BTC) trading. It includes information such as opening, closing, high, and low prices, along with trading volumes for the currency. Additionally, it has been augmented with new features to enhance analysis.
| Variable | Meaning | Units |
|---|---|---|
| Unix Timestamp | Unix timestamp or Epoch Time | Seconds since 1970-01-01 00:00:00 UTC |
| Date | The date corresponding to the timestamp | UTC Timezone |
| Symbol | The cryptocurrency symbol associated with the data | BTC |
| Open | Opening price for the time period | USD |
| High | Highest price for the time period | USD |
| Low | Lowest price for the time period | USD |
| Close | Closing price for the time period | USD |
| Volume (Crypto) | Volume transacted in the cryptocurrency | BTC |
| Volume Base Ccy | Volume transacted in the base currency | USD |

Features added to the dataset to enhance analysis:

| Variable | Meaning | Units |
|---|---|---|
| Price Mean | Average price computed from open, high, low, and close | USD |
| Upper Shadow | Difference between the high and the maximum of open and close | USD |
| Lower Shadow | Difference between the minimum of open and close and the low | USD |
| Spread | Difference between the high and the low | USD |
| Trade | Difference between the closing and opening prices | USD |
| 12EMA | 12-period Exponential Moving Average of closing price | USD |
| 26EMA | 26-period Exponential Moving Average of closing price | USD |
| MACD | Moving Average Convergence Divergence (12EMA - 26EMA) | USD |
| Buy/Sell Signal | Indicator of price movement (1 for buy, 0 for sell) | Binary (0 or 1) |
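
The derived columns in the table above can be reproduced with a few pandas operations. The following is a minimal sketch, not the project's actual code: the lowercase column names (`open`, `high`, `low`, `close`), the function name `add_engineered_features`, and the choice of `adjust=False` for the EMAs are assumptions. The Buy/Sell Signal is left out because the table does not state the rule used to derive it.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the derived price columns added.

    Assumes rows are in chronological order (oldest first) and that the
    columns are named open, high, low and close (hypothetical names).
    """
    out = df.copy()
    body_top = out[["open", "close"]].max(axis=1)
    body_bottom = out[["open", "close"]].min(axis=1)

    out["price_mean"] = out[["open", "high", "low", "close"]].mean(axis=1)
    out["upper_shadow"] = out["high"] - body_top
    out["lower_shadow"] = body_bottom - out["low"]
    out["spread"] = out["high"] - out["low"]
    out["trade"] = out["close"] - out["open"]

    # adjust=False selects the recursive EMA form; this is an assumption.
    out["ema_12"] = out["close"].ewm(span=12, adjust=False).mean()
    out["ema_26"] = out["close"].ewm(span=26, adjust=False).mean()
    out["macd"] = out["ema_12"] - out["ema_26"]
    return out
```

Because the EMAs are computed recursively, the input must be sorted by date before calling the function.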
@misc{Kottarathil2020,
author = {Prasoon Kottarathil},
title = {Bitcoin Historical Dataset},
year = {2020},
publisher = {Kaggle},
journal = {Kaggle Dataset},
howpublished = {\url{https://www.kaggle.com/prasoonkottarathil/btcinusd}}
}

The client is a trading company aiming to gain a competitive edge by identifying patterns in Bitcoin price movements to make informed trading decisions.
Objective:
- Identify key variables that correlate with significant Bitcoin price changes.
- Predict if Bitcoin's price will rise or fall in the near term and determine the likely trend strength.
Business Requirement 1: Data Visualization and Correlation study:
- We will inspect the data related to the closing price.
- We will conduct a correlation study (Pearson and Spearman) to better understand how the variables correlate with the closing price.
- We will plot the main variables against the closing price to visualize insights.
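
A correlation study of this kind might look like the sketch below. It is illustrative only: the function name `correlation_with_close` and the target column name `close` are assumptions, and non-numeric columns such as the symbol are simply ignored.

```python
import pandas as pd


def correlation_with_close(df: pd.DataFrame, target: str = "close") -> pd.DataFrame:
    """Pearson and Spearman correlation of each numeric column with the target."""
    numeric = df.select_dtypes(include="number")
    result = pd.DataFrame(
        {
            method: numeric.corr(method=method)[target].drop(target)
            for method in ("pearson", "spearman")
        }
    )
    # Put the strongest absolute Spearman relationships first.
    return result.sort_values("spearman", key=abs, ascending=False)
```

Pearson measures linear association while Spearman measures monotonic association, so comparing the two columns hints at relationships that are strong but not linear.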
Business Requirement 2: Classification, Regression:
- We want to predict the next day's closing price, so we will build a regression model.
- We want to predict whether it is a good moment to buy or to sell, so we will build a classification model for buy and sell signals.
- Hypothesis: Bitcoin price changes correlate with trading volume (in BTC and USD).
- Validation: A correlation study can help in this investigation.
- Hypothesis: Daily closing prices are influenced by the daily high and low prices.
- Validation: A correlation study can help in this investigation.
- Data collection and preparation
- Data visualization
- Model training, optimization, and evaluation
- Dashboard development and deployment
- User Story: As a developer, I can gather and prepare a high-quality dataset to ensure reliable model training.
- Task: Acquire the necessary datasets and organize them in a structured folder format for easy access.
- User Story: As a developer, I can create engaging visualizations to help stakeholders grasp key insights from the data.
- Task: Select suitable visualization tools, generate visual representations, and export them for integration.
- User Story: As a developer, I can embed visualizations within the dashboard to facilitate intuitive data exploration.
- Task: Design a user-friendly interface that incorporates interactive visual elements.
- User Story: As a developer, I can identify the best hyperparameters for my model to enhance its performance.
- Task: Implement hyperparameter tuning techniques such as Grid Search to determine optimal settings.
- User Story: As a developer, I can train my machine learning model using the optimal hyperparameters to achieve accurate predictions.
- Task: Define the model architecture and create functions that utilize the optimal parameters for training.
- User Story: As a developer, I can assess my model's performance using various evaluation metrics to ensure it meets business requirements.
- Task: Utilize machine learning libraries to conduct evaluations and visualize the results effectively.
- User Story: As a user, I can view comprehensive evaluation results to understand the model's strengths and weaknesses.
- Task: Create an interface that presents detailed performance reports for the user.
- User Story: As a developer, I can implement interactive features in Streamlit to enhance user engagement with the dashboard.
- Task: Develop and integrate various Streamlit functionalities that make the dashboard intuitive and responsive.
- User Story: As a developer, I can successfully deploy the Streamlit dashboard to ensure it is available for public use.
- Task: Deploy the application on Heroku and confirm that users can access the dashboard online.
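
The hyperparameter tuning task mentioned in the user stories can be sketched with scikit-learn's `GridSearchCV`. This is an illustration under assumptions rather than the project's pipeline: the `RandomForestRegressor` estimator, the parameter grid, the function name `tune_regressor`, and the use of `TimeSeriesSplit` for cross-validation are all placeholders chosen for the example.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_regressor(X, y, param_grid=None):
    """Grid-search a small pipeline and return the fitted GridSearchCV object."""
    pipeline = Pipeline(
        [
            ("scaler", StandardScaler()),
            # Placeholder estimator; the project's actual model may differ.
            ("model", RandomForestRegressor(random_state=0)),
        ]
    )
    if param_grid is None:
        param_grid = {
            "model__n_estimators": [50, 100],
            "model__max_depth": [3, None],
        }
    search = GridSearchCV(
        pipeline,
        param_grid=param_grid,
        # Each validation fold comes after its training fold in time.
        cv=TimeSeriesSplit(n_splits=3),
        scoring="r2",
    )
    return search.fit(X, y)
```

After fitting, `search.best_params_` holds the selected settings and `search.best_estimator_` is the pipeline refitted on all of the data with those settings.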
- Model Type: Regression
- Goal: Predict the next closing price based on historical data.
- Success Metrics:
- R² score of at least 0.7.
- Failure Conditions:
- More than 30% of predictions are off by more than 20% after 6 months.
- Model Type: Classification
- Goal: Predict whether the market trend is favorable for a "buy" or "sell" decision.
- Success Metrics:
- 80% recall for buy/sell signals.
- Failure Conditions:
- More than 30% incorrect buy/sell predictions after 3 months.
- Data-driven insights for better trading decisions.
- Real-time predictions for increased market responsiveness.
- Enhanced profitability by accurate forecasting of prices and trends.
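
The success and failure thresholds stated for the two models can be checked mechanically once predictions are available. The sketch below is one possible reading of those criteria: the function names are hypothetical, "off by 20%" is interpreted as a relative error above 20%, and the recall target is applied to the buy and sell classes separately.

```python
import numpy as np
from sklearn.metrics import r2_score, recall_score


def check_regression(y_true, y_pred, min_r2=0.7, tolerance=0.20, max_bad_share=0.30):
    """Compare regression results with the stated R² and error-share limits."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = float(r2_score(y_true, y_pred))
    # Share of predictions whose relative error exceeds the tolerance.
    relative_error = np.abs(y_pred - y_true) / np.abs(y_true)
    bad_share = float(np.mean(relative_error > tolerance))
    return {
        "r2": r2,
        "bad_share": bad_share,
        "success": r2 >= min_r2,
        "failure": bad_share > max_bad_share,
    }


def check_classification(y_true, y_pred, min_recall=0.80, max_error_rate=0.30):
    """Compare classification results with the stated recall and error limits."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall per class: index 0 is sell (0), index 1 is buy (1).
    # Requiring both classes to reach the target is an assumption.
    recall_sell, recall_buy = recall_score(y_true, y_pred, labels=[0, 1], average=None)
    error_rate = float(np.mean(y_true != y_pred))
    return {
        "recall_buy": float(recall_buy),
        "recall_sell": float(recall_sell),
        "error_rate": error_rate,
        "success": bool(min(recall_buy, recall_sell) >= min_recall),
        "failure": error_rate > max_error_rate,
    }
```

The time windows in the failure conditions (3 and 6 months) are not modelled here; the functions would be applied to whatever period of predictions is being reviewed.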
- Quick project summary
- Project Terms & Jargon
- Describe Project Dataset
- State Business Requirements
- Addresses Business Requirement No. 1: Data Visualization and Correlation Study.
- Checkbox: Data inspection on Bitcoin metrics (display the number of rows and columns in the data, and display the first ten rows of the data).
- Checkbox: Individual plots for closing prices against high/low prices.
- Checkbox: Visualize the overlay plot for trading volume and price changes.
- Addresses Business Requirement No. 2: Price Forecasting, providing a brief overview of the prediction methods used while running predictions on live data.
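
The checkbox-driven page described above could be laid out roughly as follows. This is a simplified sketch, not the deployed dashboard: the function names, the column names (`close`, `high`, `low`, `volume_usd`, `trade`), the checkbox labels, and the use of `st.line_chart` in place of the project's own plots are all assumptions.

```python
import pandas as pd


def describe_shape(df: pd.DataFrame, n_rows: int = 10):
    """Return the summary sentence and the first rows shown on the page."""
    summary = f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns."
    return summary, df.head(n_rows)


def render_price_study(df: pd.DataFrame) -> None:
    """Draw the checkbox sections; intended to be run via `streamlit run`."""
    # Imported lazily so describe_shape can be used without Streamlit installed.
    import streamlit as st

    if st.checkbox("Inspect Bitcoin metrics"):
        summary, preview = describe_shape(df)
        st.write(summary)
        st.dataframe(preview)

    if st.checkbox("Closing price against high/low prices"):
        st.line_chart(df[["close", "high", "low"]])

    if st.checkbox("Trading volume and price change overlay"):
        # The two series have very different scales; a real overlay
        # would likely need a secondary axis or normalisation.
        st.line_chart(df[["volume_usd", "trade"]])
```

Keeping the data summary in a plain function separates the page logic from Streamlit, which makes it easier to test outside the app.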
Before the analysis, we aimed to outline each project hypothesis, the conclusions drawn, and the validation methods employed. After conducting the data analysis, we can report the following:
- Hypothesis 1: Bitcoin price changes correlate with trading volume in USD.
This relationship is supported by our correlation study, indicating a significant link between price movements and trading volume in USD.
- Hypothesis 2: Daily closing prices of Bitcoin are influenced by the daily high and low prices, impacting the trend direction and strength.
This hypothesis is validated by correlation analysis, suggesting that further exploration of these variables is warranted.
- Considerations and conclusions after the pipeline is trained
- Present ML pipeline steps
- Feature importance
- Pipeline performance
- Considerations and conclusions after the pipeline is trained
- Present ML pipeline steps
- Feature importance
- Pipeline performance
- The initial regression model struggled to make accurate predictions on live data. To address this, I developed Version 2 of the regression model, which involved a thorough review of the feature set. During this process, I eliminated the simple moving averages as they appeared redundant. Moreover, the 100-day and 50-day SMAs required a substantial amount of historical data to produce their first data points.
- After retraining Version 2 of the model, I observed improved results, although they were still not optimal. To further enhance performance, I accessed the dataset directly from the Binance API, which I refer to as 'live data.' This dataset includes approximately one year of additional historical data, providing more context. I retrained the model using this API data by fitting it with the transformed features, excluding the 'close' column for predictions. Then, I used the retrained model to predict the closing price for the most recent day, achieving satisfactory results. This approach significantly improved the model's predictive capability.
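
Loading daily candles from the Binance API and separating the `close` column from the features might look like the sketch below. The endpoint, the `BTCUSDT` symbol, the `limit` value, and all function names are assumptions, since the text only states that the Binance API was used.

```python
import json
from urllib.request import urlopen

import pandas as pd

# Assumed endpoint and trading pair; the project may use different ones.
KLINES_URL = (
    "https://api.binance.com/api/v3/klines"
    "?symbol=BTCUSDT&interval=1d&limit={limit}"
)


def klines_to_frame(klines) -> pd.DataFrame:
    """Convert raw kline rows into an OHLCV frame indexed by UTC date."""
    # The first six fields are open time, open, high, low, close, volume.
    rows = [row[:6] for row in klines]
    columns = ["open_time", "open", "high", "low", "close", "volume"]
    df = pd.DataFrame(rows, columns=columns)
    df["date"] = pd.to_datetime(df["open_time"], unit="ms", utc=True)
    value_cols = ["open", "high", "low", "close", "volume"]
    df[value_cols] = df[value_cols].astype(float)  # values arrive as strings
    return df.drop(columns="open_time").set_index("date").sort_index()


def fetch_daily_klines(limit: int = 500) -> pd.DataFrame:
    """Download recent daily candles (requires network access)."""
    with urlopen(KLINES_URL.format(limit=limit), timeout=10) as response:
        return klines_to_frame(json.load(response))


def split_features_target(df: pd.DataFrame, target: str = "close"):
    """Separate the target column from the feature columns."""
    return df.drop(columns=target), df[target]
```

In this sketch the model would be fitted on the features and target returned by `split_features_target`, with any feature transformations applied beforehand.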
- There are no Unfixed Bugs.
- The App live link is: https://bitcoin-forecast-21da6c961a62.herokuapp.com/
- Deployed on the Heroku-20 stack, this project uses a currently supported version of Python, as outlined in the Heroku documentation.
- The project was deployed to Heroku using the following steps.
- Log in to Heroku and create an App
- At the Deploy tab, select GitHub as the deployment method.
- Select your repository name and click Search. Once it is found, click Connect.
- Select the branch you want to deploy, then click Deploy Branch.
- The deployment should complete smoothly if all deployment files are fully functional. Click the Open App button at the top of the page to access your App.
- If the slug size is too large, add large files that the app does not require to the .slugignore file.
- Libraries used in this project:
- numpy==1.24.4
- pandas
- matplotlib==3.3.1
- seaborn
- streamlit
- joblib==1.4.2
- feature-engine==1.6.2
- scikit-learn==1.3.2
Examples:
- Matplotlib is used extensively for creating and displaying various plots related to Bitcoin price movements.
- Pandas is used extensively for data manipulation and analysis.
- NumPy is used for a few key operations, primarily to perform mathematical calculations on the DataFrame.
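
As an illustration of how the three libraries might work together, the sketch below plots the closing price inside its daily low-high range and computes a daily log return with NumPy. The function names and column names are assumptions made for the example, not code taken from the project.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_close_with_range(df: pd.DataFrame):
    """Plot the closing price over a shaded band from daily low to daily high."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(df.index, df["close"], label="Close")
    ax.fill_between(df.index, df["low"], df["high"], alpha=0.3, label="Low-high range")
    ax.set_xlabel("Date")
    ax.set_ylabel("Price (USD)")
    ax.legend()
    return fig, ax


def daily_log_return(close: pd.Series) -> pd.Series:
    """Natural log of each close divided by the previous close."""
    return np.log(close / close.shift(1))
```

The first value of the log return is always missing because there is no previous close to compare with.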
- This project draws inspiration from various sources, including the Code Institute Walkthrough Project 2 for general project structure and guidance. Additionally, the BTC to USD Dataset on Kaggle served as a key source for content inspiration and practical guidance on dataset handling and manipulation techniques.
The following resources were used as references and guides for this project:
- Churnometer repo by Code Institute - General guide for project structure.
- Data Plot Types for Visualization
- Pandas Documentation: `date_range` - Reference for generating date ranges in Pandas.
- Line Plots in Matplotlib - Tutorial on creating line plots using Matplotlib.
- BTC to USD Dataset on Kaggle