S&P 500 Historical Data Analysis and Prediction

This repository contains a Jupyter Notebook that analyzes historical data for the S&P 500 stock market index and builds a predictive model to forecast future closing prices. The code uses Yahoo Finance data and applies a RandomForestClassifier from scikit-learn to predict whether the closing price will be higher the next day.

Introduction

This project aims to predict the movement of the S&P 500 index using historical data and machine learning techniques. The primary focus is on determining if the closing price of the S&P 500 will increase the next day.

Data Collection

The data is fetched from Yahoo Finance using the yfinance package. The historical data for the S&P 500 is retrieved for the maximum available period.

Data Preprocessing

Loading Data:
- The S&P 500 historical data is loaded into a DataFrame.
- The index of the DataFrame is set to the date of each record.
Feature Engineering:
- A new column 'Tomorrow' is created to store the closing price of the next day.
- A 'Target' column is added to indicate if the closing price will be higher the next day (1 for true, 0 for false).
Data Filtering:
- The data is filtered to include records from January 1, 1990, to December 31, 2021.
Rolling Averages and Trends:
- Rolling averages and trends are calculated for different horizons (2, 5, 60, 250, 1000 days).
- New predictor columns are added based on these calculations.
Handling Missing Values:
- Rows with missing values are removed from the dataset.

Model Training

A RandomForestClassifier is used to train the model. The predictors include 'Close', 'Volume', 'Open', 'High', 'Low', and the newly engineered features.

Training Steps:

Split the data into training and testing datasets.
Fit the model using the training data.
Make predictions on the test data.
Evaluate the model using precision score.

Backtesting

The backtesting function iterates through the data in steps, training and testing the model on specified predictors. It returns concatenated predictions from each iteration for further analysis.

Usage

To run the code, follow these steps:

Clone the repository:

git clone https://github.com/yourusername/sp500-prediction.git
cd sp500-prediction

Install the required packages:
```
pip install -r requirements.txt
```
Run the Jupyter Notebook:
```
jupyter notebook
```
Open sp500_prediction.ipynb and run the cells sequentially.

Results

The precision score and the proportion of true 'Target' values in the predictions dataset are calculated.
The model's predictions and the actual target values are plotted for visual analysis.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you have any suggestions or improvements.

License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

S&P 500 Historical Data Analysis and Prediction

Table of Contents

Introduction

Data Collection

Data Preprocessing

Model Training

Training Steps:

Backtesting

Usage

Results

Contributing

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

S&P 500 Historical Data Analysis and Prediction

Table of Contents

Introduction

Data Collection

Data Preprocessing

Model Training

Training Steps:

Backtesting

Usage

Results

Contributing

License