Commit

update
terence-lim committed Sep 4, 2023
1 parent 01cb55c commit baaaf5c
Showing 44 changed files with 20,252 additions and 32,824 deletions.
163 changes: 163 additions & 0 deletions .gitignore
@@ -0,0 +1,163 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# emacs backup
*~

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
177 changes: 47 additions & 130 deletions README.md
@@ -1,149 +1,66 @@
# 30+ Financial Data Science Projects
# Financial Data Science Notebooks

These Jupyter notebooks contain code examples and output from 30+
financial data science projects, which apply quantitative and
machine learning methods to large structured and unstructured
financial data sets. They accompany the [FinDS Python
repo](https://github.com/terence-lim/financial-data-science.git),
but reflect an older version hence do not (yet) sync with the code and examples
presently in that repo.
_UNDER CONSTRUCTION_

1. [Stock identifier changes and price adjustments](stock_prices.ipynb)
- stock splits, dividends, identifiers, and total holding returns

2. [Construct Jegadeesh-Titman rolling portfolios](jegadeesh_titman.ipynb)
- Newey-West correction; momentum effect
__30+ Projects in Financial Data Science__, presented as Jupyter Notebooks, using the _FinDS_ Python package

3. [Construct Fama-French sorted portfolio](fama_french.ipynb)
- linear regression; value and size anomaly

## Expected Returns

4. [Estimate Fama-MacBeth cross-sectional regressions](fama_macbeth.ipynb)
- CAPM tests; polynomial regression; feature transformations
## Topics

5. [Backtesting a stock price reversal trading strategy](weekly_reversal.ipynb)
- Contrarian strategy; statistical arbitrage
- implementation shortfall; structural change with unknown breakpoint

2. [Event studies of key developments](event_study.ipynb)
- Abnormal returns; post-announcement drift; multiple testing
| notebook | Financial | Data | Science |
|:--|:--|:--|:--|
| stock_prices | Stock distributions, delistings | CRSP stocks | Sample selection |
| jegadeesh_titman | Overlapping portfolios; <br> Momentum | CRSP stocks | Hypothesis testing; <br> Newey-West correction |
| fama_french | Bivariate sorts; <br> Value, Size; <br> CAPM | CRSP stocks; <br> Compustat | Linear regression; <br> Quadratic programming |
| fama_macbeth | Cross-sectional Regressions; <br> Beta | Ken French data library | Feature transformations; <br> Kernel regression, LOOCV |
| weekly_reversals | Mean reversion; <br> Implementation shortfall | CRSP stocks | Structural break tests |
| quant_factors | Factor zoo; <br> Performance evaluation | CRSP stocks; <br> Compustat; IBES | Clustering for unsupervised learning |
| event_study | Event studies | S&P key developments | Multiple testing; <br> FFT |
| economic_releases | Macroeconomic analysis; <br> Unemployment | ALFRED | Economic data revisions |
| regression_diagnostics | Regression analysis; <br> Inflation | FRED | Linear regression diagnostics; <br> Residual Analysis |
| econometric_forecast | Time series analysis; <br> National Output | FRED | Stationarity, Autocorrelation |
| approximate_factors | Approximate factor models | FRED-MD | Unit Root; <br> PCA; <br> EM Algorithm |
| economic_states | State space models | FRED-MD | Gaussian Mixture; <br> HMM; <br> Kalman Filter |
| conditional_volatility | Value at risk; <br> Conditional volatility | FRED cryptos and currencies | ARCH, GARCH; <br> VaR, TVaR |
| covariance_matrix | Covariance matrix estimation; <br> Portfolio risk | Ken French data library | Shrinkage |
| term_structure | Interest rates, yield curve | FRED | Splines, PCA |
| bond_returns | Bond portfolio returns | FRED | SVD |
| option_pricing | Binomial trees; <br> the Greeks | OptionMetrics; <br> FRED | Simulations |
| market_microstructure | Liquidity costs; <br> Bid-ask spreads | TAQ tick data | Realized volatility; Variance ratio |
| event_risk | Earnings surprises | IBES; <br> FRED-QD | Poisson regression; <br> GLMs |
| customer_ego | Principal customers | Compustat customer segments | Graph Networks |
| bea_centrality | Input-output use tables | Bureau of Economic Analysis | Graph centrality |
| industry_community | Industry sectors | Hoberg&Phillips data library | Community detection |
| link_prediction | Product markets | Hoberg&Phillips data library | Link prediction |
| spatial_regression | Earnings surprises | IBES; <br> Hoberg&Phillips data library | Spatial regression |
| fomc_topics | Fedspeak | FOMC meeting minutes | Topic modelling |
| mda_sentiment | Company filings | SEC Edgar | Sentiment analysis |
| business_description | Growth and value stocks | SEC Edgar | Part-of-speech tagging |
| classification_models | News classification | S&P key developments | Classification for supervised learning |
| regression_models | Macroeconomic forecasting | FRED-MD | Regression for supervised learning |
| deep_classifier | News classification | S&P key developments | Feedforward neural networks; <br> Word embeddings; <br> Deep averaging |
| convolutional_net | Macroeconomic forecasting | FRED-MD | Temporal convolutional networks; <br> Vector autoregression |
| recurrent_net | Macroeconomic forecasting | FRED-MD | Elman recurrent networks; <br> Kalman filter |
| fomc_language | Fedspeak | FOMC meeting minutes | Language modelling; <br> Transformers |
| reinforcement_learning | Spending policy | Stocks, bonds, bills, and inflation | Reinforcement learning |

2. [Performance evaluation of factor investing](quant_factors.ipynb)
- Return predicting signals; performance evaluation

## Risk
## Resources

2. [Conditional volatility of cryptocurrencies](conditional_volatility.ipynb)
- Value at Risk, Expected Shortfall, GARCH, EWMA; bitcoin, ethereum
1. [Online Jupyter-book](https://terence-lim.github.io/data-science-notebooks/), or [download pdf](https://terence-lim.github.io/notes/data-science-notebooks.pdf)

2. [Covariance matrix estimates of industry returns](covariance_matrix.ipynb)
- Covariance Matrix: PCA, SVD, Shrinkage
- Risk Decomposition, Black-Litterman, Risk Parity
2. [FinDS API reference](https://terence-lim.github.io/financial-data-science/)

2. [Visualizing the term structure of interest rates](term_structure.ipynb)
- yield curve, duration, bootstrap

2. [Examine principal components of bond returns](bond_returns.ipynb)
- Principal components analysis, bond returns

2. [Market microstructure: Intra-day liquidity from tick data](market_microstructure.ipynb)
- TAQ tick data; spreads, Lee-Ready tick test, intra-day volatility
3. [FinDS repo](https://github.com/terence-lim/finds)

2. Event risk: Count dependent and aggregate loss models
- frequency and severity of actuarial risks
4. [Jupyter notebooks repo](https://github.com/terence-lim/finds-notebooks)


## Econometric Methods
## Contact

2. [Revisions of macroeconomic time series from ALFRED](revisions_vintage.ipynb)
- Archival-FRED, vintages
GitHub: [https://terence-lim.github.io](https://terence-lim.github.io)

2. [Analyze linear regression gaussian assumptions](linear_diagnostics.ipynb)
- Residual analysis, outliers, leverage, influential points
- Multicollinearity; robust standard errors

2. [Forecast inflation time series](econometric_forecast.ipynb)
- trends, stationarity, seasonality, ARMA, smoothing, cointegration
- Granger causality, impulse response function

2. [Approximate factor model of FRED-MD macroeconomic series](approximate_factors.ipynb)
- PCA-EM, unit root



## Network Science

2. [Ego network of principal customers supply chain](customer_ego.ipynb)
- Induced subgraph, ego network

2. [Centrality measures of BEA input-output tables](bea_centrality.ipynb)
- Graph centrality algorithms

2. [Community detection for industry sectoring](industry_community.ipynb)
- Community detection graph algorithms

2. [Link prediction on company relationships](link_prediction.ipynb)
- Accuracy metrics; imbalanced sample
- Random graphs, link prediction graph algorithms


## Text Mining

2. [Logistic regression for text classification of key developments
financial news](keydev_classifier.ipynb)
- Logistic regression, stochastic gradient descent

2. [Sentiment analysis of 10-K management discussion text](mda_sentiment.ipynb)
- SEC Edgar, Loughran-McDonald dictionary

2. [Syntactic analysis of 10-K business descriptions for industry
classifications](business_description.ipynb)
- Softmax regression; POS tagging, named entity recognition

2. [Topic modeling of FOMC meeting minutes](fomc_topics.ipynb)
- Matrix decomposition algorithms


## Machine Learning

2. [Compare classification models for key developments financial news
classification](classification_models.ipynb)
- Generalized linear models, SVM, KNN, Naive-Bayes, decision tree
- Cross-validation, feature importances

2. [Compare regression models for inflation prediction](regression_models.ipynb)
- Subset selection, dimensional reduction, penalized least squares, ensembles
- Regularization

2. Unsupervised learning: Cluster analysis of factor risk premiums
- K-Means, hierarchical clustering

2. [Estimate state space economic models](economic_states.ipynb)
- Mixture models, hidden Markov models

2. Bayesian belief networks for fraud detection


## Deep Learning

2. [Tune word embeddings for text classification](dan_classifier.ipynb)
- Deep averaging networks, Feed forward neural net

2. [Recurrent neural network and dynamic factor models](elman_kalman.ipynb)
- Long short-term memory (LSTM), Kalman filter

2. Train language model of fedspeak

2. [Temporal convolutional networks and VAR](tcn_var.ipynb)
- Convolutional neural network, vector autoregression

2. Deep reinforcement learning and derivatives pricing


## Big Data and the Cloud

2. Big data
- Hadoop, Spark, Hive

2. Cloud computing