Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
01cb55c
commit baaaf5c
Showing
44 changed files
with
20,252 additions
and
32,824 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# emacs backup | ||
*~ | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
cover/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
.pybuilder/ | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
# For a library or package, you might want to ignore these files since the code is | ||
# intended to run in multiple environments; otherwise, check them in: | ||
# .python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# poetry | ||
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. | ||
# This is especially recommended for binary packages to ensure reproducibility, and is more | ||
# commonly ignored for libraries. | ||
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control | ||
#poetry.lock | ||
|
||
# pdm | ||
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. | ||
#pdm.lock | ||
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it | ||
# in version control. | ||
# https://pdm.fming.dev/#use-with-ide | ||
.pdm.toml | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# pytype static type analyzer | ||
.pytype/ | ||
|
||
# Cython debug symbols | ||
cython_debug/ | ||
|
||
# PyCharm | ||
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can | ||
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore | ||
# and can be added to the global gitignore or merged into this file. For a more nuclear | ||
# option (not recommended) you can uncomment the following to ignore the entire idea folder. | ||
#.idea/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,149 +1,66 @@ | ||
# 30+ Financial Data Science Projects | ||
# Financial Data Science Notebooks | ||
|
||
These Jupyter notebooks contain code examples and output from 30+ | ||
financial data science projects, which apply quantitative and | ||
machine learning methods to large structured and unstructured | ||
financial data sets. They accompany the [FinDS Python | ||
repo](https://github.com/terence-lim/financial-data-science.git), | ||
but reflect an older version hence do not (yet) sync with the code and examples | ||
presently in that repo. | ||
_UNDER CONSTRUCTION_ | ||
|
||
1. [Stock identifier changes and price adjustments](stock_prices.ipynb) | ||
- stock splits, dividends, identifiers, and total holding returns | ||
|
||
2. [Construct Jegadeesh-Titman rolling portfolios](jegadeesh_titman.ipynb) | ||
- Newey-West correction; momentum effect | ||
__30+ Projects in Financial Data Science__, presented as Jupyter Notebooks, using the _FinDS_ Python package | ||
|
||
3. [Construct Fama-French sorted portfolio](fama_french.ipynb) | ||
- linear regression; value and size anomaly | ||
|
||
## Expected Returns | ||
|
||
4. [Estimate Fama-Macbeth cross-sectional regressions](fama_macbeth.ipynb) | ||
- CAPM tests; polynomial regression; feature transformations | ||
## Topics | ||
|
||
5. [Backtesting a stock price reversal trading strategy](weekly_reversal.ipynb) | ||
- Contrarian strategy; statistical arbitrage | ||
- implementation shortfall; structural change with unknown breakpoint | ||
|
||
2. [Event studies of key developments](event_study.ipynb) | ||
- Abnormal returns; post-announcement drift; multiple testing | ||
| notebook | Financial | Data | Science | | ||
|:--|:--|:--|:--| | ||
| stock_prices | Stock distributions, delistings | CRSP stocks | Sample selection | | ||
| jegadeesh_titman | Overlapping portfolios; <br> Momentum | CRSP stocks | Hypothesis testing; <br> Newey-West correction | | ||
| fama_french | Bivariate sorts; <br> Value, Size; <br> CAPM | CRSP stocks; <br> Compustat | Linear regression; <br> Quadratic programming | | ||
| fama_macbeth | Cross-sectional Regressions; <br> Beta | Ken French data library | Feature transformations; <br> Kernel regression, LOOCV | | ||
| weekly_reversals | Mean reversion; <br> Implementation shortfall | CRSP stocks | Structural break tests | | ||
| quant_factors | Factor zoo; <br> Performance evaluation | CRSP stocks; <br> Compustat; IBES | Clustering for unsupervised learning | | ||
| event_study | Event studies | S&P key developments | Multiple testing; <br> FFT | | ||
| economic_releases | Macroeconomic analysis; <br> Unemployment | ALFRED | Economic data revisions | | ||
| regression_diagnostics | Regression analysis; <br> Inflation | FRED | Linear regression diagnostics; <br> Residual Analysis | | ||
| econometric_forecast | Time series analysis; <br> National Output | FRED | Stationarity, Autocorrelation | | ||
| approximate_factors | Approximate factor models | FRED-MD | Unit Root; <br> PCA; <br> EM Algorithm | | ||
| economic_states | State space models | FRED-MD | Gaussian Mixture; <br> HMM; <br> Kalman Filter | | ||
| conditional_volatility | Value at risk; <br> Conditional volatility | FRED cryptos and currencies | ARCH, GARCH; <br> VaR, TVaR | | ||
| covariance_matrix | Covariance matrix estimation; <br> Portfolio risk | Ken French data library | Shrinkage | | ||
| term_structure | Interest rates, yield curve | FRED | Splines, PCA | | ||
| bond_returns | Bond portfolio returns | FRED | SVD | | ||
| option_pricing | Binomial trees; <br> the Greeks | OptionMetrics; <br> FRED | Simulations | | ||
| market_microstructure | Liquidity costs; <br> Bid-ask spreads | TAQ tick data | Realized volatility; Variance ratio | | ||
| event_risk | Earnings surprises | IBES; <br> FRED-QD | Poisson regression; <br> GLM's | | ||
| customer_ego | Principal customers | Compustat customer segments | Graph Networks | | ||
| bea_centrality | Input-output use tables | Bureau of Economic Analysis | Graph centrality | | ||
| industry_community | Industry sectors | Hoberg&Phillips data library | Community detection | | ||
| link_prediction | Product markets | Hoberg&Phillips data library | Links prediction | | ||
| spatial_regression | Earnings surprises | IBES; <br> Hoberg&Phillips data library | Spatial regression | | ||
| fomc_topics | Fedspeak | FOMC meeting minutes | Topic modelling | | ||
| mda_sentiment | Company filings | SEC Edgar | Sentiment analysis | | ||
| business_description | Growth and value stocks | SEC Edgar | Part-of-speech tagging | | ||
| classification_models | News classification | S&P key developments | Classification for supervised learning | | ||
| regression_models | Macroeconomic forecasting | FRED-MD | Regression for supervised learning | | ||
| deep_classifier | News classification | S&P key developments | Feedforward neural networks; <br> Word embeddings; <br> Deep averaging | | ||
| convolutional_net | Macroeconomic forecasting | FRED-MD | Temporal convolutional networks; <br> Vector autoregression | | ||
| recurrent_net | Macroeconomic forecasting | FRED-MD | Elman recurrent networks; <br> Kalman filter | | ||
| fomc_language | Fedspeak | FOMC meeting minutes | Language modelling; <br> Transformers | | ||
| reinforcement_learning | Spending policy | Stocks, bonds, bills, and inflation | Reinforcement learning | | ||
|
||
2. [Performance evaluation of factor investing](quant_factors.ipynb) | ||
- Return predicting signals; performance evaluation | ||
|
||
## Risk | ||
## Resources | ||
|
||
2. [Conditional volatility of cryptocurrencies](conditional_volatility.ipynb) | ||
- Value at Risk, Expected Shortfall, GARCH, EWMA; bitcoin, etherium | ||
1. [Online Jupyter-book](https://terence-lim.github.io/data-science-notebooks/), or [download pdf](https://terence-lim.github.io/notes/data-science-notebooks.pdf) | ||
|
||
2. [Covariance matrix estimates of industry returns](covariance_matrix.ipynb) | ||
- Covariance Matrix: PCA, SVD, Shrinkage | ||
- Risk Decomposition, Black-Litterman, Risk Parity | ||
2. [FinDS API reference](https://terence-lim.github.io/financial-data-science/) | ||
|
||
2. [Visualizing the term structure of interest rates](term_structure.ipynb) | ||
- yield curve, duration, bootstrap | ||
|
||
2. [Examine principal components of bond returns](bond_returns.ipynb) | ||
- Principal components analysis, bond returns | ||
|
||
2. [Market microstructure: Intra-day liquidity from tick data](market_microstructure.ipynb) | ||
- TAQ tick data; spreads, Lee-Ready tick test, intra-day volatility | ||
3. [FinDS repo](https://github.com/terence-lim/finds) | ||
|
||
2. Event risk: Count dependent and aggregate loss models | ||
- frequency and severity of actuarial risks | ||
4. [Jupyter notebooks repo](https://github.com/terence-lim/finds-notebooks) | ||
|
||
|
||
## Econometric Methods | ||
## Contact | ||
|
||
2. [Revisions of macroeconomic time series from ALFRED](revisions_vintage.ipynb) | ||
- Archival-FRED, vintages | ||
Github: [https://terence-lim.github.io](https://terence-lim.github.io) | ||
|
||
2. [Analyze linear regression gaussian assumptions](linear_diagnostics.ipynb) | ||
- Residual analysis, outliers, leverage, influential points | ||
- Multicollinearity; robust standard errors | ||
|
||
2. [Forecast inflation time series](econometric_forecast.ipynb) | ||
- trends, stationarity, seasonality, ARMA, smoothing, cointegration | ||
- granger causality, impulse response function | ||
|
||
2. [Approximate factor model of FRED-MD macroeconomic series](approximate_factors.ipynb) | ||
- PCA-EM, unit root | ||
|
||
|
||
|
||
## Network Science | ||
|
||
2. [Ego network of principal customers supply chain](customer_ego.ipynb) | ||
- Induced subgraph, ego network | ||
|
||
2. [Centrality measures of BEA input-output tables](bea_centrality.ipynb) | ||
- Graph centrality algorithms | ||
|
||
2. [Community detection for industry sectoring](industry_community.ipynb) | ||
- Community detection graph algorithms | ||
|
||
2. [Link prediction on company relationships](link_prediction.ipynb) | ||
- Accuracy metrics; imbalanced sample | ||
- Random graphs, link prediction graph algorithms | ||
|
||
|
||
## Text Mining | ||
|
||
2. [Logistic regression for text classification of key developments | ||
financial news](keydev_classifier.ipynb) | ||
- Logistic regression, stochastic gradient descent | ||
|
||
2. [Sentiment analysis of 10-K management discussion text](mda_sentiment.ipynb) | ||
- SEC Edgar, Loughran-MacDonald dictionary | ||
|
||
2. [Syntactic analysis of 10-K business descriptions for industry | ||
classifications](business_description.ipynb) | ||
- Softmax regression; POS tagging, named entity recognition | ||
|
||
2. [Topic modeling of FOMC meeting minutes](fomc_topics.ipynb) | ||
- Matrix decomposition algorithms | ||
|
||
|
||
## Machine Learning | ||
|
||
2. [Compare classification models for key developments financial news | ||
classification](classification_models.ipynb) | ||
- Generalized linear models, SVM, KNN, Naive-Bayes, decision tree | ||
- Cross-validation, feature importances | ||
|
||
2. [Compare regression models for inflation prediction](regression_models.ipynb) | ||
- Subset selection, dimensional reduction, penalized least squares, ensembles | ||
- Regularization | ||
|
||
2. Unsupervised learning: Cluster analysis of factor risk premiums | ||
- K-Means, hierarchical clustering | ||
|
||
2. [Estimate state space economic models](economic_states.ipynb) | ||
- Mixture models, hidden markov models | ||
|
||
2. Bayesian belief networks for fraud detection | ||
|
||
|
||
## Deep Learning | ||
|
||
2. [Tune word embeddings for text classification](dan_classifier.ipynb) | ||
- Deep averaging networks, Feed forward neural net | ||
|
||
2. [Recurrent neural network and dynamic factor models](elman_kalman.ipynb) | ||
- Long short term memory (LSTM), kalman filter | ||
|
||
2. Train language model of fedspeak | ||
|
||
2. [Temporal convolutional networks and VAR](tcn_var.ipynb) | ||
- Convolutional neural network, vector autoregression | ||
|
||
2. Deep reinforcement learning and derivatives pricing | ||
|
||
|
||
## Big Data and the Cloud | ||
|
||
2. Big data | ||
- Hadoop, Spark, Hive | ||
|
||
2. Cloud computing |