Commit

version update
terence-lim committed Jun 7, 2024
1 parent 791d2da commit bfc0ffc
Showing 39 changed files with 46,262 additions and 18,075 deletions.
87 changes: 48 additions & 39 deletions README.md
@@ -1,51 +1,62 @@
# Financial Data Science Notebooks
# FINANCIAL DATA SCIENCE

_UNDER CONSTRUCTION_

_Financial Data Science_ with __FinDS__ Python package in Jupyter-notebooks:

__30+ Projects in Financial Data Science__, presented as Jupyter Notebooks, using the _FinDS_ Python package

- use database engines: SQL, MongoDB, Redis
- interfaces for
  - structured data from CRSP, Compustat, IBES, TAQ
  - APIs from ALFRED, BEA
  - unstructured data from SEC Edgar, Federal Reserve websites
  - academic websites by Ken French, Loughran and McDonald, Hoberg and Phillips
- recipes for econometrics, finance, graphs, event studies, backtesting
- applications of statistics, machine learning, neural networks and large language models


## Topics


| notebook | Financial | Data | Science |
|:--|:--|:--|:--|
| stock_prices | Stock distributions, delistings | CRSP stocks | Sample selection |
| jegadeesh_titman | Overlapping portfolios; <br> Momentum | CRSP stocks | Hypothesis testing; <br> Newey-West correction |
| fama_french | Bivariate sorts; <br> Value, Size; <br> CAPM | CRSP stocks; <br> Compustat | Linear regression; <br> Quadratic programming |
| fama_macbeth | Cross-sectional regressions; <br> Beta | Ken French data library | Feature transformations; <br> Kernel regression, LOOCV |
| weekly_reversals | Mean reversion; <br> Implementation shortfall | CRSP stocks | Structural break tests |
| quant_factors | Factor zoo; <br> Performance evaluation | CRSP stocks; <br> Compustat; IBES | Clustering for unsupervised learning |
| stock_prices | Stock distributions, delistings | CRSP stocks | Statistical moments |
| jegadeesh_titman | Overlapping portfolios; <br> Momentum effect | CRSP stocks | Hypothesis testing; <br> Newey-West estimator |
| fama_french | Portfolio sorts; <br> Value effect | CRSP stocks; <br> Compustat | Linear regression |
| fama_macbeth | Cross-sectional regressions; <br> CAPM | Ken French research library | Non-linear regression; <br> Quadratic optimization |
| weekly_reversals | Mean reversion; <br> Implementation shortfall | CRSP stocks | Structural breaks; <br> Performance evaluation |
| quant_factors | Factor investing; <br> Backtests | CRSP stocks; <br> Compustat; IBES | Cluster analysis |
| event_study | Event studies | S&P key developments | Multiple testing; <br> FFT |
| economic_releases | Macroeconomic analysis; <br> Unemployment | ALFRED | Economic data revisions |
| regression_diagnostics | Regression analysis; <br> Inflation | FRED | Linear regression diagnostics; <br> Residual Analysis |
| econometric_forecast | Time series analysis; <br> National Output | FRED | Stationarity, Autocorrelation |
| approximate_factors | Approximate factor models | FRED-MD | Unit Root; <br> PCA; <br> EM Algorithm |
| economic_states | State space models | FRED-MD | Gaussian Mixture; <br> HMM; <br> Kalman Filter |
| conditional_volatility | Value at risk; <br> Conditional volatility | FRED cryptos and currencies | ARCH, GARCH; <br> VaR, TVaR |
| covariance_matrix | Covariance matrix estimation; <br> Portfolio risk | Ken French data library | Shrinkage |
| term_structure | Interest rates, yield curve | FRED | Splines, PCA |
| bond_returns | Bond portfolio returns | FRED | SVD |
| option_pricing | Binomial trees; <br> the Greeks | OptionMetrics; <br> FRED | Simulations |
| market_microstructure | Liquidity costs; <br> Bid-ask spreads | TAQ tick data | Realized volatility; Variance ratio |
| event_risk | Earnings surprises | IBES; <br> FRED-QD | Poisson regression; <br> GLMs |
| customer_ego | Principal customers | Compustat customer segments | Graph Networks |
| bea_centrality | Input-output use tables | Bureau of Economic Analysis | Graph centrality |
| industry_community | Industry sectors | Hoberg&Phillips data library | Community detection |
| link_prediction | Product markets | Hoberg&Phillips data library | Link prediction |
| spatial_regression | Earnings surprises | IBES; <br> Hoberg&Phillips data library | Spatial regression |
| fomc_topics | Fedspeak | FOMC meeting minutes | Topic modelling |
| mda_sentiment | Company filings | SEC Edgar | Sentiment analysis |
| business_description | Growth and value stocks | SEC Edgar | Part-of-speech tagging |
| classification_models | News classification | S&P key developments | Classification for supervised learning |
| regression_models | Macroeconomic forecasting | FRED-MD | Regression for supervised learning |
| deep_classifier | News classification | S&P key developments | Feedforward neural networks; <br> Word embeddings; <br> Deep averaging |
| convolutional_net | Macroeconomic forecasting | FRED-MD | Temporal convolutional networks; <br> Vector autoregression |
| recurrent_net | Macroeconomic forecasting | FRED-MD | Elman recurrent networks; <br> Kalman filter |
| fomc_language | Fedspeak | FOMC meeting minutes | Language modelling; <br> Transformers |
| reinforcement_learning | Spending policy | Stocks, bonds, bills, and inflation | Reinforcement learning |
| economic_releases | Economic data revisions; <br> Employment payrolls | ALFRED | Outliers |
| regression_diagnostics | Consumer and<br> producer prices | FRED | Linear regression diagnostics; <br> Residual analysis |
| econometric_forecast | Production and Inflation | FRED | Time series analysis |
| approximate_factors | Approximate factor models | FRED-MD | Unit root test |
| economic_states | State space models | FRED-MD | Gaussian Mixture; <br> HMM |
| term_structure | Interest rates | FRED yield curve | SVD |
| bond_returns | Bond risk factors | FRED bond returns | PCA |
| option_pricing | Binomial tree; <br> Black-Scholes-Merton and the Greeks | Simulations | Monte Carlo simulation |
| conditional_volatility | Value at risk | FRED crypto-currencies | EWMA; GARCH |
| covariance_matrix | Portfolio risk | Fama-French industries | Covariance matrix estimation |
| market_microstructure | Market impact; <br> Liquidity risk | TAQ tick data | High frequency volatility |
| event_risk | Earnings misses | IBES | Poisson regression; <br> GLM |
| customer_ego | Supply chain | Compustat principal customers | Graph networks |
| industry_community | Industry sectors | Hoberg and Phillips <br> research library | Community detection |
| bea_centrality | Input-output tables | Bureau of Economic Analysis | Graph centrality |
| link_prediction | Product markets | Hoberg and Phillips | Link prediction |
| spatial_regression | Earnings surprises | IBES <br>Hoberg and Phillips | Spatial regression |
| fomc_topics | FOMC meetings | Federal Reserve website | Topic models |
| mda_sentiment | 10-K filings | SEC Edgar; <br> Loughran and McDonald <br> research library | Sentiment analysis |
| business_description | 10-K filings | SEC Edgar | POS tagging; <br> Density-based clustering |
| classification_models | Industry classification | SEC Edgar | Classification |
| regression_models | Macroeconomic forecasts | FRED-MD | Regression |
| deep_classifier | Industry classification | SEC Edgar | Neural networks; <br> Word embeddings |
| recurrent_net | Macroeconomic models | FRED-MD | RNN; <br> Dynamic factor models |
| convolutional_net | Macroeconomic forecasts | FRED-MD | CNN; <br> Vector autoregression |
| reinforcement_learning | Spending policy | SBBI | Reinforcement learning |
| fomc_language | Fedspeak | FOMC meeting minutes | Language modelling; <br> Transformers |
| sentiment_llm | Financial news sentiment | Kaggle | LLM prompt engineering |
| summarization_llm | 10-K filings | SEC Edgar | LLM text summarization |
| finetune_llm | Industry classification | SEC Edgar | LLM fine-tuning |
| rag_agent | Corporate philanthropy | textbooks | LLM RAG, <br>chatbots, agents |



## Resources
@@ -62,5 +73,3 @@ __30+ Projects in Financial Data Science__, presented as Jupyter Notebooks, usin
## Contact

GitHub: [https://terence-lim.github.io](https://terence-lim.github.io)

