Releases: KarelZe/thesis
Changes between February 27th and March 5th
What's Changed
Didn't work 100 % on the thesis this week; spent some time on exam prep.
Writing 📖
- Incorporate review comments from Christian📬 by @KarelZe in #185
- Add chapter on problem framing and notation⛺ by @KarelZe in #188
- Add feature definition to appendix🪙 by @KarelZe in #189
- Add questions for meeting 🙆‍♀️ by @KarelZe in #190
- Notes on dataset and improved viz⛺ by @KarelZe in #187
- Add chapter on dataset🌏 by @KarelZe in #192
Other Changes
- Bump google-auth from 2.16.1 to 2.16.2 by @dependabot in #191
Outlook🎒
- finish remaining tasks from last week
- exam prep
Full Changelog: 23-09...23-10
Changes between February 20th and February 26th
What's Changed
Writing 📖
- Add chapter on Regression Trees🎄 by @KarelZe in #170
- Add section on attention maps🧭 by @KarelZe in #172
- Edit in review comments🎒 by @KarelZe in #174
- Optimized citations/typesetting and extended `check_formalia.py` 🐍 by @KarelZe in #175
- Edit in comments from second review 👨‍🎓 by @KarelZe in #179
- Add visualizations for layer norm🍇 by @KarelZe in #178
- Add chapter on TabTransformer📑 by @KarelZe in #180
- Add chapter on FT-Transformer🕹️ by @KarelZe in #181
- Add notes and viz on train-test-split🍿 by @KarelZe in #182
Other Changes
- Bump google-auth from 2.16.0 to 2.16.1 by @dependabot in #171
- Bump actions/checkout from 1 to 3 by @dependabot in #183
Outlook🎒
- Write the chapter on the gradient boosting procedure
- Finish the attention and embeddings chapter. Add some nice visuals!
- Integrate feedback
- Resolve my small TODOs in LaTeX sources / go through warnings / fix overflows
- Loosely research how pre-training on unlabelled data can be implemented in PyTorch
- (merge and rework the Chapter on feature engineering)
Full Changelog: 23-08...23-09
Changes between February 13th and February 19th
What's Changed
Writing 📖
- Refactor and enhance stacked hybrid rules to separate chapter 🔢 by @KarelZe in #155
- Extend chapter on LR algorithm📖 by @KarelZe in #156
- Research on trade initiator for CBOE / ISE 📑 by @KarelZe in #157
- Improve readability of the overview of Transformers 🤖 by @KarelZe in #158
- Rewrite chapter on positional encoding🧵 by @KarelZe in #159
- Rewrite chapter position-wise FFN for clarity🎱 by @KarelZe in #160
- Rewrite chapter on residual connections🔗 by @KarelZe in #161
- Update citation style and table of symbols🎙️ by @KarelZe in #162
- Add feature set definition to appendix🧃 by @KarelZe in #164
- Add visualizations of Transformer for tabular data🖼️ by @KarelZe in #165
- Improve captioning and transitions for Transformer chapters 🍞 by @KarelZe in #166
- Fix and simplify formulas ❤️‍🩹 by @KarelZe in #167
- Streamline and extend the chapter on LR algorithm📑 by @KarelZe in #168
- Rewrite layer norm chapter and fuse with residual connections 🍔 by @KarelZe in #169
- Restructure chapter on trade initiator🪴 by @KarelZe in #163
Outlook 🏍️
- Merge and rework chapters on FTTransformer, TabTransformer, token embeddings, feature engineering, and attention maps
- Write a chapter on decision trees and gradient boosting as well as attention
- Create nice visualizations for categorical embeddings and layer norm
- Integrate feedback from @lxndrblz and @pheusel
- Improve the transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, completing experiments with PyTorch 2.0, etc.
- Investigate results of current models e.g., robustness, effective spread, spread, partial dependence plots, etc. (see #8)
Full Changelog: 23-07...23-08
Changes between February 6th and February 12th
Due to the slow progress last week, I decided to switch plans and push ahead with writing. I wrote all chapters on classical trade classification rules (9 pages) and incorporated them into `thesis.pdf`. I also gathered several ideas on how to improve the transformer chapters.
What's Changed
Writing 📖
- Add chapter on quote rule🔢 by @KarelZe in #146
- Add chapter on depth rule🔢 by @KarelZe in #147
- Add chapter on the EMO rule🔢 by @KarelZe in #149
- Add chapter on trade size rule 📑 by @KarelZe in #150
- Add chapter on CLNV method 🔢 by @KarelZe in #151
- Add chapter on tick rule 🔢 by @KarelZe in #152
- Add chapter on Lee and Ready algorithm + proofreading 🔢 by @KarelZe in #154
Other Changes
- Bump fastparquet from 2023.1.0 to 2023.2.0 by @dependabot in #153
Outlook 🐿️
(same as last week, as I worked on the classical trade classification rules)
- Complete notes and write a draft on the selection of (semi-) supervised approaches
- Rethink Transformer chapter. I'm still not happy with the overall quality. Will probably spend more time rewriting/rethinking.
- Improve the transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, completing experiments with PyTorch 2.0, etc.
- Investigate results of current models e.g., robustness, effective spread, spread, partial dependence plots, etc. (see #8)
Full Changelog: 23-06...23-07
Changes between January 30th and February 5th
What's Changed
Writing 📖
- Rewrite transformer chapters for clarity by @KarelZe in #139
- Fix merge and build errors in reports 🐞 by @KarelZe in #140
- Chapter on related works 👪 by @KarelZe in #141
- Add notes on depth, trade size, and CLNV rule💸 by @KarelZe in #142
- Improve notes on tick rule, quote rule, LR algorithm, and EMO rule💸 by @KarelZe in #144
- Notes for meeting and misc pre-writing changes🐿️ by @KarelZe in #145
Other Changes
- Bump docker/build-push-action from 3 to 4 by @dependabot in #138
Outlook 🧪
- Complete notes and write a draft on the selection of (semi-) supervised approaches
- Rethink Transformer chapter. I'm still not happy with the overall quality. Will probably spend more time rewriting/rethinking.
- Improve the transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, completing experiments with PyTorch 2.0, etc.
- Investigate results of current models e.g., robustness, effective spread, spread, partial dependence plots, etc. (see #8)
Full Changelog: 23-05...23-06
Changes between January 23rd and January 29th
What's Changed
Empirical Study ⚗️
- Feature engineering for a very large dataset 🌌 by @KarelZe in #126
- Add retraining for gradient boosting [+ 2 %] 🍾 by @KarelZe in #130
- Improve accuracy of `TabTransformer` [+ 5 % from prev.] 🪅 by @KarelZe in #129
- Fix cardinalities of Transformer implementation 🪲 by @KarelZe in #132
Writing 📖
- Complete notes on layer norm🍔 by @KarelZe in #123
- Chapter on layer norm + notes on SSL and embeddings for tabular data 🧲 by @KarelZe in #131
- Add chapter on embeddings of tabular data💤 by @KarelZe in #133
- Fix broken references in expose 🔗 by @KarelZe in #135
- Rework chapters on transformer 🤖 by @KarelZe in #134
- [WIP] Add a chapter on attention, self-attention, multi-headed attention, and cross-attention 🅰️ by @KarelZe in #136
Other Changes
- Bump gcsfs from 2022.11.0 to 2023.1.0 by @dependabot in #127
Outlook 🚀
- Complete notes and write a draft on related works
- Complete notes and write a draft on the selection of (semi-) supervised approaches
- Complete notes and write a draft on classical trade classification rules
- Try to shorten/streamline the theoretical background by one page. Also, aim for better understanding and improve the visualizations.
- Improve the transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, completing experiments with PyTorch 2.0, etc.
Full Changelog: 23-04...23-05
Changes between January 16th and January 22nd
What's Changed
Empirical Study ⚗️
- Restore soft links 🔗 by @KarelZe in #120
- Add current results⚡ by @KarelZe in #121
- Change from code review 🧼 by @KarelZe in #124
- Shared embeddings and pre-norm in `TabTransformer` 🤖 by @KarelZe in #118. After writing the TabTransformer chapter, I noticed that the open-source implementation I had used was too simplistic. I also notified Borisov et al., who had used the same implementation in their study (https://arxiv.org/abs/2110.01889), of the issue. See kathrinse/TabSurvey#13 for details. A minimal sketch of shared embeddings follows this list.
- Automatically find maximum batch size🥐 by @KarelZe in #125
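For illustration, here is a minimal sketch of the shared (column) embeddings described in the appendix of the TabTransformer paper, which motivated #118. All names, the 1/8 split, and the random initialization are assumptions for the sketch, not the code from the PR:

```python
import torch
import torch.nn as nn

class ColumnEmbedding(nn.Module):
    """Embedding for one categorical column where the first `shared_dim`
    dimensions are shared across all categories of the column and the
    remaining dimensions are category-specific. Sketch only; the 1/8
    split follows the TabTransformer paper's appendix."""

    def __init__(self, num_categories: int, d_embed: int, shared_frac: float = 0.125):
        super().__init__()
        self.shared_dim = max(1, int(d_embed * shared_frac))
        self.shared = nn.Parameter(torch.randn(self.shared_dim))
        self.per_category = nn.Embedding(num_categories, d_embed - self.shared_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) of category indices for this column
        specific = self.per_category(x)              # (batch, d_embed - shared_dim)
        shared = self.shared.expand(x.size(0), -1)   # (batch, shared_dim)
        return torch.cat([shared, specific], dim=-1) # (batch, d_embed)
```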
Writing 📖
- Add chapter on point-wise FFN🎱 by @KarelZe in #117
- Add chapter on residual connections🔗 by @KarelZe in #119
- [WIP] Chapter on layer norm🍔 by @KarelZe in #123
Other Changes
- Bump fastparquet from 2022.12.0 to 2023.1.0 by @dependabot in #122
Outlook 🧪
- It wasn't easy to obtain Jupyter resources on the cluster last week. Thus, training and improving the Transformer didn't progress as initially hoped. Two SLURM jobs are still pending. I had some success with small-scale experiments, though, with FTTransformer reaching a performance similar to gradient boosting. The results from gradient boosting with option features also look promising. See `readme.md`.
- I also decided to break down training and tuning into smaller chunks after reading https://github.com/google-research/tuning_playbook. I hope it will give us more insights. I have already experimented with gradient tracking and added the option to automatically find the maximum batch size (see the sketch after this list). I will also add the option to keep certain parameters static, plan to add a much simpler baseline such as logistic regression, and will simplify evaluation. I might experiment with retraining. I also restructured my notes on how I want to progress with training and tuning.
- Writing progressed slower than I anticipated for various reasons. I still have to write the chapters on attention and MHSA, as well as pre-training of transformers.
- I'll use next week to clean up the remaining tasks. 💯
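For illustration, a minimal sketch of the doubling strategy behind such a maximum-batch-size finder; all names and the forward/backward probe are assumptions for the sketch, not the repository's implementation:

```python
import torch

def find_max_batch_size(model, loss_fn, make_batch, device="cuda", start=32, cap=2**16):
    """Double the batch size until CUDA runs out of memory (or a cap is hit)
    and return the largest size that survived a forward and backward pass."""
    batch_size, largest_ok = start, None
    while batch_size <= cap:
        try:
            x, y = make_batch(batch_size)
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()                       # backward pass also costs memory
            model.zero_grad(set_to_none=True)
            largest_ok, batch_size = batch_size, batch_size * 2
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise
            torch.cuda.empty_cache()
            break
    return largest_ok
```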
Full Changelog: 23-03...23-04
Changes between January 9th and January 15th
What's Changed
Empirical Study ⚗️
- Run feature engineering on large scale (100 %) 💡 by @KarelZe in #109
- Run exploratory data analysis on cluster (10 %) by @KarelZe in #108
Writing 📖
- Add chapter on input embedding (finished) and positional encoding (cont'd) 🛌 by @KarelZe in #107
- Finish chapter on positional encoding🧵 by @KarelZe in #111
- Add chapter on TabTransformer🔢 by @KarelZe in #112
- Add chapter on FTTransformer 🤖 by @KarelZe in #113
- Correction of column embedding in chapter TabTransformer 🤖 by @KarelZe in #115
Other Changes
- Bump google-auth from 2.15.0 to 2.16.0 by @dependabot in #110
- Bump requests from 2.28.1 to 2.28.2 by @dependabot in #114
Outlook💡
- Perform a code review of all previously written code.
- Continue with transformer week. 🤖 Mainly write remaining chapters on the classical transformer architecture, attention and MHSA, as well as pre-training of transformers.
- Research additional tricks from literature to optimize training behaviour of transformers. Structure them for the chapter on training and tuning our models.
- Increase performance of current transformer implementations by applying the tricks from above to match the performance of gradient-boosted trees.
- Add shared embeddings to the `TabTransformer` implementation.
- Restructure notes and draft a chapter on model selection of supervised and semi-supervised models.
Full Changelog: 23-02...23-03
Changes between January 2nd and January 8th
What's Changed
Empirical Study ⚗️
- Create `sklearn`-compatible estimators 🦜 by @KarelZe in #93. Having a common sklearn-like interface is necessary for further aspects of training and evaluation, like calculating SHAP values, creating learning curves, or simplifying hyperparameter tuning.
- Interpretability with `SHAP` and attention maps 🐇 by @KarelZe in #85. Kernel SHAP values can now be calculated for all models (classical + ML-based). This was marketed as one of the contributions of my paper. I still need to research how to handle high correlation between features in Kernel SHAP. Attention maps can be calculated for transformer-based models.
- Add sample weighting to `TransformerClassifier` 🏋️ by @KarelZe in #100. Samples in the training set are weighted similarly to how it's done in CatBoost, so more recent observations become more important (a minimal sketch follows this list).
- Early stopping based on accuracy for `TransformerClassifier` 🧁 by @KarelZe in #102. Early stopping is now performed based on validation accuracy instead of log loss. Thus, early stopping for neural networks and gradient boosting is now consistent.
- Improve robustness and tests of `TabDataset` 🚀 by @KarelZe in #101
- Add instructions on using `SLURM` 🐧 by @KarelZe in #103. `SLURM` enables us to run a script on multiple nodes of the bwHPC cluster and for extended periods. Required for the final training.
- Finalize exploratory data analysis 🚏 by @KarelZe in #105
- Finalize feature engineering 🪄 by @KarelZe in #104
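For illustration, a minimal sketch of the time-based sample weighting idea from #100; the linear weighting scheme and all names are assumptions for the sketch, and the actual PR may weight observations differently:

```python
import torch

def time_based_weights(n_samples: int) -> torch.Tensor:
    """Linearly increasing weights so later (more recent) observations count
    more, normalized to a mean of 1 to keep the loss scale comparable."""
    w = torch.arange(1, n_samples + 1, dtype=torch.float32)
    return w / w.mean()

# Usage with an unreduced loss, weighting each sample's contribution:
# losses = torch.nn.functional.binary_cross_entropy_with_logits(logits, y, reduction="none")
# loss = (time_based_weights(len(y)) * losses).mean()
```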
Writing 📖
- Pre-write feature engineering chapter 🪛 by @KarelZe in #88
- Write chapter on attention maps (finished) and gbm (cont'd) 🧭 by @KarelZe in #99. While implementing attention maps (see #85), I noticed that the common practice for calculating attention maps in the tabular domain is myopic. I researched approaches for transformers from other domains, e.g., machine translation, and documented my findings in this chapter. The chosen approaches take all attention layers into account and can handle attention heads with varying importance. A sketch of one such approach follows this list.
- Questions for bi-weekly meeting❓ by @KarelZe in #106
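One layer-aware approach of this kind is attention rollout (Abnar & Zuidema, 2020). Below is a minimal sketch, assuming per-layer attention tensors of shape `(heads, seq, seq)`; note that plain rollout simply averages the heads, so handling heads of varying importance would replace the mean with learned or gradient-based head weights:

```python
import torch

def attention_rollout(attentions):
    """Fuse per-layer attention maps (each of shape (heads, seq, seq)) into a
    single map by averaging heads, adding the residual connection,
    re-normalizing, and multiplying through the layers."""
    n, device = attentions[0].size(-1), attentions[0].device
    rollout = torch.eye(n, device=device)
    for attn in attentions:
        attn = attn.mean(dim=0)                       # average over heads
        attn = attn + torch.eye(n, device=device)     # account for residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)  # rows sum to 1 again
        rollout = attn @ rollout                      # propagate across layers
    return rollout
```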
Other Changes
- Bump seaborn from 0.12.1 to 0.12.2 by @dependabot in #98
Outlook 🧪
- Start into the Transformer week 🎉. I will spend next week and the week after improving the Transformer-based models. I want to dive into learning rate scheduling, learning rate warm-up, etc. (see the sketch below). I will also pre-write the chapters on FTTransformer, TabTransformer, the classical Transformer, and self-attention.
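As a starting point for those scheduling experiments, here is a minimal sketch of linear warm-up followed by cosine decay using PyTorch's `LambdaLR`; the function name and step counts are placeholders, not code from the repository:

```python
import math
import torch

def warmup_cosine_schedule(optimizer, warmup_steps: int, total_steps: int):
    """Learning rate rises linearly for `warmup_steps`, then decays along a
    cosine curve towards zero at `total_steps`."""
    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```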
Full Changelog: 23-01...23-02
Changes between December 26th and January 1st
What's Changed
Christmas break🎄
Other Changes
- Bump pydantic from 1.10.2 to 1.10.4 by @dependabot in #95
Full Changelog: v0.2.7...cw-01