Add chapter on conclusion 🔚 (#416)

KarelZe · Jun 25, 2023 · 53440da
1 parent 19301a6
commit 53440da
Showing 9 changed files with 65 additions and 8 deletions.
diff --git a/references/obsidian/📖chapters/🔚 Conclusion.md b/references/obsidian/📖chapters/🔚 Conclusion.md
@@ -0,0 +1,20 @@
+
+The goal of this study is to examine the performance of machine learning-based trade classification in the option market. In particular, we propose to model trade classification with Transformers and gradient boosting. Both approaches are supervised and learn from labelled trades. For settings where labelled trades are scarce, we extend Transformers with a pre-training objective to learn from unlabelled trades and generate pseudo-labels for gradient boosting through a self-training procedure.
+
+Our models establish a new state-of-the-art for trade classification on the gls-ISE and gls-CBOE datasets. For gls-ISE trades, Transformers achieve an accuracy of percentage-63.78 when trained on trade and quoted prices and percentage-72.58 when trained on additional quoted sizes, improving over the hybrid rules of ([[@grauerOptionTradeClassification2022]]27) by percentage-3.73 and percentage-4.97. Similarly, glspl-gbrt reach accuracies of percentage-63.67 and percentage-73.24. We observe performance improvements of up to percentage-6.51 for GBRT and percentage-6.31 for Transformers when models have access to option characteristics. Both architectures generalise well to gls-CBOE trades, with even stronger improvements between percentage-4.92 and percentage-7.58 depending on the model and feature set.
+
+Relative to the ubiquitous tick test, quote rule, and LR algorithm, improvements on the gls-ISE dataset are percentage-23.88, percentage-17.11, and percentage-17.02, respectively, without additional data requirements. Performance improvements are particularly strong for out-of-the-money options, options with long maturity, and trades executed at the quotes.
+
+In the semi-supervised setting, Transformers on the gls-ISE dataset profit from pre-training on unlabelled trades with accuracies up to percentage-74.55, but the performance gains slightly diminish on the gls-CBOE test set. Conversely, we observe no advantage in performance or robustness from semi-supervised training of glspl-GBRT.
+
+Consistent with ([[@grauerOptionTradeClassification2022]]27) and ([[@savickasInferringDirectionOption2003]]901), we find evidence that the performance of common trade classification rules deteriorates in the option market. In particular, tick-based methods only marginally outperform a random guess.
+
+Unlike previous studies, we can trace back the performance of our approaches as well as of trade classification rules to individual features and feature groups using the importance measure gls-SAGE. We find that both approaches attain their largest performance improvements from classifying trades based on quoted sizes and prices, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. The change in the trade price, the decisive criterion of the (reverse) tick test, plays no role in option trade classification. We identify the relative illiquidity of options as hampering the information content of the surrounding trade prices. Our classifiers profit from the inclusion of option-specific features, like moneyness and time-to-maturity, which are unexploited in classical trade classification.
+
+By probing and visualising the attention mechanism inside the Transformer, we can establish a connection to rule-based classification. Experimentally, our results show that attention heads encode knowledge about rule-based classification. Whilst attention heads in earlier layers of the network broadly attend to all features, in later layers they focus on specific features jointly used in rule-based classification, akin to the gls-LR algorithm, depth rule, or others. Furthermore, embeddings encode knowledge about the underlyings. Our results show that the Transformer learns to group similar underlyings in embedding space.
+
+Our models deliver accurate predictions and improved robustness, which effectively reduce noise and bias in option research that relies on good estimates of the trade initiator. When applied to the calculation of trading cost through effective spreads, the models dominate all rule-based approaches by approximating the true effective spread best. Concretely, the Transformer pre-trained on unlabelled trades estimates a mean spread of \SI[round-precision=3]{0.013}[\$]{} versus an actual spread of \SI[round-precision=3]{0.005}[\$]{} at the gls-ISE.
+(feature importances)
+
+In conclusion, our work demonstrates that machine learning is superior to existing trade signing algorithms for classifying option trades, if partially-labelled or labelled trades are available for training. 
+
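The conclusion benchmarks the learned classifiers against the tick test, quote rule, and LR algorithm. For orientation, the following is a minimal sketch of these three rules in their textbook form. The function names, the `+1`/`-1` encoding of buys and sells, and the convention that `prev_price` is the last trade price different from the current one are assumptions made for illustration; this is not the implementation used in the thesis.

```python
from typing import Optional


def quote_rule(price: float, bid: float, ask: float) -> Optional[int]:
    """Sign a trade by its position relative to the quote midpoint.

    Returns +1 (buy), -1 (sell), or None for a trade exactly at the midpoint.
    """
    mid = (bid + ask) / 2
    if price > mid:
        return 1
    if price < mid:
        return -1
    return None


def tick_test(price: float, prev_price: float) -> Optional[int]:
    """Sign a trade by the price change to the last differing trade price.

    Returns +1 on an uptick, -1 on a downtick, None if the prices are equal.
    """
    if price > prev_price:
        return 1
    if price < prev_price:
        return -1
    return None


def lr_algorithm(price: float, bid: float, ask: float, prev_price: float) -> Optional[int]:
    """Apply the quote rule first and fall back to the tick test at the midpoint."""
    side = quote_rule(price, bid, ask)
    return side if side is not None else tick_test(price, prev_price)
```

A trade at the ask is signed as a buy by the quote rule alone, whereas a midpoint trade is only resolved through the preceding price change.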
diff --git a/references/obsidian/📖chapters/🧓Discussion.md b/references/obsidian/📖chapters/🧓Discussion.md
@@ -4,8 +4,10 @@
 - https://doi.org/10.1287/mnsc.2019.3529
 - https://www.dropbox.com/s/1i4zxc23qm00bv9/OptionMarketMakers.032623.pdf?dl=0
 - https://dmurav.com/CV_Dmitry_Muravyev_202305.pdf
+- for index options see [[@chordiaIndexOptionTrading2021]]
 - To test these hypotheses it would be best if we had the precise motivation behind the trades. While such analysis is not feasible here, using trade classification algorithms, we are able to assign stock and option volume as buyer or seller initiated. Easley et al. (1998) show how this directional volume is more informative than raw volume, because signed volume provides important information about the motivation of the trade (bullish or bearish). (cao paper)
 
+- see also [[@ellisAccuracyTradeClassification2000]] for trades inside and outside the spread
 
 - Whilst we reach the same conclusion, we estimate that large models should be trained for many more training tokens than recommended by the authors.
 

diff --git a/references/obsidian/📥Inbox/@chordiaIndexOptionTrading2021.md b/references/obsidian/📥Inbox/@chordiaIndexOptionTrading2021.md
@@ -0,0 +1,13 @@
+*title:* Index Option Trading Activity and Market Returns
+*authors:* Tarun Chordia, Alexander Kurov, Dmitriy Muravyev, Avanidhar Subrahmanyam
+*year:* 2021
+*tags:* 
+*status:* #📥
+*related:*
+*code:*
+*review:*
+
+## Notes 📍
+
+## Annotations 📖
+Note: 
diff --git a/reports/Content/end.tex b/reports/Content/end.tex
@@ -5,7 +5,24 @@ \section{Discussion}\label{sec:discussion}
 \newpage
 \section{Conclusion}\label{sec:conclusion}
 
+The goal of this study is to examine the performance of machine learning-based trade classification in the option market. In particular, we propose to model trade classification with Transformers and gradient boosting. Both approaches are supervised and leverage labelled trades. For settings where labelled trades are scarce, we extend Transformers with a pre-training objective to train on unlabelled trades and generate pseudo-labels for gradient boosting through a self-training procedure.
+
+Our models establish a new state-of-the-art for trade classification on the \gls{ISE} and \gls{CBOE} datasets. For \gls{ISE} trades, Transformers achieve an accuracy of \SI{63.78}{\percent} when trained on trade and quoted prices and \SI{72.58}{\percent} when trained on additional quoted sizes, improving over the current best of \textcite[][27]{grauerOptionTradeClassification2022} by \SI{3.73}{\percent} and \SI{4.97}{\percent}. Similarly, \glspl{GBRT} reach accuracies between \SI{63.67}{\percent} and \SI{73.24}{\percent}. We observe performance improvements of up to \SI{6.51}{\percent} for \glspl{GBRT} and \SI{6.31}{\percent} for Transformers when models have access to option characteristics. Relative to the ubiquitous tick test, quote rule, and LR algorithm, improvements are \SI{23.88}{\percent}, \SI{17.11}{\percent}, and \SI{17.02}{\percent}, respectively. Outperformance is particularly strong for \gls{OTM} options, options with a long maturity, and options traded at the quotes. Both architectures generalise well to \gls{CBOE} data, with even stronger improvements between \SI{4.92}{\percent} and \SI{7.58}{\percent} over the benchmark depending on the model and feature set.
+
+In the semi-supervised setting, Transformers on the \gls{ISE} dataset profit from pre-training on unlabelled trades with accuracies up to \SI{74.55}{\percent}, but the performance gains slightly diminish on the \gls{CBOE} test set. Conversely, we observe no benefits from semi-supervised training of \glspl{GBRT}.
+% Consistent with \textcites[][27]{grauerOptionTradeClassification2022}[][901]{savickasInferringDirectionOption2003} we find evidence that the performance of common trade classification rules deteriorates in the option market. In particular, tick-based methods marginally outperform a random guess.
+
+Unlike previous studies, we can trace back the performance of our approaches as well as of trade classification rules to individual features and feature groups using the importance measure \gls{SAGE}. We find that both paradigms attain the largest performance improvements from classifying trades based on quoted sizes and prices, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. The change in the trade price, the decisive criterion of the (reverse) tick test, plays no role in option trade classification. We identify the relative illiquidity of options as affecting the information content of the surrounding trade prices. Our classifiers profit from the inclusion of option-specific features, like moneyness and time-to-maturity, currently unexploited in classical trade classification.
+
+By probing and visualising the attention mechanism of the Transformer, we can establish a connection to rule-based classification. Graphically, our results show that attention heads encode knowledge about rule-based classification. Whilst attention heads in earlier layers of the network broadly attend to all features or their embeddings, in later layers they focus on specific features jointly used in rule-based classification, akin to the \gls{LR} algorithm, depth rule, or others. Furthermore, embeddings encode domain knowledge. Using the traded underlying as an example, our results demonstrate that the Transformer learns to group similar underlyings in embedding space.
+
+Our classifiers deliver accurate predictions and improved robustness, which effectively reduces noise and bias in option research dependent on reliable trade initiator estimates. When applied to measuring trading cost through effective spreads, the models dominate all rule-based approaches by approximating the true effective spread of options best. For example, the Transformer pre-trained on unlabelled trades estimates a mean spread of \SI[round-mode=places, round-precision=3]{0.013118}[\$]{} versus an actual spread of \SI[round-mode=places, round-precision=3]{0.004926}[\$]{} at the \gls{ISE}.
+
+In conclusion, our study showcases the efficacy of machine learning as a viable alternative to existing trade signing algorithms for classifying option trades, if partially-labelled or labelled trades are available for training. % While we tested our models on option trades, we expect the results to transfer to other modalities including equity trades. 
+
 \newpage
 \section{Outlook}\label{sec:outlook}
 
-Graphically, our results show that specific attention heads in the Transformer specialise in patterns akin to classical trade classification rules. We are excited to explore this aspect systematically and potentially reverse engineer classification rules from attention heads that are yet unknown. This way, we can transfer the superior classification accuracy of the Transformer to regimes where labelled training data is abundant or computational costs of training are not affordable.
+In future work, we plan to revisit training Transformers on a larger corpus of unlabelled trades through pre-training objectives and study the effects of \emph{exchange-specific} finetuning. While our current results show that pre-training positively drives classification performance, for comparability it is only performed on a small subset of trades, and models have not fully converged. Thus, we expect to see benefits from additional data and compute, following the scaling laws of \textcite[][7]{hoffmannTrainingComputeOptimalLarge2022}. The application confers advantages when finetuning is constrained by the limited availability of the true trade initiator.
+
+Indicatively, our results show that specific attention heads in the Transformer specialise in patterns akin to classical trade classification rules. We want to explore this aspect further and potentially reverse-engineer as yet unknown classification rules from attention heads. This way, we can transfer the superior classification accuracy of the Transformer to regimes where labels are unavailable or the computational costs of training are not affordable.
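The effective-spread comparison in the Conclusion hunk above depends on the predicted trade direction. As a rough illustration, the sketch below assumes the common definition of the nominal effective spread, 2 · d · (P − M), with trade price P, quote midpoint M, and direction d (+1 for buys, −1 for sells). The function name `mean_effective_spread` and the toy numbers are hypothetical and are not taken from the thesis.

```python
from statistics import fmean
from typing import Sequence


def mean_effective_spread(
    prices: Sequence[float],
    mids: Sequence[float],
    sides: Sequence[int],
) -> float:
    """Average nominal effective spread 2 * d * (P - M) over all trades.

    `sides` holds +1 for buyer-initiated and -1 for seller-initiated trades;
    it may contain true labels or the output of any classifier.
    """
    return fmean(2 * d * (p - m) for p, m, d in zip(prices, mids, sides))


# Toy data (hypothetical): two buys at the ask, one sell at the bid.
prices = [1.5, 1.0, 1.5]
mids = [1.25, 1.25, 1.25]
true_sides = [1, -1, 1]
predicted_sides = [1, 1, 1]  # the sell is misclassified as a buy

actual = mean_effective_spread(prices, mids, true_sides)
estimated = mean_effective_spread(prices, mids, predicted_sides)
```

Each misclassified trade flips the sign of its contribution, so the estimated mean spread departs from the actual one. This is why classification accuracy carries over to the quality of the spread estimate.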
diff --git a/reports/Content/evaluation.tex b/reports/Content/evaluation.tex
@@ -56,6 +56,8 @@ \subsubsection{Feature Importance
 
 It is typically infeasible to compute the complete \gls{SAGE} values due to a large number of subsets $S$, so an approximation is used instead.
 
+\todo{model the conditional distribution of held out features.}
+
 \textbf{Attention Maps}
 
 In addition to \gls{SAGE}, Transformer-based models offer \emph{some} interpretability through their attention mechanism. In recent research, a major controversy emerged around the question of whether attention offers explanations for model predictions \autocites[cp.][150]{bastingsElephantInterpretabilityRoom2020}[][5--7]{jainAttentionNotExplanation2019}[][9]{wiegreffeAttentionNotNot2019}. The debate centred on opposing definitions of explainability and the consistency of attention scores with other, established feature-importance measures. Our focus is less on post-hoc explainability of the model than on transparency. Consistent with \textcite[][8]{wiegreffeAttentionNotNot2019}, we view attention scores as a vehicle for model transparency.

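The evaluation hunk above notes that complete \gls{SAGE} values are infeasible to compute over all subsets $S$ and are therefore approximated. One common approximation samples feature permutations and averages the loss reduction observed when each feature is revealed. The sketch below illustrates that idea under simplifying assumptions: held-out features are filled in from a background sample, i.e., from their marginal rather than their conditional distribution, and all names (`marginal_prediction`, `sage_permutation_estimate`) are hypothetical. It is neither the API of any SAGE library nor the estimator used in the thesis.

```python
import random
from typing import Callable, List, Sequence, Set

Row = Sequence[float]


def marginal_prediction(
    predict: Callable[[List[float]], float],
    x: Row,
    revealed: Set[int],
    background: Sequence[Row],
) -> float:
    """Average the model output while held-out features come from background rows."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in revealed else b[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def sage_permutation_estimate(
    predict: Callable[[List[float]], float],
    loss: Callable[[float, float], float],
    X: Sequence[Row],
    y: Sequence[float],
    background: Sequence[Row],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of per-feature importance via sampled permutations."""
    rng = random.Random(seed)
    n_features = len(X[0])
    totals = [0.0] * n_features
    for _ in range(n_permutations):
        i = rng.randrange(len(X))            # instance to evaluate
        order = list(range(n_features))
        rng.shuffle(order)                   # random order of revealing features
        revealed: Set[int] = set()
        prev_loss = loss(marginal_prediction(predict, X[i], revealed, background), y[i])
        for j in order:
            revealed.add(j)
            cur_loss = loss(marginal_prediction(predict, X[i], revealed, background), y[i])
            totals[j] += prev_loss - cur_loss  # loss reduction credited to feature j
            prev_loss = cur_loss
    return [t / n_permutations for t in totals]
```

Within each sampled permutation, the credited reductions sum to the gap between the loss with no features and the loss with all features, mirroring the efficiency property of Shapley values. A feature the model ignores receives an estimate of zero.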
diff --git a/reports/Content/introduction.tex b/reports/Content/introduction.tex
@@ -14,9 +14,9 @@ \section{Introduction}\label{sec:introduction}
 
 To answer this question, we model trade classification through Transformers and gradient boosting. We consider the supervised case, where fully-labelled trade data is available, as well as the semi-supervised setting, where trades are partially labelled with the true trade initiator. Our work makes the following contributions:
 \begin{enumerate}
-    \item We employ state-of-the-art supervised algorithms i.~e., gradient-boosted trees and Transformer networks to the problem of trade classification and benchmark these approaches against rules-based methods. Data requirements are comparable. Out-of-sample on \gls{CBOE} and \gls{ISE} data, our approaches outperform state-of-the-art trade classification rules by \SI{99.99}{\percent} in accuracy and are robust across various subsets. In the application setting, our approaches produce accurate estimates of the effective spread.
-    \item In a real-world setting, labelled trades are scarce, while unlabelled trades are abundant. Motivated by this consideration, we extend the classifiers to learn on both labelled and unlabelled instances through pre-training and self-training procedures. We analyse the effect on classification accuracy and observe that pre-training of Transformers further alleviates accuracy on \gls{ISE} test data.
-    \item We strive to identify the most predictive features. Through a game-theoretic approach, our work is the first to consistently attribute the performance of rule-based and machine learning-based classification to individual features. We discover that both paradigms share common features, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. By probing and visualising the attention mechanism inside the Transformer, we can strengthen the connection to rule-based classification and reveal that \emph{learned} rules mimic classical rules.
+    \item We apply state-of-the-art supervised algorithms, i.~e., gradient-boosted trees and Transformer networks, to the problem of trade classification and benchmark these approaches against rule-based methods. Our approaches outperform all rule-based approaches on \gls{ISE} and \gls{CBOE} data with comparable data requirements. In the application setting, our approaches approximate the true effective spread best.
+    \item In a real-world setting, labelled trades are typically scarce, while unlabelled trades are abundant. Motivated by this consideration, we extend the classifiers to learn on both labelled and unlabelled instances through pre-training and self-training procedures. We analyse the effect on classification accuracy and observe that pre-training of Transformers further improves accuracy on \gls{ISE} test data.
+    \item We strive to identify the most predictive features. Through a game-theoretic approach, our work is the first to consistently attribute the performance of rule-based and machine learning-based classification to individual features. We discover that both paradigms share common features, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. By probing and visualising the attention mechanism in the Transformer, we can strengthen the connection to rule-based classification and reveal that \emph{learned} rules mimic classical rules.
 \end{enumerate}
 
 The remainder of this paper is organised as follows. \cref{sec:related-work} reviews publications on trade classification in option markets and using machine learning, thereby underpinning our research framework. \cref{sec:rule-based-approaches} introduces extant methods for rule-based trade classification. \cref{sec:supervised-approaches} discusses and introduces supervised methods for trade classification. Then, \cref{sec:semi-supervised-approaches} extends the previously selected algorithms for the semi-supervised case. We test the models in \cref{sec:empirical-study} in an empirical setting. In \cref{sec:application} we apply our models to the problem of effective spread estimation. Finally, \cref{sec:discussion} discusses and \cref{sec:conclusion} concludes.