From 53440da52b0b9135e7c6bcce54f0ae6b6528adfc Mon Sep 17 00:00:00 2001 From: Markus Bilz Date: Sun, 25 Jun 2023 21:26:48 +0200 Subject: [PATCH] =?UTF-8?q?Add=20chapter=20on=20conclusion=20=F0=9F=94=9A?= =?UTF-8?q?=20(#416)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../\360\237\224\232 Conclusion.md" | 20 +++++++++++++++++++ .../\360\237\247\223Discussion.md" | 2 ++ .../@chordiaIndexOptionTrading2021.md" | 13 ++++++++++++ reports/Content/end.tex | 19 +++++++++++++++++- reports/Content/evaluation.tex | 2 ++ reports/Content/introduction.tex | 6 +++--- reports/Content/results.tex | 7 +++++-- reports/Content/rule-approaches.tex | 2 +- reports/thesis.tex | 2 +- 9 files changed, 65 insertions(+), 8 deletions(-) create mode 100644 "references/obsidian/\360\237\223\226chapters/\360\237\224\232 Conclusion.md" create mode 100644 "references/obsidian/\360\237\223\245Inbox/@chordiaIndexOptionTrading2021.md" diff --git "a/references/obsidian/\360\237\223\226chapters/\360\237\224\232 Conclusion.md" "b/references/obsidian/\360\237\223\226chapters/\360\237\224\232 Conclusion.md" new file mode 100644 index 00000000..b6c8f6a2 --- /dev/null +++ "b/references/obsidian/\360\237\223\226chapters/\360\237\224\232 Conclusion.md" @@ -0,0 +1,20 @@ + +The goal of this study is to examine the performance of machine learning-based trade classification in the option market. In particular, we propose to model trade classification with Transformers and gradient boosting. Both approaches are supervised and leverage labelled trades. For settings where labelled trades are scarce, we extend Transformers with a pre-training objective to learn on unlabelled trades as well as generate pseudo-labels for gradient boosting through a self-training procedure. + +Our models establish a new state-of-the-art for trade classification on the gls-ISE and gls-CBOE datasets. 
For gls-ISE trades, Transformers achieve an accuracy of percentage-63.78 when trained on trade and quoted prices as well as percentage-72.58 when trained on additional quoted sizes, improving over the hybrid rules of ([[@grauerOptionTradeClassification2022]]27) by percentage-3.73 and percentage-4.97. Similarly, glspl-GBRT reach accuracies of percentage-63.67 and percentage-73.24. We observe performance improvements of up to percentage-6.51 for GBRT and percentage-6.31 for Transformers when models have access to option characteristics. Both architectures generalise well to gls-CBOE trades, with even stronger improvements between percentage-4.92 and percentage-7.58 depending on model and feature set. + +Relative to the ubiquitous tick test, quote rule, and LR algorithm, improvements are percentage-23.88, percentage-17.11, and percentage-17.02, respectively, on the gls-ISE dataset without additional data requirements. Performance improvements are particularly strong for out-of-the-money options, options with long maturity, as well as trades executed at the quotes. + +In the semi-supervised setting, Transformers on the gls-ISE dataset profit from pre-training on unlabelled trades with accuracies up to percentage-74.55, but the performance gains slightly diminish on the gls-CBOE test set. Conversely, we observe no advantage in performance or robustness from semi-supervised training of glspl-GBRT. + +Consistent with ([[@grauerOptionTradeClassification2022]]27) and ([[@savickasInferringDirectionOption2003]]901), we find evidence that the performance of common trade classification rules deteriorates in the option market. In particular, tick-based methods marginally outperform a random guess. + +Unlike previous studies, we can trace back the performance of our approaches as well as of trade classification rules to individual features and feature groups using the importance measure gls-SAGE. 
We find that both approaches attain the largest performance improvements from classifying trades based on quoted sizes and prices, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. The change in the trade price, the decisive criterion of the (reverse) tick test, plays no role in option trade classification. We identify the relative illiquidity of options as hampering the information content of the surrounding trade prices. Our classifiers profit from the inclusion of option-specific features, like moneyness and time-to-maturity, unexploited in classical trade classification. + +By probing and visualising the attention mechanism inside the Transformer, we can establish a connection to rule-based classification. Experimentally, our results show that attention heads encode knowledge about rule-based classification. Whilst attention heads in earlier layers of the network broadly attend to all features, in later layers they focus on specific features jointly used in rule-based classification, akin to the gls-LR algorithm, the depth rule, or others. Furthermore, embeddings encode knowledge about the underlyings. Our results show that the Transformer learns to group similar underlyings in embedding space. + +Our models deliver accurate predictions and improved robustness, which effectively reduce noise and bias in options research reliant on accurate estimates of the trade initiator. When applied to the calculation of trading cost through effective spreads, the models dominate all rule-based approaches by approximating the true effective spread best. Concretely, the Transformer pre-trained on unlabelled trades estimates a mean spread of \SI[round-precision=3]{0.013}[\$]{} versus \SI[round-precision=3]{0.005}[\$]{} actual spread at the gls-ISE. 
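The self-training procedure used to generate pseudo-labels for gradient boosting can be sketched as follows. This is a minimal, hedged illustration with invented names (`MidpointClassifier`, `self_train`) and a toy one-dimensional classifier, not the actual implementation:

```python
import math

# Minimal self-training sketch: fit on labelled trades, then add confident
# predictions on unlabelled trades as pseudo-labels and refit. The toy
# "price minus midpoint" classifier and thresholds are illustrative only.

class MidpointClassifier:
    """Toy stand-in for a GBRT: signs a trade by a learned price threshold."""

    def fit(self, X, y):
        # Learn a single decision threshold from labelled (price - midpoint) values.
        buys = [x for x, label in zip(X, y) if label == 1]
        sells = [x for x, label in zip(X, y) if label == -1]
        self.threshold = (min(buys) + max(sells)) / 2
        return self

    def predict_proba(self, X):
        # Confidence grows with the distance from the threshold (logistic squash).
        return [1 / (1 + math.exp(-(x - self.threshold))) for x in X]

    def predict(self, X):
        return [1 if p >= 0.5 else -1 for p in self.predict_proba(X)]


def self_train(model, X_lab, y_lab, X_unlab, tau=0.9, rounds=2):
    """Iteratively pseudo-label unlabelled samples the model is confident about."""
    X, y = list(X_lab), list(y_lab)
    for _ in range(rounds):
        model.fit(X, y)
        remaining = []
        for x, p in zip(X_unlab, model.predict_proba(X_unlab)):
            if p >= tau or p <= 1 - tau:       # confident -> adopt as pseudo-label
                X.append(x)
                y.append(1 if p >= tau else -1)
            else:
                remaining.append(x)            # defer to a later round
        X_unlab = remaining
    return model.fit(X, y)
```

For instance, `self_train(MidpointClassifier(), [-2.0, 2.0], [-1, 1], [-5.0, 5.0, 0.1])` pseudo-labels the two clear-cut trades and leaves the ambiguous one near the midpoint unlabelled.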
+(feature importances) + +In conclusion, our work demonstrates that machine learning is superior to existing trade signing algorithms for classifying option trades, if partially-labelled or labelled trades are available for training. + diff --git "a/references/obsidian/\360\237\223\226chapters/\360\237\247\223Discussion.md" "b/references/obsidian/\360\237\223\226chapters/\360\237\247\223Discussion.md" index 2c31da75..a8fb86c7 100644 --- "a/references/obsidian/\360\237\223\226chapters/\360\237\247\223Discussion.md" +++ "b/references/obsidian/\360\237\223\226chapters/\360\237\247\223Discussion.md" @@ -4,8 +4,10 @@ - https://doi.org/10.1287/mnsc.2019.3529 - https://www.dropbox.com/s/1i4zxc23qm00bv9/OptionMarketMakers.032623.pdf?dl=0 - https://dmurav.com/CV_Dmitry_Muravyev_202305.pdf +- for index options see [[@chordiaIndexOptionTrading2021]] - To test these hypotheses it would be best if we had the precise motivation behind the trades. While such analysis is not feasible here, using trade classification algorithms, we are able to assign stock and option volume as buyer or seller initiated. Easley et al. (1998) show how this directional volume is more informative than raw volume, because signed volume provides important information about the motivation of the trade (bullish or bearish). (cao paper) +- see also [[@ellisAccuracyTradeClassification2000]] for trades inside and outside the spread - Whilst we reach the same conclusion, we estimate that large models should be trained for many more training tokens than recommended by the authors. 
diff --git "a/references/obsidian/\360\237\223\245Inbox/@chordiaIndexOptionTrading2021.md" "b/references/obsidian/\360\237\223\245Inbox/@chordiaIndexOptionTrading2021.md" new file mode 100644 index 00000000..e400ef61 --- /dev/null +++ "b/references/obsidian/\360\237\223\245Inbox/@chordiaIndexOptionTrading2021.md" @@ -0,0 +1,13 @@ +*title:* Index Option Trading Activity and Market Returns +*authors:* Tarun Chordia, Alexander Kurov, Dmitriy Muravyev, Avanidhar Subrahmanyam +*year:* 2021 +*tags:* +*status:* #📥 +*related:* +*code:* +*review:* + +## Notes 📍 + +## Annotations 📖 +Note: \ No newline at end of file diff --git a/reports/Content/end.tex b/reports/Content/end.tex index 3d161c5a..a4c9d6b4 100644 --- a/reports/Content/end.tex +++ b/reports/Content/end.tex @@ -5,7 +5,24 @@ \section{Discussion}\label{sec:discussion} \newpage \section{Conclusion}\label{sec:conclusion} +The goal of this study is to examine the performance of machine learning-based trade classification in the option market. In particular, we propose to model trade classification with Transformers and gradient boosting. Both approaches are supervised and leverage labelled trades. For settings where labelled trades are scarce, we extend Transformers with a pre-training objective to train on unlabelled trades as well as generate pseudo-labels for gradient boosting through a self-training procedure. + +Our models establish a new state-of-the-art for trade classification on the \gls{ISE} and \gls{CBOE} datasets. For \gls{ISE} trades, Transformers achieve an accuracy of \SI{63.78}{\percent} when trained on trade and quoted prices as well as \SI{72.58}{\percent} when trained on additional quoted sizes, improving over the current best of \textcite[][27]{grauerOptionTradeClassification2022} by \SI{3.73}{\percent} and \SI{4.97}{\percent}. Similarly, \glspl{GBRT} reach accuracies between \SI{63.67}{\percent} and \SI{73.24}{\percent}. 
We observe performance improvements of up to \SI{6.51}{\percent} for \glspl{GBRT} and \SI{6.31}{\percent} for Transformers when models have access to option characteristics. Relative to the ubiquitous tick test, quote rule, and \gls{LR} algorithm, improvements are \SI{23.88}{\percent}, \SI{17.11}{\percent}, and \SI{17.02}{\percent}, respectively. Outperformance is particularly strong for \gls{OTM} options, options with a long maturity, as well as options traded at the quotes. Both architectures generalise well to \gls{CBOE} data, with even stronger improvements between \SI{4.92}{\percent} and \SI{7.58}{\percent} over the benchmark depending on the model and feature set. + +In the semi-supervised setting, Transformers on the \gls{ISE} dataset profit from pre-training on unlabelled trades with accuracies up to \SI{74.55}{\percent}, but the performance gains slightly diminish on the \gls{CBOE} test set. Conversely, we observe no benefits from semi-supervised training of \glspl{GBRT}. +% Consistent with \textcites[][27]{grauerOptionTradeClassification2022}[][901]{savickasInferringDirectionOption2003} we find evidence that the performance of common trade classification rules deteriorates in the option market. In particular, tick-based methods marginally outperform a random guess. + +Unlike previous studies, we can trace back the performance of our approaches as well as of trade classification rules to individual features and feature groups using the importance measure \gls{SAGE}. We find that both paradigms attain the largest performance improvements from classifying trades based on quoted sizes and prices, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. The change in the trade price, the decisive criterion of the (reverse) tick test, plays no role in option trade classification. We identify the relative illiquidity of options as affecting the information content of the surrounding trade prices. 
Our classifiers profit from the inclusion of option-specific features, like moneyness and time-to-maturity, currently unexploited in classical trade classification. + +By probing and visualising the attention mechanism of the Transformer, we can establish a connection to rule-based classification. Graphically, our results show that attention heads encode knowledge about rule-based classification. Whilst attention heads in earlier layers of the network broadly attend to all features or their embeddings, in later layers they focus on specific features jointly used in rule-based classification, akin to the \gls{LR} algorithm, the depth rule, or others. Furthermore, embeddings encode domain knowledge. Taking the traded underlying as an example, our results demonstrate that the Transformer learns to group similar underlyings in embedding space. + +Our classifiers deliver accurate predictions and improved robustness, which effectively reduce noise and bias in option research dependent on reliable trade initiator estimates. When applied to measuring trading cost through effective spreads, the models dominate all rule-based approaches by approximating the true effective spread of options best. For example, the Transformer pre-trained on unlabelled trades estimates a mean spread of \SI[round-mode=places, round-precision=3]{0.013118}[\$]{} versus \SI[round-mode=places, round-precision=3]{0.004926}[\$]{} actual spread at the \gls{ISE}. + +In conclusion, our study showcases the efficacy of machine learning as a viable alternative to existing trade signing algorithms for classifying option trades if partially-labelled or labelled trades are available for training. % While we tested our models on option trades, we expect the results to transfer to other modalities including equity trades. + \newpage \section{Outlook}\label{sec:outlook} -Graphically, our results show that specific attention heads in the Transformer specialise in patterns akin to classical trade classification rules. 
We are excited to explore this aspect systematically and potentially reverse engineer classification rules from attention heads that are yet unknown. This way, we can transfer the superior classification accuracy of the Transformer to regimes where labelled training data is abundant or computational costs of training are not affordable. \ No newline at end of file +In future work, we plan to revisit training Transformers on a larger corpus of unlabelled trades through pre-training objectives and study the effects of \emph{exchange-specific} finetuning. While our current results show that pre-training positively drives classification performance, for comparability it is only performed on a small subset of trades and models have not fully converged. Thus, we expect to see benefits from additional data and compute, following the scaling laws of \textcite[][7]{hoffmannTrainingComputeOptimalLarge2022}. Pre-training on more unlabelled trades is particularly advantageous when finetuning is constrained by the limited availability of the true trade initiator. + +Indicatively, our results show that specific attention heads in the Transformer specialise in patterns akin to classical trade classification rules. We want to explore this aspect further and potentially reverse engineer yet-unknown classification rules from attention heads. This way, we can transfer the superior classification accuracy of the Transformer to regimes where labels are unavailable or computational costs of training are not affordable. \ No newline at end of file
+\todo{model the conditional distribution of held out features.} + \textbf{Attention Maps} In addition to \gls{SAGE}, Transformer-based models offer \emph{some} interpretability through their attention mechanism. In recent research a major controversy embarked around the question, of whether attention offers explanations to model predictions \autocites[cp.][150]{bastingsElephantInterpretabilityRoom2020}[][5--7]{jainAttentionNotExplanation2019}[][9]{wiegreffeAttentionNotNot2019}. The debate sparked around opposing definitions of explainability and the consistency of attention scores with other, established feature-importance measures. Our focus is less on post-hoc explainability of the model, but rather on transparency. Consistent with \textcite[][8]{wiegreffeAttentionNotNot2019} we view attention scores as a vehicle to model transparency. diff --git a/reports/Content/introduction.tex b/reports/Content/introduction.tex index 438b9ed4..0e498f11 100644 --- a/reports/Content/introduction.tex +++ b/reports/Content/introduction.tex @@ -14,9 +14,9 @@ \section{Introduction}\label{sec:introduction} To answer this question, we model trade classification through Transformers and gradient boosting. We consider the supervised case, where fully-labelled trade data is available, as well as the semi-supervised setting, where trades are partially labelled with the true trade initiator. Our work makes the following contributions: \begin{enumerate} - \item We employ state-of-the-art supervised algorithms i.~e., gradient-boosted trees and Transformer networks to the problem of trade classification and benchmark these approaches against rules-based methods. Data requirements are comparable. Out-of-sample on \gls{CBOE} and \gls{ISE} data, our approaches outperform state-of-the-art trade classification rules by \SI{99.99}{\percent} in accuracy and are robust across various subsets. In the application setting, our approaches produce accurate estimates of the effective spread. 
- \item In a real-world setting, labelled trades are scarce, while unlabelled trades are abundant. Motivated by this consideration, we extend the classifiers to learn on both labelled and unlabelled instances through pre-training and self-training procedures. We analyse the effect on classification accuracy and observe that pre-training of Transformers further alleviates accuracy on \gls{ISE} test data. - \item We strive to identify the most predictive features. Through a game-theoretic approach, our work is the first to consistently attribute the performance of rule-based and machine learning-based classification to individual features. We discover that both paradigms share common features, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. By probing and visualising the attention mechanism inside the Transformer, we can strengthen the connection to rule-based classification and reveal that \emph{learned} rules mimic classical rules. + \item We employ state-of-the-art supervised algorithms, i.~e., gradient-boosted trees and Transformer networks, to the problem of trade classification and benchmark these approaches against rule-based methods. Our approaches outperform all rule-based approaches on \gls{ISE} and \gls{CBOE} data with comparable data requirements. In the application setting, our approaches approximate the true effective spread best. + \item In a real-world setting, labelled trades are typically scarce, while unlabelled trades are abundant. Motivated by this consideration, we extend the classifiers to learn on both labelled and unlabelled instances through pre-training and self-training procedures. We analyse the effect on classification accuracy and observe that pre-training of Transformers further improves accuracy on \gls{ISE} test data. + \item We strive to identify the most predictive features. 
Through a game-theoretic approach, our work is the first to consistently attribute the performance of rule-based and machine learning-based classification to individual features. We discover that both paradigms share common features, but machine learning-based classifiers attain higher performance gains and effectively exploit the data. By probing and visualising the attention mechanism in the Transformer, we can strengthen the connection to rule-based classification and reveal that \emph{learned} rules mimic classical rules. \end{enumerate} The remainder of this paper is organised as follows. \cref{sec:related-work} reviews publications on trade classification in option markets and using machine learning, thereby underpinning our research framework. \cref{sec:rule-based-approaches} introduces extant methods for rule-based trade classification. \cref{sec:supervised-approaches} discusses and introduces supervised methods for trade classification. Then, \cref{sec:semi-supervised-approaches} extends the previously selected algorithms for the semi-supervised case. We test the models in \cref{sec:empirical-study} in an empirical setting. In \cref{sec:application} we apply our models to the problem of effective spread estimation. Finally, \cref{sec:discussion} discusses and \cref{sec:conclusion} concludes. diff --git a/reports/Content/results.tex b/reports/Content/results.tex index e27bd9d6..12869b97 100644 --- a/reports/Content/results.tex +++ b/reports/Content/results.tex @@ -12,6 +12,7 @@ \subsection{Results of Rule-Based Approaches}\label{sec:result-of-rule-based-app From all rules, the tick rule performs worst when applied to trade prices at the trading venue with accuracies of a random guess, \SI{49.67}{\percent}. For comparison, a simple majority vote achieves \SI{51.40}{\percent} accuracy. 
The tick test performs best when estimated on consecutive trade prices at the inter-exchange level, where it marginally improves over a random classification, with the reversed tick test achieving accuracies of \SI{55.25}{\percent}. Due to the poor performance of tick-based algorithms at the exchange level, we estimate all hybrids with $\operatorname{tick}_{\mathrm{all}}$ or $\operatorname{rtick}_{\mathrm{all}}$. +\todo{tick all} \begin{table}[ht] \centering \caption[Accuracies of Rule-Based Approaches on \glsentryshort{ISE}]{This table shows the accuracy of common trade classification rules and their variations for option trades on \gls{ISE} sample. Unclassifiable trades by the respective rule are assigned randomly as buy or sell. Hybrid methods are estimated using trade prices across all exchanges. We report the percentage of classifiable trades and the overall accuracy for subsets based on our train-test split and the entire dataset. The best rule is in bold.} @@ -57,6 +58,7 @@ \subsection{Results of Rule-Based Approaches}\label{sec:result-of-rule-based-app \label{fig:classical-accuracies-over-time} \end{figure} +\todo{tick all} \begin{table}[h] \centering \caption[Accuracies of Rule-Based Approaches on \glsentryshort{CBOE}]{This table shows the accuracy of common trade classification rules and their variations for option trades on \gls{CBOE} sample. Unclassifiable trades by the respective rule are assigned randomly as buy or sell. Hybrid methods are estimated using trade prices across all exchanges. We report the percentage of classifiable trades and the overall accuracy for subsets based on our train-test split and the entire dataset. The best rule is in bold.} @@ -871,9 +873,10 @@ \section{Application in Transaction Cost Estimation}\label{sec:application} \label{tab:effective-spread} \end{table} -In summary, quote-based algorithms like the quote rule and the \gls{LR} algorithm severely overestimate the effective spread. 
The overestimate is less severe for the \gls{CLNV} algorithm due to stronger dependency on the tick rule. The tick rule itself achieves estimates closest to the true effective spread, which is \num[round-mode=places, round-precision=3]{0.004926} and \num[round-mode=places, round-precision=3]{0.012219} for the \gls{ISE} and \gls{CBOE} sample respectively. As primarily tick-based algorithms, like the tick rule or \gls{EMO} rule, act as a random classifier in our samples, we conclude that the close estimate is an artefact to randomness, not due to superior predictive power. This observation is in line with \textcite[][897]{savickasInferringDirectionOption2003}, who make a similar argument for the \gls{EMO} rule on \gls{CBOE} trades. For rule-based algorithms $\operatorname{gsu}_{\mathrm{large}}$ provides reasonable estimates of the effective spread while achieving high classification accuracy. +In summary, quote-based algorithms like the quote rule and the \gls{LR} algorithm severely overestimate the effective spread. The overestimate is less severe for the \gls{CLNV} algorithm due to its stronger dependency on the tick rule. The tick rule itself achieves estimates closest to the true effective spread, which is \SI[round-mode=places, round-precision=3]{0.004926}[\$]{} and \SI[round-mode=places, round-precision=3]{0.012219}[\$]{} for the \gls{ISE} and \gls{CBOE} sample, respectively. As primarily tick-based algorithms, like the tick rule or \gls{EMO} rule, act as random classifiers in our samples, we conclude that the close estimate is an artefact of randomness, not due to superior predictive power. This observation is in line with \textcite[][897]{savickasInferringDirectionOption2003}, who make a similar argument for the \gls{EMO} rule on \gls{CBOE} trades. Among rule-based algorithms, $\operatorname{gsu}_{\mathrm{large}}$ provides reasonable estimates of the effective spread while achieving high classification accuracy. 
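The effective spread discussed in this section can be sketched in code, assuming the conventional definition $S_{i,t} = 2 D_{i,t}(P_{i,t} - M_{i,t})$ with trade direction $D_{i,t} \in \{-1, 1\}$, trade price $P_{i,t}$, and quote midpoint $M_{i,t}$; the function and variable names below are ours, not the thesis code's:

```python
# Effective spread sketch: S = 2 * D * (P - M), with trade direction D
# (+1 buy, -1 sell), trade price P, and quote midpoint M.
# Illustrative naming, not taken from the thesis implementation.

def effective_spread(direction, price, midpoint):
    """Effective (dollar) spread of a single signed trade."""
    return 2 * direction * (price - midpoint)

def mean_effective_spread(directions, prices, midpoints):
    """Average effective spread over a sample of signed trades."""
    spreads = [effective_spread(d, p, m)
               for d, p, m in zip(directions, prices, midpoints)]
    return sum(spreads) / len(spreads)
```

Because a wrongly signed trade flips the sign of its $(P_{i,t} - M_{i,t})$ contribution, systematic misclassification biases the mean estimate, which is why accurate trade signing matters for this application.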
-From our supervised classifiers the FT-Transformer or \gls{GBRT} trained on \gls{FS} option provides estimates closest to the true effective spread, in particular on the \gls{CBOE} sample. For semi-supervised classifiers, Transformer-based models approximate the true effective spread best. This best manifests in a predicted effective spread at the \gls{ISE} of \num[round-mode=places, round-precision=3]{0.013118} versus \num[round-mode=places, round-precision=3]{0.004926}. The null hypothesis of equal medians is rejected at the \SI{1}{\percent} level for all classifiers. +Of our supervised classifiers, the FT-Transformer and \gls{GBRT} trained on \gls{FS} option provide estimates closest to the true effective spread, in particular on the \gls{CBOE} sample. Among semi-supervised classifiers, Transformer-based models approximate the true effective spread best. This manifests in a predicted effective spread at the \gls{ISE} of \SI[round-mode=places, round-precision=3]{0.013118}[\$]{} versus \SI[round-mode=places, round-precision=3]{0.004926}[\$]{}. The null hypothesis of equal medians is rejected at the \SI{1}{\percent} level for all classifiers. Thus, $\operatorname{gsu}_{\mathrm{large}}$ provides the best estimate of the effective spread if the true labels are absent. For labelled data, Transformer or gradient boosting-based approaches can provide more accurate estimates. The de facto standard, the \gls{LR} algorithm, fails to deliver accurate estimates and may bias research. diff --git a/reports/Content/rule-approaches.tex b/reports/Content/rule-approaches.tex index 1a38bceb..a9820658 100644 --- a/reports/Content/rule-approaches.tex +++ b/reports/Content/rule-approaches.tex @@ -27,7 +27,7 @@ \subsection{Basic Rules}\label{sec:basic-rules} \subsubsection{Quote Rule}\label{sec:quote-rule} -The quote rule follows the rationale, that market makers provide quotes, against which buyers or sellers trade. 
It classifies a trade by comparing the trade price against the corresponding quotes at the time of the trade. We denote the sequence of trade prices of the $i$-th security by $\gls{P}_i = \langle P_{i,1},P_{i,2},\dots,P_{i,T}\rangle$ and the corresponding ask at $t$ by $\gls{A}_{i,t}$ and bid by $\gls{B}_{i,t}$. If the trade price is above the midpoint of the bid-ask spread, estimated as $\gls{M}_{i,t} = \tfrac{1}{2}(B_{i,t} + A_{i,t})$, the trade is classified as a buy and if it is below the midpoint, as a sell \autocite[][41]{harrisDayEndTransactionPrice1989}.\footnote{For simplicity we assume an ideal data regime, where quote data is complete and spreads are positive.} Thus, the classification rule on $\mathcal{A} = \left\{(i, t) \in \mathbb{N}^2: P_{i,t} \neq M_{i,t}\right\}$ is given by: +The quote rule follows the rationale that market makers provide quotes against which buyers or sellers trade. It classifies a trade by comparing the trade price against the corresponding quotes at the time of the trade. We denote the sequence of trade prices of the $i$-th security by $(P_{i,t})_{t=1}^{T}$ and the corresponding ask at $t$ by $\gls{A}_{i,t}$ and bid by $\gls{B}_{i,t}$. 
If the trade price is above the midpoint of the bid-ask spread, estimated as $\gls{M}_{i,t} = \tfrac{1}{2}(B_{i,t} + A_{i,t})$, the trade is classified as a buy and if it is below the midpoint, as a sell \autocite[][41]{harrisDayEndTransactionPrice1989}.\footnote{For simplicity we assume an ideal data regime, where quote data is complete and spreads are positive.} Thus, the classification rule on $\mathcal{A} = \left\{(i, t) \in \mathbb{N}^2: P_{i,t} \neq M_{i,t}\right\}$ is given by: \begin{equation} \operatorname{quote}\colon \mathcal{A} \to \mathcal{Y},\quad \operatorname{quote}(i, t)= diff --git a/reports/thesis.tex b/reports/thesis.tex index e6e21694..7ffeb649 100644 --- a/reports/thesis.tex +++ b/reports/thesis.tex @@ -261,7 +261,7 @@ \newglossaryentry{exploding-gradient}{name={exploding gradient},plural={exploding gradients},description={Exploding gradients is a problem encountered in training deep neural networks with backpropagation. Error gradients can accumulate, and result in very large parameter updates and unstable training of the network. The opposite is the vanishing gradient problem, whereby gradients become successively smaller during backpropagation, resulting in no or small parameter updates of the network. In both cases, the network does not converge.}} % compile only locally -% \includeonly{Content/introduction} +\includeonly{Content/introduction,Content/end} % ----------------------------------- Start of document ----------------------------------- \begin{document}
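The quote rule defined in the rule-approaches hunk above can be sketched in a few lines (illustrative Python with our own naming; trades at the midpoint fall outside the rule's domain $\mathcal{A}$ and are returned as unclassified):

```python
# Quote rule sketch: compare the trade price with the quote midpoint
# M = (B + A) / 2; midpoint trades lie outside the rule's domain.
# Illustrative code, not the thesis implementation.

def quote_rule(price, bid, ask):
    """Return +1 (buy), -1 (sell), or None if the trade is unclassifiable."""
    midpoint = (bid + ask) / 2
    if price > midpoint:
        return 1       # trade price above midpoint -> buyer-initiated
    if price < midpoint:
        return -1      # trade price below midpoint -> seller-initiated
    return None        # price equals midpoint: outside the domain A
```

In practice, midpoint trades (the `None` case) are the ones that hybrid rules hand off to tick-based criteria or, as in the empirical study, are assigned randomly.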