Merge pull request #11 from fidelity/elmo_fix

Change elmo embedding to use tfhub
fidelity · Dec 23, 2022 · 20c4115
2 parents 90f6022 + 55def8c
commit 20c4115

Showing 38 changed files with 879 additions and 785 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8, 3.9]
+        python-version: ['3.7', '3.8', '3.9', '3.10']
     steps:
       - uses: actions/checkout@v2
       - name: Set up Python ${{ matrix.python-version }}

diff --git a/CHANGELOG.txt b/CHANGELOG.txt
@@ -2,6 +2,12 @@
 TextWiser CHANGELOG
 =====================
 
+-------------------------------------------------------------------------------
+Dec 19, 2022 1.5.0
+-------------------------------------------------------------------------------
+major:
+- Utilize ELMo from TFHub and remove allennlp dependency
+
 -------------------------------------------------------------------------------
 Mar 03, 2022 1.4.0
 -------------------------------------------------------------------------------

diff --git a/README.md b/README.md
@@ -59,7 +59,7 @@ vecs = emb.fit_transform(documents)
 | Word Embedding: [Word2Vec](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/CLASSIC_WORD_EMBEDDINGS.md) | Supported by these [pretrained embeddings](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/CLASSIC_WORD_EMBEDDINGS.md) <br> Common pretrained options include ``crawl``, ``glove``, ``extvec``, ``twitter``, and ``en-news`` <br> When the pretrained option is ``None``, trains a new model from the given data <br> Defaults to ``en``, FastText embeddings trained on news |
 | Word Embedding: [Character](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_3_WORD_EMBEDDING.md#character-embeddings)| Initialized randomly and not pretrained <br> Useful when trained for a downstream task <br> Enable [fine-tuning](#fine-tuning-for-downstream-tasks) to get good embeddings |
 | Word Embedding: [BytePair](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md) | Supported by these [pretrained embeddings](https://nlp.h-its.org/bpemb/#download>) <br> Pretrained options can be specified with the string ``<lang>_<dim>_<vocab_size>`` <br> Default options can be omitted like ``en``, ``en_100``, or ``en__10000`` <br> Defaults to ``en``, which is equal to ``en_100_10000`` |
-| Word Embedding: [ELMo](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/ELMO_EMBEDDINGS.md) | Supported by these [pretrained embeddings](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/ELMO_EMBEDDINGS.md) from [AllenNLP](https://allennlp.org) <br> Defaults to ``original`` |
+| Word Embedding: [ELMo](https://tfhub.dev/google/elmo/3) | Supported by these [pretrained embeddings](https://tfhub.dev/google/elmo/3) from TensorflowHub <br> Defaults to ``original`` |
 | Word Embedding: [Flair](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) |  Supported by these [pretrained embeddings](https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) <br> Defaults to ``news-forward-fast`` |
 | Word Embedding: [BERT](https://github.com/huggingface/transformers#model-architectures)| Supported by these [pretrained embeddings](https://huggingface.co/transformers/pretrained_models.html) <br> Defaults to ``bert-base-uncased`` |
 | Word Embedding: [OpenAI GPT](https://github.com/huggingface/transformers#model-architectures)| Supported by these [pretrained embeddings](https://huggingface.co/transformers/pretrained_models.html) <br> Defaults to ``openai-gpt`` |

diff --git a/docs/.buildinfo b/docs/.buildinfo
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 6434429e0258ad68daae5d25176eb3ff
+config: 94dc7a465159832e42c7096614d5aa1d
 tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/docs/_sources/embeddings.rst.txt b/docs/_sources/embeddings.rst.txt
@@ -26,7 +26,7 @@ Embeddings
     | Pretrained options can be specified with the string ``<lang>_<dim>_<vocab_size>``
     | Default options can be omitted like ``en``, ``en_100``, or ``en__10000``
     | Defaults to ``en``, which is equal to ``en_100_10000``"
-    "Word Embedding: `ELMo <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/ELMO_EMBEDDINGS.md>`_", "| Supported by these `pretrained embeddings <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/ELMO_EMBEDDINGS.md>`_ from `AllenNLP <https://allennlp.org>`_
+    "Word Embedding: `ELMo <https://tfhub.dev/google/elmo/3>`_", "| Supported by these `options <https://tfhub.dev/google/elmo/3>`_
     | Defaults to ``original``"
     "Word Embedding: `Flair <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md>`_", "| Supported by these `pretrained embeddings <https://github.com/zalandoresearch/flair/blob/master/resources/docs/embeddings/FLAIR_EMBEDDINGS.md>`_
     | Defaults to ``news-forward-fast``"

diff --git a/docs/_sources/installation.rst.txt b/docs/_sources/installation.rst.txt
@@ -26,8 +26,7 @@ The library is based on PyTorch but it also relies on:
 * Flair for word vectors
 * Transformers for contextual word vectors
 * Spacy and it's ``en`` model are optional imports for OpenAI GPT; the model can be installed using ``python -m spacy download en``
-* Tensorflow is an optional import for Universal Sentence Encoder. If you want to use USE, make sure you satisfy ``tensorflow>=2.0.0`` and ``tensorflow-hub>=0.7.0``.
-* AllenNLP is an optional import for ELMo. If you want to use ELMo, make sure you satisfy ``allennlp``
+* Tensorflow is an optional import for Universal Sentence Encoder and ELMo. If you want to use USE or ELMo, make sure you satisfy ``tensorflow>=2.0.0`` and ``tensorflow-hub>=0.7.0``.
 * UMAP is an optional import for UMAP transformation. If you want to use UMAP, make sure you satisfy ``umap-learn>=0.5.1``
 
 PyPI

diff --git a/docs/_static/basic.css b/docs/_static/basic.css
@@ -222,7 +222,7 @@ table.modindextable td {
 /* -- general body styles --------------------------------------------------- */
 
 div.body {
-    min-width: 360px;
+    min-width: 450px;
     max-width: 800px;
 }
 
@@ -335,13 +335,13 @@ p.sidebar-title {
     font-weight: bold;
 }
 
-div.admonition, div.topic, aside.topic, blockquote {
+div.admonition, div.topic, blockquote {
     clear: left;
 }
 
 /* -- topics ---------------------------------------------------------------- */
 
-div.topic, aside.topic {
+div.topic {
     border: 1px solid #ccc;
     padding: 7px;
     margin: 10px 0 10px 0;
@@ -380,15 +380,13 @@ div.body p.centered {
 div.sidebar > :last-child,
 aside.sidebar > :last-child,
 div.topic > :last-child,
-aside.topic > :last-child,
 div.admonition > :last-child {
     margin-bottom: 0;
 }
 
 div.sidebar::after,
 aside.sidebar::after,
 div.topic::after,
-aside.topic::after,
 div.admonition::after,
 blockquote::after {
     display: block;
@@ -430,6 +428,10 @@ table.docutils td, table.docutils th {
     border-bottom: 1px solid #aaa;
 }
 
+table.footnote td, table.footnote th {
+    border: 0 !important;
+}
+
 th {
     text-align: left;
     padding-right: 5px;
@@ -613,7 +615,6 @@ ul.simple p {
     margin-bottom: 0;
 }
 
-/* Docutils 0.17 and older (footnotes & citations) */
 dl.footnote > dt,
 dl.citation > dt {
     float: left;
@@ -631,33 +632,6 @@ dl.citation > dd:after {
     clear: both;
 }
 
-/* Docutils 0.18+ (footnotes & citations) */
-aside.footnote > span,
-div.citation > span {
-    float: left;
-}
-aside.footnote > span:last-of-type,
-div.citation > span:last-of-type {
-  padding-right: 0.5em;
-}
-aside.footnote > p {
-  margin-left: 2em;
-}
-div.citation > p {
-  margin-left: 4em;
-}
-aside.footnote > p:last-of-type,
-div.citation > p:last-of-type {
-    margin-bottom: 0em;
-}
-aside.footnote > p:last-of-type:after,
-div.citation > p:last-of-type:after {
-    content: "";
-    clear: both;
-}
-
-/* Footnotes & citations ends */
-
 dl.field-list {
     display: grid;
     grid-template-columns: fit-content(30%) auto;