Releases: roedoejet/g2p

v2.2.0

12 Nov 22:15
b3ee783

✨ New Features

  • 1262cbb - add --quiet option to tests/run.py and refactor the runners (commit by @joanise)
  • c419518 - add a lexicon-based tokenizer, esp. for English (commit by @joanise)

🐛 Bug Fixes

  • 419507e - indent only the first line in click indented paragraphs (commit by @joanise)

⚡ Performance Improvements

  • 24a28e0 - prevent quadratic time cost of degenerate inputs for lexicon-based tok (commit by @joanise)

♻️ Refactors

  • cf38989 - tests: quiet and reformat some test suites (commit by @joanise)
  • 5682125 - simplify merge_if_same_label to clearer merge_same_type_tokens (commit by @joanise)
  • d662622 - move merge_non_word_tokens and split_non_word_tokens to utils (commit by @joanise)
  • 163bc39 - import utils as a whole instead of each function (commit by @joanise)
  • c3d73bf - change tokens from a custom dict to a Token class (PR #406 by @joanise)

✅ Tests

  • 2b8a803 - heroku: exercise the real Heroku server command in CI (commit by @joanise)
  • 0b2c83c - better unit testing for mappings.utils (commit by @joanise)

🔧 Chores

  • b2bd476 - migrate the pre-commit config to 4.x style (commit by @joanise)

v2.1.1

17 Sep 12:59
53c78f1

This is primarily a performance improvement patch. It reduces the memory footprint by about 45 MB, and shortens the initial load time, by:

  • using a more compact in-memory structure for the English lexicon, and
  • replacing the heavy-weight networkx library by a tiny custom class implementing only the algorithms used.
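
To illustrate the second point, a small directed-graph class that covers only a shortest-path search could look like the sketch below. This is not the actual network_lite code: the class name `TinyDiGraph` and its methods are hypothetical, and the language codes appear only as sample node names.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class TinyDiGraph:
    """Hypothetical minimal directed graph; NOT g2p's actual network_lite class."""

    def __init__(self) -> None:
        # Map each node to the set of its direct successors.
        self._succ: Dict[str, Set[str]] = {}

    def add_edge(self, u: str, v: str) -> None:
        self._succ.setdefault(u, set()).add(v)
        self._succ.setdefault(v, set())

    def shortest_path(self, source: str, target: str) -> Optional[List[str]]:
        """Breadth-first search; returns the node list or None if unreachable."""
        if source not in self._succ or target not in self._succ:
            return None
        parent: Dict[str, Optional[str]] = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                # Walk the parent links back to the source.
                path: List[str] = []
                current: Optional[str] = node
                while current is not None:
                    path.append(current)
                    current = parent[current]
                return path[::-1]
            for nxt in self._succ[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None


graph = TinyDiGraph()
graph.add_edge("fra", "fra-ipa")
graph.add_edge("fra-ipa", "eng-ipa")
graph.add_edge("eng-ipa", "eng-arpabet")
path = graph.shortest_path("fra", "eng-arpabet")
```

A class of this size needs only the standard library, which is the kind of saving in import time and memory that the patch notes describe.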

✨ New Features

🐛 Bug Fixes

  • 8929608 - add [tool.setuptools_scm] in pyproject.toml to please the build system (commit by @joanise)
  • 208a8e0 - deps: pydantic 2.9 changes our schemas, so block it (commit by @joanise)
  • 16668b2 - enable type-checking and fix things (commit by @dhdaines)
  • 6ab8545 - make sure self.rules is always the type we say it is (commit by @dhdaines)
  • 3eee1a6 - seeing match_pattern or intermediate_form is an error (commit by @dhdaines)
  • bbcd1e8 - avoid unnecessarily requiring a schema update (commit by @joanise)

⚡ Performance Improvements

  • e605ae5 - compact lexicon entries to take less RAM (commit by @joanise)
  • 96abff3 - replace networkx by network_lite throughout, reducing memory footprint and load time (commit by @joanise)

♻️ Refactors

✅ Tests

  • d03aabb - carefully cover compact lexicon corner cases (commit by @joanise)

🔧 Chores

v2.1.0

23 Aug 13:53

💥 BREAKING CHANGES

  • due to 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines):

    /api/v1 error status code for validation errors is always 422, no longer 400 or 404

✨ Major New Features

✨ New Features

  • 36e4dcc - switch to hatch and dynamic versioning (commit by @dhdaines)
  • e0a0219 - build: autogenerate requirements.txt with hatch-pip-compile (commit by @dhdaines)
  • 1fe3385 - add a G2P_LOGLEVEL environment variable (commit by @dhdaines)
  • bd33314 - add redirections for backward compatibility (commit by @dhdaines)
  • 74c5c47 - new API supporting textual alignments (commit by @dhdaines)
  • 7909e6e - Add sal-apa generic mapping for APA-based Salish writing systems (commit by @joanise)
  • 077afc2 - add logic to auto-delete as_is support in g2p 3 (commit by @joanise)
  • d4bffad - g2p convert accepts - for stdin and linux /dev/ pipes (commit by @joanise)
  • f0cf073 - g2p convert now accepts --file option to read a file (commit by @joanise)
  • a938917 - bump the current major.minor version to 2.1 (commit by @joanise)

🐛 Bug Fixes


v2.0.0

19 Mar 21:07

💥 BREAKING CHANGES

  • Mapping configuration files have changed, and the programmatic API has changed.
    Please visit the migration guide for information on how to update 1.x mappings to g2p 2.x and other changes.

  • due to 1d8e4fb - switch to pydantic 2 (commit by @roedoejet):
    Requires Python 3.7 (dropped support for Python 3.6).

✨ New Features

🐛 Bug Fixes

⚡ Performance Improvements

  • a5f51b7 - only create APP when it is really needed (commit by @joanise)
  • 0b8d773 - defer a whole bunch of expensive imports from the CLI (commit by @joanise)
  • 978153b - remove the app from the cli to make the CLI faster (commit by @joanise)

♻️ Refactors


Release v1.1.20230822

22 Aug 18:17

1.1.20230822 (2023-08-22)

Features

  • deps: make dependencies dependent on the Python version (6e68140)
  • clm (Klallam) mapping to g2p (882925a)
  • moh: update moh mappings (14e8bc6)

Bug Fixes

  • bisect_left does not accept key before Python 3.10 (cbb9fb2)
  • updating flask means updating socketio means updating socket.io.js (785f668)
  • deps: make sure engineio and socketio are all compatible (600b2ec)
  • have generate-mapping create files that pass pre-commit hooks (f6494a9)
  • the egg syntax is deprecated, use the at syntax instead (697abcb)
  • deps: lock dnspython to compatible 2.3.0 (e4eaa96)
  • ^ and $ are null-length so require separate sorting for creating fixed-width lookbehind (1ef573b)
  • error with missing apostrophe (8e55e44)
  • mapping: fix bug in haa mapping and add test suite lookbehind construction (a9e5e69)
  • moh: change name of language to Kanien'kéha (e3ab8c3)
  • studio: pin handsontable to 12.4 (b7df593)
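
The `bisect_left` fix in this list reflects a standard-library detail: the `key` parameter of `bisect.bisect_left` was only added in Python 3.10. One portable approach, sketched here with made-up data and a hypothetical helper `find_entry`, is to search a precomputed list of keys instead. This is an illustration only, not the code from commit cbb9fb2.

```python
import bisect
from typing import List, Optional, Tuple


def find_entry(
    entries: List[Tuple[str, str]], keys: List[str], word: str
) -> Optional[Tuple[str, str]]:
    """Binary-search `keys` (parallel to the sorted `entries`) for `word`.

    Works on Python versions before 3.10 because no `key=` argument is used.
    """
    i = bisect.bisect_left(keys, word)
    if i < len(keys) and keys[i] == word:
        return entries[i]
    return None


# Hypothetical sample data, sorted by the first element of each pair.
entries = sorted([("hello", "HH AH L OW"), ("cat", "K AE T"), ("dog", "D AO G")])
keys = [word for word, _ in entries]  # precomputed once, reused for every lookup
```

The trade-off is one extra list of keys held in memory in exchange for compatibility with older interpreters.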

Performance Improvements

  • build only in_seq or mappings as needed for alignments (4e6de3b)
  • store lexicon alignments as strings to save memory (6543214)
  • store lexicon k:v entries as joined strings, even less RAM (b984c42)
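
As a rough illustration of the joined-string idea, each lexicon entry can be held as a single string rather than a key/value pair of separate objects, and still be looked up by binary search. The separator, the function names `build_compact` and `lookup`, and the sample data are all assumptions made for this sketch; the actual storage format in g2p may differ.

```python
import bisect
from typing import Dict, List, Optional

SEP = "\t"  # assumed separator; it must not occur inside any key


def build_compact(entries: Dict[str, str]) -> List[str]:
    """Store each key/value pair as one joined string, sorted for bisecting."""
    return sorted(key + SEP + value for key, value in entries.items())


def lookup(compact: List[str], key: str) -> Optional[str]:
    """Return the value stored for `key`, or None if the key is absent."""
    prefix = key + SEP
    i = bisect.bisect_left(compact, prefix)
    if i < len(compact) and compact[i].startswith(prefix):
        return compact[i][len(prefix):]
    return None


# Hypothetical sample entries.
compact = build_compact({"cat": "K AE T", "cats": "K AE T S", "dog": "D AO G"})
```

Holding one string per entry avoids the per-entry container object and the second string object, which is where the memory saving in a layout like this would come from.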

Tests

  • add unit test case mimicking #130 to confirm it works on Windows (b413089)
  • exercise the short -h option in unit testing (40db7fc)

Build Systems

  • bump gunicorn to latest version, just published (01234c7)
  • bump Heroku runtime to 3.10.12 as per Heroku warning (7f249d9)
  • force Heroku to bump python to 3.10.11, and docs (a0b9c03)

Continuous Integration

  • only run the full matrix test on release (f02f1ff)
  • reorganize CI test suites (c04c660)
  • run matrix tests on push to main too since that gets deployed (2622913)

Documentation

  • tell the user they need python 3.7 if they try to run studio with older (50852d8)
  • update phoneset (5eb14b1)

Code Refactoring

  • apply dhd feedback to remove dead code and unflatten the alignment (324e1a2)

Release v1.1.20230511

11 May 18:01

1.1.20230511 (2023-05-11)

⚠ BREAKING CHANGES

  • make_g2p(in, out) used to not tokenize, now it does, and its tok_lang argument is deprecated
  • g2p convert now tokenizes by default

Features

  • expose the tokenize option to api/v1/g2p (3f572c4)
  • g2p convert now tokenizes by default (4d67902)
  • make_g2p now tokenizes by default and has new signature (ecfe2ca)

Bug Fixes

  • adjust all calls to make_g2p to its new signature (bea7cec)
  • g2p needs to update both generated .pkl and .json files (2be51f8), closes #237
  • remove --path option to g2p convert, which does not work anyway (f99774f)
  • use the more canonical DeprecationWarning to flag deprecation (e8a8a4d)
  • mappings: output should not be escaped (5bd3250)
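
For readers unfamiliar with the pattern behind the `DeprecationWarning` fix, the sketch below shows how a deprecated argument is typically flagged with the standard `warnings` module. The function `make_pipeline` is a hypothetical stand-in, not g2p's `make_g2p`, and its return value is invented for the example.

```python
import warnings
from typing import Optional


def make_pipeline(
    in_lang: str,
    out_lang: str,
    tok_lang: Optional[str] = None,
    *,
    tokenize: bool = True,
) -> dict:
    """Hypothetical stand-in, NOT g2p's make_g2p; it only shows the warning."""
    if tok_lang is not None:
        warnings.warn(
            "tok_lang is deprecated; tokenization is now enabled by default",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return {"in_lang": in_lang, "out_lang": out_lang, "tokenize": tokenize}
```

By default Python only displays a `DeprecationWarning` when it is triggered by code running in `__main__`, so running with `python -W default` (or under a test runner that enables warnings) is the usual way to see it from library callers.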

Documentation

  • add tokenize arg for api/v1/g2p to swagger.json (d2f226f)

Continuous Integration

  • make test_studio.py fast enough to run on each push (5fa2a01)
  • remove unused coveralls, make our omit compat with coverage 7.x (3f9d2df)

Tests

  • exercise api/v1/g2p with and without tokenize (c64322f)
  • improve coverage of error situations in CLI (0b3f5ee)

Code Refactoring

  • make Tokenizer the base class name, and declare to return types (7c8e8f1)
  • move deprecation and version checking code to their own file (e61daa4)
  • remove dead code in app.py, increase test cov and speed up tests (07e87d6)

Release v1.0.20230417

17 Apr 18:58

1.0.20230417 (2023-04-17)

Bug Fixes

  • eng is already in the langs now, no need to hardcode (b038ad2)
  • import g2p should not alter sys.stdout/err globally (80e0d1b)
  • the CLI (and only the CLI) needs to ensure utf8 output on Windows (cbeff1f)
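
The last two fixes describe a general pattern: a library should leave `sys.stdout` and `sys.stderr` alone at import time, and only a command-line entry point should adjust their encoding. A minimal sketch of that pattern follows; the helper `ensure_utf8` and the `main` function are hypothetical and do not reproduce g2p's actual code.

```python
import io
import sys


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


def main() -> None:
    """Hypothetical CLI entry point: the only place the streams are touched."""
    if sys.platform == "win32":
        ensure_utf8(sys.stdout)
        ensure_utf8(sys.stderr)
    print("nsyilxcən")


# Nothing runs at import time, so importing this module leaves
# sys.stdout and sys.stderr exactly as the host application set them.
```

Keeping the stream change inside `main` means applications that embed the library keep full control of their own output encoding.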

Code Refactoring

  • move get_langs from Studio/readalongs to g2p (c06ae5d)
  • rename get_langs->get_arpabet_langs to make purpose clearer (7c5222e)

Continuous Integration

  • annotate version tags (99b1747)
  • make sure the CLI outputs utf8 on Windows (2612a1a)
  • tell codecov to ignore the utf8 patch for Windows (4339009)

Release v1.0.20230412

12 Apr 22:08
7836f9d

1.0.20230412 (2023-04-12)

⚠ BREAKING CHANGES

  • put network_to_echart where we can test it properly

Features

  • add -a/--substring-alignments argument to cli (6b41213)
  • add accessors for useful things like the input and output languages (cacce3b)
  • add aligned cmudict and lexicon transducer type (596ab82)
  • add alignments method to get textual alignments (e2303f4)
  • add edges for alignments in lexicon (f2c9f6c)
  • add proper typing to compose_indices (7bbfb6d)
  • add type checking and use Tuples (as they can be type checked) (4780702)
  • language name for spelling variants describe the variant (ffba389)
  • make the use of None explicit and limited (97aaed5)
  • make TransductionGraph and CompositeTransductionGraph compatible (e00790a)
  • output monotonic alignments for deletions and reorderings (126aa83)
  • properly normalize edges on concatenation (f37897c)
  • shrink pickle by optimizing alignment storage (0860ad6)
  • support lexicon mappings in Studio (but they are slow) (c824f6b)
  • switch script to use phonetisaurus from PyPI (bb91b12)

Bug Fixes

  • add spaces and avoid formatting (a5c2894)
  • avoid crashing on empty edges (8d57e68)
  • avoid creating None in input position (404306d)
  • comment and clean up substring_alignments (9cd84d8)
  • disable the utf8 fix for windows when running in pytest (bd5690a), closes #241
  • do not call logging.basicConfig, just config the logger itself (8ff314f)
  • emit input unchanged when no transducers exist (b0db10e)
  • fix doctor (0b0f2ed)
  • fix speed issues by not deep-copying alignments (56e933b)
  • make pretty_edges consistent and fix tests to expect tuples (065fa23)
  • make sure we do not output bogus edges (fab9f0a)
  • most sensible possible behaviour, keep spaces if user wanted them (70ab1e6)
  • remove impossible try/catch (2db239a)
  • remove spaces in sanitize_unidecode_output as suggested by @littell (bd1b1ec)
  • remove spontaneous extraneous spaces from und-ipa (9e64b7f)
  • remove unnecessary default value (722215a)
  • restore original edges API and rename alignments (c054256)
  • switching back to Custom did not actually work (7f0f640)
  • the only special character we want to escape is ? (7af2f0b)
  • update treatment of deletions in lexicon to match rules (18bdc6b)
  • use OrderedDict explicitly for clarity (d2ef567)
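
The `logging.basicConfig` fix in this list follows a common guideline: `basicConfig` configures the root logger, which affects every package in the process, so a library is better off configuring only its own named logger. The sketch below shows the idea; the logger name `g2p_sketch` and the function `configure_logger` are hypothetical and are not taken from the g2p source.

```python
import logging

# Hypothetical logger name, used for illustration only.
LOGGER = logging.getLogger("g2p_sketch")


def configure_logger(level: int = logging.INFO) -> logging.Logger:
    """Configure only this package's logger, leaving the root logger untouched."""
    if not LOGGER.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
        LOGGER.addHandler(handler)
    LOGGER.setLevel(level)
    return LOGGER
```

With this approach an application that imports the library can still call `logging.basicConfig` itself and get the root configuration it asked for.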

Documentation

  • add documentation for lexicon mappings (dcf5973)
  • add links to non-packaged files (9d6275c)
  • clarify use of generic type (7bb7df6)
  • clean up docstrings (91aa3b3)

Tests

  • add alignment tests and improve coverage for transducers (76f85dd)
  • add coverage of invalid regex in rule (bd81a70)
  • add coverage to studio tests and app (0945336)
  • add test of lexicon loading from config file (22de19b)
  • fix studio test (31c9e48)
  • long delay no longer necessary (33efc1e)
  • make test_tokenizer.py exercise tce and unknown lang and default (1da815b)
  • run the expensive doctor test because it can catch errors (bb60f55)
  • update lexicon test for eng ipa (f05a513)

Code Refactoring

  • add explicit b, m, p, u rules to moh for borrowed words (2dc5e42)
  • put network_to_echart where we can test it properly (970e358)
  • remove superfluous list comprehension (dd8f5df)
  • test: when a mapping fails, show test case filename:lineno (fb309ec)
  • tests: quiet yappy test suites (c6423b6)

Styles

  • all other badges are rounded, why not the readme one? (ba76f57)
  • rewrite moh_equiv and moh_to_ipa in compact form (c781cbe)

Continuous Integration

  • replace deprecated actions/create-release by ncipollo/release-action (43d1060), closes #200
  • replace deprecated set-output and bump github-tag-action (8b40a1b), closes #200

Release v1.0.20230228

28 Feb 22:34
b29435b

1.0.20230228 (2023-02-28)

Bug fixes

Release v1.0.20230224

24 Feb 19:21
927c818

1.0.20230224 (2023-02-24)

Features

  • add nsy mapping for the nsyilxcən Language (8d7f04c)
  • improve the g2p-studio static page (79d4257)

Bug Fixes

  • mappings: bullet operator -> middle dot (f5b3d06)
  • studio: upgrade heroku stack and python runtime (b10abee)
  • address CWE-830 by adding integrity to scripts from cloudflare (00ccd31)
  • generate swagger.json the way our pre-commit hooks want it (0373abe)
  • in 2022, "python" is Python 3 (38c41da)
  • in 2022, "python" is Python 3 and "pip" works in CI (f2b892a)
  • on Windows, make generated files out LF so they're not spuriously changed (0613906)
  • ci: add codecov token to ci tests (2fbfe06)
  • ci: change ubuntu version (48158a2)
  • nsy: add glottal stop self-map so g2p knows it is an nsy letter (05a9c75)
  • nsy: fix the nsy->nsy-ipa mapping to the picky requirements of g2p (b6d5389)
  • nsy: handle a few more spelling variants (2a51896)
  • make Undetermined (und) process Arabic characters correctly (53dded4)
  • reqs: update flask to avoid werkzeug error (361c936)

Performance Improvements

Code Refactoring

  • change Nsyilxcən code to oka in all the files too (b5b3a60)
  • change to main (1ad9a98)
  • create class mg-bot for cleaner bottom margin implementation (cb4ab03)
  • rename nsy->oka to the official ISO 639-3 code for Nsyilxcən (f0b5bd6)
  • docs: use unpkg for fetching swagger ui (297b069)

Styles

  • apply a number of pylint recommendations (2e9b067)
  • let git blame ignore black and isort only commits (56896de)

Tests

  • add --describe option to run.py and exit 1 on error (6b560f0)
  • test that eng-ipa->eng-arpabet works ok with NFC and NFD inputs (2712b32)
  • use NFD output in fn_unicode test cases (a51bb46)
  • nsy: add references to most entries in nsy.csv (e7f7726)
  • nsy: fix the last word (question mark -> glottal stop) (946d7c3)

Continuous Integration

  • add CodeQL automated vulnerability scanning (9fc96fd)
  • bump CI actions to current to heed GitHub warnings (4858bbb)
  • g2p codecov action does not use dir (8880845)
  • only run CodeQL on cron and push to master and release (11aaff4)
  • stop failing CI when codecov fails to upload (557fc3d)
  • use ubuntu-20.04 since ubuntu-latest no longer supports Python 3.6 (cb91794)