Releases: roedoejet/g2p
v2.2.0
✨ New Features
- 1262cbb - add --quiet option to tests/run.py and refactor the runners (commit by @joanise)
- c419518 - add a lexicon-based tokenizer, esp. for English (commit by @joanise)
🐛 Bug Fixes
⚡ Performance Improvements
- 24a28e0 - prevent quadratic time cost of degenerate inputs for lexicon-based tok (commit by @joanise)
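The notes do not say how the quadratic cost was avoided. As a rough illustration only, one common way to keep a lexicon-based token merge linear is to cap how many adjacent tokens may be joined. Everything in this sketch (`merge_with_lexicon`, `MAX_MERGE`, the sample lexicon) is hypothetical and is not taken from the g2p code base.

```python
from typing import List, Set

MAX_MERGE = 5  # assumed cap on how many adjacent tokens may be merged


def merge_with_lexicon(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Greedy longest-match merge of adjacent tokens found in the lexicon.

    Capping the window at MAX_MERGE bounds the work at O(n * MAX_MERGE)
    joins instead of trying every possible span of the input.
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        best_end = i + 1  # default: keep the token on its own
        # Try the longest allowed span first, down to a span of two tokens.
        for end in range(min(len(tokens), i + MAX_MERGE), i + 1, -1):
            if "".join(tokens[i:end]).lower() in lexicon:
                best_end = end
                break
        merged.append("".join(tokens[i:best_end]))
        i = best_end
    return merged


result = merge_with_lexicon(["don", "'", "t", " ", "stop"], {"don't", "stop"})
```

A degenerate input such as a long run of apostrophes costs at most `MAX_MERGE - 1` lookups per token under this scheme.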
♻️ Refactors
- cf38989 - tests: quiet and reformat some test suites (commit by @joanise)
- 5682125 - simplify merge_if_same_label to clearer merge_same_type_tokens (commit by @joanise)
- d662622 - move merge_non_word_tokens and split_non_word_tokens to utils (commit by @joanise)
- 163bc39 - import utils as a whole instead of each function (commit by @joanise)
- c3d73bf - change tokens from a custom dict to a Token class (PR #406 by @joanise)
✅ Tests
- 2b8a803 - heroku: exercise the real Heroku server command in CI (commit by @joanise)
- 0b2c83c - better unit testing for mappings.utils (commit by @joanise)
🔧 Chores
v2.1.1
This is primarily a performance improvement patch. It reduces the memory footprint by about 45MB and shortens the initial load time by:
- using a more compact in-memory structure for the English lexicon, and
- replacing the heavy-weight networkx library by a tiny custom class implementing only the algorithms used.
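To make the second point concrete, here is a minimal sketch of what a small directed-graph class with a breadth-first shortest path could look like. It is not the actual `network_lite` code: the class name `TinyDiGraph`, its methods, and the node names are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, Hashable, List, Optional


class TinyDiGraph:
    """Hypothetical minimal directed graph; not g2p's real network_lite."""

    def __init__(self) -> None:
        self._successors: Dict[Hashable, List[Hashable]] = {}

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        """Add the edge u -> v, creating both nodes if needed."""
        self._successors.setdefault(u, []).append(v)
        self._successors.setdefault(v, [])

    def shortest_path(
        self, source: Hashable, target: Hashable
    ) -> Optional[List[Hashable]]:
        """Breadth-first search; returns the node list, or None if unreachable."""
        if source not in self._successors or target not in self._successors:
            return None
        previous: Dict[Hashable, Optional[Hashable]] = {source: None}
        queue: Deque[Hashable] = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                # Walk the predecessor links back to the source.
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in self._successors[node]:
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


graph = TinyDiGraph()
graph.add_edge("crg", "crg-ipa")  # illustrative mapping names
graph.add_edge("crg-ipa", "eng-ipa")
graph.add_edge("eng-ipa", "eng-arpabet")
path = graph.shortest_path("crg", "eng-arpabet")
```

A class of this size carries none of the import cost of a general-purpose graph library, which is the trade-off the patch notes describe.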
✨ New Features
- 966a057 - allow panphon 0.21 where possible (commit by @joanise)
- aa9de1c - g2p show-mappings to display language names too (commit by @joanise)
- c70f30f - network_lite with minimal DiGraph class (commit by @joanise)
- 123e27b - add full type signatures to DiGraph (commit by @dhdaines)
- 6eb29ac - revamp schema versioning and update-schema (commit by @joanise)
🐛 Bug Fixes
- 8929608 - add [tool.setuptools_scm] in pyproject.toml to please the build system (commit by @joanise)
- 208a8e0 - deps: pydantic 2.9 changes our schemas, so block it (commit by @joanise)
- 16668b2 - enable type-checking and fix things (commit by @dhdaines)
- 6ab8545 - make sure self.rules is always the type we say it is (commit by @dhdaines)
- 3eee1a6 - seeing match_pattern or intermediate_form is an error (commit by @dhdaines)
- bbcd1e8 - avoid unnecessarily requiring a schema update (commit by @joanise)
⚡ Performance Improvements
- e605ae5 - compact lexicon entries to take less RAM (commit by @joanise)
- 96abff3 - replace networkx by network_lite throughout reduces memory footprint and load time (commit by @joanise)
♻️ Refactors
- d1b3437 - simplify shortest_path code (commit by @dhdaines)
- e2def43 - only declare the SCM pretend version in one place (commit by @joanise)
✅ Tests
🔧 Chores
v2.1.0
💥 BREAKING CHANGES
- due to 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines): /api/v1 error status code for validation errors is always 422, no longer 400 or 404
✨ Major New Features
- 74e6172 - reimplement v1 API with FastAPI (commit by @dhdaines)
- 605ccd3 - reimplement Studio app with FastAPI (commit by @dhdaines)
- c214c6f - add /api/v2 to studio but also make it standaloneable (commit by @dhdaines)
✨ New Features
- 36e4dcc - switch to hatch and dynamic versioning (commit by @dhdaines)
- e0a0219 - build: autogenerate requirements.txt with hatch-pip-compile (commit by @dhdaines)
- 1fe3385 - add a G2P_LOGLEVEL environment variable (commit by @dhdaines)
- bd33314 - add redirections for backward compatibility (commit by @dhdaines)
- 74c5c47 - new API supporting textual alignments (commit by @dhdaines)
- 7909e6e - Add sal-apa generic mapping for APA-based Salish writing systems (commit by @joanise)
- 077afc2 - add logic to auto-delete as_is support in g2p 3 (commit by @joanise)
- d4bffad - g2p convert accepts - for stdin and linux /dev/ pipes (commit by @joanise)
- f0cf073 - g2p convert now accepts --file option to read a file (commit by @joanise)
- a938917 - bump the current major.minor version to 2.1 (commit by @joanise)
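The notes list a `G2P_LOGLEVEL` environment variable without saying how it is interpreted. A plausible reading is that it names a standard `logging` level. The sketch below shows that pattern in general terms; the helper `level_from_env`, the `INFO` default, and the fallback behaviour are assumptions, not g2p's documented behaviour.

```python
import logging
import os


def level_from_env(var_name: str = "G2P_LOGLEVEL", default: str = "INFO") -> int:
    """Translate a level name from the environment into a logging constant."""
    name = os.environ.get(var_name, default).upper()
    level = logging.getLevelName(name)
    # getLevelName returns a string such as "Level FOO" for unknown names,
    # so fall back to the default level in that case.
    return level if isinstance(level, int) else logging.getLevelName(default)


# Configure one named logger rather than the root logger.
logger = logging.getLogger("demo")  # hypothetical logger name
logger.setLevel(level_from_env())
```

Under these assumptions, running a program with `G2P_LOGLEVEL=DEBUG` set in the environment would make the `demo` logger emit debug messages.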
🐛 Bug Fixes
- 1cc2afe - ci: eventlet 0.36.0 considered harmful (commit by @dhdaines)
- d6004f9 - style: bump black to 24.3.0 to fix black's first CVE (commit by @joanise)
- 05f51f9 - do not try to send whole lexicon over the wire (commit by @dhdaines)
- 49ad2ff - port 5000 is used by MacOS on external interfaces (commit by @dhdaines)
- 629209b - test: use 127.0.0.1 explicitly to avoid ipv6 confusion (commit by @dhdaines)
- d105e5f - allow other mapping arguments, use on-disk alignments (commit by @dhdaines)
- b29b23f - ci: eventlet 0.36.0 considered harmful (commit by @dhdaines)
- baef8fd - ci: remove bogus sleep (commit by @dhdaines)
- 52b3bfd - needed apply-longest-first for atj (since the beginning) (commit by @dhdaines)
- d9a07e5 - do not copy the input mapping filename when generating (commit by @dhdaines)
- ea04262 - do not try to generate mappings for empty outputs (commit by @dhdaines)
- f50768e - g2p convert should not add newline when input is a file (commit by @joanise)
- 561817c - deps: specific anti-dependency on broken coloredlogs version (commit by @dhdaines)
- 9f92f65 - deps: use optional dependencies correctly (for docs too) (commit by @dhdaines)
- c8cba5f - test: no longer require flask needlessly for some tests (commit by @dhdaines)
- 1a602ca - build: various build fixes (commit by @dhdaines)
- 9543c96 - deps: old versions of eventlet are also broken (commit by @dhdaines)
- 4e6c3ab - docs: add install link for hatch (commit by @dhdaines)
- 656f07a - ci: ensure version matches schema (commit by @dhdaines)
- 4e23d76 - docs: mention conda (commit by @dhdaines)
- 38d5290 - build: add a hook to make sure we have g2p/_version.py on heroku (commit by @dhdaines)
- 1bba827 - update API for newer FastAPI (commit by @dhdaines)
- 5922f6f - get Studio working with FastAPI (commit by @dhdaines)
- 98a07f1 - restore compatible 404 response and enable api tests (commit by @dhdaines)
- 89bd9b3 - deps: fix deps for api (commit by @dhdaines)
- cfc50c6 - update prod environment and workflow (commit by @dhdaines)
- ebc16ff - now need python 3.8 on windows (commit by @dhdaines)
- 0a7c78b - not sure why we need to disable sendfile (commit by @dhdaines)
- 4bcd948 - remove fastapi-socketio (commit by @dhdaines)
- d5d2086 - make the g2p library tests still run on Python 3.7 (commit by @joanise)
- f55e6bb - ci: make coverage work again (commit by @dhdaines)
- 9bc3855 - test: fix coverage (commit by @dhdaines)
- ff6c92d - more specific dependency to avoid gnashing of teeth (commit by @dhdaines)
- 2d68577 - deps: correct the gunicorn dependency... again (commit by @dhdaines)
- 5e3c0f1 - split /langs and /nodes as they are not the same thing (commit by @dhdaines)
- a88df6a - build: depend on gitlint-core, not gitlint (commit by @joanise)
- f126e1d - studio: studio is same-origin so no CORS, also add debug option (commit by @dhdaines)
- 9f88fbf - studio: make deleting entire input work right (commit by @dhdaines)
- 2a18cdd - ci: enable G2P_STUDIO_DEBUG to satisfy coverage (commit by @dhdaines)
- 30b572a - normalize ó in mohawk (commit by @MENGZHEGENG)
- e6a1280 - app: do not rely on running at the g2p root dir (commit by @joanise)
- 627ca2e - tests: silence the logs in test_api_resources tests (commit by @joanise)
- 54fc772 - deps: pin panphon to 0.19-0.20 as 0.21 breaks many things (commit by @dhdaines)
- 3323eb4 - ci: remove stale job dependency in pythonpublish w...
v2.0.0
💥 BREAKING CHANGES
- Mapping configuration files have changed, and the programmatic API has changed. Please visit the migration guide for information on how to update 1.x mappings to g2p 2.x and other changes.
- due to 1d8e4fb - switch to pydantic 2 (commit by @roedoejet): Requires Python 3.7 (dropped support for Python 3.6).
✨ New Features
- fd33a26 - cli: add update-schema command (commit by @roedoejet)
- f85c4f2 - use json for network as well (commit by @dhdaines)
- b01ec23 - upgrade networkx now that we can (commit by @dhdaines)
- 9fe200d - schema: update schema generation to include dialect spec by default (commit by @roedoejet)
- a04aeff - add case preservation option to mappings (commit by @roedoejet)
- c31c66b - g2p-studio also needs to support preserve_case (commit by @joanise)
- 7447fe6 - make x caron equiv to x dot below in clm (commit by @joanise)
- d4fdc8c - str: accept space+comb-cedilla or space+comb-comma as equiv to cedilla (commit by @joanise)
🐛 Bug Fixes
- 20e3bcb - pkl: remove generated default date (commit by @roedoejet)
- 22644e7 - studio: refactor to 'rules' instead of 'mapping' key (commit by @roedoejet)
- 30dc282 - ci: require 3.8 for windows ci (commit by @roedoejet)
- 1df2dfd - add miscellaneous style fixes and typos (commit by @roedoejet)
- 5ccd595 - update: prevent loading all the mappings multiple times (commit by @roedoejet)
- 45d5ecf - tests: fix studio tests (commit by @roedoejet)
- 16e4869 - restore Python 3.7 compatibility (commit by @joanise)
- 060a8aa - use more generic variable names (commit by @dhdaines)
- ac2d42d - deps: back off networkx dep for python 3.7 (commit by @dhdaines)
- fa27730 - crg: fix various rule feeding and ordering bugs for Michif (commit by @joanise)
- 007aef5 - crg: manually clean up crg-ipa -> eng-ipa (commit by @joanise)
- 0e9271a - test: fix failure in test failure (commit by @dhdaines)
- 15d5b64 - test file could have arbitrary extra fields (commit by @dhdaines)
- 25f4713 - output a compatible config-g2p.yaml though some filenames change (commit by @dhdaines)
- 32fe87c - add config_only option to export_to_dict (commit by @dhdaines)
- b5f9747 - um, yes, model_dump() exists (commit by @dhdaines)
- 2e5e560 - do not exclude defaults, just inappropriate keys for config (commit by @dhdaines)
- 9975100 - add missing double vowel vowels to crg (commit by @dhdaines)
- 83b6c1c - cursèd unicode g strikes again (commit by @dhdaines)
- f766a66 - remove werkzeug lock since it is no longer necessary (commit by @joanise)
- 1c7792f - correct the unit testing output for g2p mapping errors (commit by @joanise)
- 996a060 - remove unused kwargs in transducer call (commit by @roedoejet)
- d1aa6dd - sort rules without explicit indices (commit by @roedoejet)
- d768d74 - detect incompatible case_sensitive+preserve_case instances (commit by @joanise)
- 35868bb - preserve indices through prevent-feeding intermediate form (commit by @joanise)
- 01ff75e - fix coverage issues and grepping for slow imports (commit by @joanise)
- 251739a - deps: lock numpy<2 because 2.0.0 is coming and has breaking changes (commit by @joanise)
- 27d0d2d - rename crj and crl "East Cree, Nor/Southern" so they sort nicely (commit by @joanise)
- 17519d8 - y in oka should go to /j/, palatal glide, not /y/ (commit by @joanise)
- b52a819 - issue a fatal error when reading an empty mapping (commit by @joanise)
- 95bf4be - app: errors in mappings should just trigger console warnings (commit by @joanise)
- 5993242 - str: cedilla is now the default glottal stop character (commit by @joanise)
- d18d17a - publish schemas only for major.minor, ignoring .patch (commit by @joanise)
- f2a7563 - assertEquals is removed from Python 3.12 (commit by @joanise)
- 5592659 - close xlsx workbook after reading (commit by @joanise)
- 7f34057 - loading xlsx workbooks should not fail on empty cells (commit by @joanise)
⚡ Performance Improvements
- a5f51b7 - only create APP when it is really needed (commit by @joanise)
- 0b8d773 - defer a whole bunch of expensive imports from the CLI (commit by @joanise)
- 978153b - remove the app from the cli to make the CLI faster (commit by @joanise)
♻️ Refactors
- eec8e82 - massive refactor to pydantic (commit by @roedoejet)
- 1d8e4fb - switch to pydantic 2 (commit by @roedoejet)
- a753e07 - config: require a 'mappings' key (commit by @roedoejet)
- 006d370 - in_char and out_char to rule_input and rule_output (commit by @roedoejet)
- b448523 - change to config-g2p.yaml (commit by @roedoejet)
- 5a67040 - change langs.pkl to langs.json (commit by @roedoejet)
- 5b259ff - separate data and path for rules, abbreviations, and alignments (commit by @roedoejet)
- ddefe77 - make mapping.rules the only way to get to the rules (commit by @joanise)
- 090145e ...
Release v1.1.20230822
1.1.20230822 (2023-08-22)
Features
- deps: make dependencies dependent on the Python version (6e68140)
- clm (Klallam) mapping to g2p (882925a)
- moh: update moh mappings (14e8bc6)
Bug Fixes
- bisect_left does not accept key before Python 3.10 (cbb9fb2)
- updating flask means updating socketio means updating socket.io.js (785f668)
- deps: make sure engineio and socketio are all compatible (600b2ec)
- have generate-mapping create files that pass pre-commit hooks (f6494a9)
- the egg syntax is deprecated, use the at syntax instead (697abcb)
- deps: lock dnspython to compatible 2.3.0 (e4eaa96)
- ^ and $ are null-length so require separate sorting for creating fixed-width lookbehind (1ef573b)
- error with missing apostrophe (8e55e44)
- mapping: fix bug in haa mapping and add test suite lookbehind construction (a9e5e69)
- moh: change name of language to Kanien'kéha (e3ab8c3)
- studio: pin hands on table to 12.4 (b7df593)
Performance Improvements
- build only in_seq or mappings as needed for alignments (4e6de3b)
- store lexicon alignments as strings to save memory (6543214)
- store lexicon k:v entries as joined strings, even less RAM (b984c42)
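The idea of storing key:value entries as joined strings can be sketched as follows. This is an illustration under stated assumptions, not the code from these commits: the tab separator, the function names, and the sample pronunciations are invented for the example. The lookup uses `bisect_left` without its `key=` argument, which is only available from Python 3.10.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

SEP = "\t"  # assumed separator that never occurs inside a key


def build_compact(entries: Dict[str, str]) -> List[str]:
    """Store each key/value pair as one joined string, sorted for binary search."""
    return sorted(f"{key}{SEP}{value}" for key, value in entries.items())


def lookup(compact: List[str], key: str) -> Optional[str]:
    """Binary search on the joined strings, with no key= argument needed."""
    prefix = key + SEP
    i = bisect_left(compact, prefix)
    # The first entry >= prefix is the match if and only if it starts with it.
    if i < len(compact) and compact[i].startswith(prefix):
        return compact[i][len(prefix):]
    return None


lexicon = build_compact(
    {"hello": "HH AH L OW", "help": "HH EH L P", "he": "HH IY"}
)
found = lookup(lexicon, "help")
missing = lookup(lexicon, "hel")
```

One flat list of strings avoids the per-entry overhead of a dict of tuples or lists, at the cost of a logarithmic lookup instead of a constant-time one.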
Tests
- add unit test case mimicking #130 to confirm it works on Windows (b413089)
- exercise the short -h option in unit testing (40db7fc)
Build Systems
- bump gunicorn to latest version, just published (01234c7)
- bump Heroku runtime to 3.10.12 as per Heroku warning (7f249d9)
- force Heroku to bump python to 3.10.11, and docs (a0b9c03)
Continuous Integration
- only run the full matrix test on release (f02f1ff)
- reorganize CI test suites (c04c660)
- run matrix tests on push to main too since that gets deployed (2622913)
Documentation
- tell the user they need python 3.7 if they try to run studio with older (50852d8)
- update phoneset (5eb14b1)
Code Refactoring
- apply dhd feedback to remove dead code and unflatten the alignment (324e1a2)
Release v1.1.20230511
1.1.20230511 (2023-05-11)
⚠ BREAKING CHANGES
- make_g2p(in, out) used to not tokenize, now it does, and its tok_lang argument is deprecated
- g2p convert now tokenizes by default
Features
- expose the tokenize option to api/v1/g2p (3f572c4)
- g2p convert now tokenizes by default (4d67902)
- make_g2p now tokenizes by default and has new signature (ecfe2ca)
Bug Fixes
- adjust all calls to make_g2p to its new signature (bea7cec)
- g2p needs to update both generated .pkl and .json files (2be51f8), closes #237
- remove --path option to g2p convert, which does not work anyway (f99774f)
- use the more canonical DeprecationWarning to flag deprecation (e8a8a4d)
- mappings: output should not be escaped (5bd3250)
Documentation
- add tokenize arg for api/v1/g2p to swagger.json (d2f226f)
Continuous Integration
- make test_studio.py fast enough to run on each push (5fa2a01)
- remove unused coveralls, make our omit compat with coverage 7.x (3f9d2df)
Tests
- exercise api/v1/g2p with and without tokenize (c64322f)
- improve coverage of error situations in CLI (0b3f5ee)
Code Refactoring
Release v1.0.20230417
1.0.20230417 (2023-04-17)
Bug Fixes
- eng is already in the langs now, no need to hardcode (b038ad2)
- import g2p should not alter sys.stdout/err globally (80e0d1b)
- the CLI (and only the CLI) needs to ensure utf8 output on Windows (cbeff1f)
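The two stream-related fixes above amount to a general pattern: a library import leaves `sys.stdout` and `sys.stderr` alone, and only the command-line entry point adjusts their encoding. A minimal sketch of that pattern follows. The names `ensure_utf8` and `cli_main` are hypothetical and this is not the g2p implementation. `TextIOWrapper.reconfigure` is a standard method available from Python 3.7.

```python
import io
import sys


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


def cli_main() -> None:
    """Hypothetical CLI entry point: the only place the streams are touched."""
    if sys.platform == "win32":
        ensure_utf8(sys.stdout)
        ensure_utf8(sys.stderr)
    print("Kanien'kéha")


# Demonstration on an in-memory stream instead of the real stdout.
buffer = io.BytesIO()
stream = ensure_utf8(io.TextIOWrapper(buffer, encoding="ascii"))
stream.write("cən")
stream.flush()
```

Because nothing runs at import time, code that merely imports such a module keeps whatever stream configuration the host application chose.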
Code Refactoring
- move get_langs from Studio/readalongs to g2p (c06ae5d)
- rename get_langs->get_arpabet_langs to make purpose clearer (7c5222e)
Continuous Integration
Release v1.0.20230412
1.0.20230412 (2023-04-12)
⚠ BREAKING CHANGES
- put network_to_echart where we can test it properly
Features
- add -a/--substring-alignments argument to cli (6b41213)
- add accessors for useful things like the input and output languages (cacce3b)
- add aligned cmudict and lexicon transducer type (596ab82)
- add alignments method to get textual alignments (e2303f4)
- add edges for alignments in lexicon (f2c9f6c)
- add proper typing to compose_indices (7bbfb6d)
- add type checking and use Tuples (as they can be type checked) (4780702)
- language name for spelling variants describe the variant (ffba389)
- make the use of None explicit and limited (97aaed5)
- make TransductionGraph and CompositeTransductionGraph compatible (e00790a)
- output monotonic alignments for deletions and reorderings (126aa83)
- properly normalize edges on concatenation (f37897c)
- shrink pickle by optimizing alignment storage (0860ad6)
- support lexicon mappings in Studio (but they are slow) (c824f6b)
- switch script to use phonetisaurus from PyPI (bb91b12)
Bug Fixes
- add spaces and avoid formatting (a5c2894)
- avoid crashing on empty edges (8d57e68)
- avoid creating None in input position (404306d)
- comment and clean up substring_alignments (9cd84d8)
- disable the utf8 fix for windows when running in pytest (bd5690a), closes #241
- do not call logging.basicConfig, just config the logger itself (8ff314f)
- emit input unchanged when no transducers exist (b0db10e)
- fix doctor (0b0f2ed)
- fix speed issues by not deep-copying alignments (56e933b)
- make pretty_edges consistent and fix tests to expect tuples (065fa23)
- make sure we do not output bogus edges (fab9f0a)
- most sensible possible behaviour, keep spaces if user wanted them (70ab1e6)
- remove impossible try/catch (2db239a)
- remove spaces in sanitize_unidecode_output as suggested by @littell (bd1b1ec)
- remove spontaneous extraneous spaces from und-ipa (9e64b7f)
- remove unnecessary default value (722215a)
- restore original edges API and rename alignments (c054256)
- switching back to Custom did not actually work (7f0f640)
- the only special character we want to escape is ? (7af2f0b)
- update treatment of deletions in lexicon to match rules (18bdc6b)
- use OrderedDict explicitly for clarity (d2ef567)
Documentation
- add documentation for lexicon mappings (dcf5973)
- add links to non-packaged files (9d6275c)
- clarify use of generic type (7bb7df6)
- clean up docstrings (91aa3b3)
Tests
- add alignment tests and improve coverage for transducers (76f85dd)
- add coverage of invalid regex in rule (bd81a70)
- add coverage to studio tests and app (0945336)
- add test of lexicon loading from config file (22de19b)
- fix studio test (31c9e48)
- long delay no longer necessary (33efc1e)
- make test_tokenizer.py exercise tce and unknown lang and default (1da815b)
- run the expensive doctor test because it can catch errors (bb60f55)
- update lexicon test for eng ipa (f05a513)
Code Refactoring
- add explicit b, m, p, u rules to moh for borrowed words (2dc5e42)
- put network_to_echart where we can test it properly (970e358)
- remove superfluous list comprehension (dd8f5df)
- test: when a mapping fails, show test case filename:lineno (fb309ec)
- tests: quiet yappy test suites (c6423b6)
Styles
- all other badges are rounded, why not the readme one? (ba76f57)
- rewrite moh_equiv and moh_to_ipa in compact form (c781cbe)
Continuous Integration
Release v1.0.20230228
Release v1.0.20230224
1.0.20230224 (2023-02-24)
Features
Bug Fixes
- mappings: bullet operator -> middle dot (f5b3d06)
- studio: upgrade heroku stack and python runtime (b10abee)
- address CWE-830 by adding integrity to scripts from cloudflare (00ccd31)
- generate swagger.json the way our pre-commit hooks want it (0373abe)
- in 2022, "python" is Python 3 (38c41da)
- in 2022, "python" is Python 3 and "pip" works in CI (f2b892a)
- on Windows, make generated files out LF so they're not spuriously changed (0613906)
- ci: add codecov token to ci tests (2fbfe06)
- ci: change ubuntu version (48158a2)
- nsy: add glottal stop self-map so g2p knows it is an nsy letter (05a9c75)
- nsy: fix the nsy->nsy-ipa mapping to the picky requirements of g2p (b6d5389)
- nsy: handle a few more spelling variants (2a51896)
- make Undetermined (und) process Arabic characters correctly (53dded4)
- reqs: update flask to avoid werkzeug error (361c936)
Performance Improvements
- collapse loops (809adaa)
Code Refactoring
- change Nsyilxcən code to oka in all the files too (b5b3a60)
- change to main (1ad9a98)
- create class mg-bot for cleaner bottom margin implementation (cb4ab03)
- rename nsy->oka to the official iso-6639-1 code for Nsyilxcən (f0b5bd6)
- docs: use unpkg for fetching swagger ui (297b069)
Styles
- apply a number of pylint recommendations (2e9b067)
- let git blame ignore black and isort only commits (56896de)
Tests
- add --describe option to run.py and exit 1 on error (6b560f0)
- test that eng-ipa->eng-arpabet works ok with NFC and NFD inputs (2712b32)
- use NFD output in fn_unicode test cases (a51bb46)
- nsy: add references to most entries in nsy.csv (e7f7726)
- nsy: fix the last word (question mark -> glottal stop) (946d7c3)
Continuous Integration
- add CodeQL automated vulnerability scanning (9fc96fd)
- bump CI actions to current to heed GitHub warnings (4858bbb)
- g2p codecov action does not use dir (8880845)
- only run CodeQL on cron and push to master and release (11aaff4)
- stop failing CI when codecov fails to upload (557fc3d)
- use ubuntu-20.04 since ubuntu-latest no longer supports Python 3.6 (cb91794)