Releases: roedoejet/g2p

Release v0.5.20221013

13 Oct 20:26
bb765d4

0.5.20221013 (2022-10-13)

Features

  • add dummy mappings for english and mohawk (eaf2c70)
  • add dummy mappings for english and mohawk (85d06f0)
  • add iku-ipa to hamming-eng-ipa mapping (9ae5484)
  • add und and str dummy mappings as well as distance specification for mapping alignment (7f93447)
  • mappings: added more dummy mappings (d89f5b7)
  • add und-ipa to hamming-eng-ipa (7cf4118)
  • basic Finnish mapping (36d17e0)
  • check Python version (9e086e1)
  • do arpabet checking for hamming-eng-arpabet too (11728ec)
  • include NFC/D normalization in g2p graph (f3b918c), closes #158
  • und now maps colon to an empty string (0b8c8d9)
  • mappings: allow abbreviations to be declared recursively (e1a270f)
  • show-mappings: add --csv option (43573ac)
  • show-mappings: added cli cmd g2p show-mappings (9b2e489)
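
The NFC/NFD normalization item above concerns a general Unicode property that a short example can illustrate: the same visible character can be encoded in two ways that do not compare equal. The sketch below uses only Python's standard `unicodedata` module. It does not show how g2p wires normalization into its graph, and the variable names are chosen for illustration.

```python
import unicodedata

nfc = "\u00e9"    # "é" as a single precomposed code point
nfd = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render identically but are different code point sequences.
print(nfc == nfd)          # False
print(len(nfc), len(nfd))  # 1 2

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
```

A mapping rule written in one form will not match input in the other form unless one side is normalized first, which is presumably why normalization is treated as an explicit step.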

Bug Fixes

  • studio: sort nodes in language echart (11b92f2)
  • accept single or multiple mappings in config.yaml (0c4961f)
  • always declare your file encoding, or Windows barfs (67de22f)
  • always declare your file encoding, or Windows barfs (56e327b)
  • avoid failure on corrupted pickles (6f30f0e)
  • catch same input and output g2p mapping bug (0a9d141)
  • correct fin diphthong mappings slightly (b8ae37d)
  • doh! always run the test suites before pushing your changes... (6af7d5e)
  • edit fin-ipa to eng-ipa mappings to fix some vowels (19a9189)
  • ensure python g2p/mappings/langs/__init__.py can always run (ea45afa)
  • find language_name robustly (4e0ccb9)
  • g2p show-mappings -v show in, out, rest, in that order (da06abd)
  • generated mappings should prevent feedback and apply-longest-first (d10944a)
  • grammar (e14cbcd)
  • lock click==8.0.4 since we support Python 3.6 (f941c64)
  • make make_tokenizer disambiguate in_lang and tok_path (9e7f986)
  • make Mohawk tokenizer recognize colon as a letter (767fed4)
  • make Mohawk tokenizer recognize colon as a letter (a32a3a7)
  • make und work in g2p studio (0dd3e25), closes #165
  • mic "o" didn't get mapped to proper eng-arpabet (1373b41)
  • name g2p in package.json, not readalongs (b18eea8)
  • recreate langs.pkl to allow merging (387c6fc)
  • remove stray BOMs everywhere (3c0f13b)
  • support renamed dolgo/dogol distance in panphon (0c73399)
  • indices: handle orphan characters with a heuristic: attach to the index of the previous character if it exists, otherwise to the index of the following character; if no characters exist before or after, None is returned. Fixes #172 (b1ca2cb)
  • moe: add self-mappings for k, m, n, p, s, t (a82d098)
  • regenerate mappings and configs (0d71872)
  • remove UTF-8 BOM and CRLF (will fix code separately) (5806ab9)
  • rules with alternations should tokenize correctly (9fd6407)
  • show default directory in help (679ffe1)
  • tell user to rerun g2p update (they can) (eeac7dd)
  • tli_equiv and tce_equiv had BOMs, update to remove them (5b5d0b5)
  • use an automatically generated mapping for moe-ipa -> eng-ipa (1bc9b05)
  • ci: trailing space for json (a58c205)
  • git: make fixes to ejective mappings (27003e2)
  • indices: fix numerous errors within the indices functionality (7a6c631)
  • mappings: fix normalization issues in win and eng mappings (fd2bb0f)
  • moe: remove two more duplicate rules (bd36902)
  • studio: updated reverse initialization and rule ordering values (2c9c61f)
  • win: use the \u02D0, not :; use prevent-feeding (80f80da), closes #100
  • update mappings (1cba84f)
  • use longest mapping for fin (12dc3da)
  • warn of missing language_name before caching (3eb0ed7)
  • write compact json rules with in+out first, then rest (e9cb0e6)
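
The orphan-character heuristic described in the indices fix above (b1ca2cb) can be sketched as follows. This is an illustration under stated assumptions, not g2p's actual code: the function name `attach_orphans` and the representation (a list of aligned indices, with `None` marking an orphan) are hypothetical.

```python
from typing import List, Optional


def attach_orphans(indices: List[Optional[int]]) -> List[Optional[int]]:
    """Fill None entries using the nearest aligned neighbour.

    Hypothetical helper: each entry is the index a character is aligned
    to, or None when the character is an orphan.
    """
    resolved: List[Optional[int]] = []
    for pos, idx in enumerate(indices):
        if idx is not None:
            resolved.append(idx)
            continue
        # Prefer the index of the previous character, if there is one.
        if pos > 0 and resolved[pos - 1] is not None:
            resolved.append(resolved[pos - 1])
            continue
        # Otherwise fall back to the next character that has an index.
        # If there is none, the orphan stays None.
        following = next((i for i in indices[pos + 1:] if i is not None), None)
        resolved.append(following)
    return resolved


print(attach_orphans([0, None, 2]))  # [0, 0, 2]
print(attach_orphans([None, 1, 2]))  # [1, 1, 2]
print(attach_orphans([None]))        # [None]
```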

Performance Improvements

  • dockerfile: bump the OS to bullseye and optimize the build (20f1926)
  • test: speed up test_studio by minimizing keyup events (4dad4d5)

Reverts

  • revert accidental removal of moe generated mapping (e11db2f)

Build Systems

  • Dockerfile should update pip before using it (3ca1ea2)
  • move flake8 config to setup.cfg (b637ac8)

Continuous Integration


Release v0.5.20220318

18 Mar 21:31

0.5.20220318 (2022-03-18)

Features

  • new g2p generate-mapping --from --to mode - WIP (3c9f3a9)
  • gen-map: implement and test gen-map with multiple target mappings (67dac09)

Bug Fixes

  • api: add index and debugger flags to documentation, add localhost server option and fix tests (fcc5225)
  • remove unused import (f573038)
  • docs: fixed typo in swagger spec (c6d8ea7)
  • test: fix coverage drop (bae0380)
  • move temporary test output to tmpdir for gen-map (ce95d48)
  • gen-map: allow --from and --to to alternatively be comma separated (24b686d)
  • gen-map: fix obsolete semicolon reference in error message (2d9facf)
  • gen-map: new generated mappings default to NFC (af8ca55)
  • gen-map: several improvements polishing the from/to mode (c0eb5f0)

Documentation

  • gen-map: better usage docs for --from/--to mode (746fce0)

Styles

  • apply some pylint recommended changes (7bd0934)
  • configured isort and mypy like in ReadAlongs/Studio (a346f05)
  • rewrite all generated JSON mapping in human-readable format (d7401f4)

Code Refactoring

  • output mappings in a more compact JSON format (204d8c5)

Tests

  • gen-map: improve unit testing coverage (5522a41)
  • gen-map: unit testing for new --from/--to gen-map (b25b006)
  • scan: make sure g2p scan works with NFC and NFD input (de2c09e)

Release v0.5.20211217

17 Dec 18:40

0.5.20211217 (2021-12-17)

Bug Fixes

  • deployment: move back to eventlet but lock master branch commit for gunicorn until new release (223390a)
  • deployment: replace eventlet with gevent (7cd6bfa)
  • deps: use locked gunicorn commit with a syntax that enables caching (333f023), closes https://github.com/benoitc/gunicorn/pull/2581#issuecomment-994198667
  • fra: use \b for end of word, so it works before punctuation (22c9fd9)
  • mappings: reverse length sort abbreviations to prevent substring errors. fixes #133 (c7538c6)
  • test: test suite corrected for exceptions.IncorrectFileType change (cc64588)
  • tokenizer;--config: fix case-insensitive tok bug; --config can now load single mapping (3d88a6d)
  • fix bug causing exception with empty rules (4d644a0)
  • transducer: hide dummy rules (ad32663)
  • transducer: include all rules to debugger (d516064)
  • friendlier error messages when mapping or abbrev files not found (6226d4f)
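
The abbreviation fix above (c7538c6) sorts abbreviations by length, longest first, to prevent substring errors. The sketch below shows why the order matters when one abbreviation name is a prefix of another. The function name, the abbreviation names, and the plain string replacement are assumptions for illustration, not g2p's implementation.

```python
from typing import Dict


def expand_abbreviations(rule: str, abbreviations: Dict[str, str]) -> str:
    """Replace abbreviation names in `rule`, longest names first."""
    for name in sorted(abbreviations, key=len, reverse=True):
        rule = rule.replace(name, abbreviations[name])
    return rule


def expand_naively(rule: str, abbreviations: Dict[str, str]) -> str:
    """Same replacement, but in dictionary order (shown for contrast)."""
    for name, value in abbreviations.items():
        rule = rule.replace(name, value)
    return rule


# Illustrative abbreviations: "VOWEL" is a substring of "VOWEL_LONG".
abbreviations = {"VOWEL": "a|e|i", "VOWEL_LONG": "aa|ee|ii"}

print(expand_abbreviations("(VOWEL_LONG)", abbreviations))  # (aa|ee|ii)
print(expand_naively("(VOWEL_LONG)", abbreviations))        # (a|e|i_LONG)
```

In the naive version the shorter name is expanded first and corrupts the longer one, which is the kind of substring error that a reverse length sort avoids.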

Tests

  • add case feeding test mapping with test case (993ede7)
  • unit testing for bug causing exception with empty rules (9f1eecb)

Continuous Integration

  • bump rtd to 3.8 (0ff7b4b)
  • change from travis to gh-actions (75f8c46)
  • show github workflow build status badge (5d0c97b)
  • stop pointing to file (3a2f1f7)
  • switch to codecov (2e00437)

Styles

  • blackify mappings/__init__.py, test_mappings.py and test_transducer.py (d6ae834)
  • blackify mappings/utils.py and tests/test_z_local_config.py (9b156b8)

Documentation

  • coverage: change to codecov (3f91397)
  • readme: add link to blog (823672d)

Code Refactoring

  • reqs: move requirements to folder (5a606e1)
  • make_g2p should raise more meaningful exceptions on caller errors (adcf91f)

Release v0.5.20211029

29 Oct 19:57

0.5.20211029 (2021-10-29)

Features

  • g2p generate-mapping --merge option (fixes #61) (55c1f08)
  • make the CLI command "g2p generate-mapping" more flexible (issue #61) (9ecec71)
  • config: add external config file option for cli (e4bccdc)

Bug Fixes

  • update travis CI URL (16275ef)
  • doctor: oops, that last PR broke "g2p doctor" with no argument (7483d48)
  • fra: fix fra->ipa to map all French characters (1483dea)
  • test: test local config last, since it has side effects (86d8220)
  • use is_ipa everywhere to detect IPA mappings, not .endswith("-ipa") (04d2b4b)
  • studio: fixed config table creation (33a8a58)

Performance Improvements

  • defer some expensive imports and initializations (2948537)
  • optimize Dockerfile to better use the Docker cache (9638e11)
  • optimize g2p generate-mapping by caching reused values, 8x faster (0e9ee3b)

Tests

  • fra: add some NFC test cases, to my existing NFD ones (7ac73c5)
  • fra: test the NFD cases to eng-ipa and eng-arpabet (e3902d3)
  • scan: re-enable test_scan_fra() now that fra is fixed (37fe1e1)

Continuous Integration

  • detect when g2p update is needed (d466c49)
  • detect when g2p update is needed, take 2 (3ba6127)

Code Refactoring

  • create_ipa_mapping.py - stay optimized, but easier to read (1135908)
  • delete unused p2p/__init__.py file/module (0d141b0)

Documentation

  • update README.md for new generate-mapping option (#61) (b52bc0c)

Styles

  • apply a few pylint recommendations (b3489f2)
  • isort, black and pylint the files in the previous commit (ce0a4b1)

Release v0.5.20210825

25 Aug 16:22
b3b587f

0.5.20210825 (2021-08-25)

Features

  • und -> und-ascii mapping calls text_unidecode.unidecode() (25cdf06)

Bug Fixes

  • deps: make g2p compatible with Flask 2.0.1 (1f8a9b2), closes #111
  • ikt: syllabic ᕼ is sometimes used instead of ASCII H (ee5d0a4)
  • moh: add plain h rule (1ee20d5)
  • tli: remove obsolete -norm- infix in tli-ipa to eng-ipa mapping name (d2fb7f3)
  • unit tests must use windows compat file joining (746f98a)
  • windows compat required declaring utf8 when opening files (e883938)
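
The two Windows compatibility fixes above follow a common Python pattern: build paths with `os.path.join` and pass an explicit encoding to `open()`. The sketch below is a minimal illustration of that pattern using only the standard library. The helper name `roundtrip` and the file name are hypothetical and do not come from g2p.

```python
import os
import tempfile


def roundtrip(text: str) -> str:
    """Write `text` to a temporary file and read it back, always as UTF-8."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # os.path.join uses the correct separator for the current platform,
        # unlike a hard-coded "/" or "\\".
        path = os.path.join(tmpdir, "sample.txt")
        # Without encoding=..., open() falls back to the locale's preferred
        # encoding, which on Windows is frequently not UTF-8.
        with open(path, "w", encoding="utf8") as f:
            f.write(text)
        with open(path, encoding="utf8") as f:
            return f.read()


print(roundtrip("\u00e9t\u00e9") == "\u00e9t\u00e9")  # True
```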

Performance Improvements

  • test: disable slow and ineffective test_ipa_known_segs_all() (b708bbe)
  • test: remove slow ineffective test from test_doctor in test_cli.py (ac38dba)

Styles

  • isort run.py to ease finding which tests are missing (22f0c4e)
  • remove superfluous whitespace at line ends (13da44e)

Documentation

  • better warning messages when g2p conversion check fails (2d1b89c)

Code Refactoring

  • clean up the UnidecodeMapping code (44a878f)
  • replace unidecode (GPL only) by text_unidecode (Artistic license) (288d79e)

Continuous Integration

  • add a check to make sure we don't introduce GPL dependencies (ea3f53d)
  • GPL test was not working quite right (716e540)

Tests

  • adjust unit testing for changed haa mapping structure (e11fa57)
  • migrated expensive doctor tests to test_doctor_expensive.py (f1aa1ab)
  • und: better coverage in und unit testing (6960f3f)

Release v0.5.20210519

19 May 20:26

0.5.20210519 (2021-05-19)

Bug Fixes

  • moh: fixed mohawk equivalencies for low tones missing length marker (60f7432)

Release v0.5.20210514

14 May 18:04
b974074

0.5.20210514 (2021-05-14)

Features

  • augment panphon prepro with voiceless and tone markers (adce152)
  • better messages in is_panphon (16f3607)
  • check() implemented for tokenizing transducer (822e198)
  • is_panphon and is_arpabet util methods (f0ca3f3)
  • is_panphon to apply our panphon preprocessor first (20e2aee)
  • is_panphon() to issue more warnings to help the user fix things (ddc6ed2)
  • temporarily make is_panphon() display how Panphon parses words that are not IPA (b53ee46)
  • WIP g2p convert --check option (e51468f)
  • check: display_warnings arg to transducer.check() and is_panphon() (0bf00c5)
  • moh: added context sensitive phonological rules to moh (5c9dd5e)
  • moh: added context sensitive phonological rules to moh (b1e37b1)
  • moh: added new festival compliant mapping to moh (0f72c61)
  • moh: added new festival compliant mapping to moh (4200ad5)
  • moh: added reversible mappings for mohawk (778256d)
  • panphon_preprocessor: filter out primary stress mark, \u02c8 (ˈ) (ac73954)
  • panphon_preprocessor: strip all tone accents and bars for going to eng-ipa (34a89f8)

Bug Fixes

  • correct git merge error (073fdae)
  • correct git merge error and rerun g2p update (90a2c36)
  • g2p convert --tok still outputs two spaces after arpabet words (91f0def)
  • package-lock.json should not be committed (903a419)
  • crl+crj: support accented vowels as equivalent to double vowels in crl and crj (2769668)
  • fra: tidy up generated fra-ipa -> eng-ipa (e1de937)
  • haa: panphon is picky, use \u0261, not g (f407ae9)
  • haa: Panphon is picky, voiceless marker should go below the letter (6c69784)
  • haa: use \u02BC for ejective, not ', to please panphon (2557c76)
  • is_panphon: suppress spurious warnings about non-ipa characters (ce782c8)
  • lml: several corrections to lml (6203194)
  • moh: fixed low tone equivalencies and reordered ipa rules for reversibility (70e05f9)
  • moh: fixed moh specific test in transducer unittest (acef437)
  • moh: update mohawk mappings (8bca18f)
  • oji: use \u02D0 for vowel length, not ascii : (7c2c9d4)
  • reverse: change reverse feature to disregard context (ecfc30f)
  • studio: fix eventlet version (0c91bad)
  • tce: use \u02BC for ejective, not ', to please panphon (df00e19)
  • tli: use \u02BC for ejective, not ', to please panphon (2b6d148)
  • ttm: use \u02BC for ejective, not ', to please panphon (7d2264d)
  • remove g->\u0261 from panphon preprocessor; instead, we issue a warning about it (f5fe151)
  • transducer check should return True for cases with no known checks (4741f94)

Performance Improvements

  • more singleton speed testing work (153f04c)
  • optimize loading panphon distance with a singleton (4285759)
  • script to measure speed of different panphon.distance.Distance() init solutions (f4ace7e)
  • settle for the fastest singleton option (d97f790)
  • make_g2p: cache transducers since we make them over and over again (507c6f7)
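
Several items above build an expensive object once and reuse it afterwards (the panphon distance singleton, the cached transducers). One common way to get that behaviour in Python is `functools.lru_cache`, sketched below. This is not the singleton implementation the project settled on: `ExpensiveDistance` and `get_distance` are hypothetical stand-ins.

```python
import functools


class ExpensiveDistance:
    """Stand-in for an object that is slow to construct (hypothetical)."""

    instances_created = 0

    def __init__(self) -> None:
        # Count constructions so the caching effect is observable.
        ExpensiveDistance.instances_created += 1


@functools.lru_cache(maxsize=None)
def get_distance() -> ExpensiveDistance:
    """Build the object on the first call and reuse it afterwards."""
    return ExpensiveDistance()


first = get_distance()
second = get_distance()
print(first is second)                      # True
print(ExpensiveDistance.instances_created)  # 1
```

Every caller that goes through `get_distance()` shares one instance, so the construction cost is paid only once per process.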

Reverts

  • Revert "fix(haa): Panphon is picky, voiceless marker should go below the letter" (3673b04)

Styles

  • rename check test suite more suitably (3febc02)

Code Refactoring

  • even simpler, and slightly faster, singleton implementation (98ffee8)
  • make some parallel code structures more explicit (bc679a9)
  • simplify logic (9e49033)
  • simplify logic (30a70c7)
  • Use simpler Singleton pattern implementation (2b66da9)
  • check, tokenizer: use is_ipa() instead of endswith("-ipa") (0bf5837), closes #102
  • moh: replace H and L with 1 and 2 for festival format, change stress to high tone (7ce445e)

Tests

  • --check option with tokenizing transducer (4a0a20d)
  • add check ipa arpabet to dev test suite (fb1aed7)
  • add reverse tests for equiv (3dc0369)
  • adjust for arpabet producing trailing space (3122104)
  • default to source=g2p in tests/.coveragerc (7290c03)
  • default to source=g2p in tests/.coveragerc (1321e42)
  • more test cases for check ipa and arpabet (a433529)
  • public/data/ikt.psv (forgot to include in previous commit) (ec0fd04)
  • run check on all test data in public/data via test_cli.py (771634b)
  • tau cases going to eng-ipa and eng-arpabet (0b877ef)
  • unit test case for three hop tokenizer (a9eb917)
  • check: add test cases for display_warnings (b25af73)
  • srs: more srs test cases including arpabet checks (e97e654)

Documentation

  • Add link to PyPI releases to REA...