Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Korean Revised Romanization to hangeul IME #716

Merged
merged 11 commits into from
Oct 3, 2024

Conversation

annedrewhu
Copy link
Contributor

Any Korean transliteration keyboard will have a lot of edge cases to check because of differences in romanization between revised romanization, the older McCune–Reischauer, and even a different system included in the RR standard. I've added a modest amount of tests based on this official RR document, but I'm open to adding more before merging.

Copy link
Contributor

@amire80 amire80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few silly, nitpicking comments about code style for now. We'll go over the code later.

[ 'b', 'ᄇ' ],
[ 'ᄇp', 'ᄈ' ],
[ 'ᄉs', 'ᄊ' ],
// [ '\'', 'ᄋ'], // Apostrophe can be written to represent silent ᄋ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks unnecessary.

// This version does not support context rules, but we don't need them
patterns: function(input, context) {
var patterns, regex, rule, replacement, i, result;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many empty lines. There should be one.


// This regex matches jamo that form a syllable so they can be combined
var jamoRegex = /([ᄀ-ᄒ])([ᅡ-ᅵ])([ᆨ-ᇂ])?([ᄀ-ᄒ]|[\- '])(.*)$/;
if (jamoRegex.test(result)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require spaces inside parentheses.

// This regex matches jamo that form a syllable so they can be combined
var jamoRegex = /([ᄀ-ᄒ])([ᅡ-ᅵ])([ᆨ-ᇂ])?([ᄀ-ᄒ]|[\- '])(.*)$/;
if (jamoRegex.test(result)) {
return { noop: false, output: result.replace(jamoRegex, combineJamo) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, spaces inside parentheses.

// Conjoining jamo behavior is defined by this Unicode standard
// https://www.unicode.org/versions/Unicode13.0.0/ch03.pdf#G24646
// parameter `final` is optional
function combineJamo(substring, initial, vowel, final, nextSyllableInitial, otherChars) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, spaces inside parentheses.

var syllable = String.fromCharCode(syllableNo);

const disambig = /[\- ']/;
if (nextSyllableInitial.match(disambig)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, spaces inside parentheses.

const disambig = /[\- ']/;
if (nextSyllableInitial.match(disambig)) {
return syllable;
} else if (otherChars.match(disambig)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

@@ -1274,6 +1278,10 @@
autonym: 'ಕನ್ನಡ',
inputmethods: [ 'kn-transliteration', 'kn-inscript', 'kn-kgp', 'kn-inscript2' ]
},
kor: {
autonym: '한국어',
inputmethods: [ 'kor-rr' ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually give them an identified that is based on the shortest language code, so it should be "ko" and not "kor".

description: 'Korean RR test',
inputmethod: 'kor-rr',
tests: [
// Note that RR is meant to romanize from hangul to latin script, but not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize "Hangul" and "Latin".

];

var koreanRR = {
id: 'kor-rr',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually give them an identified that is based on the shortest language code, so it should be "ko" and not "kor".

Copy link
Member

@srish srish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this patch and it works as intended. @annedrewhu Thanks a lot for your contributions and apologies for the delay in reviewing!

@srish srish merged commit 7150436 into wikimedia:master Oct 3, 2024
3 checks passed
amire80 added a commit to amire80/jquery.ime that referenced this pull request Oct 22, 2024
Follow-up to wikimedia#716.

* Rename the "kor-rr" identifier to "ko-rr" and "kor" to "ko":
  We use two-letter language codes when they are available.
* Whitespace clean-up.
amire80 added a commit to amire80/jquery.ime that referenced this pull request Oct 22, 2024
Follow-up to wikimedia#716.

* Rename the "kor-rr" identifier to "ko-rr" and "kor" to "ko":
  We use two-letter language codes when they are available.
* Whitespace clean-up.
srish added a commit that referenced this pull request Oct 22, 2024
Follow-up to #716.

* Rename the "kor-rr" identifier to "ko-rr" and "kor" to "ko":
  We use two-letter language codes when they are available.
* Whitespace clean-up.

Co-authored-by: SrishAkaTux <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants