Handling Preceding Zeroes #204
Comments
You may want to keep an eye on #150. EDIT: never mind, it's the reverse problem.
|
Partially misplaced, I think. Apparently planned #150
However, I'm not sure whether it retains leading zeroes at the moment either, because it uses
The fundamental challenge here is continuing to treat the input as a string while parsing.
Relating this back to the code side, the English number extractors "chunk" numbers as they go, based on powers of 10. While parsing a base-10 number left to right, whenever you encounter a power of 10, you scan the remainder of the number for larger powers of 10. If you do not find any, you have identified the end of a "place."
"1,075,018" ->
|
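To make the chunking idea concrete, here is a minimal sketch of the scan described above. It is not the project's actual extractor: `POWERS_OF_TEN` and `chunk_by_place` are hypothetical names, and the input is assumed to be a list of lowercase English number words.

```python
# Hypothetical, simplified vocabulary of power-of-10 words.
POWERS_OF_TEN = {
    "hundred": 100,
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}


def chunk_by_place(words):
    """Split number words into "places", scanning left to right.

    When a power-of-10 word is met, the rest of the input is scanned
    for a larger power of 10. If none is found, the current chunk is
    closed, because the end of a place has been reached.
    """
    chunks, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        power = POWERS_OF_TEN.get(word)
        if power is None:
            continue  # ordinary number word, keep collecting
        larger_ahead = any(
            POWERS_OF_TEN.get(later, 0) > power for later in words[i + 1:]
        )
        if not larger_ahead:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)  # trailing words with no power of 10
    return chunks


places = chunk_by_place("one million seventy five thousand eighteen".split())
print(places)
```

Because every chunk is turned into a numeric value afterwards, any leading zero inside a chunk has nowhere to survive, which is why keeping the input as a string for longer matters here.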
I stand corrected. In the current version of the PR,
|
On reflection, the "fail" case above is OOS. If the input appears to mean something specific - I vote one of two things:
|
Hey @firebladed,
If we're looking at STT output, another option might be something like an
Can anyone think of cases other than codes or phone numbers where this would come up?
If it won't be supported in the extract_number(s) methods, we probably need to add a note to the docstring that leading zeros will be ignored.
Probably not what you're referring to, but just in case...
>>> totp = "012345"
>>> list(totp)
['0', '1', '2', '3', '4', '5']
If there might be spaces in the source:
>>> totp = "01 2 3 45"
>>> list(totp.replace(" ", ""))
['0', '1', '2', '3', '4', '5']
Or, if the source may be an int, you would need to do something slightly more verbose:
>>> totp = 123456  # note an int cannot have a leading zero
>>> [digit for digit in str(totp)]
['1', '2', '3', '4', '5', '6']
This could possibly act as a workaround for the STT case:
# assumes totp is a list of digit characters, as in the examples above
extracted_codes = [
    list(utterance.replace(" ", "")),  # digit characters with spaces removed
    extract_numbers(utterance)[0]  # first extracted number (leading zeros already lost)
]
if totp in extracted_codes:
    authenticated = True
|
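A slightly more defensive variant of the same workaround, as a sketch only. It assumes the utterance already contains digit characters (as STT output such as "01 46 06" does) and compares strings, so no leading zero is lost. `digits_only` and `matches_code` are hypothetical helpers, not part of any existing API.

```python
ASCII_DIGITS = "0123456789"


def digits_only(utterance: str) -> str:
    """Keep only ASCII digit characters, preserving order and leading zeros."""
    return "".join(ch for ch in utterance if ch in ASCII_DIGITS)


def matches_code(utterance: str, expected_code: str) -> bool:
    """Compare the digits heard against the expected code as strings."""
    return digits_only(utterance) == expected_code


print(matches_code("01 46 06", "014606"))
print(matches_code("1 46 6", "014606"))
```

Comparing strings sidesteps the number extractor entirely, so pauses in the reading only change where the spaces fall, not the result.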
Describe the bug
Zeroes preceding a non-zero digit are ignored, either at the start of the input or following a pause.
The problem is partly related to the unpredictability of pauses when number sequences are read aloud:
"0 1 4 6 0 6" is correctly interpreted as [0.0, 1.0, 4.0, 6.0, 0.0, 6.0],
but "01 46 06" is incorrectly interpreted as [1.0, 46.0, 6.0].
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Zeros should be added to the output as separate numbers.
I think zeros preceding a single non-zero digit should be treated as a separate number, either by default or as an option,
e.g.
"0 1" (zero one) -> [0, 1]
"01 46 06" (zero one four six zero six) -> [0, 1, 46, 0, 6]
Additional context
This is problematic when reading out code numbers, e.g. TOTP codes,
which can have a zero in any position and can be read in multiple ways,
e.g. 0 1 4 6 0 6 (zero one four six zero six)
34 45 65 (three four four five six five, thirty four forty five sixty five)
234 567 (two hundred and thirty four five hundred and sixty seven)
One aspect I'm not sure of: should 46 read as "four six" be interpreted as [46] or [4, 6] when it precedes a decimal point (or when there is no decimal point)? After a decimal point it is different, as the "normal" reading is digit by digit, e.g. 0.01475 (zero point zero one four seven five).
However, "46" (forty six) can always be converted to "4 6", whereas missing zeroes cannot be recovered.
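A minimal sketch of the behaviour requested in this issue, assuming the input has already been reduced to whitespace-separated groups of digits. `split_leading_zeros` and `to_digits` are hypothetical helpers for illustration and not part of the existing extract_numbers API.

```python
def split_leading_zeros(text):
    """Return ints for each digit group, emitting every leading zero separately."""
    numbers = []
    for token in text.split():
        if not (token.isascii() and token.isdigit()):
            continue  # ignore anything that is not a plain digit group
        stripped = token.lstrip("0")
        # One separate 0 for each zero that preceded the non-zero part.
        numbers.extend([0] * (len(token) - len(stripped)))
        if stripped:
            numbers.append(int(stripped))
    return numbers


def to_digits(numbers):
    """Expand non-negative ints into single digits, e.g. [46] becomes [4, 6]."""
    return [int(digit) for number in numbers for digit in str(number)]


kept = split_leading_zeros("01 46 06")
print(kept)
print(to_digits(kept))
print(to_digits([1, 46, 6]))  # zeros dropped earlier cannot be restored here
```

The last call illustrates the asymmetry: grouped digits can always be split afterwards, but a list that has already lost its zeros yields a shorter, different code.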