Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provision iterator syntax sugar in python #40

Merged
merged 8 commits into from
Dec 18, 2023
Merged

Provision iterator syntax sugar in python #40

merged 8 commits into from
Dec 18, 2023

Conversation

sivaprasad2000
Copy link
Collaborator

@sivaprasad2000 sivaprasad2000 commented Dec 12, 2023

The changes enhances the project a flexible and reusable iterator system for processing annotations in Python, along with associated tests.

.gitattributes:
Added a new file with various attributes for Git, specifying binary handling for certain file types.

bindings/python/iterators.py:
Added a new Python module (iterators.py) that defines iterators for processing annotations, sentences, and words.
Includes WordIterator and SentenceIterator classes.
Provides sentences() and words() for syntax sugar.

bindings/python/tests/test_iterators.py:
Added a new test module (test_iterators.py) to test the functionality of the iterators in the iterators.py module.

slimt/CMakeLists.txt:
Modified the CMakeLists.txt file to include changes related to target link libraries and include directories.

@jerinphilip jerinphilip changed the title Included Iterators in the library Provision iterator syntax sugar in python Dec 16, 2023
Copy link
Owner

@jerinphilip jerinphilip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appears to be very close to done, save some nits.

.gitattributes Outdated
# Executables
*.exe binary
*.out binary
*.app binary
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no trailing newline

Comment on lines 12 to 13
sen_iter_tgt = iterators.sentences(target)
word_iter_global = iterators.words(target)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not prefer abbreviations when it can be named better.

sentences = iterators.sentences(response.target)
for sentence in sentences:
   # ...
   
words = iterators.words(response.target)
for word in words:
   # ...

In general chaining more entities into the naming connected by _ points to that we've failed somewhere at abstracting (Imagine sentence, sentence_item - instead of sentence, word for an example).

Comment on lines 20 to 39
expected_text_range = target.word_as_range(sentence_idx, word_idx)
reconstructed_text_range = word.range()

# For Sentence Iterator and Word Iterator
assert expected_text_range.begin == reconstructed_text_range.begin
assert expected_text_range.end == reconstructed_text_range.end

expected_word = text[expected_text_range.begin:expected_text_range.end]
reconstructed_word = word.surface()

assert expected_word == reconstructed_word

word_global = next(word_iter_global)

reconstructed_text_range_glob = word_global.range()
reconstructed_word_glob = word_global.surface()

# For Global Word Iterator
assert expected_text_range.begin == reconstructed_text_range_glob.begin
assert expected_text_range.end == reconstructed_text_range_glob.end
Copy link
Owner

@jerinphilip jerinphilip Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of _ chaining. This is a naming nit, but there's room for improvement. Same intentions/ideas as #40 (comment).

@sivaprasad2000 sivaprasad2000 merged commit 87c9105 into main Dec 18, 2023
8 checks passed
@sivaprasad2000 sivaprasad2000 deleted the iterators branch December 18, 2023 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants