Token-level CodeSearchNet preprocessing option #13

Status: Open. Wants to merge 7 commits into master from codesearchnet-helper-update.

bzz (Member) commented Jan 31, 2020

Adds --token-level-sources and --print options, which produce the output below.

Token-level sources, used to reproduce the OpenNMT seq2seq baseline from the literature:

'fastPathOrderedEmit                     ' - 'final Observer < ? super V > observer = '
'amb                                     ' - 'ObjectHelper . requireNonNull ( sources '
'ambArray                                ' - 'ObjectHelper . requireNonNull ( sources '
'concat                                  ' - 'ObjectHelper . requireNonNull ( sources '
'concat                                  ' - 'ObjectHelper . requireNonNull ( sources '
'concatArray                             ' - 'if ( sources . length == 0 ) { return em'
'concatArrayDelayError                   ' - 'if ( sources . length == 0 ) { return em'
'concatArrayEager                        ' - 'return concatArrayEager ( bufferSize ( )'
'concatArrayEager                        ' - 'return fromArray ( sources ) . concatMap'
'concatArrayEagerDelayError              ' - 'return fromArray ( sources ) . concatMap'
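The token-level sources above separate identifiers from punctuation while keeping operators such as == whole. A minimal sketch of that kind of split, assuming a regex tokenizer is good enough for illustration: token_level_source and the token pattern are hypothetical and not the helper's actual code. A real Java lexer also handles shift operators, comments and numeric suffixes, and if the helper instead joins the dataset's pre-tokenized code_tokens field, no regex is needed at all.

```python
import re

# Hypothetical token pattern; alternatives are tried left to right.
_TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'                   # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"                  # single-quoted char literal
    r"|\w+"                                # identifiers, keywords, numbers
    r"|==|!=|<=|>=|&&|\|\||\+\+|--|->|::"  # a few multi-char operators
    r"|[^\w\s]"                            # any other single punctuation char
)


def token_level_source(code: str) -> str:
    """Split code into tokens and join them with single spaces.

    Whitespace, including newlines, is dropped, so a multi-line body
    becomes one space-separated line.
    """
    return " ".join(_TOKEN_RE.findall(code))


print(token_level_source("if (sources.length == 0) {\n    return x;\n}"))
```

Collapsing newlines this way matches the listing, where each token-level source is a single line.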

Word-level source and char-level target (default):

'f a s t P a t h O r d e r e d E m i t   ' - 'final Observer<? super V> observer = dow'
'a m b                                   ' - 'ObjectHelper.requireNonNull(sources, "so'
'a m b A r r a y                         ' - 'ObjectHelper.requireNonNull(sources, "so'
'c o n c a t                             ' - 'ObjectHelper.requireNonNull(sources, "so'
'c o n c a t                             ' - 'ObjectHelper.requireNonNull(sources, "so'
'c o n c a t A r r a y                   ' - 'if (sources.length == 0) {\n            '
'c o n c a t A r r a y D e l a y E r r o ' - 'if (sources.length == 0) {\n            '
'c o n c a t A r r a y E a g e r         ' - 'return concatArrayEager(bufferSize(), bu'
'c o n c a t A r r a y E a g e r         ' - 'return fromArray(sources).concatMapEager'
'c o n c a t A r r a y E a g e r D e l a ' - 'return fromArray(sources).concatMapEager'
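In the default listing each target character becomes its own token, and both columns appear cut to a fixed width with newlines shown as \n. The sketch below reproduces that layout. The 40-character width and the escape-then-truncate order are inferred from the printed rows, not taken from the helper's code, and char_level_target and format_pair are hypothetical names.

```python
def char_level_target(name: str) -> str:
    """Space-separate every character so each one becomes a target token."""
    return " ".join(name)


def format_pair(target: str, source: str, width: int = 40) -> str:
    """Hypothetical --print-style row: escape newlines, then pad or truncate
    each column to exactly `width` characters and wrap it in quotes."""

    def cell(text: str) -> str:
        text = text.replace("\n", "\\n")
        return "'" + text[:width].ljust(width) + "'"

    return cell(target) + " - " + cell(source)


print(format_pair(char_level_target("amb"), "return x;"))
```

Truncation is why long names such as concatArrayDelayError lose their last characters in the left column.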

After 6cfc883 (--token-level-targets), token-level source and target:

'fast path ordered emit                  ' - 'final Observer < ? super V > observer = '
'amb                                     ' - 'ObjectHelper . requireNonNull ( sources '
'amb array                               ' - 'ObjectHelper . requireNonNull ( sources '
'concat                                  ' - 'ObjectHelper . requireNonNull ( sources '
'concat                                  ' - 'ObjectHelper . requireNonNull ( sources '
'concat array                            ' - 'if ( sources . length == 0 ) { return em'
'concat array delay error                ' - 'if ( sources . length == 0 ) { return em'
'concat array eager                      ' - 'return concatArrayEager ( bufferSize ( )'
'concat array eager                      ' - 'return fromArray ( sources ) . concatMap'
'concat array eager delay error          ' - 'return fromArray ( sources ) . concatMap'
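The token-level targets split camelCase method names into lowercase subtokens (fastPathOrderedEmit becomes fast path ordered emit). One way to get that split is sketched below. token_level_target is a hypothetical name, and the handling of acronyms, digits and underscores is an assumption because the listing shows only plain camelCase names.

```python
import re

# Hypothetical subtoken pattern:
#   an acronym run followed by a capitalised word ("HTTP" in "HTTPServer"),
#   an optionally capitalised lowercase word ("fast", "Path"),
#   a trailing all-caps run ("X" in "getX"), or a run of digits.
# Underscores match none of these, so snake_case is split as well.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def token_level_target(name: str) -> str:
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    return " ".join(part.lower() for part in _SUBTOKEN_RE.findall(name))


print(token_level_target("concatArrayEagerDelayError"))
```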

External dependencies such as PyTorch (and https://github.com/microsoft/dpu-utils) are dropped from the CLI helper because it does not use them; the helper implements a similar interface itself instead.
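Without dpu-utils, the helper has to read the CodeSearchNet data files itself. This minimal standard-library sketch assumes the files are gzip-compressed JSON Lines with func_name and code fields, which is the format of the public CodeSearchNet release. read_jsonl_gz and name_code_pairs are hypothetical names. Keeping only the part of func_name after the last dot is a guess, based on the listings showing bare method names such as amb.

```python
import gzip
import json
from typing import Dict, Iterator, Tuple


def read_jsonl_gz(path: str) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def name_code_pairs(path: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical: yield (method name, code) pairs, keeping only the part of
    func_name after the last '.' (e.g. 'Class.method' -> 'method')."""
    for record in read_jsonl_gz(path):
        yield record["func_name"].rsplit(".", 1)[-1], record["code"]
```

Streaming line by line keeps memory flat, which matters for the larger CodeSearchNet shards.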

bzz requested a review from m09 on January 31, 2020 15:15

m09 (Collaborator) left a comment:

Looks good! I added a few stylistic comments but we can merge as is.

Resolved review threads:
.gitignore (1, outdated)
notebooks/codesearchnet-opennmt.py (8, of which 7 outdated)
bzz (Member, Author) commented Feb 3, 2020

@m09 thank you for the prompt review! All feedback is addressed in aa65cc4.

6cfc883 is a hack that adds --token-level-targets; the PR description has been updated with an example.

bzz added 2 commits February 3, 2020 13:26
Signed-off-by: Alexander Bezzubov
Signed-off-by: Alexander Bezzubov
bzz force-pushed the codesearchnet-helper-update branch 2 times, most recently from 63ee209 to 985788a, February 3, 2020 13:00
Signed-off-by: Alexander Bezzubov
bzz force-pushed the codesearchnet-helper-update branch from 985788a to b20d69b, February 3, 2020 13:50
m09 (Collaborator) commented Feb 4, 2020

Looks handsome! Good to merge for me :)

2 participants