Add Seq Packing in NeMo / Neva2 #11633

Open
wants to merge 50 commits into
base: main

yaoyu-33
Collaborator

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

yaoyu-33 and others added 5 commits January 7, 2025 10:33
# Conflicts:
#	nemo/collections/multimodal/data/energon/base.py
#	nemo/collections/multimodal/data/energon/conversation.py

__restore_key__: Tuple[Union[str, int, tuple], ...] = ()
position_ids: torch.Tensor = field(default_factory=lambda: torch.empty(0, dtype=torch.float))
packed_seq_params: PackedSeqParams = field(default_factory=lambda: PackedSeqParams())

Check notice

Code scanning / CodeQL

Unnecessary lambda Note

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
Collaborator

can you resolve this?
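
One way to address the CodeQL note, as a minimal sketch: pass the callable itself as `default_factory` instead of wrapping it in a lambda. The `PackedSeqParams` and `Batch` classes below are hypothetical stand-ins written for illustration, not the real classes from this PR, and a plain list stands in for the tensor field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackedSeqParams:
    """Hypothetical stand-in for the real PackedSeqParams (illustration only)."""

    cu_seqlens: List[int] = field(default_factory=list)


@dataclass
class Batch:
    """Hypothetical batch type mirroring the shape of the fields under review."""

    # Flagged form: the lambda only forwards to the class.
    #   field(default_factory=lambda: PackedSeqParams())
    # Equivalent form: pass the class itself as the factory.
    packed_seq_params: PackedSeqParams = field(default_factory=PackedSeqParams)
    # A lambda is still needed when arguments must be bound, as in
    #   field(default_factory=lambda: torch.empty(0, dtype=torch.float))
    position_ids: List[float] = field(default_factory=list)


a, b = Batch(), Batch()
# Each instance still gets its own default object.
assert a.packed_seq_params is not b.packed_seq_params
```

The `position_ids` default in the reviewed code binds arguments to `torch.empty`, so its lambda is not a plain wrapper and would likely not need to change.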

"""Sample type for image text raw batch"""

position_ids: torch.Tensor = field(default_factory=lambda: torch.empty(0, dtype=torch.float))
packed_seq_params: PackedSeqParams = field(default_factory=lambda: PackedSeqParams())

Check notice

Code scanning / CodeQL

Unnecessary lambda Note

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
yaoyu-33 and others added 2 commits January 7, 2025 11:39
nemo/collections/multimodal/data/energon/task_encoder.py (outdated, resolved)
nemo/collections/multimodal/data/energon/task_encoder.py (outdated, resolved)

__restore_key__: Tuple[Union[str, int, tuple], ...] = ()
position_ids: torch.Tensor = field(default_factory=lambda: torch.empty(0, dtype=torch.float))
packed_seq_params: PackedSeqParams = field(default_factory=lambda: PackedSeqParams())
Collaborator

can you resolve this?

Comment on lines 69 to 70
packed_sequence=False,
packing_seq_length=4096,
Collaborator

can you use consistent naming as fine_tuning.py? i.e.
use one variable packed_sequence_size, and packed_sequence_size=-1 indicates not using packed sequence

Collaborator Author

i kept packed_sequence here for consistency with other vlm data modules where we don't use packed_sequence_size

Collaborator Author

changed name from packing_seq_length to packed_sequence_size
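
For reference, the single-variable convention the reviewer proposed could look roughly like the sketch below. `SingleFieldConfig` is a hypothetical class, not the data config in this PR, and treating every non-positive size as "disabled" is an assumption of the sketch (the comment above only names -1).

```python
from dataclasses import dataclass


@dataclass
class SingleFieldConfig:
    """Hypothetical config illustrating the single-variable convention."""

    # -1 means sequence packing is disabled.
    packed_sequence_size: int = -1

    @property
    def packed_sequence(self) -> bool:
        # Derived flag: packing is on only for a positive size (assumption).
        return self.packed_sequence_size > 0


print(SingleFieldConfig().packed_sequence)
print(SingleFieldConfig(packed_sequence_size=4096).packed_sequence)
```

With this layout the boolean cannot disagree with the size, which is the main appeal of the reviewer's suggestion; the PR as discussed keeps a separate `packed_sequence` flag instead.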

nemo/collections/vlm/neva/data/config.py (outdated, resolved)
@@ -1711,7 +1711,10 @@ def masked_token_loss(tensor: Tensor, mask: Tensor):
"""
losses = tensor.float()
loss_mask = mask.view(-1).float()
loss = torch.sum(losses.view(-1) * loss_mask) / loss_mask.sum() # sequence level nll
num_valid_tokens = loss_mask.sum()
if num_valid_tokens < 0.5: # no valid tokens
Collaborator

can you explain a bit more when this is the case? is this only valid for neva?

Collaborator Author

Not only for neva, it also applies to SFT. If the system and user prompts are very long and the loss is computed on the answer only, then after truncating from the right there might not be any answer (valid) tokens left.

Collaborator

Usually we truncate the input/context and keep answer intact, so that wouldn't happen

Collaborator Author

yep the logic is a bit different for vlm. We don't want to truncate from left.
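
The case discussed in this thread can be reproduced with a small framework-free sketch. It uses plain Python lists instead of tensors, and the helper names `build_loss_mask` and `masked_mean` are hypothetical. Returning 0.0 when no token is valid is an assumption made here; the diff excerpt above does not show what the PR does in that branch.

```python
from typing import List


def build_loss_mask(prompt_len: int, answer_len: int, max_len: int) -> List[int]:
    """Mask is 0 on prompt tokens and 1 on answer tokens, truncated from the right."""
    mask = [0] * prompt_len + [1] * answer_len
    return mask[:max_len]


def masked_mean(losses: List[float], mask: List[int]) -> float:
    """Mean loss over positions where mask is 1; 0.0 if there are none (assumption)."""
    num_valid = sum(mask)
    if num_valid == 0:  # no valid tokens: avoid dividing by zero
        return 0.0
    return sum(loss * m for loss, m in zip(losses, mask)) / num_valid


# A 5000-token prompt with a 50-token answer, truncated to 4096 tokens:
# every answer token is cut off, so the mask has no valid positions.
mask = build_loss_mask(prompt_len=5000, answer_len=50, max_len=4096)
assert sum(mask) == 0
assert masked_mean([1.0] * len(mask), mask) == 0.0
```

Without the guard, the same input would divide by zero, which in the tensor version would surface as a NaN loss.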

yaoyu-33 and others added 2 commits January 8, 2025 10:35
nemo/collections/vlm/neva/data/config.py (dismissed)
yaoyu-33 and others added 4 commits January 8, 2025 14:13
…a/neva2_seq_packing

# Conflicts:
#	nemo/collections/multimodal/data/energon/task_encoder.py
Signed-off-by: yaoyu-33 <[email protected]>
Signed-off-by: yaoyu-33 <[email protected]>
Contributor

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.llm.peft.api
nemo/collections/llm/peft/api.py:38:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.multimodal.data.energon.base
nemo/collections/multimodal/data/energon/base.py:88:0: C0301: Line too long (120/119) (line-too-long)
************* Module nemo.collections.multimodal.data.energon.config
nemo/collections/multimodal/data/energon/config.py:25:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/multimodal/data/energon/config.py:32:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/multimodal/data/energon/config.py:80:0: C0115: Missing class docstring (missing-class-docstring)
************* Module nemo.collections.multimodal.data.energon.task_encoder
nemo/collections/multimodal/data/energon/task_encoder.py:15:0: W0611: Unused import dataclasses (unused-import)
************* Module nemo.collections.vlm.neva.data.config
nemo/collections/vlm/neva/data/config.py:22:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/data/config.py:32:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/data/config.py:40:0: C0115: Missing class docstring (missing-class-docstring)
************* Module nemo.collections.vlm.neva.data.lazy
nemo/collections/vlm/neva/data/lazy.py:67:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:72:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:115:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:120:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:134:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:161:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:236:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:246:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/data/lazy.py:419:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:478:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/data/lazy.py:560:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:579:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:582:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/lazy.py:585:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.vlm.neva.data.mock
nemo/collections/vlm/neva/data/mock.py:29:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/data/mock.py:76:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/mock.py:109:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/mock.py:114:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/data/mock.py:119:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.vlm.neva.model.base
nemo/collections/vlm/neva/model/base.py:481:0: C0301: Line too long (139/119) (line-too-long)
nemo/collections/vlm/neva/model/base.py:484:0: C0301: Line too long (128/119) (line-too-long)
nemo/collections/vlm/neva/model/base.py:509:0: C0301: Line too long (123/119) (line-too-long)
nemo/collections/vlm/neva/model/base.py:93:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:139:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:155:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:176:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:243:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:255:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/model/base.py:282:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:303:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/model/base.py:334:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:371:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:384:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/model/base.py:896:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/vlm/neva/model/base.py:913:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:917:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:949:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:952:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:955:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:959:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:965:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:972:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/vlm/neva/model/base.py:48:0: W0611: Unused get_batch_on_this_context_parallel_rank imported from nemo.collections.llm.gpt.model.base (unused-import)
nemo/collections/vlm/neva/model/base.py:48:0: W0611: Unused get_packed_seq_params imported from nemo.collections.llm.gpt.model.base (unused-import)
************* Module nemo.lightning.megatron_parallel
nemo/lightning/megatron_parallel.py:238:0: C0301: Line too long (127/119) (line-too-long)
nemo/lightning/megatron_parallel.py:239:0: C0301: Line too long (140/119) (line-too-long)
nemo/lightning/megatron_parallel.py:240:0: C0301: Line too long (130/119) (line-too-long)
nemo/lightning/megatron_parallel.py:546:0: C0301: Line too long (129/119) (line-too-long)
nemo/lightning/megatron_parallel.py:553:0: C0301: Line too long (135/119) (line-too-long)
nemo/lightning/megatron_parallel.py:829:0: C0301: Line too long (137/119) (line-too-long)
nemo/lightning/megatron_parallel.py:1059:0: C0301: Line too long (136/119) (line-too-long)
nemo/lightning/megatron_parallel.py:1632:0: C0301: Line too long (128/119) (line-too-long)
nemo/lightning/megatron_parallel.py:1671:0: C0301: Line too long (146/119) (line-too-long)
nemo/lightning/megatron_parallel.py:64:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:65:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:67:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:102:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:106:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:306:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:330:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:356:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:382:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:518:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:561:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:565:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:619:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:654:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:660:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:666:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:673:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:680:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:714:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:722:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:738:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:765:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:777:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:799:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1325:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1500:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:1506:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1512:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1516:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1521:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:1526:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:1554:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1600:8: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1622:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:1695:0: C0115: Missing class docstring (missing-class-docstring)
nemo/lightning/megatron_parallel.py:1741:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/megatron_parallel.py:1755:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.46/10

Mitigation guide:

  • Add sensible and useful docstrings to functions and methods
  • For trivial methods like getter/setters, consider adding # pylint: disable=C0116 inside the function itself
  • To disable multiple functions/methods at once, put a # pylint: disable=C0116 before the first and a # pylint: enable=C0116 after the last.
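
As a small illustration of the last two points, the sketch below wraps a group of trivial getters in a disable/enable pair and documents the one non-trivial method. The `Point` class is a hypothetical example and does not come from the NeMo code base.

```python
class Point:
    """A 2D point with trivial getters."""

    def __init__(self, x: float, y: float):
        self._x = x
        self._y = y

    # pylint: disable=C0116
    def get_x(self) -> float:
        return self._x

    def get_y(self) -> float:
        return self._y

    # pylint: enable=C0116

    def norm_sq(self) -> float:
        """Return the squared Euclidean norm of the point."""
        return self._x**2 + self._y**2


print(Point(3.0, 4.0).norm_sq())
```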

By applying these rules, we reduce the occurrence of this message in the future.

Thank you for improving NeMo's documentation!

3 participants