Training documentation update #298

Merged: 11 commits, Sep 30, 2024

laserkelvin
Collaborator

This PR adds additional documentation pertaining to training, partially addressing #280:

  • Task API details, up to date with the tasks available today
  • Updated best practices, detailing target normalization and loss scaling

@laserkelvin added the documentation label Sep 30, 2024
Collaborator

@melo-gonzo left a comment


Just a few comments on this one. Looks good! Feel free to merge when updated.

}
)

The example above will normalize ``energy`` labelsm and can be substituted with
Collaborator


Typo

Collaborator Author


Fixed in 9231641
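
For readers following this thread without the full diff: the sentence under review concerns normalizing ``energy`` labels. The project's own normalization API is only partially visible in the snippet above, so the following is a framework-free sketch of what target normalization generally means, not the documented interface. The function names (`fit_normalizer`, `normalize`, `denormalize`) and the energy values are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def fit_normalizer(values):
    """Return (mean, std) of the targets; std falls back to 1.0 if it is zero."""
    mean = fmean(values)
    std = pstdev(values)
    return mean, (std if std > 0 else 1.0)


def normalize(values, mean, std):
    """Shift and scale targets to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Map normalized values back to the original units."""
    return [v * std + mean for v in values]


# Made-up energy labels, for illustration only.
energies = [-3.2, -1.1, -4.8, -2.5]
mean, std = fit_normalizer(energies)
scaled = normalize(energies, mean, std)
restored = denormalize(scaled, mean, std)
```

The statistics are usually computed on the training split only and reused for validation and inference, so that predictions can be mapped back to physical units with `denormalize`.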

@@ -223,6 +298,20 @@ inspired by observations made in LLM training research, where the breakdown of
assumptions in the convergent properties of ``Adam``-like optimizers causes large
spikes in the training loss. This callback can help identify these occurrences.

The ``devset``/``fast_dev_run`` approach detailed above is also useful for testing
engineering/infrastructure (e.g. accelerator offload and logging), but not necessarily
Collaborator


fast_dev_run disables logging, I believe.

Collaborator Author


Good catch - fixed in 17d7582
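
The diff context in this thread also mentions a callback for identifying large spikes in the training loss. The callback itself is not shown here, so the following is only a sketch of one way such detection could work, assuming a rolling-window rule: a value is flagged when it exceeds the recent mean by more than a chosen number of standard deviations. `LossSpikeDetector`, `window`, and `threshold` are hypothetical names, and the rule is an assumption rather than the project's implementation.

```python
from collections import deque
from statistics import fmean, pstdev


class LossSpikeDetector:
    """Flag loss values that sit far above a rolling window of recent losses."""

    def __init__(self, window=20, threshold=3.0):
        self.history = deque(maxlen=window)  # most recent loss values
        self.threshold = threshold           # number of standard deviations

    def update(self, loss):
        """Record a loss value; return True if it is a spike relative to the window."""
        is_spike = False
        if len(self.history) >= 2:
            mean = fmean(self.history)
            std = pstdev(self.history)
            # Only upward deviations count; a sudden drop in loss is not a spike.
            is_spike = std > 0 and (loss - mean) > self.threshold * std
        self.history.append(loss)
        return is_spike


# Made-up loss curve with one obvious spike at the sixth step.
detector = LossSpikeDetector(window=5, threshold=3.0)
losses = [1.00, 0.95, 0.92, 0.90, 0.88, 5.00, 0.86]
flags = [detector.update(x) for x in losses]
```

In a training loop, `update` would be called once per step with the scalar loss, and a `True` return would trigger a log message or a checkpoint for later inspection.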

@laserkelvin merged commit 5bdd353 into IntelLabs:main on Sep 30, 2024
1 check passed
@laserkelvin deleted the training-documentation-update branch on September 30, 2024 23:45