Adding configs related to DCLM #663
base: fineweb_data
Conversation
LGTM. Can you run pre-commit checks to fix the pre-commit issue? I think it is just some formatting issue.
src/levanter/models/llama.py
Outdated
@@ -64,6 +65,7 @@ class LlamaConfig(HFCompatConfig):
    activation_function: str = "silu"
    initializer_range: float = 0.02
    layer_norm_epsilon: float = 1e-5
    z_loss_weight: float = 0.0
i would rather this not be a property of the model's config but an option on TrainLmConfig, and define a loss function in train_lm and pass it into trainer
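A rough sketch of what that could look like, purely to illustrate the suggestion; the `z_loss_weight` field on `TrainLmConfig`, `make_loss_function`, and the `model.logits_for` / `example.targets` accessors below are assumptions, not Levanter's actual API:

```python
# Illustrative sketch only; names are assumptions about the suggested design.
from dataclasses import dataclass

import jax.numpy as jnp
from jax.scipy.special import logsumexp


@dataclass
class TrainLmConfig:
    # ... existing trainer/model/data options ...
    z_loss_weight: float = 0.0  # assumed option name; 0.0 keeps plain cross-entropy


def make_loss_function(config: TrainLmConfig):
    """Defined in train_lm and handed to the trainer, so the model config stays loss-agnostic."""

    def loss_fn(model, example):
        logits = model.logits_for(example)  # hypothetical accessor: [batch, pos, vocab]
        logz = logsumexp(logits, axis=-1)   # per-position log partition function
        target_logit = jnp.take_along_axis(
            logits, example.targets[..., None], axis=-1
        ).squeeze(-1)
        loss = jnp.mean(logz - target_logit)  # next-token cross-entropy
        if config.z_loss_weight > 0:
            # auxiliary z-loss (PaLM-style): penalize large log-partition values
            loss = loss + config.z_loss_weight * jnp.mean(logz**2)
        return loss

    return loss_fn
```

The point of the factory is that `z_loss_weight` never touches `LlamaConfig`; the trainer just receives a closure.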
src/levanter/models/lm_model.py
Outdated
loss = cross_entropy_loss(
    logits, self.Vocab, target_y, reduction, reduction_axis=reduction_axis, where=example.loss_mask
)
if hasattr(self.config, "z_loss_weight") and self.config.z_loss_weight > 0:
really don't like using this here. much cleaner to just pull out the loss function
src/levanter/main/train_lm.py
Outdated
from levanter.utils.jax_utils import parameter_count


logger = logging.getLogger(__name__)


class ModuleComputeZLoss(ComputeLossFunction[M, X]):
i still don't like this but I think I can't really articulate what I want. i'm gonna push a change to my fork
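For readers following the thread, below is a hedged reconstruction of the general shape being debated: a loss object the trainer holds, rather than a closure built in `train_lm`. It assumes `ComputeLossFunction` boils down to "callable(model, example) -> scalar loss" and reuses the placeholder accessors from the earlier sketch; it is not the PR's actual implementation.

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp


class ModuleComputeZLoss:  # the PR parameterizes this as ComputeLossFunction[M, X]
    """Wraps a base loss callable and adds the auxiliary z-loss term."""

    def __init__(self, base_loss, z_loss_weight: float):
        self.base_loss = base_loss          # e.g. the model's plain cross-entropy loss
        self.z_loss_weight = z_loss_weight

    def __call__(self, model, example, **kwargs):
        loss = self.base_loss(model, example, **kwargs)
        logits = model.logits_for(example)  # placeholder accessor, as in the earlier sketch
        logz = logsumexp(logits, axis=-1)
        return loss + self.z_loss_weight * jnp.mean(logz**2)
```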
* refactor queued-resources
* fix multislice
* add auto tear down
* reuse docker image
* tiny fix
* switch to concurrent executor for parallel subprocesses & small fix & logs
* Add llama 1b with fineweb txt
* replace with 50 fineweb urls
* wip
* revert many of the changes, which seems to fix the crashing
* revert many of the changes, which seems to fix the crashing
* remove now-unused option
* cleanup
* cleanup
* sigh
* Adding changes for dclm
---------
Co-authored-by: Ivan Zhou <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
Bumps [ray[default]](https://github.com/ray-project/ray) from 2.32.0 to 2.34.0.
- [Release notes](https://github.com/ray-project/ray/releases)
- [Commits](ray-project/ray@ray-2.32.0...ray-2.34.0)
---
updated-dependencies:
- dependency-name: ray[default]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* wandb seems to be broken in latest release
* oops
* what?
* add mounting dir
* minor fix
* support abs and rel path
* add docs
* refactor to extra context
* minor fix docs
* minor fix
* modify docs
1. num_tpus=1 is actually a bad idea because Ray will mask out the other TPUs
2. force non-docker workloads to run in a separate process for stability
…ards again, (re)add configuration metadata to cache (#752) Co-authored-by: Ahmed Ahmed <[email protected]>
Pulls in the New Mixture Features Into Audio Space! Tested that this fixes the previous epoching errors in the whisper_tiny config.
Co-authored-by: David Hall <[email protected]>
…g batches instead of a ray actor/task (#757) About a 5x speedup. Memory usage isn't super well controlled in mixtures and that needs some work
… head node, add code to change max size of actor pool
#762) This is marginally slower, but pile now builds fine on a v4-32, which is an improvement.
This PR creates a `ParquetDataSource` class to support loading `.parquet` files. Closes #763
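A minimal sketch of what such a data source might do, assuming it reads shards with pyarrow and fsspec; the class and method names below follow the general shape of a sharded data source and are illustrative, not the merged implementation:

```python
import fsspec
import pyarrow.parquet as pq


class ParquetDataSource:
    """Iterates rows from a set of .parquet shards (local paths or fsspec URLs)."""

    def __init__(self, urls, columns=None):
        self.urls = urls
        self.columns = columns  # e.g. ["text"] to read only the text column

    def iter_shard(self, url, batch_size=1024):
        with fsspec.open(url, "rb") as f:
            pf = pq.ParquetFile(f)
            for batch in pf.iter_batches(batch_size=batch_size, columns=self.columns):
                # each batch is a pyarrow RecordBatch; yield plain dicts, one per row
                yield from batch.to_pylist()
```

Reading record batches keeps memory bounded per shard instead of materializing whole files.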
DCLM 7B related configs