Deepspeech pytorch timing gap and torch compile issues #488

Closed
priyakasimbeg opened this issue Aug 15, 2023 · 4 comments · Fixed by #597
Labels
⏰ Timing gap Significant difference (>= 10%) between pytorch and jax workloads

Comments

@priyakasimbeg
Contributor

priyakasimbeg commented Aug 15, 2023

Backward-pass compilation for DeepSpeech in PyTorch is unsupported/broken under torch.compile.
The DeepSpeech workload is currently 13% slower in PyTorch than in JAX.

Description

Currently DeepSpeech works with the torch.compile backend option 'eager' but breaks with 'aot_eager' (see the sketch after this list).
The goals of this bug are to:

  1. work with PyTorch contributors to determine whether we can enable full torch.compile on this workload, and
  2. reduce the timing gap with JAX.
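
For context, a minimal sketch of the two backend options being compared. This uses a placeholder LSTM module rather than the actual DeepSpeech model (which lives in algorithmic_efficiency/workloads/librispeech_deepspeech/librispeech_pytorch/models.py); 'eager' and 'aot_eager' are standard torch.compile debugging backends:

```python
import torch

# Placeholder stand-in for the workload's DeepSpeech model; the real module is
# defined in librispeech_pytorch/models.py.
model = torch.nn.LSTM(input_size=512, hidden_size=512, batch_first=True)

# Works on this workload: Dynamo captures the graph but executes it eagerly.
compiled_eager = torch.compile(model, backend='eager')

# Breaks on this workload: AOTAutograd additionally traces the forward and
# backward graphs, which is where compilation currently fails.
compiled_aot_eager = torch.compile(model, backend='aot_eager')
```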

Steps to reproduce

torchrun --redirects 1:0,2:0,3:0,4:0,5:0,6:0,7:0 --standalone --nnodes=1 --nproc_per_node=8 \
  submission_runner.py \
  --framework=pytorch \
  --workload=librispeech_deepspeech \
  --submission_path=baselines/adamw/pytorch/submission.py \
  --tuning_search_space=baselines/adamw/tuning_search_space.json \
  --data_dir=/data/librispeech \
  --num_tuning_trials=1 \
  --experiment_dir=/experiment_runs \
  --experiment_name=timing_pytorch_2_preliminary_after_pytorch_fixes/adamw \
  --overwrite=True \
  --save_checkpoints=False \
  --max_global_steps=8000 \
  --librispeech_tokenizer_vocab_path=/data/librispeech/spm_model.vocab \
  --torch_compile=true
@priyakasimbeg priyakasimbeg changed the title Deepspeech torch compile Deepspeech pytorch timing gap and torch compile Aug 15, 2023
@priyakasimbeg priyakasimbeg changed the title Deepspeech pytorch timing gap and torch compile Deepspeech pytorch timing gap and torch compile issues Aug 15, 2023
@priyakasimbeg priyakasimbeg added the 🚀 Launch Blocker Issues that are blocking launch of benchmark label Aug 17, 2023
@priyakasimbeg
Contributor Author

priyakasimbeg commented Aug 22, 2023

It looks like torch.compile breaks in the Dynamo tracing step on 2.1.0.dev20230820+cu118.

Traceback

torch._dynamo.exc.TorchRuntimeError: Failed running call_function <function pack_padded_sequence at 0x7feca75669d0>(*(FakeTensor(..., device='cuda:0', size=(32, 500, 512), grad_fn=<CloneBackward0>), FakeTensor(..., size=(32,))), **{'batch_first': True, 'enforce_sorted': False}):
'lengths' argument should be a 1D CPU int64 tensor, but got 1D meta Long tensor

from user code:
   File "/algorithmic-efficiency/algorithmic_efficiency/workloads/librispeech_deepspeech/librispeech_pytorch/models.py", line 285, in <resume in forward>
    packed_inputs = torch.nn.utils.rnn.pack_padded_sequence(

Full error logs are in the regression test.
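
For reference, a standalone sketch (not code from the repo) of the constraint the tracer hits: pack_padded_sequence requires lengths to be a 1-D int64 tensor on the CPU, but under Dynamo's FakeTensor tracing the lengths argument shows up as a meta tensor and fails that check:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Shapes mirror the traceback above: (batch=32, time=500, features=512).
inputs = torch.randn(32, 500, 512)
lengths = torch.randint(1, 501, (32,), dtype=torch.int64)  # 1-D int64, on CPU

# Succeeds: lengths satisfies the "1D CPU int64 tensor" requirement.
packed = pack_padded_sequence(
    inputs, lengths, batch_first=True, enforce_sorted=False)

# Moving lengths off the CPU (e.g. lengths.cuda()) raises the same class of
# error as the traceback: "'lengths' argument should be a 1D CPU int64 tensor".
```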

Filed a separate bug to track the specific torch compile issue: #498

@priyakasimbeg priyakasimbeg added the ⏰ Timing gap Significant difference (>= 10%) between pytorch and jax workloads label Aug 22, 2023
@pomonam
Contributor

pomonam commented Aug 22, 2023

Related: #483

@priyakasimbeg priyakasimbeg removed the 🚀 Launch Blocker Issues that are blocking launch of benchmark label Aug 31, 2023
@priyakasimbeg
Contributor Author

Current status after enabling the eager backend for DeepSpeech: PyTorch is 12% slower than JAX on this workload.

@priyakasimbeg
Contributor Author

Also resolved in #597
