Hi @retsuh-bqw ,
I’m currently pretraining the Lam at stage1 and have run into a persistent memory leak: RAM usage increases steadily during training and eventually causes an OOM error.
I’d like to ask two things:
What shuffle_buffer_size do you recommend for stage1 pretraining on 8 A100 machines?
Did you encounter similar memory issues during your experiments?
