Hello, I read your ICLR '24 paper AnimateDiff; both the approach and the results are impressive. I am currently trying to replace its base model with Stable Diffusion 3.5's transformer for training, but I have run into some issues and would greatly appreciate your advice.
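For reference, the base-model swap I am attempting looks roughly like this (a minimal sketch of my own setup, assuming diffusers >= 0.31; the checkpoint id is the public SD3.5 medium repo, not anything from the AnimateDiff code):

```python
# Sketch of the base-model swap (my code, not the official repo).
from diffusers import SD3Transformer2DModel

# Load the SD3.5 MMDiT backbone in place of the original SD1.5 UNet;
# assumes diffusers >= 0.31 and access to the SD3.5 weights.
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    subfolder="transformer",
)
```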
- Due to hardware constraints, I wrapped the model with Megatron/DeepSpeed to reduce GPU memory usage. Training on 100 video clips (the test set is random samples drawn from the training set), the loss converges at first, but the outputs degrade into solid-color images as the epochs increase (see epochs 5-60), and no temporal dynamics are learned. To rule out accidental updates to the frozen backbone, I am running the sanity check sketched after this list.
- In the image finetuning stage of the official AnimateDiff code, diffusers' UNet is used directly, and I cannot find the domain adapter logic described in the paper. Is this component implemented elsewhere? The second sketch below shows what I expected, based on the paper's description of the adapter as LoRA layers.
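Here is the sanity check mentioned in the first point: a minimal sketch (my own debugging code, not from the repository) that freezes everything except the inserted temporal modules and confirms they are the only trainable parameters. The "temporal" name filter is an assumption about how the inserted modules are named in my fork:

```python
import torch

def freeze_all_but_temporal(model: torch.nn.Module, keyword: str = "temporal"):
    """Freeze every parameter except those whose name contains `keyword`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable

# Usage, where `transformer` is the SD3.5 backbone with inserted
# temporal layers (the naming is an assumption about my fork):
# names = freeze_all_but_temporal(transformer)
# assert names, "no temporal parameters found -- check module naming"
# optimizer = torch.optim.AdamW(
#     (p for p in transformer.parameters() if p.requires_grad), lr=1e-4,
# )
```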
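And for the second point, this is what I expected the domain adapter to look like, given the paper describes it as LoRA layers in the image backbone. It is only a sketch against diffusers' peft integration; the rank and target modules are guesses on my part, not values from the official code:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load the SD1.5 image UNet that AnimateDiff builds on.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # keep the image backbone frozen

# Inject LoRA into the attention projections of the image layers;
# rank and target modules here are illustrative guesses.
unet.add_adapter(LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
))

# After injection, only the LoRA weights should be trainable.
adapter_params = [p for p in unet.parameters() if p.requires_grad]
```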