Hello, I read your ICLR '24 paper AnimateDiff; both the approach and the results are impressive. I am currently trying to replace its base model with Stable Diffusion 3.5's transformer for training, but I have run into some issues and would greatly appreciate your advice.
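For reference, the base-model swap I am attempting looks roughly like this (a minimal sketch of my own setup, assuming diffusers >= 0.31; the checkpoint id is the public SD3.5 medium repo, not anything from the AnimateDiff code):

```python
# Sketch of the base-model swap (my code, not the official repo).
from diffusers import SD3Transformer2DModel

# Load the SD3.5 MMDiT backbone in place of the original SD1.5 UNet;
# assumes diffusers >= 0.31 and access to the SD3.5 weights.
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    subfolder="transformer",
)
```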
- Due to hardware constraints, I wrapped the model with Megatron/DeepSpeed to reduce GPU memory usage. Training on 100 video clips (the test set is random samples drawn from the training set), the loss converges at first, but the outputs degrade into solid-color images as the epochs increase (see epochs 5-60), and no temporal dynamics are learned. To rule out accidental updates to the frozen backbone, I am running the sanity check sketched after this list.
- In the image finetuning stage of the official AnimateDiff code, diffusers' UNet is used directly, and I cannot find the domain adapter logic described in the paper. Is this component implemented elsewhere? The second sketch below shows what I expected, based on the paper's description of the adapter as LoRA layers.
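Here is the sanity check mentioned in the first point: a minimal sketch (my own debugging code, not from the repository) that freezes everything except the inserted temporal modules and confirms they are the only trainable parameters. The "temporal" name filter is an assumption about how the inserted modules are named in my fork:

```python
import torch

def freeze_all_but_temporal(model: torch.nn.Module, keyword: str = "temporal"):
    """Freeze every parameter except those whose name contains `keyword`."""
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable

# Usage, where `transformer` is the SD3.5 backbone with inserted
# temporal layers (the naming is an assumption about my fork):
# names = freeze_all_but_temporal(transformer)
# assert names, "no temporal parameters found -- check module naming"
# optimizer = torch.optim.AdamW(
#     (p for p in transformer.parameters() if p.requires_grad), lr=1e-4,
# )
```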
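And for the second point, this is what I expected the domain adapter to look like, given the paper describes it as LoRA layers in the image backbone. It is only a sketch against diffusers' peft integration; the rank and target modules are guesses on my part, not values from the official code:

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load the SD1.5 image UNet that AnimateDiff builds on.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)  # keep the image backbone frozen

# Inject LoRA into the attention projections of the image layers;
# rank and target modules here are illustrative guesses.
unet.add_adapter(LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
))

# After injection, only the LoRA weights should be trainable.
adapter_params = [p for p in unet.parameters() if p.requires_grad]
```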