Cannot learn temporal information #431

@slippersman

Description

Hello, I read your ICLR '24 paper AnimateDiff; both its approach and its results are impressive. I am currently trying to replace its base model with Stable Diffusion 3.5's transformer for training, but I have run into some issues and would greatly appreciate your advice.

  1. Due to hardware constraints, I modified the model with Megatron/DeepSpeed to reduce GPU memory usage. Training on 100 clips (the test set is random samples from the training set), the loss converges initially, but as the epochs increase the outputs degrade into solid-color images (see epoch5-60), and no temporal dynamics are learned.
  2. In the image fine-tuning stage of official AnimateDiff, the code uses diffusers' UNet directly, with no visible domain-adapter logic (as described in your paper). Is this component implemented elsewhere?
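One thing worth checking for issue 1: AnimateDiff's motion modules zero-initialize the output projection of each temporal layer, so that at the start of training the inserted block is an identity and the pretrained image model's behavior is preserved. If a newly inserted temporal layer in the DiT is not zero-initialized, its random residual can quickly push outputs toward degenerate (solid-color) images. Below is a minimal, self-contained numpy sketch of that idea; the class and names are illustrative, not the repo's actual API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TemporalAttention:
    """Sketch of a temporal self-attention block in the spirit of
    AnimateDiff's motion module: attention runs over the frame axis,
    and the output projection starts at zero so the residual branch
    contributes nothing at initialization (hypothetical names/shapes)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wo = np.zeros((dim, dim))  # zero-init: block is an identity at first

    def __call__(self, x):
        # x: (frames, dim) -- attention over the temporal (frame) axis
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        return x + (attn @ v) @ self.wo  # residual; no-op while wo == 0

x = np.random.default_rng(1).standard_normal((8, 16))  # 8 frames, 16-dim tokens
block = TemporalAttention(16)
assert np.allclose(block(x), x)  # identity before training begins
```

In PyTorch this corresponds to calling `nn.init.zeros_` on the final projection of each inserted temporal layer before starting motion training.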
