feat: Add Chroma1-HD model support #319

Open

jaddai0 wants to merge 8 commits into filipstrand:main from jaddai0:feat/add-chroma-support

Conversation

jaddai0 commented Jan 12, 2026

Add support for the Chroma1-HD model (lodestones/Chroma1-HD), a modified FLUX.1-schnell with a DistilledGuidanceLayer for efficient inference.

Key features:

  • DistilledGuidanceLayer: Pre-computes 344 modulations upfront
  • T5-only text encoding (no CLIP required)
  • Support for negative prompts
  • 4-bit and 8-bit quantization
  • Save/load quantized models with mflux-save

New CLI command: mflux-generate-chroma

Usage:
  mflux-generate-chroma --prompt "a cat" --steps 40 --output cat.png
  mflux-generate-chroma -q 4 --prompt "a dog" --output dog.png

Note: LoRA support not yet implemented for Chroma.
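For intuition, the idea behind the DistilledGuidanceLayer — emitting all modulation vectors in a single pass over the conditioning embedding, instead of computing them inside every transformer block — can be sketched as follows. The layer sizes, activation, and function names below are invented for illustration and do not match mflux's actual implementation:

```python
import numpy as np

# Toy sketch: one small network maps the conditioning embedding to ALL
# per-block modulation vectors at once, so blocks just index into the table.
NUM_MODULATIONS = 344   # number of modulation vectors Chroma pre-computes
HIDDEN = 64             # toy hidden size for illustration

rng = np.random.default_rng(0)
w_in = rng.standard_normal((32, HIDDEN)) * 0.02
w_out = rng.standard_normal((HIDDEN, NUM_MODULATIONS * HIDDEN)) * 0.02

def distilled_guidance(cond_embedding: np.ndarray) -> np.ndarray:
    """Map a conditioning embedding to the full table of modulation vectors."""
    h = np.tanh(cond_embedding @ w_in)
    mods = h @ w_out
    return mods.reshape(NUM_MODULATIONS, HIDDEN)

# One forward pass per denoising step; transformer blocks then look up
# their modulations by index instead of recomputing them.
cond = rng.standard_normal(32)
mods = distilled_guidance(cond)
print(mods.shape)  # (344, 64)
```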

jaddai0 and others added 8 commits January 12, 2026 18:20
Add support for the Chroma1-HD model (lodestones/Chroma1-HD), a modified
FLUX.1-schnell with DistilledGuidanceLayer for efficient inference.

Key features:
- DistilledGuidanceLayer: Pre-computes 344 modulations upfront
- T5-only text encoding (no CLIP required)
- Support for negative prompts
- 4-bit and 8-bit quantization
- Save/load quantized models with mflux-save

New CLI command: mflux-generate-chroma

Usage:
  mflux-generate-chroma --prompt "a cat" --steps 40 --output cat.png
  mflux-generate-chroma -q 4 --prompt "a dog" --output dog.png

Note: LoRA support not yet implemented for Chroma.

- Create ChromaLoRAMapping with targets for joint and single transformer blocks
- Support BFL/Kohya format LoRA weights with QKV split transforms
- Exclude norm layers (norm1.linear, norm1_context.linear, norm.linear)
  that don't exist in Chroma's DistilledGuidanceLayer architecture
- Add lora_paths and lora_scales parameters to Chroma class
- Enable --lora-paths and --lora-scales CLI arguments
- Add 16 unit tests for mapping coverage and exclusions

Tested with semiosphere/the_artist_for_chromaHD (684/684 keys matched)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add support for Meituan's LongCat-Image model (meituan-longcat/LongCat-Image):

- Implement LongCat transformer architecture with 24 joint blocks and
  12 single blocks using hidden_size=3072 and num_attention_heads=24
- Add Qwen-based text encoder integration via qwen2_vl tokenizer
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-longcat
- Add comprehensive tests for transformer, weight loading, LoRA,
  and initializer validation

Model specifications:
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE
- Supports guidance with distilled guidance embedding
- 512 max sequence length

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add support for Black Forest Labs' FLUX.2-schnell model:

- Implement FLUX.2 transformer with 38 double blocks and 58 single blocks
- Add 32-channel VAE with modified scaling factors
- Integrate Mistral3-based text encoder with sliding window attention
  and 32K max position embeddings
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-flux2
- Add comprehensive tests for VAE, encoder, weight mapping,
  quantization, and LoRA

Model specifications:
- Uses rectified flow matching scheduler (no sigma shift)
- 32-channel latent space (vs 16 in FLUX.1)
- Mistral3 encoder (vs CLIP + T5 in FLUX.1)
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add support for Tencent's Hunyuan-DiT v1.2 model:

- Implement Hunyuan-DiT transformer architecture with 28 DiT blocks
  using hidden_size=1408 and num_attention_heads=16
- Add dual text encoder system (Chinese BERT + T5-XXL) via
  HunyuanPromptEncoder
- Implement DDPM scheduler for diffusion process
- Add num_dit_blocks() method to LoadedWeights for counting
  Hunyuan-style transformer blocks
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-hunyuan
- Add comprehensive tests for DiT blocks, DDPM scheduler,
  text encoding, weight loading, and LoRA

Model specifications:
- Uses DDPM scheduler (1000 training steps)
- Supports CFG with Chinese/English prompts
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

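The DDPM scheduler mentioned above (1000 training steps) can be illustrated with a toy forward-process sketch. The linear beta range below is the common DDPM default, not necessarily Hunyuan-DiT's exact configuration:

```python
import numpy as np

# Hedged sketch of a linear-beta DDPM schedule with 1000 training steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative signal retention per timestep

def add_noise(x0: np.ndarray, noise: np.ndarray, t: int) -> np.ndarray:
    """Forward process: q(x_t | x_0) = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = np.ones(4)
eps = np.zeros(4)
print(add_noise(x0, eps, 0))      # close to x0 at t=0
print(add_noise(x0, eps, T - 1))  # signal heavily attenuated at t=999
```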
Add support for NewBie-AI's NewBie-image model (NewBie-AI/NewBie-image-Exp0.1):

- Implement NextDiT transformer architecture with 36 blocks using
  hidden_size=2560 and Grouped Query Attention (24 query heads, 8 KV heads)
- Add dual text encoder system:
  - Gemma3-4B-it for semantic understanding (2560 dim)
  - Jina CLIP v2 for image-text alignment (1024 dim)
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-newbie
- Add comprehensive tests for configuration, generation, and LoRA

Model specifications:
- 3.5B parameter model optimized for anime/illustration generation
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE (FLUX.1-dev compatible)
- 512 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

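The Grouped Query Attention layout described above (24 query heads sharing 8 KV heads, so each KV head serves 3 query heads) can be sketched in NumPy. Shapes and values here are toy, and this is not NewBie's actual code:

```python
import numpy as np

# Toy GQA sketch: fewer KV heads than query heads; each KV head is
# broadcast to a group of query heads before standard attention.
N_Q_HEADS, N_KV_HEADS, HEAD_DIM, SEQ = 24, 8, 16, 5
GROUP = N_Q_HEADS // N_KV_HEADS  # 3 query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((N_Q_HEADS, SEQ, HEAD_DIM))
k = rng.standard_normal((N_KV_HEADS, SEQ, HEAD_DIM))
v = rng.standard_normal((N_KV_HEADS, SEQ, HEAD_DIM))

# Repeat each KV head GROUP times so it lines up with the query heads.
k = np.repeat(k, GROUP, axis=0)
v = np.repeat(v, GROUP, axis=0)

# Scaled dot-product attention with a manual softmax over key positions.
scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v
print(out.shape)  # (24, 5, 16)
```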
- Fix save.py: Import Hunyuan (main model class) instead of HunyuanDiT
  (transformer class) which was causing TypeError
- Fix model_config.py: Use FLUX.2-dev (which exists) instead of
  FLUX.2-schnell (which doesn't exist on HuggingFace)
- Update FLUX.2 aliases and enable guidance support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The NewBie-image HuggingFace repo only contains text_encoder (Gemma3),
not text_encoder_2 (Jina CLIP). The Jina CLIP projection layers exist
in the transformer weights, but the encoder itself is loaded separately
from jinaai/jina-clip-v2 if needed.

Changes:
- Remove jina_clip_encoder from weight definition components
- Remove jina_clip from tokenizer definitions
- Update download patterns to exclude text_encoder_2
- Make jina_clip_encoder optional in initializer (set to None)
- Skip jina_clip_encoder in weight application if None

This fixes FileNotFoundError when loading NewBie-AI/NewBie-image-Exp0.1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

filipstrand (Owner) commented Jan 18, 2026

@jaddai0 Sorry for being so late to comment on this. This is obviously impressive work porting all these models (even including AI help).

I'm also considering the future maintenance of these a bit as the project grows... I've tried to be a bit opinionated with regard to which models to support in mflux, and I tend to prefer those with the most community effort behind them (LoRA support) or the ones that win on some axis (speed, quality, etc.). Out of these, which ones do you think are the most fitting for the project to start with?

jaddai0 (Author) commented Jan 18, 2026

Hey @filipstrand! I hadn't really considered upkeep when I did all of this. Honestly, I wouldn't recommend any of them in that case. Chroma is a heavily modified FLUX.1-schnell and would be the easiest to maintain, but they're all niche. Let me know if there's anything I can do to help the project, though. I use it regularly for image generation on my Mac, and I'm happy to give back however I can.

filipstrand (Owner) commented

Thank you very much for the initiative, and good to know your opinion on the models above! What do you think about porting Qwen Image 2512? I'm already halfway through edit-2511, but have not started on 2512 yet.

jaddai0 (Author) commented Jan 20, 2026

@filipstrand I submitted the pull request for Qwen Image 2512. Let me know if you need anything changed.

filipstrand (Owner) commented

> @filipstrand I submitted the pull request for Qwen Image 2512. Let me know if you need anything changed.

Amazing, thank you, I'll have a look at it soon, just need to find some space on my disk for this model :)
