Conversation
Add support for the Chroma1-HD model (lodestones/Chroma1-HD), a modified FLUX.1-schnell with a DistilledGuidanceLayer for efficient inference.

Key features:
- DistilledGuidanceLayer: pre-computes 344 modulations upfront
- T5-only text encoding (no CLIP required)
- Support for negative prompts
- 4-bit and 8-bit quantization
- Save/load quantized models with mflux-save

New CLI command: mflux-generate-chroma

Usage:
mflux-generate-chroma --prompt "a cat" --steps 40 --output cat.png
mflux-generate-chroma -q 4 --prompt "a dog" --output dog.png

Note: LoRA support not yet implemented for Chroma.
Add LoRA support for Chroma:
- Create ChromaLoRAMapping with targets for joint and single transformer blocks
- Support BFL/Kohya-format LoRA weights with QKV split transforms
- Exclude norm layers (norm1.linear, norm1_context.linear, norm.linear) that don't exist in Chroma's DistilledGuidanceLayer architecture
- Add lora_paths and lora_scales parameters to the Chroma class
- Enable --lora-paths and --lora-scales CLI arguments
- Add 16 unit tests for mapping coverage and exclusions

Tested with semiosphere/the_artist_for_chromaHD (684/684 keys matched)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
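The QKV split and norm-layer exclusion described above can be sketched roughly as follows. This is a minimal illustration assuming a row-wise fused QKV layout; the function names and key patterns are hypothetical, not mflux's actual API.

```python
# Hypothetical sketch of the two LoRA-mapping transforms; names and key
# patterns are illustrative, not taken from mflux.

# Norm projections with no counterpart in Chroma, whose per-block
# modulation comes from the DistilledGuidanceLayer instead.
EXCLUDED_SUFFIXES = ("norm1.linear", "norm1_context.linear", "norm.linear")


def is_mappable(key: str) -> bool:
    """Return False for LoRA keys targeting layers Chroma does not have."""
    return not key.endswith(EXCLUDED_SUFFIXES)


def split_qkv(rows: list, hidden_size: int) -> tuple:
    """Split a fused QKV weight whose rows stack Q, then K, then V."""
    assert len(rows) == 3 * hidden_size
    return (
        rows[:hidden_size],
        rows[hidden_size : 2 * hidden_size],
        rows[2 * hidden_size :],
    )
```

A fused BFL-style attention weight carries all three projections in one matrix, so a per-projection LoRA target requires slicing it back apart along the output dimension, as `split_qkv` does here.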
Add support for Meituan's LongCat-Image model (meituan-longcat/LongCat-Image):
- Implement the LongCat transformer architecture with 24 joint blocks and 12 single blocks using hidden_size=3072 and num_attention_heads=24
- Add Qwen-based text encoder integration via the qwen2_vl tokenizer
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-longcat
- Add comprehensive tests for the transformer, weight loading, LoRA, and initializer validation

Model specifications:
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE
- Supports guidance with distilled guidance embedding
- 512 max sequence length

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
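The "no sigma shift" note means the sampler walks a plain linear sigma schedule from 1 down to 0. A minimal sketch, inferred from the description above rather than taken from the mflux source:

```python
def flow_match_sigmas(num_steps: int) -> list:
    # Rectified-flow sigma schedule from 1.0 down to 0.0 with no
    # resolution-dependent shift applied (FLUX.1-style models remap
    # these sigmas; LongCat, per the notes above, does not).
    return [1.0 - i / num_steps for i in range(num_steps + 1)]
```

At each step the latent is nudged along the model's predicted velocity by the difference between consecutive sigmas, so the schedule fully determines the step sizes.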
Add support for Black Forest Labs' FLUX.2-schnell model:
- Implement the FLUX.2 transformer with 38 double blocks and 58 single blocks
- Add a 32-channel VAE with modified scaling factors
- Integrate a Mistral3-based text encoder with sliding-window attention and 32K max position embeddings
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-flux2
- Add comprehensive tests for the VAE, encoder, weight mapping, quantization, and LoRA

Model specifications:
- Uses rectified flow matching scheduler (no sigma shift)
- 32-channel latent space (vs 16 in FLUX.1)
- Mistral3 encoder (vs CLIP + T5 in FLUX.1)
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
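To illustrate what the 32-channel latent space means for the transformer's token layout, here is a rough shape calculation. It assumes FLUX.2 keeps FLUX.1's 8x VAE downsampling and 2x2 latent patching; treat the numbers as a sketch, not a spec.

```python
def flux2_latent_shape(height: int, width: int,
                       channels: int = 32, vae_factor: int = 8,
                       patch: int = 2) -> tuple:
    # FLUX.1 uses 16 latent channels; FLUX.2 doubles this to 32.
    # Spatial dims shrink by the VAE factor, then 2x2 latent patches
    # are packed into tokens for the transformer.
    lh, lw = height // vae_factor, width // vae_factor
    num_tokens = (lh // patch) * (lw // patch)
    token_dim = channels * patch * patch
    return num_tokens, token_dim
```

Under these assumptions, doubling the channel count doubles each token's feature dimension while leaving the token count unchanged, which is why the transformer's input projection has to change size.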
Add support for Tencent's Hunyuan-DiT v1.2 model:
- Implement the Hunyuan-DiT transformer architecture with 28 DiT blocks using hidden_size=1408 and num_attention_heads=16
- Add a dual text encoder system (Chinese BERT + T5-XXL) via HunyuanPromptEncoder
- Implement a DDPM scheduler for the diffusion process
- Add a num_dit_blocks() method to LoadedWeights for counting Hunyuan-style transformer blocks
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-hunyuan
- Add comprehensive tests for DiT blocks, the DDPM scheduler, text encoding, weight loading, and LoRA

Model specifications:
- Uses DDPM scheduler (1000 training steps)
- Supports CFG with Chinese/English prompts
- 256 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
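As a rough illustration of what the 1000-step DDPM scheduler involves, the following computes a scaled-linear beta schedule and the cumulative alpha products used for noising and denoising. The endpoint values are illustrative defaults, not necessarily the ones Hunyuan-DiT ships with.

```python
def ddpm_schedule(num_train_steps: int = 1000,
                  beta_start: float = 8.5e-4,
                  beta_end: float = 0.012):
    # Scaled-linear ("sqrt-linear") beta schedule: interpolate linearly
    # in sqrt-space, then square. Endpoints are illustrative.
    bs, be = beta_start ** 0.5, beta_end ** 0.5
    betas = [
        (bs + (be - bs) * i / (num_train_steps - 1)) ** 2
        for i in range(num_train_steps)
    ]
    # Cumulative product of alphas (alpha_t = 1 - beta_t); this is the
    # quantity DDPM uses to noise a clean sample directly to step t.
    alphas_cumprod, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        alphas_cumprod.append(acc)
    return betas, alphas_cumprod
```

The strictly decreasing cumulative product is what makes later timesteps noisier: a sample at step t keeps a sqrt(alphas_cumprod[t]) fraction of the signal.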
Add support for NewBie-AI's NewBie-image model (NewBie-AI/NewBie-image-Exp0.1):
- Implement the NextDiT transformer architecture with 36 blocks using hidden_size=2560 and Grouped Query Attention (24 query heads, 8 KV heads)
- Add a dual text encoder system:
  - Gemma3-4B-it for semantic understanding (2560 dim)
  - Jina CLIP v2 for image-text alignment (1024 dim)
- Create weight mapping for HuggingFace model conversion
- Add LoRA support for fine-tuning
- Include CLI tool: mflux-generate-newbie
- Add comprehensive tests for configuration, generation, and LoRA

Model specifications:
- 3.5B parameter model optimized for anime/illustration generation
- Uses Flow Matching scheduler (no sigma shift)
- 16-channel VAE (FLUX.1-dev compatible)
- 512 max sequence length
- Supports 4/8-bit quantization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
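Grouped Query Attention shares each KV head across a group of query heads; with 24 query heads and 8 KV heads, each KV head serves 3 query heads. A minimal sketch of the head expansion (a hypothetical helper, not mflux's code):

```python
def expand_kv_heads(kv_heads: list,
                    num_query_heads: int = 24,
                    num_kv_heads: int = 8) -> list:
    # In GQA, each KV head is broadcast to a group of
    # num_query_heads // num_kv_heads query heads, so attention can be
    # computed as if there were one KV head per query head.
    assert num_query_heads % num_kv_heads == 0
    group = num_query_heads // num_kv_heads
    return [h for h in kv_heads for _ in range(group)]
```

Compared with full multi-head attention, this cuts the KV projection (and KV cache) size by a factor of 3 here while keeping 24-way query attention.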
- Fix save.py: import Hunyuan (the main model class) instead of HunyuanDiT (the transformer class), which was causing a TypeError
- Fix model_config.py: use FLUX.2-dev (which exists) instead of FLUX.2-schnell (which doesn't exist on HuggingFace)
- Update FLUX.2 aliases and enable guidance support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The NewBie-image HuggingFace repo only contains text_encoder (Gemma3), not text_encoder_2 (Jina CLIP). The Jina CLIP projection layers exist in the transformer weights, but the encoder itself is loaded separately from jinaai/jina-clip-v2 if needed.

Changes:
- Remove jina_clip_encoder from the weight definition components
- Remove jina_clip from the tokenizer definitions
- Update download patterns to exclude text_encoder_2
- Make jina_clip_encoder optional in the initializer (set to None)
- Skip jina_clip_encoder in weight application if None

This fixes a FileNotFoundError when loading NewBie-AI/NewBie-image-Exp0.1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
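The last two bullets amount to treating the encoder as an optional component. A schematic of the skip logic, using plain dicts as stand-ins for the real modules (names mirror the commit message, not necessarily the real call sites):

```python
def apply_weights(components: dict, weights: dict) -> None:
    # Schematic: each component is a dict-like module. Any component set
    # to None (e.g. jina_clip_encoder when text_encoder_2 is absent from
    # the repo) is skipped instead of triggering a FileNotFoundError
    # during weight loading.
    for name, module in components.items():
        if module is None:
            continue
        module.update(weights.get(name, {}))
```

Making absence an explicit None (rather than a missing key) keeps the component list stable, so downstream code can test `if encoder is None` instead of catching file errors.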
@jaddai0 Sorry for being so late to comment on this. This is obviously impressive work porting all these models (even with AI help). I'm also considering the future maintenance of these a bit as the project grows... I've tried to be somewhat opinionated about which models to support in mflux, and tend to prefer those with the most community effort behind them (LoRA support) or the ones that win on some axis (speed, quality, etc.). Out of these, which one do you think is the best fit for the project to start with?
Hey @filipstrand! I hadn't really considered upkeep when I did all of this. Honestly, I wouldn't recommend any of them in that case. Chroma is a heavily modified FLUX.1-schnell and would be the easiest to maintain, but they're all niche. Let me know if there's anything I can do to help the project, though. I use it regularly for image generation on my Mac, and I'm happy to give back however I can.
Thank you very much for the initiative, and good to know your opinion on the models above! What do you think about porting Qwen Image 2512? I'm already halfway through the edit-2511, but have not started on 2512 yet.
@filipstrand I submitted the pull request for Qwen Image 2512. Let me know if you need anything changed.
Amazing, thank you, I'll have a look at it soon, just need to find some space on my disk for this model :) |