add afmoe model support #1395
Conversation
```python
self.layer_types = layer_types
if num_key_value_heads is None:
    self.num_key_value_heads = num_attention_heads
self.num_key_value_heads = num_key_value_heads
```
Bug: Unconditional assignment overwrites conditional default value
The num_key_value_heads assignment logic is broken. Line 201 correctly sets self.num_key_value_heads = num_attention_heads when num_key_value_heads is None, but line 202 unconditionally overwrites it with the original num_key_value_heads parameter (which is still None). This results in config.num_key_value_heads being None instead of defaulting to num_attention_heads, which will cause errors when building the attention layers.
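A minimal sketch of the corrected default handling, assuming a transformers-style config `__init__` (the signature and default head count below are illustrative, not the PR's actual values):

```python
from transformers import PretrainedConfig


class AfMoeConfig(PretrainedConfig):
    """Sketch of the relevant part of the config; other fields omitted."""

    model_type = "afmoe"

    def __init__(self, num_attention_heads=32, num_key_value_heads=None, **kwargs):
        super().__init__(**kwargs)
        self.num_attention_heads = num_attention_heads
        # Fall back to one KV head per attention head only when the caller
        # did not pass num_key_value_heads; do not overwrite it afterwards.
        if num_key_value_heads is None:
            num_key_value_heads = num_attention_heads
        self.num_key_value_heads = num_key_value_heads
```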
```python
    model_config=config,
)
self.rotary_emb = RotaryEmbedding(rotary_config)
self.gradient_checkpointing = False
```
Bug: Rotary embedding only initialized in else branch
self.rotary_emb and self.gradient_checkpointing are only initialized inside the else block, i.e. when rope_scaling is not a dict. When config.rope_scaling is a dictionary (specifying custom rope parameters), these attributes are never created, causing an AttributeError when forward() calls self.rotary_emb() on line 228. Comparing with the glm4_moe, llama, and qwen3_moe implementations shows that the RotaryEmbeddingConfig construction and the subsequent initialization should be outside the else block.
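A minimal sketch of the restructured `__init__` fragment, assuming the RotaryEmbeddingConfig/RotaryEmbedding helpers from the quoted diff; how the rope_scaling dict is threaded into the config is an assumption here — the point is that the embedding is built in both branches:

```python
# Sketch of the fix inside the model's __init__ (fragment, not a full class).
if isinstance(config.rope_scaling, dict):
    # Custom rope parameters from the config dict (exact handling is model-specific).
    rotary_config = RotaryEmbeddingConfig(model_config=config, **config.rope_scaling)
else:
    rotary_config = RotaryEmbeddingConfig(model_config=config)

# These must exist regardless of rope_scaling, since forward() calls self.rotary_emb().
self.rotary_emb = RotaryEmbedding(rotary_config)
self.gradient_checkpointing = False
```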
nice, thanks! did you do any small sft/ rl sanity checks?
I have tested all changes on Colab's T4 GPU.
@mikasenghaas can you tell me if there is anything else I have to do here?
The PR looks good; we will do some testing internally before merging it. Highly appreciate the work and we will try to merge it ASAP.
Jackmin801 left a comment
Thanks for the PR! The modeling code LGTM. If it passes the test against the HF one, I think it should be good to merge.
Force-pushed from 1ffa8b5 to f9810f2
Ah, you need the custom config from … We can also wait for the next transformers release and bump the transformers version.
Let me do this; we can change it in another patch when we get the next transformers release.
@Jackmin801 please take a look
@CodeMan62 Can you make sure that …
Let us wait for the next transformers release. Thanks for the review, @Jackmin801.
This PR adds support for the afmoe model to prime-rl.
GitHub Issue: #1343
Linear Issue: Resolves N/A
Note
Introduce afmoe CausalLM model (config, modeling, MoE routing) with HF↔Prime state dict converters, register it in auto-mapping, and add unit tests.
- Add an afmoe package with AfMoeConfig, AfMoeModel, AfMoeForCausalLM, and AfMoePreTrainedModel implementing MoE (token-choice routing, shared experts), rotary embeddings, and attention backends.
- Add convert_hf_to_tt_moe/convert_tt_to_hf_moe and per-layer variants to translate HF ↔ Prime formats.
- Register "afmoe" with AutoConfig and map AfMoeConfig → AfMoeForCausalLM in AutoModelForCausalLM (PrimeRL); see the registration sketch below.

Written by Cursor Bugbot for commit 4941b3b.
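For the auto-mapping bullet above, a minimal sketch of what such a registration typically looks like with the transformers auto classes; the import path prime_rl.models.afmoe is hypothetical and PrimeRL's own registry may differ:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical import path; the real module layout is whatever the PR defines.
from prime_rl.models.afmoe import AfMoeConfig, AfMoeForCausalLM

# Make "afmoe" checkpoints resolvable through the standard auto classes.
AutoConfig.register("afmoe", AfMoeConfig)
AutoModelForCausalLM.register(AfMoeConfig, AfMoeForCausalLM)

# Usage after registration:
# config = AutoConfig.from_pretrained("path/to/afmoe-checkpoint")
# model = AutoModelForCausalLM.from_config(config)
```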