Skip to content

Fix sft_trainer crash when adapter_file is None (#908)#987

Open
H-A-Khan wants to merge 1 commit intoBlaizzy:mainfrom
H-A-Khan:fix/sft-trainer-adapter-file-none
Open

Fix sft_trainer crash when adapter_file is None (#908)#987
H-A-Khan wants to merge 1 commit intoBlaizzy:mainfrom
H-A-Khan:fix/sft-trainer-adapter-file-none

Conversation

@H-A-Khan
Copy link
Copy Markdown

@H-A-Khan H-A-Khan commented Apr 9, 2026

What

sft_trainer.train (and the analogous orpo_trainer.train_orpo) crash on the first checkpoint or final save if args.adapter_file is None:

File "mlx_vlm/trainer/sft_trainer.py", line 471, in train
    save_adapter(model, args.adapter_file)
File "mlx_vlm/trainer/utils.py", line 235, in save_adapter
    path = Path(adapter_file)
TypeError: argument should be a str or an os.PathLike object
where __fspath__ returns a str, not 'NoneType'

This happens when callers construct TrainingArgs (or pass a SimpleNamespace) without explicitly setting adapter_file. The dataclass default is "adapters.safetensors", but if a caller overrides that to None — or if the field is shadowed during serialisation — the train loop runs to the first save, then crashes and loses the work.

Closes #908.

Fix

Defend at the top of train / train_orpo. If args.adapter_file is None, set it to the dataclass default (adapters.safetensors) and emit a warning on rank 0 so the substitution is visible. Covers all three call sites (steps_per_save checkpoint, indexed checkpoint filename, final save) with one guard per trainer.

Changes

  • mlx_vlm/trainer/sft_trainer.py: defensive guard at top of train().
  • mlx_vlm/trainer/orpo_trainer.py: same guard at top of train_orpo() (the same bug exists in the ORPO trainer).
  • mlx_vlm/tests/test_trainer.py: regression test test_train_handles_none_adapter_file that constructs TrainingArgs(...), sets adapter_file = None, runs a single-iteration training loop, and asserts both that the default is applied and that save_safetensors is still invoked.

Tests

python -m pytest mlx_vlm/tests/test_trainer.py -v
test_dataset_getitem PASSED
test_dataset_initialization PASSED
test_train_handles_none_adapter_file PASSED
test_train_smoke PASSED
test_trainer_initialization PASSED
======================== 5 passed in 3.84s =========================

`sft_trainer.train` (and the analogous `orpo_trainer.train_orpo`)
crash on the first checkpoint or final save if `args.adapter_file`
is `None`:

    File "mlx_vlm/trainer/sft_trainer.py", line 471, in train
        save_adapter(model, args.adapter_file)
    File "mlx_vlm/trainer/utils.py", line 235, in save_adapter
        path = Path(adapter_file)
    TypeError: argument should be a str or an os.PathLike object
    where __fspath__ returns a str, not 'NoneType'

This happens when callers construct `TrainingArgs` (or pass a
`SimpleNamespace`) without explicitly setting `adapter_file`. The
dataclass default is `"adapters.safetensors"`, but if a caller
overrides it to None — or if the field is shadowed during
serialisation — the train loop runs to the first save, then
crashes and loses the work.

Fix: defend at the top of `train` / `train_orpo`. If
`args.adapter_file` is `None`, set it to the dataclass default
(`adapters.safetensors`) and emit a warning on rank 0 so the
substitution is visible. Covers all three call sites
(steps_per_save checkpoint, indexed checkpoint filename, final
save) with one guard.

Includes a regression test in `test_trainer.py` that constructs
`TrainingArgs(...)` then sets `adapter_file = None`, runs a
single-iteration training loop, and asserts both that the default
is applied and that the save still runs.

Closes Blaizzy#908
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] sft_trainer crashes on checkpoint save — args.adapter_file is None

1 participant