Change default sliding window pattern to the recommended "L" when FA3 is not available#509

Open
ddudek wants to merge 2 commits into karpathy:master from ddudek:sliding-window-fa3-fallback
Conversation

@ddudek ddudek commented Feb 6, 2026

Changes the default sliding window pattern, for setups without out-of-the-box FA3 support, to the one recommended in the warning.

This simplifies configuration for beginners running nanochat on their local setups, e.g. consumer-grade GPUs like the 3090/4090 and others without FA3 support.

Before:

$ python -m scripts.base_train --depth=12 --device-batch-size=16
...
GPU: NVIDIA GeForce RTX 3090 | Peak FLOPS (BF16): 7.10e+13
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Flash Attention 3 not available, using PyTorch SDPA fallback
WARNING: Training will be less efficient without FA3
WARNING: SDPA has no support for sliding window attention (window_pattern='SSSL'). Your GPU utilization will be terrible.
WARNING: Recommend using --window-pattern L for full context attention without alternating sliding window patterns.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Vocab size: 32,768
Model config:
{
  "sequence_len": 2048,
  "vocab_size": 32768,
  "n_layer": 12,
  "n_head": 6,
  "n_kv_head": 6,
  "n_embd": 768,
  "window_pattern": "SSSL"
}
...
step 00011/02205 (0.50%) | loss: 8.170549 | lrm: 1.00 | dt: 13119.33ms | tok/sec: 39,963 | mfu: 45.15 | epoch: 1 | total time: 0.22m | eta: 479.7m

After:

$ python -m scripts.base_train --depth=12 --device-batch-size=16
...
GPU: NVIDIA GeForce RTX 3090 | Peak FLOPS (BF16): 7.10e+13
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: Flash Attention 3 not available, using PyTorch SDPA fallback
WARNING: Training will be less efficient without FA3
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Vocab size: 32,768
Model config:
{
  "sequence_len": 2048,
  "vocab_size": 32768,
  "n_layer": 12,
  "n_head": 6,
  "n_kv_head": 6,
  "n_embd": 768,
  "window_pattern": "L"
}
...
step 00011/02205 (0.50%) | loss: 8.177470 | lrm: 1.00 | dt: 7127.85ms | tok/sec: 73,554 | mfu: 91.90 | epoch: 1 | total time: 0.12m | eta: 260.6m
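For reference, comparing the two step-11 lines above works out to roughly a 1.84x throughput gain on the RTX 3090 (figures taken directly from the logs; this is just the arithmetic, not a new benchmark):

```python
# Throughput (tok/sec) and step time (ms) copied from the before/after logs.
before_tok_s, after_tok_s = 39_963, 73_554
before_dt_ms, after_dt_ms = 13_119.33, 7_127.85

throughput_gain = after_tok_s / before_tok_s  # ~1.84x more tokens per second
step_speedup = before_dt_ms / after_dt_ms     # ~1.84x shorter step time
```

The MFU jump from 45.15 to 91.90 and the ETA drop from ~480m to ~260m in the logs are consistent with the same factor.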

Collaborator

@svlandeg svlandeg left a comment

Nice idea and a super minimal edit. As you mention, merging this would make the script more user-friendly for beginners with non-FA3 setups.
