## Bug

DeepSeek V3 16B training crashes during `loss.backward()` with `RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet` when using flex_attention. The crash occurs regardless of whether `torch.compile` is enabled.
## Reproduction

```shell
# With compile (default config has compile=True, components=["loss"])
NCCL_NVLS_ENABLE=0 torchrun --nnodes 1 --nproc-per-node 8 -m torchtitan.train \
  --module deepseek_v3 --config deepseek_v3_16b \
  --parallelism.tensor_parallel_degree 8 \
  --parallelism.context_parallel_degree 1 \
  --parallelism.expert_parallel_degree 2 \
  --training.steps 10 \
  --dataloader.dataset c4_test

# Without compile (same error)
NCCL_NVLS_ENABLE=0 torchrun --nnodes 1 --nproc-per-node 8 -m torchtitan.train \
  --module deepseek_v3 --config deepseek_v3_16b \
  --parallelism.tensor_parallel_degree 8 \
  --parallelism.context_parallel_degree 1 \
  --parallelism.expert_parallel_degree 2 \
  --training.steps 10 \
  --dataloader.dataset c4_test \
  --compile.enable false
```
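For triage, a minimal single-GPU sketch of the failing pattern may help isolate whether the regression is in the flex_attention backward itself, independent of torchtitan's TP/EP setup. This is a hypothetical reduction, not the actual torchtitan code path: the shapes and causal mask are placeholders rather than the DeepSeek V3 configuration, and it may not trigger the parallelism interaction.

```python
# Hypothetical minimal reduction (assumption: the failure lives in the
# flex_attention backward, not in the TP/EP wrapping). Single GPU only.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal(b, h, q_idx, kv_idx):
    # Standard causal mask_mod: a query attends to itself and earlier keys.
    return q_idx >= kv_idx

B, H, S, D = 2, 8, 128, 64  # placeholder shapes, not the 16B model's
q, k, v = (torch.randn(B, H, S, D, device="cuda", requires_grad=True)
           for _ in range(3))
block_mask = create_block_mask(causal, B, H, S, S, device="cuda")

for compiled in (False, True):
    attn = torch.compile(flex_attention) if compiled else flex_attention
    out = attn(q, k, v, block_mask=block_mask)
    # The reported crash happens during backward on both code paths.
    out.sum().backward()
    print(f"compiled={compiled}: backward OK")
```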
- Works on commit: `73680eedb7a03635b246a598f3126ca3d945a710`
- Broken on commit: `786e26f8ee47ffecb523a661535e71031583ff60`
## Environment

- 8x H100 GPUs
- PyTorch: `2.13.0.dev20260417+cu126`
- torchao: nightly
## Error

```
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet.
If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel.
```

The error occurs during `loss.backward()` on all ranks.
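Since the error also reproduces with compile disabled, autograd anomaly detection may help pinpoint which forward op produced the failing backward node. A sketch with a stand-in model and loss (substitute the real training step); in the actual run it should point at the flex_attention node:

```python
# Stand-in model/loss, not the torchtitan training step. With anomaly
# detection enabled, the RuntimeError raised in backward is accompanied by
# the forward-pass stack trace of the op whose gradient computation failed.
import torch

model = torch.nn.Linear(16, 16, device="cuda")
loss = model(torch.randn(4, 16, device="cuda")).sum()

with torch.autograd.detect_anomaly():
    loss.backward()
```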