Description
I got the following error when running the code:
```
ValueError Traceback (most recent call last)
Cell In[6], line 592
590 reward_functions = [xmlcount_reward, soft_format_reward, strict_format_reward, int_reward, accuracy_reward, sequence_similarity_reward]
591 trainer = GRPOTrainer(model, tokenizer, reward_functions, config, dataset, processor)
--> 592 trainer.train()
594 # -------------------------------
595 # Sample from final
596 # -------------------------------
597 sample_image = "https://raw.githubusercontent.com/Jaykef/ai-algorithms/refs/heads/main/assets/sample.jpg"
Cell In[6], line 463
461 self.clear_memory_cache()
462 if self.scaler is not None:
--> 463 self.scaler.unscale_(self.optimizer)
464 torch.nn.utils.clip_grad_norm_(self.model.parameters(), 0.1)
465 self.scaler.step(self.optimizer)
File ~/anaconda3/envs/vlm/lib/python3.10/site-packages/torch/amp/grad_scaler.py:338, in GradScaler.unscale_(self, optimizer)
335 inv_scale = self._scale.double().reciprocal().float()
336 found_inf = torch.full((), 0.0, dtype=torch.float32, device=self._scale.device)
--> 338 optimizer_state["found_inf_per_device"] = self.unscale_grads(
339 optimizer, inv_scale, found_inf, False
340 )
341 optimizer_state["stage"] = OptState.UNSCALED
...
264 # For scaled fp16 values, there's a good chance coalescing will cause overflow,
265 # so we should check the coalesced _values().
266 if param.grad.dtype is torch.float16:
ValueError: Attempting to unscale FP16 gradients.
```