Skip to content

Add loss scale from data#8430

Open
hpsun1109 wants to merge 5 commits intomodelscope:mainfrom
hpsun1109:add-loss-scale-from-data
Open

Add loss scale from data#8430
hpsun1109 wants to merge 5 commits intomodelscope:mainfrom
hpsun1109:add-loss-scale-from-data

Conversation

@hpsun1109
Copy link
Copy Markdown
Contributor

支持不同数据使用不同的 loss 计算策略
通过数据行级别的 "loss_scale" 字段为每行数据指定不同的 loss 计算策略

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求引入了一项重要功能,允许用户在数据行级别定义损失计算策略,从而提供了对模型训练过程中损失权重的更精细控制。通过在数据集的每行中添加"loss_scale"字段,用户可以为不同的数据点指定独特的损失行为,例如仅计算最后一轮对话的损失或计算所有token的损失。这极大地增强了训练的灵活性,特别是在处理需要差异化损失处理的复杂数据集时。

Highlights

  • 支持数据行级别的损失权重(loss_scale)设置: 现在可以通过数据行中的"loss_scale"字段为每条数据指定不同的损失计算策略,该设置优先级高于命令行参数。
  • 灵活的损失计算策略: 允许使用'default'、'last_round'、'all'以及组合策略(如'last_round+ignore_empty_think')来定义每行数据的损失行为。
  • 更新了核心损失计算逻辑: LossScale类已修改,以识别并应用数据行中指定的loss_scale,并包含对无效策略的警告和回退机制。
  • 完善了数据预处理流程: 数据预处理器现在能够正确地从数据行中提取并传递loss_scale字段。
  • 全面更新了中英文文档: 所有相关文档(包括架构、自定义数据集、命令行参数和Agent支持)都已更新,以反映这一新功能及其使用方法。

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new feature allowing loss_scale strategies to be specified at the data row level, overriding global command-line parameters. This functionality is reflected in updated documentation across various files, including Customization/Architecture.md, Customization/Custom-dataset.md, Instruction/Agent-support.md, Instruction/Command-line-parameters.md, and Megatron-SWIFT/Command-line-parameters.md. Code changes include adding loss_scale to _DATA_ROW_KEYS in swift/dataset/preprocessor/core.py, preserving the loss_scale field in swift/dataset/utils.py, and modifying the CustomLossScale class in swift/loss_scale/base.py to handle per-row loss_scale overrides with fallback logic. A review comment suggests moving the from .mapping import get_loss_scale import statement from within a method to the module level in swift/loss_scale/base.py to improve performance and adhere to Python best practices.

row_loss_scale = kwargs.get('loss_scale')
if row_loss_scale is not None:
# Use per-row loss_scale with higher priority than global setting
from .mapping import get_loss_scale
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

from .mapping import get_loss_scale 语句移动到文件顶部(模块级别)。在方法内部导入模块会导致每次调用该方法时都重新执行导入操作,这会降低性能并可能导致意外行为。将所有导入语句放在文件顶部是 Python 的最佳实践,可以提高代码的可读性和效率。

from swift.template import ContextType, Messages, get_last_user_round
from swift.utils import get_logger
from .utils import calculate_loss_scale
from .mapping import get_loss_scale

encoded = self.template.encode(row, return_length=True)
# Preserve loss_scale from data row if present (for per-row loss_scale strategy)
if 'loss_scale' in row:
encoded['loss_scale'] = row['loss_scale']
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be implemented by modifying here?

def from_dict(cls, inputs: Dict[str, Any]) -> 'StdTemplateInputs':
inputs = deepcopy(inputs)
kwargs = {}
for key in ['label', 'channel', 'margin', 'rejected_response']:
if key in inputs:
kwargs[key] = inputs[key]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be implemented by modifying here?

def from_dict(cls, inputs: Dict[str, Any]) -> 'StdTemplateInputs':
inputs = deepcopy(inputs)
kwargs = {}
for key in ['label', 'channel', 'margin', 'rejected_response']:
if key in inputs:
kwargs[key] = inputs[key]

这部分代码不影响,已还原

- List[float]: Loss scale values corresponding one-to-one with the
returned context list
"""
# Check for per-row loss_scale override in kwargs (from data row)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use different loss_scale in the template?

self.loss_scale: LossScale = get_loss_scale(loss_scale)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use different loss_scale in the template?

self.loss_scale: LossScale = get_loss_scale(loss_scale)

是的,数据中的loss_scale可以传入

res_context_list, loss_scale_list = self.loss_scale(res_context_list, res_context_types, inputs.messages,
**inputs.extra_kwargs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants