The token input to the model decoder at each time step is the same as the output token #4

@LiuZeJie97

Thanks for the code, it has been very helpful. However, I think it may have the following problems:

  1. At each time step, the token fed into the decoder is the same token the decoder is expected to output, which is why the model's accuracy is so high. The correct approach is to feed the token from time step t-1 (not the token from time step t) and have the decoder output the token at time step t.
  2. Teacher forcing should only be used when training the model, but the code still uses it when evaluating the model.
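
To make the two points concrete, here is a minimal sketch in plain Python. It is not taken from the repository's code: the token ids `BOS` and `EOS`, the helpers `shift_right` and `greedy_decode`, and the lookup-table "model" `toy_next_token` are all hypothetical names chosen for illustration. It shows the decoder input being the target shifted right by one step during training, and evaluation feeding the model's own predictions back in instead of the ground-truth tokens.

```python
from typing import Callable, List, Sequence

# Hypothetical special-token ids, assumed only for this illustration.
BOS, EOS = 1, 2


def shift_right(target: Sequence[int], bos: int = BOS) -> List[int]:
    """Training-time decoder input under teacher forcing.

    For a target y_1..y_T the decoder sees [BOS, y_1, ..., y_{T-1}],
    so the input at step t is the token from step t-1, never y_t itself.
    """
    return [bos] + list(target[:-1])


def greedy_decode(
    next_token: Callable[[List[int]], int],
    max_len: int,
    bos: int = BOS,
    eos: int = EOS,
) -> List[int]:
    """Evaluation-time decoding without teacher forcing.

    Each step feeds back the tokens the model has predicted so far;
    the ground-truth target is never used as input.
    """
    prefix = [bos]
    for _ in range(max_len):
        tok = next_token(prefix)
        prefix.append(tok)
        if tok == eos:
            break
    return prefix[1:]  # drop the leading BOS


# Toy stand-in for a trained decoder: a fixed lookup on the last token.
_TRANSITIONS = {BOS: 5, 5: 6, 6: 7, 7: EOS}


def toy_next_token(prefix: List[int]) -> int:
    return _TRANSITIONS[prefix[-1]]


target = [5, 6, 7, EOS]
decoder_input = shift_right(target)         # what the decoder reads when training
prediction = greedy_decode(toy_next_token, max_len=10)  # evaluation output
```

With a real model, the loss would compare the decoder's outputs on `decoder_input` against `target`, and evaluation metrics would be computed on `prediction` rather than on teacher-forced outputs.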
