
Transformer implementation seemingly not corresponding to paper #46

@patrickgadd


Hi there,

First off, thank you for open-sourcing this very interesting work!

I've been looking at this repo alongside the paper "Context-Aware Learning to Rank with Self-Attention", and there appears to be a bug: the code doesn't correspond exactly to what's written in the paper.

https://github.com/allegro/allRank/blob/master/allrank/models/transformer.py#L105
Here you apply LayerNorm to the inputs of the multi-head attention and fully-connected sub-layers (a "pre-norm" arrangement). In the paper, however (and, from what I gather, in general practice for Transformers), you've written that LayerNorm is applied to the outputs of these sub-layers ("post-norm").
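To make the difference concrete, here is a minimal sketch of the two variants. The class names are mine, not from the repo; the pre-norm forward mirrors what the linked SublayerConnection appears to do, if I'm reading it right, while the post-norm one matches the paper's description:

```python
import torch
from torch import nn


class PreNormSublayer(nn.Module):
    """Residual connection with LayerNorm applied to the sub-layer INPUT
    (what the repo's SublayerConnection seems to do)."""

    def __init__(self, size: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # normalize first, then apply the sub-layer, then add the residual
        return x + self.dropout(sublayer(self.norm(x)))


class PostNormSublayer(nn.Module):
    """Residual connection with LayerNorm applied to the sub-layer OUTPUT
    (what the paper describes, following the original Transformer)."""

    def __init__(self, size: int, dropout: float):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # apply the sub-layer first, add the residual, then normalize
        return self.norm(x + self.dropout(sublayer(x)))


# toy usage: the same feed-forward sub-layer wrapped both ways
ff = nn.Linear(8, 8)
x = torch.randn(2, 5, 8)
y_pre = PreNormSublayer(8, 0.1)(x, ff)
y_post = PostNormSublayer(8, 0.1)(x, ff)
```

The two are not equivalent: in the pre-norm form the residual stream is never normalized, whereas in the post-norm form every layer's output is.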

It seems this repo isn't actively maintained, but I thought I'd let you know nonetheless.

Thanks & best wishes,
Patrick
