I could not find the demonstration retrieval code in code/data_process.py, so I wrote my own implementation using sentence_transformers and the all-mpnet-base-v2 model. I followed the method outlined in the paper: for each utterance, retrieve the top-1 most similar utterance from the training set, using same-label pairing for training and all-labels pairing for dev/test.
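For reference, here is a minimal sketch of the retrieval step I implemented. This is my own reconstruction, not the paper's code: embeddings are assumed to come from `SentenceTransformer('all-mpnet-base-v2')`, and the function name and masking details are my own.

```python
import numpy as np

def retrieve_demonstrations(query_emb, pool_emb, pool_labels, query_labels=None):
    """Top-1 demonstration retrieval by cosine similarity.

    If query_labels is given (training), each query is restricted to
    same-label candidates; otherwise (dev/test) all labels are searched.
    Embeddings are assumed to be sentence-transformer outputs.
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pool_emb / np.linalg.norm(pool_emb, axis=1, keepdims=True)
    sims = q @ p.T  # shape: (n_query, n_pool)
    if query_labels is not None:
        # Same-label pairing: mask out candidates with a different label.
        mask = query_labels[:, None] != pool_labels[None, :]
        sims[mask] = -np.inf
    return sims.argmax(axis=1)  # index of the top-1 demonstration per query
```

(In training, the query utterance itself is in the pool, so self-retrieval would also need to be excluded; I've omitted that here for brevity.)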
Without demonstration retrieval, I was able to come close to the paper's result. With demonstration retrieval, however, my F1-scores dropped from ~69 to ~25 on MELD (with similar drops on the other datasets). I found that during training the model simply copies the demonstration's emotion label to minimize the loss, which leads to extreme overfitting; in validation this almost never works, since dev/test use all-labels pairing.
Could you share the demonstration retrieval code so we can take a look? As implemented above, retrieval does not seem to be effective.