One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters from the paper. There are a few discrepancies between Kim's original implementation and the PyTorch/Castor implementation:
- Kim used an Adadelta Rho of 0.95 instead of 0.9. The paper did not mention this.
- Kim used Xavier uniform initialization for the convolution layers. The paper did not mention this.
- Kim did not use the equivalent of torchtext's BucketIterator. This is a difference in Castor.
- Kim used the dev loss as the criterion for model selection. This is a difference in Castor.
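The first two points above can be sketched in PyTorch. This is a minimal illustration, not code from Castor; the filter size, channel counts, and embedding dimension are assumptions chosen to look like a typical KimCNN configuration.

```python
import torch
import torch.nn as nn

# One hypothetical KimCNN convolution: 100 feature maps, window of 3 words
# over 300-dim embeddings (illustrative numbers, not from Castor).
conv = nn.Conv2d(in_channels=1, out_channels=100, kernel_size=(3, 300))

# Kim's Theano code uses Xavier uniform initialization for conv layers;
# the paper does not mention this.
nn.init.xavier_uniform_(conv.weight)

# Kim's code uses Adadelta with rho=0.95; PyTorch's default is rho=0.9,
# and the paper does not mention the value either.
optimizer = torch.optim.Adadelta(conv.parameters(), rho=0.95)
```

Note that `rho=0.95` must be passed explicitly, since simply switching to Adadelta in PyTorch would silently use 0.9.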
After these changes, the original hyperparameters in the paper work quite well. I'm getting 87.8 for SST-2 multichannel now, which is an improvement over the current 87.4. It's still a bit off from the paper result of 88.1, though.
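For the model-selection point, the change amounts to picking the checkpoint with the lowest dev loss rather than the highest dev accuracy. A minimal sketch, with made-up loss values purely for illustration:

```python
def select_best_epoch(dev_losses):
    """Return the epoch index with the lowest dev loss.

    Mirrors Kim's criterion of selecting on dev loss, as opposed to
    selecting on dev accuracy. The numbers used below are invented.
    """
    return min(range(len(dev_losses)), key=lambda i: dev_losses[i])

# Hypothetical per-epoch dev losses; epoch 2 would be selected.
print(select_best_epoch([0.52, 0.41, 0.39, 0.43]))  # → 2
```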
Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py