One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters from the paper. There are a few discrepancies between Kim's original implementation and the PyTorch/Castor implementation:
- Kim used an Adadelta Rho of 0.95 instead of 0.9. The paper did not mention this.
- Kim used Xavier uniform initialization for the convolution layers. The paper did not mention this.
- Kim did not use the equivalent of torchtext's BucketIterator. This is a difference in Castor.
- Kim used the dev loss as the criterion for model selection. This is a difference in Castor.
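The first two points above can be sketched in PyTorch. This is a minimal illustration, not code from Castor; the filter size, channel counts, and embedding dimension are assumptions chosen to look like a typical KimCNN configuration.

```python
import torch
import torch.nn as nn

# One hypothetical KimCNN convolution: 100 feature maps, window of 3 words
# over 300-dim embeddings (illustrative numbers, not from Castor).
conv = nn.Conv2d(in_channels=1, out_channels=100, kernel_size=(3, 300))

# Kim's Theano code uses Xavier uniform initialization for conv layers;
# the paper does not mention this.
nn.init.xavier_uniform_(conv.weight)

# Kim's code uses Adadelta with rho=0.95; PyTorch's default is rho=0.9,
# and the paper does not mention the value either.
optimizer = torch.optim.Adadelta(conv.parameters(), rho=0.95)
```

Note that `rho=0.95` must be passed explicitly, since simply switching to Adadelta in PyTorch would silently use 0.9.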
After these changes, the original hyperparameters in the paper work quite well. I'm getting 87.8 for SST-2 multichannel now, which is an improvement over the current 87.4. It's still a bit off from the paper result of 88.1, though.
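For the model-selection point, the change amounts to picking the checkpoint with the lowest dev loss rather than the highest dev accuracy. A minimal sketch, with made-up loss values purely for illustration:

```python
def select_best_epoch(dev_losses):
    """Return the epoch index with the lowest dev loss.

    Mirrors Kim's criterion of selecting on dev loss, as opposed to
    selecting on dev accuracy. The numbers used below are invented.
    """
    return min(range(len(dev_losses)), key=lambda i: dev_losses[i])

# Hypothetical per-epoch dev losses; epoch 2 would be selected.
print(select_best_epoch([0.52, 0.41, 0.39, 0.43]))  # → 2
```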
Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py