Hello,
Thank you for your great work. I am currently trying to reproduce the zero-shot unseen classification results reported in your paper.
Description of the Issue: I have successfully reproduced the reported results on most of the disease test sets. However, when evaluating on the PadChest dataset (specifically the unseen/rare subclasses), I observed anomalous behavior in the metric report.
Specifically:
- NaN AUC values: Several classes (e.g., round atelectasis, surgery humeral, empyema) show nan for AUC ROC.
- Abnormal Recall/Precision: For these specific classes, the Precision is 0 while Recall is 1, which seems mathematically conflicting (unless handled by specific edge-case logic).
- Average Metrics: Consequently, the averaged AUC is nan, and the average Accuracy seems lower than expected.
Logs / Results Table: Here is the output table I obtained:
| Class Name | Accuracy | Max F1 | AUC ROC | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| round atelectasis | 0.000128 | 0 | nan | 0 | 1 |
| surgery humeral | 0.000128 | 0 | nan | 0 | 1 |
| empyema | 0.000128 | 0 | nan | 0 | 1 |
| pulmonary artery hypertension | 0.000128 | 0 | nan | 0 | 1 |
| aortic aneurysm | 0.000128 | 0 | nan | 0 | 1 |
| ... (omitted other classes) | ... | ... | ... | ... | ... |
| Average | 0.65647 | 0.0465 | nan | 0.0288 | 0.5982 |
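
In case it helps with diagnosis, here is a minimal sketch of the check I would run next. It assumes (I have not confirmed this against the repository) that the affected classes simply have no positive labels in the split I evaluated, in which case AUC ROC is undefined and a single nan would propagate into a plain mean. The function names `summarize_labels` and `nan_aware_mean` and the toy labels are hypothetical and are not taken from the evaluation code.

```python
import numpy as np


def summarize_labels(y_true, class_names):
    """Count positives and negatives per class in a binary label matrix.

    y_true: array of shape (n_samples, n_classes) with 0/1 entries.
    AUC ROC is only defined for a class that has at least one positive
    and at least one negative sample.
    """
    y_true = np.asarray(y_true)
    report = {}
    for j, name in enumerate(class_names):
        n_pos = int((y_true[:, j] == 1).sum())
        n_neg = int((y_true[:, j] == 0).sum())
        report[name] = {
            "n_pos": n_pos,
            "n_neg": n_neg,
            "auc_defined": n_pos > 0 and n_neg > 0,
        }
    return report


def nan_aware_mean(values):
    """Average a metric over classes, skipping classes where it is nan."""
    values = np.asarray(values, dtype=float)
    if np.all(np.isnan(values)):
        return float("nan")
    return float(np.nanmean(values))


# Toy data (hypothetical): class_b has no positive samples at all.
toy_labels = [[1, 0], [0, 0], [1, 0], [0, 0]]
toy_report = summarize_labels(toy_labels, ["class_a", "class_b"])
print(toy_report)

# A plain mean of [0.9, nan, 0.7] is nan; the nan-aware mean skips the nan.
print(nan_aware_mean([0.9, float("nan"), 0.7]))
```

If `auc_defined` turns out to be false for the classes listed in the table, the nan values would be expected behavior rather than a bug, and the remaining question would be whether the paper's averages exclude such classes.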