Hi Authors,
Thank you for releasing the code implementation — it’s very helpful for my research.
While reproducing the promoter design experiment from your paper “Dirichlet Flow Matching with Applications to DNA Sequence Design”, I noticed a potential inconsistency in the number of training samples used for your model versus the baselines.
Specifically, the baseline “Dirichlet Diffusion Score Model for Biological Sequence Generation (DDSM)” appears to have been trained on only 40,000 of the available 100,000 samples when producing the reported results. For example:
• Reported in the original paper: DDSM (time dilation 1x) → 0.0363
• Reproduced using 40,000 samples: DDSM (time dilation 1x) → 0.0380
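For reference, the numbers above are SP-MSE values. Here is a minimal sketch of how I compute SP-MSE, as I understand it from the paper: the mean squared error between the predicted regulatory activity of generated sequences and that of the matched original sequences (the paper uses the Sei model for the activity predictions; the arrays below are placeholder stand-ins, not real model outputs).

```python
import numpy as np

def sp_mse(pred_generated: np.ndarray, pred_original: np.ndarray) -> float:
    """Mean squared error between activity predictions for generated and
    original sequences (my reading of the SP-MSE metric)."""
    return float(np.mean((pred_generated - pred_original) ** 2))

# Placeholder data standing in for Sei activity predictions.
rng = np.random.default_rng(0)
orig = rng.normal(size=(8, 1024))                      # "original" activities
gen = orig + rng.normal(scale=0.2, size=orig.shape)    # "generated" activities
print(round(sp_mse(gen, orig), 4))
```

If I have the metric wrong, please correct me, since that could also explain part of the gap.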
Meanwhile, the main model from your paper appears to use all 100,000 samples. With the default script, I reproduced SP-MSE = 0.292. However, when I reduce the number of training samples to 40,000 (by changing the n_tsses parameter on lines 42–43 of train_promo.py), the SP-MSE becomes 0.0454, which is worse than the DDSM baseline reproduced above.
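To be concrete about the change I made, here is a hedged, self-contained sketch of the subsampling: capping the promoter training set at the first 40,000 TSS windows instead of all 100,000. The array below is a stand-in for the repo's actual TSS index list; the only assumption is that n_tsses truncates the dataset in this prefix-style way.

```python
import numpy as np

# Value I substituted for the default 100,000 (the n_tsses parameter
# on lines 42-43 of train_promo.py, as I read the code).
N_TSSES = 40_000

# Placeholder for the full promoter dataset's TSS index list.
all_tss_indices = np.arange(100_000)

# The truncation I believe n_tsses performs.
train_indices = all_tss_indices[:N_TSSES]
print(len(train_indices))
```

If n_tsses is interpreted differently in the dataset class (e.g., a random subsample rather than a prefix), that could change the comparison, so please let me know if this is the wrong way to shrink the training set.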
I was wondering:
• Have you noticed this discrepancy during your experiments?
• Is there anything I may have misunderstood in the code or setup?
• Are there additional steps needed to reproduce the same trends as reported in the paper?
Thank you again for your work and support!
Best regards,
YC