It may just be on my end, but I am seeing the green bars in the wrong order in the plot created by the function 'gen_sample_wise_prob_plot'. The underlying data matches the numbers in the probabilities CSV, but the green bars are ordered differently from the x-labels and the scatter plot.
The highlighted sample 31 did indeed have a probability of 0.99 for TLX1, but that green bar shows up under the TAL1 AB-like label. It's a very minor problem, but I figured you might want to know. I think it's simply because the scatter and labels use the order defined in 'label_list', while the bars use the column order from 'probs_raw_df'.
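To illustrate what I suspect is going on, here is a minimal sketch in plain Python. It is not the actual TALLSorts plotting code: 'probs_row' is a hypothetical stand-in for one row of 'probs_raw_df', the subtype names and values are made up for illustration, and 'label_of_tallest' is a helper I invented for the demonstration.

```python
# Hypothetical stand-in for one sample's row in 'probs_raw_df'.
# Dict insertion order plays the role of the table's column order.
probs_row = {"TLX1": 0.99, "TAL1 AB-like": 0.01, "TLX3": 0.00}

# Hypothetical stand-in for 'label_list', the order used for the
# x-labels and the scatter points.
label_list = ["TAL1 AB-like", "TLX3", "TLX1"]

# Suspected problem: bar heights taken positionally follow the column
# order, while the x-axis positions follow label_list.
bars_positional = list(probs_row.values())

# Possible fix: look each height up by label name so the bars follow
# the same order as the x-labels.
bars_by_label = [probs_row[label] for label in label_list]


def label_of_tallest(heights):
    """Return the x-label under which the tallest bar would be drawn."""
    return label_list[heights.index(max(heights))]


print(label_of_tallest(bars_positional))  # TAL1 AB-like (mismatch)
print(label_of_tallest(bars_by_label))    # TLX1 (as expected)
```

If 'probs_raw_df' is a pandas DataFrame whose columns are the subtype names, I assume the equivalent would be to select the columns in label order, e.g. `probs_raw_df[label_list]`, before drawing the bars.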
Also, in case you're wondering, I retrained TALLSorts based on the data and classifications defined in the Polonen 2024 paper, hence the new subtypes. Thanks for making TALLSorts so easily retrainable.
Not knowing too much about what goes on under the hood of TALLSorts, I am curious about one thing, however. I was only given access to normalized counts for our data, so I normalized the publicly available counts from the Polonen paper and trained TALLSorts on those. Knowing that TALLSorts performs its own data processing, would you expect any problems with a model trained on normalized counts making calls on normalized counts?
For what it is worth, the classifications line up well with both the default model (fed normalized counts) and the subtypes determined by the clinic (albeit with more detailed calls and some of the newly defined subtypes).
Thanks for making a great tool,
Kasper