Hi, thank you so much for releasing code and data for this inspiring work!
I am wondering whether the filtered data used to train your 1B-5x model in Table 1 (144B tokens) has also been released. So far I can only find the entire data pool for that scale.
Thank you very much for your time and help!