Conversation
Despite seeing all the green lights for merging, don't do it just yet.
Thanks. Before we land this, I'd like to run the finetuning to make sure it is still training as expected. I'll do that in the next day or so.
I don't have a GPU (yeah, I know 😄), so I apologize in advance for any stupid questions/suggestions. Basically, the problem is that I wasn't able to test my suspicions with the checkpoints for this repo.
I suspect you have already known/discussed this, but I wanted to mention it anyway. By the way: padding up to the nearest multiple of 64 is, in my opinion, useful only for
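In case it helps, rounding a size up to the nearest multiple of 64 can be sketched like this (a minimal sketch; `pad_to_multiple` is a hypothetical helper for illustration, not code from this repo):

```python
def pad_to_multiple(n: int, k: int = 64) -> int:
    """Round n up to the nearest multiple of k (no-op if already aligned)."""
    return n if n % k == 0 else n + k - (n % k)


# GPT-2's vocab of 50257 would pad up to 50304,
# while LLaMA's 32000 is already a multiple of 64.
print(pad_to_multiple(50257))  # 50304
print(pad_to_multiple(32000))  # 32000
```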
Hello @awaelchli
Any luck with this?
Hi there 👋
As @carmocca mentioned in PR #352, some code changes need to be done:

- `self.n_embd` --> `C`, since this value is extracted from the shape of the input variable `x` right at the beginning of the `forward` method.
- `vocab_size` --> `padded_vocab_size`, to align it with `lit_llama/model.py`.

I assume the checkpoints won't go south after this, since this is just an expansion in size for better performance (I believe up to 25%). Shrinkage would be a whole other story.
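To illustrate the first change, here is a minimal toy sketch (the module name, sizes, and `forward` signature are assumptions for illustration, not the actual lit-llama code) of reading the embedding size `C` from the input tensor's shape instead of a stored `self.n_embd` attribute:

```python
import torch
import torch.nn as nn


class TinyHead(nn.Module):
    """Toy head: infer the embedding size C from x rather than self.n_embd."""

    def __init__(self, n_embd: int, padded_vocab_size: int) -> None:
        super().__init__()
        self.lm_head = nn.Linear(n_embd, padded_vocab_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x arrives as (batch, seq_len, n_embd); C comes from the shape itself.
        B, T, C = x.size()
        return self.lm_head(x.view(B * T, C)).view(B, T, -1)


head = TinyHead(n_embd=8, padded_vocab_size=64)
out = head(torch.randn(2, 3, 8))
print(out.shape)  # torch.Size([2, 3, 64])
```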