This PR talks about "chunks". What is the definition of "chunk"? Is it only assistant content? Does it include reasoning when the model distinguishes reasoning from response text? Does it include other notifications from the service, such as a function call request (or part of one)? Etc. My assumption is that a "chunk" is any packet of data from the LLM, so that each update produced by a streaming implementation, regardless of what that update contains, counts as a "chunk".
Originally posted by @stephentoub in #3377 (comment)
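The broad definition assumed above can be sketched as follows. This is a minimal illustration, not any real API: the type and field names (`StreamingUpdate`, `kind`, `payload`) and the update kinds are hypothetical, chosen only to show that under this reading every streamed update is a chunk regardless of what it carries.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical update kinds; these names are assumptions for illustration,
# not taken from any real streaming API.
TEXT, REASONING, TOOL_CALL = "text", "reasoning", "tool_call"

@dataclass
class StreamingUpdate:
    kind: str     # what this update carries (text, reasoning, tool call, ...)
    payload: str  # the content fragment itself

def count_chunks(stream: Iterable[StreamingUpdate]) -> int:
    # Under the broad definition, every update is a chunk,
    # no matter what kind of content it contains.
    return sum(1 for _ in stream)

updates = [
    StreamingUpdate(REASONING, "thinking..."),
    StreamingUpdate(TEXT, "Hello"),
    StreamingUpdate(TOOL_CALL, 'get_weather({"city": "Paris"})'),
    StreamingUpdate(TEXT, ", world"),
]
print(count_chunks(updates))  # 4: reasoning and tool-call updates count too
```

Under a narrower definition (assistant text only), the same stream would yield 2 chunks, which is why pinning down the term matters.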