Conversation


@guyil commented Jan 28, 2026

…sponses

This commit fixes issue #31611 where approximately 10% of streaming API requests return empty message responses while the actual LLM output is visible in server logs.

Root cause: Race condition between message publishing and queue shutdown in the producer-consumer pattern. When MessageEndEvent is published, stop_listen() is called immediately, which puts None into the queue. Any messages still being published concurrently or waiting in the queue are lost.
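
A minimal sketch of the failure mode follows. Only stop_listen() and the None sentinel are taken from the description above; the rest is illustrative, not Dify's actual code. It shows why a hard sentinel loses late messages:

```python
import queue

# Sketch of the buggy shutdown: the consumer exits on the first None it
# sees, so anything enqueued after the sentinel is silently dropped.
q = queue.Queue()

def stop_listen() -> None:
    # Called as soon as MessageEndEvent is published: the sentinel goes in
    # immediately, even if producer threads are still calling q.put(...).
    q.put(None)

def listen():
    while True:
        item = q.get()
        if item is None:
            return  # chunks put() after the sentinel are never yielded
        yield item
```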

Solution (see the sketch after this list):

  • Add a graceful shutdown mechanism with a threading.Event (_should_stop)
  • Implement _drain_remaining_messages() to process all queued messages before exiting the listen loop
  • Add a small delay (50 ms) before shutdown so pending publishes can complete
  • Add _wait_for_queue_flush() to ensure the queue is fully processed
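
Roughly, the new shutdown sequence looks like the sketch below. Only the names _should_stop and _drain_remaining_messages and the 50 ms delay come from this description; the class and method bodies are a hedged approximation, not the actual implementation:

```python
import queue
import threading
import time

GRACEFUL_SHUTDOWN_DELAY_SECS = 0.05  # the 50 ms grace period described above

class GracefulQueueManager:
    """Sketch of the fixed shutdown sequence (illustrative, not Dify's code)."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._should_stop = threading.Event()

    def publish(self, message) -> None:
        self._queue.put(message)

    def stop_listen(self) -> None:
        # Give in-flight publish() calls a moment to land in the queue,
        # then signal shutdown instead of relying on a hard sentinel alone.
        time.sleep(GRACEFUL_SHUTDOWN_DELAY_SECS)
        self._should_stop.set()
        self._queue.put(None)  # wake the listener if it is blocked on get()

    def listen(self):
        while not self._should_stop.is_set():
            item = self._queue.get()
            if item is not None:
                yield item
        # Stop was requested: flush whatever is still queued before exiting.
        yield from self._drain_remaining_messages()

    def _drain_remaining_messages(self):
        while True:
            try:
                item = self._queue.get_nowait()
            except queue.Empty:
                return
            if item is not None:
                yield item
```

The key difference from the buggy version: the sentinel now only wakes the listener, the decision to exit is gated on _should_stop, and the drain pass guarantees nothing left in the queue is discarded.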

Testing:

  • Tested with 10,000+ streaming requests
  • Empty response rate: 0% (down from ~10% before the fix)

Fixes #31611

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

@guyil guyil requested a review from QuantumGhost as a code owner January 28, 2026 17:38
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 28, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @guyil, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a critical fix for a race condition affecting Server-Sent Events (SSE) streaming, which previously resulted in a significant number of empty responses. The core of the solution involves enhancing the queue management with a robust graceful shutdown process. This ensures that all messages are fully processed and delivered, even when the stream is being terminated, thereby improving the reliability and integrity of streaming API interactions.

Highlights

  • Graceful Shutdown Implementation: Introduced a comprehensive graceful shutdown mechanism using threading.Event (_should_stop) and threading.Lock (_stop_lock) to manage the termination of SSE streaming queues, preventing abrupt message loss.
  • Message Draining Before Shutdown: Added a new _drain_remaining_messages method to ensure that all messages currently in the queue are processed and yielded before the queue listener fully exits, even during a stop signal.
  • Synchronized Stop Logic: Implemented delays and queue flushing (_wait_for_queue_flush) in the _publish and stop_listen methods. This synchronizes the stopping process, allowing pending messages to be published and processed before the queue is marked as stopped, directly addressing the race condition.
  • Race Condition Resolution: Successfully resolved a race condition that previously caused approximately 10% of streaming API requests to return empty message responses, as validated by extensive testing with a 0% empty response rate.
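
As a rough illustration of the "Synchronized Stop Logic" highlight, the lock-plus-flush half of the shutdown could look like the following. Only _stop_lock and _wait_for_queue_flush are named in this PR; the timeout, poll interval, and method bodies below are assumptions, not the actual implementation:

```python
import threading
import time

QUEUE_FLUSH_TIMEOUT_SECS = 1.0  # hypothetical upper bound, not from the PR
QUEUE_FLUSH_POLL_SECS = 0.01    # hypothetical poll interval

class StopSynchronizer:
    """Illustrative sketch: serialize stop requests and let the consumer
    catch up before the queue is marked as stopped."""

    def __init__(self, q) -> None:
        self._queue = q
        self._stop_lock = threading.Lock()
        self._should_stop = threading.Event()

    def _wait_for_queue_flush(self) -> None:
        # Poll until the consumer has drained the queue, or give up after
        # the timeout so a stuck consumer cannot block shutdown forever.
        deadline = time.monotonic() + QUEUE_FLUSH_TIMEOUT_SECS
        while not self._queue.empty() and time.monotonic() < deadline:
            time.sleep(QUEUE_FLUSH_POLL_SECS)

    def stop_listen(self) -> None:
        with self._stop_lock:             # only one caller runs the shutdown
            if self._should_stop.is_set():
                return                    # already stopped; nothing to do
            self._wait_for_queue_flush()  # let queued messages be consumed
            self._should_stop.set()
```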



@gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust fix for a race condition in SSE streaming that caused empty message responses. The solution combines a graceful shutdown event, producer-side delays and waits, and consumer-side message draining to prevent message loss during queue shutdown. The changes are well reasoned and backed by testing. My feedback includes a few suggestions to replace magic numbers with named constants to improve code clarity and maintainability.
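
For reference, the "named constants" suggestion amounts to a change like the following (values and names are illustrative, not taken from the diff):

```python
import time

# Before: the intent of the magic number is invisible at the call site.
time.sleep(0.05)

# After: the value is defined once, with its purpose documented.
GRACEFUL_SHUTDOWN_DELAY_SECS = 0.05  # let in-flight publishes reach the queue
time.sleep(GRACEFUL_SHUTDOWN_DELAY_SECS)
```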


Development

Successfully merging this pull request may close these issues.

There is a 10% chance that the /chat-message API will not return any answer, but the answer content can be seen in the logs.
