Add SoftwareEngineer GitHub App eval (hello-world repo) + update docs/README diagram #271

@dzianisv

Summary

Expand eval coverage to validate multi-agent communication over GitHub issues, discussions, and PR comments. Ensure that both Slack-triggered and GitHub-webhook-triggered handoffs work and that each agent comment is attributed to the GitHub App of the role that wrote it. Update docs and diagrams to reflect the current gateway + agent-service architecture.

Requirements

  • GitHub webhook routing supports issues, issue_comment, pull_request_review_comment, discussion, and discussion_comment with /RoleName handoffs.
  • Slack eval validates multi-bot comments across issue + PR threads in the eval repo.
  • GitHub webhook eval validates multi-bot comments across issue + discussion + PR threads in the eval repo.
  • Use existing eval repo: VibeTechnologies/vibeteam-eval-hello-world.
  • Ensure GitHub App role attribution is used for agent comments.
  • Render the current system diagram and keep it at the top of README.
  • Run unit tests and at least one Slack eval.
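As a rough illustration of the first requirement, the sketch below maps each supported webhook event to the payload object that carries the user-written text. The event names and the `issue` / `comment` / `discussion` payload keys follow GitHub's webhook payloads; `SUPPORTED_EVENTS` and `extract_body` are hypothetical names, not the gateway's actual code.

```python
from typing import Optional

# Event name (as sent in the X-GitHub-Event header) -> payload key holding the text.
# The mapping itself is an illustrative assumption about how a gateway might do this.
SUPPORTED_EVENTS = {
    "issues": "issue",
    "issue_comment": "comment",
    "pull_request_review_comment": "comment",
    "discussion": "discussion",
    "discussion_comment": "comment",
}


def extract_body(event: str, payload: dict) -> Optional[str]:
    """Return the text to scan for /RoleName handoffs, or None if unsupported."""
    key = SUPPORTED_EVENTS.get(event)
    if key is None:
        return None  # e.g. "push": not routed
    # The object or its body may be missing or null; treat both as "no text".
    return (payload.get(key) or {}).get("body")
```

Note that GitHub delivers comments on a pull request's main conversation as `issue_comment`, so the PR thread can be reached through that event as well as through `pull_request_review_comment`.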

Design

  • Gateway is the single ingress for Slack + GitHub webhooks, normalizes events, and routes by /RoleName.
  • GitHub threads support handoff chaining: agent responses containing /RoleName trigger follow-up agents and post comments in the same thread.
  • E2E evals:
    • Slack-triggered: github_issue_pr_handoff_slack (issue + PR)
    • GitHub-triggered: eval_github_e2e.py scenarios for issue/discussion/PR threads
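The handoff chaining described above could look roughly like the following. This is a minimal sketch under stated assumptions: the `/RoleName` token is taken to be a slash plus a CamelCase name preceded by whitespace or the start of the text, agents are modelled as plain callables, and `max_hops` is a hypothetical guard against two agents handing off to each other forever. None of these names come from the repository.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed handoff syntax: "/RoleName" at the start of the text or after whitespace,
# so URL path segments such as "github.com/Foo" are not treated as handoffs.
HANDOFF_RE = re.compile(r"(?:^|\s)/([A-Z][A-Za-z0-9]*)\b")


def find_handoffs(text: str, known_roles: Iterable[str]) -> List[str]:
    """Return known roles mentioned as /RoleName, in order, without duplicates."""
    roles = set(known_roles)
    found: List[str] = []
    for match in HANDOFF_RE.finditer(text or ""):
        role = match.group(1)
        if role in roles and role not in found:
            found.append(role)
    return found


def run_chain(
    text: str,
    agents: Dict[str, Callable[[str], str]],
    max_hops: int = 5,
) -> List[Tuple[str, str]]:
    """Follow handoffs starting from `text`; return (role, reply) pairs in posting order."""
    posted: List[Tuple[str, str]] = []
    queue = [(role, text) for role in find_handoffs(text, agents)]
    while queue and len(posted) < max_hops:
        role, trigger = queue.pop(0)
        reply = agents[role](trigger)  # the agent answers the text that addressed it
        posted.append((role, reply))   # a real gateway would post this in the same thread
        queue.extend((nxt, reply) for nxt in find_handoffs(reply, agents))
    return posted
```

For example, if a stub `SoftwareEngineer` agent replies with text containing `/SupportEngineer`, `run_chain` returns two entries, one per role, in the order they would be posted in the thread.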

Implementation Plan

  1. Add discussion + discussion_comment webhook handling and GitHub discussion comment posting.
  2. Add GitHub webhook E2E eval script for issue/discussion/PR handoffs.
  3. Extend Slack eval to include issue + PR handoffs.
  4. Update docs (design, requirements, tests, GitHub setup) and diagram.
  5. Ensure the gateway loads the role GitHub App secrets and that the agent services can be scheduled.
  6. Run unit tests + Slack eval + GitHub eval; capture thread links.
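For step 1, posting a discussion comment goes through GitHub's GraphQL API (`addDiscussionComment`), since repository discussions are, as far as I know, not covered by the REST comment endpoints used for issues and PRs. The sketch below only builds the HTTP request; the helper name is hypothetical, and the token is assumed to be an installation access token for the role's GitHub App so that the comment is attributed to that role.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# GraphQL mutation for commenting on a discussion, addressed by its global node ID.
ADD_DISCUSSION_COMMENT = """
mutation($discussionId: ID!, $body: String!) {
  addDiscussionComment(input: {discussionId: $discussionId, body: $body}) {
    comment { id url }
  }
}
"""


def build_discussion_comment_request(
    token: str, discussion_node_id: str, body: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that adds a discussion comment."""
    payload = {
        "query": ADD_DISCUSSION_COMMENT,
        "variables": {"discussionId": discussion_node_id, "body": body},
    }
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed: role App installation token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The node ID can be read from `discussion.node_id` in the `discussion` and `discussion_comment` webhook payloads. Sending the request with `urllib.request.urlopen` requires the App to have Discussions read/write permission on the repository.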

Status

  • Add discussion + discussion_comment webhook handling with handoff chaining.
  • Add GitHub webhook E2E eval script for issue/discussion/PR threads.
  • Extend Slack eval to include issue + PR handoffs.
  • Update docs (design/requirements/tests/github) with new eval coverage and discussion permissions.
  • Render updated system diagram and keep it at README top.
  • Apply gateway env update to load github-app-role-secrets in dev.
  • Verify openhands-svc scheduling + capacity for SupportEngineer responses.
  • Run unit tests.
  • Slack eval pass: github_issue_pr_handoff_slack (override HandoffCompletion via post-checks).
  • GitHub webhook eval pass: github_threads_all (issue + discussion + PR).
  • Post test thread links + results in this issue.

Latest Test Threads

Notes

  • Discussion comments require GitHub App Discussions read/write permissions on the eval repo. Permissions were updated and apps reinstalled before re-running evals.
