
Iris: Add deep linking to citations in global search #506

Draft
Nayer-kotry wants to merge 4 commits into main from
iris/feature/ask-iris-citations-and-deep-linking

Conversation


@Nayer-kotry Nayer-kotry commented Apr 13, 2026

Motivation

Ask Iris returned answers with sources but no inline citations and no deep linking. Source cards opened the lecture page without navigating to the relevant slide or video moment.

Summary

The /api/v1/search/ask endpoint now returns 202 and pushes two callbacks to Artemis. The first arrives fast with a plain answer and skeleton citation markers so the UI can show loading bubbles immediately. The second replaces them with real inline citation tags once the citation pipeline finishes.

Search results now include startTime — the earliest video timestamp for that slide, fetched from the transcript segments. Artemis can use this alongside the page number to deep link into both the PDF and the video. Slide-only units get startTime: null. This applies to both the search and ask endpoints.

Test plan

  • POST /api/v1/search/ask returns 202 immediately
  • First callback arrives fast with skeleton markers and sources
  • Second callback has full citation tags with page and timestamp where available
  • Slide-only results have null startTime, video results have seconds
  • Source cards and citations deep link to the right place

Artemis counterpart: ls1intum/Artemis#12522

Closes IRIS-289

- LectureSearchAskRequestDTO: add artemisBaseUrl and authenticationToken
  fields for async callback routing
- LectureUnitInfo: add startTime (Optional[int]) for video deep linking —
  null for slide-only units, seconds for video units
- SearchAnswerStatusUpdateDTO: new DTO for the two-phase Artemis callback;
  carries cited flag, answer string, and sources list
- SearchAnswerCallback: sends two-phase HTTP POSTs to Artemis at
  {artemisBaseUrl}/api/iris/internal/search/runs/{token}/status
  with Bearer token auth; handles both cited=false and cited=true payloads
- /api/v1/search/ask: returns HTTP 202 immediately, runs pipeline in
  background thread; propagates request_id for log correlation
Phase 1: HyDE retrieval + answer generation with [cite-loading:N] skeleton
markers embedded by the LLM, replaced with unit names post-generation,
then sent to Artemis immediately with cited=false.
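The phase-1 marker swap could look like this sketch. The [cite-loading:N] pattern comes from the commit message; the function name and the index-to-unit-name mapping are illustrative assumptions:

```python
import re

_CITE_LOADING = re.compile(r"\[cite-loading:(\d+)\]")

def replace_with_unit_names(answer: str, unit_names: list[str]) -> str:
    """Swap skeleton markers for the retrieved unit names; drop out-of-range indices."""
    def repl(m: re.Match) -> str:
        idx = int(m.group(1))
        return f"[{unit_names[idx]}]" if idx < len(unit_names) else ""
    return _CITE_LOADING.sub(repl, answer)

print(replace_with_unit_names(
    "Dijkstra relaxes edges greedily.[cite-loading:0]",
    ["Shortest Paths"],
))
```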

Phase 2: CitationPipeline enriches the answer with full
[cite:L:unit_id:page:start::keyword:summary] markers, sent with cited=true.
Sources with startTime map to LectureTranscriptionRetrievalDTO (timestamp
citations); slide-only sources map to LectureUnitPageChunkRetrievalDTO.

Updated answer_system_prompt to instruct the LLM to embed [cite-loading:<N>]
placeholders per sentence rather than suppressing citations entirely.
…ults

After segment retrieval, batch-fetches LectureTranscriptions filtered by
all lecture_unit_ids in the result set, then matches to (unit_id, page_number)
pairs in Python. Takes the earliest segment_start_time per pair (the first
moment the professor spoke over that slide) and populates it on LectureUnitInfo
as startTime.

Applies to both /api/v1/search/lectures and /api/v1/search/ask responses,
giving Artemis the data needed to construct deep links with both page and
timestamp query params.
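The "earliest segment per (unit, page)" grouping described above reduces to a minimum over tuples. A minimal sketch over plain data (the tuple layout is illustrative, not the real segment schema):

```python
def earliest_start_times(
    segments: list[tuple[int, int, float]],  # (unit_id, page_number, segment_start_time)
) -> dict[tuple[int, int], int]:
    # Keep the smallest start time seen for each (unit_id, page_number) pair:
    # the first moment the professor spoke over that slide
    times: dict[tuple[int, int], int] = {}
    for unit_id, page, start in segments:
        key = (unit_id, page)
        if key not in times or start < times[key]:
            times[key] = int(start)
    return times

print(earliest_start_times([(1, 3, 94.2), (1, 3, 12.0), (2, 1, 5.5)]))
```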
@Nayer-kotry Nayer-kotry requested a review from a team as a code owner April 13, 2026 01:37
@github-actions github-actions bot added the iris label Apr 13, 2026
coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

This pull request transforms the lecture search-answer feature from synchronous to asynchronous processing. It adds authentication and timing fields to DTOs, implements background job handling with status callbacks, integrates citation generation, and updates the API endpoint to return HTTP 202 Accepted immediately.

Changes

  • Data Transfer Objects (iris/src/iris/domain/search/lecture_search_dto.py, iris/src/iris/domain/status/search_answer_status_update_dto.py):
    Added start_time field to LectureUnitInfo for timing info. Enhanced LectureSearchAskRequestDTO with artemis_base_url and authentication_token fields for async callback routing. Introduced new SearchAnswerStatusUpdateDTO for two-phase status update payloads.
  • Answer Pipeline & Prompts (iris/src/iris/pipeline/lecture_search_answer_pipeline.py, iris/src/iris/pipeline/prompts/lecture_search_prompts.py):
    Refactored pipeline to accept DTO and callback instead of raw query, removing direct response return. Added citation placeholder handling with skeleton markers ([cite-loading:N]), CitationPipeline integration, and two-phase callback sends. Updated prompt to require inline citation placeholders after sentences.
  • Data Retrieval (iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py):
    Integrated transcription schema to fetch segment start times during hybrid search, mapping them into LectureUnitInfo objects for UI timeline support.
  • Endpoint & Callbacks (iris/src/iris/web/routers/search.py, iris/src/iris/web/status/status_update.py):
    Converted /api/v1/search/ask endpoint to async (HTTP 202 response) with background worker thread. Added SearchAnswerCallback class to post status updates to Artemis with bearer token authentication.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as /api/v1/search/ask
    participant Worker as Background Worker
    participant Pipeline as Answer Pipeline
    participant Citation as Citation Pipeline
    participant Artemis as Artemis Callback
    participant DB as Lecture Database

    Client->>API: POST (dto with auth token)
    API->>API: Create callback with run_id
    API-->>Client: HTTP 202 + token
    API->>Worker: spawn thread
    
    activate Worker
    Worker->>DB: fetch lecture search results
    Worker->>Pipeline: __call__(dto, callback)
    
    activate Pipeline
    Pipeline->>Pipeline: detect & replace [cite-loading:N]
    Pipeline->>Artemis: callback.send(phase-1: skeleton)
    deactivate Pipeline
    
    activate Pipeline
    Pipeline->>Citation: process citations
    Pipeline->>Pipeline: strip skeleton, generate full [cite:L:...]
    Pipeline->>Artemis: callback.send(phase-2: final answer)
    deactivate Pipeline
    
    Pipeline-->>Worker: complete
    deactivate Worker

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hark, the search now springs forth free,
With callbacks dancing—one, then three!
Citations bloom like carrots sweet,
Two phases hop, the loop's complete!
Async whispers speed the race,
While tokens pass with graceful pace. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 76.47%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed: The title clearly and specifically describes the main objective: adding deep linking to citations in global search, which aligns with the core changes implementing two-phase citation responses and video timestamp integration.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.



@Nayer-kotry Nayer-kotry marked this pull request as draft April 13, 2026 01:38
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 69dfdd93-d6c6-46da-8dba-1ceaadd7b873

📥 Commits

Reviewing files that changed from the base of the PR and between ed561ba and 74a9f6f.

📒 Files selected for processing (7)
  • iris/src/iris/domain/search/lecture_search_dto.py
  • iris/src/iris/domain/status/search_answer_status_update_dto.py
  • iris/src/iris/pipeline/lecture_search_answer_pipeline.py
  • iris/src/iris/pipeline/prompts/lecture_search_prompts.py
  • iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py
  • iris/src/iris/web/routers/search.py
  • iris/src/iris/web/status/status_update.py

Comment on lines +202 to +226
        if not used_sources:
            return

        # ── Phase 2: enrich with full citations ──────────────────────────────────────

        # Step 6: Strip skeleton markers → clean prose for CitationPipeline input
        clean_answer = _CITE_LOADING_PATTERN.sub("", answer_with_names).strip()

        # Step 7: Adapt used_sources to LectureRetrievalDTO (CitationPipeline expects this)
        retrieval_dto = _to_lecture_retrieval_dto(used_sources)

        # Step 8: Run CitationPipeline — adds [cite:L:unit_id:page:::keyword:summary] markers
        try:
            cited_answer = self.citation_pipeline(
                retrieval_dto,
                clean_answer,
                InformationType.PARAGRAPHS,
                variant="default",
                user_language="en",
            )
        except Exception as e:
            logger.error(
                "CitationPipeline failed, skipping phase 2 callback", exc_info=e
            )
            return

⚠️ Potential issue | 🟠 Major

Every successful run still needs a terminal callback.

These branches return after sending only the interim cited=False payload. If used_sources is empty or CitationPipeline fails, Artemis never receives a final state, so the run can sit in a perpetual loading/timeout state with skeleton citations still rendered. Send a terminal payload in these cases as well—e.g. reuse the cleaned phase-1 answer with cited=True, or explicitly signal failure via the callback contract.
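A sketch of the fix this comment asks for: every exit path emits a terminal payload, reusing the cleaned phase-1 answer when enrichment is impossible. The function and callback signatures mirror the PR loosely but are assumptions:

```python
def run_phase_2(callback, citation_pipeline, used_sources, clean_answer):
    # No sources or no pipeline: promote the phase-1 answer to the terminal state
    # so Artemis never waits forever on skeleton citations
    if not used_sources or citation_pipeline is None:
        callback.send(answer=clean_answer, cited=True, sources=used_sources or [])
        return
    try:
        cited = citation_pipeline(clean_answer)
    except Exception:
        # Enrichment failed: still send a terminal payload with the plain answer
        callback.send(answer=clean_answer, cited=True, sources=used_sources)
        return
    callback.send(answer=cited, cited=True, sources=used_sources)
```

Alternatively, the callback contract could gain an explicit failure signal; either way the run reaches a final state.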

Comment on lines +215 to +220
            cited_answer = self.citation_pipeline(
                retrieval_dto,
                clean_answer,
                InformationType.PARAGRAPHS,
                variant="default",
                user_language="en",

⚠️ Potential issue | 🟡 Minor

Citation generation is hardcoded to English.

answer_system_prompt explicitly requires matching the student's language, but phase 2 always calls CitationPipeline(..., user_language="en"). For non-English queries, the citation keyword/summary payload will be generated in the wrong language even when phase 1 answered correctly. Pass the detected/request language through instead of forcing English.
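One small way to implement this suggestion, with a safe English fallback. The user_language attribute name on the DTO is an assumption:

```python
def resolve_citation_language(dto, fallback: str = "en") -> str:
    # Prefer the language carried by the request; fall back only when missing
    lang = getattr(dto, "user_language", None)
    return lang if lang else fallback
```

The phase-2 call would then pass user_language=resolve_citation_language(dto) instead of the literal "en".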

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/pipeline/lecture_search_answer_pipeline.py` around lines 215 -
220, The citation generation is hardcoded to English; update the call to
self.citation_pipeline in the cited_answer assignment to pass the
detected/request language instead of the literal "en" (e.g.,
user_language=retrieval_dto.request_language or retrieval_dto.user_language or
the pipeline's detected language variable), and keep a safe fallback to "en" if
that field is missing; ensure you update the parameters for
self.citation_pipeline(citation args: retrieval_dto, clean_answer,
InformationType.PARAGRAPHS, variant="default", user_language=...) so the
citation payload language matches the student's/request language.

Comment on lines +129 to +148
        transcriptions = self.transcription_collection.query.fetch_objects(
            filters=Filter.by_property(
                LectureTranscriptionSchema.LECTURE_UNIT_ID.value
            ).contains_any(unit_ids),
            limit=500,
        ).objects
        times: dict[tuple[int, int], int] = {}
        for t in transcriptions:
            props = t.properties
            uid = props.get(LectureTranscriptionSchema.LECTURE_UNIT_ID.value)
            page = props.get(LectureTranscriptionSchema.PAGE_NUMBER.value)
            start = props.get(LectureTranscriptionSchema.SEGMENT_START_TIME.value)
            if uid is None or page is None or start is None:
                continue
            key = (uid, page)
            if key not in requested:
                continue
            if key not in times or start < times[key]:
                times[key] = int(start)
        return times

⚠️ Potential issue | 🟠 Major

limit=500 can silently drop deep-link timestamps.

This lookup fetches all transcriptions for the selected lecture units and then computes the earliest (unit_id, page_number) match in Python, but it stops after 500 objects. Large units can easily exceed that, so some requested pairs will never be seen and startTime will be missing/non-deterministic even though matching transcriptions exist. Please page through the full result set or query the minimum timestamp per requested pair instead.
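The paging variant of the fix reduces to a simple offset loop. This sketch abstracts the Weaviate query behind a fetch_page callable, since the real client's pagination interface (cursor vs. offset) may differ:

```python
def fetch_all(fetch_page, page_size: int = 500) -> list:
    # Keep requesting pages until a short page signals the end of the result set,
    # instead of truncating at a single limit=500 fetch
    results, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results.extend(page)
        if len(page) < page_size:
            return results
        offset += page_size
```

An aggregate min-per-group query would avoid transferring the segments entirely, at the cost of a more database-specific implementation.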

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py` around
lines 129 - 148, The current query to transcription_collection (fetch_objects
with limit=500) can drop matching transcriptions for large units; change the
implementation in lecture_global_search_retrieval so you either page through all
results from transcription_collection.query.fetch_objects (looping with the
collection's pagination/cursor until no more objects) or replace it with an
aggregate/min query that returns the minimum SEGMENT_START_TIME per
(LECTURE_UNIT_ID, PAGE_NUMBER) pair; ensure you still filter with
Filter.by_property(LectureTranscriptionSchema.LECTURE_UNIT_ID.value).contains_any(unit_ids)
and then populate the times dict for every requested (uid, page) key (the
variables/fields to touch: fetch_objects call,
LectureTranscriptionSchema.SEGMENT_START_TIME/LECTURE_UNIT_ID/PAGE_NUMBER, the
times dict and requested check).

Comment on lines +40 to +43
    callback = SearchAnswerCallback(
        run_id=dto.authentication_token,
        base_url=dto.artemis_base_url,
    )

⚠️ Potential issue | 🔴 Critical

Do not let the request body choose the callback URL and bearer token.

dto.artemis_base_url and dto.authentication_token are used verbatim for an outbound POST. That gives any caller who can hit this endpoint a server-side request forgery primitive and lets them make Iris send search results to an arbitrary URL under an attacker-chosen Authorization header. Derive the callback target from trusted server-side config/session state, or at minimum enforce a strict allowlist before constructing SearchAnswerCallback.
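A minimal allowlist check along the lines the reviewer proposes. The allowlist set and https-only rule here are assumptions; in practice both would come from server-side configuration:

```python
from urllib.parse import urlparse

# Hypothetical config value; in a real deployment this comes from app settings
ALLOWED_CALLBACK_HOSTS = {"artemis.example.org"}

def validate_callback_url(base_url: str) -> str:
    # Reject anything that is not https to an explicitly trusted host,
    # closing the SSRF primitive before the outbound POST is constructed
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_CALLBACK_HOSTS:
        raise ValueError(f"Callback host not allowed: {base_url!r}")
    return base_url
```

Sourcing the bearer token from server secrets rather than the request body would close the second half of the issue.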

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/web/routers/search.py` around lines 40 - 43, The code currently
passes untrusted dto.artemis_base_url and dto.authentication_token directly into
SearchAnswerCallback (run_id, base_url), creating an SSRF/unauthorized-post
risk; change this to obtain the callback URL and bearer token from trusted
server-side sources instead: do not use dto.artemis_base_url or
dto.authentication_token directly in constructing SearchAnswerCallback—derive
base_url from your app config/allowlist or a server-only mapping (e.g.,
get_callback_url_for_tenant(session.tenant) or an explicit allowlist check) and
retrieve the auth token from server secrets/session state (e.g.,
get_service_auth_token(session.user) or config-based token), or validate
dto.artemis_base_url against a strict allowlist and reject any other values
before calling SearchAnswerCallback.

Comment on lines +69 to +73
    thread = Thread(
        target=_run_search_ask_worker,
        args=(dto, request_id),
    )
    thread.start()

⚠️ Potential issue | 🟠 Major

A raw thread per request will fall over under load.

Each /ask request now creates an unbounded OS thread for work that can run for seconds and may block on 200s callback timeouts. A burst of requests can exhaust the app workers long before the LLM/retrieval stack is saturated. Please move this onto a bounded executor/queue/background worker instead of spawning threads directly from the request handler.
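The bounded-worker version is a small change: a module-scoped executor with a fixed pool instead of one raw Thread per request. The max_workers value and the worker signature are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Module-scoped pool: excess jobs queue instead of spawning unbounded OS threads
_SEARCH_ASK_EXECUTOR = ThreadPoolExecutor(max_workers=8, thread_name_prefix="search-ask")

def enqueue_search_ask(worker, dto, request_id):
    # Drop-in replacement for Thread(...).start() in the request handler
    return _SEARCH_ASK_EXECUTOR.submit(worker, dto, request_id)
```

The executor should also be shut down on application stop (e.g. in the app's lifespan/shutdown hook) so in-flight callbacks can finish cleanly.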

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iris/src/iris/web/routers/search.py` around lines 69 - 73, The current
handler spawns a raw Thread per request (Thread(...,
target=_run_search_ask_worker, args=(dto, request_id))) which can exhaust
resources; replace this with a bounded background worker model: create or reuse
a module- or app-scoped ThreadPoolExecutor (or an asyncio background
queue/worker) with a fixed max_workers, submit the job via
executor.submit(_run_search_ask_worker, dto, request_id) (or push dto/request_id
to a background queue consumed by worker coroutines), and ensure proper
shutdown/cleanup of the executor (or queue workers) on application stop; update
the request handler to enqueue/submit instead of Thread.start() so requests
won’t spawn unbounded OS threads.
