Iris: Add deep linking to citations in global search #506

Nayer-kotry wants to merge 4 commits into `main`.
- `LectureSearchAskRequestDTO`: add `artemisBaseUrl` and `authenticationToken` fields for async callback routing
- `LectureUnitInfo`: add `startTime` (`Optional[int]`) for video deep linking; null for slide-only units, seconds for video units
- `SearchAnswerStatusUpdateDTO`: new DTO for the two-phase Artemis callback; carries the `cited` flag, answer string, and sources list
- `SearchAnswerCallback`: sends two-phase HTTP POSTs to Artemis at `{artemisBaseUrl}/api/iris/internal/search/runs/{token}/status` with Bearer token auth; handles both `cited=false` and `cited=true` payloads
- `/api/v1/search/ask`: returns HTTP 202 immediately and runs the pipeline in a background thread; propagates `request_id` for log correlation
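The callback target and payload above can be sketched as follows; the URL template and `cited` flag come from this PR, but the helper names and exact payload keys are assumptions, not the real schema:

```python
# Sketch of the two-phase Artemis status callback (helper names and payload
# keys are illustrative; only the URL template is taken from the PR).
def build_status_url(artemis_base_url: str, token: str) -> str:
    """Build the Artemis callback URL for a given run token."""
    base = artemis_base_url.rstrip("/")
    return f"{base}/api/iris/internal/search/runs/{token}/status"

def build_status_payload(answer: str, cited: bool, sources=None) -> dict:
    """Phase 1 sends cited=false with a skeleton answer; phase 2 sends cited=true."""
    return {"cited": cited, "answer": answer, "sources": sources or []}
```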
Phase 1: HyDE retrieval + answer generation with `[cite-loading:N]` skeleton markers embedded by the LLM, replaced with unit names post-generation, then sent to Artemis immediately with `cited=false`.

Phase 2: CitationPipeline enriches the answer with full `[cite:L:unit_id:page:start::keyword:summary]` markers, sent with `cited=true`. Sources with `startTime` map to `LectureTranscriptionRetrievalDTO` (timestamp citations); slide-only sources map to `LectureUnitPageChunkRetrievalDTO`.

Updated `answer_system_prompt` to instruct the LLM to embed `[cite-loading:<N>]` placeholders per sentence rather than suppressing citations entirely.
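The phase-1 marker replacement described above might look like this minimal sketch; the function name and the unit-name mapping are illustrative, only the `[cite-loading:N]` marker format comes from the PR:

```python
# Replace LLM-emitted [cite-loading:N] skeleton markers with unit names
# before the cited=false callback (a sketch; names are assumptions).
import re

_CITE_LOADING = re.compile(r"\[cite-loading:(\d+)\]")

def replace_skeleton_markers(answer: str, unit_names: dict[int, str]) -> str:
    """Swap each [cite-loading:N] for the name of source N; drop unknown Ns."""
    def _sub(match: re.Match) -> str:
        return unit_names.get(int(match.group(1)), "")
    return _CITE_LOADING.sub(_sub, answer)
```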
After segment retrieval, batch-fetches LectureTranscriptions filtered by all `lecture_unit_ids` in the result set, then matches them to `(unit_id, page_number)` pairs in Python. Takes the earliest `segment_start_time` per pair (the first moment the professor spoke over that slide) and populates it on `LectureUnitInfo` as `startTime`. Applies to both `/api/v1/search/lectures` and `/api/v1/search/ask` responses, giving Artemis the data needed to construct deep links with both page and timestamp query params.
📝 Walkthrough

This pull request transforms the lecture search-answer feature from synchronous to asynchronous processing. It adds authentication and timing fields to DTOs, implements background job handling with status callbacks, integrates citation generation, and updates the API endpoint to return HTTP 202 Accepted with an immediate response.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as /api/v1/search/ask
    participant Worker as Background Worker
    participant Pipeline as Answer Pipeline
    participant Citation as Citation Pipeline
    participant Artemis as Artemis Callback
    participant DB as Lecture Database
    Client->>API: POST (dto with auth token)
    API->>API: Create callback with run_id
    API-->>Client: HTTP 202 + token
    API->>Worker: spawn thread
    activate Worker
    Worker->>DB: fetch lecture search results
    Worker->>Pipeline: __call__(dto, callback)
    activate Pipeline
    Pipeline->>Pipeline: detect & replace [cite-loading:N]
    Pipeline->>Artemis: callback.send(phase-1: skeleton)
    deactivate Pipeline
    activate Pipeline
    Pipeline->>Citation: process citations
    Pipeline->>Pipeline: strip skeleton, generate full [cite:L:...]
    Pipeline->>Artemis: callback.send(phase-2: final answer)
    deactivate Pipeline
    Pipeline-->>Worker: complete
    deactivate Worker
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/src/iris/pipeline/lecture_search_answer_pipeline.py`:
- Around line 215-220: The citation generation is hardcoded to English; update
the call to self.citation_pipeline in the cited_answer assignment to pass the
detected/request language instead of the literal "en" (e.g.,
user_language=retrieval_dto.request_language or retrieval_dto.user_language or
the pipeline's detected language variable), and keep a safe fallback to "en" if
that field is missing; ensure you update the parameters for
self.citation_pipeline(citation args: retrieval_dto, clean_answer,
InformationType.PARAGRAPHS, variant="default", user_language=...) so the
citation payload language matches the student's/request language.
In `@iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py`:
- Around line 129-148: The current query to transcription_collection
(fetch_objects with limit=500) can drop matching transcriptions for large units;
change the implementation in lecture_global_search_retrieval so you either page
through all results from transcription_collection.query.fetch_objects (looping
with the collection's pagination/cursor until no more objects) or replace it
with an aggregate/min query that returns the minimum SEGMENT_START_TIME per
(LECTURE_UNIT_ID, PAGE_NUMBER) pair; ensure you still filter with
Filter.by_property(LectureTranscriptionSchema.LECTURE_UNIT_ID.value).contains_any(unit_ids)
and then populate the times dict for every requested (uid, page) key (the
variables/fields to touch: fetch_objects call,
LectureTranscriptionSchema.SEGMENT_START_TIME/LECTURE_UNIT_ID/PAGE_NUMBER, the
times dict and requested check).
In `@iris/src/iris/web/routers/search.py`:
- Around line 40-43: The code currently passes untrusted dto.artemis_base_url
and dto.authentication_token directly into SearchAnswerCallback (run_id,
base_url), creating an SSRF/unauthorized-post risk; change this to obtain the
callback URL and bearer token from trusted server-side sources instead: do not
use dto.artemis_base_url or dto.authentication_token directly in constructing
SearchAnswerCallback—derive base_url from your app config/allowlist or a
server-only mapping (e.g., get_callback_url_for_tenant(session.tenant) or an
explicit allowlist check) and retrieve the auth token from server
secrets/session state (e.g., get_service_auth_token(session.user) or
config-based token), or validate dto.artemis_base_url against a strict allowlist
and reject any other values before calling SearchAnswerCallback.
- Around line 69-73: The current handler spawns a raw Thread per request
(Thread(..., target=_run_search_ask_worker, args=(dto, request_id))) which can
exhaust resources; replace this with a bounded background worker model: create
or reuse a module- or app-scoped ThreadPoolExecutor (or an asyncio background
queue/worker) with a fixed max_workers, submit the job via
executor.submit(_run_search_ask_worker, dto, request_id) (or push dto/request_id
to a background queue consumed by worker coroutines), and ensure proper
shutdown/cleanup of the executor (or queue workers) on application stop; update
the request handler to enqueue/submit instead of Thread.start() so requests
won’t spawn unbounded OS threads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 69dfdd93-d6c6-46da-8dba-1ceaadd7b873
📒 Files selected for processing (7)
- iris/src/iris/domain/search/lecture_search_dto.py
- iris/src/iris/domain/status/search_answer_status_update_dto.py
- iris/src/iris/pipeline/lecture_search_answer_pipeline.py
- iris/src/iris/pipeline/prompts/lecture_search_prompts.py
- iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py
- iris/src/iris/web/routers/search.py
- iris/src/iris/web/status/status_update.py
```python
if not used_sources:
    return

# ── Phase 2: enrich with full citations ──────────────────────────────────────

# Step 6: Strip skeleton markers → clean prose for CitationPipeline input
clean_answer = _CITE_LOADING_PATTERN.sub("", answer_with_names).strip()

# Step 7: Adapt used_sources to LectureRetrievalDTO (CitationPipeline expects this)
retrieval_dto = _to_lecture_retrieval_dto(used_sources)

# Step 8: Run CitationPipeline — adds [cite:L:unit_id:page:::keyword:summary] markers
try:
    cited_answer = self.citation_pipeline(
        retrieval_dto,
        clean_answer,
        InformationType.PARAGRAPHS,
        variant="default",
        user_language="en",
    )
except Exception as e:
    logger.error(
        "CitationPipeline failed, skipping phase 2 callback", exc_info=e
    )
    return
```
Every successful run still needs a terminal callback.
These branches return after sending only the interim cited=False payload. If used_sources is empty or CitationPipeline fails, Artemis never receives a final state, so the run can sit in a perpetual loading/timeout state with skeleton citations still rendered. Send a terminal payload in these cases as well—e.g. reuse the cleaned phase-1 answer with cited=True, or explicitly signal failure via the callback contract.
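One hedged sketch of that suggestion, falling back to the cleaned phase-1 answer as the terminal payload; `callback.send` and its keyword arguments are assumptions about the callback contract, not the actual interface:

```python
# Guarantee a terminal cited=True callback even when enrichment fails
# (callback.send and its arguments are assumed, not the real contract).
import logging

logger = logging.getLogger(__name__)

def finish_run(callback, clean_answer, citation_pipeline, *cite_args):
    """Always deliver a terminal cited=True payload, even if enrichment fails."""
    try:
        final_answer = citation_pipeline(*cite_args)
    except Exception:
        logger.exception("CitationPipeline failed; sending phase-1 answer as final")
        final_answer = clean_answer
    callback.send(answer=final_answer, cited=True)
```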
```python
cited_answer = self.citation_pipeline(
    retrieval_dto,
    clean_answer,
    InformationType.PARAGRAPHS,
    variant="default",
    user_language="en",
)
```
Citation generation is hardcoded to English.
answer_system_prompt explicitly requires matching the student's language, but phase 2 always calls CitationPipeline(..., user_language="en"). For non-English queries, the citation keyword/summary payload will be generated in the wrong language even when phase 1 answered correctly. Pass the detected/request language through instead of forcing English.
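A possible fallback along those lines, assuming the request DTO exposes an optional language field; the attribute name `user_language` is illustrative:

```python
# Prefer the request's language over a hardcoded "en"
# (the DTO attribute name here is an assumption).
def resolve_citation_language(dto, default: str = "en") -> str:
    """Return the request's language, falling back to English when missing."""
    return getattr(dto, "user_language", None) or default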
```python
transcriptions = self.transcription_collection.query.fetch_objects(
    filters=Filter.by_property(
        LectureTranscriptionSchema.LECTURE_UNIT_ID.value
    ).contains_any(unit_ids),
    limit=500,
).objects
times: dict[tuple[int, int], int] = {}
for t in transcriptions:
    props = t.properties
    uid = props.get(LectureTranscriptionSchema.LECTURE_UNIT_ID.value)
    page = props.get(LectureTranscriptionSchema.PAGE_NUMBER.value)
    start = props.get(LectureTranscriptionSchema.SEGMENT_START_TIME.value)
    if uid is None or page is None or start is None:
        continue
    key = (uid, page)
    if key not in requested:
        continue
    if key not in times or start < times[key]:
        times[key] = int(start)
return times
```
limit=500 can silently drop deep-link timestamps.
This lookup fetches all transcriptions for the selected lecture units and then computes the earliest (unit_id, page_number) match in Python, but it stops after 500 objects. Large units can easily exceed that, so some requested pairs will never be seen and startTime will be missing/non-deterministic even though matching transcriptions exist. Please page through the full result set or query the minimum timestamp per requested pair instead.
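The paging half of that suggestion can be sketched generically as below; `fetch_page` stands in for the real Weaviate query (its signature is an assumption), while the per-pair minimum logic mirrors the snippet above:

```python
# Page through the full result set and keep the minimum start time per
# requested (unit_id, page) pair. fetch_page is a stand-in for the real
# database query; its (limit, offset) signature is an assumption.
from typing import Callable

def earliest_start_times(
    fetch_page: Callable[[int, int], list[tuple[int, int, int]]],
    requested: set[tuple[int, int]],
    batch_size: int = 500,
) -> dict[tuple[int, int], int]:
    """Return the min segment start time per requested (unit_id, page) key."""
    times: dict[tuple[int, int], int] = {}
    offset = 0
    while True:
        rows = fetch_page(batch_size, offset)  # (unit_id, page, start) tuples
        for uid, page, start in rows:
            key = (uid, page)
            if key in requested and (key not in times or start < times[key]):
                times[key] = start
        if len(rows) < batch_size:  # short page means no more results
            break
        offset += batch_size
    return times
```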
```python
callback = SearchAnswerCallback(
    run_id=dto.authentication_token,
    base_url=dto.artemis_base_url,
)
```
Do not let the request body choose the callback URL and bearer token.
dto.artemis_base_url and dto.authentication_token are used verbatim for an outbound POST. That gives any caller who can hit this endpoint a server-side request forgery primitive and lets them make Iris send search results to an arbitrary URL under an attacker-chosen Authorization header. Derive the callback target from trusted server-side config/session state, or at minimum enforce a strict allowlist before constructing SearchAnswerCallback.
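A minimal allowlist check along those lines might look like this; the allowed host set would come from server-side config in practice, and the value shown is illustrative:

```python
# Reject callback base URLs whose scheme or host is not explicitly trusted
# (the allowlist contents are an illustrative placeholder for real config).
from urllib.parse import urlparse

ALLOWED_CALLBACK_HOSTS = {"artemis.example.org"}  # would come from config

def validate_callback_url(base_url: str) -> str:
    """Raise ValueError unless base_url is https and on the allowlist."""
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_CALLBACK_HOSTS:
        raise ValueError(f"Callback URL not allowed: {base_url!r}")
    return base_url
```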
```python
thread = Thread(
    target=_run_search_ask_worker,
    args=(dto, request_id),
)
thread.start()
```
A raw thread per request will fall over under load.
Each /ask request now creates an unbounded OS thread for work that can run for seconds and may block on 200s callback timeouts. A burst of requests can exhaust the app workers long before the LLM/retrieval stack is saturated. Please move this onto a bounded executor/queue/background worker instead of spawning threads directly from the request handler.
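A sketch of the bounded-worker suggestion: a module-scoped `ThreadPoolExecutor` with a fixed `max_workers` instead of one raw `Thread` per request. The pool size and helper name are assumptions; only the worker signature mirrors the snippet above:

```python
# Submit background search jobs to a shared, bounded pool instead of
# spawning one OS thread per request (pool size is an assumed default).
from concurrent.futures import ThreadPoolExecutor

_search_executor = ThreadPoolExecutor(max_workers=8, thread_name_prefix="search-ask")

def enqueue_search_ask(worker, dto, request_id):
    """Submit the job to the shared pool; returns a Future for optional tracking."""
    return _search_executor.submit(worker, dto, request_id)
```

On application shutdown the pool should be drained with `_search_executor.shutdown(wait=True)` so in-flight callbacks can complete.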
Motivation
Ask Iris returned answers with sources but no inline citations and no deep linking. Source cards opened the lecture page without navigating to the relevant slide or video moment.
Summary
The `/api/v1/search/ask` endpoint now returns 202 and pushes two callbacks to Artemis. The first arrives fast with a plain answer and skeleton citation markers so the UI can show loading bubbles immediately. The second replaces them with real inline citation tags once the citation pipeline finishes.

Search results now include `startTime`, the earliest video timestamp for that slide, fetched from the transcript segments. Artemis can use this alongside the page number to deep link into both the PDF and the video. Slide-only units get `startTime: null`. This applies to both the search and ask endpoints.

Test plan
Artemis counterpart: ls1intum/Artemis#12522
Closes IRIS-289