Conversation
Try out this PR

Quick install:
pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@a4449ec5b350b6eb31b91a877a2e14537f1a4cfd

Recommended with virtual environment (using uv):
uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@a4449ec5b350b6eb31b91a877a2e14537f1a4cfd

Last updated for commit:
Note: Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
No actionable comments were generated in the recent review. 🎉

Walkthrough

Adds a new synthetic-dataset tutorial covering text, image, and audio generation (video redirected), updates the vision tutorial (year, readiness probe loop, HTML-commented profiling/setup sections), and exports FFmpeg paths in the Dockerfile env-builder stage.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/tutorials/synthetic-dataset.md`:
- Around line 93-106: The image example code block for the aiperf profile
invocation is missing the CI test markers used elsewhere; update the
synthetic-dataset.md image section by adding the appropriate <!-- aiperf-run-...
--> marker pair around the image bash block to match the text and audio examples
(or, if images are tested differently, add a short comment immediately above the
block explaining the alternate testing mechanism). Locate the image block (the
bash snippet invoking aiperf with --model Qwen/Qwen2-VL-2B-Instruct and image-*
flags) and either wrap it with the same marker format used for the other
sections or add the explanatory comment so the documentation and CI behavior are
consistent.
- Around line 40-51: The two aiperf code-block tags are identical; update the
prefix-synthesis block so its HTML comment markers use a distinct tag: change
both the opening comment <!-- aiperf-run-vllm-default-openai-endpoint-server -->
and the closing <!-- /aiperf-run-vllm-default-openai-endpoint-server --> that
surround the prefix example to <!--
aiperf-run-vllm-prefix-openai-endpoint-server --> and <!--
/aiperf-run-vllm-prefix-openai-endpoint-server --> respectively so the opening
and closing markers match the new tag name.
Codecov Report: ✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know!
ajcasagrande
left a comment
The content looks great, but I think there are potentially some gaps.
docs/tutorials/synthetic-dataset.md (Outdated)

---

## Random Pool Dataset Loader

The `random_pool` dataset type allows custom data with random sampling:

### Use Cases

1. **Single File**: All entries form one pool, randomly sampled for each request
2. **Directory**: Each file becomes a separate pool, merged into conversations

### Single File Example

Create `prompts.jsonl`:

```jsonl
{"text": "Who are you?", "image": "/path/to/image1.png"}
{"text": "Explain quantum mechanics.", "image": "/path/to/image2.png"}
{"text": "What is the capital of France?", "image": "/path/to/image3.png"}
```

Run benchmark:

```bash
aiperf profile \
  --model your-model \
  --url localhost:8000 \
  --endpoint-type chat \
  --input-file prompts.jsonl \
  --custom-dataset-type random_pool \
  --request-count 100
```

AIPerf randomly samples from the 3 entries to generate 100 requests (with replacement).

---
this seems out of place, as random pool is not synthetic
**Options:**
- `--synthetic-input-tokens-mean`: Mean token count (default: 550)
- `--synthetic-input-tokens-stddev`: Standard deviation for length variability (default: 0)
- `--random-seed`: Seed for reproducible prompt generation (default: None)
missing info about the sequence length distribution, which is a big part of text gen
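For reference while that gap is addressed, here is a minimal sketch of how the options quoted above combine with the flags from the earlier example. The flag names come from this PR's docs; the values (stddev 100, seed 42, the model name) are illustrative only, and this sketch does not cover the additional distribution controls the reviewer is asking to document:

```bash
# Illustrative values only: prompt lengths drawn around a mean of 550 tokens
# with stddev 100; the seed makes the generated prompts reproducible.
aiperf profile \
  --model your-model \
  --url localhost:8000 \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 550 \
  --synthetic-input-tokens-stddev 100 \
  --random-seed 42 \
  --request-count 100
```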
SPDX-License-Identifier: Apache-2.0
-->

# Synthetic Dataset Generation
Should we include details about text generation from hash_ids in trace datasets? Those generate text using the Shakespeare corpus as well.
I think the right place to add that is trace_replay.md instead of here, since we have a dedicated document for it.
Let me know your thoughts on it.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/tutorials/vision.md (1)
28-32: Consider using `/v1/models` for the readiness probe instead of `/v1/chat/completions`.

The current probe fires an actual inference POST (`/v1/chat/completions`) on every 2-second poll during the 15-minute startup window, which wastes GPU resources. `/health` only confirms the server process is running, not that the model is loaded. The idiomatic lightweight readiness check for vLLM that verifies the model is loaded is a GET against `/v1/models`.

✏️ Suggested lighter readiness probe
````diff
-<!-- health-check-vllm-vision-openai-endpoint-server -->
-```bash
-timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen2-VL-2B-Instruct\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
-```
-<!-- /health-check-vllm-vision-openai-endpoint-server -->
+<!-- health-check-vllm-vision-openai-endpoint-server -->
+```bash
+timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
+```
+<!-- /health-check-vllm-vision-openai-endpoint-server -->
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/vision.md` around lines 28 - 32, The readiness probe currently posts to /v1/chat/completions (causing expensive inferences); change it to perform a lightweight GET against /v1/models instead. In the docs/tutorials/vision.md snippet that defines the health-check block (<!-- health-check-vllm-vision-openai-endpoint-server -->) replace the loop curl target from POSTing to /v1/chat/completions to a simple GET to /v1/models so the loop checks status code 200 without triggering model inference.

docs/tutorials/synthetic-dataset.md (1)
27-27: Prefer `####` subsection headings under `##` per repo convention.

All `### How It Works` / `### Configuration` headings throughout this file (lines 27, 38, 61, 84, 94, 127, 143) use h3 under h2. Based on learnings, the maintainer prefers h4 (`####`) for subsections under h2 headings for better visual sizing.

✏️ Example fix (apply to all ### subsections)

```diff
-### How It Works
+#### How It Works

-### Configuration
+#### Configuration

-### Advanced: Prefix Synthesis
+#### Advanced: Prefix Synthesis
```

Based on learnings: "In the aiperf repository's docs/metrics_reference.md file, the maintainer prefers using h4 headings (####) for subsections under h2 headings instead of h3 (###) for better visual sizing and readability."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/synthetic-dataset.md` at line 27, Update all subsection headings that currently use "###" under top-level "##" sections to use "####" instead for consistent visual sizing; specifically replace headings like "### How It Works" and "### Configuration" (and the other ### subsections at the listed locations) with "#### How It Works" and "#### Configuration" throughout the document so every h3 under an h2 becomes an h4, ensuring consistent styling across the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/tutorials/synthetic-dataset.md`:
- Around line 184-201: The CI marker currently bundles a privileged system
install inside the AIPerf execution block; move the `sudo apt-get install -y
ffmpeg` line out of the `<!-- aiperf-run-vllm-vision-openai-endpoint-server -->`
block and place it in a new or existing setup marker (e.g., `<!--
setup-vllm-vision-video-openai-endpoint-server -->`) before the aiperf block so
CI runs the setup step separately from the benchmark command, keeping the aiperf
block strictly for the `aiperf profile ...` command.
---
Nitpick comments:
In `@docs/tutorials/synthetic-dataset.md`:
- Line 27: Update all subsection headings that currently use "###" under
top-level "##" sections to use "####" instead for consistent visual sizing;
specifically replace headings like "### How It Works" and "### Configuration"
(and the other ### subsections at the listed locations) with "#### How It Works"
and "#### Configuration" throughout the document so every h3 under an h2 becomes
an h4, ensuring consistent styling across the file.
In `@docs/tutorials/vision.md`:
- Around line 28-32: The readiness probe currently posts to /v1/chat/completions
(causing expensive inferences); change it to perform a lightweight GET against
/v1/models instead. In the docs/tutorials/vision.md snippet that defines the
health-check block (<!-- health-check-vllm-vision-openai-endpoint-server -->)
replace the loop curl target from POSTing to /v1/chat/completions to a simple
GET to /v1/models so the loop checks status code 200 without triggering model
inference.
Force-pushed from a0751b3 to 5e6fd89
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/tutorials/synthetic-dataset.md`:
- Around line 184-201: The apt-get installation command is currently inside the
aiperf-run marker block; move that system-level prerequisite into its own setup
marker. Create a separate fenced bash block wrapped with <!--
setup-vllm-vision-video-openai-endpoint-server --> ... <!--
/setup-vllm-vision-video-openai-endpoint-server --> containing "apt-get update
&& apt-get install -y ffmpeg" and remove that line from the <!--
aiperf-run-vllm-vision-openai-endpoint-server --> block so the aiperf profile
invocation (the aiperf profile command) remains only inside the aiperf-run
marker.
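As a rough sketch of the split described above: the marker name and the ffmpeg install command are taken verbatim from the comment, while the benchmark flags are elided because they are not reproduced here:

````markdown
<!-- setup-vllm-vision-video-openai-endpoint-server -->
```bash
apt-get update && apt-get install -y ffmpeg
```
<!-- /setup-vllm-vision-video-openai-endpoint-server -->

<!-- aiperf-run-vllm-vision-openai-endpoint-server -->
```bash
# ...only the existing `aiperf profile ...` benchmark command remains here...
```
<!-- /aiperf-run-vllm-vision-openai-endpoint-server -->
````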
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
…ile for CI Signed-off-by: lkomali <lkomali@nvidia.com>
Force-pushed from d13e4d0 to 4639e88
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/tutorials/vision.md (1)
86-111: ⚠️ Potential issue | 🟡 Minor: Duplicate HTML comment marker (copy-paste error)

The marker `aiperf-run-vllm-vision-openai-endpoint-server` is used three times in the file: lines 40–53 (synthetic image profiling), lines 86–96 (custom JSONL creation), and lines 100–111 (run with custom input). Since each section serves a different purpose, the markers should be uniquely named to avoid confusion and prevent issues if these markers are ever consumed by CI/automation pipelines.

✏️ Proposed fix
````diff
-<!-- aiperf-run-vllm-vision-openai-endpoint-server -->
+<!-- create-custom-input-file-vllm-vision-openai-endpoint-server -->
 ```bash
 cat <<EOF > inputs.jsonl
 ...
 EOF
 ```

 Run AIPerf using the custom input file:

 ```bash
 aiperf profile \
 ...
 ```
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/tutorials/vision.md` around lines 86 - 111, The three HTML comment
markers named aiperf-run-vllm-vision-openai-endpoint-server are duplicated;
rename the closing marker after the custom JSONL block and the open/close
markers around the "Run AIPerf using the custom input file" block to unique
names so each section is distinct (e.g., change the closing marker after the
inputs.jsonl here and change the surrounding open/close markers for the run
block), updating the matching start/end comment pairs so the markers are unique
and paired correctly (refer to the markers
aiperf-run-vllm-vision-openai-endpoint-server, the closing comment after the
inputs.jsonl block, and the open/close comments around the aiperf profile run).

🧹 Nitpick comments (1)
docs/tutorials/vision.md (1)
28-32: Consider polling `/v1/models` instead of `/v1/chat/completions` for the readiness check

vLLM's `/health` endpoint only indicates that the server process is running, not that the model is loaded and ready to serve. The current approach polls `/v1/chat/completions` with `max_tokens: 1`, which does confirm the model is fully loaded, but it also triggers a real inference request on every 2-second poll iteration for up to 15 minutes, which is wasteful for a large VLM startup. `/v1/models` is the standard model-aware readiness endpoint recommended for startup probes, returning 200 only once the model is loaded.

✏️ Suggested alternative
```diff
-timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen2-VL-2B-Instruct\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
+timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/models)" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/vision.md` around lines 28 - 32, The readiness check currently polls the heavy inference endpoint '/v1/chat/completions' (see the curl command in the diff), causing repeated real inferences; change the probe to query the model-aware endpoint '/v1/models' instead so it returns 200 only when the model is loaded; update the bash timeout loop to curl localhost:8000/v1/models (no request body) and retain the same timeout/sleep behavior and error handling message so startup waits for model readiness without triggering inference.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@Dockerfile`:
- Around line 119-121: The Dockerfile's ENV line sets LD_LIBRARY_PATH with a trailing colon by expanding a possibly-empty ${LD_LIBRARY_PATH}, which appends the CWD to the linker search path; update the ENV declaration that sets PATH and LD_LIBRARY_PATH so LD_LIBRARY_PATH is assigned directly to "/opt/ffmpeg/lib" (no ${LD_LIBRARY_PATH} expansion or trailing colon) and apply the same change in the runtime stage where LD_LIBRARY_PATH is set (the ENV entries for PATH and LD_LIBRARY_PATH in the build and runtime stages) to avoid unintentionally adding the current working directory to the dynamic loader search path.

---

Outside diff comments:

In `@docs/tutorials/vision.md`:
- Around line 86-111: The three HTML comment markers named aiperf-run-vllm-vision-openai-endpoint-server are duplicated; rename the closing marker after the custom JSONL block and the open/close markers around the "Run AIPerf using the custom input file" block to unique names so each section is distinct (e.g., change the closing marker after the inputs.jsonl here and change the surrounding open/close markers for the run block), updating the matching start/end comment pairs so the markers are unique and paired correctly (refer to the markers aiperf-run-vllm-vision-openai-endpoint-server, the closing comment after the inputs.jsonl block, and the open/close comments around the aiperf profile run).

---

Nitpick comments:

In `@docs/tutorials/vision.md`:
- Around line 28-32: The readiness check currently polls the heavy inference endpoint '/v1/chat/completions' (see the curl command in the diff), causing repeated real inferences; change the probe to query the model-aware endpoint '/v1/models' instead so it returns 200 only when the model is loaded; update the bash timeout loop to curl localhost:8000/v1/models (no request body) and retain the same timeout/sleep behavior and error handling message so startup waits for model readiness without triggering inference.
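For the Dockerfile comment above, a minimal sketch of the kind of ENV change being requested. The exact existing lines and stage layout are not shown in this thread, and the `/opt/ffmpeg/bin` PATH entry is an assumption; only the `/opt/ffmpeg/lib` path comes from the comment:

```dockerfile
# Assumed env-builder/runtime stage that exports the FFmpeg paths.
# Assign LD_LIBRARY_PATH directly instead of expanding a possibly-empty
# ${LD_LIBRARY_PATH}, which would leave a trailing colon (i.e. the CWD)
# in the dynamic loader search path.
ENV PATH="/opt/ffmpeg/bin:${PATH}" \
    LD_LIBRARY_PATH="/opt/ffmpeg/lib"
```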
Signed-off-by: lkomali <lkomali@nvidia.com>
Summary by CodeRabbit