Skip to content

Commit 5033446

Browse files
committed
Add reasoning_effort to Nemotron Super params, update stale docs
- Add extra_body.reasoning_effort=medium to NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS (mirrors GPT-5 config) - Update README telemetry example and model-configs.md to use nvidia/nemotron-3-super-120b-a12b instead of openai/gpt-oss-20b - Broaden inference-parameters.md reasoning effort tip to cover Nemotron Super
1 parent b7e3c81 commit 5033446

7 files changed

Lines changed: 218 additions & 12 deletions

File tree

README.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -156,17 +156,17 @@ Specifically, a model name that is defined a `ModelConfig` object, is what will
156156
```python
157157
ModelConfig(
158158
alias="nv-reasoning",
159-
model="openai/gpt-oss-20b",
159+
model="nvidia/nemotron-3-super-120b-a12b",
160160
provider="nvidia",
161161
inference_parameters=ChatCompletionInferenceParams(
162-
temperature=0.3,
163-
top_p=0.9,
162+
temperature=1.0,
163+
top_p=0.95,
164164
max_tokens=4096,
165165
),
166166
)
167167
```
168168

169-
The value `openai/gpt-oss-20b` would be collected.
169+
The value `nvidia/nemotron-3-super-120b-a12b` would be collected.
170170

171171
To disable telemetry capture, set `NEMO_TELEMETRY_ENABLED=false`.
172172

docs/concepts/models/default-model-settings.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ The following model configurations are automatically available when `NVIDIA_API_
4444
| Alias | Model | Use Case | Inference Parameters |
4545
|-------|-------|----------|---------------------|
4646
| `nvidia-text` | `nvidia/nemotron-3-nano-30b-a3b` | General text generation | `temperature=1.0, top_p=1.0` |
47-
| `nvidia-reasoning` | `nvidia/nemotron-3-super-120b-a12b` | Reasoning and analysis tasks | `temperature=1.0, top_p=0.95` |
47+
| `nvidia-reasoning` | `nvidia/nemotron-3-super-120b-a12b` | Reasoning and analysis tasks | `temperature=1.0, top_p=0.95, extra_body={"reasoning_effort": "medium"}` |
4848
| `nvidia-vision` | `nvidia/nemotron-nano-12b-v2-vl` | Vision and image understanding | `temperature=0.85, top_p=0.95` |
4949
| `nvidia-embedding` | `nvidia/llama-3.2-nv-embedqa-1b-v2` | Text embeddings | `encoding_format="float", extra_body={"input_type": "query"}` |
5050

docs/concepts/models/inference-parameters.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,8 +24,8 @@ The `ChatCompletionInferenceParams` class controls how models generate text comp
2424
!!! note "Default Values"
2525
If `temperature`, `top_p`, or `max_tokens` are not provided, the model provider's default values will be used. Different providers and models may have different defaults.
2626

27-
!!! tip "Controlling Reasoning Effort for GPT-OSS Models"
28-
For gpt-oss models like `gpt-oss-20b` and `gpt-oss-120b`, you can control the reasoning effort using the `extra_body` parameter:
27+
!!! tip "Controlling Reasoning Effort for Reasoning Models"
28+
For reasoning models like Nemotron 3 Super (`nvidia/nemotron-3-super-120b-a12b`) and GPT-OSS (`gpt-oss-20b`, `gpt-oss-120b`), you can control the reasoning effort using the `extra_body` parameter:
2929

3030
```python
3131
import data_designer.config as dd

docs/concepts/models/model-configs.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -70,11 +70,11 @@ model_configs = [
7070
# Reasoning and structured tasks
7171
dd.ModelConfig(
7272
alias="reasoning-model",
73-
model="openai/gpt-oss-20b",
73+
model="nvidia/nemotron-3-super-120b-a12b",
7474
provider="nvidia",
7575
inference_parameters=dd.ChatCompletionInferenceParams(
76-
temperature=0.3,
77-
top_p=0.9,
76+
temperature=1.0,
77+
top_p=0.95,
7878
max_tokens=4096,
7979
),
8080
),

packages/data-designer-config/src/data_designer/config/utils/constants.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -336,7 +336,11 @@ class NordColor(Enum):
336336
DEFAULT_VISION_INFERENCE_PARAMS = {"temperature": 0.85, "top_p": 0.95}
337337
DEFAULT_EMBEDDING_INFERENCE_PARAMS = {"encoding_format": "float"}
338338
NEMOTRON_3_NANO_30B_A3B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 1.0}
339-
NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 0.95}
339+
NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {
340+
"temperature": 1.0,
341+
"top_p": 0.95,
342+
"extra_body": {"reasoning_effort": "medium"},
343+
}
340344
GPT5_INFERENCE_PARAMS = {"extra_body": {"reasoning_effort": "medium"}}
341345

342346
PREDEFINED_PROVIDERS_MODEL_MAP = {

packages/data-designer-config/tests/config/test_default_model_settings.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,10 +30,11 @@ def test_get_default_inference_parameters():
3030
top_p=0.95,
3131
)
3232
assert get_default_inference_parameters(
33-
"reasoning", {"temperature": 1.0, "top_p": 0.95}
33+
"reasoning", {"temperature": 1.0, "top_p": 0.95, "extra_body": {"reasoning_effort": "medium"}}
3434
) == ChatCompletionInferenceParams(
3535
temperature=1.0,
3636
top_p=0.95,
37+
extra_body={"reasoning_effort": "medium"},
3738
)
3839
assert get_default_inference_parameters(
3940
"vision", {"temperature": 0.85, "top_p": 0.95}

packages/data-designer/README.md

Lines changed: 201 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,201 @@
1+
# 🎨 NeMo Data Designer
2+
3+
[![CI](https://github.com/NVIDIA-NeMo/DataDesigner/actions/workflows/ci.yml/badge.svg)](https://github.com/NVIDIA-NeMo/DataDesigner/actions/workflows/ci.yml)
4+
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
5+
[![Python 3.10 - 3.13](https://img.shields.io/badge/🐍_Python-3.10_|_3.11_|_3.12_|_3.13-blue.svg)](https://www.python.org/downloads/) [![NeMo Microservices](https://img.shields.io/badge/NeMo-Microservices-76b900)](https://docs.nvidia.com/nemo/microservices/latest/index.html) [![Code](https://img.shields.io/badge/Code-Documentation-8A2BE2.svg)](https://nvidia-nemo.github.io/DataDesigner/) ![Tokens](https://img.shields.io/badge/250+_Billion-Tokens_Generated-76b900.svg?logo=nvidia&logoColor=white)
6+
7+
**Generate high-quality synthetic datasets from scratch or using your own seed data.**
8+
9+
---
10+
11+
## Welcome!
12+
13+
Data Designer helps you create synthetic datasets that go beyond simple LLM prompting. Whether you need diverse statistical distributions, meaningful correlations between fields, or validated high-quality outputs, Data Designer provides a flexible framework for building production-grade synthetic data.
14+
15+
## What can you do with Data Designer?
16+
17+
- **Generate diverse data** using statistical samplers, LLMs, or existing seed datasets
18+
- **Control relationships** between fields with dependency-aware generation
19+
- **Validate quality** with built-in Python, SQL, and custom local and remote validators
20+
- **Score outputs** using LLM-as-a-judge for quality assessment
21+
- **Iterate quickly** with preview mode before full-scale generation
22+
23+
---
24+
25+
### ⚠️ Security Notice: LiteLLM Supply-Chain Incident (2026-03-24)
26+
27+
On March 24, 2026, malicious versions of `litellm` ([1.82.7 and 1.82.8](https://github.com/BerriAI/litellm/issues/24518)) were published to PyPI containing a credential stealer. The compromised packages were available for [approximately five hours](https://www.okta.com/blog/threat-intelligence/litellm-supply-chain-attack--an-explainer-for-identity-pros/) (10:39 – 16:00 UTC) before being removed.
28+
29+
The only Data Designer releases that could resolve to these versions are **v0.2.2** (Dec 2025) and **v0.2.3** (Jan 2026), which carried a looser `litellm<2` upper bound. These are nearly three months old and have been superseded by eight subsequent releases — both have been yanked from PyPI as a precaution. All other releases (v0.3.0 – v0.5.3) pinned `litellm` to `>=1.73.6,<1.80.12` and were never compatible with 1.82.x. Starting with v0.5.4, `litellm` is no longer a dependency.
30+
31+
To have been impacted through Data Designer, you would need to have had one of these two old versions explicitly pinned *and* run a fresh `pip install` or dependency-cache update that resolved `litellm` during the five-hour window on March 24. If you believe you may be affected, see [BerriAI's incident report](https://github.com/BerriAI/litellm/issues/24518) for remediation steps.
32+
33+
---
34+
35+
## Quick Start
36+
37+
### 1. Install
38+
39+
```bash
40+
pip install data-designer
41+
```
42+
43+
Or install from source:
44+
45+
```bash
46+
git clone https://github.com/NVIDIA-NeMo/DataDesigner.git
47+
cd DataDesigner
48+
make install
49+
```
50+
51+
### 2. Set your API key
52+
53+
Start with one of our default model providers:
54+
55+
- [NVIDIA Build API](https://build.nvidia.com)
56+
- [OpenAI](https://platform.openai.com/api-keys)
57+
- [OpenRouter](https://openrouter.ai)
58+
59+
Grab your API key(s) using the above links and set one or more of the following environment variables:
60+
```bash
61+
export NVIDIA_API_KEY="your-api-key-here"
62+
63+
export OPENAI_API_KEY="your-openai-api-key-here"
64+
65+
export OPENROUTER_API_KEY="your-openrouter-api-key-here"
66+
```
67+
68+
### 3. Start generating data!
69+
```python
70+
import data_designer.config as dd
71+
from data_designer.interface import DataDesigner
72+
73+
# Initialize with default settings
74+
data_designer = DataDesigner()
75+
config_builder = dd.DataDesignerConfigBuilder()
76+
77+
# Add a product category
78+
config_builder.add_column(
79+
dd.SamplerColumnConfig(
80+
name="product_category",
81+
sampler_type=dd.SamplerType.CATEGORY,
82+
params=dd.CategorySamplerParams(
83+
values=["Electronics", "Clothing", "Home & Kitchen", "Books"],
84+
),
85+
)
86+
)
87+
88+
# Generate personalized customer reviews
89+
config_builder.add_column(
90+
dd.LLMTextColumnConfig(
91+
name="review",
92+
model_alias="nvidia-text",
93+
prompt="Write a brief product review for a {{ product_category }} item you recently purchased.",
94+
)
95+
)
96+
97+
# Preview your dataset
98+
preview = data_designer.preview(config_builder=config_builder)
99+
preview.display_sample_record()
100+
```
101+
102+
---
103+
104+
## What's next?
105+
106+
### 📚 Learn more
107+
108+
- **[Getting Started](https://nvidia-nemo.github.io/DataDesigner/latest/)** – Install, configure, and generate your first dataset
109+
- **[Tutorial Notebooks](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/)** – Step-by-step interactive tutorials
110+
- **[Column Types](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/columns/)** – Explore samplers, LLM columns, validators, and more
111+
- **[Validators](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/validators/)** – Learn how to validate generated data with Python, SQL, and remote validators
112+
- **[Model Configuration](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/model-configs/)** – Configure custom models and providers
113+
- **[Person Sampling](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/person_sampling/)** – Learn how to sample realistic person data with demographic attributes
114+
115+
### 🔧 Configure models via CLI
116+
117+
```bash
118+
data-designer config providers # Configure model providers
119+
data-designer config models # Set up your model configurations
120+
data-designer config list # View current settings
121+
```
122+
123+
### 🤖 Agent Skill
124+
125+
Data Designer has a [skill](https://nvidia-nemo.github.io/DataDesigner/latest/devnotes/data-designer-got-skills/) for coding agents. Just describe the dataset you want, and your agent handles schema design, validation, and generation. While the skill should work with other coding agents that support skills, our development and testing has focused on [Claude Code](https://code.claude.com) at this stage.
126+
127+
**Install via [skills.sh](https://skills.sh)** (be sure to select Claude Code as an additional agent):
128+
129+
```bash
130+
npx skills add NVIDIA-NeMo/DataDesigner
131+
```
132+
133+
After installation, type `/data-designer` or describe the dataset you want and the skill will kick in.
134+
135+
### 🤝 Get involved
136+
137+
This repository supports agent-assisted development — see [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended workflow.
138+
139+
- **[Contributing Guide](CONTRIBUTING.md)** – How to contribute, including agent-assisted workflows
140+
- **[GitHub Issues](https://github.com/NVIDIA-NeMo/DataDesigner/issues)** – Report bugs or make a feature request
141+
142+
---
143+
144+
## Telemetry
145+
146+
Data Designer collects telemetry to help us improve the library for developers. We collect:
147+
148+
* The names of models used
149+
* The count of input tokens
150+
* The count of output tokens
151+
152+
**No user or device information is collected.** This data is not used to track any individual user behavior. It is used to see an aggregation of which models are the most popular for SDG. We will share this usage data with the community.
153+
154+
Specifically, a model name that is defined a `ModelConfig` object, is what will be collected. In the below example config:
155+
156+
```python
157+
ModelConfig(
158+
alias="nv-reasoning",
159+
model="nvidia/nemotron-3-super-120b-a12b",
160+
provider="nvidia",
161+
inference_parameters=ChatCompletionInferenceParams(
162+
temperature=1.0,
163+
top_p=0.95,
164+
max_tokens=4096,
165+
),
166+
)
167+
```
168+
169+
The value `nvidia/nemotron-3-super-120b-a12b` would be collected.
170+
171+
To disable telemetry capture, set `NEMO_TELEMETRY_ENABLED=false`.
172+
173+
### Top Models
174+
175+
This chart represents the breakdown of models used for Data Designer across all synthetic data generation jobs from 2/23/2026 to 3/23/2026.
176+
177+
![Top models used for synthetic data generation](docs/images/top-models.png)
178+
179+
_Last updated on 3/23/2026_
180+
181+
---
182+
183+
## License
184+
185+
Apache License 2.0 – see [LICENSE](LICENSE) for details.
186+
187+
---
188+
189+
## Citation
190+
191+
If you use NeMo Data Designer in your research, please cite it using the following BibTeX entry:
192+
193+
```bibtex
194+
@misc{nemo-data-designer,
195+
author = {The NeMo Data Designer Team, NVIDIA},
196+
title = {NeMo Data Designer: A framework for generating synthetic data from scratch or based on your own seed data},
197+
howpublished = {\url{https://github.com/NVIDIA-NeMo/DataDesigner}},
198+
year = {2025},
199+
note = {GitHub Repository},
200+
}
201+
```

0 commit comments

Comments
 (0)