[None][feat] serve-config-guide skill for basic aggregate single-node serving configs by venkywonka · Pull Request #12054 · NVIDIA/TensorRT-LLM

venkywonka · 2026-03-09T23:01:26Z

Summary

Claude Code skill (2 files, ~127 lines) that generates source-backed starting trtllm-serve --config YAMLs for basic aggregate (IFB, single-node, PyTorch backend) serving. Iteratively refined through 4 eval iterations against golden holdout configs from IBDB.

What it does:

Database-first lookup: searches examples/configs/database/lookup.yaml then examples/configs/curated/lookup.yaml before interpolating
Preserves user's stated performance objective (Min Latency / Balanced / Max Throughput) through config selection
Excludes speculative, disaggregated, and multi-node configs (DeepSeek-R1 MTP is the one exception — copied verbatim from checked-in source)
For interpolated configs: cites starting-point source + single "What to benchmark" knob list

Key design choices validated by eval:

max_batch_size: scheduler ceiling, NOT proportional to concurrency — copy from nearest source config
cuda_graph_config.max_batch_size: default to max_batch_size (not concurrency-proportional — eval showed the old heuristic was wrong in 3/3 golden configs)
KV cache estimation formulas for GQA and MLA (covers MHA as special case where num_kv_heads == num_attention_heads)
Chunked prefill MLA hardware constraint (SM90+ only)
Excluded sources guardrail: docs/source/legacy/ values don't transfer to PyTorch backend

Files

File	Lines	Role
`SKILL.md`	74	Scope, constraints, 5-step decision flow (lock objective → exact match → nearest config → read docs → adjust fields), validation checklist
`references/knob-heuristics.md`	53	Tuning table for 7 commonly adjusted fields, KV cache estimation formulas, chunked prefill MLA guidance

references/architecture.md was deleted — its model-to-source mapping table was derivable by the agent via ls/grep; the 2 non-obvious pieces (legacy exclusion, DeepSeek-V3 caveat) were inlined into Step 3.

Test plan

Exact match: DeepSeek-R1-0528 on 8x B200, 1k1k, conc=1, min latency → copies from database
Interpolation: Llama-3.3-70B on 4x H100, 4k/2k, conc=128, max throughput → cites curated source + What to benchmark
Hallucination guard: Llama-4-Ultra-1T (nonexistent) → refuses to fabricate paths
Golden holdout: 3 IBDB configs not in checked-in database → 90% field accuracy
No hardcoded numeric thresholds in heuristics (all guidance says "copy from source" or "benchmark")

coderabbitai · 2026-03-09T23:06:46Z

📝 Walkthrough

Walkthrough

This pull request adds a comprehensive skill guide for configuring basic aggregate single-node serving with PyTorch backend. It includes step-by-step instructions, architecture classification logic, tuning heuristics for critical configuration knobs, and template configurations organized across a main guide file and three reference documents.

Changes

Cohort / File(s)	Summary
Skill Guide and References `.claude/skills/basic-agg-singlenode-config-guide/SKILL.md`, `.claude/skills/basic-agg-singlenode-config-guide/references/architecture.md`, `.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md`, `.claude/skills/basic-agg-singlenode-config-guide/references/templates.md`	Adds comprehensive documentation for manual tuning workflow of basic aggregate single-node configurations, including model classification rules, decision tables for four critical knobs (moe_config.backend, enable_attention_dp, kv_cache_config.free_gpu_memory_fraction, max_num_tokens), YAML config templates for MoE/Dense architectures, and known model mappings with failure mode guidance.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check	✅ Passed	The PR title clearly identifies the main change: adding a configuration guide skill for basic aggregate single-node serving, accurately reflecting the added documentation files.
Description check	✅ Passed	The PR description is comprehensive, explaining the skill's purpose, design choices, test validation, and key files with clear structure and rationale.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md (1)
44-56: Clarify relationship between formula-based and fixed-value guidance.

Lines 50-52 recommend "ISL to 2x ISL" as a sweet spot, but line 54 mentions "some models use fixed values (e.g., 1152 for DeepSeek-R1)." This creates ambiguity: when should users apply the formula versus using a fixed value? Consider adding explicit precedence: "Check database configs first for model-specific fixed values; if not found, use the ISL-based formula."
📋 Proposed clarification
 **Rules**:
+- **Check database configs first**: some models use validated fixed values (e.g., 1152 for DeepSeek-R1).
+- If no database config exists, use the following formula:
 - Must be >= ISL + 64 (chat template overhead) when chunked prefill is disabled, or requests get HTTP 400.
 - Sweet spot: ISL to 2x ISL.
 - Low concurrency (<=32): `max(ISL, 2048)`.
 - High concurrency (>=64): `max(ISL * 2, 4096)`.
 - Never > ISL * 4 — wastes activation buffer, counterintuitively shrinks KV cache.
-- Check database configs first: some models use fixed values (e.g., 1152 for DeepSeek-R1).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
@.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md
around lines 44 - 56, The guidance for max_num_tokens is ambiguous about when to
use model-specific fixed values versus the ISL-based formula; update the text
for max_num_tokens to explicitly state precedence: first check database configs
or model-specific fixed values (e.g., DeepSeek-R1 = 1152) and use those if
present, otherwise compute the scheduler token budget using the ISL-based
heuristics (sweet spot ISL–2x ISL, low/high concurrency rules, never > ISL*4),
and note that fixed values override the formula.
.claude/skills/basic-agg-singlenode-config-guide/SKILL.md (1)
23-29: Consider using approximate counts to reduce maintenance.

Lines 27-28 cite specific config counts ("169 benchmark-optimal configs", "10 production configs") that may become stale as the repository evolves. Consider rephrasing to "100+ benchmark configs" and "~10 production configs" for resilience.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/basic-agg-singlenode-config-guide/SKILL.md around lines 23 -
29, The document names exact counts ("169 benchmark-optimal configs", "10
production configs") which will become stale; update the wording in SKILL.md
near the Database and Curated bullets to use approximate counts like "100+
benchmark configs" and "~10 production configs" (or similar) while preserving
the pointers to examples/configs/database/lookup.yaml, RecipeList.from_yaml(),
and examples/configs/curated/ and the note about stripping speculative_config;
keep references to references/architecture.md and references/templates.md
unchanged.
.claude/skills/basic-agg-singlenode-config-guide/references/templates.md (1)
7-24: Clarify the relationship between example values.

The template shows max_num_tokens: 2048 and max_seq_len: 2068, but the relationship isn't immediately clear. The comment states >= ISL + OSL + 68, which suggests ISL+OSL=2000 in this example. Consider adding a comment like # example assumes ISL+OSL=2000 to help users understand how to adapt these values.
📝 Proposed clarification
 ```yaml
 backend: pytorch
-max_num_tokens: 2048          # adjust: see knob-heuristics.md §max_num_tokens
-max_seq_len: 2068             # >= ISL + OSL + 68
+max_num_tokens: 2048          # adjust: see knob-heuristics.md §max_num_tokens (example: ISL=1k, OSL=1k)
+max_seq_len: 2068             # >= ISL + OSL + 68 (2000 + 68 in this example)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/basic-agg-singlenode-config-guide/references/templates.md
around lines 7 - 24, Add a clarifying comment that links the example values for
max_num_tokens and max_seq_len to an assumed ISL+OSL total so readers know how
the numbers were derived: update the comment next to max_num_tokens and
max_seq_len (the YAML keys) to note the example uses ISL=1k and OSL=1k
(ISL+OSL=2000) and show that max_seq_len = ISL+OSL+68 (2000+68=2068) so users
can adjust those fields accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/basic-agg-singlenode-config-guide/references/templates.md:
- Around line 26-45: The MoE+GQA template currently sets max_batch_size: 256 but
conflicts with architecture.md which caps MoE models at 512; update the template
so max_batch_size: 512 for the MoE + GQA block (and remove/adjust the incorrect
Dense comment that says "(cap 512 for MoE)" in the Dense section), or
alternatively add an explicit inline note in the MoE+GQA template explaining
that 256 is an intentional conservative value for MoE_GQA — be explicit about
which variant the cap applies to (MoE_MLA vs MoE_GQA) and reference the
max_batch_size key and the Dense comment so readers are not confused.

---

Nitpick comments:
In
@.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md:
- Around line 44-56: The guidance for max_num_tokens is ambiguous about when to
use model-specific fixed values versus the ISL-based formula; update the text
for max_num_tokens to explicitly state precedence: first check database configs
or model-specific fixed values (e.g., DeepSeek-R1 = 1152) and use those if
present, otherwise compute the scheduler token budget using the ISL-based
heuristics (sweet spot ISL–2x ISL, low/high concurrency rules, never > ISL*4),
and note that fixed values override the formula.

In @.claude/skills/basic-agg-singlenode-config-guide/references/templates.md:
- Around line 7-24: Add a clarifying comment that links the example values for
max_num_tokens and max_seq_len to an assumed ISL+OSL total so readers know how
the numbers were derived: update the comment next to max_num_tokens and
max_seq_len (the YAML keys) to note the example uses ISL=1k and OSL=1k
(ISL+OSL=2000) and show that max_seq_len = ISL+OSL+68 (2000+68=2068) so users
can adjust those fields accordingly.

In @.claude/skills/basic-agg-singlenode-config-guide/SKILL.md:
- Around line 23-29: The document names exact counts ("169 benchmark-optimal
configs", "10 production configs") which will become stale; update the wording
in SKILL.md near the Database and Curated bullets to use approximate counts like
"100+ benchmark configs" and "~10 production configs" (or similar) while
preserving the pointers to examples/configs/database/lookup.yaml,
RecipeList.from_yaml(), and examples/configs/curated/ and the note about
stripping speculative_config; keep references to references/architecture.md and
references/templates.md unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bbcddd23-5df0-403c-ad4c-2fb9fb744118

📥 Commits

Reviewing files that changed from the base of the PR and between 35ccedd and 65d2ada.

📒 Files selected for processing (4)

.claude/skills/basic-agg-singlenode-config-guide/SKILL.md
.claude/skills/basic-agg-singlenode-config-guide/references/architecture.md
.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md
.claude/skills/basic-agg-singlenode-config-guide/references/templates.md

Copilot

Pull request overview

Adds a new Claude skill that guides users to generate source-backed trtllm-serve --config YAMLs for basic aggregate, single-node (PyTorch backend) serving, prioritizing checked-in config databases and deployment docs.

Changes:

Introduces the basic-agg-singlenode-config-guide skill with a step-by-step decision flow, constraints (incl. MTP carveout), and validation checklist.
Adds a reference mapping of model families to the repo’s primary config/doc sources.
Adds a reference doc of tuning knobs with “source-backed” guidance plus clearly-labeled fallback heuristics derived from tensorrt_llm/bench/.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`.claude/skills/basic-agg-singlenode-config-guide/SKILL.md`	Main skill prompt defining scope, constraints, workflow, and validation checklist for config selection/tuning.
`.claude/skills/basic-agg-singlenode-config-guide/references/architecture.md`	Model-family → primary checked-in sources mapping used by the skill’s doc-reading step.
`.claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md`	Guidance on commonly tuned config fields and “fallback only” bench-derived heuristics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

venkywonka · 2026-03-20T21:38:46Z

/bot run

tensorrt-cicd · 2026-03-20T21:45:07Z

PR_Github #39772 [ run ] triggered by Bot. Commit: 59347a8 Link to invocation

tensorrt-cicd · 2026-03-21T01:15:20Z

PR_Github #39772 [ run ] completed with state FAILURE. Commit: 59347a8
/LLM/main/L0_MergeRequest_PR pipeline #30966 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

venkywonka · 2026-03-25T20:46:06Z

/bot kill

venkywonka · 2026-03-25T20:46:12Z

/bot run

tensorrt-cicd · 2026-03-25T20:57:18Z

PR_Github #40370 [ run ] triggered by Bot. Commit: 52ae5e5 Link to invocation

tensorrt-cicd · 2026-03-25T20:57:19Z

PR_Github #40371 [ kill ] triggered by Bot. Commit: 52ae5e5 Link to invocation

tensorrt-cicd · 2026-03-25T20:57:20Z

PR_Github #40370 [ run ] completed with state ABORTED. Commit: 52ae5e5

Link to invocation

tensorrt-cicd · 2026-03-25T20:57:50Z

PR_Github #40371 [ kill ] completed with state SUCCESS. Commit: 52ae5e5
Successfully killed previous jobs for commit 52ae5e5

Link to invocation

venkywonka · 2026-03-31T00:47:44Z

Skill evaluation summary

Ran a structured eval loop (3 test dimensions + hold-out golden configs) comparing with-skill vs without-skill (baseline) outputs.

Where the skill clearly helps:

Exact match retrieval — finds the right checked-in config from the database. This is the primary use case for supported models with existing configs.
Guardrails — catches max_num_tokens < ISL (would silently reject requests), excludes legacy TRT-engine docs, avoids hallucinating config paths or claiming interpolated values are source-backed.
Starting point for interpolation — when no exact match exists, produces a safe config that won't OOM or reject on first launch, with a clear "what to benchmark" list.

Where both skill and baseline perform equally:

Structural decisions (TP, EP, max_seq_len) — both get these right
Both refuse to fabricate configs for entirely unsupported models

What the skill does NOT do:

Produce production-optimal configs for unseen models. Comparison against a hold-out set of known-good unpublished configs showed the skill's interpolated values are safe starting points but not optimal — real sweep data beats heuristics every time.
Replace benchmarking. The response format explicitly frames interpolated configs as "starting point + what to benchmark."

Key improvements driven by eval:

Legacy docs exclusion (baseline cited docs/source/legacy/ TRT-engine tuning data as PyTorch backend guidance)
max_batch_size is a scheduler ceiling, not a memory reservation — don't reduce it from source
moe_config.backend depends on the specific model, not just GPU+precision — mark as unverified when interpolating
cuda_graph_config.max_batch_size should reflect expected peak concurrency, not server max_batch_size
Simplified response format: exact match gives config directly; interpolation gives starting point + what to benchmark

venkywonka · 2026-03-31T16:27:11Z

/bot kill

venkywonka · 2026-03-31T16:27:17Z

/bot run

tensorrt-cicd · 2026-03-31T16:33:36Z

PR_Github #40976 [ kill ] triggered by Bot. Commit: a75d419 Link to invocation

Light skill (~250 lines, 4 files) for tuning basic aggregate (IFB, single node, PyTorch, non-speculative) TRT-LLM serving configs. Covers architecture-class taxonomy (MoE_MLA, MoE_GQA, Dense_GQA), 4 critical knobs with empirical heuristics from 72K+ IBDB records, and starter templates with validation checklists. Signed-off-by: Venkatesh Gururajan <[email protected]> Signed-off-by: Venky Ganesh <[email protected]>

Light skill (~250 lines, 4 files) for tuning basic aggregate (IFB, single node, PyTorch, non-speculative) TRT-LLM serving configs. Covers architecture-class taxonomy (MoE_MLA, MoE_GQA, Dense_GQA), 4 critical knobs with empirical heuristics from 72K+ benchmark records, and starter templates with validation checklists. Signed-off-by: Venkatesh Gururajan <[email protected]> Signed-off-by: Venky Ganesh <[email protected]>

Add negative triggers (multi-node, speculative, disaggregated, TensorRT engine) and differentiation from trtllm-config-generator to the frontmatter description. This ensures Claude picks the right skill when both are available. Signed-off-by: Venkatesh Gururajan <[email protected]> Signed-off-by: Venky Ganesh <[email protected]>

Signed-off-by: Venkatesh Gururajan <[email protected]> Signed-off-by: Venky Ganesh <[email protected]>

Preserve workload intent when selecting checked-in configs and keep the guidance source-backed for model-specific exceptions like Kimi-K2. Signed-off-by: Venky <[email protected]>

Keep the basic aggregate config skill aligned with current checked-in DeepSeek-R1 serve recipes while preserving its conservative non-spec default. Signed-off-by: Venky <[email protected]>

Keep exact source-backed config selection for in-scope requests while making adjacent out-of-scope answers more useful. This preserves trust by clearly separating verified guidance from assembled best guesses instead of failing closed. Signed-off-by: Venky <[email protected]>

…ation Define constraints once in a canonical section and reference by label throughout steps and checklist, reducing MTP exception from ~22 restatements to 5 touchpoints. Delete dead templates.md, make SKILL.md delegate to architecture.md for model mapping, trim out-of-scope handling and knob-heuristics.md duplicated sections. Add Llama-3.3-70B coverage and Qwen3 disagg filter note. Total: 323 -> 182 lines (-44%). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> Signed-off-by: venkywonka <[email protected]>

Add a "Generally Observed Patterns" section with architecture-level directional hints for the five most impactful knobs (enable_attention_dp, moe_config.backend, kv_cache_config, max_num_tokens, max_batch_size). Enrich the Commonly Tuned Fields table with brief directional context. All hints are directional only (no hard numbers) and defer to checked-in configs and deployment guides. Signed-off-by: venkywonka <[email protected]>

Signed-off-by: venkywonka <[email protected]>

…ss definitions - Expand IFB, ISL, OSL acronyms on first mention in SKILL.md (Copilot review) - Expand ADP abbreviation on first use in knob-heuristics.md (Copilot review) - Strengthen enable_attention_dp as high-throughput-only knob (venkywonka review) - Add Architecture Classes section with MoE/Dense and MLA/GQA definitions to architecture.md (Copilot review) Signed-off-by: venkywonka <[email protected]>

…rence Signed-off-by: venkywonka <[email protected]>

…rchitecture reference Signed-off-by: venkywonka <[email protected]>

…les to knob-heuristics Signed-off-by: venkywonka <[email protected]>

…to serve-config-guide Signed-off-by: venkywonka <[email protected]>

…canonicalize MTP, cut prompt Signed-off-by: venkywonka <[email protected]>

…tch_size as scheduler ceiling, MoE backend uncertainty, max_seq_len and cuda_graph guidance Signed-off-by: venkywonka <[email protected]>

…irectly, interpolation gives starting point + what to benchmark Signed-off-by: venkywonka <[email protected]>

… fix harmful heuristics Delete architecture.md (model-to-source mapping table — agent can ls/grep the repo). Fuse its 2 non-obvious pieces (legacy exclusion, V3 caveat) into SKILL.md Step 3. Remove redundant checklist items, Repo Resources table, low-value knob-heuristics rows, OOM Triage section, and chunked prefill duplication. Fix 2 heuristics that golden holdout eval showed were actively harmful: - cuda_graph_config.max_batch_size: "next power of 2 above concurrency" → "default to max_batch_size" (wrong in 3/3 golden configs) - max_batch_size: clarify NOT proportional to concurrency, copy from source (misinterpreted in 2/3 golden configs) Net: -55 lines across 3 files (182 → 127), 0 knowledge lost. Signed-off-by: venkywonka <[email protected]>

venkywonka · 2026-04-01T04:38:07Z

/bot run

tensorrt-cicd · 2026-04-01T04:43:37Z

PR_Github #41103 [ run ] triggered by Bot. Commit: 4f78207 Link to invocation

venkywonka · 2026-04-01T06:09:17Z

/bot skip --comment "no L0 CI coverage for skill changes"

tensorrt-cicd · 2026-04-01T06:14:49Z

PR_Github #41124 [ skip ] triggered by Bot. Commit: 4f78207 Link to invocation

tensorrt-cicd · 2026-04-01T06:14:57Z

PR_Github #41103 [ run ] completed with state ABORTED. Commit: 4f78207
/LLM/main/L0_MergeRequest_PR pipeline #32077 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

tensorrt-cicd · 2026-04-01T06:21:59Z

PR_Github #41124 [ skip ] completed with state SUCCESS. Commit: 4f78207
Skipping testing for commit 4f78207

Link to invocation

… serving configs (NVIDIA#12054) Signed-off-by: venkywonka <[email protected]> Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

github-actions Bot assigned venkywonka Mar 9, 2026

coderabbitai Bot reviewed Mar 9, 2026

View reviewed changes

Comment thread .claude/skills/basic-agg-singlenode-config-guide/references/templates.md Outdated

venkywonka commented Mar 10, 2026

View reviewed changes

Comment thread .claude/skills/basic-agg-singlenode-config-guide/SKILL.md Outdated

venkywonka marked this pull request as draft March 11, 2026 16:11

venkywonka commented Mar 17, 2026

View reviewed changes

Comment thread .claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md Outdated

venkywonka marked this pull request as ready for review March 20, 2026 05:11

venkywonka requested review from brb-nv, Copilot, jgangani, kaiyux and litaotju March 20, 2026 05:46

Copilot started reviewing on behalf of venkywonka March 20, 2026 05:46 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

Comment thread .claude/skills/basic-agg-singlenode-config-guide/references/architecture.md Outdated

Comment thread .claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md Outdated

Comment thread .claude/skills/basic-agg-singlenode-config-guide/SKILL.md Outdated

xmchen1987 reviewed Mar 25, 2026

View reviewed changes

Comment thread .claude/skills/basic-agg-singlenode-config-guide/references/knob-heuristics.md Outdated

venkywonka requested a review from a team as a code owner March 25, 2026 20:46

venkywonka requested a review from chang-l March 25, 2026 20:46

kaiyux approved these changes Apr 1, 2026

View reviewed changes

venkywonka and others added 19 commits March 31, 2026 21:37

[None][fix] Remove reference to unshipped skill in description

cac06dc

Signed-off-by: Venkatesh Gururajan <[email protected]> Signed-off-by: Venky Ganesh <[email protected]>

[None][fix] tighten basic aggregate config skill guidance

be607c6

Preserve workload intent when selecting checked-in configs and keep the guidance source-backed for model-specific exceptions like Kimi-K2. Signed-off-by: Venky <[email protected]>

[None][fix] allow exact-source-backed MTP DeepSeek-R1 configs in skill

d01dddb

Keep the basic aggregate config skill aligned with current checked-in DeepSeek-R1 serve recipes while preserving its conservative non-spec default. Signed-off-by: Venky <[email protected]>

[None][fix] simplify basic aggregate config skill constraints and flow

393ba28

Signed-off-by: venkywonka <[email protected]>

[None][feat] add KV cache estimation formulas to knob-heuristics refe…

ac8aed5

…rence Signed-off-by: venkywonka <[email protected]>

[None][fix] remove cache_transceiver_config from Kimi-K2 config and a…

67a5a2e

…rchitecture reference Signed-off-by: venkywonka <[email protected]>

[None][feat] add chunked prefill guidance and scope max_num_tokens ru…

8abab57

…les to knob-heuristics Signed-off-by: venkywonka <[email protected]>

[None][refactor] rename skill from basic-agg-singlenode-config-guide …

31ae644

…to serve-config-guide Signed-off-by: venkywonka <[email protected]>

[None][fix] address all 16 inspection report findings: fix formulas, …

1f36226

…canonicalize MTP, cut prompt Signed-off-by: venkywonka <[email protected]>

[None][fix] eval-driven improvements: legacy source exclusion, max_ba…

7d6c8d4

…tch_size as scheduler ceiling, MoE backend uncertainty, max_seq_len and cuda_graph guidance Signed-off-by: venkywonka <[email protected]>

[None][refactor] simplify response format: exact match gives config d…

866c92c

…irectly, interpolation gives starting point + what to benchmark Signed-off-by: venkywonka <[email protected]>

venkywonka force-pushed the venky/add-basic-agg-config-skills branch from a75d419 to 4f78207 Compare April 1, 2026 04:37

venkywonka enabled auto-merge (squash) April 1, 2026 05:14

venkywonka merged commit bb06399 into NVIDIA:main Apr 1, 2026
5 checks passed

Conversation

venkywonka commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files

Test plan

Uh oh!

coderabbitai Bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

venkywonka commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 21, 2026

Uh oh!

Uh oh!

venkywonka commented Mar 25, 2026

Uh oh!

venkywonka commented Mar 25, 2026

Uh oh!

tensorrt-cicd commented Mar 25, 2026

Uh oh!

tensorrt-cicd commented Mar 25, 2026

Uh oh!

tensorrt-cicd commented Mar 25, 2026

Uh oh!

tensorrt-cicd commented Mar 25, 2026

Uh oh!

venkywonka commented Mar 31, 2026

Skill evaluation summary

Uh oh!

venkywonka commented Mar 31, 2026

Uh oh!

venkywonka commented Mar 31, 2026

Uh oh!

tensorrt-cicd commented Mar 31, 2026

Uh oh!

venkywonka commented Apr 1, 2026

Uh oh!

tensorrt-cicd commented Apr 1, 2026

Uh oh!

venkywonka commented Apr 1, 2026

Uh oh!

tensorrt-cicd commented Apr 1, 2026

Uh oh!

tensorrt-cicd commented Apr 1, 2026

Uh oh!

tensorrt-cicd commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

venkywonka commented Mar 9, 2026 •

edited

Loading

coderabbitai Bot commented Mar 9, 2026 •

edited

Loading