Releases · defilantech/LLMKube

0.7.11 (2026-05-23)

Bug Fixes

foreman: drop chart-level subchart dep on llmkube (unblock v0.7.11 chart-releaser) (#519) (207ddc6)

0.7.10 (2026-05-23)

Features

add --llama-server-port for a fixed llama-server runtime port (#499) (cc30b0d)
add make lint-all target for cross-arch linting (#508) (f57dd5b)
capability-aware scheduler + AgenticTaskWatcher + stub executor (Foreman v0.1 M2) (#504) (74b3d6e)
foreman: gate-role Agent on a verifier node (M4) (#518) (40a340e)
foreman: native agent loop + Agent CRD + coder role on M5 Max (M3) (#509) (6661343)
scaffold Foreman as an opt-in add-on (M0 + M1) (#501) (cd40491)

Bug Fixes

report Stopped phase when InferenceService.spec.replicas=0 on Metal path (#498) (7787239)

Documentation

add AGENTS.md (#496) (89d3766)
bump broken bartowski phi-4-mini URL to renamed repo (#514) (9f15d98)
macos-metal: derive curl port from Endpoints (follow-up to #513) (#515) (83085c2)
macos-metal: replace broken port-forward step with host-localhost curl (#513) (0f7f7a7)

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

Foreman is an opt-in add-on for LLMKube that schedules agentic workloads (Workload, AgenticTask) across a fleet of nodes (FleetNode). Installing LLMKube alone does not install or require Foreman. Foreman is a SIBLING chart to llmkube, not a subchart: install llmkube first (helm install llmkube defilantech/llmkube), then install foreman alongside it. They share no Helm relationship at packaging or install time; the only coupling is that the foreman-operator's RBAC reads inference.llmkube.dev CRDs that llmkube installs.
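The install order described above can be sketched as a small shell function. This is a minimal sketch, not the project's documented procedure. The first command is the one quoted in the chart description. The second assumes that the foreman chart is published under the same `defilantech` repo alias and that a release name of `foreman` is acceptable. Neither assumption is confirmed by these notes. The `HELM` variable and the function name are hypothetical helpers added so the sequence can be dry-run.

```shell
# Sketch: install llmkube first, then foreman as a sibling release.
# HELM defaults to the real helm binary; set HELM=echo for a dry run.
HELM="${HELM:-helm}"

install_llmkube_then_foreman() {
  # From the chart description: llmkube goes in first, because it
  # installs the inference.llmkube.dev CRDs that foreman's RBAC reads.
  "$HELM" install llmkube defilantech/llmkube || return 1

  # Assumption: chart reference and release name for foreman are
  # illustrative; check the chart repository for the real values.
  "$HELM" install foreman defilantech/foreman
}
```

Calling `install_llmkube_then_foreman` with `HELM=echo` prints the two commands in order without touching a cluster. Because the charts share no Helm dependency, the ordering is enforced only by the `|| return 1` guard, not by Helm itself.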

0.7.9 (2026-05-18)

Features

add mlx-server runtime to the metal-agent (#471) (8bf9808)
add scale sub resource (#474) (73419a5)

Bug Fixes

clear stale conditions when a model reaches Ready without a download (#476) (06325b0)
inference PodMonitor selector matched no pods (#481) (31ee4d6)
mark Metal local-path models Ready instead of stuck Copying (#472) (c513c84)
metal-path InferenceService status and memory pre-flight (#488) (98ef2c4)
point metal-agent mlx-server install hint at the Homebrew formula (#477) (74b3333)
prevent concurrent runtime respawn in metal-agent (#469) (f34640b)
stop the operator fighting the HPA over Deployment replicas (#485) (8fc70e2)

Documentation

add MAINTAINERS file and recommend private vulnerability reporting (#479) (aaccb4d)

0.7.8 (2026-05-14)

Features

configurable proxy + per-route/backend timeouts (closes #457, #458) (#461) (03d222a)
external provider URL defaults + cluster-wide LiteLLM URL (closes #438) (#451) (26cd5ae)
Helm packaging, sample manifest, and concept doc for ModelRouter (#448) (a513fdc)
ModelRouterReconciler skeleton with spec validation (#445) (9b1a259)
reconcile router-proxy Deployment, Service, and ConfigMap (#447) (856ecc3)
router-proxy binary with OpenAI streaming passthrough (#446) (942d09a)
router-proxy cluster e2e + runtime fail-closed 503 (closes #430) (#450) (75151fa)
scaffold ModelRouter CRD types and deepcopy (#442) (e6c60b3)

Bug Fixes

close cloud-tier conns + drop local idle timeout (closes #459) (#460) (173c26a)
don't quarantine backends on per-attempt context deadline (closes #462) (#463) (80ef9c8)
e2e: unblock MicroShift SCC diagnostics + bump bootstrap timeout (#466) (0c793b7)
half-open circuit breaker on proxy + scale-to-zero status (closes #452, #453) (#454) (ac9302c)
preserve external annotations on reconciler Deployment updates (#468) (de580c1)

Documentation

add consumer-hardware model matrix guide (#444) (dd07397)
readme: land ModelRouter prominently for the 0.7.8 release (#464) (deb24bb)
site: air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1) (#465) (5996a1e)
site: drop stale "fifteen lines" claim in openshift-install Reference (#467) (ec52ca8)

0.7.7 (2026-05-11)

Features

agent: vllm-swift runtime + TurboQuant passthrough (#391) (#393) (2691e67)
ci+chart: make OpenShift a first-class deploy target (closes #421) (#422) (798a13e)
crd: add gpuMemoryUtilization and cpuOffloadGB to VLLMConfig (#394) (6883f78)
metal-agent: emit Kubernetes events for memory-pressure transitions, evictions, skips, and respawn blocks (closes #390) (#411) (e0d17d1)
observability: runtime label on inference pods + recording rules + starter dashboard (refs #409) (#410) (71743ed)

Bug Fixes

controller: default FSGroup to curl_group + Longhorn-backed e2e job (closes #418, closes #420) (adce90f)
controller: stop hot-spinning on unreachable file:// model sources (closes #405) (#412) (4ac6f57)

Documentation

add NVIDIA Blackwell B200 (sm_100) validation matrix (refs #413) (#414) (bfda149)
operations: seed runbooks index + first 2 entries (file:// hot-spin, metal-agent memory pressure) (#417) (d3bce8d)
port concepts/comparison to markdown (first Phase 1C content port) (#403) (51c396b)
readme: HN-launch readiness fixes (broken link, Apple Silicon CTA, quickstart memory) (#401) (3e44bfb)
refresh quickstart cast for v0.7.6 (HN launch) (#404) (5abaddb)
split docs/ into site/ and contributors/, prep for site rendering (#396) (9299a31)
upgrade: OpenShift / OKD / MicroShift installs must use helm ... -f charts/llmkube/values-openshift.yaml so restricted-v2 SCC can inject fsGroup from the namespace's allocated range (adce90f)
upgrade: operators using a custom --init-container-image whose user is not curl (uid=101 gid=102) should set spec.podSecurityContext on each InferenceService or pass --default-fsgroup=<gid> to the controller (adce90f)
upgrade: v0.7.7 rolls every InferenceService Pod once on first reconcile (Deployment template gains fsGroup=102 and the new inference.llmkube.dev/runtime label) (adce90f)
