Releases: defilantech/LLMKube
v0.7.11
v0.7.10
0.7.10 (2026-05-23)
Features
- add --llama-server-port for a fixed llama-server runtime port (#499) (cc30b0d)
- add make lint-all target for cross-arch linting (#508) (f57dd5b)
- capability-aware scheduler + AgenticTaskWatcher + stub executor (Foreman v0.1 M2) (#504) (74b3d6e)
- foreman: gate-role Agent on a verifier node (M4) (#518) (40a340e)
- foreman: native agent loop + Agent CRD + coder role on M5 Max (M3) (#509) (6661343)
- scaffold Foreman as an opt-in add-on (M0 + M1) (#501) (cd40491)
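The capability-aware scheduler in the Foreman M2 entry can be pictured as a filter: a task may only land on a node that advertises every capability the task requires. The Go sketch below is a minimal illustration under that assumption. The FleetNode and AgenticTask structs, their fields, the node names, and the alphabetical tie-break are all hypothetical. They are not Foreman's actual types or placement policy.

```go
package main

import (
	"fmt"
	"sort"
)

// FleetNode is a hypothetical stand-in for Foreman's FleetNode: a name plus
// the capabilities it advertises (e.g. "metal", "coder", "verifier").
type FleetNode struct {
	Name         string
	Capabilities map[string]bool
}

// AgenticTask is a hypothetical stand-in listing the capabilities a task needs.
type AgenticTask struct {
	Name     string
	Requires []string
}

// eligible reports whether node advertises every capability task requires.
func eligible(node FleetNode, task AgenticTask) bool {
	for _, c := range task.Requires {
		if !node.Capabilities[c] {
			return false
		}
	}
	return true
}

// schedule returns the name of the first eligible node in name order, or ""
// when no node can run the task (the task would stay pending).
func schedule(nodes []FleetNode, task AgenticTask) string {
	sorted := append([]FleetNode(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for _, n := range sorted {
		if eligible(n, task) {
			return n.Name
		}
	}
	return ""
}

func main() {
	nodes := []FleetNode{
		{Name: "m5-max", Capabilities: map[string]bool{"metal": true, "coder": true}},
		{Name: "verifier-1", Capabilities: map[string]bool{"verifier": true}},
	}
	fmt.Println(schedule(nodes, AgenticTask{Name: "write-code", Requires: []string{"coder"}}))  // m5-max
	fmt.Println(schedule(nodes, AgenticTask{Name: "gate", Requires: []string{"verifier"}}))     // verifier-1
	fmt.Println(schedule(nodes, AgenticTask{Name: "gpu-job", Requires: []string{"cuda"}}) == "") // true
}
```

A real scheduler would likely also weigh load or node health among the eligible nodes. The sketch only shows the capability filter.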
Bug Fixes
Documentation
llmkube-0.7.11
A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference
foreman-0.7.11
Foreman is an opt-in add-on for LLMKube that schedules agentic workloads (Workload, AgenticTask) across a fleet of nodes (FleetNode). Installing LLMKube alone does not install or require Foreman. Foreman is a SIBLING chart to llmkube, not a subchart: install llmkube first (helm install llmkube defilantech/llmkube), then install foreman alongside it. They share no Helm relationship at packaging or install time; the only coupling is that the foreman-operator's RBAC reads inference.llmkube.dev CRDs that llmkube installs.
v0.7.9
0.7.9 (2026-05-18)
Features
Bug Fixes
- clear stale conditions when a model reaches Ready without a download (#476) (06325b0)
- inference PodMonitor selector matched no pods (#481) (31ee4d6)
- mark Metal local-path models Ready instead of stuck Copying (#472) (c513c84)
- metal-path InferenceService status and memory pre-flight (#488) (98ef2c4)
- point metal-agent mlx-server install hint at the Homebrew formula (#477) (74b3333)
- prevent concurrent runtime respawn in metal-agent (#469) (f34640b)
- stop the operator fighting the HPA over Deployment replicas (#485) (8fc70e2)
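The metal-agent respawn fix (#469) is described only as preventing concurrent respawns. A compare-and-swap guard is one common way to get that behavior. The Go sketch below assumes that a caller who loses the race should skip rather than wait. The respawnGuard type and the TryRespawn method are hypothetical names, not metal-agent code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// respawnGuard lets at most one respawn of a runtime process run at a time.
// Hypothetical illustration; not metal-agent's actual code.
type respawnGuard struct {
	inFlight atomic.Bool
}

// TryRespawn runs respawn only if no other respawn is in progress and
// reports whether it ran. Callers that lose the race return false
// immediately instead of starting a second runtime process.
func (g *respawnGuard) TryRespawn(respawn func()) bool {
	if !g.inFlight.CompareAndSwap(false, true) {
		return false
	}
	defer g.inFlight.Store(false)
	respawn()
	return true
}

func main() {
	var g respawnGuard
	var started, rejected atomic.Int32
	release := make(chan struct{})
	running := make(chan struct{})
	done := make(chan bool)

	// The first caller starts a respawn and blocks until released.
	go func() {
		done <- g.TryRespawn(func() {
			started.Add(1)
			close(running)
			<-release
		})
	}()
	<-running

	// Callers arriving while that respawn is in flight are all rejected.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !g.TryRespawn(func() { started.Add(1) }) {
				rejected.Add(1)
			}
		}()
	}
	wg.Wait()
	close(release)
	fmt.Println(<-done, started.Load(), rejected.Load()) // true 1 5
}
```

This requires Go 1.19 or later for atomic.Bool and atomic.Int32.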
Documentation
llmkube-0.7.9
A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference
v0.7.8
0.7.8 (2026-05-14)
Features
- configurable proxy + per-route/backend timeouts (closes #457, #458) (#461) (03d222a)
- external provider URL defaults + cluster-wide LiteLLM URL (closes #438) (#451) (26cd5ae)
- Helm packaging, sample manifest, and concept doc for ModelRouter (#448) (a513fdc)
- ModelRouterReconciler skeleton with spec validation (#445) (9b1a259)
- reconcile router-proxy Deployment, Service, and ConfigMap (#447) (856ecc3)
- router-proxy binary with OpenAI streaming passthrough (#446) (942d09a)
- router-proxy cluster e2e + runtime fail-closed 503 (closes #430) (#450) (75151fa)
- scaffold ModelRouter CRD types and deepcopy (#442) (e6c60b3)
Bug Fixes
- close cloud-tier conns + drop local idle timeout (closes #459) (#460) (173c26a)
- don't quarantine backends on per-attempt context deadline (closes #462) (#463) (80ef9c8)
- e2e: unblock MicroShift SCC diagnostics + bump bootstrap timeout (#466) (0c793b7)
- half-open circuit breaker on proxy + scale-to-zero status (closes #452, #453) (#454) (ac9302c)
- preserve external annotations on reconciler Deployment updates (#468) (de580c1)
Documentation
- add consumer-hardware model matrix guide (#444) (dd07397)
- readme: land ModelRouter prominently for the 0.7.8 release (#464) (deb24bb)
- site: air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1) (#465) (5996a1e)
- site: drop stale "fifteen lines" claim in openshift-install Reference (#467) (ec52ca8)
llmkube-0.7.8
A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference
v0.7.7
0.7.7 (2026-05-11)
Features
- agent: vllm-swift runtime + TurboQuant passthrough (#391) (#393) (2691e67)
- ci+chart: make OpenShift a first-class deploy target (closes #421) (#422) (798a13e)
- crd: add gpuMemoryUtilization and cpuOffloadGB to VLLMConfig (#394) (6883f78)
- metal-agent: emit Kubernetes events for memory-pressure transitions, evictions, skips, and respawn blocks (closes #390) (#411) (e0d17d1)
- observability: runtime label on inference pods + recording rules + starter dashboard (refs #409) (#410) (71743ed)
Bug Fixes
- controller: default FSGroup to curl_group + Longhorn-backed e2e job (closes #418, closes #420) (adce90f)
- controller: stop hot-spinning on unreachable file:// model sources (closes #405) (#412) (4ac6f57)
Documentation
- add NVIDIA Blackwell B200 (sm_100) validation matrix (refs #413) (#414) (bfda149)
- operations: seed runbooks index + first 2 entries (file:// hot-spin, metal-agent memory pressure) (#417) (d3bce8d)
- port concepts/comparison to markdown (first Phase 1C content port) (#403) (51c396b)
- readme: HN-launch readiness fixes (broken link, Apple Silicon CTA, quickstart memory) (#401) (3e44bfb)
- refresh quickstart cast for v0.7.6 (HN launch) (#404) (5abaddb)
- split docs/ into site/ and contributors/, prep for site rendering (#396) (9299a31)
- upgrade: OpenShift / OKD / MicroShift installs must use helm ... -f charts/llmkube/values-openshift.yaml so the restricted-v2 SCC can inject fsGroup from the namespace's allocated range (adce90f)
- upgrade: operators using a custom --init-container-image whose user is not curl (uid=101 gid=102) should set spec.podSecurityContext on each InferenceService or pass --default-fsgroup=<gid> to the controller (adce90f)
- upgrade: v0.7.7 rolls every InferenceService Pod once on first reconcile (the Deployment template gains fsGroup=102 and the new inference.llmkube.dev/runtime label) (adce90f)
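The upgrade notes name three possible sources for an InferenceService pod's fsGroup: the per-service spec.podSecurityContext, the controller's --default-fsgroup flag, and the new default of gid 102 (curl_group). The Go sketch below shows one plausible resolution order. The precedence (spec first, then flag, then 102) and the resolveFSGroup helper are assumptions, not the controller's confirmed logic. The sketch also does not model the OpenShift path, where the restricted-v2 SCC injects fsGroup from the namespace range.

```go
package main

import "fmt"

// curlGroupGID is the gid of the curl user in the default init-container
// image (uid=101 gid=102), per the v0.7.7 upgrade notes.
const curlGroupGID int64 = 102

// resolveFSGroup picks the fsGroup for an InferenceService pod. Hypothetical
// precedence: an explicit spec.podSecurityContext fsGroup wins, then the
// controller's --default-fsgroup value, then gid 102. nil means "not set".
func resolveFSGroup(specFSGroup, flagFSGroup *int64) int64 {
	if specFSGroup != nil {
		return *specFSGroup
	}
	if flagFSGroup != nil {
		return *flagFSGroup
	}
	return curlGroupGID
}

// ptr is a small helper for building optional int64 values.
func ptr(v int64) *int64 { return &v }

func main() {
	fmt.Println(resolveFSGroup(nil, nil))             // 102
	fmt.Println(resolveFSGroup(nil, ptr(2000)))       // 2000
	fmt.Println(resolveFSGroup(ptr(3000), ptr(2000))) // 3000
}
```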
llmkube-0.7.7
A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference