Releases: defilantech/LLMKube

v0.7.11

23 May 01:04
4c2480b

0.7.11 (2026-05-23)

Bug Fixes

  • foreman: drop chart-level subchart dep on llmkube (unblock v0.7.11 chart-releaser) (#519) (207ddc6)

v0.7.10

23 May 00:10
f2aca7a

0.7.10 (2026-05-23)

Features

  • add --llama-server-port for a fixed llama-server runtime port (#499) (cc30b0d)
  • add make lint-all target for cross-arch linting (#508) (f57dd5b)
  • capability-aware scheduler + AgenticTaskWatcher + stub executor (Foreman v0.1 M2) (#504) (74b3d6e)
  • foreman: gate-role Agent on a verifier node (M4) (#518) (40a340e)
  • foreman: native agent loop + Agent CRD + coder role on M5 Max (M3) (#509) (6661343)
  • scaffold Foreman as an opt-in add-on (M0 + M1) (#501) (cd40491)

Bug Fixes

  • report Stopped phase when InferenceService.spec.replicas=0 on Metal path (#498) (7787239)

Documentation

  • add AGENTS.md (#496) (89d3766)
  • bump broken bartowski phi-4-mini URL to renamed repo (#514) (9f15d98)
  • macos-metal: derive curl port from Endpoints (follow-up to #513) (#515) (83085c2)
  • macos-metal: replace broken port-forward step with host-localhost curl (#513) (0f7f7a7)

llmkube-0.7.11

23 May 01:05
4c2480b

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

foreman-0.7.11

23 May 01:05
4c2480b

Foreman is an opt-in add-on for LLMKube that schedules agentic workloads (Workload, AgenticTask) across a fleet of nodes (FleetNode). Installing LLMKube alone does not install or require Foreman. Foreman is a SIBLING chart to llmkube, not a subchart: install llmkube first (helm install llmkube defilantech/llmkube), then install foreman alongside it. They share no Helm relationship at packaging or install time; the only coupling is that the foreman-operator's RBAC reads inference.llmkube.dev CRDs that llmkube installs.
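
The install order described above can be sketched as a small script. This is a minimal sketch under stated assumptions, not the project's documented procedure: it assumes a Helm repo alias named defilantech is already configured, and the chart reference defilantech/foreman is a guess modeled on the llmkube one. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: install llmkube first, then foreman as a sibling release.
set -eu

# Print-only by default so the commands can be reviewed before running.
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

install_llmkube_and_foreman() {
  # Step 1: the core operator, using the command quoted in the chart description.
  run helm install llmkube defilantech/llmkube
  # Step 2: Foreman alongside it. ASSUMPTION: the chart reference
  # "defilantech/foreman" is hypothetical and not confirmed by the release notes.
  run helm install foreman defilantech/foreman
}

install_llmkube_and_foreman
```

Keeping the two installs as separate releases mirrors the note that the charts share no Helm relationship: either release can be upgraded or removed without touching the other, as long as the llmkube CRDs remain installed for Foreman's RBAC to read.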

v0.7.9

18 May 08:33
060858e

0.7.9 (2026-05-18)

Bug Fixes

  • clear stale conditions when a model reaches Ready without a download (#476) (06325b0)
  • inference PodMonitor selector matched no pods (#481) (31ee4d6)
  • mark Metal local-path models Ready instead of stuck Copying (#472) (c513c84)
  • metal-path InferenceService status and memory pre-flight (#488) (98ef2c4)
  • point metal-agent mlx-server install hint at the Homebrew formula (#477) (74b3333)
  • prevent concurrent runtime respawn in metal-agent (#469) (f34640b)
  • stop the operator fighting the HPA over Deployment replicas (#485) (8fc70e2)

Documentation

  • add MAINTAINERS file and recommend private vulnerability reporting (#479) (aaccb4d)

llmkube-0.7.9

18 May 08:34
060858e

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.8

14 May 04:49
2b9616d

0.7.8 (2026-05-14)

Features

  • configurable proxy + per-route/backend timeouts (closes #457, #458) (#461) (03d222a)
  • external provider URL defaults + cluster-wide LiteLLM URL (closes #438) (#451) (26cd5ae)
  • Helm packaging, sample manifest, and concept doc for ModelRouter (#448) (a513fdc)
  • ModelRouterReconciler skeleton with spec validation (#445) (9b1a259)
  • reconcile router-proxy Deployment, Service, and ConfigMap (#447) (856ecc3)
  • router-proxy binary with OpenAI streaming passthrough (#446) (942d09a)
  • router-proxy cluster e2e + runtime fail-closed 503 (closes #430) (#450) (75151fa)
  • scaffold ModelRouter CRD types and deepcopy (#442) (e6c60b3)

Bug Fixes

  • close cloud-tier conns + drop local idle timeout (closes #459) (#460) (173c26a)
  • don't quarantine backends on per-attempt context deadline (closes #462) (#463) (80ef9c8)
  • e2e: unblock MicroShift SCC diagnostics + bump bootstrap timeout (#466) (0c793b7)
  • half-open circuit breaker on proxy + scale-to-zero status (closes #452, #453) (#454) (ac9302c)
  • preserve external annotations on reconciler Deployment updates (#468) (de580c1)

Documentation

  • add consumer-hardware model matrix guide (#444) (dd07397)
  • readme: land ModelRouter prominently for the 0.7.8 release (#464) (deb24bb)
  • site: air-gapped, OpenShift, macOS Metal guides + architecture refresh (Tier 1) (#465) (5996a1e)
  • site: drop stale "fifteen lines" claim in openshift-install Reference (#467) (ec52ca8)

llmkube-0.7.8

14 May 04:50
2b9616d

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference

v0.7.7

11 May 12:27
437c2c3

0.7.7 (2026-05-11)

Features

  • agent: vllm-swift runtime + TurboQuant passthrough (#391) (#393) (2691e67)
  • ci+chart: make OpenShift a first-class deploy target (closes #421) (#422) (798a13e)
  • crd: add gpuMemoryUtilization and cpuOffloadGB to VLLMConfig (#394) (6883f78)
  • metal-agent: emit Kubernetes events for memory-pressure transitions, evictions, skips, and respawn blocks (closes #390) (#411) (e0d17d1)
  • observability: runtime label on inference pods + recording rules + starter dashboard (refs #409) (#410) (71743ed)

Bug Fixes

  • controller: default FSGroup to curl_group + Longhorn-backed e2e job (closes #418, closes #420) (adce90f)
  • controller: stop hot-spinning on unreachable file:// model sources (closes #405) (#412) (4ac6f57)

Documentation

  • add NVIDIA Blackwell B200 (sm_100) validation matrix (refs #413) (#414) (bfda149)
  • operations: seed runbooks index + first 2 entries (file:// hot-spin, metal-agent memory pressure) (#417) (d3bce8d)
  • port concepts/comparison to markdown (first Phase 1C content port) (#403) (51c396b)
  • readme: HN-launch readiness fixes (broken link, Apple Silicon CTA, quickstart memory) (#401) (3e44bfb)
  • refresh quickstart cast for v0.7.6 (HN launch) (#404) (5abaddb)
  • split docs/ into site/ and contributors/, prep for site rendering (#396) (9299a31)
  • upgrade: OpenShift / OKD / MicroShift installs must use helm ... -f charts/llmkube/values-openshift.yaml so restricted-v2 SCC can inject fsGroup from the namespace's allocated range (adce90f)
  • upgrade: operators using a custom --init-container-image whose user is not curl (uid=101 gid=102) should set spec.podSecurityContext on each InferenceService or pass --default-fsgroup=<gid> to the controller (adce90f)
  • upgrade: v0.7.7 rolls every InferenceService Pod once on first reconcile (Deployment template gains fsGroup=102 and the new inference.llmkube.dev/runtime label) (adce90f)
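
For the custom init-container-image note above, a per-resource override could look roughly like the following. This is a sketch under assumptions: the InferenceService name my-model, the namespace default, and the gid 1000 are all hypothetical, and it assumes kubectl resolves the resource name inferenceservice to the LLMKube CRD and that spec.podSecurityContext accepts a standard fsGroup field. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: set spec.podSecurityContext.fsGroup on one InferenceService.
set -eu

# Print-only by default so the command can be reviewed before running.
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Build a JSON merge patch for the given gid.
fsgroup_patch() {
  printf '{"spec":{"podSecurityContext":{"fsGroup":%s}}}' "$1"
}

patch_fsgroup() {
  # $1 = InferenceService name, $2 = namespace, $3 = gid of the init image user
  # ASSUMPTION: "inferenceservice" is the resource name kubectl should use here.
  run kubectl patch inferenceservice "$1" -n "$2" --type merge -p "$(fsgroup_patch "$3")"
}

# Hypothetical values: replace with the real name, namespace, and gid.
patch_fsgroup my-model default 1000
```

A JSON merge patch is used because custom resources do not support strategic merge patches. For clusters with many InferenceService objects, the controller-wide --default-fsgroup=<gid> flag mentioned in the same note is likely the simpler route.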

llmkube-0.7.7

11 May 12:27
437c2c3

A Helm chart for LLMKube - Kubernetes operator for GPU-accelerated LLM inference