Shared Go infrastructure for local-first crawler archives.
crawlkit is not a universal Slack, Discord, Notion, or GitHub crawler. It is
the reusable foundation beneath those tools: SQLite hygiene, TOML config
defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state,
CLI output helpers, control/status metadata, a shared terminal explorer, and
safe desktop-cache snapshot utilities.
go get github.com/openclaw/crawlkit@latest
go install github.com/openclaw/crawlkit/cmd/crawlctl@latest
Go packages are published by tagging this repository. There is no separate
package registry step. See docs/publishing.md for the release commands.
See docs/boundary.md for the crawlkit-versus-app ownership boundary.
Packages:
- config: standard TOML config paths, opt-in platform-native runtime dirs, migration-safe legacy path fallback, and token diagnostics.
- store: SQLite open/read-only/transaction/query helpers.
- snapshot: manifest.json plus JSONL/Gzip table snapshot export, file fingerprints, full import, and planned incremental shard import.
- backup: age-encrypted JSONL/Gzip shards, backup manifests, recipient/identity helpers, and shard restore verification.
- mirror: clone/init/pull/commit/push helpers for private snapshot repos.
- state: generic crawler cursor and freshness records.
- embed: reusable OpenAI-compatible, Ollama, and llama.cpp embedding providers plus local probe diagnostics.
- vector: float32 vector encoding, dimension validation, cosine scoring, top-k helpers, and reciprocal-rank fusion.
- releasecheck: GitHub release checks, 24-hour cache handling, scripted-output suppression, and stderr update-notice formatting for crawl app CLIs.
- output: text/JSON/log output helpers.
- control: crawl app metadata, command manifests, status payloads, and database inventory for launchers and automation.
- scheduler: crawl app discovery, job config, single-process run locking, JSONL run history, log paths, and launchd/systemd/Windows/cron schedule rendering for controller CLIs.
- tui: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
- cache: safe read-only local cache snapshot helpers.
crawlctl is the shared controller for keeping local crawl archives warm.
It discovers installed crawl apps through metadata --json, falls back to
temporary legacy adapters for older apps, runs configured jobs with a lock, and
records one JSONL run record per command.
crawlctl init --repo openclaw/openclaw
crawlctl run
crawlctl status
crawlctl logs gitcrawl --tail 80
crawlctl install --dry-run
Native install backends:
- macOS: launchd
- Linux: systemd --user
- Windows: Task Scheduler
- portable fallback: cron line rendering
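For the portable cron fallback, rendering a line mostly means joining a five-field schedule with a shell-quoted command. The sketch below is hypothetical: renderCronLine, shellQuote, and the /usr/local/bin/crawlctl path are assumptions. It rejects arguments containing %, because cron gives that character special meaning in the command field, rather than guessing at escaping rules.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for /bin/sh, escaping any embedded
// single quote as '\''.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderCronLine builds a crontab entry from a five-field schedule and argv.
func renderCronLine(spec string, argv ...string) (string, error) {
	fields := strings.Fields(spec)
	if len(fields) != 5 {
		return "", fmt.Errorf("cron spec %q must have five fields", spec)
	}
	if len(argv) == 0 {
		return "", fmt.Errorf("missing command")
	}
	quoted := make([]string, 0, len(argv))
	for _, a := range argv {
		if strings.Contains(a, "%") {
			return "", fmt.Errorf("argument %q contains %%, which cron treats specially", a)
		}
		quoted = append(quoted, shellQuote(a))
	}
	return strings.Join(fields, " ") + " " + strings.Join(quoted, " "), nil
}

func main() {
	line, err := renderCronLine("*/30 * * * *", "/usr/local/bin/crawlctl", "run")
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // */30 * * * * '/usr/local/bin/crawlctl' 'run'
}
```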
- gitcrawl, discrawl, notcrawl, wacrawl, telecrawl, and slacrawl consume crawlkit on main.
- The apps keep provider schemas, auth, desktop/API parsing, privacy filters,
  and user-facing CLI contracts.
- crawlkit owns only the reusable mechanics.
Library tests use temporary directories. They do not touch app runtime stores
such as ~/.config/gitcrawl, ~/.slacrawl, ~/.discrawl, or ~/.notcrawl.