🧱 crawlkit

Shared Go infrastructure for local-first crawler archives.

crawlkit is not a universal Slack, Discord, Notion, or GitHub crawler. It is the reusable foundation beneath those tools: SQLite hygiene, TOML config defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state, CLI output helpers, control/status metadata, a shared terminal explorer, and safe desktop-cache snapshot utilities.

Install

go get github.com/openclaw/crawlkit@latest
go install github.com/openclaw/crawlkit/cmd/crawlctl@latest

Go packages are published by tagging this repository. There is no separate package registry step. See docs/publishing.md for the release commands. See docs/boundary.md for the crawlkit-versus-app ownership boundary.

Packages

config: standard TOML config paths, opt-in platform-native runtime dirs, migration-safe legacy path fallback, and token diagnostics.
store: SQLite open/read-only/transaction/query helpers.
snapshot: manifest.json plus JSONL/Gzip table snapshot export, file fingerprints, full import, and planned incremental shard import.
backup: age-encrypted JSONL/Gzip shards, backup manifests, recipient/identity helpers, and shard restore verification.
mirror: clone/init/pull/commit/push helpers for private snapshot repos.
state: generic crawler cursor and freshness records.
embed: reusable OpenAI-compatible, Ollama, and llama.cpp embedding providers plus local probe diagnostics.
vector: float32 vector encoding, dimension validation, cosine scoring, top-k helpers, and reciprocal-rank fusion.
releasecheck: GitHub release checks, 24-hour cache handling, scripted-output suppression, and stderr update notice formatting for crawl app CLIs.
output: text/json/log output helpers.
control: crawl app metadata, command manifests, status payloads, and database inventory for launchers and automation.
scheduler: crawl app discovery, job config, single-process run locking, JSONL run history, log paths, and launchd/systemd/Windows/cron schedule rendering for controller CLIs.
tui: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
cache: safe read-only local cache snapshot helpers.
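
To make the vector entry above concrete, here is a minimal standalone sketch of cosine scoring and reciprocal-rank fusion using only the Go standard library. It is not the crawlkit vector API: the names Cosine and FuseRRF, their signatures, and the tie-breaking rule are assumptions made for illustration.

```go
package main

import (
	"math"
	"sort"
)

// Cosine returns the cosine similarity of a and b.
// Hypothetical helper: it returns 0 when the dimensions differ,
// the vectors are empty, or either vector has zero magnitude.
func Cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// FuseRRF merges ranked ID lists with reciprocal-rank fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// IDs are returned best first; ties are broken by ID for determinism.
func FuseRRF(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}
```

A typical use would be to rank one list by Cosine against a query embedding, rank another by keyword match, and pass both to FuseRRF with a constant such as k = 60, a value often used for this technique.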

crawlctl

crawlctl is the shared controller for keeping local crawl archives warm. It discovers installed crawl apps through metadata --json, falls back to temporary legacy adapters for older apps, runs configured jobs with a lock, and records one JSONL run record per command.

crawlctl init --repo openclaw/openclaw
crawlctl run
crawlctl status
crawlctl logs gitcrawl --tail 80
crawlctl install --dry-run

Native install backends:

macOS: launchd
Linux: systemd --user
Windows: Task Scheduler
portable fallback: cron line rendering
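
As a rough illustration of the cron fallback, the sketch below renders a single crontab line for a fixed minute interval. CronLine and its parameters are hypothetical and do not reflect the scheduler package's actual API or crawlctl's real output.

```go
package main

import (
	"fmt"
	"strings"
)

// CronLine renders one crontab entry that runs command every n minutes.
// Hypothetical helper: it only covers minute intervals from 1 to 59.
func CronLine(everyMinutes int, command string) (string, error) {
	if everyMinutes < 1 || everyMinutes > 59 {
		return "", fmt.Errorf("interval must be 1..59 minutes, got %d", everyMinutes)
	}
	if strings.TrimSpace(command) == "" || strings.ContainsAny(command, "\n\r") {
		return "", fmt.Errorf("command must be a non-empty single line")
	}
	minute := "*" // every minute
	if everyMinutes > 1 {
		minute = fmt.Sprintf("*/%d", everyMinutes) // step value in the minute field
	}
	return fmt.Sprintf("%s * * * * %s", minute, command), nil
}
```

For example, CronLine(30, "crawlctl run") yields a line that runs crawlctl run at minutes 0 and 30 of every hour. Intervals that do not divide 60 evenly produce an uneven gap at the top of the hour, and a % character in the command has special meaning in crontab files and would need escaping.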

Downstream apps

gitcrawl, discrawl, notcrawl, wacrawl, telecrawl, and slacrawl consume crawlkit on main.
The apps keep provider schemas, auth, desktop/API parsing, privacy filters, and user-facing CLI contracts. crawlkit owns only the reusable mechanics.

Safety

Library tests use temporary directories. They do not touch app runtime stores such as ~/.config/gitcrawl, ~/.slacrawl, ~/.discrawl, or ~/.notcrawl.
