Firecrawl-compatible shim that routes Hermes web tools through a local Camofox server

camofox-firecrawl-shim

A small Firecrawl-compatible shim that lets Hermes route web_extract, web_search, and web_crawl through a local Camofox server without modifying Hermes source.

This repository contains only the shim code, compatibility tests, helper scripts, and setup instructions. It does not include a Camofox installation, browser binaries, vendored dependencies, or host-specific credentials.

What this solves

Hermes browser automation can already target Camofox through CAMOFOX_URL, but Hermes web tools still speak to a Firecrawl backend. This shim bridges that gap by exposing the subset of the Firecrawl v2 API that Hermes actually uses and fulfilling it with a local Camofox instance.

In practice this gives you a setup like:

  • Hermes browser tools -> Camofox
  • Hermes web tools -> this shim -> Camofox

No Hermes source patching required.

Implemented endpoints

The shim currently implements:

  • GET /health
  • GET /v2/health
  • POST /v2/scrape
  • POST /v2/search
  • POST /v2/map
  • POST /v2/crawl
  • GET /v2/crawl/:id
  • DELETE /v2/crawl/:id

That is enough for the validated Hermes paths:

  • web_extract
  • web_search
  • web_map
  • web_crawl
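
As an illustration of the API surface listed above, here is a minimal sketch of a raw POST /v2/scrape call using only the Python standard library. The helper names are hypothetical, the base URL is the example default from this README, and the request body (url plus formats) follows the public Firecrawl v2 scrape shape as an assumption; the shim's exact accepted fields are not documented here.

```python
import json
import urllib.request

SHIM_URL = "http://127.0.0.1:33879"  # example default from this README


def build_scrape_request(target_url: str, base_url: str = SHIM_URL) -> urllib.request.Request:
    """Build a POST /v2/scrape request asking for Markdown output.

    Assumption: the body mirrors the public Firecrawl v2 scrape payload.
    """
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(target_url: str, base_url: str = SHIM_URL, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running shim)."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect without a live shim.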

Repository layout

  • shim.py — Firecrawl-compatible shim server
  • ensure_shim.py — local launcher/health-check wrapper
  • check_updates.py — lightweight version drift report for the shim stack
  • start_shim.sh — canonical manual startup wrapper
  • tests/test_shim_compat.py — SDK-level compatibility tests against a fake Camofox server and local fixture site
  • cron/ensure_camofox_firecrawl_shim.py — copy of the launcher script suitable for Hermes cron script execution

Requirements

You need:

  • Python 3.11+
  • a working local Camofox server
  • Hermes configured to use a Firecrawl backend
  • pandoc available for HTML -> Markdown conversion
  • optional: Hermes cron support for self-healing restart behavior

For tests, you also need:

  • pytest
  • firecrawl-py

Install Camofox separately

This repository does not install Camofox for you.

A typical source checkout flow looks like:

git clone https://github.com/jo-inc/camofox-browser /opt/hermes-runtime/camofox-browser
cd /opt/hermes-runtime/camofox-browser
npm install

If you are in a rootless Linux runtime, you may also need to stage shared libraries and expose them through LD_LIBRARY_PATH before starting the Camofox server.

Example runtime defaults used by this shim:

  • Camofox server dir: /opt/hermes-runtime/camofox-browser
  • Camofox URL: http://127.0.0.1:9377
  • shim URL: http://127.0.0.1:33879
  • rootless browser libs: /opt/hermes-runtime/camofox-deps/root/usr/lib/x86_64-linux-gnu

Treat those as examples, not hard requirements. All important paths are configurable by environment variables.

Manual shim startup

Prefer the wrapper so manual runs default to the canonical maintained endpoint:

./start_shim.sh

Equivalent direct launch:

CAMOFOX_FIRECRAWL_SHIM_HOST=127.0.0.1 \
CAMOFOX_FIRECRAWL_SHIM_PORT=33879 \
python3 shim.py

Default listen address:

  • 127.0.0.1:33879

Shim configuration

Core settings:

  • CAMOFOX_URL — upstream Camofox URL, default http://127.0.0.1:9377
  • CAMOFOX_ACCESS_KEY — optional upstream Camofox v1.8.0+ global bearer token; when set, the shim sends an Authorization bearer header on Camofox requests
  • CAMOFOX_API_KEY — optional upstream Camofox per-route bearer token; used as the shim auth token when CAMOFOX_ACCESS_KEY is unset. Camofox v1.11.2+ requires a token for routes such as /tabs/:tabId/evaluate. If the shim is launched without this variable in its process environment, it falls back to reading it from the env file at HERMES_ENV_PATH or, failing that, /media/data/volumes/hermes_agent/data/.env.
  • CAMOFOX_FIRECRAWL_SHIM_HOST — bind host, default 127.0.0.1
  • CAMOFOX_FIRECRAWL_SHIM_PORT — bind port, default 33879
  • CAMOFOX_FIRECRAWL_SHIM_TIMEOUT — upstream request timeout, default 60
  • CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS — scrape/search wait timeout, default 12000
  • CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS — crawl page wait timeout, default 4000
  • CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS — overall crawl time budget, default 75
  • CAMOFOX_FIRECRAWL_SEARCH_URL — search page template, default DuckDuckGo HTML search
  • CAMOFOX_FIRECRAWL_PANDOC_BIN — pandoc launcher, default /opt/hermes-runtime/tools/mise/use-mise.sh
  • CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT — pandoc timeout in seconds, default 30
  • CAMOFOX_FIRECRAWL_TRACE — pass trace: true when creating Camofox tabs and add trace identifiers to document metadata, default disabled
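
The core settings above could be loaded roughly as follows. This is a sketch, not the code in shim.py: the ShimSettings class, the load_settings function, and the set of strings treated as "enabled" are all assumptions made for illustration. Only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these spellings count as "enabled" for boolean flags.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ShimSettings:
    camofox_url: str
    host: str
    port: int
    request_timeout: float
    wait_timeout_ms: int
    crawl_wait_timeout_ms: int
    crawl_time_budget_seconds: float
    pandoc_timeout: float
    trace: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ShimSettings:
    """Read settings from env, falling back to the defaults listed in this README."""
    return ShimSettings(
        camofox_url=env.get("CAMOFOX_URL", "http://127.0.0.1:9377"),
        host=env.get("CAMOFOX_FIRECRAWL_SHIM_HOST", "127.0.0.1"),
        port=int(env.get("CAMOFOX_FIRECRAWL_SHIM_PORT", "33879")),
        request_timeout=float(env.get("CAMOFOX_FIRECRAWL_SHIM_TIMEOUT", "60")),
        wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS", "12000")),
        crawl_wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS", "4000")),
        crawl_time_budget_seconds=float(
            env.get("CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS", "75")
        ),
        pandoc_timeout=float(env.get("CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT", "30")),
        trace=env.get("CAMOFOX_FIRECRAWL_TRACE", "").strip().lower() in _TRUTHY,
    )
```

Passing the environment as a mapping makes the defaults easy to check in isolation, for example with load_settings({}).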

Lazy-start settings for Camofox:

  • CAMOFOX_SERVER_DIR
  • CAMOFOX_SERVER_COMMAND — optional command used for lazy-start, parsed with shell-style quoting but executed without a shell; unset by default because the maintained deployment uses an external Camofox service
  • CAMOFOX_SERVER_START_TIMEOUT
  • CAMOFOX_SERVER_LOG
  • CAMOFOX_SERVER_ERROR_LOG
  • CAMOFOX_SERVER_LD_LIBRARY_PATH
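
"Parsed with shell-style quoting but executed without a shell" can be sketched with shlex and subprocess from the standard library. The function names below are hypothetical, and the real shim may handle logging, LD_LIBRARY_PATH, and start timeouts differently.

```python
import shlex
import subprocess
from typing import Optional


def parse_server_command(raw: Optional[str]) -> Optional[list[str]]:
    """Split a command string using shell-style quoting rules.

    Returns None when the variable is unset or blank, which disables lazy-start.
    """
    if not raw or not raw.strip():
        return None
    return shlex.split(raw)


def start_server(raw: Optional[str], cwd: Optional[str] = None) -> Optional[subprocess.Popen]:
    """Launch the parsed command directly, without invoking a shell."""
    argv = parse_server_command(raw)
    if argv is None:
        return None
    # shell=False is the default; it is spelled out to make the intent explicit.
    return subprocess.Popen(argv, cwd=cwd, shell=False)
```

Because no shell is involved, metacharacters such as ; or && are passed to the program as literal argument text rather than being interpreted.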

Point Hermes at the shim

Set Hermes so its Firecrawl client talks to the shim instead of a separate Firecrawl instance.

Typical environment values:

CAMOFOX_URL=http://127.0.0.1:9377
FIRECRAWL_API_URL=http://127.0.0.1:33879

If you are changing Hermes config programmatically, use Hermes' supported config writer rather than directly editing protected environment files.

Self-healing restart behavior

This setup uses two layers:

  1. the shim can lazy-start Camofox if CAMOFOX_SERVER_COMMAND is explicitly configured and the upstream browser server is missing
  2. a small launcher script can restart the shim itself if the shim is down

That is what ensure_shim.py is for.

Run it locally:

python3 ensure_shim.py

Behavior:

  • if the shim is already healthy, it exits cleanly
  • if the shim is missing, it starts ./start_shim.sh
  • it waits until /health reports success
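
The behavior above amounts to a check, start, and poll loop. The following is a minimal sketch under stated assumptions rather than the contents of ensure_shim.py: the function names, the retry count, and treating HTTP 200 as "healthy" are all illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://127.0.0.1:33879/health", timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed success signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def ensure_running(
    check: Callable[[], bool],
    start: Callable[[], None],
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Start the service only if it is down, then poll until it becomes healthy."""
    if check():
        return True  # already healthy: nothing to do
    start()
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

A caller might pass is_healthy as check and a function that launches ./start_shim.sh with subprocess.Popen as start. Injecting both callables keeps the loop testable without a running shim.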

Hermes cron setup

If you want the shim to recover automatically after environment restarts, place the launcher script in your Hermes scripts directory and schedule it with Hermes cron.

This repository includes a cron-suitable copy at:

  • cron/ensure_camofox_firecrawl_shim.py

A practical Hermes cron job is:

  • name: ensure camofox firecrawl shim
  • schedule: every 1m
  • script: ensure_camofox_firecrawl_shim.py

The logic is simple:

  • shim up -> do nothing
  • shim down -> start it
  • shim up but Camofox down -> the shim reports upstream failure unless CAMOFOX_SERVER_COMMAND is explicitly configured for shim-managed Camofox startup

Update checks

check_updates.py reports:

  • the current Camofox checkout tag and commit
  • the remote default-branch head commit
  • the latest remote tag
  • npm outdated --json from the Camofox checkout
  • installed Firecrawl Python SDK version vs latest PyPI version

It discovers the Camofox checkout in /opt/hermes-runtime/repos/camofox-browser first, falls back to /opt/hermes-runtime/camofox-browser, and can be overridden with CAMOFOX_REPO.
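
That discovery order can be sketched as below. The function name and the injectable exists check are hypothetical, and whether check_updates.py validates the CAMOFOX_REPO override is not stated here, so this sketch returns it as-is.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Search order taken from this README.
CANDIDATES = (
    Path("/opt/hermes-runtime/repos/camofox-browser"),
    Path("/opt/hermes-runtime/camofox-browser"),
)


def find_camofox_repo(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.is_dir,
) -> Optional[Path]:
    """Return the Camofox checkout path, honoring the CAMOFOX_REPO override first."""
    override = env.get("CAMOFOX_REPO")
    if override:
        return Path(override)  # assumption: the override is trusted without checking
    for candidate in CANDIDATES:
        if exists(candidate):
            return candidate
    return None
```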

Run it with:

python3 check_updates.py

It is a reporting tool only and never upgrades anything.

Tests and CI

Run the full validation suite in an environment that has the development requirements installed:

python -m pip install -r requirements-dev.txt
ruff format --check .
ruff check .
python -m py_compile shim.py ensure_shim.py cron/ensure_camofox_firecrawl_shim.py check_updates.py
pytest -q

Forgejo Actions runs the same validation on pull requests and pushes to main from .forgejo/workflows/ci.yml.

What the tests verify:

  • Firecrawl SDK compatibility for scrape(), search(), map(), and crawl()
  • redirect unwrapping and ad filtering for search results
  • crawl traversal semantics
  • shim behavior against a deterministic fake Camofox server and local fixture pages
  • Camofox v1.7.2+ structured extract passthrough for x-ref JSON Schemas
  • optional Camofox trace propagation and returned trace metadata
  • update-check path discovery for the maintained Camofox checkout
  • Camofox auth-token forwarding and v1.11.2 evaluate-auth fallback behavior

Camofox integration notes

  • Camofox v1.7.2 added structured extraction, opt-in Playwright tracing, OpenAPI docs, and default-on crash/hang telemetry.
  • Camofox v1.7.3 focused on native memory stability and crash-reporter signal quality.
  • Camofox v1.8.0 added the opt-in CAMOFOX_ACCESS_KEY global bearer-auth gate while retaining backward-compatible unauthenticated behavior when the variable is unset.
  • Camofox v1.11.2 also gates sensitive per-route endpoints such as /tabs/:tabId/evaluate and DELETE /sessions/:userId behind CAMOFOX_API_KEY/CAMOFOX_ACCESS_KEY. The shim forwards CAMOFOX_ACCESS_KEY first and falls back to CAMOFOX_API_KEY so Firecrawl scrape/search flows keep working when only the API key is configured.
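
The token precedence described above (CAMOFOX_ACCESS_KEY first, then CAMOFOX_API_KEY) might look like this in code. The function names are hypothetical, and this sketch reads only the process environment; it leaves out the env-file fallback that the shim also supports.

```python
import os
from typing import Mapping, Optional


def select_upstream_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the bearer token for Camofox requests: access key first, then API key."""
    for name in ("CAMOFOX_ACCESS_KEY", "CAMOFOX_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def upstream_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Authorization header, or no header at all when no token is set."""
    token = select_upstream_token(env)
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Sending no header when neither variable is set matches the unauthenticated behavior that upstream keeps for unconfigured deployments.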

The shim uses the new features conservatively:

  • If a scrape payload includes a JSON Schema with x-ref hints, the shim calls Camofox GET /tabs/:tabId/snapshot followed by POST /tabs/:tabId/extract and returns the typed result as data.extract.
  • Supported schema locations are jsonOptions.schema, json_options.schema, extract.schema, extractOptions.schema, extract_options.schema, or top-level schema.
  • If CAMOFOX_FIRECRAWL_TRACE is enabled, shim-created tabs include trace: true. Returned document metadata includes camofoxTraceEnabled, camofoxTraceUserId, and camofoxTraceSessionKey so operators can find the matching upstream trace archive.
  • If the upstream Camofox service sets CAMOFOX_ACCESS_KEY, configure the same value on the shim so it can authenticate to all upstream routes. If the upstream only sets CAMOFOX_API_KEY, configure that on the shim; it is used as the bearer token fallback for per-route auth in Camofox v1.11.2+.
  • The shim still uses evaluate for normal page extraction because extract is deterministic ref-to-value coercion, not full document-to-Markdown conversion. If Camofox v1.11.2+ rejects evaluate because the shim is not calling from loopback and no upstream API/access key is configured, the shim falls back to snapshot-derived text/links so Hermes web tools degrade instead of failing outright. When a token is configured, evaluate auth errors are surfaced rather than hidden, so bad credentials do not silently degrade extraction quality.
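
To make the supported schema locations concrete, here is a small lookup sketch. The function name is hypothetical, and the precedence simply follows the order in which the locations are listed above, which is an assumption. It also does not check for x-ref hints, which the shim requires before taking the structured extract path.

```python
from typing import Any, Mapping, Optional

# Key paths where a scrape payload may carry a JSON Schema, in listed order.
SCHEMA_PATHS = (
    ("jsonOptions", "schema"),
    ("json_options", "schema"),
    ("extract", "schema"),
    ("extractOptions", "schema"),
    ("extract_options", "schema"),
    ("schema",),
)


def find_schema(payload: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the first schema object found at any supported location."""
    for path in SCHEMA_PATHS:
        node: Any = payload
        for key in path:
            if not isinstance(node, Mapping):
                node = None
                break
            node = node.get(key)
        if isinstance(node, Mapping):
            return node
    return None
```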

Operational caveats:

  • Camofox extract requires refs, so the shim intentionally snapshots before calling the endpoint.
  • Camofox trace listing/downloading may require CAMOFOX_API_KEY unless called from loopback in non-production.
  • CAMOFOX_ACCESS_KEY gates every route except GET /health and conditionally exempted dedicated-key routes. If it is enabled upstream but not configured on the shim, shim-backed web_extract, web_search, web_map, and web_crawl calls will fail with upstream 401 Unauthorized.
  • CAMOFOX_API_KEY gates sensitive dedicated endpoints in Camofox v1.11.2+ including POST /tabs/:tabId/evaluate. If Camofox has CAMOFOX_API_KEY but the shim does not, normal scrape/search extraction fails with upstream 403 Forbidden.
  • Some live Camofox instances have advertised an interactive /docs route while only serving the machine-readable /openapi.json; use the spec endpoint as the reliable source during compatibility checks.
  • Camofox crash/hang telemetry is enabled by default upstream. Disable it with CAMOFOX_CRASH_REPORT_ENABLED=false if that is not desired for a local lazy-started Camofox process.

Camofox v1.7.3 notes:

  • /health now includes upstream memory fields: memory.rssMb, memory.heapUsedMb, and memory.nativeMemMb. The shim already forwards these under upstream.memory in /health; no Firecrawl API shape change was needed.
  • Upstream now closes browser processes more aggressively through a centralized full-close path and stale Firefox/Camoufox temp-profile cleanup. That should reduce native memory leaks and orphaned browser processes after idle shutdown, restart, admin stop, disconnect, and graceful shutdown paths.
  • Crash/stall reports now include better sleep-vs-real-stall classification, event-loop delay histogram data, session tab URL summaries, and native memory growth signals. This is useful for diagnostics but does not change normal shim scrape/search/crawl behavior.

Design notes

Markdown conversion

The shim uses pandoc for HTML -> Markdown conversion and falls back to plain text extraction if pandoc fails or times out.
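
The conversion-with-fallback approach can be sketched as follows. This is an illustration and not the shim's implementation: it calls a plain pandoc binary directly (the maintained deployment uses a launcher script), the gfm output format is an assumption, and the plain-text fallback here is a simple html.parser pass.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Plain-text fallback: one line per text fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def html_to_markdown(html: str, pandoc_bin: str = "pandoc", timeout: float = 30.0) -> str:
    """Convert with pandoc; fall back to plain text if it is missing, fails, or times out."""
    try:
        result = subprocess.run(
            [pandoc_bin, "--from=html", "--to=gfm"],
            input=html,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        return html_to_text(html)
```

Catching OSError covers a missing binary, while subprocess.SubprocessError covers both non-zero exits and timeouts.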

Search backend

The default search implementation uses a browser-rendered DuckDuckGo HTML search page through Camofox and normalizes results into a Firecrawl-like response.
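
Part of normalizing results is unwrapping search-engine redirect links. The sketch below assumes the commonly observed DuckDuckGo HTML link format, where the real target is carried in the uddg query parameter of a /l/ redirect URL. Both that format and the function name are assumptions, not a description of the shim's actual parsing.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_duckduckgo_redirect(href: str) -> str:
    """Return the real target of a DuckDuckGo /l/?uddg=... link, else the link itself."""
    if href.startswith("//"):
        href = "https:" + href  # protocol-relative links get an explicit scheme
    parsed = urlparse(href)
    host = parsed.hostname or ""
    is_ddg = host == "duckduckgo.com" or host.endswith(".duckduckgo.com")
    if is_ddg and parsed.path.rstrip("/") == "/l":
        targets = parse_qs(parsed.query).get("uddg")
        if targets:
            return targets[0]  # parse_qs has already percent-decoded the value
    return href
```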

Crawl model

Crawls are asynchronous jobs:

  • POST /v2/crawl returns a job id quickly
  • a worker thread performs traversal and scraping
  • GET /v2/crawl/:id returns status and results
  • DELETE /v2/crawl/:id removes job state
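
The job lifecycle above can be sketched with an in-memory registry and one worker thread per crawl. The class and method names are hypothetical, and the status strings follow Firecrawl's public crawl statuses as an assumption. Link traversal is replaced by a fixed URL list, so this shows only the job bookkeeping.

```python
import threading
import uuid
from typing import Callable, Optional, Sequence


class CrawlJobs:
    """In-memory crawl job registry backed by worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: dict[str, dict] = {}
        self._threads: dict[str, threading.Thread] = {}

    def start(self, urls: Sequence[str], scrape: Callable[[str], dict]) -> str:
        """Register a job and return its id immediately (POST /v2/crawl)."""
        job_id = uuid.uuid4().hex
        job = {"status": "scraping", "total": len(urls), "completed": 0, "data": []}
        worker = threading.Thread(
            target=self._run, args=(job_id, list(urls), scrape), daemon=True
        )
        with self._lock:
            self._jobs[job_id] = job
            self._threads[job_id] = worker
        worker.start()
        return job_id

    def _run(self, job_id: str, urls: list[str], scrape: Callable[[str], dict]) -> None:
        for url in urls:
            try:
                document = scrape(url)
            except Exception as exc:  # record the failure and keep crawling
                document = {"url": url, "error": str(exc)}
            with self._lock:
                job = self._jobs.get(job_id)
                if job is None:
                    return  # job was deleted: stop early
                job["data"].append(document)
                job["completed"] += 1
        with self._lock:
            job = self._jobs.get(job_id)
            if job is not None:
                job["status"] = "completed"

    def get(self, job_id: str) -> Optional[dict]:
        """Return a snapshot of status and results (GET /v2/crawl/:id)."""
        with self._lock:
            job = self._jobs.get(job_id)
            return None if job is None else {**job, "data": list(job["data"])}

    def delete(self, job_id: str) -> bool:
        """Remove job state (DELETE /v2/crawl/:id)."""
        with self._lock:
            self._threads.pop(job_id, None)
            return self._jobs.pop(job_id, None) is not None

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        """Block until the worker finishes; used here only for demonstration."""
        with self._lock:
            worker = self._threads.get(job_id)
        if worker is not None:
            worker.join(timeout)
```

Deleting a job while its worker is still running makes the worker stop at its next bookkeeping step, which is one simple way to implement cancellation.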

Known limitations

  • Hermes still blocks private/internal URLs before the request reaches the shim
  • Google SERP access is still affected by your egress IP / proxy quality
  • This shim targets the Firecrawl API surface Hermes uses today, not full Firecrawl parity
  • If Hermes or the Firecrawl SDK changes its required API shape, the shim may need updates

Publishing and safety notes

This repository is intended to be publishable without credentials.

It should contain:

  • shim source code
  • compatibility tests
  • launcher and maintenance scripts
  • setup documentation

It should not contain:

  • browser binaries
  • staged system libraries
  • local logs
  • cache directories
  • tokens, passwords, or environment dumps

Suggested bootstrap sequence

  1. Install and verify Camofox separately
  2. Start the Camofox server and confirm http://127.0.0.1:9377/health
  3. Start this shim and confirm http://127.0.0.1:33879/health
  4. Point Hermes FIRECRAWL_API_URL at the shim
  5. Run at least one Hermes-side extract/search validation
  6. Install the cron launcher if you want restart resilience

License

MIT