Firecrawl-compatible shim that routes Hermes web tools through a local Camofox server


camofox-firecrawl-shim

A small Firecrawl-compatible shim that lets Hermes route web_extract, web_search, web_map, and web_crawl through a local Camofox server without modifying Hermes source.

This repository contains only the shim code, compatibility tests, helper scripts, and setup instructions. It does not include a Camofox installation, browser binaries, vendored dependencies, or host-specific credentials.

What this solves

Hermes browser automation can already target Camofox through CAMOFOX_URL, but Hermes web tools still speak to a Firecrawl backend. This shim bridges that gap by exposing the subset of the Firecrawl v2 API that Hermes actually uses and fulfilling it with a local Camofox instance.

In practice this gives you a setup like:

Hermes browser tools -> Camofox
Hermes web tools -> this shim -> Camofox

No Hermes source patching required.

Implemented endpoints

The shim currently implements:

GET /health
GET /v2/health
POST /v2/scrape
POST /v2/search
POST /v2/map
POST /v2/crawl
GET /v2/crawl/:id
DELETE /v2/crawl/:id

That is enough for the validated Hermes paths:

web_extract
web_search
web_map
web_crawl
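
For a quick check that does not go through Hermes or the Firecrawl SDK, a client can talk to these endpoints directly. The sketch below builds a POST /v2/scrape request with the standard library. The helper names are hypothetical and not part of this repository, and the body fields (`url`, `formats`) are assumed from the public Firecrawl v2 scrape API rather than taken from shim.py.

```python
import json
import urllib.request

SHIM_URL = "http://127.0.0.1:33879"  # default shim address from this README


def build_scrape_request(page_url: str, base_url: str = SHIM_URL) -> urllib.request.Request:
    """Build (but do not send) a Firecrawl-style scrape request.

    Assumption: the body uses the Firecrawl v2 fields "url" and "formats".
    """
    body = json.dumps({"url": page_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v2/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(page_url: str, base_url: str = SHIM_URL, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running shim)."""
    request = build_scrape_request(page_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `scrape("https://example.com")` only makes sense once both the shim and Camofox are up. `build_scrape_request` can be inspected offline.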

Repository layout

shim.py — Firecrawl-compatible shim server
ensure_shim.py — local launcher/health-check wrapper
check_updates.py — lightweight version drift report for the shim stack
start_shim.sh — canonical manual startup wrapper
tests/test_shim_compat.py — SDK-level compatibility tests against a fake Camofox server and local fixture site
cron/ensure_camofox_firecrawl_shim.py — copy of the launcher script suitable for Hermes cron script execution

Requirements

You need:

Python 3.11+
a working local Camofox server
Hermes configured to use a Firecrawl backend
pandoc available for HTML -> Markdown conversion
optional: Hermes cron support for self-healing restart behavior

For tests, you also need:

pytest
firecrawl-py

Install Camofox separately

This repository does not install Camofox for you.

A typical source checkout flow looks like:

git clone https://github.com/jo-inc/camofox-browser /opt/hermes-runtime/camofox-browser
cd /opt/hermes-runtime/camofox-browser
npm install

If you are in a rootless Linux runtime, you may also need to stage shared libraries and expose them through LD_LIBRARY_PATH before starting the Camofox server.

Example runtime defaults used by this shim:

Camofox server dir: /opt/hermes-runtime/camofox-browser
Camofox URL: http://127.0.0.1:9377
shim URL: http://127.0.0.1:33879
rootless browser libs: /opt/hermes-runtime/camofox-deps/root/usr/lib/x86_64-linux-gnu

Treat those as examples, not hard requirements. All important paths are configurable by environment variables.

Manual shim startup

Prefer the wrapper script, so that manual runs use the canonical maintained endpoint by default:

./start_shim.sh

Equivalent direct launch:

CAMOFOX_FIRECRAWL_SHIM_HOST=127.0.0.1 \
CAMOFOX_FIRECRAWL_SHIM_PORT=33879 \
python3 shim.py

Default listen address:

127.0.0.1:33879

Shim configuration

Core settings:

CAMOFOX_URL — upstream Camofox URL, default http://127.0.0.1:9377
CAMOFOX_ACCESS_KEY — optional upstream Camofox v1.8.0+ global bearer token; when set, the shim sends an Authorization bearer header on Camofox requests
CAMOFOX_API_KEY — optional upstream Camofox per-route bearer token; used as the shim auth token when CAMOFOX_ACCESS_KEY is unset, which is required for Camofox v1.11.2+ routes such as /tabs/:tabId/evaluate. If the shim is launched without this variable in its process environment, it also falls back to HERMES_ENV_PATH or /media/data/volumes/hermes_agent/data/.env.
CAMOFOX_FIRECRAWL_SHIM_HOST — bind host, default 127.0.0.1
CAMOFOX_FIRECRAWL_SHIM_PORT — bind port, default 33879
CAMOFOX_FIRECRAWL_SHIM_TIMEOUT — upstream request timeout, default 60
CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS — scrape/search wait timeout, default 12000
CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS — crawl page wait timeout, default 4000
CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS — overall crawl time budget, default 75
CAMOFOX_FIRECRAWL_SEARCH_URL — search page template, default DuckDuckGo HTML search
CAMOFOX_FIRECRAWL_PANDOC_BIN — pandoc launcher, default /opt/hermes-runtime/tools/mise/use-mise.sh
CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT — pandoc timeout in seconds, default 30
CAMOFOX_FIRECRAWL_TRACE — pass trace: true when creating Camofox tabs and add trace identifiers to document metadata, default disabled
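
To make the defaults above concrete, here is a minimal sketch of how such settings could be read from the environment. The `ShimSettings` class and `load_settings` function are hypothetical names, not the actual structure of shim.py, and the accepted spellings for the trace flag are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ShimSettings:
    camofox_url: str
    host: str
    port: int
    upstream_timeout: float
    wait_timeout_ms: int
    crawl_wait_timeout_ms: int
    crawl_time_budget_seconds: float
    trace: bool


def _flag(value: str) -> bool:
    # Assumption: the usual "1/true/yes/on" spellings enable a flag.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ShimSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return ShimSettings(
        camofox_url=env.get("CAMOFOX_URL", "http://127.0.0.1:9377"),
        host=env.get("CAMOFOX_FIRECRAWL_SHIM_HOST", "127.0.0.1"),
        port=int(env.get("CAMOFOX_FIRECRAWL_SHIM_PORT", "33879")),
        upstream_timeout=float(env.get("CAMOFOX_FIRECRAWL_SHIM_TIMEOUT", "60")),
        wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS", "12000")),
        crawl_wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS", "4000")),
        crawl_time_budget_seconds=float(
            env.get("CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS", "75")
        ),
        trace=_flag(env.get("CAMOFOX_FIRECRAWL_TRACE", "")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"CAMOFOX_FIRECRAWL_SHIM_PORT": "40000"})`.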

Lazy-start settings for Camofox:

CAMOFOX_SERVER_DIR
CAMOFOX_SERVER_COMMAND — optional command used for lazy-start, parsed with shell-style quoting but executed without a shell; unset by default because the maintained deployment uses an external Camofox service
CAMOFOX_SERVER_START_TIMEOUT
CAMOFOX_SERVER_LOG
CAMOFOX_SERVER_ERROR_LOG
CAMOFOX_SERVER_LD_LIBRARY_PATH
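
The phrase "parsed with shell-style quoting but executed without a shell" can be illustrated with `shlex.split` and `subprocess.Popen`. This is a sketch under assumptions, not the shim's actual launcher: the function names are hypothetical, and mapping CAMOFOX_SERVER_DIR to the working directory and CAMOFOX_SERVER_LD_LIBRARY_PATH to the child's LD_LIBRARY_PATH is a guess at the intent of those variables.

```python
import os
import shlex
import subprocess
from typing import Mapping, Optional


def parse_server_command(raw: Optional[str]) -> Optional[list[str]]:
    """Split a command string with shell-style quoting; blank means disabled."""
    if not raw or not raw.strip():
        return None
    return shlex.split(raw)


def start_camofox(env: Mapping[str, str] = os.environ) -> Optional[subprocess.Popen]:
    """Start Camofox only when CAMOFOX_SERVER_COMMAND is explicitly set."""
    argv = parse_server_command(env.get("CAMOFOX_SERVER_COMMAND"))
    if argv is None:
        return None  # lazy-start is off; an external Camofox service is expected

    child_env = dict(env)
    lib_path = env.get("CAMOFOX_SERVER_LD_LIBRARY_PATH")
    if lib_path:
        # Assumption: this variable feeds the child's LD_LIBRARY_PATH.
        child_env["LD_LIBRARY_PATH"] = lib_path

    # An argv list with the default shell=False means no shell ever sees the
    # string, so ';', '|', and '$(...)' are passed through as literal text.
    return subprocess.Popen(
        argv,
        cwd=env.get("CAMOFOX_SERVER_DIR") or None,
        env=child_env,
    )
```

Quoting still works as expected, so `node "my server.js"` yields a single argument containing a space, while shell metacharacters lose their special meaning.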

Point Hermes at the shim

Set Hermes so its Firecrawl client talks to the shim instead of a separate Firecrawl instance.

Typical environment values:

CAMOFOX_URL=http://127.0.0.1:9377
FIRECRAWL_API_URL=http://127.0.0.1:33879

If you are changing Hermes config programmatically, use Hermes' supported config writer rather than directly editing protected environment files.

Self-healing restart behavior

This setup uses two layers:

the shim can lazy-start Camofox if CAMOFOX_SERVER_COMMAND is explicitly configured and the upstream browser server is missing
a small launcher script can restart the shim itself if the shim is down

That is what ensure_shim.py is for.

Run it locally:

python3 ensure_shim.py

Behavior:

if the shim is already healthy, it exits cleanly
if the shim is missing, it starts ./start_shim.sh
it waits until /health reports success
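
The three behaviors above amount to a check, start, and wait loop. The following sketch shows one way to structure it. It is not the contents of ensure_shim.py: the function names, return strings, and timing defaults are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://127.0.0.1:33879/health", timeout: float = 3.0) -> bool:
    """Return True when the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False


def ensure_running(
    check: Callable[[], bool],
    start: Callable[[], object],
    wait_seconds: float = 30.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start the service if needed, then poll until it is healthy or time runs out."""
    if check():
        return "already-healthy"
    start()
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if check():
            return "started"
        sleep(poll_interval)
    return "timeout"
```

In a real launcher, `check` would be `is_healthy` and `start` would spawn `./start_shim.sh` as a detached process. Injecting both as callables keeps the loop testable without a running shim.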

Hermes cron setup

If you want the shim to recover automatically after environment restarts, place the launcher script in your Hermes scripts directory and schedule it with Hermes cron.

This repository includes a cron-suitable copy at:

cron/ensure_camofox_firecrawl_shim.py

A practical Hermes cron job is:

name: ensure camofox firecrawl shim
schedule: every 1m
script: ensure_camofox_firecrawl_shim.py

The logic is simple:

shim up -> do nothing
shim down -> start it
shim up but Camofox down -> the shim reports upstream failure unless CAMOFOX_SERVER_COMMAND is explicitly configured for shim-managed Camofox startup

Update checks

check_updates.py reports:

the current Camofox checkout tag and commit
the remote default-branch head commit
the latest remote tag
npm outdated --json from the Camofox checkout
installed Firecrawl Python SDK version vs latest PyPI version

It discovers the Camofox checkout in /opt/hermes-runtime/repos/camofox-browser first, falls back to /opt/hermes-runtime/camofox-browser, and can be overridden with CAMOFOX_REPO.
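
The discovery order can be sketched as below. This is an illustration, not the code in check_updates.py: `find_camofox_checkout` is a hypothetical name, and treating CAMOFOX_REPO as an unconditional override (without checking that the path exists) is an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Candidate locations in the order described above.
CANDIDATES = (
    Path("/opt/hermes-runtime/repos/camofox-browser"),
    Path("/opt/hermes-runtime/camofox-browser"),
)


def find_camofox_checkout(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.is_dir,
) -> Optional[Path]:
    """Return the checkout path, or None when nothing is found."""
    override = env.get("CAMOFOX_REPO")
    if override:
        # Assumption: an explicit override always wins.
        return Path(override)
    for candidate in CANDIDATES:
        if exists(candidate):
            return candidate
    return None
```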

Run it with:

python3 check_updates.py

It does not auto-upgrade anything. It is just an inspection/reporting tool.

Tests and CI

Run the full validation suite in an environment that has the development requirements installed:

python -m pip install -r requirements-dev.txt
ruff format --check .
ruff check .
python -m py_compile shim.py ensure_shim.py cron/ensure_camofox_firecrawl_shim.py check_updates.py
pytest -q

Forgejo Actions runs the same validation on pull requests and pushes to main from .forgejo/workflows/ci.yml.

What the tests verify:

Firecrawl SDK compatibility for scrape(), search(), map(), and crawl()
redirect unwrapping and ad filtering for search results
crawl traversal semantics
shim behavior against a deterministic fake Camofox server and local fixture pages
Camofox v1.7.2+ structured extract passthrough for x-ref JSON Schemas
optional Camofox trace propagation and returned trace metadata
update-check path discovery for the maintained Camofox checkout
Camofox auth-token forwarding and v1.11.2 evaluate-auth fallback behavior

Camofox integration notes

Camofox v1.7.2 added structured extraction, opt-in Playwright tracing, OpenAPI docs, and default-on crash/hang telemetry. Camofox v1.7.3 focused on native memory stability and crash-reporter signal quality. Camofox v1.8.0 added the opt-in CAMOFOX_ACCESS_KEY global bearer-auth gate while retaining backward-compatible unauthenticated behavior when the variable is unset. Camofox v1.11.2 also gates sensitive per-route endpoints such as /tabs/:tabId/evaluate and DELETE /sessions/:userId behind CAMOFOX_API_KEY/CAMOFOX_ACCESS_KEY; the shim forwards CAMOFOX_ACCESS_KEY first and falls back to CAMOFOX_API_KEY so Firecrawl scrape/search flows keep working when only the API key is configured.
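
The token precedence described above (CAMOFOX_ACCESS_KEY first, then CAMOFOX_API_KEY) could look roughly like this. The helper name is hypothetical, and the sketch leaves out the HERMES_ENV_PATH fallback mentioned in the configuration section.

```python
import os
from typing import Mapping


def upstream_auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick the bearer token for Camofox requests, or send no auth header."""
    access_key = (env.get("CAMOFOX_ACCESS_KEY") or "").strip()
    api_key = (env.get("CAMOFOX_API_KEY") or "").strip()
    token = access_key or api_key  # the global access key takes precedence
    if not token:
        return {}  # unauthenticated upstream, as before Camofox v1.8.0
    return {"Authorization": f"Bearer {token}"}
```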

The shim uses the new features conservatively:

If a scrape payload includes a JSON Schema with x-ref hints, the shim calls Camofox GET /tabs/:tabId/snapshot followed by POST /tabs/:tabId/extract and returns the typed result as data.extract.
Supported schema locations are jsonOptions.schema, json_options.schema, extract.schema, extractOptions.schema, extract_options.schema, or top-level schema.
If CAMOFOX_FIRECRAWL_TRACE is enabled, shim-created tabs include trace: true. Returned document metadata includes camofoxTraceEnabled, camofoxTraceUserId, and camofoxTraceSessionKey so operators can find the matching upstream trace archive.
If the upstream Camofox service sets CAMOFOX_ACCESS_KEY, configure the same value on the shim so it can authenticate to all upstream routes. If the upstream only sets CAMOFOX_API_KEY, configure that on the shim; it is used as the bearer token fallback for per-route auth in Camofox v1.11.2+.
The shim still uses evaluate for normal page extraction because extract is deterministic ref-to-value coercion, not full document-to-Markdown conversion. If Camofox v1.11.2+ rejects evaluate because the shim is not loopback and no upstream API/access key is configured, the shim falls back to snapshot-derived text/links so Hermes web tools degrade instead of failing outright. When a token is configured, evaluate auth errors are surfaced instead of hidden so bad credentials do not silently degrade extraction quality.
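
The schema lookup described in the list above can be sketched as a small payload inspection step. The function names are hypothetical. Two further assumptions are labeled in the code: the locations are checked in the order they are listed, and the hint is a literal `x-ref` key somewhere inside the schema.

```python
from typing import Any, Optional

# Containers that may hold a "schema" key, in the order listed above.
SCHEMA_CONTAINERS = (
    "jsonOptions",
    "json_options",
    "extract",
    "extractOptions",
    "extract_options",
)


def find_schema(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first JSON Schema found in a scrape payload, if any."""
    for key in SCHEMA_CONTAINERS:
        container = payload.get(key)
        if isinstance(container, dict) and isinstance(container.get("schema"), dict):
            return container["schema"]
    top_level = payload.get("schema")
    return top_level if isinstance(top_level, dict) else None


def has_x_ref(node: Any) -> bool:
    """Assumption: a hint is an "x-ref" key at any depth of the schema."""
    if isinstance(node, dict):
        return "x-ref" in node or any(has_x_ref(value) for value in node.values())
    if isinstance(node, list):
        return any(has_x_ref(item) for item in node)
    return False


def wants_structured_extract(payload: dict[str, Any]) -> bool:
    """True when the snapshot-then-extract path would apply to this payload."""
    schema = find_schema(payload)
    return schema is not None and has_x_ref(schema)
```

A payload without a schema, or with a schema that carries no `x-ref` hints, would stay on the normal evaluate-based extraction path.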

Operational caveats:

Camofox extract requires refs, so the shim intentionally snapshots before calling the endpoint.
Camofox trace listing/downloading may require CAMOFOX_API_KEY unless called from loopback in non-production.
CAMOFOX_ACCESS_KEY gates every route except GET /health and conditionally exempted dedicated-key routes. If it is enabled upstream but not configured on the shim, shim-backed web_extract, web_search, web_map, and web_crawl calls will fail with upstream 401 Unauthorized.
CAMOFOX_API_KEY gates sensitive dedicated endpoints in Camofox v1.11.2+ including POST /tabs/:tabId/evaluate. If Camofox has CAMOFOX_API_KEY but the shim does not, normal scrape/search extraction fails with upstream 403 Forbidden.
Some live Camofox instances have advertised an interactive /docs route while only serving the machine-readable /openapi.json; use the spec endpoint as the reliable source during compatibility checks.
Camofox crash/hang telemetry is enabled by default upstream. Disable it with CAMOFOX_CRASH_REPORT_ENABLED=false if that is not desired for a local lazy-started Camofox process.

Camofox v1.7.3 notes:

/health now includes upstream memory fields: memory.rssMb, memory.heapUsedMb, and memory.nativeMemMb. The shim already forwards these under upstream.memory in /health; no Firecrawl API shape change was needed.
Upstream now closes browser processes more aggressively through a centralized full-close path and stale Firefox/Camoufox temp-profile cleanup. That should reduce native memory leaks and orphaned browser processes after idle shutdown, restart, admin stop, disconnect, and graceful shutdown paths.
Crash/stall reports now include better sleep-vs-real-stall classification, event-loop delay histogram data, session tab URL summaries, and native memory growth signals. This is useful for diagnostics but does not change normal shim scrape/search/crawl behavior.

Design notes

Markdown conversion

The shim uses pandoc for HTML -> Markdown conversion and falls back to plain text extraction if pandoc fails or times out.
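
A minimal sketch of that convert-or-fall-back pattern follows. It assumes a plain `pandoc` executable on PATH, whereas the shim's default launcher is a wrapper script. The function names and the choice of `gfm` as the output format are illustrative assumptions.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Plain-text fallback: strip tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def html_to_markdown(html: str, pandoc_bin: str = "pandoc", timeout: float = 30.0) -> str:
    """Convert with pandoc; fall back to plain text on any failure or timeout."""
    try:
        result = subprocess.run(
            [pandoc_bin, "--from", "html", "--to", "gfm"],
            input=html,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return html_to_text(html)
```

The fallback loses headings, links, and lists, so it is a degradation path rather than an equivalent output.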

Search backend

The default search implementation uses a browser-rendered DuckDuckGo HTML search page through Camofox and normalizes results into a Firecrawl-like response.
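
Part of normalizing those results is unwrapping DuckDuckGo's redirect links and dropping ad entries. The sketch below shows the idea. It assumes that organic results link through `/l/?uddg=<encoded target>` and that ad clicks go through `/y.js`, which matches the DuckDuckGo HTML page as commonly observed but is not taken from shim.py. The function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def unwrap_result_url(href: str) -> Optional[str]:
    """Return the real target URL, or None for entries that should be dropped."""
    if href.startswith("//"):
        href = "https:" + href  # protocol-relative links are common on the page
    parsed = urlparse(href)
    host = (parsed.hostname or "").lower()
    if host == "duckduckgo.com" or host.endswith(".duckduckgo.com"):
        if parsed.path == "/y.js":
            return None  # assumed ad-click tracker
        if parsed.path.startswith("/l/"):
            # parse_qs percent-decodes the wrapped target for us.
            target = parse_qs(parsed.query).get("uddg", [""])[0]
            return target or None
    return href
```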

Crawl model

Crawls are asynchronous jobs:

POST /v2/crawl returns a job id quickly
a worker thread performs traversal and scraping
GET /v2/crawl/:id returns status and results
DELETE /v2/crawl/:id removes job state
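
The job lifecycle above can be sketched with an in-memory store and one worker thread per job. This is a simplified illustration, not the shim's implementation: it takes a fixed URL list instead of discovering links, the class and method names are hypothetical, and the status strings are assumed to follow Firecrawl's crawl statuses.

```python
import threading
import uuid
from typing import Callable, Optional


class CrawlJobs:
    """In-memory crawl job registry backed by worker threads."""

    def __init__(self, scrape_page: Callable[[str], dict]) -> None:
        self._scrape_page = scrape_page
        self._jobs: dict[str, dict] = {}
        self._lock = threading.Lock()

    def start(self, urls: list[str]) -> str:
        """POST /v2/crawl: register the job and return its id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "scraping", "total": len(urls), "data": []}
        threading.Thread(target=self._run, args=(job_id, urls), daemon=True).start()
        return job_id

    def _run(self, job_id: str, urls: list[str]) -> None:
        status = "completed"
        for url in urls:
            with self._lock:
                if job_id not in self._jobs:
                    return  # the job was deleted; stop early
            try:
                page = self._scrape_page(url)
            except Exception:
                status = "failed"
                break
            with self._lock:
                job = self._jobs.get(job_id)
                if job is None:
                    return
                job["data"].append(page)
        with self._lock:
            job = self._jobs.get(job_id)
            if job is not None:
                job["status"] = status

    def get(self, job_id: str) -> Optional[dict]:
        """GET /v2/crawl/:id: return a snapshot of status and results."""
        with self._lock:
            job = self._jobs.get(job_id)
            return None if job is None else {**job, "data": list(job["data"])}

    def delete(self, job_id: str) -> bool:
        """DELETE /v2/crawl/:id: drop the job state."""
        with self._lock:
            return self._jobs.pop(job_id, None) is not None
```

Returning a copy from `get` keeps callers from seeing a list that the worker is still appending to. A time budget like CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS would add a deadline check inside the loop.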

Known limitations

Hermes still blocks private/internal URLs before the request reaches the shim
Google SERP access is still affected by your egress IP / proxy quality
This shim targets the Firecrawl API surface Hermes uses today, not full Firecrawl parity
If Hermes or the Firecrawl SDK changes its required API shape, the shim may need updates

Publishing and safety notes

This repository is intended to be publishable without credentials.

It should contain:

shim source code
compatibility tests
launcher and maintenance scripts
setup documentation

It should not contain:

browser binaries
staged system libraries
local logs
cache directories
tokens, passwords, or environment dumps

Suggested bootstrap sequence

Install and verify Camofox separately
Start the Camofox server and confirm http://127.0.0.1:9377/health
Start this shim and confirm http://127.0.0.1:33879/health
Point Hermes FIRECRAWL_API_URL at the shim
Run at least one Hermes-side extract/search validation
Install the cron launcher if you want restart resilience

License

MIT