Firecrawl-compatible shim that routes Hermes web tools through a local Camofox server

camofox-firecrawl-shim

A small Firecrawl-compatible shim that lets Hermes route web_extract, web_search, web_map, and web_crawl through a local Camofox server without modifying Hermes source.

This repository contains only the shim code, compatibility tests, helper scripts, and setup instructions. It does not include a Camofox installation, browser binaries, vendored dependencies, or host-specific credentials.

What this solves

Hermes browser automation can already target Camofox through CAMOFOX_URL, but Hermes web tools still speak to a Firecrawl backend. This shim bridges that gap by exposing the subset of the Firecrawl v2 API that Hermes actually uses and fulfilling it with a local Camofox instance.

In practice this gives you a setup like:

  • Hermes browser tools -> Camofox
  • Hermes web tools -> this shim -> Camofox

No Hermes source patching required.

Implemented endpoints

The shim currently implements:

  • GET /health
  • GET /v2/health
  • POST /v2/scrape
  • POST /v2/search
  • POST /v2/map
  • POST /v2/crawl
  • GET /v2/crawl/:id
  • DELETE /v2/crawl/:id

That is enough for the validated Hermes paths:

  • web_extract
  • web_search
  • web_map
  • web_crawl
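
As an illustration of the endpoint list, here is a minimal Python sketch that builds (without sending) two requests against the shim using only the standard library. The payload shape {"url": ..., "formats": ["markdown"]} is an assumption based on the public Firecrawl v2 API, not something read from shim.py. The helper names are hypothetical.

```python
import json
import urllib.request

# Example shim address from this README; adjust for your setup.
SHIM_URL = "http://127.0.0.1:33879"


def build_scrape_request(target_url: str, base_url: str = SHIM_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/scrape request.

    Assumption: the body follows the Firecrawl v2 scrape shape with a
    "formats" list; shim.py may accept additional fields.
    """
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v2/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_crawl_status_request(job_id: str, base_url: str = SHIM_URL) -> urllib.request.Request:
    """Build a GET /v2/crawl/:id request for polling an existing crawl job."""
    return urllib.request.Request(f"{base_url}/v2/crawl/{job_id}", method="GET")


if __name__ == "__main__":
    req = build_scrape_request("https://example.com")
    print(req.get_method(), req.full_url)
    # Against a running shim you would send it like this:
    # with urllib.request.urlopen(req, timeout=60) as resp:
    #     print(json.loads(resp.read()))
```

Hermes itself talks to the shim through the Firecrawl SDK. Hand-built requests like these are mainly useful for smoke tests.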

Repository layout

  • shim.py — Firecrawl-compatible shim server
  • ensure_shim.py — local launcher/health-check wrapper
  • check_updates.py — lightweight version drift report for the shim stack
  • start_shim.sh — canonical manual startup wrapper
  • tests/test_shim_compat.py — SDK-level compatibility tests against a fake Camofox server and local fixture site
  • cron/ensure_camofox_firecrawl_shim.py — copy of the launcher script suitable for Hermes cron script execution

Requirements

You need:

  • Python 3.11+
  • a working local Camofox server
  • Hermes configured to use a Firecrawl backend
  • pandoc available for HTML -> Markdown conversion
  • optional: Hermes cron support for self-healing restart behavior

For tests, you also need:

  • pytest
  • firecrawl-py

Install Camofox separately

This repository does not install Camofox for you.

A typical source checkout flow looks like:

git clone https://github.com/jo-inc/camofox-browser /opt/hermes-runtime/camofox-browser
cd /opt/hermes-runtime/camofox-browser
npm install

If you are in a rootless Linux runtime, you may also need to stage shared libraries and expose them through LD_LIBRARY_PATH before starting the Camofox server.

Example runtime defaults used by this shim:

  • Camofox server dir: /opt/hermes-runtime/camofox-browser
  • Camofox URL: http://127.0.0.1:9377
  • shim URL: http://127.0.0.1:33879
  • rootless browser libs: /opt/hermes-runtime/camofox-deps/root/usr/lib/x86_64-linux-gnu

Treat those as examples, not hard requirements. All important paths are configurable by environment variables.

Manual shim startup

Prefer the wrapper script, which makes manual runs bind to the canonical endpoint (127.0.0.1:33879):

./start_shim.sh

Equivalent direct launch:

CAMOFOX_FIRECRAWL_SHIM_HOST=127.0.0.1 \
CAMOFOX_FIRECRAWL_SHIM_PORT=33879 \
python3 shim.py

Default listen address:

  • 127.0.0.1:33879

Shim configuration

Core settings:

  • CAMOFOX_URL — upstream Camofox URL, default http://127.0.0.1:9377
  • CAMOFOX_ACCESS_KEY — optional global bearer token for the auth gate added in Camofox v1.8.0; when set, the shim sends an Authorization: Bearer header on every Camofox request
  • CAMOFOX_FIRECRAWL_SHIM_HOST — bind host, default 127.0.0.1
  • CAMOFOX_FIRECRAWL_SHIM_PORT — bind port, default 33879
  • CAMOFOX_FIRECRAWL_SHIM_TIMEOUT — upstream request timeout, default 60
  • CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS — scrape/search wait timeout in milliseconds, default 12000
  • CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS — per-page crawl wait timeout in milliseconds, default 4000
  • CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS — overall crawl time budget in seconds, default 75
  • CAMOFOX_FIRECRAWL_SEARCH_URL — search page template, default DuckDuckGo HTML search
  • CAMOFOX_FIRECRAWL_PANDOC_BIN — pandoc launcher, default /opt/hermes-runtime/tools/mise/use-mise.sh
  • CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT — pandoc timeout in seconds, default 30
  • CAMOFOX_FIRECRAWL_TRACE — when enabled, the shim passes trace: true when it creates Camofox tabs and adds trace identifiers to document metadata; disabled by default
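
Here is one way to read the core settings, as a hypothetical Python sketch. ShimSettings is not a name from shim.py. Each variable falls back to the default documented above. Treating "1", "true", "yes", and "on" as enabling CAMOFOX_FIRECRAWL_TRACE is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ShimSettings:
    """Hypothetical container for the documented core settings."""

    camofox_url: str
    access_key: Optional[str]
    host: str
    port: int
    timeout: float
    wait_timeout_ms: int
    crawl_wait_timeout_ms: int
    crawl_time_budget_s: float
    trace: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ShimSettings":
        # Defaults mirror the values listed in this README.
        return cls(
            camofox_url=env.get("CAMOFOX_URL", "http://127.0.0.1:9377").rstrip("/"),
            access_key=env.get("CAMOFOX_ACCESS_KEY") or None,
            host=env.get("CAMOFOX_FIRECRAWL_SHIM_HOST", "127.0.0.1"),
            port=int(env.get("CAMOFOX_FIRECRAWL_SHIM_PORT", "33879")),
            timeout=float(env.get("CAMOFOX_FIRECRAWL_SHIM_TIMEOUT", "60")),
            wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS", "12000")),
            crawl_wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS", "4000")),
            crawl_time_budget_s=float(env.get("CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS", "75")),
            # Assumption: these spellings enable tracing; anything else leaves it off.
            trace=env.get("CAMOFOX_FIRECRAWL_TRACE", "").strip().lower() in {"1", "true", "yes", "on"},
        )
```

Passing a plain dict instead of os.environ keeps the parsing testable without touching the real process environment.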

Lazy-start settings for Camofox:

  • CAMOFOX_SERVER_DIR
  • CAMOFOX_SERVER_COMMAND — command used for lazy-start, parsed with shell-style quoting but executed without a shell
  • CAMOFOX_SERVER_START_TIMEOUT
  • CAMOFOX_SERVER_LOG
  • CAMOFOX_SERVER_ERROR_LOG
  • CAMOFOX_SERVER_LD_LIBRARY_PATH
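
The note on CAMOFOX_SERVER_COMMAND means the value is tokenized like a shell command line but is never handed to a shell. Below is a minimal sketch of that pattern using shlex.split and subprocess.Popen with shell=False. start_camofox and its parameters are hypothetical. They only loosely correspond to the CAMOFOX_SERVER_* variables above.

```python
import os
import shlex
import subprocess
from typing import Mapping, Optional


def build_server_argv(command: str) -> list[str]:
    """Split a command string with POSIX shell quoting rules, without invoking a shell."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("CAMOFOX_SERVER_COMMAND is empty")
    return argv


def start_camofox(command: str, server_dir: str, log_path: str, error_log_path: str,
                  ld_library_path: Optional[str] = None,
                  base_env: Optional[Mapping[str, str]] = None) -> subprocess.Popen:
    """Hypothetical lazy-start helper: launch Camofox with stdout/stderr sent to log files."""
    env = dict(os.environ if base_env is None else base_env)
    if ld_library_path:
        # Prepend staged libraries (e.g. for rootless runtimes) to any existing search path.
        existing = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = ld_library_path + (os.pathsep + existing if existing else "")
    with open(log_path, "ab") as out, open(error_log_path, "ab") as err:
        # The child keeps its own copies of these descriptors after Popen returns.
        return subprocess.Popen(build_server_argv(command), cwd=server_dir, env=env,
                                stdout=out, stderr=err, shell=False)
```

Because no shell runs, operators such as ; or && arrive as literal arguments instead of chaining extra commands. That is the usual motivation for avoiding a shell.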

Point Hermes at the shim

Set Hermes so its Firecrawl client talks to the shim instead of a separate Firecrawl instance.

Typical environment values:

CAMOFOX_URL=http://127.0.0.1:9377
FIRECRAWL_API_URL=http://127.0.0.1:33879

If you are changing Hermes config programmatically, use Hermes' supported config writer rather than directly editing protected environment files.

Self-healing restart behavior

This setup uses two layers:

  1. the shim lazy-starts Camofox if the upstream browser server is missing
  2. a small launcher script can restart the shim itself if the shim is down

That is what ensure_shim.py is for.

Run it locally:

python3 ensure_shim.py

Behavior:

  • if the shim is already healthy, it exits cleanly
  • if the shim is not running, it starts python3 shim.py
  • it waits until /health reports success
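
The ensure_shim.py behavior above reduces to three steps: check health, start the shim, then poll. Below is a minimal Python sketch of that loop. It assumes the health check and the start action are passed in as callables. ensure_running and its 30-second default timeout are hypothetical, not taken from the script.

```python
import time
from typing import Callable


def ensure_running(is_healthy: Callable[[], bool],
                   start: Callable[[], None],
                   timeout: float = 30.0,
                   interval: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Start a service only when its health check fails, then wait for it to pass."""
    if is_healthy():
        return "already-healthy"  # shim up: exit without doing anything
    start()  # shim down: launch it once
    deadline = clock() + timeout
    while clock() < deadline:
        if is_healthy():
            return "started"
        sleep(interval)
    raise TimeoutError(f"service not healthy after {timeout} seconds")
```

In a real deployment, is_healthy would probe http://127.0.0.1:33879/health and start would launch python3 shim.py in the background. Injecting both keeps the control flow testable without a running server.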

Hermes cron setup

If you want the shim to recover automatically after environment restarts, place the launcher script in your Hermes scripts directory and schedule it with Hermes cron.

This repository includes a cron-suitable copy at:

  • cron/ensure_camofox_firecrawl_shim.py

A practical Hermes cron job is:

  • name: ensure camofox firecrawl shim
  • schedule: every 1m
  • script: ensure_camofox_firecrawl_shim.py

The logic is simple:

  • shim up -> do nothing
  • shim down -> start it
  • shim up but Camofox down -> the shim can lazy-start Camofox on demand

Update checks

check_updates.py reports:

  • the current Camofox checkout tag and commit
  • the remote default-branch head commit
  • the latest remote tag
  • npm outdated --json from the Camofox checkout
  • installed Firecrawl Python SDK version vs latest PyPI version

It discovers the Camofox checkout in /opt/hermes-runtime/repos/camofox-browser first, falls back to /opt/hermes-runtime/camofox-browser, and can be overridden with CAMOFOX_REPO.

Run it with:

python3 check_updates.py

It does not upgrade anything; it is only an inspection and reporting tool.
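
The checkout discovery order described for check_updates.py can be sketched as follows. This is a hypothetical helper, not code from check_updates.py. Returning the CAMOFOX_REPO override without first checking that it exists is an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional

# Candidate checkouts, in the order this README describes.
CANDIDATES = (
    Path("/opt/hermes-runtime/repos/camofox-browser"),
    Path("/opt/hermes-runtime/camofox-browser"),
)


def find_camofox_repo(env: Mapping[str, str] = os.environ,
                      is_dir: Callable[[Path], bool] = Path.is_dir) -> Optional[Path]:
    """Return the Camofox checkout to inspect, or None if nothing is found."""
    override = env.get("CAMOFOX_REPO")
    if override:
        return Path(override)  # explicit override wins
    for candidate in CANDIDATES:
        if is_dir(candidate):
            return candidate
    return None
```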

Tests

Run the compatibility suite in an environment that has pytest and firecrawl-py installed:

pytest -q tests/test_shim_compat.py

What the tests verify:

  • Firecrawl SDK compatibility for scrape(), search(), map(), and crawl()
  • redirect unwrapping and ad filtering for search results
  • crawl traversal semantics
  • shim behavior against a deterministic fake Camofox server and local fixture pages
  • Camofox v1.7.2+ structured extract passthrough for x-ref JSON Schemas
  • optional Camofox trace propagation and returned trace metadata
  • update-check path discovery for the maintained Camofox checkout

Camofox 1.7.x and 1.8.x integration notes

Camofox v1.7.2 added structured extraction, opt-in Playwright tracing, OpenAPI docs, and default-on crash/hang telemetry. Camofox v1.7.3 focused on native memory stability and crash-reporter signal quality. Camofox v1.8.0 added the opt-in CAMOFOX_ACCESS_KEY global bearer-auth gate while retaining backward-compatible unauthenticated behavior when the variable is unset.

The shim uses the new features conservatively:

  • If a scrape payload includes a JSON Schema with x-ref hints, the shim calls Camofox GET /tabs/:tabId/snapshot followed by POST /tabs/:tabId/extract and returns the typed result as data.extract.
  • Supported schema locations are jsonOptions.schema, json_options.schema, extract.schema, extractOptions.schema, extract_options.schema, or top-level schema.
  • If CAMOFOX_FIRECRAWL_TRACE is enabled, shim-created tabs include trace: true. Returned document metadata includes camofoxTraceEnabled, camofoxTraceUserId, and camofoxTraceSessionKey so operators can find the matching upstream trace archive.
  • If the upstream Camofox service sets CAMOFOX_ACCESS_KEY, configure the same value on the shim so it can authenticate to all upstream routes.
  • The shim still uses evaluate for normal page extraction because extract is deterministic ref-to-value coercion, not full document-to-Markdown conversion.
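
Deciding whether a scrape takes the structured-extract path comes down to two checks: find a schema in one of the listed locations, then look for x-ref hints in it. The minimal sketch below makes three assumptions: the hint is a literal "x-ref" key, it may appear at any depth of the schema, and the locations are checked in the order listed. The function names are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Payload locations listed above; checking them in this order is an assumption.
SCHEMA_PATHS = (
    ("jsonOptions", "schema"),
    ("json_options", "schema"),
    ("extract", "schema"),
    ("extractOptions", "schema"),
    ("extract_options", "schema"),
    ("schema",),
)


def find_schema(payload: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the first schema object found at one of the supported locations."""
    for path in SCHEMA_PATHS:
        node: Any = payload
        for key in path:
            node = node.get(key) if isinstance(node, Mapping) else None
        if isinstance(node, Mapping):
            return node
    return None


def has_xref(node: Any) -> bool:
    """Recursively look for an "x-ref" hint anywhere in a JSON Schema."""
    if isinstance(node, Mapping):
        return "x-ref" in node or any(has_xref(v) for v in node.values())
    if isinstance(node, list):
        return any(has_xref(v) for v in node)
    return False


def wants_structured_extract(payload: Mapping[str, Any]) -> bool:
    """True when the scrape should go through snapshot + extract instead of evaluate."""
    schema = find_schema(payload)
    return schema is not None and has_xref(schema)
```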

Operational caveats:

  • Camofox extract requires refs, so the shim intentionally snapshots before calling the endpoint.
  • Listing or downloading Camofox traces may require CAMOFOX_API_KEY unless the request comes from loopback in a non-production environment.
  • CAMOFOX_ACCESS_KEY gates every route except GET /health and conditionally exempted dedicated-key routes. If it is enabled upstream but not configured on the shim, shim-backed web_extract, web_search, web_map, and web_crawl calls will fail with a 401 Unauthorized from Camofox.
  • Camofox advertised its interactive docs at /docs, but the live v1.7.2, v1.7.3, and v1.8.0 instances tested with this shim returned 404 there while serving /openapi.json; treat the spec endpoint as the reliable machine-readable source.
  • Camofox crash/hang telemetry is enabled by default upstream. Disable it with CAMOFOX_CRASH_REPORT_ENABLED=false if that is not desired for a local lazy-started Camofox process.
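
For the CAMOFOX_ACCESS_KEY caveat, below is a minimal sketch of how upstream request headers could be assembled. It also includes a hypothetical helper that turns a bare 401 into an actionable hint. Both functions are illustrative, and shim.py's own error reporting may differ.

```python
import os
from typing import Mapping, Optional


def camofox_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Headers for upstream Camofox calls; auth is added only when a key is configured."""
    headers: dict[str, str] = {}
    key = env.get("CAMOFOX_ACCESS_KEY", "").strip()
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers


def explain_upstream_status(status: int, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical hint for the misconfiguration described above."""
    if status == 401 and not env.get("CAMOFOX_ACCESS_KEY"):
        return ("Camofox returned 401 Unauthorized; set CAMOFOX_ACCESS_KEY on the shim "
                "to the same value as the upstream server")
    return None
```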

Camofox v1.7.3 notes:

  • /health now includes upstream memory fields: memory.rssMb, memory.heapUsedMb, and memory.nativeMemMb. The shim already forwards these under upstream.memory in /health; no Firecrawl API shape change was needed.
  • Upstream now closes browser processes more aggressively through a centralized full-close path and stale Firefox/Camoufox temp-profile cleanup. That should reduce native memory leaks and orphaned browser processes after idle shutdown, restart, admin stop, disconnect, and graceful shutdown paths.
  • Crash/stall reports now include better sleep-vs-real-stall classification, event-loop delay histogram data, session tab URL summaries, and native memory growth signals. This is useful for diagnostics but does not change normal shim scrape/search/crawl behavior.
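
Because the shim forwards the v1.7.3 memory fields under upstream.memory in its /health response, a monitoring script can read them directly. Below is a minimal sketch. The rest of the response layout is not assumed, and the sample values in the test are invented for illustration.

```python
from collections.abc import Mapping
from typing import Any, Optional

MEMORY_FIELDS = ("rssMb", "heapUsedMb", "nativeMemMb")


def upstream_memory(health: Mapping[str, Any]) -> dict[str, Optional[float]]:
    """Pull Camofox memory fields out of a parsed shim /health response body."""
    upstream = health.get("upstream")
    memory = upstream.get("memory") if isinstance(upstream, Mapping) else None
    if not isinstance(memory, Mapping):
        memory = {}
    # Missing fields come back as None rather than raising.
    return {name: memory.get(name) for name in MEMORY_FIELDS}
```

Since missing fields come back as None, the same code also works against pre-1.7.3 upstreams that do not report memory.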

Design notes

Markdown conversion

The shim uses pandoc for HTML -> Markdown conversion and falls back to plain text extraction if pandoc fails or times out.
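
The convert-or-fall-back behavior can be sketched as below. The sketch assumes a plain pandoc binary and GitHub-Flavored Markdown output (pandoc's gfm writer). The default CAMOFOX_FIRECRAWL_PANDOC_BIN launcher script may expect different arguments. The stdlib HTMLParser fallback stands in for whatever plain-text extraction shim.py actually uses.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Plain-text fallback: one line per non-empty text node."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def html_to_markdown(html: str, pandoc_bin: str = "pandoc", timeout: float = 30.0) -> str:
    """Try pandoc first; fall back to plain text if it is missing, fails, or times out."""
    try:
        result = subprocess.run(
            [pandoc_bin, "--from", "html", "--to", "gfm"],
            input=html, capture_output=True, text=True, timeout=timeout, check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):  # includes TimeoutExpired, CalledProcessError
        return html_to_text(html)
```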

Search backend

The default search implementation uses a browser-rendered DuckDuckGo HTML search page through Camofox and normalizes results into a Firecrawl-like response.
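
Below is a sketch of the two search-side helpers that the test list mentions, redirect unwrapping and ad filtering, written against DuckDuckGo's HTML results. The template URL, the uddg redirect parameter, and the /y.js ad path are assumptions about DuckDuckGo's current markup. None of these values are taken from shim.py or its CAMOFOX_FIRECRAWL_SEARCH_URL default.

```python
from urllib.parse import parse_qs, quote_plus, urljoin, urlparse

# Hypothetical template; the shim's real default lives in CAMOFOX_FIRECRAWL_SEARCH_URL.
SEARCH_TEMPLATE = "https://html.duckduckgo.com/html/?q={query}"
DDG_BASE = "https://duckduckgo.com/"


def build_search_url(query: str, template: str = SEARCH_TEMPLATE) -> str:
    return template.format(query=quote_plus(query))


def _is_ddg(netloc: str) -> bool:
    return netloc == "duckduckgo.com" or netloc.endswith(".duckduckgo.com")


def unwrap_ddg_redirect(href: str) -> str:
    """Return the destination hidden in a /l/?uddg=... redirect, else the absolute href."""
    absolute = urljoin(DDG_BASE, href)  # result links are often protocol-relative
    parsed = urlparse(absolute)
    if _is_ddg(parsed.netloc) and parsed.path.startswith("/l/"):
        target = parse_qs(parsed.query).get("uddg")
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return absolute


def looks_like_ad(href: str) -> bool:
    """Assumption: DuckDuckGo ad clicks are routed through /y.js."""
    parsed = urlparse(urljoin(DDG_BASE, href))
    return _is_ddg(parsed.netloc) and parsed.path == "/y.js"
```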

Crawl model

Crawls are asynchronous jobs:

  • POST /v2/crawl returns a job id quickly
  • a worker thread performs traversal and scraping
  • GET /v2/crawl/:id returns status and results
  • DELETE /v2/crawl/:id removes job state
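
The crawl job lifecycle above maps naturally onto an in-memory table guarded by a lock, with one worker thread per job. Below is a minimal sketch that assumes a caller-supplied crawl function. The class name and the status strings (scraping, completed, failed, modeled on Firecrawl's crawl statuses) are assumptions, not the shim's actual identifiers.

```python
import threading
import time
import uuid
from typing import Any, Callable, Optional


class CrawlJobs:
    """In-memory job table: POST creates, GET polls, DELETE forgets."""

    def __init__(self, crawl_fn: Callable[[str], list[dict[str, Any]]]) -> None:
        self._crawl_fn = crawl_fn
        self._jobs: dict[str, dict[str, Any]] = {}
        self._lock = threading.Lock()

    def start(self, url: str) -> str:
        """POST /v2/crawl: register the job, hand it to a worker, return the id at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "scraping", "data": []}
        threading.Thread(target=self._run, args=(job_id, url), daemon=True).start()
        return job_id

    def _run(self, job_id: str, url: str) -> None:
        try:
            update = {"status": "completed", "data": self._crawl_fn(url)}
        except Exception as exc:  # surface failures through the job record
            update = {"status": "failed", "data": [], "error": str(exc)}
        with self._lock:
            if job_id in self._jobs:  # the job may have been deleted meanwhile
                self._jobs[job_id] = update

    def status(self, job_id: str) -> Optional[dict[str, Any]]:
        """GET /v2/crawl/:id: a copy of the job record, or None if unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None

    def delete(self, job_id: str) -> bool:
        """DELETE /v2/crawl/:id: drop the job state."""
        with self._lock:
            return self._jobs.pop(job_id, None) is not None

    def wait(self, job_id: str, timeout: float = 10.0, interval: float = 0.05) -> Optional[dict[str, Any]]:
        """Poll the way a client would until the job leaves the scraping state."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            job = self.status(job_id)
            if job is None or job["status"] != "scraping":
                return job
            time.sleep(interval)
        return self.status(job_id)
```

Deleting a job only drops its record. A worker that is still running finds the entry gone and discards its result. This keeps DELETE cheap, but it does not cancel in-flight work.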

Known limitations

  • Hermes still blocks private/internal URLs before the request reaches the shim
  • Google SERP access is still affected by your egress IP / proxy quality
  • This shim targets the Firecrawl API surface Hermes uses today, not full Firecrawl parity
  • If Hermes or the Firecrawl SDK changes its required API shape, the shim may need updates

Publishing and safety notes

This repository is intended to be publishable without credentials.

It should contain:

  • shim source code
  • compatibility tests
  • launcher and maintenance scripts
  • setup documentation

It should not contain:

  • browser binaries
  • staged system libraries
  • local logs
  • cache directories
  • tokens, passwords, or environment dumps

Suggested bootstrap sequence

  1. Install and verify Camofox separately
  2. Start the Camofox server and confirm http://127.0.0.1:9377/health
  3. Start this shim and confirm http://127.0.0.1:33879/health
  4. Point Hermes FIRECRAWL_API_URL at the shim
  5. Run at least one Hermes-side extract/search validation
  6. Install the cron launcher if you want restart resilience
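
Steps 2 and 3 can be scripted with the standard library. Below is a minimal sketch that assumes the example addresses from this README. It treats any 2xx response as healthy, which is an assumption about what each /health endpoint returns. The opener parameter exists only so the check can be exercised without live servers.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Mapping

# Example addresses from this README; adjust if you changed ports or hosts.
CHECKS = {
    "camofox": "http://127.0.0.1:9377/health",
    "shim": "http://127.0.0.1:33879/health",
}


def probe(url: str, timeout: float = 3.0,
          opener: Callable[..., Any] = urllib.request.urlopen) -> bool:
    """True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, refused connections, and timeouts
        return False


def bootstrap_report(checks: Mapping[str, str] = CHECKS,
                     opener: Callable[..., Any] = urllib.request.urlopen) -> dict[str, bool]:
    return {name: probe(url, opener=opener) for name, url in checks.items()}


if __name__ == "__main__":
    for name, ok in bootstrap_report().items():
        print(f"{name}: {'healthy' if ok else 'unreachable'}")
```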

License

MIT