Firecrawl-compatible shim that routes Hermes web tools through a local Camofox server

camofox-firecrawl-shim

A small Firecrawl-compatible shim that lets Hermes route web_extract, web_search, and web_crawl through a local Camofox server without modifying Hermes source.

This repository contains only the shim code, compatibility tests, helper scripts, and setup instructions. It does not include a Camofox installation, browser binaries, vendored dependencies, or host-specific credentials.

What this solves

Hermes browser automation can already target Camofox through CAMOFOX_URL, but Hermes web tools still speak to a Firecrawl backend. This shim bridges that gap by exposing the subset of the Firecrawl v2 API that Hermes actually uses and fulfilling it with a local Camofox instance.

In practice this gives you a setup like:

  • Hermes browser tools -> Camofox
  • Hermes web tools -> this shim -> Camofox

No Hermes source patching required.

Implemented endpoints

The shim currently implements:

  • GET /health
  • GET /v2/health
  • POST /v2/scrape
  • POST /v2/search
  • POST /v2/map
  • POST /v2/crawl
  • GET /v2/crawl/:id
  • DELETE /v2/crawl/:id

These endpoints cover the Hermes tool paths validated so far:

  • web_extract
  • web_search
  • web_map
  • web_crawl
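
To make the endpoint list concrete, here is a minimal sketch of a client call to POST /v2/scrape using only the standard library. The request body shape (`url` plus `formats`) follows Firecrawl's public API and is an assumption about what the shim accepts. The helper names `build_scrape_request` and `scrape` are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Documented default shim address; adjust if you changed host or port.
SHIM_URL = "http://127.0.0.1:33879"


def build_scrape_request(url: str, base_url: str = SHIM_URL) -> urllib.request.Request:
    """Build (but do not send) a Firecrawl-style POST /v2/scrape request.

    The body fields are an assumption based on Firecrawl's public API.
    """
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v2/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, base_url: str = SHIM_URL, timeout: float = 60.0) -> dict:
    """Send the request to a running shim and decode the JSON response."""
    request = build_scrape_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the shim running, `scrape("https://example.com")` should return the decoded JSON document. Building the request separately makes it easy to inspect what would be sent without touching the network.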

Repository layout

  • shim.py — Firecrawl-compatible shim server
  • ensure_shim.py — local launcher/health-check wrapper
  • check_updates.py — lightweight version drift report for the shim stack
  • start_shim.sh — canonical manual startup wrapper
  • tests/test_shim_compat.py — SDK-level compatibility tests against a fake Camofox server and local fixture site
  • cron/ensure_camofox_firecrawl_shim.py — copy of the launcher script suitable for Hermes cron script execution

Requirements

You need:

  • Python 3.11+
  • a working local Camofox server
  • Hermes configured to use a Firecrawl backend
  • pandoc available for HTML -> Markdown conversion
  • optional: Hermes cron support for self-healing restart behavior

For tests, you also need:

  • pytest
  • firecrawl-py

Install Camofox separately

This repository does not install Camofox for you.

A typical source checkout flow looks like:

git clone https://github.com/jo-inc/camofox-browser /opt/hermes-runtime/camofox-browser
cd /opt/hermes-runtime/camofox-browser
npm install

If you are in a rootless Linux runtime, you may also need to stage shared libraries and expose them through LD_LIBRARY_PATH before starting the Camofox server.

Example runtime defaults used by this shim:

  • Camofox server dir: /opt/hermes-runtime/camofox-browser
  • Camofox URL: http://127.0.0.1:9377
  • shim URL: http://127.0.0.1:33879
  • rootless browser libs: /opt/hermes-runtime/camofox-deps/root/usr/lib/x86_64-linux-gnu

Treat those as examples, not hard requirements. All important paths are configurable by environment variables.

Manual shim startup

Prefer the wrapper script for manual runs. It starts the shim on the default endpoint:

./start_shim.sh

Equivalent direct launch:

CAMOFOX_FIRECRAWL_SHIM_HOST=127.0.0.1 \
CAMOFOX_FIRECRAWL_SHIM_PORT=33879 \
python3 shim.py

Default listen address:

  • 127.0.0.1:33879

Shim configuration

Core settings:

  • CAMOFOX_URL — upstream Camofox URL, default http://127.0.0.1:9377
  • CAMOFOX_FIRECRAWL_SHIM_HOST — bind host, default 127.0.0.1
  • CAMOFOX_FIRECRAWL_SHIM_PORT — bind port, default 33879
  • CAMOFOX_FIRECRAWL_SHIM_TIMEOUT — upstream request timeout, default 60
  • CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS — scrape/search wait timeout, default 12000
  • CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS — crawl page wait timeout, default 4000
  • CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS — overall crawl time budget, default 75
  • CAMOFOX_FIRECRAWL_SEARCH_URL — search page template, default DuckDuckGo HTML search
  • CAMOFOX_FIRECRAWL_PANDOC_BIN — pandoc launcher, default /opt/hermes-runtime/tools/mise/use-mise.sh
  • CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT — pandoc timeout in seconds, default 30
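
As an illustration of how the core settings above and their documented defaults fit together, the following sketch loads them into a small dataclass. `ShimConfig` and `load_config` are hypothetical names, and the numeric types are assumptions. shim.py may parse these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ShimConfig:
    """Subset of the documented settings (hypothetical container)."""
    camofox_url: str
    host: str
    port: int
    timeout: float
    wait_timeout_ms: int
    crawl_wait_timeout_ms: int
    crawl_time_budget_seconds: float
    pandoc_timeout: float


def load_config(env: Mapping[str, str] = os.environ) -> ShimConfig:
    """Read each variable, falling back to the default listed in this README."""
    def get(name: str, default: str) -> str:
        return env.get(name, default)

    return ShimConfig(
        camofox_url=get("CAMOFOX_URL", "http://127.0.0.1:9377"),
        host=get("CAMOFOX_FIRECRAWL_SHIM_HOST", "127.0.0.1"),
        port=int(get("CAMOFOX_FIRECRAWL_SHIM_PORT", "33879")),
        timeout=float(get("CAMOFOX_FIRECRAWL_SHIM_TIMEOUT", "60")),
        wait_timeout_ms=int(get("CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS", "12000")),
        crawl_wait_timeout_ms=int(get("CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS", "4000")),
        crawl_time_budget_seconds=float(
            get("CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS", "75")
        ),
        pandoc_timeout=float(get("CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT", "30")),
    )
```

Passing a plain dict instead of `os.environ` (for example `load_config({"CAMOFOX_FIRECRAWL_SHIM_PORT": "8080"})`) keeps the loader easy to test.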

Lazy-start settings for Camofox:

  • CAMOFOX_SERVER_DIR
  • CAMOFOX_SERVER_COMMAND
  • CAMOFOX_SERVER_START_TIMEOUT
  • CAMOFOX_SERVER_LOG
  • CAMOFOX_SERVER_ERROR_LOG
  • CAMOFOX_SERVER_LD_LIBRARY_PATH

Point Hermes at the shim

Set Hermes so its Firecrawl client talks to the shim instead of a separate Firecrawl instance.

Typical environment values:

CAMOFOX_URL=http://127.0.0.1:9377
FIRECRAWL_API_URL=http://127.0.0.1:33879

If you are changing Hermes config programmatically, use Hermes' supported config writer rather than directly editing protected environment files.

Self-healing restart behavior

This setup uses two layers:

  1. the shim lazy-starts Camofox if the upstream browser server is missing
  2. a small launcher script can restart the shim itself if the shim is down

That is what ensure_shim.py is for.

Run it locally:

python3 ensure_shim.py

Behavior:

  • if the shim is already healthy, it exits cleanly
  • if the shim is missing, it starts python3 shim.py
  • it waits until /health reports success
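
The behavior listed above can be sketched as a check, start, and poll loop. This is an illustration under stated assumptions, not the contents of ensure_shim.py: the function names, the 30-second wait, and the detached `Popen` launch are all hypothetical choices.

```python
import subprocess
import sys
import time
import urllib.request
from typing import Callable

# Documented default health endpoint of the shim.
HEALTH_URL = "http://127.0.0.1:33879/health"


def is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError.
        return False


def start_shim() -> None:
    """Launch the shim in its own session (assumed command line)."""
    subprocess.Popen([sys.executable, "shim.py"], start_new_session=True)


def ensure_shim(
    check: Callable[[], bool] = is_healthy,
    start: Callable[[], None] = start_shim,
    wait_seconds: float = 30.0,
    poll_interval: float = 0.5,
) -> bool:
    """Start the shim if it is down, then wait until it reports healthy."""
    if check():
        return True  # already up: nothing to do
    start()
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_interval)
    return False  # never became healthy within the wait window
```

Injecting `check` and `start` as callables keeps the loop testable without a running server. A scheduled job only needs to call `ensure_shim()` and act on the boolean result.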

Hermes cron setup

If you want the shim to recover automatically after environment restarts, place the launcher script in your Hermes scripts directory and schedule it with Hermes cron.

This repository includes a cron-suitable copy at:

  • cron/ensure_camofox_firecrawl_shim.py

A practical Hermes cron job is:

  • name: ensure camofox firecrawl shim
  • schedule: every 1m
  • script: ensure_camofox_firecrawl_shim.py

The logic is simple:

  • shim up -> do nothing
  • shim down -> start it
  • shim up but Camofox down -> the shim can lazy-start Camofox on demand

Update checks

check_updates.py reports:

  • the current Camofox checkout tag and commit
  • the remote default-branch head commit
  • the latest remote tag
  • npm outdated --json from the Camofox checkout
  • installed Firecrawl Python SDK version vs latest PyPI version

Run it with:

python3 check_updates.py

It is a reporting tool only and never upgrades anything.

Tests

Run the compatibility suite in an environment that has pytest and firecrawl-py installed:

pytest -q tests/test_shim_compat.py

What the tests verify:

  • Firecrawl SDK compatibility for scrape(), search(), map(), and crawl()
  • redirect unwrapping and ad filtering for search results
  • crawl traversal semantics
  • shim behavior against a deterministic fake Camofox server and local fixture pages

Design notes

Markdown conversion

The shim uses pandoc for HTML -> Markdown conversion and falls back to plain text extraction if pandoc fails or times out.
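
A minimal sketch of that fallback pattern follows. It assumes `pandoc_bin` is a plain pandoc executable that accepts `--from` and `--to`. The documented default is a launcher script, which may need different arguments. The function names and the `gfm` output format are illustrative choices, not taken from shim.py.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Plain-text fallback: one line per non-empty text node."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def html_to_markdown(html: str, pandoc_bin: str = "pandoc", timeout: float = 30.0) -> str:
    """Convert with pandoc; fall back to plain text on any failure or timeout."""
    try:
        result = subprocess.run(
            [pandoc_bin, "--from=html", "--to=gfm"],
            input=html,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout: degrade gracefully.
        return html_to_text(html)
```

Catching `OSError` covers a missing binary, while `subprocess.SubprocessError` covers both `CalledProcessError` and `TimeoutExpired`, so every failure mode ends in the same plain-text path.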

Search backend

The default search implementation uses a browser-rendered DuckDuckGo HTML search page through Camofox and normalizes results into a Firecrawl-like response.

Crawl model

Crawls are asynchronous jobs:

  • POST /v2/crawl returns a job id quickly
  • a worker thread performs traversal and scraping
  • GET /v2/crawl/:id returns status and results
  • DELETE /v2/crawl/:id removes job state
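
The job lifecycle above can be sketched with an in-memory registry and one worker thread per crawl. `CrawlJobs`, its methods, and the status strings are illustrative assumptions. The shim's real job store, traversal logic, and time budget handling are not shown here.

```python
import threading
import uuid
from typing import Callable, Iterable, Optional


class CrawlJobs:
    """In-memory crawl job registry (illustrative, not the shim's actual class)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: dict[str, dict] = {}
        self._threads: dict[str, threading.Thread] = {}

    def start(self, urls: Iterable[str], scrape: Callable[[str], dict]) -> str:
        """Register a job, hand it to a worker thread, and return its id at once."""
        job_id = uuid.uuid4().hex
        worker = threading.Thread(
            target=self._run, args=(job_id, list(urls), scrape), daemon=True
        )
        with self._lock:
            self._jobs[job_id] = {"status": "scraping", "data": []}
            self._threads[job_id] = worker
        worker.start()
        return job_id

    def _run(self, job_id: str, urls: list[str], scrape: Callable[[str], dict]) -> None:
        status = "completed"
        for url in urls:
            try:
                page = scrape(url)
            except Exception:
                status = "failed"
                break
            with self._lock:
                job = self._jobs.get(job_id)
                if job is None:
                    return  # job was deleted mid-crawl: stop quietly
                job["data"].append(page)
        with self._lock:
            job = self._jobs.get(job_id)
            if job is not None:
                job["status"] = status

    def get(self, job_id: str) -> Optional[dict]:
        """Snapshot of status and results, or None for an unknown id."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:
                return None
            return {"status": job["status"], "data": list(job["data"])}

    def delete(self, job_id: str) -> bool:
        """Remove job state; returns False if the id was unknown."""
        with self._lock:
            self._threads.pop(job_id, None)
            return self._jobs.pop(job_id, None) is not None

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        """Block until the worker finishes (handy in tests)."""
        with self._lock:
            worker = self._threads.get(job_id)
        if worker is not None:
            worker.join(timeout)
```

In this sketch `start` maps to POST /v2/crawl, `get` to GET /v2/crawl/:id, and `delete` to DELETE /v2/crawl/:id. Returning a copy of the data list from `get` keeps callers from observing the worker's in-progress mutations.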

Known limitations

  • Hermes still blocks private/internal URLs before the request reaches the shim
  • Google SERP access is still affected by your egress IP / proxy quality
  • This shim targets the Firecrawl API surface Hermes uses today, not full Firecrawl parity
  • If Hermes or the Firecrawl SDK changes its required API shape, the shim may need updates

Publishing and safety notes

This repository is intended to be publishable without credentials.

It should contain:

  • shim source code
  • compatibility tests
  • launcher and maintenance scripts
  • setup documentation

It should not contain:

  • browser binaries
  • staged system libraries
  • local logs
  • cache directories
  • tokens, passwords, or environment dumps

Suggested bootstrap sequence

  1. Install and verify Camofox separately
  2. Start the Camofox server and confirm http://127.0.0.1:9377/health
  3. Start this shim and confirm http://127.0.0.1:33879/health
  4. Point Hermes FIRECRAWL_API_URL at the shim
  5. Run at least one Hermes-side extract/search validation
  6. Install the cron launcher if you want restart resilience

License

MIT