camofox-firecrawl-shim
A small Firecrawl-compatible shim that lets Hermes route web_extract, web_search, and web_crawl through a local Camofox server without modifying Hermes source.
This repository contains only the shim code, compatibility tests, helper scripts, and setup instructions. It does not include a Camofox installation, browser binaries, vendored dependencies, or host-specific credentials.
What this solves
Hermes browser automation can already target Camofox through CAMOFOX_URL, but Hermes web tools still speak to a Firecrawl backend. This shim bridges that gap by exposing the subset of the Firecrawl v2 API that Hermes actually uses and fulfilling it with a local Camofox instance.
In practice this gives you a setup like:
- Hermes browser tools -> Camofox
- Hermes web tools -> this shim -> Camofox
No Hermes source patching required.
Implemented endpoints
The shim currently implements:
- GET /health
- GET /v2/health
- POST /v2/scrape
- POST /v2/search
- POST /v2/map
- POST /v2/crawl
- GET /v2/crawl/:id
- DELETE /v2/crawl/:id
That is enough for the validated Hermes paths:
- web_extract
- web_search
- web_map
- web_crawl
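As a rough illustration of how a client could address one of these endpoints, the sketch below builds (but does not send) a POST /v2/scrape request. The base URL is the example default from this README; the payload fields (url, formats) follow the general Firecrawl scrape shape and are an assumption about what the shim accepts, and build_scrape_request is a hypothetical helper, not part of the repository.

```python
import json
import urllib.request

# Example default shim address from this README; adjust for your setup.
SHIM_URL = "http://127.0.0.1:33879"


def build_scrape_request(url: str, base: str = SHIM_URL) -> urllib.request.Request:
    """Build a POST /v2/scrape request without sending it.

    The payload fields are an assumption based on the Firecrawl scrape API.
    """
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base.rstrip('/')}/v2/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the shim running, passing the returned object to urllib.request.urlopen would perform the call. In normal use Hermes or the Firecrawl SDK issues these requests for you.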
Repository layout
- shim.py: Firecrawl-compatible shim server
- ensure_shim.py: local launcher/health-check wrapper
- check_updates.py: lightweight version drift report for the shim stack
- start_shim.sh: canonical manual startup wrapper
- tests/test_shim_compat.py: SDK-level compatibility tests against a fake Camofox server and local fixture site
- cron/ensure_camofox_firecrawl_shim.py: copy of the launcher script suitable for Hermes cron script execution
Requirements
You need:
- Python 3.11+
- a working local Camofox server
- Hermes configured to use a Firecrawl backend
- pandoc available for HTML -> Markdown conversion
- optional: Hermes cron support for self-healing restart behavior
For tests, you also need:
- pytest
- firecrawl-py
Install Camofox separately
This repository does not install Camofox for you.
A typical source checkout flow looks like:
git clone https://github.com/jo-inc/camofox-browser /opt/hermes-runtime/camofox-browser
cd /opt/hermes-runtime/camofox-browser
npm install
If you are in a rootless Linux runtime, you may also need to stage shared libraries and expose them through LD_LIBRARY_PATH before starting the Camofox server.
Example runtime defaults used by this shim:
- Camofox server dir: /opt/hermes-runtime/camofox-browser
- Camofox URL: http://127.0.0.1:9377
- shim URL: http://127.0.0.1:33879
- rootless browser libs: /opt/hermes-runtime/camofox-deps/root/usr/lib/x86_64-linux-gnu
Treat those as examples, not hard requirements. All important paths are configurable by environment variables.
Manual shim startup
Prefer the wrapper so manual runs default to the canonical maintained endpoint:
./start_shim.sh
Equivalent direct launch:
CAMOFOX_FIRECRAWL_SHIM_HOST=127.0.0.1 \
CAMOFOX_FIRECRAWL_SHIM_PORT=33879 \
python3 shim.py
Default listen address:
127.0.0.1:33879
Shim configuration
Core settings:
- CAMOFOX_URL: upstream Camofox URL, default http://127.0.0.1:9377
- CAMOFOX_ACCESS_KEY: optional upstream Camofox v1.8.0+ global bearer token; when set, the shim sends an Authorization bearer header on Camofox requests
- CAMOFOX_FIRECRAWL_SHIM_HOST: bind host, default 127.0.0.1
- CAMOFOX_FIRECRAWL_SHIM_PORT: bind port, default 33879
- CAMOFOX_FIRECRAWL_SHIM_TIMEOUT: upstream request timeout, default 60
- CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS: scrape/search wait timeout, default 12000
- CAMOFOX_FIRECRAWL_CRAWL_WAIT_TIMEOUT_MS: crawl page wait timeout, default 4000
- CAMOFOX_FIRECRAWL_CRAWL_TIME_BUDGET_SECONDS: overall crawl time budget, default 75
- CAMOFOX_FIRECRAWL_SEARCH_URL: search page template, default DuckDuckGo HTML search
- CAMOFOX_FIRECRAWL_PANDOC_BIN: pandoc launcher, default /opt/hermes-runtime/tools/mise/use-mise.sh
- CAMOFOX_FIRECRAWL_PANDOC_TIMEOUT: pandoc timeout in seconds, default 30
- CAMOFOX_FIRECRAWL_TRACE: pass trace: true when creating Camofox tabs and add trace identifiers to document metadata, default disabled
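To make the defaults concrete, here is a minimal sketch of how a few of these variables could be read from the environment. The ShimConfig class and load_config function are hypothetical names used for illustration, and the set of strings treated as "enabled" for CAMOFOX_FIRECRAWL_TRACE is an assumption; shim.py may parse its settings differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ShimConfig:
    camofox_url: str
    host: str
    port: int
    wait_timeout_ms: int
    trace: bool


def load_config(env=None) -> ShimConfig:
    """Read a subset of the documented settings, applying the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "enabled"; the real shim may differ.
    trace_raw = env.get("CAMOFOX_FIRECRAWL_TRACE", "").strip().lower()
    return ShimConfig(
        camofox_url=env.get("CAMOFOX_URL", "http://127.0.0.1:9377").rstrip("/"),
        host=env.get("CAMOFOX_FIRECRAWL_SHIM_HOST", "127.0.0.1"),
        port=int(env.get("CAMOFOX_FIRECRAWL_SHIM_PORT", "33879")),
        wait_timeout_ms=int(env.get("CAMOFOX_FIRECRAWL_SHIM_WAIT_TIMEOUT_MS", "12000")),
        trace=trace_raw in {"1", "true", "yes", "on"},
    )
```

Passing a plain dict instead of os.environ keeps the function easy to test without touching the process environment.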
Lazy-start settings for Camofox:
- CAMOFOX_SERVER_DIR
- CAMOFOX_SERVER_COMMAND: command used for lazy-start, parsed with shell-style quoting but executed without a shell
- CAMOFOX_SERVER_START_TIMEOUT
- CAMOFOX_SERVER_LOG
- CAMOFOX_SERVER_ERROR_LOG
- CAMOFOX_SERVER_LD_LIBRARY_PATH
Point Hermes at the shim
Set Hermes so its Firecrawl client talks to the shim instead of a separate Firecrawl instance.
Typical environment values:
CAMOFOX_URL=http://127.0.0.1:9377
FIRECRAWL_API_URL=http://127.0.0.1:33879
If you are changing Hermes config programmatically, use Hermes' supported config writer rather than directly editing protected environment files.
Self-healing restart behavior
This setup uses two layers:
- the shim lazy-starts Camofox if the upstream browser server is missing
- a small launcher script can restart the shim itself if the shim is down
That is what ensure_shim.py is for.
Run it locally:
python3 ensure_shim.py
Behavior:
- if the shim is already healthy, it exits cleanly
- if the shim is missing, it starts python3 shim.py
- it waits until /health reports success
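The behavior above amounts to a check, start, and poll loop. The following is a minimal sketch of that pattern, assuming a 30-second deadline and a 0.5-second poll interval; both values and the function names http_health and ensure_running are illustrative and not taken from ensure_shim.py.

```python
import time
import urllib.request


def http_health(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def ensure_running(is_healthy, start, timeout=30.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic) -> bool:
    """Start the service only if it is not healthy, then wait for health."""
    if is_healthy():
        return True  # already up: nothing to do
    start()
    deadline = clock() + timeout
    while clock() < deadline:
        if is_healthy():
            return True
        sleep(interval)
    return False  # never became healthy within the deadline
```

In a real launcher, is_healthy would wrap http_health against the shim's /health URL and start would spawn python3 shim.py in the background.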
Hermes cron setup
If you want the shim to recover automatically after environment restarts, place the launcher script in your Hermes scripts directory and schedule it with Hermes cron.
This repository includes a cron-suitable copy at:
cron/ensure_camofox_firecrawl_shim.py
A practical Hermes cron job is:
- name: ensure camofox firecrawl shim
- schedule: every 1m
- script: ensure_camofox_firecrawl_shim.py
The logic is simple:
- shim up -> do nothing
- shim down -> start it
- shim up but Camofox down -> the shim can lazy-start Camofox on demand
Update checks
check_updates.py reports:
- the current Camofox checkout tag and commit
- the remote default-branch head commit
- the latest remote tag
- npm outdated --json from the Camofox checkout
- installed Firecrawl Python SDK version vs latest PyPI version
It discovers the Camofox checkout in /opt/hermes-runtime/repos/camofox-browser first, falls back to /opt/hermes-runtime/camofox-browser, and can be overridden with CAMOFOX_REPO.
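The discovery order described above can be sketched as follows. The function name find_checkout is hypothetical, and using a plain directory-existence test to decide whether a candidate counts as a checkout is an assumption; check_updates.py may apply stricter checks.

```python
import os

# Candidate locations in the order given in this README.
CANDIDATES = (
    "/opt/hermes-runtime/repos/camofox-browser",
    "/opt/hermes-runtime/camofox-browser",
)


def find_checkout(env=None, is_dir=os.path.isdir):
    """Return the Camofox checkout path, or None if no candidate exists."""
    env = os.environ if env is None else env
    override = env.get("CAMOFOX_REPO")
    if override:
        return override  # explicit override wins over discovery
    for candidate in CANDIDATES:
        if is_dir(candidate):
            return candidate
    return None
```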
Run it with:
python3 check_updates.py
It does not auto-upgrade anything. It is just an inspection/reporting tool.
Tests
Run the compatibility suite in an environment that has pytest and firecrawl-py installed:
pytest -q tests/test_shim_compat.py
What the tests verify:
- Firecrawl SDK compatibility for scrape(), search(), map(), and crawl()
- redirect unwrapping and ad filtering for search results
- crawl traversal semantics
- shim behavior against a deterministic fake Camofox server and local fixture pages
- Camofox v1.7.2+ structured extract passthrough for x-ref JSON Schemas
- optional Camofox trace propagation and returned trace metadata
- update-check path discovery for the maintained Camofox checkout
Camofox 1.7.x and 1.8.x integration notes
Camofox v1.7.2 added structured extraction, opt-in Playwright tracing, OpenAPI docs, and default-on crash/hang telemetry. Camofox v1.7.3 focused on native memory stability and crash-reporter signal quality. Camofox v1.8.0 added the opt-in CAMOFOX_ACCESS_KEY global bearer-auth gate while retaining backward-compatible unauthenticated behavior when the variable is unset.
The shim uses the new features conservatively:
- If a scrape payload includes a JSON Schema with x-ref hints, the shim calls Camofox GET /tabs/:tabId/snapshot followed by POST /tabs/:tabId/extract and returns the typed result as data.extract.
- Supported schema locations are jsonOptions.schema, json_options.schema, extract.schema, extractOptions.schema, extract_options.schema, or top-level schema.
- If CAMOFOX_FIRECRAWL_TRACE is enabled, shim-created tabs include trace: true. Returned document metadata includes camofoxTraceEnabled, camofoxTraceUserId, and camofoxTraceSessionKey so operators can find the matching upstream trace archive.
- If the upstream Camofox service sets CAMOFOX_ACCESS_KEY, configure the same value on the shim so it can authenticate to all upstream routes.
- The shim still uses evaluate for normal page extraction because extract is deterministic ref-to-value coercion, not full document-to-Markdown conversion.
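The schema lookup across the supported locations could look roughly like this. The precedence order simply follows the order the locations are listed in above, and the recursive search for an x-ref key anywhere in the schema is an assumption about where those hints live; find_schema and has_x_ref are hypothetical helpers.

```python
# Supported schema locations, expressed as key paths into the scrape payload.
SCHEMA_PATHS = (
    ("jsonOptions", "schema"),
    ("json_options", "schema"),
    ("extract", "schema"),
    ("extractOptions", "schema"),
    ("extract_options", "schema"),
    ("schema",),
)


def find_schema(payload: dict):
    """Return the first JSON Schema dict found at a supported location."""
    for path in SCHEMA_PATHS:
        node = payload
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            return node
    return None


def has_x_ref(schema) -> bool:
    """Return True if an "x-ref" key appears anywhere inside the schema."""
    if isinstance(schema, dict):
        return "x-ref" in schema or any(has_x_ref(v) for v in schema.values())
    if isinstance(schema, list):
        return any(has_x_ref(v) for v in schema)
    return False
```

A dispatcher could then take the snapshot-plus-extract path only when find_schema returns a schema for which has_x_ref is true, and fall back to normal page extraction otherwise.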
Operational caveats:
- Camofox extract requires refs, so the shim intentionally snapshots before calling the endpoint.
- Camofox trace listing/downloading may require CAMOFOX_API_KEY unless called from loopback in non-production.
- CAMOFOX_ACCESS_KEY gates every route except GET /health and conditionally exempted dedicated-key routes. If it is enabled upstream but not configured on the shim, shim-backed web_extract, web_search, map, and crawl calls will fail with upstream 401 Unauthorized.
- The interactive docs route was advertised as /docs, but the live v1.7.2, v1.7.3, and v1.8.0 instances here served /openapi.json while /docs returned 404; use the spec endpoint as the reliable machine-readable source.
- Camofox crash/hang telemetry is enabled by default upstream. Disable it with CAMOFOX_CRASH_REPORT_ENABLED=false if that is not desired for a local lazy-started Camofox process.
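For the CAMOFOX_ACCESS_KEY caveat, a minimal sketch of attaching the bearer token only when the variable is set is shown below. The function name upstream_headers and the Accept header are illustrative assumptions; only the Authorization bearer behavior is described in this README.

```python
import os


def upstream_headers(env=None) -> dict:
    """Build headers for Camofox requests, adding auth only when configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/json"}  # illustrative default header
    key = env.get("CAMOFOX_ACCESS_KEY", "").strip()
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Leaving the variable unset keeps requests unauthenticated, which matches the backward-compatible upstream behavior described earlier.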
Camofox v1.7.3 notes:
- /health now includes upstream memory fields: memory.rssMb, memory.heapUsedMb, and memory.nativeMemMb. The shim already forwards these under upstream.memory in /health; no Firecrawl API shape change was needed.
- Upstream now closes browser processes more aggressively through a centralized full-close path and stale Firefox/Camoufox temp-profile cleanup. That should reduce native memory leaks and orphaned browser processes after idle shutdown, restart, admin stop, disconnect, and graceful shutdown paths.
- Crash/stall reports now include better sleep-vs-real-stall classification, event-loop delay histogram data, session tab URL summaries, and native memory growth signals. This is useful for diagnostics but does not change normal shim scrape/search/crawl behavior.
Design notes
Markdown conversion
The shim uses pandoc for HTML -> Markdown conversion and falls back to plain text extraction if pandoc fails or times out.
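The convert-with-fallback pattern can be sketched as below. This assumes a plain pandoc executable on PATH, whereas the shim's default launcher is a wrapper script whose invocation is not documented here; the fallback text extractor built on html.parser is likewise an illustrative stand-in for whatever plain-text extraction shim.py performs.

```python
import subprocess
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def html_to_markdown(html: str, pandoc_bin: str = "pandoc", timeout: float = 30.0) -> str:
    """Convert with pandoc; fall back to plain text on failure or timeout."""
    try:
        result = subprocess.run(
            [pandoc_bin, "--from=html", "--to=markdown"],
            input=html, capture_output=True, text=True,
            timeout=timeout, check=True,
        )
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout all land here.
        return html_to_text(html)
```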
Search backend
The default search implementation uses a browser-rendered DuckDuckGo HTML search page through Camofox and normalizes results into a Firecrawl-like response.
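Part of normalizing DuckDuckGo HTML results is unwrapping redirect links and dropping ads. The sketch below assumes that result links take the form //duckduckgo.com/l/?uddg=<encoded target> and that ad links go through /y.js; both are assumptions about DuckDuckGo's markup at the time of writing and may not match what shim.py handles. The function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def _absolute(href: str) -> str:
    """Give protocol-relative links an explicit scheme so they parse cleanly."""
    return "https:" + href if href.startswith("//") else href


def unwrap_redirect(href: str) -> str:
    """Return the real target of a DuckDuckGo redirect link, else the input."""
    parts = urlsplit(_absolute(href))
    if parts.netloc.endswith("duckduckgo.com") and parts.path.rstrip("/") == "/l":
        target = parse_qs(parts.query).get("uddg")
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return href


def is_ad(href: str) -> bool:
    """Heuristic: treat DuckDuckGo /y.js links as ad click-throughs."""
    parts = urlsplit(_absolute(href))
    return parts.netloc.endswith("duckduckgo.com") and parts.path == "/y.js"
```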
Crawl model
Crawls are asynchronous jobs:
- POST /v2/crawl returns a job id quickly
- a worker thread performs traversal and scraping
- GET /v2/crawl/:id returns status and results
- DELETE /v2/crawl/:id removes job state
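The job lifecycle above can be sketched as a small in-memory registry with one worker thread per crawl. The class name CrawlJobs, the status strings, and the flat URL list (no link traversal, no time budget) are simplifying assumptions for illustration; the shim's real crawl worker does more.

```python
import threading
import uuid


class CrawlJobs:
    """In-memory crawl job registry backed by worker threads."""

    def __init__(self, scrape_page):
        self._scrape_page = scrape_page  # callable: url -> document dict
        self._lock = threading.Lock()
        self._jobs = {}
        self._workers = {}

    def create(self, urls) -> str:
        """Register a job, start its worker, and return the id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "scraping", "data": []}
        worker = threading.Thread(target=self._run, args=(job_id, list(urls)), daemon=True)
        self._workers[job_id] = worker
        worker.start()
        return job_id

    def _run(self, job_id, urls):
        for url in urls:
            page = self._scrape_page(url)
            with self._lock:
                job = self._jobs.get(job_id)
                if job is None:
                    return  # job was deleted while running
                job["data"].append(page)
        with self._lock:
            job = self._jobs.get(job_id)
            if job is not None:
                job["status"] = "completed"

    def get(self, job_id):
        """Return a snapshot of status and results, or None if unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return None if job is None else {"status": job["status"], "data": list(job["data"])}

    def delete(self, job_id) -> bool:
        """Remove job state; returns False if the id was not known."""
        with self._lock:
            return self._jobs.pop(job_id, None) is not None

    def join(self, job_id, timeout=None):
        """Wait for the worker to finish (useful in tests)."""
        worker = self._workers.get(job_id)
        if worker is not None:
            worker.join(timeout)
```

Returning the id before any scraping happens is what keeps POST /v2/crawl fast; clients then poll the status endpoint until the job reports completion.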
Known limitations
- Hermes still blocks private/internal URLs before the request reaches the shim
- Google SERP access is still affected by your egress IP / proxy quality
- This shim targets the Firecrawl API surface Hermes uses today, not full Firecrawl parity
- If Hermes or the Firecrawl SDK changes its required API shape, the shim may need updates
Publishing and safety notes
This repository is intended to be publishable without credentials.
It should contain:
- shim source code
- compatibility tests
- launcher and maintenance scripts
- setup documentation
It should not contain:
- browser binaries
- staged system libraries
- local logs
- cache directories
- tokens, passwords, or environment dumps
Suggested bootstrap sequence
- Install and verify Camofox separately
- Start the Camofox server and confirm http://127.0.0.1:9377/health
- Start this shim and confirm http://127.0.0.1:33879/health
- Point Hermes FIRECRAWL_API_URL at the shim
- Run at least one Hermes-side extract/search validation
- Install the cron launcher if you want restart resilience
License
MIT