Custom bots — implementation deep-dive.
The dashboard feature page covers using custom bots. This page covers how they work under the hood — for power users who want to build robust bots that don't break.
TL;DR
Custom bots are headless scrapers + extractors + schedulers, plus a webhook-in fallback. Architecture is queue-based: source fetcher → extraction rule → idempotency check → result router. Idempotency is keyed on (source, content_hash) so re-runs don't dup-flood your signal feed. Debug surface: per-run logs, last-100 matches preview, manual trigger.
01 Architecture overview
Each bot run is a Lambda-style execution. When a schedule trigger fires, these steps run in order:
- Fetcher pulls the source (HTTP, RSS, sitemap, careers parser)
- Extractor runs your rule against the fetched content
- Idempotency check against (bot_id, content_hash) — duplicate matches dropped
- Result router writes to signal feed, fires action (alert/auto-brief/CRM push)
- Logger persists the run with status, latency, match count
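The five steps above can be sketched as a single function. This is a minimal illustration, not the platform's actual code: `Bot`, `RunLog`, and `run_bot` are hypothetical names, the fetcher and extractor are plain callables, and the content hash is assumed to be a SHA-256 of the matched text.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class RunLog:
    status: str
    latency_ms: float
    match_count: int


@dataclass
class Bot:
    bot_id: str
    fetch: Callable[[], str]             # hypothetical fetcher: returns source content
    extract: Callable[[str], List[str]]  # hypothetical rule: returns matched snippets
    seen: Set[Tuple[str, str]] = field(default_factory=set)
    feed: List[str] = field(default_factory=list)
    logs: List[RunLog] = field(default_factory=list)


def run_bot(bot: Bot) -> RunLog:
    started = time.monotonic()
    status, matches = "ok", []
    try:
        content = bot.fetch()                      # 1. fetcher
        matches = bot.extract(content)             # 2. extractor
        for match in matches:
            digest = hashlib.sha256(match.encode("utf-8")).hexdigest()
            key = (bot.bot_id, digest)             # 3. idempotency check
            if key in bot.seen:
                continue                           # duplicate: dropped before routing
            bot.seen.add(key)
            bot.feed.append(match)                 # 4. router (actions would fire here)
    except Exception:
        status = "error"
    latency_ms = (time.monotonic() - started) * 1000
    log = RunLog(status, latency_ms, len(matches))
    bot.logs.append(log)                           # 5. logger
    return log
```

Running the same bot twice against unchanged content logs two runs but writes only one entry to the feed.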
02 Source connectors
| Connector | What it fetches | Notes |
|---|---|---|
| URL fetcher | Single HTTP GET, follows redirects, runs JS via a headless browser | Headless render adds ~2s latency |
| RSS reader | Standard RSS / Atom, dedup on item GUID | Most reliable connector |
| Sitemap walker | Crawls a sitemap.xml, detects new entries | Use for site-wide change monitoring |
| Careers parser | Greenhouse / Lever / Ashby / Workday — structured extraction | Pre-built per-ATS extractors |
| Webhook in | Your system POSTs; bot reacts | Real-time, push-based |
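To make "detects new entries" concrete, here is a rough sketch of how a sitemap walker could diff a sitemap against URLs it has already seen. The function name and the in-memory `known_urls` set are assumptions for illustration; the real connector's storage and parsing are not documented here.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def new_sitemap_entries(sitemap_xml: str, known_urls: Set[str]) -> List[str]:
    """Return URLs not seen before, and remember them for the next run."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    fresh = [url for url in urls if url not in known_urls]
    known_urls.update(fresh)
    return fresh
```

The RSS reader's GUID dedup follows the same shape, with the item GUID taking the place of the URL.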
03 Extraction rule types
Three modes, in order of increasing power and cost:
- Keyword set — match if any phrase in the list appears (case-insensitive). Cheap, fast.
- Regex — Python regex against the fetched content. Powerful, pattern-precise.
- Semantic match — pre-defined intents (e.g., "is this a hiring announcement?") classified by Mama's local model. More accurate, slower, Pro+ only.
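The first two modes are simple enough to sketch in a few lines. The function names below are hypothetical, and the return shape (a list of matched strings) is an assumption; semantic match is left out because it depends on the model.

```python
import re
from typing import Iterable, List


def keyword_match(content: str, phrases: Iterable[str]) -> List[str]:
    """Return every phrase that appears in the content, ignoring case."""
    lowered = content.casefold()
    return [phrase for phrase in phrases if phrase.casefold() in lowered]


def regex_match(content: str, pattern: str) -> List[str]:
    """Return the full text of every non-overlapping regex match."""
    return [m.group(0) for m in re.finditer(pattern, content)]
```

A keyword rule fires if the returned list is non-empty. The regex variant is worth the extra effort when the phrase has structure, such as a funding amount followed by a round name.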
04 Schedule engine
Cron-like syntax, but exposed as friendly options (hourly, daily, weekly). Behind the scenes: each bot has a next-run timestamp; scheduler polls every 60s and triggers due bots.
Webhook-in bots have no schedule — they run on inbound POST.
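One way to picture the polling loop: every tick, the scheduler collects bots whose next-run timestamp has passed and pushes that timestamp forward. The names here (`ScheduledBot`, `poll_due`, `INTERVALS`) are illustrative, and advancing from the poll time rather than from the previous due time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Friendly schedule options mapped to intervals (assumed mapping).
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass
class ScheduledBot:
    bot_id: str
    schedule: str
    next_run: datetime


def poll_due(bots: List[ScheduledBot], now: datetime) -> List[str]:
    """Called once per poll tick (every 60s): return due bot IDs and reschedule them."""
    due = []
    for bot in bots:
        if bot.next_run <= now:
            due.append(bot.bot_id)
            bot.next_run = now + INTERVALS[bot.schedule]
    return due
```

With a 60-second poll, a bot can start up to a minute after its nominal time, which is worth keeping in mind when reading run timestamps.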
Per-tier limits on total bots and on concurrent runs:
- Pro: 5 bots, max 1 running at a time per workspace
- Company: 50 bots, max 5 concurrent
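Expressed as data, the limits above might look like the following. The numbers come from the list; the structure and function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_bots: int
    max_concurrent: int


TIER_LIMITS = {
    "pro": TierLimits(max_bots=5, max_concurrent=1),
    "company": TierLimits(max_bots=50, max_concurrent=5),
}


def can_create_bot(tier: str, existing_bots: int) -> bool:
    """True if the workspace is still under its total bot limit."""
    return existing_bots < TIER_LIMITS[tier].max_bots


def can_start_run(tier: str, running_now: int) -> bool:
    """True if another bot may start running in this workspace right now."""
    return running_now < TIER_LIMITS[tier].max_concurrent
```

On Pro, this means two bots scheduled for the same minute run one after the other, so staggering schedules avoids queueing.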
05 Idempotency
Same content fetched twice doesn't produce two signals. Each match gets an idempotency key that includes its content hash.
If the same key was emitted in the last 90 days, the bot logs the match but doesn't fire actions. Override via "Force fire on every match" toggle (rare, mostly used for testing).
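A sketch of that check, under a few stated assumptions: the key is (bot_id, content_hash) as in the architecture overview, the hash is SHA-256 of the matched content, and the 90-day window is measured from the last time actions actually fired. `IdempotencyStore` and `should_fire` are hypothetical names.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

WINDOW = timedelta(days=90)


class IdempotencyStore:
    def __init__(self) -> None:
        # Maps (bot_id, content_hash) to the last time actions fired for that key.
        self._last_fired: Dict[Tuple[str, str], datetime] = {}

    def should_fire(
        self, bot_id: str, content: str, now: datetime, force_fire: bool = False
    ) -> bool:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = (bot_id, digest)
        last = self._last_fired.get(key)
        duplicate = last is not None and now - last <= WINDOW
        if duplicate and not force_fire:
            return False  # match is still logged by the caller, but no actions fire
        self._last_fired[key] = now
        return True
```

Because the bot ID is part of the key, two bots watching the same page each fire once for the same content.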
06 Debugging
Each bot has a "Debug" tab showing:
- Last 100 runs with timestamp, latency, status, match count
- Last 100 matches with extracted content snippet
- Last error with stack trace if any
- "Test fetch" button — run the fetcher once without firing actions
- "Manual trigger" button — run the full bot now
Bots that fail 3 runs in a row auto-pause and a workspace alert fires.
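The auto-pause rule amounts to a consecutive-failure counter that resets on any success. A minimal sketch, with hypothetical names and the assumption that the alert fires once, at the moment of pausing:

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # from the rule above: 3 failed runs in a row


@dataclass
class BotHealth:
    consecutive_failures: int = 0
    paused: bool = False


def record_run(health: BotHealth, succeeded: bool) -> bool:
    """Update health after a run. Returns True if this run triggered auto-pause."""
    if succeeded:
        health.consecutive_failures = 0
        return False
    health.consecutive_failures += 1
    if health.consecutive_failures >= FAILURE_THRESHOLD and not health.paused:
        health.paused = True
        return True  # caller would fire the workspace alert here
    return False
```

Note that an intermittent failure pattern (fail, fail, succeed, fail) never pauses the bot, so the run history in the Debug tab is the place to spot flaky sources.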