Custom bots — implementation deep-dive.

The dashboard feature page covers using custom bots. This page covers how they work under the hood — for power users who want to build robust bots that don't break.

Time: 8 min · Updated: 2026-05-25 · Audience: Pro+ power users, RevOps building data pipelines · Pair with: Dashboard feature page

TL;DR

Custom bots are headless scrapers + extractors + schedulers, plus a webhook-in fallback. Architecture is queue-based: source fetcher → extraction rule → idempotency check → result router. Idempotency is keyed on (bot_id, content_hash), so re-runs don't flood your signal feed with duplicates. Debug surface: per-run logs, a preview of the last 100 matches, and a manual trigger.

01 Architecture overview

Each bot run is a Lambda-style execution. On each schedule trigger, the run goes through five stages:

Fetcher pulls the source (HTTP, RSS, sitemap, careers parser)
Extractor runs your rule against the fetched content
Idempotency check against (bot_id, content_hash) — duplicate matches dropped
Result router writes to signal feed, fires action (alert/auto-brief/CRM push)
Logger persists the run with status, latency, match count
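
The five stages above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the product's actual implementation: the `Bot` and `RunLog` types, the in-memory `seen_keys` set, and the `feed` list standing in for the result router are all hypothetical names.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class RunLog:
    status: str
    latency_ms: float
    match_count: int   # matches the extractor found
    fired_count: int   # matches that passed the idempotency check


@dataclass
class Bot:
    bot_id: str
    fetch: Callable[[], str]             # stage 1: source fetcher
    extract: Callable[[str], List[str]]  # stage 2: extraction rule
    seen_keys: Set[str] = field(default_factory=set)  # stage 3 state (assumed in-memory)
    feed: List[str] = field(default_factory=list)     # stage 4: stand-in for the router


def run_once(bot: Bot) -> RunLog:
    started = time.perf_counter()
    content = bot.fetch()
    matches = bot.extract(content)
    fired = 0
    for match in matches:
        key = hashlib.sha256(f"{bot.bot_id}|{match}".encode("utf-8")).hexdigest()
        if key in bot.seen_keys:
            continue  # duplicate match: dropped before routing
        bot.seen_keys.add(key)
        bot.feed.append(match)
        fired += 1
    latency_ms = (time.perf_counter() - started) * 1000
    # Stage 5: the returned record is what a logger would persist.
    return RunLog("ok", latency_ms, len(matches), fired)
```

Running the same bot twice against unchanged content should report the match both times but route it only once.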

02 Source connectors

Connector	What it fetches	Notes
URL fetcher	Single HTTP GET, follows redirects, runs JS via a headless browser	Headless render adds ~2s latency
RSS reader	Standard RSS / Atom, dedup on item GUID	Most reliable connector
Sitemap walker	Crawls a sitemap.xml, detects new entries	Use for site-wide change monitoring
Careers parser	Greenhouse / Lever / Ashby / Workday — structured extraction	Pre-built per-ATS extractors
Webhook in	Your system POSTs; bot reacts	Real-time, push-based
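
As one illustration of a connector, here is a rough sketch of how a sitemap walker could detect new entries. It assumes a plain `urlset` sitemap in the standard sitemaps.org namespace and a caller-supplied set of already-known URLs; `sitemap_urls` and `new_entries` are hypothetical helper names, and the real connector may work differently.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> URL in a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def new_entries(xml_text: str, known: Set[str]) -> List[str]:
    """Return URLs present in the sitemap but not in the known set."""
    return [url for url in sitemap_urls(xml_text) if url not in known]
```

Each URL returned by `new_entries` would then be handed to the extraction rule, and the known set updated after the run.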

03 Extraction rule types

Three modes, in increasing order of power and cost:

Keyword set — match if any phrase in the list appears (case-insensitive). Cheap, fast.
Regex — Python regex against the fetched content. Powerful, pattern-precise.
Semantic match — pre-defined intents (e.g., "is this a hiring announcement?") classified by Mama's local model. More accurate, slower, Pro+ only.
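
The first two modes are easy to approximate locally before saving a rule. The sketch below is an assumption about how they behave, based only on the descriptions above; `keyword_match` and `regex_match` are hypothetical helper names. Semantic match is left out because it depends on a model this page does not specify.

```python
import re
from typing import List


def keyword_match(content: str, phrases: List[str]) -> List[str]:
    """Return the phrases that appear in content, ignoring case."""
    lowered = content.lower()
    return [p for p in phrases if p.lower() in lowered]


def regex_match(content: str, pattern: str) -> List[str]:
    """Return every non-overlapping match of a Python regex in content."""
    return [m.group(0) for m in re.finditer(pattern, content)]
```

A keyword rule fires if `keyword_match` returns a non-empty list; a regex rule yields one match per hit, which makes it the better choice when the extracted text itself matters.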

04 Schedule engine

The schedule is cron-like, but exposed as friendly options (hourly, daily, weekly). Behind the scenes, each bot has a next-run timestamp; the scheduler polls every 60s and triggers any bot that is due.
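
A single scheduler tick could look roughly like the following. The `ScheduledBot` type, the `INTERVALS` mapping, and the choice to advance `next_run` in whole intervals (so a late poll does not cause drift) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Friendly schedule options mapped to intervals (assumed values).
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass
class ScheduledBot:
    bot_id: str
    schedule: str
    next_run: datetime


def poll(bots: List[ScheduledBot], now: datetime) -> List[str]:
    """One scheduler tick: return the ids of due bots and reschedule them."""
    due = []
    for bot in bots:
        if bot.next_run <= now:
            due.append(bot.bot_id)
            # Step forward in whole intervals until the timestamp is in the future.
            while bot.next_run <= now:
                bot.next_run += INTERVALS[bot.schedule]
    return due
```

In a real loop, `poll` would be called every 60 seconds with the current time, and each returned id would be enqueued for execution.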

Webhook-in bots have no schedule — they run on inbound POST.

Per-tier limits on bot count and concurrent runs:

Pro: 5 bots, max 1 running at a time per workspace
Company: 50 bots, max 5 concurrent
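
The limits above amount to two checks, one when a bot is created and one when a run is about to start. A small sketch, assuming the tier names are stored as lowercase strings and that both counts are tracked per workspace (`TierLimits`, `can_create_bot`, and `can_start_run` are hypothetical):

```python
from typing import Dict, NamedTuple


class TierLimits(NamedTuple):
    max_bots: int
    max_concurrent: int


# Values taken from the per-tier limits listed above.
TIER_LIMITS: Dict[str, TierLimits] = {
    "pro": TierLimits(max_bots=5, max_concurrent=1),
    "company": TierLimits(max_bots=50, max_concurrent=5),
}


def can_create_bot(tier: str, existing_bots: int) -> bool:
    """True if the workspace is still under its bot-count limit."""
    return existing_bots < TIER_LIMITS[tier].max_bots


def can_start_run(tier: str, running_now: int) -> bool:
    """True if another run may start without exceeding the concurrency cap."""
    return running_now < TIER_LIMITS[tier].max_concurrent
```

On Pro, a second due bot would simply wait until the first run finishes.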

05 Idempotency

Same content fetched twice doesn't produce two signals. Idempotency key:

sha256(bot_id + '|' + extracted_content)

If the same key was emitted in the last 90 days, the bot logs the match but doesn't fire actions. Override via "Force fire on every match" toggle (rare, mostly used for testing).
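
The key and the 90-day window can be sketched as follows. The in-memory `emitted` dictionary, the `should_fire` helper, and the decision not to refresh the window when a duplicate is suppressed are assumptions; only the key formula and the 90-day figure come from this page.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Dict

WINDOW = timedelta(days=90)


def idempotency_key(bot_id: str, extracted_content: str) -> str:
    """sha256(bot_id + '|' + extracted_content) as a hex digest."""
    return hashlib.sha256(f"{bot_id}|{extracted_content}".encode("utf-8")).hexdigest()


def should_fire(
    emitted: Dict[str, datetime],
    bot_id: str,
    extracted_content: str,
    now: datetime,
    force_fire: bool = False,
) -> bool:
    """Return True if actions should fire; record the emission when they do."""
    key = idempotency_key(bot_id, extracted_content)
    last = emitted.get(key)
    duplicate = last is not None and now - last <= WINDOW
    if duplicate and not force_fire:
        return False  # the match is still logged by the caller; actions are skipped
    emitted[key] = now
    return True
```

Because `bot_id` is part of the key, two bots that extract the same content each fire once rather than suppressing each other.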

06 Debugging

Each bot has a "Debug" tab showing:

Last 100 runs with timestamp, latency, status, match count
Last 100 matches with extracted content snippet
Last error with stack trace if any
"Test fetch" button — run the fetcher once without firing actions
"Manual trigger" button — run the full bot now

Bots that fail 3 runs in a row auto-pause and a workspace alert fires.
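
The auto-pause rule is a consecutive-failure counter that resets on any success. A minimal sketch, assuming a hypothetical `BotHealth` record per bot; the return value signals the moment a workspace alert would fire.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class BotHealth:
    consecutive_failures: int = 0
    paused: bool = False


def record_run(health: BotHealth, succeeded: bool) -> bool:
    """Update health after a run; return True if this run triggered the auto-pause."""
    if succeeded:
        health.consecutive_failures = 0  # any success resets the streak
        return False
    health.consecutive_failures += 1
    if health.consecutive_failures >= MAX_CONSECUTIVE_FAILURES and not health.paused:
        health.paused = True
        return True  # the caller would raise the workspace alert here
    return False
```

Two failures followed by a success do not count toward the limit; only an unbroken run of three failures pauses the bot.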

07 Common mistakes

Building bots without testing the extraction

The "Test fetch" button shows you exactly what the fetcher sees. Use it before saving. Bots that match 0 things are useless; bots that match everything are noise.

Polling hourly when daily would do

An hourly schedule runs 24 times a day, so it costs 24× the compute of a daily one. Most use cases don't need it. Daily is the right default.

Disabling idempotency for "more matches"

You don't get more value — you get duplicate flooding. Idempotency is what keeps the signal feed clean.
