Every signal Mama fires comes from one of these places.
A signal is only useful if you trust where it came from. Below is the full list — every category of data we use, every upstream source we can name publicly, and a list at the bottom of what we explicitly don't touch. Published for the same reason /open is — so you can audit us against your procurement checklist instead of asking us on a call.
How sources are scored.
A "source" here means an upstream data feed Mama pulls from. A "signal" is what we derive from one or more sources. Most signals fuse 2–4 sources to reduce false positives — the ICP rubric weights the resulting confidence into the score.
The three tiers
- Tier 1 · Contracted partner — data we pay for from a vendor with an SLA and a usage contract. Highest confidence weight. Most expensive per record, used sparingly.
- Tier 2 · Public web — scraped or fetched from sources that explicitly allow it (robots.txt + Terms-of-Service compliant). Refreshed on a fixed schedule. Most of our signals fuse 2+ Tier-2 sources.
- Tier 3 · Customer-provided — data you uploaded or that we read from your CRM. Treated as authoritative for your workspace, never shared with others.
Tiers are not quality rankings — Tier 2 sources are often more current than Tier 1 partner feeds, which is why we mix them. The tier label is about provenance and contractual basis, not signal quality.
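
To make the fusion idea concrete, here is a minimal sketch of how several sources could be combined into one confidence value. It is an illustration, not Mama's actual scoring code: the per-tier weights, the `Observation` type, and the noisy-OR combination rule are all assumptions chosen for the example. The only detail taken from this page is that Tier 1 carries the highest confidence weight.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical per-tier confidence weights. The page only states that
# Tier 1 carries the highest weight; the numbers are illustrative.
TIER_WEIGHT = {1: 0.9, 2: 0.6, 3: 0.8}


@dataclass(frozen=True)
class Observation:
    source: str  # e.g. "Crunchbase"
    tier: int    # 1, 2, or 3


def fused_confidence(observations):
    """Combine independent sources with a noisy-OR rule (an assumption).

    Each source is counted once, so a feed that reports the same event
    twice does not inflate the result.
    """
    tier_by_source = {obs.source: obs.tier for obs in observations}
    # Probability that every source is wrong at the same time.
    all_wrong = prod(1.0 - TIER_WEIGHT[tier] for tier in tier_by_source.values())
    return 1.0 - all_wrong


single = fused_confidence([Observation("Press release feeds", 2)])
double = fused_confidence([
    Observation("Press release feeds", 2),
    Observation("SEC Form D filings", 2),
])
print(single < double)  # a second agreeing source raises confidence
```

Under these assumed weights, two agreeing Tier 2 sources end up more confident than either one alone, which is the reason for fusing them in the first place.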
Firmographic data · ICP fit.
The "is this account my customer?" data — industry, employee count, revenue band, location, business model. Used for the ICP fit dimension (35% weight) of every account score.
| Source | What we use it for | Tier |
|---|---|---|
| Crunchbase | Company-level firmographics: industry, employee count, location, founded date. | Partner |
| SEC EDGAR | Public-company filings — official revenue, executive list, board composition. | Public |
| OpenCorporates | Cross-jurisdictional company-registry lookups for non-US accounts. | Public |
| Your CRM | Account list + custom fields — treated as authoritative for your workspace. | Customer |
Funding events.
Closed rounds, M&A, debt raises, IPOs. Highest-weight signal type in the working-list calculation. Refreshed every 2 hours — a Series B announced at 9am will be in your dashboard before lunch.
| Source | What we use it for | Tier |
|---|---|---|
| Crunchbase | Round size, lead investor, follow-on investors, post-money valuation when disclosed. | Partner |
| Press release feeds | Real-time scraping of major business-wire press releases. De-duped against Crunchbase to avoid double-firing. | Public |
| SEC Form D filings | For US private placements — sometimes faster than the press release. | Public |
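
De-duplication between a press-wire item and a Crunchbase record can be sketched as a key match within a date window. This is a hedged illustration only: the record fields (`company`, `round`, `announced`), the suffix list, and the 7-day window are assumptions, not the production matching rules.

```python
import re
from datetime import date

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def is_duplicate(event, known_events, window_days=7):
    """True if a known event has the same company and round type and was
    announced within window_days (an assumed tolerance) of this one."""
    key = (normalize_company(event["company"]), event["round"].lower())
    for known in known_events:
        known_key = (normalize_company(known["company"]), known["round"].lower())
        gap = abs((event["announced"] - known["announced"]).days)
        if key == known_key and gap <= window_days:
            return True
    return False


crunchbase = [{"company": "Acme Inc", "round": "Series B", "announced": date(2024, 3, 4)}]
wire_item = {"company": "Acme, Inc.", "round": "series b", "announced": date(2024, 3, 5)}
print(is_duplicate(wire_item, crunchbase))  # True: same round, one day apart
```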
Exec moves.
VP+ joins, departures, lateral moves. Hardest source category to get right — LinkedIn changes happen in real time but data quality varies. We fuse 3 sources to confirm before firing the signal.
| Source | What we use it for | Tier |
|---|---|---|
| LinkedIn (public profiles) | Public profile change detection at VP+ titles. Read-only via official API access. Subject to LinkedIn's rate limits — see the April 5 incident post-mortem for what happens when they change those. | Public |
| Press releases & PR feeds | "New CRO" announcements via Business Wire, PR Newswire. Used to confirm a LinkedIn change isn't a profile error. | Public |
| SEC 8-K filings | For public companies — material exec changes are required disclosures. | Public |
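
The confirm-before-firing step can be pictured as counting distinct sources that agree on the same move. The sketch below is an assumption-laden illustration: the report fields and the `min_sources=2` threshold are hypothetical, since this page does not state how many of the three sources must agree.

```python
from collections import defaultdict


def confirmed_moves(reports, min_sources=2):
    """Return moves reported by at least min_sources distinct sources.

    Each report is a dict with "person", "company", "title", "source".
    min_sources is an assumed threshold, not a documented one.
    """
    sources_by_move = defaultdict(set)
    for report in reports:
        move = (
            report["person"].strip().lower(),
            report["company"].strip().lower(),
            report["title"].strip().lower(),
        )
        sources_by_move[move].add(report["source"])
    return sorted(m for m, srcs in sources_by_move.items() if len(srcs) >= min_sources)


reports = [
    {"person": "Dana Lee", "company": "Acme", "title": "CRO", "source": "LinkedIn"},
    {"person": "Dana Lee", "company": "Acme", "title": "CRO", "source": "PR Newswire"},
    {"person": "Sam Roy", "company": "Globex", "title": "VP Sales", "source": "LinkedIn"},
]
print(confirmed_moves(reports))  # [('dana lee', 'acme', 'cro')]
```

A move seen only on LinkedIn stays unconfirmed here, which mirrors the role the press-release feed plays in ruling out profile errors.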
Hiring spikes.
Job-posting volume changes, role-type changes, geo expansion. The noisiest signal category, because most teams hire all the time. We threshold for "spike" relative to that team's 90-day baseline, not absolute count.
| Source | What we use it for | Tier |
|---|---|---|
| Greenhouse public job boards | Most B2B SaaS companies use Greenhouse, and its public job boards are readable through the official API. | Public |
| Lever public job boards | Same pattern for Lever-based ATS deployments. Combined coverage hits ~70% of our target market. | Public |
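
A baseline-relative spike check could look like the sketch below. It is only an illustration of the idea of comparing recent posting volume to the same team's 90-day baseline: the 7-day recent window, the 2x ratio, and the minimum-rate floor are hypothetical parameters, not documented thresholds.

```python
from statistics import mean


def is_hiring_spike(daily_postings, baseline_days=90, recent_days=7,
                    ratio=2.0, min_rate=0.1):
    """Flag a spike when the recent daily posting rate is at least `ratio`
    times the team's own baseline rate.

    daily_postings: new postings per day, oldest first.
    min_rate: floor on the baseline (postings/day) so a team that never
    hires does not spike on a single posting. All defaults are assumptions.
    """
    if len(daily_postings) < baseline_days + recent_days:
        return False  # not enough history to establish a baseline
    recent = daily_postings[-recent_days:]
    baseline = daily_postings[-(baseline_days + recent_days):-recent_days]
    baseline_rate = max(mean(baseline), min_rate)
    return mean(recent) >= ratio * baseline_rate


steady_big_team = [20] * 97              # high volume, but flat
small_team_surge = [1] * 90 + [3] * 7    # low volume, tripled this week
print(is_hiring_spike(steady_big_team), is_hiring_spike(small_team_surge))  # False True
```

The usage lines show why absolute count is the wrong yardstick: the team posting 20 roles a day is not spiking, while the team that went from 1 to 3 is.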
Tech stack changes.
Software added, dropped, swapped on a target's marketing site, app, or job postings. Powers /lookup and the stack-change signal type. We detect ~1,400 tools across CRM, sequencer, MarTech, observability, data warehouse.
| Source | What we use it for | Tier |
|---|---|---|
| HTTP fingerprinting | Headers, cookies, embedded scripts on the target's public sites. Same fingerprint pattern Wappalyzer / BuiltWith use. | Public |
| JS bundle inspection | Pattern-match against ~1,400 known vendor SDKs in the page's JavaScript bundles. | Public |
| Job-posting tool mentions | "Experience with Salesforce required" in a job description is a strong stack-confirmation signal. Cross-checked with HTTP detection. | Public |
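
As a rough picture of fingerprint-style detection with job-posting cross-checks, the sketch below pattern-matches script hosts in a page's HTML and then looks for the same tool name in job text. The three signatures are illustrative examples only, and the "detected" versus "confirmed" labels are an assumption about how the cross-check might be reported; the real catalog covers ~1,400 tools.

```python
import re

# Illustrative signatures keyed by tool name. A production catalog would
# be far larger and would also inspect headers and cookies.
SIGNATURES = {
    "HubSpot": re.compile(r"js\.hs-scripts\.com", re.IGNORECASE),
    "Segment": re.compile(r"cdn\.segment\.com", re.IGNORECASE),
    "Google Tag Manager": re.compile(r"googletagmanager\.com/gtm\.js", re.IGNORECASE),
}


def detect_tools(html, job_text=""):
    """Map each tool found in the HTML to "confirmed" if a job posting
    also names it, else "detected". Job text alone never adds a tool."""
    results = {}
    for tool, pattern in SIGNATURES.items():
        if pattern.search(html):
            mention = re.search(rf"\b{re.escape(tool)}\b", job_text, re.IGNORECASE)
            results[tool] = "confirmed" if mention else "detected"
    return results


page = '<script src="https://js.hs-scripts.com/123.js"></script>'
print(detect_tools(page, "Experience with HubSpot required"))  # {'HubSpot': 'confirmed'}
```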
Voice mining.
Podcasts, interviews, conference talks, panel discussions, public blog posts from exec-level speakers at target accounts. Lowest false-positive rate of any signal type — execs only say things publicly that they want quoted.
| Source | What we use it for | Tier |
|---|---|---|
| Podcast RSS feeds | ~600 B2B-focused podcast feeds. Audio transcribed via in-house Whisper deployment, then quote-extracted by exec name. | Public |
| YouTube channel feeds | Conference recordings, panel talks. Auto-captions used where available, fallback to in-house transcription. | Public |
| Substack & corporate blogs | Text content under exec bylines. Polled hourly; quote extraction same as audio pipeline. | Public |
| Conference speaker lists | Public speaker-list pages from SaaStr, RevOps Co-op, similar events. Used to seed which execs to listen for in upcoming releases. | Public |
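
Quote extraction by exec name can be sketched as a filter over transcript segments. The example assumes the transcript has already been split into segments with a speaker label attached, which is a step this page does not describe; the segment shape, the `min_words` cutoff, and the function name are hypothetical.

```python
def extract_quotes(segments, exec_names, min_words=8):
    """Keep segments spoken by a watched exec that are long enough to
    quote. Segments are dicts with "speaker" and "text" (assumed shape)."""
    watched = {name.strip().lower() for name in exec_names}
    quotes = []
    for segment in segments:
        speaker = segment["speaker"].strip()
        text = " ".join(segment["text"].split())  # collapse whitespace
        if speaker.lower() in watched and len(text.split()) >= min_words:
            quotes.append({"speaker": speaker, "quote": text})
    return quotes


segments = [
    {"speaker": "Host", "text": "Welcome back to the show, great to have you."},
    {"speaker": "Dana Lee", "text": "We are rebuilding our outbound motion around intent data this year."},
    {"speaker": "Dana Lee", "text": "Exactly."},
]
print(len(extract_quotes(segments, ["Dana Lee"])))  # 1
```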
What we never use.
Sources we've evaluated and explicitly decided not to touch — either because the legal basis is weak, the data is unreliable, or it would betray the trust that makes the rest of this site credible.
If a source you'd expect to see is missing from both lists, email us — we either haven't gotten to it, or we've evaluated and rejected it for a reason we should add here.
Refresh cadence at a glance.
How often each source category gets re-pulled. Cadences are deliberately staggered — funding is fastest because the news window is short; tech stack is slowest because it changes slowly and false positives spike on tighter polling.
The account score itself re-runs every 6 hours regardless — even if no new source data lands, the recency decay shifts the score. Customers on Pro can force a re-score outside the cycle via the API.
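
The reason a score can move with no new data is recency decay. Below is a minimal sketch under stated assumptions: exponential decay with a 7-day half-life, a single "signal activity" term standing in for every non-ICP dimension, and hypothetical field names. Only the 35% ICP-fit weight comes from this page.

```python
from datetime import datetime, timedelta

ICP_FIT_WEIGHT = 0.35  # from the page; everything else here is assumed


def decayed_weight(age_hours, half_life_hours=168.0):
    """Exponential recency decay: the weight halves every half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def account_score(icp_fit, signals, now):
    """Blend ICP fit (0..1) with decayed signal activity (capped at 1).

    signals: dicts with "strength" (0..1) and "fired_at" (datetime).
    """
    activity = 0.0
    for signal in signals:
        age_hours = (now - signal["fired_at"]).total_seconds() / 3600.0
        activity += signal["strength"] * decayed_weight(age_hours)
    activity = min(activity, 1.0)
    return ICP_FIT_WEIGHT * icp_fit + (1.0 - ICP_FIT_WEIGHT) * activity


fired = datetime(2024, 1, 1, 9, 0)
signals = [{"strength": 0.8, "fired_at": fired}]
at_fire = account_score(0.5, signals, fired)
next_cycle = account_score(0.5, signals, fired + timedelta(hours=6))
print(next_cycle < at_fire)  # True: same data, lower score six hours later
```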
The three crawler tiers
"Crawlers" gets thrown around loosely. In Mama, there are three distinct tiers, and they do different jobs. Don't conflate them.
Tier 1 — Built-in 24/7 crawlers. Mama's own 10+ crawlers covering the 1,000+ source feeds listed above. Run continuously, hitting every account in every workspace on the cadences in §9. Free on every plan. Out of your control by design — we run them, we tune them, you benefit.
Tier 2 — On-demand crawls. Pro and Company override: re-run our built-in crawlers on one specific account, right now. Returns in ~30 seconds. Useful for consultancies prepping for a client call, teams responding to a same-day event, or anyone who just heard news they want validated against fresh data. Credit-based — 100/mo on Pro, 500/mo on Company. Available via the dashboard and the API.
Tier 3 — Custom bots. The Pro/Company-only tier where you tell Mama what else to watch. Point at any public source we don't cover by default — a competitor's pricing page, a specific subreddit, a niche Substack, a GitHub repo, a public Discord channel, a custom URL with CSS-selector extraction. Custom bots run on the same crawler infrastructure as our built-in ones (same auto-retry, rate-limiting, proxy rotation, failover). Pro = 25 active; Company = 100 active. Hard rules: public sources only (no scraping behind logins), respect robots.txt, auto-pause after 3 consecutive failures. See /changelog for limits + the dashboard for the builder.
Why this matters: built-in covers the obvious sources (funding wires, job boards, G2, Reddit). Custom bots cover the niche-but-high-signal ones — the specific Substack a buyer reads, the Discord channel where deals get sourced. Compounding signal advantage over tools that only resell vendor data.
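
The hard rules for custom bots (plan limits, robots.txt, auto-pause after 3 consecutive failures) can be sketched as follows. The limits and the failure threshold are taken from this page; the `CustomBot` class, the function names, and the `ExampleBot` user agent are hypothetical, and the robots.txt check uses Python's standard `urllib.robotparser` rather than anything Mama-specific.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

ACTIVE_BOT_LIMIT = {"pro": 25, "company": 100}  # from the page
MAX_CONSECUTIVE_FAILURES = 3                    # from the page


@dataclass
class CustomBot:
    url: str
    consecutive_failures: int = 0
    paused: bool = False

    def record_run(self, succeeded):
        """A success resets the streak; the third failure in a row pauses."""
        if succeeded:
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            self.paused = True


def allowed_by_robots(robots_txt, url, user_agent="ExampleBot"):
    """Check a URL against already-fetched robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def can_activate(plan, active_count):
    """True if the plan still has room for one more active bot."""
    return active_count < ACTIVE_BOT_LIMIT[plan]


bot = CustomBot("https://example.com/pricing")
for outcome in (False, False, True, False):
    bot.record_run(outcome)
print(bot.paused)  # False: the success in the middle reset the streak
```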
When sources change.
Sources change. Vendors deprecate APIs. Press wires re-architect. LinkedIn ships a tighter rate limit (see the April 5 post-mortem for what happens when they do). Three commitments about how we handle source-side changes:
- Adds get a /changelog entry tagged "sources". New source, new tier, what it's used for, what changed.
- Removals get a 30-day notice in the customer dashboard before the source stops being used. Plus a changelog entry. Nothing silent.
- Material confidence changes get an alert on accounts where that source was a primary signal driver. "We used to rate this Tier 2; now it's Tier 3 because the vendor changed terms" — flagged on every affected brief.
If you ever see a signal where the source link 404s or the cited article has been pulled, that's a bug — tell us and we'll pull the signal from the affected briefs while we fix the source.
Questions about a specific source before you start?
If a procurement or security person needs more depth than this page offers, email [email protected]. We'll share the per-source legal-basis memo we keep internally, under NDA.