Signals · the defensibility · deep dive

Archetype matching.

The discipline that turns historical reply data into sharper future scoring. Cluster the accounts that converted before by signal-mix + persona + ICP-fit. Label the clusters. Score every new account by similarity. Refine nightly. The result: a reply-rate prediction error that tightens from ±8 points (static ICP) to ±3 points after six months of data, and to ±1.5 points once the archetypes mature. The structural moat that competitors with sequencer-only or data-only datasets cannot build.

Category: Signals · Read time: 14 min · Updated: 2026-05-24 · AM-1.0
TL;DR
Archetype matching is Mama's approach to scoring new accounts by similarity to historical reply-pattern clusters — not just by static firmographic fit. Instead of "Series C SaaS, 200-2000 employees" (an ICP rule), an archetype is "post-Series-C SaaS that just hired a VP Data + just migrated to Snowflake + has 12+ data engineering roles open" (a behavioral cluster of accounts that historically replied at ~32%). New accounts get scored by similarity to the closest archetype. As replies accrue, archetype centroids shift, and the next account's predicted reply rate gets sharper. The flywheel compounds with usage. The defensibility of the entire Mama product line lives in this loop.

01 What it actually is

Most outbound tools score target accounts on what they look like. Industry, employee count, funding stage, geography, tools in stack. This is the standard ICP scoring approach — pioneered by Salesforce Einstein, refined by 6sense and Demandbase, ubiquitous by 2020. It works, sort of, and it's where every B2B sales org starts.

Archetype matching scores target accounts on what they behave like. Specifically: how similar this new account is to clusters of historical accounts that actually replied to your team's outbound.

The precise definition:

Archetype matching is the practice of clustering historical closed-won (or replied-to) deals by signal-mix + persona + ICP-fit + tone, naming each cluster as an "archetype," and scoring new accounts by similarity to the closest matching archetype to predict reply probability.

Four words in that definition do the load-bearing work:

  • Clustering. Unsupervised — let the data tell you what archetypes exist rather than imposing top-down categories. Most sales orgs that try this fail by hand-naming 5 archetypes that match their slide deck instead of finding the 7-12 archetypes that actually exist in their data.
  • Signal-mix. Not just one signal type. The archetype isn't "funding round" — it's "funding round + tech change in last 60 days + exec hire in last 90 days + 12+ open roles in data function." The combination is the cluster.
  • Persona. Within the same firmographic shape, replies cluster differently by decision-maker role. A VP Data at a post-Series-C SaaS is a different archetype than a CTO at the same company.
  • Similarity. New accounts get a similarity score (cosine distance, weighted Euclidean, or learned embedding distance) to each archetype centroid. The closest archetype's historical reply rate becomes the new account's predicted reply rate.
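The similarity step can be sketched in a few lines. This is a minimal illustration, not Mama's implementation: the four feature names, the two centroids, and the 11% reply rate are hypothetical, and the 32% figure is borrowed from the TL;DR example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity in [-1, 1]; 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction; treat as "no match"
    return dot / (norm_a * norm_b)

def weighted_euclidean(a, b, weights):
    """Euclidean distance where each feature's squared gap is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def closest_archetype(account, archetypes):
    """Return (name, similarity, historical_reply_rate) of the best cosine match."""
    name, info = max(
        archetypes.items(),
        key=lambda kv: cosine_similarity(account, kv[1]["centroid"]),
    )
    return name, cosine_similarity(account, info["centroid"]), info["reply_rate"]

# Hypothetical 4-feature vectors: [funding, tech_change, exec_hire, open_data_roles]
ARCHETYPES = {
    "post-series-c-data-migrator": {"centroid": [0.9, 0.8, 0.7, 0.9], "reply_rate": 0.32},
    "pre-ipo-fintech-cto": {"centroid": [0.2, 0.1, 0.9, 0.1], "reply_rate": 0.11},
}

new_account = [0.8, 0.9, 0.6, 0.8]
name, similarity, predicted_reply_rate = closest_archetype(new_account, ARCHETYPES)
print(name, round(similarity, 3), predicted_reply_rate)
```

Cosine similarity ignores vector length and compares only the mix of features, which suits signal-mix vectors. Weighted Euclidean distance is the alternative when some features should count more than others.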

The mechanism is borrowed from collaborative filtering (Netflix recommendations, Amazon "customers who bought") and lookalike modeling (Facebook ads). The novelty in the B2B sales context isn't the math — it's the data layer: only a system that sees both signal-mix AND reply outcomes can build archetypes that predict reply rates. Most tools see one side or the other.

02 Why static ICP scoring isn't enough

Static ICP rubrics are the right starting point for any outbound team — they encode strategic intent ("we sell to B2B SaaS, 200-2000 employees, post-Series B, US/EU HQ, modern data stack") and they're easy to operate. But they hit a ceiling that gets visible around the 6-month mark of running a serious outbound motion.

| Dimension | Static ICP scoring | Archetype matching |
| --- | --- | --- |
| Input data | Firmographics + technographics + ICP rubric weights | Same, plus reply outcomes + signal-mix history |
| Update cadence | Quarterly manual recalibration | Nightly automated refresh |
| Reply-rate prediction accuracy | ±8 percentage points | ±3 percentage points (mature: ±1.5) |
| Handles edge cases | Poorly: rules don't generalize | Naturally: clustering captures variation |
| Time-to-useful | Day 1 (write the rules) | 60+ closed-won deals (~3-6 months) |
| Compounds with usage | No: static until next recalibration | Yes: sharper with every reply |
| Defensibility | Low: anyone can copy the rubric | High: requires the reply-outcome dataset |

The specific failure modes of static ICP

Three patterns show up consistently in teams that rely on static ICP scoring beyond their first 6 months:

1. Two firmographically identical accounts behave differently. One Series C SaaS company with 400 employees in NY and a VP Data hired 60 days ago replies at 28% to your funding-anchored outbound. Another Series C SaaS company with 380 employees in Boston and a VP Data hired 60 days ago replies at 6%. Same rubric score, wildly different actual behavior. Static ICP has no way to learn the difference.

2. The rubric over-weights features that don't matter and under-weights features that do. Most teams discover after 12 months that one of their rubric dimensions (often "intent score" or "geography") had ~zero predictive power for reply rate, while a dimension they ignored ("recent VP-level hire + tech-stack change in same quarter") was the single best predictor. Archetypes surface this automatically; static ICPs require humans to notice.

3. The world changes faster than the rubric. A rubric written in Q1 2025 doesn't know about the impact of Apple's Mail Privacy Protection (MPP) on email engagement metrics, or the shift in how VPs of Data evaluate vendors post-Snowflake-cost-explosion, or any of the dozen market changes that have happened in the last 12 months. Archetype matching catches these implicitly because new reply data reshapes the centroids.

The right framing
Static ICP is the strategic intent layer; archetype matching is the operational learning layer. Both should run. The ICP rubric tells you which accounts to consider; the archetype matcher tells you which of those accounts to actually prioritize this week. Teams that try to replace one with the other almost always regress; teams that run both layers in concert see the compounding.

03 How archetypes work (the 4 steps)

The mechanism, end to end, runs in four steps. Each is conceptually simple; the engineering complexity lives in scaling them and keeping them fresh.

  1. Cluster. Take every account in your CRM that has either (a) closed-won or (b) replied positively to outbound in the last 12 months. For each, extract a feature vector: signal-mix (which of the 8 signal types were active at the time of outreach, weighted by recency), persona (decision-maker role + seniority + function), ICP-fit score (the static rubric output), tone (which template variant got the reply), industry, geography, employee band. Run k-means or HDBSCAN on the feature vectors. You'll typically get 7-15 clusters with reasonable density.
  2. Label. A human looks at each cluster and writes a name. "Post-Series-C data-stack-migrating SaaS." "Late-stage healthcare-vertical RevOps-buyer." "Pre-IPO fintech CTO." The name has to be human-interpretable — both for the people who'll use it (SDRs reading "this account matches archetype Foo with 19% predicted reply") and for the editorial review of the clustering itself ("does cluster 4 actually make sense as a buyer type?"). Auto-naming via LLM is possible but produces names like "cluster_7_high_signal_density" which nobody trusts.
  3. Score. For each new account that enters the brief queue, compute the feature vector (same fields used in step 1). Compute cosine similarity to each archetype centroid. Take the closest match. The account inherits that archetype's historical reply rate as its predicted reply rate. Confidence interval is a function of (a) the cluster's tightness in feature space and (b) the cluster's sample size.
  4. Refine. Every night, re-pull the last N days of replies. Update archetype centroids (each new reply pulls the centroid slightly toward the new account's feature vector). Re-cluster monthly. Re-label quarterly (humans review whether the named archetypes still match the underlying clusters). New archetypes can emerge; old ones can fade out.
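The nightly part of step 4 can be sketched as a running-mean update. The text only says each new reply "pulls the centroid slightly" toward the new account, so the 1/(n+1) step size below is an assumption, as are the function and variable names.

```python
def update_centroid(centroid, count, new_vector):
    """Fold one newly replied account into an archetype centroid.

    With `count` members already averaged in, the new vector gets weight
    1 / (count + 1), so mature archetypes move less per reply than young ones.
    Returns the updated centroid and the new member count.
    """
    new_count = count + 1
    step = 1.0 / new_count
    updated = [c + step * (x - c) for c, x in zip(centroid, new_vector)]
    return updated, new_count

# Hypothetical archetype with 9 members and a 2-feature centroid.
centroid, members = [0.50, 0.20], 9
centroid, members = update_centroid(centroid, members, [1.00, 0.20])
print(centroid, members)
```

A fixed step size (for example 0.05) would instead weight recent replies more heavily, which may suit fast-moving markets. Either way the monthly re-clustering remains the place where cluster membership itself is revisited.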

The hard part isn't any individual step. The hard part is keeping all four in continuous operation as the underlying data drifts. Most engineering organizations underestimate the maintenance load: nightly refresh jobs that fail silently, label drift as clusters shift but humans don't re-review, schema changes upstream that break feature extraction. Done right, archetype matching is a system that gets sharper with usage. Done poorly, it's a model that quietly decays.

05 The accuracy evidence

The strongest case for archetype matching is the prediction-accuracy gap. Across the Mama customer cohort, the same accounts were scored with both methods, and the predicted reply rate was compared to the actual reply rate over the following 30 days. The error band is what matters.

Predicted-vs-actual reply rate error band (lower is better)

| Scoring method | Error band |
| --- | --- |
| No scoring (flat 5% assumption) | ±15.0pt |
| Static ICP rubric (industry standard) | ±8.0pt |
| Static ICP + intent-data overlay | ±6.2pt |
| Archetype matching, first 60 days of data | ±4.8pt |
| Archetype matching, 6 months of data | ±3.0pt |
| Archetype matching, 12+ months (mature) | ±1.5pt |

The implication for operations is large. A prediction band of ±8 points means a "predicted 20% reply" account could actually reply at 12% or 28% — operationally, you can't tell whether to push hard or hold. A prediction band of ±3 points means the same predicted-20% account will reply between 17% and 23% — you can confidently commit SDR time, AE follow-up, even comp-plan acceleration. The forecasting becomes actionable.
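The band arithmetic is simple enough to check by hand. A small sketch, with a hypothetical helper name, reproduces it for a predicted-20% account:

```python
def reply_band(predicted_pct, error_pt):
    """Return (low, high) reply-rate bounds in percent, clamped to 0..100."""
    return max(0.0, predicted_pct - error_pt), min(100.0, predicted_pct + error_pt)

for label, error_pt in [
    ("static ICP", 8.0),
    ("archetypes, 6 months", 3.0),
    ("archetypes, mature", 1.5),
]:
    low, high = reply_band(20.0, error_pt)
    print(f"{label}: {low:.1f}% to {high:.1f}%")
# static ICP: 12.0% to 28.0%
# archetypes, 6 months: 17.0% to 23.0%
# archetypes, mature: 18.5% to 21.5%
```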

The 6-12 month maturation curve is also worth noting. Archetype matching doesn't help on day one — it requires a closed-won baseline. Teams adopting this discipline should expect 60-90 days of "build the dataset" before scoring sharpens beyond the static-ICP baseline. After that, the compounding kicks in: month 4 is better than month 3, month 8 is dramatically better than month 4.

06 How to extract archetypes from closed-won

If you want to operationalize archetype matching — either inside Mama or building your own — here's the practical sequence. This is the playbook Mama runs on each customer's closed-won dataset during onboarding.

  1. Confirm dataset minimum: 60+ closed-won deals. Below this, clustering produces unstable centroids that shift every time a new deal is added. Teams with fewer than 60 closed-won deals should run static ICP for now and revisit archetypes once they cross the threshold. Most B2B SaaS teams hit 60 closed-won deals within 6-12 months of serious outbound.
  2. Pull the feature vector for each closed-won account. The seven fields that matter most: (a) signal-mix at time of first contact, (b) signal-mix recency-weighted, (c) decision-maker persona (role + seniority + function), (d) ICP rubric score at time of contact, (e) firmographic features (industry, employee band, geography), (f) technographic features (tools detected in stack), (g) which template + tone the SDR used. Time-of-contact matters — using current-state signals to score a deal that closed a year ago is anachronistic.
  3. Cluster. k-means with k=10 is a reasonable starting point. HDBSCAN is better at finding natural densities but requires more parameter tuning. Validate cluster quality with silhouette score (target ≥0.4); below that, the clustering is too noisy to be useful. Iterate on feature weighting until clusters separate cleanly.
  4. Human-label every cluster. Sit a sales-aware human in front of the clusters and have them write a name for each one. Names need to (a) be specific enough that an SDR could pull a relevant template, (b) match the operating reality of the recipient ("post-Series-C data migrator" not "high-fit account"), (c) be no more than 8 words. Clusters that resist naming are usually noise — drop them.
  5. Validate against held-out data. Set aside 20% of your closed-won dataset before clustering. After labeling, score the held-out accounts using the new archetypes; check whether actual reply rate matches predicted reply rate within ±5 points. If yes, the archetype model is real; if not, the clustering captured noise rather than signal.
  6. Wire into your brief / scoring pipeline. Every new account that hits ICP threshold should also get scored against the archetype library. Surface both numbers in the brief: ICP rubric score (strategic fit) AND archetype match + predicted reply rate (operational priority). SDRs use ICP to decide which accounts qualify; they use archetype score to decide which qualified accounts to actually work this week.
  7. Refresh and re-cluster on a calendar. Centroids drift as new replies accrue — refresh nightly. Full re-clustering monthly. Re-label quarterly. Watch for new emergent clusters (real archetypes that didn't exist 6 months ago) and decayed clusters (archetypes that no longer behave as they used to).
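The silhouette check in step 3 can be illustrated without any dependencies. This is a teaching sketch that uses plain Euclidean distance on toy 2-D points; in a real notebook a library routine such as scikit-learn's `silhouette_score` would be the usual choice, and the feature vectors would be the ones from step 2.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette over all points: -1 (badly mixed) to +1 (well separated)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widest = max(a, b)
        scores.append((b - a) / widest if widest > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data: two tight groups, scored with a clean and a scrambled labeling.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
clean = silhouette_score(points, [0, 0, 0, 1, 1, 1])
scrambled = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(f"clean={clean:.2f} usable={clean >= 0.4}")
print(f"scrambled={scrambled:.2f} usable={scrambled >= 0.4}")
```

The clean labeling clears the 0.4 target comfortably and the scrambled one does not. That is the distinction the threshold is meant to catch before anyone spends time naming clusters.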

The whole process can be hand-rolled in a Jupyter notebook for the first cycle (most data scientists can do steps 2-5 in a week). Productionizing the nightly refresh + clustering + labeling is the engineering investment — typically 4-8 weeks of work for a small data team. Or you can use a system built for this (Mama) and skip the build.
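For a first notebook pass, the hold-out check from step 5 might look like the sketch below. It assumes each held-out record carries an archetype assignment and a reply outcome (the `"archetype"` and `"replied"` keys are hypothetical) and that predicted rates are expressed in percent.

```python
import random

def split_holdout(accounts, holdout_fraction=0.2, seed=7):
    """Shuffle deterministically and set aside a hold-out slice before clustering."""
    shuffled = accounts[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * holdout_fraction))
    return shuffled[cut:], shuffled[:cut]  # (train, holdout)

def validate(holdout, predicted_rates, tolerance_pt=5.0):
    """Compare predicted and actual reply rates per archetype on held-out accounts.

    Returns {archetype: (predicted_pct, actual_pct, within_tolerance)}.
    """
    tallies = {}
    for account in holdout:
        hits, total = tallies.get(account["archetype"], (0, 0))
        tallies[account["archetype"]] = (hits + int(account["replied"]), total + 1)
    report = {}
    for name, (hits, total) in tallies.items():
        actual = 100.0 * hits / total
        predicted = predicted_rates[name]
        report[name] = (predicted, actual, abs(predicted - actual) <= tolerance_pt)
    return report

# Hypothetical dataset: 100 accounts in one archetype, 30% of which replied.
deals = [
    {"archetype": "post-series-c-data-migrator", "replied": i % 10 < 3}
    for i in range(100)
]
train, holdout = split_holdout(deals)
report = validate(holdout, {"post-series-c-data-migrator": 32.0})
print(len(train), len(holdout), report)
```

The split has to happen before clustering. If the held-out accounts help shape the centroids, the check no longer tests anything.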

07 Common mistakes

The pattern of failure with archetype matching is consistent. Six mistakes show up most often.

Mistake 1 · Running on too small a dataset
Below 60 closed-won deals, clusters are noise. The centroids shift dramatically every time a new deal is added because each new deal represents more than 1/60 (about 1.7%) of the dataset. Predicted reply rates will jump 5-10 points week-over-week for the same account, destroying SDR trust in the model. Wait until you cross the 60-deal threshold, or use static ICP plus a hand-curated archetype library based on industry knowledge.
Mistake 2 · Treating archetypes as static rules
"We identified our 7 archetypes last year, here they are." That's not archetype matching; that's a renamed ICP rubric. Archetypes are centroids that move: they need nightly refresh, monthly re-clustering, and quarterly re-labeling. Teams that stop refreshing see model decay within 90 days as the underlying market shifts.
Mistake 3 · Auto-naming clusters
An LLM can generate names like "cluster 7: high-signal-density mid-market tech buyers." Nobody trusts these names because they don't reveal the underlying operating reality. Names that work for SDRs are written by humans who looked at the cluster members and asked "what kind of buyer is this, in plain English?" The naming pass is what makes archetypes usable, not just mathematically valid.
Mistake 4 · Conflating ICP rubric and archetype matching
These are two different layers running on different data with different update cadences. ICP is strategic intent (which accounts qualify); archetype is operational learning (which qualified accounts to work first). Teams that try to replace ICP with archetypes lose the strategic anchor; teams that try to replace archetypes with ICP lose the compounding. Both run, side by side, with both numbers visible on the brief.
Mistake 5 · Ignoring confidence intervals
An archetype with 4 historical accounts in it gives a predicted reply rate that is statistically meaningless — but most UIs display the prediction as if it were trustworthy. Mature archetype scoring always surfaces the confidence interval: "this account matches archetype A-014 with predicted reply 32% (±4pt, based on 47 historical accounts)." Without the interval, SDRs over-trust low-confidence predictions and waste time.
Mistake 6 · Letting the archetype model recommend without human override
Archetype matching is a probabilistic system; it will sometimes be confidently wrong. Letting it auto-prioritize the SDR queue without a human spot-check creates a doom loop: weird predictions get acted on, the resulting reply data reinforces the weirdness, the model gets more confidently wrong. Always keep a "why did this account score this way?" view in the UI and let SDRs override with reason codes that feed back into the model.
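To make Mistake 5 concrete, here is a sketch of how sample size alone changes the uncertainty around a reply rate, using the standard Wilson score interval. This is not a description of how Mama computes its displayed interval: the text describes that as a function of both cluster tightness and sample size, so the widths printed here need not match the figures quoted above.

```python
import math

def wilson_interval(replies, accounts, z=1.96):
    """Wilson score interval for a reply rate; returns (low, high) as fractions.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if accounts == 0:
        return 0.0, 1.0  # no data: the rate could be anything
    p = replies / accounts
    denom = 1 + z * z / accounts
    center = (p + z * z / (2 * accounts)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / accounts + z * z / (4 * accounts ** 2)
    )
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical archetypes: one with 4 historical accounts, one with 47.
for replies, accounts in [(1, 4), (15, 47)]:
    low, high = wilson_interval(replies, accounts)
    print(f"{replies}/{accounts} replied: {100 * low:.0f}% to {100 * high:.0f}%")
```

The four-account archetype produces an interval so wide that its point estimate says almost nothing, which is why the sample size belongs next to the prediction in the UI.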

08 How Mama implements it, and why it's the moat

Archetype matching is implemented in Mama through a feature called Reply Loop. The mechanism stays close to the four-step model described above, with operational engineering on top.

The data layer

Mama ingests reply data from the user's sequencer (Outreach, Salesloft, Smartlead, Instantly, Lemlist) via webhook or scheduled pull. Each reply is classified by an LLM into one of four tiers: engaged, wrong-person, not-now, never. Only engaged + not-now (the meaningful positive band) feed the archetype-update pipeline. Wrong-person and never are filtered out as noise.
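The tier filter described above is easy to express in code. The sketch below models only the filtering rule; the enum, function name, and account IDs are hypothetical, and the LLM classification step is out of scope.

```python
from enum import Enum

class ReplyTier(Enum):
    ENGAGED = "engaged"
    WRONG_PERSON = "wrong-person"
    NOT_NOW = "not-now"
    NEVER = "never"

# Tiers that feed the archetype-update pipeline, per the text above.
POSITIVE_BAND = {ReplyTier.ENGAGED, ReplyTier.NOT_NOW}

def replies_for_archetype_update(classified_replies):
    """Keep only (account_id, tier) pairs whose tier is in the positive band."""
    return [(acct, tier) for acct, tier in classified_replies if tier in POSITIVE_BAND]

classified = [
    ("acct-001", ReplyTier.ENGAGED),
    ("acct-002", ReplyTier.WRONG_PERSON),
    ("acct-003", ReplyTier.NOT_NOW),
    ("acct-004", ReplyTier.NEVER),
]
kept = replies_for_archetype_update(classified)
print([acct for acct, _ in kept])
```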

The clustering layer

Closed-won and engaged-reply accounts get vectorized using a 24-dimension feature space (signal-mix × 8 × recency-weight, persona × 4, ICP × 4, firmographics × 4, technographics × 4). HDBSCAN runs nightly to identify clusters; clusters with ≥7 members and silhouette ≥0.45 become archetypes. Smaller / noisier clusters are held as "emerging" and surfaced for human review.
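The promotion rule can be written as a small gate. The dimension counts and both thresholds come from the paragraph above; the group names, the `Cluster` type, and the assumption that silhouette is tracked per cluster are illustrative.

```python
from dataclasses import dataclass

# Dimension counts from the text; the grouping names are illustrative.
FEATURE_GROUPS = {
    "signal_mix": 8,       # 8 signal types, each recency-weighted
    "persona": 4,
    "icp": 4,
    "firmographics": 4,
    "technographics": 4,
}
assert sum(FEATURE_GROUPS.values()) == 24

MIN_MEMBERS = 7
MIN_SILHOUETTE = 0.45

@dataclass
class Cluster:
    cluster_id: str
    members: int
    silhouette: float

def classify_cluster(cluster):
    """Return "archetype" if the cluster clears both thresholds, else "emerging"."""
    if cluster.members >= MIN_MEMBERS and cluster.silhouette >= MIN_SILHOUETTE:
        return "archetype"
    return "emerging"  # held back and surfaced for human review

for c in [Cluster("c1", 12, 0.61), Cluster("c2", 5, 0.70), Cluster("c3", 20, 0.38)]:
    print(c.cluster_id, classify_cluster(c))
```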

The labeling layer

New emergent clusters get an LLM-generated name proposal but require human acceptance before they become live archetypes that SDRs see. This is the manual workflow step — the human review gate. In practice, customers spend ~30 minutes per quarter accepting/renaming new archetypes; the rest of the model runs automatically.

The scoring layer

At brief-generation time, the candidate account's feature vector is computed and matched against the live archetype centroids. The closest archetype's predicted reply rate (with confidence interval) appears in the brief. SDRs see both the ICP rubric score and the archetype prediction; they typically use ICP to decide whether to work the account at all and archetype to decide priority within their working list.

Why this is structurally hard to copy

The defensibility lives in the data combinatorics. To build archetype matching, you need both signal-level data (what was true about the account at time of contact) AND reply outcome data (whether the contact replied positively). Almost no tool sees both:

  • Sequencers (Outreach, Salesloft, Smartlead) see reply outcomes but not signal-mix. They know which emails got replies; they don't know whether the account had a funding round closing at the time. They can build "winning template" patterns but not archetypes.
  • Data platforms (ZoomInfo, Apollo, Clay) see signal-mix but not reply outcomes. They know which accounts had funding rounds last week; they don't know whether anyone replied to outreach about it. They can build firmographic-fit scores but not behavioral archetypes.
  • Intent platforms (Bombora, 6sense) see directional buying signals but not actual sales outcomes. They predict who might be interested; they don't know who actually replied to your team's sequence.

Mama sees both layers because Mama runs the brief AND ingests the reply data. Every customer-month that runs through the system makes Mama's archetypes sharper. The flywheel compounds and the data accumulates. This is the structural moat that justifies the entire product line — and why competitors with sequencer-only or data-only datasets cannot replicate the offering even if they copy the UI exactly.

The investor pitch — one paragraph
Sequencer-only tools (Outreach, Salesloft, Smartlead) can't build archetype matching because they don't see signal-mix data. Data-only tools (ZoomInfo, Apollo, Clay) can't build it because they don't see reply-outcome data. Intent platforms (Bombora, 6sense) can't build it because they don't see actual sales-engagement data. Mama owns both ends of the loop — the signal-detection layer that gives accounts their feature vectors, and the Reply Loop ingestion that captures the outcomes. Every customer-month the loop compounds. That's the moat.

09 FAQ

How is this different from Facebook-style lookalike modeling?

Same mathematical foundation, different feature space. Facebook lookalikes use consumer demographics + interest signals + ad-engagement data. B2B archetype matching uses firmographics + technographics + buying signals + reply behavior. The math is borrowed; the operating context is novel for B2B outbound. Most B2B sales tools haven't run this play because they don't have the reply-outcome data needed to calibrate the clusters against real outcomes.

What if my closed-won list has unusual segments — should I cluster them separately?

Yes, but only above the 60-deal threshold per segment. If your business has two cleanly-separated motions (e.g. mid-market self-serve and enterprise field-sold), run two separate archetype models, one per motion. The patterns that drive reply rate in each motion are different enough that mixing them produces blurry clusters.

How is archetype matching different from intent data?

Intent data is one input feature among many in archetype matching. Intent platforms tell you "this account is researching the category." Archetype matching asks "given that intent signal plus everything else we know about this account, how does this combination map to the historical accounts that replied to us?" Intent is a noisy single dimension; archetypes are multi-dimensional patterns calibrated against your team's actual reply behavior.

Won't archetypes overfit to historical patterns and miss new opportunities?

This is a real risk and the most thoughtful objection to the approach. Three mitigations: (1) the nightly refresh, monthly re-clustering, and quarterly re-labeling catch new patterns within ~90 days, (2) the "emerging cluster" view surfaces accounts that don't match any existing archetype, and these are the new-opportunity flags, (3) we always score accounts against the full archetype library, not just the top-1 match, so unusual combinations get visible scores even when no archetype is a great fit.

Can I trust archetypes that shift quarterly? Doesn't stability matter?

Centroids shift incrementally — typically 2-5% movement per refresh cycle. Archetype identity stays stable (the "Post-Series-C data migrator" archetype doesn't disappear; its centroid just shifts slightly as new replies come in). Major shifts (>15% centroid movement, or a cluster's silhouette dropping below 0.4) trigger a human-review alert. SDRs see consistent archetype names and predictions; the underlying math evolves smoothly.
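The review trigger can be sketched as a threshold check. The text does not define how "centroid movement" is measured, so the sketch assumes it means distance moved relative to the old centroid's length. The function names are hypothetical; the 15% and 0.4 thresholds are the ones quoted above.

```python
import math

def centroid_shift(old, new):
    """Relative movement: distance moved divided by the old centroid's length."""
    moved = math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))
    base = math.sqrt(sum(o * o for o in old))
    if base == 0:
        return 0.0 if moved == 0 else float("inf")
    return moved / base

def needs_human_review(old, new, silhouette, max_shift=0.15, min_silhouette=0.4):
    """Flag an archetype when it moved too far or its cluster became too noisy."""
    return centroid_shift(old, new) > max_shift or silhouette < min_silhouette

old = [1.0, 0.0]
print(needs_human_review(old, [1.03, 0.0], silhouette=0.50))  # small drift
print(needs_human_review(old, [1.20, 0.0], silhouette=0.50))  # large drift
print(needs_human_review(old, [1.03, 0.0], silhouette=0.35))  # noisy cluster
```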

How long until the model is useful?

60 closed-won deals is the minimum threshold to start. The first archetypes are usable but predictions have ±5pt error. After 6 months of accumulated data, predictions tighten to ±3pt. After 12 months of data with continuous refresh, ±1.5pt. The model is useful at month 3 and is exceptional at month 12.
