Archetype matching.
The discipline that turns historical reply data into sharper future scoring. Cluster the accounts that converted before by signal-mix + persona + ICP-fit. Label the clusters. Score every new account by similarity. Refine nightly. The result: reply-rate predictions whose error band tightens from ±8 points (static ICP) to ±3 points (mature archetypes). It is the structural moat that competitors with sequencer-only or data-only datasets cannot build.
01 What it actually is
Most outbound tools score target accounts on what they look like. Industry, employee count, funding stage, geography, tools in stack. This is the standard ICP scoring approach — pioneered by Salesforce Einstein, refined by 6sense and Demandbase, ubiquitous by 2020. It works, sort of, and it's where every B2B sales org starts.
Archetype matching scores target accounts on what they behave like. Specifically: how similar this new account is to clusters of historical accounts that actually replied to your team's outbound.
The precise definition:
Archetype matching is the practice of clustering historical closed-won (or replied-to) deals by signal-mix + persona + ICP-fit + tone, naming each cluster as an "archetype," and scoring new accounts by similarity to the closest matching archetype to predict reply probability.
Four words in that definition do the load-bearing work:
- Clustering. Unsupervised — let the data tell you what archetypes exist rather than imposing top-down categories. Most sales orgs that try this fail by hand-naming 5 archetypes that match their slide deck instead of finding the 7-12 archetypes that actually exist in their data.
- Signal-mix. Not just one signal type. The archetype isn't "funding round" — it's "funding round + tech change in last 60 days + exec hire in last 90 days + 12+ open roles in data function." The combination is the cluster.
- Persona. Within the same firmographic shape, replies cluster differently by decision-maker role. A VP Data at a post-Series-C SaaS is a different archetype than a CTO at the same company.
- Similarity. New accounts get a similarity score (cosine distance, weighted Euclidean, or learned embedding distance) to each archetype centroid. The closest archetype's historical reply rate becomes the new account's predicted reply rate.
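As a rough illustration of the similarity step, the sketch below scores one account against a tiny archetype library using cosine similarity. The archetype names, the four-dimensional vectors, and the reply rates are hypothetical placeholders, not real customer data, and `score_account` is an illustrative helper name rather than any product API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical archetype library: centroid vector plus historical reply rate.
ARCHETYPES = {
    "post-series-c-data-migrator": {"centroid": [1.0, 0.8, 0.0, 0.6], "reply_rate": 0.28},
    "pre-ipo-fintech-cto": {"centroid": [0.0, 0.2, 1.0, 0.1], "reply_rate": 0.19},
}

def score_account(features, archetypes=ARCHETYPES):
    """Return (archetype name, similarity, predicted reply rate) for the closest centroid."""
    name, best = max(
        archetypes.items(),
        key=lambda item: cosine_similarity(features, item[1]["centroid"]),
    )
    return name, cosine_similarity(features, best["centroid"]), best["reply_rate"]

name, similarity, predicted = score_account([0.9, 0.7, 0.1, 0.5])
print(name, round(similarity, 3), predicted)
```

The new account inherits the reply rate of whichever centroid it points most closely toward. Swapping in weighted Euclidean or a learned embedding distance would only change the `cosine_similarity` function.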
The mechanism is borrowed from collaborative filtering (Netflix recommendations, Amazon "customers who bought") and lookalike modeling (Facebook ads). The novelty in the B2B sales context isn't the math — it's the data layer: only a system that sees both signal-mix AND reply outcomes can build archetypes that predict reply rates. Most tools see one side or the other.
02 Why static ICP scoring isn't enough
Static ICP rubrics are the right starting point for any outbound team: they encode strategic intent ("we sell to B2B SaaS, 200-2000 employees, post-Series B, US/EU HQ, modern data stack") and they're easy to operate. But they hit a ceiling that becomes visible around the 6-month mark of running a serious outbound motion.
The specific failure modes of static ICP
Three patterns show up consistently in teams that rely on static ICP scoring beyond their first 6 months:
1. Two firmographically identical accounts behave differently. One is a Series C SaaS at 400 employees in NY with a VP Data hired 60 days ago, and it replies at 28% to your funding-anchored outbound. The other is a Series C SaaS at 380 employees in Boston with a VP Data hired 60 days ago, and it replies at 6%. Same rubric score, wildly different actual behavior. Static ICP has no way to learn the difference.
2. The rubric over-weights features that don't matter and under-weights features that do. Most teams discover after 12 months that one of their rubric dimensions (often "intent score" or "geography") had ~zero predictive power for reply rate, while a dimension they ignored ("recent VP-level hire + tech-stack change in same quarter") was the single best predictor. Archetypes surface this automatically; static ICPs require humans to notice.
3. The world changes faster than the rubric. A rubric written in Q1 2025 doesn't know about the impact of Apple's Mail Privacy Protection (MPP) on email engagement, or the shift in how VPs of Data evaluate vendors post-Snowflake-cost-explosion, or any of the dozen market changes that have happened in the last 12 months. Archetype matching catches these implicitly because new reply data reshapes the centroids.
03 How archetypes work (the 4 steps)
The mechanism, end to end, runs in four steps. Each is conceptually simple; the engineering complexity lives in scaling them and keeping them fresh.
1. Cluster. Take every account in your CRM that has either (a) closed-won or (b) replied positively to outbound in the last 12 months. For each, extract a feature vector: signal-mix (which 8 signal types were active at the time of outreach, weighted by recency), persona (decision-maker role + seniority + function), ICP-fit score (the static rubric output), tone (which template variant got the reply), industry, geography, employee band. Run k-means or HDBSCAN on the feature vectors. You'll typically get 7-15 clusters with reasonable density.
2. Label. A human looks at each cluster and writes a name. "Post-Series-C data-stack-migrating SaaS." "Late-stage healthcare-vertical RevOps-buyer." "Pre-IPO fintech CTO." The name has to be human-interpretable, both for the people who'll use it (SDRs reading "this account matches archetype Foo with 19% predicted reply") and for the editorial review of the clustering itself ("does cluster 4 actually make sense as a buyer type?"). Auto-naming via LLM is possible but produces names like "cluster_7_high_signal_density" which nobody trusts.
3. Score. For each new account that enters the brief queue, compute the feature vector (same fields used in step 1). Compute cosine similarity to each archetype centroid. Take the closest match. The account inherits that archetype's historical reply rate as its predicted reply rate. Confidence interval is a function of (a) the cluster's tightness in feature space and (b) the cluster's sample size.
4. Refine. Every night, re-pull the last N days of replies. Update archetype centroids (each new reply pulls the centroid slightly toward the new account's feature vector). Re-cluster monthly. Re-label quarterly (humans review whether the named archetypes still match the underlying clusters). New archetypes can emerge; old ones can fade out.
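The nightly refine step ("each new reply pulls the centroid slightly toward the new account's feature vector") can be sketched as a simple incremental update. This is one plausible reading, not a documented implementation: the `learning_rate` parameter, its 0.05 default, and the function names are assumptions made for illustration.

```python
def update_centroid(centroid, new_vector, learning_rate=0.05):
    """Move the centroid a small fraction of the way toward a newly replied account."""
    if len(centroid) != len(new_vector):
        raise ValueError("centroid and new_vector must have the same length")
    return [c + learning_rate * (x - c) for c, x in zip(centroid, new_vector)]

def nightly_refresh(centroids, replies, learning_rate=0.05):
    """Apply one update per (archetype_name, feature_vector) reply from the last N days.

    Returns a new dict; the input centroids are left untouched so a failed
    job cannot leave the live library half-updated.
    """
    refreshed = {name: list(vec) for name, vec in centroids.items()}
    for archetype_name, vector in replies:
        refreshed[archetype_name] = update_centroid(
            refreshed[archetype_name], vector, learning_rate
        )
    return refreshed

# Hypothetical two-dimensional example.
centroids = {"post-series-c-data-migrator": [1.0, 0.0]}
replies = [("post-series-c-data-migrator", [0.0, 1.0])]
print(nightly_refresh(centroids, replies, learning_rate=0.1))
```

A small learning rate keeps archetype identity stable between the monthly re-clustering runs, while still letting sustained shifts in reply behavior move the centroid over time.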
The hard part isn't any individual step. The hard part is keeping all four in continuous operation as the underlying data drifts. Most engineering organizations underestimate the maintenance load: nightly refresh jobs that fail silently, label drift as clusters shift but humans don't re-review, schema changes upstream that break feature extraction. Done right, archetype matching is a system that gets sharper with usage. Done poorly, it's a model that quietly decays.
04 A sample archetype gallery
The abstract definition is harder to internalize than concrete examples. Here are four archetypes Mama has surfaced from real data inside actual customer engagements (anonymized). Each represents a tight cluster of historically high-replying accounts; new accounts that match an archetype's feature signature inherit its reply rate as the prediction.
Three things to notice about this gallery. First, the names are specific enough that an SDR reading the archetype label knows what to do: "Post-Series-C data-stack migrators" tells you exactly which template to pull and what tone to use. Second, the predicted reply rates vary widely (19-32%) even though all four archetypes are inside the same customer's ICP — confirming that within-ICP variation is real and operationally important. Third, the signal-mix descriptions reveal which combinations of signals actually correlate with replies — information that static ICP rubrics cannot capture.
A mature Mama customer's archetype library typically has 10-15 archetypes, refreshed monthly, with one to two new archetypes emerging per quarter as the customer's outbound motion evolves.
05 The accuracy evidence
The strongest case for archetype matching is the prediction-accuracy gap. Across the Mama customer cohort, the same accounts were scored with both methods, and the predicted reply rate was compared to the actual reply rate over the following 30 days. The error band is what matters.
The implication for operations is large. A prediction band of ±8 points means a "predicted 20% reply" account could actually reply at 12% or 28% — operationally, you can't tell whether to push hard or hold. A prediction band of ±3 points means the same predicted-20% account will reply between 17% and 23% — you can confidently commit SDR time, AE follow-up, even comp-plan acceleration. The forecasting becomes actionable.
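The arithmetic behind the two bands can be made concrete with a short sketch. It assumes the error band is measured as the mean absolute difference between predicted and observed reply rates, in percentage points; the text does not specify the exact metric, so treat that choice and both helper names as assumptions.

```python
def error_band(predicted, actual):
    """Mean absolute error between predicted and observed reply rates, in points."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def plausible_range(predicted_rate, band):
    """Range of reply rates consistent with a prediction and its error band."""
    return (max(0.0, predicted_rate - band), min(100.0, predicted_rate + band))

# The same predicted-20% account under each band.
print(plausible_range(20.0, 8.0))  # (12.0, 28.0)
print(plausible_range(20.0, 3.0))  # (17.0, 23.0)
```

Running `error_band` on each scoring method's predictions against the replies observed over the following 30 days gives one number per method, which is what the comparison above rests on.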
The 6-12 month maturation curve is also worth noting. Archetype matching doesn't help on day one — it requires a closed-won baseline. Teams adopting this discipline should expect 60-90 days of "build the dataset" before scoring sharpens beyond the static-ICP baseline. After that, the compounding kicks in: month 4 is better than month 3, month 8 is dramatically better than month 4.
06 How to extract archetypes from closed-won
If you want to operationalize archetype matching — either inside Mama or building your own — here's the practical sequence. This is the playbook Mama runs on each customer's closed-won dataset during onboarding.
1. Confirm dataset minimum: 60+ closed-won deals. Below this, clustering produces unstable centroids that shift every time a new deal is added. Teams with fewer than 60 closed-won deals should run static ICP for now and revisit archetypes once they cross the threshold. Most B2B SaaS teams hit 60 closed-won deals within 6-12 months of serious outbound.
2. Pull the feature vector for each closed-won account. The seven fields that matter most: (a) signal-mix at time of first contact, (b) signal-mix recency-weighted, (c) decision-maker persona (role + seniority + function), (d) ICP rubric score at time of contact, (e) firmographic features (industry, employee band, geography), (f) technographic features (tools detected in stack), (g) which template + tone the SDR used. Time of contact matters: using current-state signals to score a deal that closed a year ago is anachronistic.
3. Cluster. k-means with k=10 is a reasonable starting point. HDBSCAN is better at finding natural densities but requires more parameter tuning. Validate cluster quality with silhouette score (target ≥0.4); below that, the clustering is too noisy to be useful. Iterate on feature weighting until clusters separate cleanly.
4. Human-label every cluster. Sit a sales-aware human in front of the clusters and have them write a name for each one. Names need to (a) be specific enough that an SDR could pull a relevant template, (b) match the operating reality of the recipient ("post-Series-C data migrator" not "high-fit account"), (c) be no more than 8 words. Clusters that resist naming are usually noise; drop them.
5. Validate against held-out data. Set aside 20% of your closed-won dataset before clustering. After labeling, score the held-out accounts using the new archetypes; check whether actual reply rate matches predicted reply rate within ±5 points. If yes, the archetype model is real; if not, the clustering captured noise rather than signal.
6. Wire into your brief / scoring pipeline. Every new account that hits ICP threshold should also get scored against the archetype library. Surface both numbers in the brief: ICP rubric score (strategic fit) AND archetype match + predicted reply rate (operational priority). SDRs use ICP to decide which accounts qualify; they use archetype score to decide which qualified accounts to actually work this week.
7. Refresh and re-cluster on a calendar. Centroids drift as new replies accrue, so refresh nightly. Run a full re-clustering monthly. Re-label quarterly. Watch for new emergent clusters (real archetypes that didn't exist 6 months ago) and decayed clusters (archetypes that no longer behave as they used to).
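For the cluster-quality check in the playbook above, here is a minimal pure-Python sketch of the silhouette score with the 0.4 target as a gate. It assumes Euclidean distance and takes cluster labels as given (from k-means, HDBSCAN, or anything else); `clustering_is_usable` is a hypothetical helper name.

```python
import math
from collections import defaultdict

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def clustering_is_usable(points, labels, threshold=0.4):
    """Apply the playbook's quality gate (target silhouette >= 0.4)."""
    return silhouette_score(points, labels) >= threshold

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(clustering_is_usable(points, [0, 0, 1, 1]))  # well-separated labeling
print(clustering_is_usable(points, [0, 1, 0, 1]))  # scrambled labeling
```

In a notebook you would more likely call scikit-learn's `sklearn.metrics.silhouette_score`. The hand-rolled version is shown only to make the cohesion-versus-separation logic visible.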
The whole process can be hand-rolled in a Jupyter notebook for the first cycle (most data scientists can do steps 2-5 in a week). Productionizing the nightly refresh + clustering + labeling is the engineering investment — typically 4-8 weeks of work for a small data team. Or you can use a system built for this (Mama) and skip the build.
07 Common mistakes
The pattern of failure with archetype matching is consistent. Six mistakes show up most often.
08 How Mama implements it, and why it's the moat
Archetype matching is implemented in Mama through a feature called Reply Loop. The mechanism stays close to the four-step model described above, with operational engineering on top.
The data layer
Mama ingests reply data from the user's sequencer (Outreach, Salesloft, Smartlead, Instantly, Lemlist) via webhook or scheduled pull. Each reply is classified by an LLM into one of four tiers: engaged, wrong-person, not-now, never. Only engaged + not-now (the meaningful positive band) feed the archetype-update pipeline. Wrong-person and never are filtered out as noise.
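The tier filter described above is easy to picture in code. The sketch below assumes each reply has already been classified and arrives as a dict with a `tier` field; that record shape and the function name are hypothetical, and the LLM classification step itself is not shown.

```python
from enum import Enum

class ReplyTier(Enum):
    ENGAGED = "engaged"
    WRONG_PERSON = "wrong-person"
    NOT_NOW = "not-now"
    NEVER = "never"

# Only the meaningful positive band feeds the archetype-update pipeline.
POSITIVE_BAND = {ReplyTier.ENGAGED, ReplyTier.NOT_NOW}

def replies_for_archetype_update(classified_replies):
    """Keep replies whose tier is engaged or not-now; drop wrong-person and never."""
    return [r for r in classified_replies if ReplyTier(r["tier"]) in POSITIVE_BAND]

replies = [
    {"account": "acme", "tier": "engaged"},
    {"account": "globex", "tier": "never"},
    {"account": "initech", "tier": "not-now"},
    {"account": "umbrella", "tier": "wrong-person"},
]
print([r["account"] for r in replies_for_archetype_update(replies)])
```

Using an enum means an unexpected tier string raises a `ValueError` instead of being silently dropped, which helps surface classifier or schema changes early.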
The clustering layer
Closed-won and engaged-reply accounts get vectorized using a 24-dimension feature space (signal-mix × 8 × recency-weight, persona × 4, ICP × 4, firmographics × 4, technographics × 4). HDBSCAN runs nightly to identify clusters; clusters with ≥7 members and silhouette ≥0.45 become archetypes. Smaller / noisier clusters are held as "emerging" and surfaced for human review.
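To make the 24-dimension layout and the promotion rule concrete, here is a minimal sketch. The block sizes (8 + 4 + 4 + 4 + 4) and the thresholds (at least 7 members, silhouette of at least 0.45) come from the description above. The exponential recency decay, its 45-day half-life, and all function names are assumptions for illustration only.

```python
def recency_weight(days_since_signal, half_life_days=45.0):
    """Assumed decay: a signal loses half its weight every half_life_days."""
    return 0.5 ** (days_since_signal / half_life_days)

def build_feature_vector(signal_ages, persona, icp, firmographics, technographics):
    """Assemble the 24-dimension vector.

    signal_ages: 8 entries, days since each signal type fired, or None if inactive.
    persona, icp, firmographics, technographics: 4 numeric features each.
    """
    if len(signal_ages) != 8:
        raise ValueError("expected 8 signal types")
    blocks = (persona, icp, firmographics, technographics)
    if any(len(block) != 4 for block in blocks):
        raise ValueError("expected 4 features per non-signal block")
    signals = [0.0 if age is None else recency_weight(age) for age in signal_ages]
    return signals + [value for block in blocks for value in block]

def is_live_archetype(member_count, silhouette, min_members=7, min_silhouette=0.45):
    """Clusters that fail either test stay in the 'emerging' queue for human review."""
    return member_count >= min_members and silhouette >= min_silhouette

vector = build_feature_vector(
    signal_ages=[0, 45, None, 90, None, None, 10, None],
    persona=[1, 0, 0, 1],
    icp=[0.8, 0.6, 0.9, 0.7],
    firmographics=[1, 0, 1, 0],
    technographics=[0, 1, 1, 0],
)
print(len(vector), vector[:4])
```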
The labeling layer
New emergent clusters get an LLM-generated name proposal but require human acceptance before they become live archetypes that SDRs see. This is the manual workflow step — the human review gate. In practice, customers spend ~30 minutes per quarter accepting/renaming new archetypes; the rest of the model runs automatically.
The scoring layer
At brief-generation time, the candidate account's feature vector is computed and matched against the live archetype centroids. The closest archetype's predicted reply rate (with confidence interval) appears in the brief. SDRs see both the ICP rubric score and the archetype prediction; they typically use ICP to decide whether to work the account at all and archetype to decide priority within their working list.
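The text does not say how the confidence interval on a predicted reply rate is calculated. One common choice for a rate estimated from a limited sample is the Wilson score interval, sketched below purely as an assumption. It captures only the sample-size effect, not the cluster's tightness in feature space.

```python
import math

def wilson_interval(replies, sends, z=1.96):
    """Wilson score interval for a reply rate (z=1.96 is roughly 95% confidence)."""
    if sends <= 0:
        raise ValueError("sends must be positive")
    p = replies / sends
    denom = 1 + z * z / sends
    center = (p + z * z / (2 * sends)) / denom
    half_width = z * math.sqrt(p * (1 - p) / sends + z * z / (4 * sends * sends)) / denom
    return (max(0.0, center - half_width), min(1.0, center + half_width))

# Same 20% observed reply rate, different archetype sample sizes (hypothetical).
small = wilson_interval(replies=6, sends=30)
large = wilson_interval(replies=60, sends=300)
print([round(x, 3) for x in small])
print([round(x, 3) for x in large])
```

An archetype with ten times the history produces a visibly narrower interval around the same rate, which is the intuition behind showing the interval next to the prediction in the brief.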
Why this is structurally hard to copy
The defensibility lives in the data combinatorics. To build archetype matching, you need both signal-level data (what was true about the account at time of contact) AND reply outcome data (whether the contact replied positively). Almost no tool sees both:
- Sequencers (Outreach, Salesloft, Smartlead) see reply outcomes but not signal-mix. They know which emails got replies; they don't know whether the account had a funding round closing at the time. They can build "winning template" patterns but not archetypes.
- Data platforms (ZoomInfo, Apollo, Clay) see signal-mix but not reply outcomes. They know which accounts had funding rounds last week; they don't know whether anyone replied to outreach about it. They can build firmographic-fit scores but not behavioral archetypes.
- Intent platforms (Bombora, 6sense) see directional buying signals but not actual sales outcomes. They predict who might be interested; they don't know who actually replied to your team's sequence.
Mama sees both layers because Mama runs the brief AND ingests the reply data. Every customer-month that runs through the system makes Mama's archetypes sharper. The flywheel compounds and the data accumulates. This is the structural moat that justifies the entire product line — and why competitors with sequencer-only or data-only datasets cannot replicate the offering even if they copy the UI exactly.
09 FAQ
How is this different from Facebook-style lookalike modeling?
Same mathematical foundation, different feature space. Facebook lookalikes use consumer demographics + interest signals + ad-engagement data. B2B archetype matching uses firmographics + technographics + buying signals + reply behavior. The math is borrowed; the operating context is novel for B2B outbound. Most B2B sales tools haven't run this play because they don't have the reply-outcome data needed to build the supervised signal.
What if my closed-won list has unusual segments — should I cluster them separately?
Yes, but only above the 60-deal threshold per segment. If your business has two cleanly separated motions (e.g. mid-market self-serve and enterprise field-sold), run two separate archetype models, one per motion. The patterns that drive reply rate in each motion are different enough that mixing them produces blurry clusters.
How is archetype matching different from intent data?
Intent data is one input feature among many in archetype matching. Intent platforms tell you "this account is researching the category." Archetype matching asks "given that intent signal plus everything else we know about this account, how does this combination map to the historical accounts that replied to us?" Intent is a noisy single dimension; archetypes are multi-dimensional patterns calibrated against your team's actual reply behavior.
Won't archetypes overfit to historical patterns and miss new opportunities?
This is a real risk and the most thoughtful objection to the approach. Three mitigations: (1) the nightly refresh + monthly re-clustering + quarterly re-labeling cycle catches new patterns within ~90 days, (2) the "emerging cluster" view surfaces accounts that don't match any existing archetype, and these are the new-opportunity flags, (3) we always score accounts against the full archetype library, not just the top-1 match, so unusual combinations get visible scores even when no archetype is a great fit.
Can I trust archetypes that shift quarterly? Doesn't stability matter?
Centroids shift incrementally — typically 2-5% movement per refresh cycle. Archetype identity stays stable (the "Post-Series-C data migrator" archetype doesn't disappear; its centroid just shifts slightly as new replies come in). Major shifts (>15% centroid movement, or a cluster's silhouette dropping below 0.4) trigger a human-review alert. SDRs see consistent archetype names and predictions; the underlying math evolves smoothly.
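The alert rule can be sketched as follows. The text does not define how "centroid movement" is measured, so this example assumes it is the Euclidean distance moved divided by the old centroid's norm; the function names are hypothetical, and the 15% and 0.4 thresholds are the ones quoted above.

```python
import math

def centroid_movement(old, new):
    """Relative shift: distance moved divided by the old centroid's norm (assumed metric)."""
    norm_old = math.sqrt(sum(x * x for x in old))
    if norm_old == 0:
        raise ValueError("old centroid has zero norm")
    return math.dist(old, new) / norm_old

def needs_human_review(old, new, silhouette, max_movement=0.15, min_silhouette=0.4):
    """Flag an archetype when its centroid jumps or its cluster quality degrades."""
    return centroid_movement(old, new) > max_movement or silhouette < min_silhouette

print(needs_human_review([1.0, 0.0], [0.97, 0.0], silhouette=0.5))  # routine drift
print(needs_human_review([1.0, 0.0], [0.8, 0.0], silhouette=0.5))   # large jump
print(needs_human_review([1.0, 0.0], [0.97, 0.0], silhouette=0.35)) # noisy cluster
```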
How long until the model is useful?
60 closed-won deals is the minimum threshold to start. The first archetypes are usable but predictions have ±5pt error. After 6 months of accumulated data, predictions tighten to ±3pt. After 12 months of data with continuous refresh, ±1.5pt. The model is useful at month 3 and is exceptional at month 12.
Static ICP gets you started. Archetype matching compounds.
Sign-up takes 4 minutes. Connect your CRM, give Mama 60+ closed-won deals to learn from, and watch the archetype library emerge over the first 90 days. ICP rubric included. Reply Loop included on Pro+. No card required for trial.