Reply rate.
The most-gamed metric in B2B outbound. Almost every public reply-rate number you've seen is inflated: by counting auto-responders, by hiding bounces, by ignoring "please stop" replies, by quietly excluding the bad sequences. This page covers the honest definition, what most sequencers actually count, real benchmarks by signal type, and the discipline that turns a vanity number into a useful one.
Plain English.
If 100 cold emails go out and 12 things land in your inbox in response, your sequencer probably shows "12% reply rate." But if 6 of those were out-of-office auto-responders, 3 were "unsubscribe me" requests, and 1 was a hard bounce that got mis-classified, your actual reply rate is 2%. The 12% is what looks good on a screenshot. The 2% is what predicts pipeline.
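The arithmetic behind that example can be sketched in a few lines. The counts come straight from the example; the variable names are illustrative and do not correspond to any sequencer's API.

```python
# Counts from the 100-email example; names are illustrative only.
sent = 100
inbox_events = 12           # everything the sequencer logged as a "reply"
out_of_office = 6           # auto-responders
unsubscribes = 3            # "unsubscribe me" requests
misclassified_bounces = 1   # hard bounce that was counted as a reply

# What the dashboard shows: every inbox event over emails sent.
dashboard_rate = inbox_events / sent

# What is left once the non-replies are removed from the numerator.
genuine_replies = inbox_events - out_of_office - unsubscribes - misclassified_bounces
honest_rate = genuine_replies / sent

print(f"dashboard: {dashboard_rate:.0%}, honest: {honest_rate:.0%}")
```

The denominator is the same in both cases; only the numerator changes.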
This matters because reply rate is the most-quoted metric in outbound, the basis for compensation conversations, and the number most teams optimize against. Optimizing against the wrong number for a quarter is how outbound desks end up with a healthy dashboard and no pipeline.
What counts vs what tools count.
The gap between the honest reply rate and the dashboard reply rate isn't a software bug — it's a UX choice the sequencer vendors made because higher numbers look better in product screenshots. Side-by-side:
The honest reply-rate number for a sequence is roughly: (genuine human replies + "wrong person" referrals) ÷ sent emails. Most sequencer dashboards show: (any inbox event, including OOOs, NDRs (non-delivery reports, i.e. bounce notices), and unsubs) ÷ sent emails. The two are not the same number. They can diverge by 3–5×, especially during summer, when OOOs spike.
Five ways the number gets gamed.
The most common patterns in the wild — observed across the outbound corpora we have audit access to. Some are tooling defaults; some are deliberate.
- Auto-responder inflation. The largest single contributor. Every OOO email counts as a "reply" in most sequencer UIs. During Q3 (summer holidays) and December, OOO volume can be 2–4× higher than baseline, inflating reported reply rates by ~4–8 percentage points for the same campaign.
- Cherry-picked sequence selection. The "we get 18% reply rates" claim usually means "our best-performing sequence in our best month had 18% replies." Asking for the trailing-12-month median across all sequences typically returns half that number. Public benchmarks are nearly always cherry-picked.
- Step-1-only reporting. Reply rates degrade sharply by step (step 1: 4–8%, step 2: 1–2%, step 3: <1%). Quoting the step-1 reply rate as the "campaign reply rate" inflates the number by ~2× relative to replies counted per email sent across all steps. Honest reporting uses campaign-level (all-steps) reply rate.
- Bounces excluded from the denominator. Reply rate is "replies ÷ sent." Some teams quietly switch the denominator from sent to delivered, which makes the rate look better when delivery is low. Both are valid choices, but changing the denominator without a note is a tell.
- Treating "unsubscribe please" as engagement. An "unsub me" reply is a real signal — but it's a negative one. Counting it as engagement is technically correct (the human did engage) but misleading when used to justify campaign performance. Most public reply-rate claims include unsub replies in the numerator.
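Two of these patterns, cherry-picked selection and the silent denominator switch, are easy to see in a small sketch. All figures below are hypothetical and chosen only to illustrate the mechanics; the sequence names and the `reply_rate` helper are assumptions, not data or functions from any real tool.

```python
from statistics import median

# Hypothetical monthly reply rates (as fractions) for three sequences.
monthly_rates = {
    "seq_a": [0.18, 0.09, 0.07, 0.10],
    "seq_b": [0.06, 0.05, 0.08, 0.04],
    "seq_c": [0.03, 0.02, 0.05, 0.04],
}

all_rates = [r for rates in monthly_rates.values() for r in rates]
best_month = max(all_rates)      # the number that ends up in marketing copy
typical = median(all_rates)      # median across every sequence and month


def reply_rate(replies: int, sent: int, bounced: int = 0,
               *, per_delivered: bool = False) -> float:
    """Reply rate over sent emails, or over delivered emails if requested."""
    denominator = sent - bounced if per_delivered else sent
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return replies / denominator


# Same campaign, two denominators (hypothetical: 20 replies, 1,000 sent, 200 bounced).
over_sent = reply_rate(20, 1000, 200)
over_delivered = reply_rate(20, 1000, 200, per_delivered=True)

print(f"best month {best_month:.1%} vs median {typical:.1%}")
print(f"over sent {over_sent:.1%} vs over delivered {over_delivered:.1%}")
```

Neither number is wrong on its own terms. The problem is quoting the larger one without saying which selection or denominator produced it.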
Honest benchmarks by signal anchor.
What "good" actually looks like, measured honestly (OOOs excluded, all-steps, trailing-12-month median across our own corpus and four customer corpora). Variance comes from ICP fit, send timing, and how strong the signal anchor is.
The numbers most outbound vendors quote ("our customers see 25%+ reply rates!") are nearly always cherry-picked best-month, step-1, OOO-included variants. The 10–16% range at the top of the chart above is what's achievable on the honest definition with disciplined signal-anchored outbound. Above ~18%, honestly measured, you're probably looking at warm outbound or a misclassified metric.
How to measure it honestly.
The discipline is simple and costs nothing more than the willingness to look at a smaller number on your dashboard.
- Filter OOOs out of the reply count. Every major sequencer supports auto-reply detection. Turn it on. The number drops; the number becomes useful.
- Report campaign-level, not step-level. "Reply rate" should mean "of the people who entered this sequence, what % responded at any step." Step-1 is interesting but secondary.
- Use replies ÷ sent, not ÷ delivered. Delivered is endogenous: dividing by it hides your deliverability problems. With sent as the denominator, deliverability problems show up in the reply rate, where they belong. Sent is the honest denominator.
- Exclude unsubs from the numerator. They're engagement, but they're not the engagement you're optimizing for. Track them separately.
- Report trailing-12-month median, not best month. Best month is marketing copy; median is what predicts next quarter.
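The first four rules can be sketched as a single function over classified reply records. This is a minimal sketch under stated assumptions: the `ReplyKind` categories, the `Reply` record, and `reply_report` are hypothetical names, the classification is assumed to have been done already (for example by your sequencer's auto-reply detection), and "sent" is taken to mean contacts who were sent the sequence.

```python
from dataclasses import dataclass
from enum import Enum


class ReplyKind(Enum):
    HUMAN = "human"        # genuine reply from a person
    REFERRAL = "referral"  # "wrong person, talk to X"
    OOO = "ooo"            # out-of-office auto-responder
    UNSUB = "unsub"        # "unsubscribe me"
    BOUNCE = "bounce"      # non-delivery report


@dataclass(frozen=True)
class Reply:
    contact_id: str
    step: int
    kind: ReplyKind


# Only these kinds count toward the honest numerator.
COUNTED = {ReplyKind.HUMAN, ReplyKind.REFERRAL}


def reply_report(contacts_sent: int, replies: list[Reply]) -> dict[str, float]:
    """Campaign-level rates per contact sent, with unsubs tracked separately."""
    if contacts_sent <= 0:
        raise ValueError("contacts_sent must be positive")
    # Sets de-duplicate contacts who respond at more than one step.
    responders = {r.contact_id for r in replies if r.kind in COUNTED}
    unsubscribers = {r.contact_id for r in replies if r.kind is ReplyKind.UNSUB}
    any_event = {r.contact_id for r in replies}
    return {
        "honest_rate": len(responders) / contacts_sent,
        "unsub_rate": len(unsubscribers) / contacts_sent,
        "dashboard_rate": len(any_event) / contacts_sent,
    }


# Hypothetical campaign: 50 contacts sent, six inbox events from five contacts.
example = [
    Reply("c1", 1, ReplyKind.HUMAN),
    Reply("c1", 2, ReplyKind.HUMAN),     # same person again, counted once
    Reply("c2", 1, ReplyKind.OOO),
    Reply("c3", 2, ReplyKind.REFERRAL),
    Reply("c4", 1, ReplyKind.UNSUB),
    Reply("c5", 1, ReplyKind.BOUNCE),
]
report = reply_report(50, example)
print(report)
```

Counting distinct contacts rather than raw messages keeps the rate at campaign level: a contact who answers at step 1 and again at step 2 is one responder, not two.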
Teams that adopt the honest definition typically see their reported reply rates drop by 40–60% in week 1. Don't flinch. The actual performance hasn't changed — only the measurement honesty has.
How Mama measures it.
Mama doesn't send the email — your sequencer does — so we don't own the reply-rate reporting directly. But when we publish reply-rate data on this site (in playbooks, in customer stories), here's the discipline we use:
- OOOs excluded from the numerator. Always.
- Campaign-level reporting. Step-1 numbers are called out separately when relevant.
- Trailing-12-month median where possible, or a named window for shorter datasets.
- Sample size disclosed. A 14% reply rate across 40 sends is anecdote; across 2,000 sends it's evidence.
- Source of data disclosed. Our outbound, customer outbound (anonymized with permission), or industry corpora — never mixed without note.
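Why sample size matters can be made concrete with a confidence interval. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). That choice is an assumption made for illustration, not a method described on this page, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(rate: float, sends: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an observed reply rate over `sends` emails."""
    if sends <= 0 or not 0.0 <= rate <= 1.0:
        raise ValueError("need sends > 0 and 0 <= rate <= 1")
    denom = 1 + z**2 / sends
    centre = (rate + z**2 / (2 * sends)) / denom
    half_width = z * math.sqrt(
        rate * (1 - rate) / sends + z**2 / (4 * sends**2)
    ) / denom
    return centre - half_width, centre + half_width


small = wilson_interval(0.14, 40)
large = wilson_interval(0.14, 2000)
print(f"40 sends:   {small[0]:.1%} to {small[1]:.1%}")
print(f"2000 sends: {large[0]:.1%} to {large[1]:.1%}")
```

Under these assumptions, a 14% rate on 40 sends is compatible with a true rate anywhere from roughly 6% to 28%, while on 2,000 sends the range narrows to roughly 12.5% to 15.6%.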
The numbers in our playbooks are honestly measured, which is why they look smaller than the marketing-copy versions you'll see elsewhere. The smaller honest number is the one your finance team should trust.