Cohort Retention & LTV Modeler
Replaced a 16-hour manual process with a Python pipeline feeding Power BI. First time the business had channel-level retention and LTV/CAC it could trust — went straight into leadership reviews.
We were doing this in a spreadsheet
The retention numbers leadership wanted lived in a Google Sheet, refreshed twice a month by one analyst. It broke any time a row went in out of order. Each refresh cost about 16 hours.
The data was right there in the revenue ledger. It just wasn't reaching anyone who could use it — so every spend conversation that should have leaned on data leaned on opinion instead.
An analyst shouldn't be on call to rebuild a spreadsheet every Sunday.
What I actually needed
So I tried doing the cohort math inside Power BI first — that's where the reports already lived. Wrong tool.
Cohort matrices need self-joins, and DAX isn't built for that. The realization: cohorts aren't a BI calculation. They're a data pipeline. The BI layer should just read the answer.
Rebuilt it in Python
I rebuilt it as a small Pandas pipeline. Every night it computes the weekly cohort matrix and writes a flat table to Postgres. Power BI reads one view — no DAX, no joins, no off-by-one.
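The core of that pipeline can be sketched in a few lines. This is a minimal version, not the production code — the column names (`customer_id`, `order_date`) are placeholders for whatever the ledger extract actually uses:

```python
import pandas as pd

def weekly_cohort_matrix(ledger: pd.DataFrame) -> pd.DataFrame:
    """Percent of each weekly cohort still active N weeks after first purchase.

    Assumes a ledger with `customer_id` and `order_date` columns
    (illustrative names, not the real schema).
    """
    df = ledger.copy()
    df["order_week"] = pd.to_datetime(df["order_date"]).dt.to_period("W")
    # Cohort = the week of each customer's first purchase.
    df["cohort_week"] = df.groupby("customer_id")["order_week"].transform("min")
    # Period subtraction yields a Week offset; .n is the integer week count.
    df["weeks_since"] = (df["order_week"] - df["cohort_week"]).apply(lambda p: p.n)
    # Distinct active customers per (cohort, weeks-since-acquisition) cell.
    active = (
        df.groupby(["cohort_week", "weeks_since"])["customer_id"]
          .nunique()
          .unstack(fill_value=0)
    )
    # Normalize each row by its cohort size (the week-0 column) -> retention %.
    return active.div(active[0], axis=0).mul(100).round(1)
```

The output is already the flat shape Power BI wants — one row per cohort, one column per week offset — so the BI layer really does just read the answer.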
Two specific choices made it work. Weekly cohorts — monthly aggregation hides the week 2–6 churn cliff entirely. And the simplest defensible LTV formula: ARPU ÷ (1 − retention rate). I benchmarked it against fancier models; the difference was under 8% for mature cohorts.
Simple and auditable beats clever and opaque when leadership has to trust the number.
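That formula is just the geometric-series expectation: with a constant retention rate r, expected customer lifetime is 1 / (1 − r) periods. A minimal sketch with made-up numbers (these are not the production inputs):

```python
def simple_ltv(arpu: float, retention_rate: float) -> float:
    """Static LTV: ARPU / (1 - retention rate).

    With constant period-over-period retention r, expected lifetime is
    1 / (1 - r) periods, so lifetime revenue is ARPU times that.
    """
    if not 0 <= retention_rate < 1:
        raise ValueError("retention_rate must be in [0, 1)")
    return arpu / (1 - retention_rate)

# Illustrative: ARPU of 500 per period at 75% retention -> 4 expected
# periods -> LTV of 2000.
print(simple_ltv(500, 0.75))  # 2000.0
```

Anyone can check this on a calculator, which is exactly the point of choosing it over an opaque model.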
12-week retention by cohort (%)
| Cohort | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-W01 | 100 | 76 | 63 | 56 | 50 | 46 | 43 | 41 | 39 | 38 | 36 | 35 |
| 2024-W05 | 100 | 79 | 66 | 58 | 52 | 48 | 45 | 42 | 40 | 38 | 37 | 36 |
| 2024-W09 | 100 | 77 | 64 | 57 | 52 | 47 | 44 | 42 | 40 | 38 | 37 | 36 |
| 2024-W13 | 100 | 81 | 68 | 60 | 55 | 50 | 47 | 44 | 42 | 40 | 39 | 38 |
Synthetic data above, same shape as the live model. The public version — running on the GA4 e-commerce sample on BigQuery — is on Kaggle. Every filter recomputes the heatmap, channel curves, LTV:CAC, and a payback simulator. Nothing is hardcoded.
Now I just update the data
Now it's the same routine every month: drop in the new ledger extract, run the notebook, done. The board deck pulls itself from the same view.
The first month we ran it, one channel showed a 30% higher 12-week retention than the next best — a finding that immediately reshaped acquisition spend. Today, ₹40 Cr+ of revenue sits in the LTV/CAC model that drives quarterly reallocation at the leadership level.
Adjust inputs, watch ratio
Healthy > 3 · Marginal 1–3 · Loss < 1.
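The bands map to a trivial classifier. The thresholds are the ones above; everything else here is illustrative:

```python
def ltv_cac_band(ltv: float, cac: float) -> str:
    """Classify an LTV:CAC ratio using the dashboard's bands:
    Healthy > 3, Marginal 1-3, Loss < 1."""
    ratio = ltv / cac
    if ratio > 3:
        return "Healthy"
    if ratio >= 1:
        return "Marginal"
    return "Loss"

# Illustrative: LTV 2000 against CAC 600 -> ratio ~3.3 -> Healthy.
print(ltv_cac_band(2000, 600))  # Healthy
```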
The internal pipeline still uses the static LTV formula. The public Kaggle build adds a polynomial layer that predicts full LTV from first-week spend (R² ≈ 0.5–0.7 on the GA4 sample). Porting it back is next — especially for cohorts under 8 weeks old, where the static formula underestimates.
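The polynomial layer amounts to a curve fit from first-week spend to observed full LTV. A sketch with NumPy on synthetic data — the degree and single-feature setup are assumptions for illustration, not the published Kaggle model:

```python
import numpy as np

def fit_ltv_curve(first_week_spend: np.ndarray,
                  observed_ltv: np.ndarray,
                  degree: int = 2) -> np.poly1d:
    """Fit a polynomial mapping first-week spend -> full LTV.

    Degree 2 is an assumption; the real build would pick it by validation.
    """
    coeffs = np.polyfit(first_week_spend, observed_ltv, deg=degree)
    return np.poly1d(coeffs)

# Synthetic data: full LTV is roughly 3x first-week spend plus noise.
rng = np.random.default_rng(0)
spend = rng.uniform(10, 200, size=200)
ltv = 3 * spend + rng.normal(0, 20, size=200)

model = fit_ltv_curve(spend, ltv)
# model(x) now predicts LTV for a young cohort from its first-week spend.
```

For cohorts under 8 weeks old this is exactly the gap the static formula leaves: you have first-week behavior but not enough history for the retention curve to settle.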