TreasuryBench

The smartest AI money advisor, measured.

We put Treasury up against the leading money apps on 81 real financial questions — then had an independent AI judge score every answer. Here’s how it did, and what that means for you.

TreasuryBench results

🏆 Treasury leads on everything that makes advice worth trusting.

Metric	Treasury	GPT-5.5	Origin	Monarch
Grounding: Uses your real numbers	91	N/A	76	51
Correctness: Gets the facts right	86	83	75	57
Resolution: Actually answers it	83	81	68	48
Prudence: Won’t steer you wrong	90	82	75	59
Overall TreasuryBench score	86	80	71	52

GPT-5.5 was tested with full context: your entire financial picture was placed directly in its prompt, an advantage no chatbot has in everyday use. That is also why its Grounding score is left unscored (N/A).

Head to head, on your kind of questions.

Same questions, same financial life. Treasury didn’t edge it out — it pulled away.

[Chart: head-to-head comparison of Treasury, GPT-5.5, Origin, and Monarch]

The top score, without the wait.

A brilliant answer is useless if you’ve given up waiting for it. Treasury lands the highest score in seconds — while Monarch averaged over a minute and a half.

Fast & intelligent

What the score is made of

A score only matters if you know what it buys you.

Transaction Intelligence

It knows where your money actually went.

Ask where your money went and Treasury breaks it down by merchant, category, and trend — and gets it right, so you can act on the answer.

[Chart: Transaction Intelligence scores for Treasury, GPT-5.5, Origin, and Monarch]

Savings & Expense Reduction

It finds the money you’re wasting.

Unused subscriptions, cheaper alternatives, cash sitting idle — Treasury catches what the others walk right past. Monarch barely registered here.

[Chart: Savings & Expense Reduction scores for Treasury, GPT-5.5, Origin, and Monarch]

Retirement & Tax-Advantaged Accounts

It gets retirement and taxes right.

HSA, 401(k), backdoor Roth, current contribution limits — the high-stakes moves where a wrong number is genuinely expensive.

[Chart: Retirement & Tax-Advantaged Accounts scores for Treasury, GPT-5.5, Origin, and Monarch]

Life Planning & Major Decisions

It’s a real advisor for the big calls.

“Can I afford this? Should I do it?” Treasury reasons through the trade-offs against your actual finances — not a generic rule of thumb.

[Chart: Life Planning & Major Decisions scores for Treasury, GPT-5.5, Origin, and Monarch]

Employer Benefits & Perks

It knows the perks you’re not using.

Corporate discounts, benefit matches, FSA/HSA programs — Treasury surfaces the money your employer already offers that you’re leaving on the table.

[Chart: Employer Benefits & Perks scores for Treasury, GPT-5.5, Origin, and Monarch]

Safe to act on

Confident is easy. Right is hard.

An AI that sounds certain but gets your money wrong is worse than none at all — you’d act on it. We flagged every answer that could actually cost you: a wrong contribution limit, a stale tax figure, a bad payoff order. Treasury gave one in 81. GPT-5.5, even handed all your data, gave twelve.

[Chart: answers flagged financially dangerous, out of 81, for Treasury, Origin, Monarch (11%), and GPT-5.5 (15%)]
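The two counts quoted above can be turned into rates with a few lines of Python. This is a minimal sketch: the names `flagged` and `flagged_rate` are illustrative, and only the counts (1 and 12 out of 81) come from the text.

```python
# Total number of benchmark questions, as stated in the text.
TOTAL_QUESTIONS = 81

# Flagged-answer counts quoted in the text (other apps' counts are not given).
flagged = {"Treasury": 1, "GPT-5.5": 12}


def flagged_rate(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the share of answers flagged as financially dangerous, in percent."""
    return 100 * count / total


rates = {app: round(flagged_rate(n), 1) for app, n in flagged.items()}

for app, rate in rates.items():
    print(f"{app}: {rate}% of {TOTAL_QUESTIONS} answers flagged")
```

Twelve of 81 works out to 14.8%, which rounds to the 15% shown for GPT-5.5; one of 81 is about 1.2%.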

Every category, one table.

Highest overall, even with GPT-5.5 handed your full financial data up front.

Category	Treasury	GPT-5.5	Origin	Monarch
Transaction Intelligence: Where your money went	92	89	82	64
Life Planning & Major Decisions: Big-decision guidance	90	71	79	51
Housing & Rent: Rent, buy & move	89	91	80	39
Insurance & Risk Protection: Coverage gaps	89	90	79	73
Cashflow & Budgeting: Income vs. spending	87	89	67	60
Employer Benefits & Perks: Workplace perks	87	76	74	54
Retirement & Tax-Advantaged: 401(k), HSA & Roth	87	71	62	44
Tax Strategy: Deductions & timing	85	73	74	58
Debt & Credit Health: Payoff & credit score	84	96	80	81
Investing & Equity Comp: Portfolio & RSUs	82	78	60	65
Credit Cards & Rewards: Cards & points	80	75	66	29
Savings & Expense Reduction: Cutting waste	77	70	58	25
Overall	86	80	71	52
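For readers who want to check the table themselves, here is a short Python sketch that finds the top-scoring app in each category. The scores are copied from the table above; the names `SCORES`, `leader`, and `wins` are illustrative and not part of TreasuryBench.

```python
from collections import Counter

APPS = ("Treasury", "GPT-5.5", "Origin", "Monarch")

# Category scores copied from the table above, in APPS order.
SCORES = {
    "Transaction Intelligence": (92, 89, 82, 64),
    "Life Planning & Major Decisions": (90, 71, 79, 51),
    "Housing & Rent": (89, 91, 80, 39),
    "Insurance & Risk Protection": (89, 90, 79, 73),
    "Cashflow & Budgeting": (87, 89, 67, 60),
    "Employer Benefits & Perks": (87, 76, 74, 54),
    "Retirement & Tax-Advantaged": (87, 71, 62, 44),
    "Tax Strategy": (85, 73, 74, 58),
    "Debt & Credit Health": (84, 96, 80, 81),
    "Investing & Equity Comp": (82, 78, 60, 65),
    "Credit Cards & Rewards": (80, 75, 66, 29),
    "Savings & Expense Reduction": (77, 70, 58, 25),
}


def leader(row):
    """Return the app with the highest score in one category row."""
    return max(zip(APPS, row), key=lambda pair: pair[1])[0]


leaders = {category: leader(row) for category, row in SCORES.items()}
wins = Counter(leaders.values())

for category, app in leaders.items():
    print(f"{category}: {app}")
print(dict(wins))
```

On these numbers Treasury has the top score in 8 of the 12 categories, and GPT-5.5 has it in the other 4 (Housing & Rent, Insurance & Risk Protection, Cashflow & Budgeting, Debt & Credit Health).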

How we test. TreasuryBench runs each app through a realistic financial life — connected accounts, budgets, and a year of transactions across three personas — and asks 81 of the money questions a real person actually has. Every answer is scored by an independent AI judge for accuracy, usefulness, and honesty, with hard penalties for any answer that contradicts the ground-truth facts. It’s an evolving benchmark; scores reflect our latest full run.
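To make the idea of "hard penalties" concrete, here is a rough Python sketch of how a judged answer might be turned into a score. It is not TreasuryBench's actual formula, which the page does not publish: the 0-100 scale, the equal weighting of the three criteria, the penalty size, and every name (`JudgedAnswer`, `CONTRADICTION_PENALTY`, `answer_score`, `benchmark_score`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical penalty size; the source says only that contradictions are penalized hard.
CONTRADICTION_PENALTY = 50


@dataclass
class JudgedAnswer:
    """One answer as rated by the judge (assumed 0-100 scale per criterion)."""
    accuracy: float
    usefulness: float
    honesty: float
    contradicts_ground_truth: bool


def answer_score(answer: JudgedAnswer) -> float:
    """Average the three criteria equally, then subtract the penalty if applicable."""
    base = mean([answer.accuracy, answer.usefulness, answer.honesty])
    if answer.contradicts_ground_truth:
        base -= CONTRADICTION_PENALTY
    return max(0.0, base)  # never go below zero


def benchmark_score(answers) -> int:
    """Average the per-answer scores and round to a whole number."""
    return round(mean(answer_score(a) for a in answers))


# Made-up ratings, not real benchmark data.
example = [
    JudgedAnswer(90, 90, 90, contradicts_ground_truth=False),
    JudgedAnswer(80, 70, 90, contradicts_ground_truth=True),
]
print(benchmark_score(example))  # prints 60
```

In this sketch the second answer averages 80 but drops to 30 once the penalty applies, so a single contradiction pulls the two-answer average from 85 down to 60.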

See it for yourself

Same question. Three different answers.

Same accounts, same data — every app got the same three everyday questions. Here’s how the answers compared.

Asked by Maria, who works at Microsoft

“Are any of my subscriptions worth changing?”

88 Treasury

As a Microsoft employee, you likely qualify for the Stay Fit reimbursement, up to $1,500/year — submitting your ClassPass receipts could make this $89/mo membership effectively free.

Found a work perk that makes her $89/mo gym free


Asked by Priya, a parent with a toddler in daycare

“Can I earn rewards on daycare without losing it to fees?”

95 Treasury

Stop chasing credit-card points — look at tax arbitrage. On your $1,650/mo Bright Horizons bill, a 2.5–3% card fee runs $41–$50/mo — more than the $33 you’d earn on a 2% cashback card.

Showed the card fees would eat the rewards
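The arithmetic behind that answer is easy to reproduce. A minimal Python sketch, using only the figures quoted in the answer ($1,650/mo bill, a 2.5–3% card fee, a 2% cashback card); the function and variable names are illustrative.

```python
MONTHLY_BILL = 1650.00      # daycare bill quoted in the answer
FEE_RATES = (0.025, 0.03)   # card fee range quoted in the answer
CASHBACK_RATE = 0.02        # cashback rate quoted in the answer


def net_reward(bill: float, cashback_rate: float, fee_rate: float) -> float:
    """Monthly cashback earned minus the monthly card fee paid."""
    return bill * cashback_rate - bill * fee_rate


reward = MONTHLY_BILL * CASHBACK_RATE
fees = [MONTHLY_BILL * rate for rate in FEE_RATES]
nets = [net_reward(MONTHLY_BILL, CASHBACK_RATE, rate) for rate in FEE_RATES]

print(f"Cashback earned: ${reward:.2f}/mo")
for rate, fee, net in zip(FEE_RATES, fees, nets):
    print(f"Fee at {rate:.1%}: ${fee:.2f}/mo, net ${net:.2f}/mo")
```

The fee comes to $41.25 to $49.50 a month against $33.00 in cashback, so paying by card loses between $8.25 and $16.50 a month.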


Asked by Jordan, a freelance UX consultant

“Which expenses should I deduct this month?”

95 Treasury

From your actual spending: Adobe Creative Cloud ($59.99), Figma ($15) and Google Workspace ($14.40) are 100% deductible; your AWS and Office Depot bills are standard operating costs.

Pulled real deductions from his own transactions


Verbatim from the TreasuryBench run, trimmed only for length. Scores are the independent judge’s.

Put the highest score in your pocket.

Numbers are nice. Try the advisor that earned them — on your own money.

Start my free trial