TreasuryBench

The smartest AI money advisor, measured.

We put Treasury up against the leading money apps on 81 real financial questions — then had an independent AI judge score every answer. Here’s how it did, and what that means for you.

TreasuryBench results

🏆 Treasury leads on everything that makes advice worth trusting.

Dimension                             Treasury  GPT-5.5  Origin  Monarch
Grounding: Uses your real numbers     91        N/A      76      51
Correctness: Gets the facts right     86        83       75      57
Resolution: Actually answers it       83        81       68      48
Prudence: Won’t steer you wrong       90        82       75      59
Overall: TreasuryBench score          86        80       71      52

GPT-5.5 was tested with full context: your entire financial picture placed directly in its prompt, an advantage no chatbot has in everyday use. That head start is also why its Grounding is left unscored (N/A).

Head to head, on your kind of questions.

Same questions, same financial life. Treasury didn’t edge it out — it pulled away.

Treasury · 86
GPT-5.5 · 80
Origin · 71
Monarch · 52

The top score, without the wait.

A brilliant answer is useless if you’ve given up waiting for it. Treasury lands the highest score in 14 seconds, while Monarch averaged over a minute and a half (101 seconds).

Fast & intelligent
[Chart: TreasuryBench score vs. time to respond (seconds). Treasury · 14s, Origin · 46s, Monarch · 101s]

What the score is made of

A score only matters if you know what it buys you.

Transaction Intelligence

It knows where your money actually went.

Ask where your money went and Treasury breaks it down by merchant, category, and trend — and gets it right, so you can act on the answer.

Treasury · 92
GPT-5.5 · 89
Origin · 82
Monarch · 64

Savings & Expense Reduction

It finds the money you’re wasting.

Unused subscriptions, cheaper alternatives, cash sitting idle — Treasury catches what the others walk right past. Monarch barely registered here.

Treasury · 77
GPT-5.5 · 70
Origin · 58
Monarch · 25

Retirement & Tax-Advantaged Accounts

It gets retirement and taxes right.

HSA, 401(k), backdoor Roth, current contribution limits — the high-stakes moves where a wrong number is genuinely expensive.

Treasury · 87
GPT-5.5 · 71
Origin · 62
Monarch · 44

Life Planning & Major Decisions

It’s a real advisor for the big calls.

“Can I afford this? Should I do it?” Treasury reasons through the trade-offs against your actual finances — not a generic rule of thumb.

Treasury · 90
GPT-5.5 · 71
Origin · 79
Monarch · 51

Employer Benefits & Perks

It knows the perks you’re not using.

Corporate discounts, benefit matches, FSA/HSA programs — Treasury surfaces the money your employer already offers that you’re leaving on the table.

Treasury · 87
GPT-5.5 · 76
Origin · 74
Monarch · 54

Safe to act on

Confident is easy. Right is hard.

An AI that sounds certain but gets your money wrong is worse than none at all, because you’d act on it. We flagged every answer that could actually cost you: a wrong contribution limit, a stale tax figure, a bad payoff order. Treasury gave one such answer out of 81. GPT-5.5, even with all your data handed to it, gave twelve.

Financially dangerous answers · share of 81

Treasury · 1%
Origin · 5%
Monarch · 11%
GPT-5.5 · 15%
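
Want to check the percentages yourself? Here is a minimal Python sketch that turns the two counts stated above (one answer for Treasury, twelve for GPT-5.5, each out of 81) into shares. Rounding to the nearest whole percent is our assumption about how the chart was labeled, and the counts for Origin and Monarch are not given on this page, so they are left out.

```python
# Counts stated in the text: Treasury 1 of 81, GPT-5.5 12 of 81.
# Origin and Monarch counts are not published here, so they are omitted.
TOTAL_QUESTIONS = 81
flagged = {"Treasury": 1, "GPT-5.5": 12}


def flagged_share(count: int, total: int = TOTAL_QUESTIONS) -> int:
    """Return the flagged share as a whole-number percentage.

    Assumption: the chart rounds to the nearest whole percent.
    """
    return round(100 * count / total)


for app, count in flagged.items():
    print(f"{app}: {count} of {TOTAL_QUESTIONS} = {flagged_share(count)}%")
```

Under that rounding assumption, the two results line up with the 1% and 15% shown above.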

Every category, one table.

Highest overall, even with GPT-5.5 handed your full financial data up front.

Category                                                  Treasury  GPT-5.5  Origin  Monarch
Transaction Intelligence: Where your money went           92        89       82      64
Life Planning & Major Decisions: Big-decision guidance    90        71       79      51
Housing & Rent: Rent, buy & move                          89        91       80      39
Insurance & Risk Protection: Coverage gaps                89        90       79      73
Cashflow & Budgeting: Income vs. spending                 87        89       67      60
Employer Benefits & Perks: Workplace perks                87        76       74      54
Retirement & Tax-Advantaged: 401(k), HSA & Roth           87        71       62      44
Tax Strategy: Deductions & timing                         85        73       74      58
Debt & Credit Health: Payoff & credit score               84        96       80      81
Investing & Equity Comp: Portfolio & RSUs                 82        78       60      65
Credit Cards & Rewards: Cards & points                    80        75       66      29
Savings & Expense Reduction: Cutting waste                77        70       58      25
Overall                                                   86        80       71      52
How we test. TreasuryBench runs each app through a realistic financial life — connected accounts, budgets, and a year of transactions across three personas — and asks 81 of the money questions a real person actually has. Every answer is scored by an independent AI judge for accuracy, usefulness, and honesty, with hard penalties for any answer that contradicts the ground-truth facts. It’s an evolving benchmark; scores reflect our latest full run.

See it for yourself

Same question. Three different answers.

Same accounts, same data — every app got the same three everyday questions. Here’s how the answers compared.

Asked by Maria, who works at Microsoft

“Are any of my subscriptions worth changing?”

88 Treasury

As a Microsoft employee, you likely qualify for the Stay Fit reimbursement, up to $1,500/year — submitting your ClassPass receipts could make this $89/mo membership effectively free.

Found a work perk that makes her $89/mo gym free
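
The “effectively free” claim comes down to one comparison: a year of the membership against the yearly reimbursement ceiling. Here is a rough Python sketch using only the two figures from the answer ($89/mo and up to $1,500/year). It assumes the whole membership cost is eligible and ignores any tax on the reimbursement; neither point is confirmed in the answer, which itself says only that Maria “likely” qualifies.

```python
# Figures quoted in the answer above.
MONTHLY_MEMBERSHIP = 89   # ClassPass, dollars per month
ANNUAL_CAP = 1_500        # reimbursement ceiling, dollars per year


def annual_out_of_pocket(monthly_cost: int, annual_cap: int) -> int:
    """Dollars left to pay per year after the reimbursement.

    Assumption: every dollar of the membership is eligible, up to the cap.
    """
    annual_cost = monthly_cost * 12
    return max(0, annual_cost - annual_cap)


annual_cost = MONTHLY_MEMBERSHIP * 12
remaining = annual_out_of_pocket(MONTHLY_MEMBERSHIP, ANNUAL_CAP)
print(f"Annual cost: ${annual_cost}, left to pay after reimbursement: ${remaining}")
```

A year of the membership costs $1,068, which fits under the $1,500 ceiling, so under these assumptions nothing is left to pay.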

Asked by Priya, a parent with a toddler in daycare

“Can I earn rewards on daycare without losing it to fees?”

95 Treasury

Stop chasing credit-card points — look at tax arbitrage. On your $1,650/mo Bright Horizons bill, a 2.5–3% card fee runs $41–$50/mo — more than the $33 you’d earn on a 2% cashback card.

Showed the card fees would eat the rewards
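
The arithmetic behind that answer is easy to reproduce. The Python sketch below works in whole cents and basis points to avoid floating-point rounding, using only the figures quoted above: a $1,650/mo bill, a 2.5% to 3% card fee, and 2% cashback. The helper name `percent_of` is ours, for illustration only.

```python
def percent_of(amount_cents: int, basis_points: int) -> int:
    """Return basis_points (1 bp = 0.01%) of amount_cents, in whole cents."""
    return amount_cents * basis_points // 10_000


BILL_CENTS = 165_000  # $1,650.00 monthly daycare bill

fee_low = percent_of(BILL_CENTS, 250)    # 2.5% card fee
fee_high = percent_of(BILL_CENTS, 300)   # 3% card fee
cashback = percent_of(BILL_CENTS, 200)   # 2% cashback

# Net result of paying by card: rewards earned minus the fee charged.
net_best = cashback - fee_low
net_worst = cashback - fee_high

print(f"Fee: ${fee_low / 100:.2f} to ${fee_high / 100:.2f} per month")
print(f"Cashback: ${cashback / 100:.2f} per month")
print(f"Net: ${net_best / 100:.2f} to ${net_worst / 100:.2f} per month")
```

The fee works out to $41.25 to $49.50 a month against $33.00 in cashback, so paying by card loses money at either end of the fee range.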

Asked by Jordan, a freelance UX consultant

“Which expenses should I deduct this month?”

95 Treasury

From your actual spending: Adobe Creative Cloud ($59.99), Figma ($15) and Google Workspace ($14.40) are 100% deductible; your AWS and Office Depot bills are standard operating costs.

Pulled real deductions from his own transactions

Verbatim from the TreasuryBench run, trimmed only for length. Scores are the independent judge’s.

Put the highest score in your pocket.

Numbers are nice. Try the advisor that earned them — on your own money.

Start my free trial