Case Study

First to Market

How One Australian Financial Services Organisation Put AI in Front of Members

Financial Services · 6 Months · Production AI
98%
Initial Accuracy
3mo
Prototype → Production
¼×
Traditional Cost
01

The Challenge

Two years of experiments. Zero member-facing deployments.

Financial services organisations have spent two years experimenting with generative AI. Yet most deployments remain safely internal-facing, where employees understand AI limitations and tolerate occasional errors.
Internal AI Projects: ~95%
Member-Facing AI: ~5%

Quality at Scale

Ensuring quality in a risk-sensitive environment

Evaluation Systems

Creating evaluation systems that scale

Secure Deployment

Deploying securely within tightly controlled infrastructure

External systems face a different reality. Members expect consistent, reliable service. Regulators demand auditability. Mistakes have consequences. For financial services, there's no room for a response that's "mostly correct."

02

The 2% Problem

When 98% isn't good enough.

"We had a prototype that worked beautifully. Ninety-eight percent of the time."

— Client CTO

Large language models are non-deterministic. Give them identical inputs, and they might produce different outputs.

A test conversation that passes dozens of times might fail on the thirty-seventh run with a subtle phrasing variation.

For an AI assistant providing insurance guidance to thousands of members, that 2% failure rate could mean hundreds of them receiving incorrect information about their entitlements.
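To make that concrete, here is a rough back-of-the-envelope calculation (ours, not the client's test plan). Assuming failures are independent and occur at a fixed 2% rate, a single test conversation has to be repeated roughly 150 times before you can be 95% confident of seeing the failure even once, which is why a scenario that passes dozens of times proves very little.

```python
# Illustrative arithmetic only: how many repeated runs of the *same* test
# conversation are needed before a rare, intermittent failure is likely to
# appear at least once? Assumes failures are independent with a fixed rate.

import math

failure_rate = 0.02   # the "2% problem": roughly one run in fifty goes wrong
confidence = 0.95     # how sure we want to be of seeing the failure at least once

# P(no failure in n runs) = (1 - failure_rate)^n, so solve for n:
runs_needed = math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))
print(runs_needed)    # ~149 repetitions of a single scenario

# And the flip side: after 36 clean-looking runs there is still only about a
# 52% chance the intermittent failure would have shown itself.
print(1 - (1 - failure_rate) ** 36)
```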

Prototype Success
98% Works correctly
2% Failures

In a regulated industry, that's not acceptable.

03

Our Approach

Four pillars for production-ready AI.

Testing at Scale
01

The Solution

Co-developed an agent platform that uses AI to test AI, automatically generating diverse, multi-turn test scenarios and running batch tests to expose intermittent failures.

Impact

Hundreds of test conversations generated in hours, catching failures that appear once every fifty interactions
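The testing platform itself isn't detailed here, but the "AI to test AI" idea in the Testing at Scale pillar can be sketched roughly as follows. Everything in this sketch is hypothetical: simulate_member_turn, assistant_reply, and judge_transcript are stand-ins for real model calls (a persona model driving the conversation, the assistant under test, and an LLM judge), stubbed here so the structure runs end to end.

```python
# A rough sketch of "AI to test AI" batch testing, not the actual platform.
# One model plays a member persona to drive a multi-turn conversation, the
# assistant under test replies, and a separate "judge" grades the transcript.
# The three model calls below are placeholder stubs.

import random
from dataclasses import dataclass

@dataclass
class Scenario:
    persona: str   # e.g. "member asking about eligibility for an illness claim"
    goal: str      # what a correct conversation must achieve
    max_turns: int = 6

# --- hypothetical placeholders for real model calls -------------------------
def simulate_member_turn(persona: str, transcript: list[str]) -> str:
    return f"[{persona}] follow-up question #{len(transcript) // 2 + 1}"

def assistant_reply(transcript: list[str]) -> str:
    return "assistant answer"          # the member-facing assistant under test

def judge_transcript(transcript: list[str], goal: str) -> bool:
    return random.random() > 0.02      # stand-in for an LLM judge; ~2% failures
# -----------------------------------------------------------------------------

def run_scenario(s: Scenario) -> bool:
    """Drive one multi-turn conversation and return whether it passed."""
    transcript: list[str] = []
    for _ in range(s.max_turns):
        transcript.append(simulate_member_turn(s.persona, transcript))
        transcript.append(assistant_reply(transcript))
    return judge_transcript(transcript, s.goal)

def run_batch(scenarios: list[Scenario], repeats: int = 50) -> list[tuple[Scenario, int]]:
    """Repeat every scenario many times so rare, intermittent failures surface."""
    return [(s, sum(not run_scenario(s) for _ in range(repeats))) for s in scenarios]

if __name__ == "__main__":
    scenarios = [Scenario("member asking about an illness claim", "correct eligibility guidance")]
    for s, failures in run_batch(scenarios):
        print(f"{s.persona}: {failures} failures in 50 runs")
```

The point is the shape rather than the stubs: many generated scenarios, each repeated many times, so a once-in-fifty failure has somewhere to show up.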

Evaluation Bottleneck
02

The Solution

Streamlined interface for claims assessors to review conversations in five-minute blocks during operational downtime, prioritised by user feedback scores.

Impact

Hundreds of expert evaluations without significant operational disruption
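As an illustration of the prioritisation described above (the Conversation fields below are our assumptions, not the client's schema), surfacing the worst-rated conversations first and slicing them into five-minute review blocks can be as simple as:

```python
# Illustrative only: prioritise an evaluation queue by member feedback and
# pack it into short review blocks that fit assessor downtime.

from dataclasses import dataclass

@dataclass
class Conversation:
    conversation_id: str
    feedback_score: float            # e.g. thumbs-up ratio or a 1-5 member rating
    estimated_review_minutes: float = 1.0

def build_review_blocks(conversations: list[Conversation],
                        block_minutes: float = 5.0) -> list[list[Conversation]]:
    """Worst-rated conversations first, packed into five-minute review blocks."""
    queue = sorted(conversations, key=lambda c: c.feedback_score)   # lowest scores first
    blocks: list[list[Conversation]] = []
    current: list[Conversation] = []
    used = 0.0
    for convo in queue:
        if current and used + convo.estimated_review_minutes > block_minutes:
            blocks.append(current)
            current, used = [], 0.0
        current.append(convo)
        used += convo.estimated_review_minutes
    if current:
        blocks.append(current)
    return blocks
```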

Security & Innovation
03

The Solution

Deployed entire platform within private Azure tenant with separate "Labs" environment for safe experimentation without touching production data.

Impact

Member data never left the security boundary; innovation continued unimpeded
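The actual Azure configuration isn't described here, so the sketch below only illustrates the underlying pattern: a Labs environment that lives inside the same private tenant but has no route to member data, only to synthetic fixtures. All names, URLs, and values are hypothetical.

```python
# A minimal sketch (our illustration, not the client's Azure setup) of the
# "Labs then production" pattern: same private tenant, but the Labs
# configuration is hard-wired away from member data.

from dataclasses import dataclass

@dataclass(frozen=True)
class EnvironmentConfig:
    name: str
    data_source: str            # where the agent reads its knowledge/claims data
    member_data_allowed: bool   # hard switch checked before any data access

LABS = EnvironmentConfig(
    name="labs",
    data_source="https://labs.internal.example/synthetic-fixtures",   # synthetic data only
    member_data_allowed=False,
)

PRODUCTION = EnvironmentConfig(
    name="production",
    data_source="https://prod.internal.example/member-knowledge-base",
    member_data_allowed=True,
)

def fetch_records(config: EnvironmentConfig, query: str) -> list[dict]:
    """Refuse member-data access anywhere except the production environment."""
    if not config.member_data_allowed and "member" in query.lower():
        raise PermissionError(f"{config.name}: member data is out of scope here")
    # ... a real implementation would query config.data_source ...
    return []
```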

Lean Delivery
04

The Solution

A focused team of four to five people using AI assistance at every layer—code generation, compliance documentation, test generation—working directly with engineers and claims experts.

Impact

Three months from prototype to production at one-quarter to one-third the cost of traditional engagements

04

What Shipped

From concept to production in six months.

Production Member-Facing AI Assistant for Insurance Claims

Six months from initial concept, the organisation deployed a production system providing 24/7 multilingual support. Members can explore options, understand eligibility, and initiate claims any time, including evenings and weekends when call centres are closed.

Unexpected finding

For sensitive topics like illness or death claims, some members preferred the privacy and lack of judgement from an AI interaction

Operational impact

AI-initiated cases arrive more complete, enabling faster assessment

Reusability

The infrastructure, processes, and expertise developed are reusable for additional use cases

Key Lessons

Don't skip strategy

Move from ideation to validated prototype in three months, then from prototype to production in another three, with a small, focused team.

Build evaluation capability alongside agent capability

Evaluation systems are as important as the AI itself. Recruit experts for evaluation work during prototyping, not after launch.

Deploy in your environment from day one

The Labs-then-production pattern resolves the security-versus-innovation tension. Start technical work inside your security perimeter.

Learn at production scale

Every production conversation generates evaluation data. Every edge case refined improves the system. Every month of operation builds expertise that compounds.

"Freshwater Futures were the right partner to take us from idea to prototype to production. Their small, highly collaborative team iterated in days, not weeks."

Chief Technology Officer

Australian Financial Services Organisation

05

What This Means for Financial Services Leaders

Current State

Member-facing AI assistants are operational

Processing real claims, supporting real members, operating under real regulatory scrutiny

Still Running Pilots?

You're falling behind

Not because you lack technology, but because you're not learning at production scale

The competitive advantage comes from being first to learn.

The organisations still running pilots are falling behind. Not because they lack technology, but because they're not learning at production scale.

The question is whether you'll learn these lessons whilst leading or whilst catching up.

Ready to explore what's possible?

A proof-of-concept engagement typically takes 4–6 weeks. The platform deploys to your private environment, connects to your knowledge base, and produces a working agent using your actual content. That validates the approach against your data, constraints, and stakeholders before any significant investment decision.

  • ✓ Seamless integration
  • ✓ Scalable to your needs
  • ✓ Expert support, every step
Get in Touch

No obligation — just a friendly chat about what's possible
