First to Market
How One Australian Financial Services Organisation Put AI in Front of Members
The Challenge
Two years of experiments. Zero member-facing deployments.
Financial services organisations have spent two years experimenting with generative AI. Yet most deployments remain safely internal-facing, where employees understand AI limitations and tolerate occasional errors.
Quality at Scale
Ensuring quality in a risk-sensitive environment
Evaluation Systems
Creating evaluation systems that scale
Secure Deployment
Deploying securely within tightly controlled infrastructure
External systems face a different reality. Demanding members expect consistent, reliable service. Regulatory requirements demand auditability. Mistakes have consequences. For financial services, there's no room for a response that's "mostly correct."
The 2% Problem
When 98% isn't good enough.
"We had a prototype that worked beautifully. Ninety-eight percent of the time."
— Client CTO
Large language models are non-deterministic. Give them identical inputs, and they might produce different outputs.
A test conversation that passes dozens of times might fail on the thirty-seventh run with a subtle phrasing variation.
For an AI assistant providing insurance guidance, that 2% failure rate could mean hundreds of members receiving incorrect information about their entitlements.
In a regulated industry, that's not acceptable.
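The arithmetic behind the 2% problem can be sketched in a few lines. The example below assumes each test run is independent and fails at a constant 2% rate. That is a simplification for illustration, not a measured property of the system described here. The function names are hypothetical.

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of independent runs that surfaces at least one
    failure with the requested confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    rate = 0.02  # assumed constant failure rate (the "2%" in the text)
    for n in (1, 36, 100):
        chance = detection_probability(rate, n)
        print(f"{n:>3} runs: {chance:.1%} chance of seeing a failure")
    print("Runs for 95% confidence:", runs_needed(rate, 0.95))
    print("Runs for 99% confidence:", runs_needed(rate, 0.99))
```

Under these assumptions, a single scenario needs 149 repeated runs to surface a 2% failure with 95% confidence, and 228 runs for 99%. That is why a conversation can pass dozens of times and still be unsafe to ship.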
Our Approach
Four pillars for production-ready AI.
The Solution
Co-developed an agent platform that uses AI to test AI, automatically generating diverse, multi-turn test scenarios and running batch tests to expose intermittent failures.
Impact
Hundreds of test conversations generated in hours, catching failures that appear once every fifty interactions
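A minimal sketch of the batch-testing idea, assuming the agent can be called as a plain function and that a checker can judge each reply. It covers only the repeated-run step, not the AI-driven scenario generation. Every name here (`run_batch`, `make_stub_agent`, the scenario strings) is hypothetical, and the stub agent stands in for a real model by failing deterministically on every fiftieth call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScenarioResult:
    scenario: str
    runs: int
    failures: int

    @property
    def intermittent(self) -> bool:
        # Fails sometimes but not always: the kind a single run tends to miss.
        return 0 < self.failures < self.runs


def run_batch(
    scenarios: List[str],
    agent: Callable[[str], str],
    check: Callable[[str, str], bool],
    runs_per_scenario: int = 100,
) -> List[ScenarioResult]:
    """Run every scenario many times and count how often the check fails."""
    results = []
    for scenario in scenarios:
        failures = 0
        for _ in range(runs_per_scenario):
            reply = agent(scenario)
            if not check(scenario, reply):
                failures += 1
        results.append(ScenarioResult(scenario, runs_per_scenario, failures))
    return results


def make_stub_agent(fail_every: int = 50) -> Callable[[str], str]:
    """Stand-in for a real agent: wrong on every `fail_every`-th call,
    and only for scenarios that mention a waiting period."""
    calls = {"n": 0}

    def agent(scenario: str) -> str:
        calls["n"] += 1
        if "waiting period" in scenario and calls["n"] % fail_every == 0:
            return "INCORRECT"
        return "CORRECT"

    return agent


if __name__ == "__main__":
    scenarios = ["eligibility question", "waiting period question"]
    results = run_batch(scenarios, make_stub_agent(), lambda s, r: r == "CORRECT")
    for result in results:
        flag = "INTERMITTENT" if result.intermittent else "stable"
        print(f"{result.scenario}: {result.failures}/{result.runs} failed ({flag})")
```

In practice the checker is the hard part. Here it is a string comparison, whereas a real evaluation would need domain rules or expert review.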
The Solution
Streamlined interface for claims assessors to review conversations in five-minute blocks during operational downtime, prioritised by user feedback scores.
Impact
Hundreds of expert evaluations without significant operational disruption
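One way to picture the review queue is shown below as a hedged sketch. It assumes each conversation carries a numeric member feedback score and an estimated review time, and that the lowest-scored conversations are reviewed first. The source says only that the queue was prioritised by feedback scores, so the ordering direction, the field names, and the packing rule are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conversation:
    conversation_id: str
    feedback_score: int       # hypothetical 1-5 member rating
    est_review_seconds: int   # hypothetical estimate of review effort


def build_review_blocks(
    conversations: List[Conversation], block_seconds: int = 300
) -> List[List[Conversation]]:
    """Order by feedback score (lowest first, an assumption) and pack
    conversations into blocks of roughly five minutes each."""
    ordered = sorted(conversations, key=lambda c: c.feedback_score)
    blocks: List[List[Conversation]] = []
    current: List[Conversation] = []
    used = 0
    for conv in ordered:
        # Start a new block when this conversation would overflow the current one.
        if current and used + conv.est_review_seconds > block_seconds:
            blocks.append(current)
            current, used = [], 0
        current.append(conv)
        used += conv.est_review_seconds
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    queue = [
        Conversation("a", 5, 120),
        Conversation("b", 1, 200),
        Conversation("c", 2, 150),
        Conversation("d", 3, 100),
    ]
    for i, block in enumerate(build_review_blocks(queue), start=1):
        print(f"Block {i}:", [c.conversation_id for c in block])
```

A conversation longer than five minutes still gets a block of its own, so nothing is dropped from the queue.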
The Solution
Deployed entire platform within private Azure tenant with separate "Labs" environment for safe experimentation without touching production data.
Impact
Member data never left security boundary; innovation continued unimpeded
The Solution
Focused team of four to five people using AI assistance at every layer (code generation, compliance documentation, test generation), working directly with engineers and claims experts.
Impact
Three months from prototype to production at one-quarter to one-third the cost of traditional engagements
What Shipped
From concept to production in six months.
Production Member-Facing AI Assistant for Insurance Claims
Six months from initial concept, the organisation deployed a production system providing 24/7 multilingual support. Members can explore options, understand eligibility, and initiate claims any time, including evenings and weekends when call centres are closed.
Unexpected finding
For sensitive topics like illness or death claims, some members preferred the privacy and absence of judgement that an AI interaction offers
Operational impact
Cases initiated through the AI assistant arrive more complete and are assessed faster
Reusability
Infrastructure, processes, and expertise developed are reusable for additional use cases
Key Lessons
Don't skip strategy
Move from ideation to validated prototype in three months, then prototype to production in another three, with a small focused team.
Build evaluation capability alongside agent capability
Evaluation systems are as important as the AI itself. Recruit experts for evaluation work during prototyping, not after launch.
Deploy in your environment from day one
The Labs-then-production pattern resolves the security-versus-innovation tension. Start technical work inside your security perimeter.
Learn at production scale
Every production conversation generates evaluation data. Every edge case refined improves the system. Every month of operation builds expertise that compounds.
Freshwater Futures were the right partner to take us from idea to prototype to production. Their small, highly collaborative team iterated in days, not weeks.
What This Means for Financial Services Leaders
Member-facing AI assistants are operational
Processing real claims, supporting real members, operating under real regulatory scrutiny
You're falling behind
Not because you lack technology, but because you're not learning at production scale
The competitive advantage comes from being first to learn.
The organisations still running pilots are falling behind. Not because they lack technology, but because they're not learning at production scale.
The question is whether you'll learn these lessons whilst leading or whilst catching up.
Ready to explore what's possible?
A proof-of-concept engagement typically takes 4–6 weeks. The platform deploys to your private environment, connects to your knowledge base, and produces a working agent using your actual content. This validates the approach with your data, constraints, and stakeholders before you make significant investment decisions.
- ✓ Seamless integration
- ✓ Scalable to your needs
- ✓ Expert support, every step
No obligation — just a friendly chat about what's possible