In this post I explore what it means to be an AI Product Practitioner as opposed to a "ChatGPT user". The proliferation of AI tooling makes being a product person materially more rewarding. We keep hearing that it lets us "operate at speed and scale never possible before", but what does that actually mean?
How AI Changes Product's Job
Aside from using these tools and agents to carry out research, write PRDs in specific formats, analyse data, and produce wireframes, roadmaps, and artefacts (all good stuff that will enable you to operate at scale), the change goes deeper than tooling.
We're no longer specifying requirements for deterministic systems and platforms. LLMs are probabilistic — the same input can produce different results. That changes how we write acceptance criteria, how we define "done", and how we measure success. We need a workable understanding of context windows, temperature settings, hallucination modes, latency trade-offs, and token costs. We need to be able to have credible conversations with Engineering and Architects to make good trade-off calls.
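One practical consequence: acceptance criteria for a probabilistic feature are better written as pass rates over repeated runs than as exact-match assertions. A minimal sketch, with a stubbed `call_model` standing in for a real LLM call (the function, prompt, and predicate are all illustrative):

```python
import random

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM call: output varies between runs.
    return random.choice([
        "Your refund will arrive in 3-5 business days.",
        "Refunds take 3 to 5 business days to process.",
        "I'm not sure, please contact support.",  # the occasional miss
    ])

def pass_rate(prompt: str, accept, n: int = 50) -> float:
    """Run the same prompt n times; return the fraction of outputs
    that satisfy the acceptance predicate."""
    return sum(accept(call_model(prompt)) for _ in range(n)) / n

# Acceptance criterion expressed as a property ("mentions the 3-5 day
# window") with a pass-rate target, not an exact-match output.
rate = pass_rate("When will my refund arrive?",
                 accept=lambda out: "3" in out and "5" in out)
```

"Done" then becomes a statement like "the refund answer passes the property check at least 95% of the time", which is testable against a nondeterministic system.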
Prompt design is a product decision. The system prompts you give a model shape its entire behaviour — that belongs on the product side of the fence. Practice it, experiment, iterate, learn, improve — just like we would with any other new tool or technique.
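If the system prompt is a product artefact, it deserves the same treatment as any other: versioning, a changelog, and a link back to the metric that prompted each change. A hypothetical sketch (product name, wording, and version history are all invented for illustration):

```python
# A system prompt treated as a versioned product artefact rather than
# an engineering implementation detail. Everything here is illustrative.
SYSTEM_PROMPT_V3 = """\
You are a support assistant for Acme Billing.
- Answer billing questions only; offer a human hand-off for anything else.
- If you are unsure of an answer, say so rather than guessing.
- Never state refund amounts you cannot verify from the provided context.
"""

PROMPT_CHANGELOG = {
    "v3": "Added 'say so rather than guessing' after a hallucination-rate review.",
    "v2": "Narrowed scope to billing after off-topic answers in v1 feedback.",
}
```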
How Traditional SaaS Measurements Will Evolve
Traditional product metrics such as conversion, retention, and NPS are all still valid and valuable at the top level, but an AI Product Leader in 2026 is also starting to think about:
- Containment Rate — % of queries fully resolved without human escalation
- Task Completion Rate — did the user accomplish the thing they came to do?
- Hallucination / Error Rate — how often does the model produce confident wrong answers?
- Retrieval Quality — are the right chunks being surfaced, and how are we maintaining freshness and accuracy?
- Token Cost — how expensive is the service to run, and does the cost-benefit analysis stack up?
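These metrics fall out of ordinary interaction logs. A minimal sketch of how they might be computed, assuming a log record with these (invented) fields and an illustrative per-token price:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    # Illustrative log record; field names are assumptions.
    resolved_without_human: bool   # for containment rate
    task_completed: bool           # for task completion rate
    flagged_hallucination: bool    # for hallucination / error rate
    tokens_in: int
    tokens_out: int

def summarise(logs: list, cost_per_1k_tokens: float = 0.002) -> dict:
    """Roll per-interaction logs up into the AI product metrics above."""
    n = len(logs)
    total_tokens = sum(i.tokens_in + i.tokens_out for i in logs)
    return {
        "containment_rate": sum(i.resolved_without_human for i in logs) / n,
        "task_completion_rate": sum(i.task_completed for i in logs) / n,
        "hallucination_rate": sum(i.flagged_hallucination for i in logs) / n,
        "cost_per_interaction": total_tokens / 1000 * cost_per_1k_tokens / n,
    }

logs = [
    Interaction(True, True, False, 120, 80),
    Interaction(True, True, True, 150, 60),
    Interaction(False, False, False, 90, 40),
    Interaction(True, False, False, 200, 110),
]
stats = summarise(logs)
```

Retrieval quality is the one lever that doesn't reduce to a log counter; it typically needs a labelled evaluation set of query-to-chunk relevance judgements.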
Building User Trust with AI Outputs
Because LLM output varies from run to run, product leaders now need to ask: "how do we reliably evaluate the output, and when should there be a human in the loop?" Here are the key levers:
- Confidence Indicators — a numerical representation of how sure the model is of an answer. The technology has limitations (calibration, overconfidence), but implementing confidence indicators as part of any AI customer offering is a must.
- Source Citations — allow users to verify the source of automated answers. Build further trust by flagging uncertainty and surfacing whether other users found the citation useful.
- Graceful Fallbacks — LLM-backed features can fail: API timeouts, rate limits, network outages, model unavailability. Graceful degradation lets the application keep operating with reduced functionality, and having a plan here is critical to building user trust.
- Human-in-the-Loop Guardrails — essential for safe, reliable, and trustworthy systems, especially in highly regulated industries where the cost of wrong AI output is high (Financial Services, Medical Diagnosis, Legal). Establishing risk thresholds and designing workflows that route risky outputs to human review is another critical piece of the product leader's toolkit.
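The last three levers can live in a single call path: retry on transient failure, route low-confidence answers to a human, and degrade gracefully when the model is unavailable. A simplified sketch, with stubbed model and escalation functions (the threshold, function names, and behaviour are all illustrative):

```python
import time

CONFIDENCE_THRESHOLD = 0.80  # illustrative risk threshold

def call_model(query: str):
    # Stub for a real LLM call: returns (answer, confidence).
    return ("Your plan renews on the 1st of each month.", 0.92)

def route_to_human(query: str) -> str:
    # Stub for a real escalation path (ticket, live agent hand-off).
    return f"Escalated to a human agent: {query!r}"

def answer_with_guardrails(query: str, retries: int = 2) -> str:
    """Try the model with backoff on transient failures; send
    low-confidence answers to a human; degrade gracefully otherwise."""
    for attempt in range(retries + 1):
        try:
            answer, confidence = call_model(query)
        except TimeoutError:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        if confidence < CONFIDENCE_THRESHOLD:
            return route_to_human(query)  # human-in-the-loop guardrail
        return answer
    # Model unavailable: reduced functionality, not a dead end.
    return "We can't answer automatically right now; an agent will follow up."

reply = answer_with_guardrails("When does my plan renew?")
```

The product decision is where to set the threshold and what the fallback says, not just whether the code retries.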
Conclusions
The product leaders who matter 2–3 years from now aren't the ones who have automated their backlog grooming, PRD production, and prototyping with an off-the-shelf chatbot. The ones who will stand out are those who can scope AI problems well, measure them honestly, and cultivate human trust incrementally. Knowing when not to use AI will be as important as knowing when to.
Feeling overwhelmed? Don't be: practice makes perfect. Try it, build prototypes, set up agents and evals, and learn from your mistakes. You can also follow people like Aakash Gupta and Carl Vellotti for great tips, techniques, and thinking in the AI Product space.