
The AI Customer Support Agents Handbook

Hello there and welcome 😊!

Thank you for showing interest in AI customer support agents and in this guide.

Before we dive into the real meat, I want to start with a quick premise and a short introduction.

Who am I and why am I writing this?

Fair question. And also: why should you listen to me when it comes to AI customer support agents?

Davide Petruzzi - Founder of OrygoAI

My name is Davide Petruzzi. If you're here, you've probably come across me on LinkedIn, where I've been putting a lot of effort into sharing what I'm learning in this space.

I'm an AI engineer and the founder of OrygoAI, an AI studio I recently launched that focuses on building tailor-made AI agents for customer support and inbound sales.

Over the last two years, I completely fell in love with autonomous AI agents and built a lot of them, especially customer-facing agents.

I've worked with scale-ups and brands like Spiagge.it, ShopCircle, Walliance, Ermenegildo Zegna, CallMeWine, TeamSystem, WeRoad, Utravel, and many others.

In total, I've worked directly on more than 10 AI customer support agents and studied hundreds of them in the wild. And across almost all of them, I noticed the same serious problem:

AI customer support agents often fail and fall short of expectations, not because of technology, but because companies don't know how to properly train and optimize them.

Training AI agents to reach something close to human-quality performance is actually very, very hard. Once I realized this, I decided to go deep on this topic and set myself a pretty clear goal: to become one of the best in this field.

A couple of examples

At Spiagge.it, the AI agent we built handles more than 10,000 conversations per month (both B2C and B2B) during peak season. We obsessively optimize it down to the smallest details, and today it reaches a 70% autonomous resolution rate. That means around 7,000 conversations every month are fully handled by AI, with positive feedback from users.

At ShopCircle, we built the AI agent for ReleasIT, one of their most successful Shopify apps. With 100,000+ customers served, they receive around 6,000 "how-to" questions per month. We worked like madmen to push the AI agent to 99% accuracy on up-to-date answers, routing to human support only the requests that actually require human intervention.

Why I started OrygoAI

I started OrygoAI less than two months ago to solve AI customer experience and performance problems, and go all-in on this goal:

To make it extremely easy for businesses to train and deploy human-quality AI customer-facing agents that deliver great customer support and scale customer operations and inbound sales.

I'm approaching this in three ways:

  • by building OrygoAI as the best possible software to create, train, and customize AI customer support agents
  • by delivering the best possible, done-with-you implementation, training, and customization service for businesses
  • by sharing knowledge and education on how to train AI agents at the highest level possible

About this guide

This guide is basically a bullet-point, hyper-dense summary of the best practices, tips, and lessons I've learned while building AI customer support agents in real production over the past few months.

Does This Guide Work Only with OrygoAI?

Short answer: no.

The advice, best practices, and lessons in this guide apply to any AI customer support agent, regardless of the product or platform you use to build it.

Before building Orygo, I worked with tools like Zendesk, Freshdesk, Intercom Fin, Decagon, Gorgias, Tidio, Chatbase, and many others. Some are better than others, of course, but the core principles of training AI agents are the same everywhere.

So yes, I hope you'll use Orygo, but that's completely up to you.

My goal is simple: to make Orygo the best possible product for training autonomous AI customer support agents, while also delivering top-level implementation services and practical education for businesses that want real results.

With that said, let's start.

Why AI Agent Training Is So Important

You can use the most powerful and expensive AI customer support platform on the market, but if you don't train it properly, it will still fail.

Instead of improving the customer experience, a poorly trained AI agent will frustrate users, create confusion, and end up damaging your business.

From hands-on experience, I've come to a pretty clear conclusion: the performance of an AI customer support agent is about 50% product and technology, and 50% training and ongoing optimization. Both matter equally. Great technology without great training will underperform every single time.

Additionally:

  • AI agents are still a relatively new technology, and there's a big knowledge gap when it comes to how to train them properly.
  • Most tools focus heavily on power, features, and models, but then leave users almost completely alone when it comes to operating, training, and improving the agent in real-world scenarios.

This guide exists to fill these gaps.

Benefits of Well-Trained AI Agents

  • Faster and more consistent customer support, which customers immediately notice and appreciate.
  • Much more scalable support operations, reducing dependence on labor-intensive human workflows.
  • Better use of human talent, allowing your team to focus on high-value work like customer success, sales, and complex or high-stakes requests instead of repetitive questions.
  • Better internal processes through documentation. Most customer operations rely heavily on tribal knowledge. Training an AI agent forces you to write things down, clarify edge cases, and clean up messy processes. In every company where I've implemented AI agents, this alone significantly improved both AI performance and human workflows.

Risks and Damage from Poorly Trained AI Agents (or No AI at All)

On the flip side, poorly trained, or completely absent, AI agents come with serious downsides:

  • Limited scalability, because customer support stays manual, knowledge-heavy, and hard to grow.
  • Inability to provide high-quality, personalized support at scale, especially as your customer or prospect base grows.
  • Frequent mistakes and inconsistent answers, which directly hurt customer satisfaction.
  • Revenue risk. Customer satisfaction is the foundation of retention and growth. Damage satisfaction, and you put customers, revenue, and long-term profitability at risk.

In short, AI agents are not "set and forget" tools. Their success depends just as much on thoughtful training and continuous optimization as on the underlying technology itself.

How AI Customer Service Agents Work

Customer service AI agents are basically a type of LLM agent built on a few core pillars.

At the heart of every agent is the large language model (LLM): this is the engine that understands and generates text. In Orygo, for example, we use frontier models like Claude or Gemini.

On top of the LLM, there are three major components that form the backbone of a customer support agent:

1. Knowledge Base: This is where the AI pulls information to answer customer questions like how-tos, FAQs, and general product or service info.

2. API Actions: These are the operating actions the AI can perform on your systems to resolve inquiries autonomously. Examples include:

  • Fetching a tracking number from Shopify
  • Reading a customer's ticket history
  • Booking a call
  • Opening a ticket
  • Cancelling an order

3. Operating Procedures: This is the bread and butter of customer support AI. Most customer inquiries require multiple steps to resolve. They're rarely "one-shot" questions.

With procedures, we instruct the AI in natural language, step by step, on how to behave depending on the case in front of it. Think of this as the modern replacement for old, rigid if-then workflows.

A simplified example: The escalation procedure

"If a customer asks to speak with a human, Step 1: gather their name and phone number Step 2: open a ticket in the help center summarizing the request Step 3: tell the customer someone from our team will call within 2 hours."

Not all three components need to be fully used for every agent. For instance, some AI customer support agents are just Q&A systems that only use the knowledge base, with no API actions or procedures. Still, it's important to understand how all three work.

How the Flow Works

It's actually pretty simple:

  1. Customer asks a question → AI understands the intent
  2. AI decides what to do
    • Pull from the knowledge base
    • Run a procedure
    • Execute an API action
  3. AI provides the response

Let's dive into each component in more detail.

How to build and optimize the knowledge base

A knowledge base is needed to answer informative questions - how-tos and FAQs, for example.

  • How do I do X on the web app?
  • Are there any discounts at the moment?
  • Will the store in Milan be open on the 31st of December?

These are simple question-and-answer exchanges.

A newly created AI customer support agent has no knowledge. It must be built from scratch.

The best starting point for building a knowledge base is past tickets, usually from the last 3 months: recent enough to still be fresh and relevant. An existing FAQ list or help center with articles can also help.

Let's first start with what NOT to do.

Here are the most common mistakes that lead to a bad knowledge base. And remember: Bad knowledge base = AI mistakes = bad customer experience = money goes away.

  • Simply uploading the FAQs or the help center into the AI agent knowledge without any other work. This is the most common and lazy mistake. 99% of the time, your list of FAQs or help center has many gaps and outdated areas.
  • Not extracting the FAQs from past tickets. Tickets are where real customer questions are. If your knowledge is not based on them, it is mostly fiction and will be full of gaps and inaccuracies.
  • Using tickets that are too old and irrelevant. This can create confusion and mistakes. We don't want the knowledge to be outdated or the AI agent will provide false info and mislead customers.
  • Not structuring the knowledge into FAQs but keeping paragraphs or other non-FAQ forms of content. I will explain below why FAQs are the best way to structure the AI customer support agent's knowledge base.
  • Uploading knowledge by scraping your website. Extremely bad. Websites are full of irrelevant info and are badly structured for knowledge retrieval in customer support.

What to do instead.

Build the initial knowledge base from scratch.

Here is the exact playbook to follow to create a very well-made initial knowledge base from scratch.

In Orygo, we have built a feature called "IMPORTER" that does all of the following work automatically at the click of a button. However, you can still do an amazing job manually, step by step, with free tools online.

Step 1: Get the raw input. Export the last 3 months of real tickets (in CSV or other format) from your help center. Here we have the source of truth. This is what customers actually contact us for. Nothing more real than this.

Step 2: Extract the initial list of FAQs for the knowledge. We need to extract all the questions and inquiries in bullet points from the export file. So take the raw ticket file and put it into ChatGPT, Claude, or Gemini and ask the AI to read everything and extract a list of FAQs.

Tips:

  • Deep research is great for this task.
  • You can also do it with a Python script that reads tickets one by one and appends new questions and inquiries to a final list; this works great (a minimal sketch follows these tips).
  • Split the file and do the extraction multiple times if the number of tickets is large.
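
If you go the scripted route, here is a minimal sketch of that extraction loop. It assumes a hypothetical call_llm() helper that wraps whatever model API you use (OpenAI, Anthropic, Gemini…), and the CSV column name ("body") is an assumption, so adapt it to your help desk's export format.

```python
import csv

def call_llm(prompt: str) -> str:
    # Hypothetical helper: wrap your model provider's API here
    # (OpenAI, Anthropic, Gemini...). Returns the model's text reply.
    raise NotImplementedError

def extract_questions(ticket_csv_path: str, batch_size: int = 20) -> list[str]:
    """Read exported tickets in batches and ask the LLM to extract
    the distinct customer questions/inquiries from each batch."""
    with open(ticket_csv_path, newline="", encoding="utf-8") as f:
        # Assumed column name: "body" holds the ticket text.
        tickets = [row["body"] for row in csv.DictReader(f)]

    questions: list[str] = []
    for i in range(0, len(tickets), batch_size):
        batch = "\n---\n".join(tickets[i : i + batch_size])
        reply = call_llm(
            "Extract every distinct customer question or inquiry from the "
            "support tickets below, one per line:\n" + batch
        )
        questions += [line.strip("-• ").strip()
                      for line in reply.splitlines() if line.strip()]
    return questions
```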

When you have extracted ALL the questions and inquiries from the last 3 months of tickets, then move to answer preparation.

Step 3: Prepare the answers for each FAQ in the knowledge. Now we add the answer to the extracted questions. To do this, we can use AI once again. So with a Python script, or with ChatGPT/Claude/Gemini or whatever AI you want to use, write the FAQ answers one by one, taking raw knowledge from the ticket extraction file (a sketch follows the tips below).

Tips:

  • Don't do question extraction and answer preparation in the same step. Focus the AI on 1 task at a time for better performance.
  • Do FAQ answers 1 by 1 for better focus. Don't batch FAQ answers. RAG in ChatGPT and similar tools works best when channeled towards 1 thing.
  • For easier work, if you do this manually, add the raw tickets file into a project/folder in ChatGPT so that it will automatically use that as a source for answering the FAQs.
  • NotebookLM is also a good free tool to do this.
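
If you script this step, here is a minimal sketch of the answer-writing loop, one FAQ at a time. It reuses the hypothetical call_llm() placeholder from the extraction sketch above, and the prompt wording is purely illustrative.

```python
def write_answers(questions: list[str], raw_tickets: str) -> dict[str, str]:
    """Write each FAQ answer one at a time (not batched), grounding the
    model in the raw ticket export so answers come from real cases.
    Reuses the hypothetical call_llm() placeholder defined earlier."""
    answers: dict[str, str] = {}
    for question in questions:
        answers[question] = call_llm(
            "Using ONLY the support tickets below as your source, write a "
            f"concise, accurate answer to this FAQ: '{question}'. If the "
            "tickets don't contain the answer, reply 'UNKNOWN'.\n\n"
            "Tickets:\n" + raw_tickets
        )
    return answers
```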

Step 4: Human quality check. At this point, we have a rough but complete list of FAQs to add to the knowledge base of our AI agents. But most of the extraction and writing work has been done by AI. It is extremely important that a member of the team reviews this work before putting it into the AI agent:

  • Read all the answers and ensure they are correct.
  • Ensure the list of FAQs is as complete as possible based on your experience. It can easily happen that by doing this work manually with free tools, some questions will be missed.

Good, we have built a human-approved FAQ knowledge base for our agent. If you do this manually, this will take you a few hours of work. If you use our IMPORTER feature, you only have to do step 4.

Very important: If you already have an initial list of FAQs or a help center, you will start from a better starting point and everything will be much faster, but do steps 1, 2, and 3 on recent tickets anyway. It's 100% guaranteed that many customer questions will not be present in the help desk because most stuff is tribal knowledge in customer service.

You may think this is over, right? No, baby, it's just the start.

Let's now work on OPTIMIZING the knowledge base for the AI.

Optimize the knowledge base for the AI

To properly optimize a knowledge base for AI, you must know how semantic similarity works behind the scenes.

Here is a crash course in a few bullet points, without going too deep into the details (a tiny code sketch follows the list):

  • AI doesn't actually read text; it transforms text into vectors of numbers (this process is called embedding). Roughly speaking, these numbers represent the "meaning" of the text.
  • Similarity is calculated by measuring the distance between these vectors. If two pieces of text have vectors that point in nearly the same direction (are "close" to each other), the AI considers them semantically similar.
  • This allows the AI to match "intent" rather than just keywords. Because "canine" and "dog" will share a similar space in the vector map, the AI knows they are related even though they share no common letters.
  • Now, when a customer asks a question that we need to answer using the knowledge base, the AI agent uses semantic similarity to identify which FAQs in the knowledge base are MOST relevant (i.e., most similar) to the customer's question.
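
To make "distance between vectors" concrete, here is a tiny self-contained sketch with made-up 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: ~1.0 = same direction (similar meaning),
    ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in a real system these come from an embedding model.
dog     = [0.90, 0.80, 0.10]
canine  = [0.85, 0.82, 0.15]  # points almost the same way as "dog"
invoice = [0.10, 0.20, 0.95]  # points somewhere else entirely

print(cosine_similarity(dog, canine))   # ~0.99 -> semantically similar
print(cosine_similarity(dog, invoice))  # ~0.29 -> unrelated
```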

Our goal is to OPTIMIZE the knowledge so that it is easier and more effective for the AI to identify and retrieve the right FAQs, minimizing mistakes.

And this is also the crucial reason why we structure the AI agent's knowledge as FAQs: the customer's question is matched much more easily and effectively against the title of an FAQ, because the title gets as close as possible in meaning and similarity to the customer's question.

Customer asks "how can I do X" → our FAQ title will mostly be "how can I do X". Perfect match.

To optimize, it is fundamental that we add two more things:

a. A list of similar intent questions.

Basically, for each core question, we enhance the title by adding similar intent questions, alternative words, and synonyms so that the meaning is strengthened and retrieval is more successful.

Example: Optimizing the "Refund" FAQ

1. The Standard (Weak) Approach

FAQ Title: "How do I request a refund?"

The Problem: If a user asks, "Can I get my money back?", the AI might miss this if its training on the link between "refund" and "money back" isn't strong enough in that specific context. The vector for "money back" might be slightly too far from "refund" in the geometric space.

2. The Optimized Approach (The "Cluster" Strategy)

We strengthen the FAQ by manually attaching a list of alternative ways users ask the same thing. This creates a larger "gravity well" in the vector space, catching more variations.

Primary Question: "How do I request a refund?"

Optimized List of Similar Intents:

  • "I want to get my money back." (Catches the "money" keyword)
  • "Can I return this item for a full reimbursement?" (Catches "reimbursement")
  • "I am not happy with the product and want a reversal of the charge." (Catches emotional intent + technical banking terms)
  • "What is the process to cancel my purchase and be credited?" (Catches "credit" and "cancel")
  • "Issue a chargeback." (Catches aggressive/banking terminology)

b. A description of the FAQ that defines when to use the FAQ and what it contains.

This is especially important when FAQs are long, how-to-style guides, because it gives the AI agent visibility into the content of each FAQ during retrieval. In fact, in most advanced FAQ similarity search and retrieval systems, the answer is not included in the similarity evaluation because it dilutes the core meaning of the FAQ too much; only the FAQ title, the intents, and the description are included. The description helps by giving the AI concise visibility into the content of the answer.

Example: Optimizing a "How-to" Guide

The Scenario: You have a long, 10-step technical guide on how customers can migrate their data from an old platform to your new one.

1. FAQ Title (The Core Goal)

"How do I migrate my data from my old account?"

2. Similar Intent Questions (The Variations)

  • "I want to move my files from my previous provider."
  • "Steps to import my database into this platform."
  • "Transferring my history from another app."

3. FAQ Description (The Semantic "Anchor")

"This guide provides step-by-step instructions for the bulk transfer of user data, including file formats supported (.csv, .json), estimated timeframes, and troubleshooting for common import errors."
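
Putting this together in code form: a hedged sketch of how a retrieval record for this FAQ might be assembled, assuming a placeholder embed() function standing in for your embedding model. The key design choice, as described above, is that only the title, intents, and description are embedded for similarity search, while the long answer body is stored separately and only returned to the LLM after retrieval.

```python
from dataclasses import dataclass

def embed(text: str) -> list[float]:
    # Placeholder for your platform's embedding model call.
    raise NotImplementedError

@dataclass
class FAQRecord:
    title: str
    intents: list[str]
    description: str
    answer: str                       # full answer: stored, NOT embedded
    vector: list[float] | None = None

    def index(self) -> None:
        # Embed ONLY title + intents + description; including the long
        # answer would dilute the core meaning of the FAQ.
        searchable = "\n".join([self.title, *self.intents, self.description])
        self.vector = embed(searchable)

faq = FAQRecord(
    title="How do I migrate my data from my old account?",
    intents=[
        "I want to move my files from my previous provider.",
        "Steps to import my database into this platform.",
        "Transferring my history from another app.",
    ],
    description=("Step-by-step bulk data transfer guide: supported file "
                 "formats, estimated timeframes, and import troubleshooting."),
    answer="(the full 10-step migration guide goes here)",
)
faq.index()
```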

How to Set Up Procedures and API Actions

Alright, now that we've built the knowledge base, it's time to tackle Operating Procedures and API Actions.

These are two separate things, but I like to treat them together because procedures basically use API actions.

  • Operating Procedures – Step-by-step instructions the AI follows when it needs to handle an operational inquiry.
  • API Actions – HTTP calls (GET, POST, PUT, DELETE) on your systems that the AI can execute to complete a step in a procedure.

API Actions

In Orygo, we have two types of API actions:

1. Native API Actions

  • Built-in integrations available directly in OrygoAI.
  • Examples (Shopify):
    • Read order status and tracking number
    • Cancel an order
    • Fetch product details

2. Custom API Actions

  • Custom integrations you can add with a URL and a schema (an illustrative sketch follows).
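
For illustration only, here is what a custom action definition could look like, expressed as a Python dict. The field names and URL are hypothetical, not Orygo's actual schema, so check your platform's docs for the exact format.

```python
# Hypothetical custom action definition (illustrative field names and URL,
# not Orygo's actual schema).
cancel_order_action = {
    "name": "cancel_order",
    "description": "Cancel an unfulfilled order and email the customer "
                   "a cancellation confirmation.",
    "method": "POST",
    "url": "https://api.example-shop.com/orders/{order_id}/cancel",
    "parameters": {
        "order_id": {"type": "string", "description": "The order to cancel"},
        "reason": {"type": "string", "description": "Cancellation reason "
                                                    "given by the customer"},
    },
}
```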

Tips for API Actions:

  • Don't overload your agent. Keep the number of actions below 20 if possible.
  • Too many actions = prompt overload, higher chance of mistakes, higher LLM costs.

Operating Procedures

Procedures are step-by-step instructions for the AI to follow. They are crucial because most customer inquiries aren't solved with a single answer; they require multiple steps.

Here's how to write them properly:

1. Procedure Title

  • Should be simple and descriptive so the AI understands it immediately.
  • Example: cancellation_procedure

2. When to Use

  • This tells the AI when to trigger this procedure.
  • Include:
    • What the procedure does
    • When it should be used

Example:

Use this procedure when the customer wants to cancel an order. This procedure contains step-by-step instructions to cancel an order on Shopify.

3. Procedure Instructions

  • Always use step-by-step instructions.
  • For complex cases, you can use simple if/then conditions.

A simplified example: the Cancellation Procedure

  1. Ask the customer for the order number, email, and reason for cancellation.
  2. Use get_order_status API action to check shipping status.
  3. If the order is not fulfilled, proceed with cancellation (steps 4–5). If it's already shipped, tell the customer a return must be made.
  4. Execute cancel_order action if the order is not fulfilled. Confirm the cancellation and notify the customer via email.
  5. Ask if you can help with anything else.

Special Procedures

Some procedures are almost always needed across customer support agents.

Here are some simplified versions of the most common ones.

1. Identity Verification Procedure

  • When to use: At the beginning of the conversation or before any sensitive API action.
  • Instructions:
    1. Request customer email and account number.
    2. Use fetch_user_record API action.
    3. Compare provided details with system data. If they match, continue; if not, ask the customer to verify or escalate.

2. Out-of-Scope / General Inquiry Procedure

  • When to use: When the customer asks something outside your business scope.
  • Instructions:
    1. Politely acknowledge the question.
    2. Explain your specialization (e.g., shipping, returns, product info).
    3. Redirect to relevant FAQs or services.

3. "I Can't Find an Answer" Procedure (Safety Net)

  • When to use: When the AI can't find relevant content in the knowledge base.
  • Instructions:
    1. Apologize and say the info isn't available.
    2. Don't hallucinate or guess.
    3. Offer the Escalation Procedure so a human can handle it.

4. Escalation Procedure (Requested Escalation)

  • When to use: Customer explicitly asks to "speak to a human" or "transfer to an agent."
  • Instructions:
    1. Acknowledge and empathize: "I understand. I'll get a human support team member to help you right away."
    2. Collect context (email, description) if missing.
    3. Check business hours:
      • During hours: connect to a live agent.
      • Outside hours: create a priority ticket and inform the customer of response time.
    4. Execute API action (e.g., zendesk_create_ticket) including chat transcript.
    5. Provide ticket reference and close politely.

5. Handoff to Human (Emergency Escalation)

  • When to use: If the AI detects frustration, repeated questions, or profanity.
  • Instructions:
    1. Acknowledge frustration: "I'm sorry I haven't resolved this yet."
    2. Prioritize the case for a human agent.
    3. Use API action (create_zendesk_ticket) and tag as "Priority_Support."

6. AI Triage to specialized Human teams (Programmed Escalation)

  • When to use: If the customer needs support with topic X.
  • Instructions:
    1. Ask the customer what the issue with topic X is.
    2. Summarize the issue for our human team.
    3. Send a high-priority message into the "topic X" Slack channel.

How to train and optimize the Agent: playbook and best practices

At this point, you have pretty much set up the bulk of your AI agent.

It is a lot of work, right? And you think you are ready to go live? Consider this: in my experience, at this stage, even if everything is done very well, AI customer service agents fail 3 inquiries out of 10.

A 30% error rate is a very high risk of damage for customer satisfaction.

We worked a lot, but in the end we have simply built a good knowledge base and some operating procedures. Now comes the hard part: the real training. What we have done so far is merely the initial setup.

Getting to 70% was easy, true: if you worked well, setting everything up took maybe 1 or 2 days. But going from 70% to 9X% accuracy is a tedious job and will require multiple sessions of work. Each percentage point is increasingly difficult to achieve and requires consistent analysis and optimization.

Here is the playbook.

1. What we are going to do now is a series of rounds of tests. Real-world tests, meaning we will take real NEW customer questions (not the ones we used for the knowledge base) and give them to the AI agent.

2. Before starting, we must appoint 1 member of the customer service team as the AI Quality Auditor. This is our domain expert. This person knows what a good answer looks like vs. a bad one, what a correct procedure looks like vs. a bad one.

3. Each round is approximately 50-100 inquiry simulations. Do more if necessary, or fewer if that is too many for your specific situation.

4. For each round, the Quality Auditor, aided by an AI engineer if possible:

  • Simulates the conversation with the AI agent
  • Rates the answer good or bad (useful for tracking and easy filtering later)
  • For each bad answer: one by one, analyzes the reasoning, tool calls, inputs, and outputs. It is absolutely crucial to eliminate the AI black box and investigate, step by step, everything the AI agent did behind the scenes in order to optimize it
  • Optimizes the cases ONE BY ONE (a minimal harness sketch follows this list)
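
Here is a minimal sketch of what one test round can look like as a script. It assumes a hypothetical ask_agent() function that sends a question to your deployed agent and returns the answer together with its audit trail; the good/bad rating stays human, exactly as described above.

```python
def ask_agent(question: str) -> dict:
    # Hypothetical: call your deployed agent and return its answer plus
    # the audit trail (reasoning, tool calls, retrieved FAQs...).
    raise NotImplementedError

def run_test_round(test_questions: list[str]) -> list[dict]:
    """One round: replay real NEW customer questions and collect the
    Quality Auditor's good/bad rating for each answer."""
    results = []
    for question in test_questions:
        outcome = ask_agent(question)
        print("Q:", question)
        print("A:", outcome["answer"])
        rating = input("Rate this answer (good/bad): ").strip().lower()
        results.append({"question": question, "rating": rating, **outcome})
    good = sum(r["rating"] == "good" for r in results)
    print(f"Round accuracy: {good / len(results):.0%}")
    return results  # filter rating == "bad" and analyze them one by one
```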

If you need to fix & optimize cases where knowledge has been used:

  • Analyze the reasoning = what did the AI understand and what was the logic behind its behavior
  • Analyze the search query = what search query did the AI craft to find the right FAQs in the knowledge base
  • Analyze the FAQs retrieved = what FAQs were retrieved, how many, were they the right ones?
  • Analyze the answer = why was the answer wrong, in which step did the AI make errors

To optimize the knowledge base:

  • If bad response was due to gaps in the knowledge base FAQs: fill the gaps by adding a new FAQ or by adding the missing info to the current FAQs
  • If bad response was due to a poorly written search query: improve search query writing instructions
  • If bad response was due to retrieval issues (not finding the right FAQs) → improve titles, intent questions, and descriptions
  • If bad response was due to bad understanding of the FAQ content → write the FAQ answer content in a simpler, clearer way or split long complex FAQs into smaller, easier to digest FAQs

If you need to fix & optimize cases where procedures and actions have been used:

  • Analyze the reasoning = what did the AI understand and what was the logic behind its behavior
  • Analyze the tool call decisions and steps = did the AI pick the right procedure/action to execute? Did the AI correctly follow the procedure? Did it miss some steps? etc.
  • Analyze tool call input = was the input data for the tool call correct?
  • Analyze tool call output = was the API response correct? Were there some errors?
  • Analyze answer = what was the final answer

To optimize procedures and actions:

  • If bad response was due to bad decision regarding which procedure/action to follow: write better "when to use", or reduce the number of procedures/tools for easier, simplified orchestration
  • If bad response was due to difficulties in following a procedure's steps: simplify the procedure, write simpler and fewer steps. Formulate the steps in a clearer, more straightforward way
  • If bad response was due to action call failure caused by bad inputs: improve the action description and schema to provide better guidance to the AI agent
  • If bad response was due to action call failure caused by API errors: ensure custom API actions are correct and server is working properly

5. Repeat for multiple rounds until the rate of good answers goes above 90%, ideally above 95%.

We consider this a great performance rate. Reaching 100% will always be impossible: companies evolve and products change, so there will always be new, never-seen-before questions and potential issues to train the AI on.

6. When you reach a satisfactory rate of correct answers, deploy and go live.

After go-live, run recurring AI quality audits on real AI-customer conversations: read random samples of conversations and optimize detail by detail based on where the AI is lacking.

The frequency of AI audits depends on the volume of inquiries and the rate of business change. In 90% of cases, best practice is to run them weekly on random samples of roughly 100 questions.

Helpful Orygo Features

Now, all of this can be done manually on various top platforms. But with Orygo we want to build the absolute best platform for training AI agents, so we have built several features that make this process much easier.

Conversations Audit Logs: In the conversation section, you will find the Audit Logs: detailed information for each answer, including the user data, the AI reasoning, the tool calls executed, the use of the knowledge base, and any errors behind it.

AI monitoring and tags: An AI monitoring agent works in the background and automatically labels relevant conversations and questions with useful tags for you. For example:

  • Missed → tagged when the AI was not able to answer
  • Misstep → tagged when the AI didn't follow a procedure properly

Automatic topic analysis: Our topic analysis feature automatically extracts the most frequently asked topics (questions, intents, issues…) from AI-customer conversations and aggregates stats by topic, so you can investigate and optimize the topics where the AI performs poorly and go straight to where the fire is.

Key Metrics for AI Support Agents

1. Work Completed KPIs

  • Nr of Conversations:

    • The total count of unique customer sessions initiated with the AI.
    • Why it matters: This is the baseline for calculating Deflection Rate. It shows the volume of work the AI is taking off the human team's plate.
  • Nr of Questions:

    • The total number of individual prompts or queries sent by users across all sessions.
    • Why it matters: Helps identify the "chattiness" of customers and the complexity of their needs. High volume here can indicate either high engagement or a struggle to get a direct answer.
  • Avg Questions per Conversation:

    • The total number of questions divided by total conversations.
    • Why it matters: A high average often signals a "Looping" or "Comprehension" issue, where the AI is failing to resolve the intent quickly, forcing the user to rephrase or ask follow-ups. (A computation sketch follows this list.)
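
These volume KPIs are straightforward to compute from raw conversation logs. A minimal sketch, assuming each conversation is a dict carrying a list of its customer questions (the field name is an assumption):

```python
def work_completed_kpis(conversations: list[dict]) -> dict:
    """Baseline volume KPIs from raw logs. Assumes each conversation
    dict has a "questions" list (one entry per customer query)."""
    nr_conversations = len(conversations)
    nr_questions = sum(len(c["questions"]) for c in conversations)
    return {
        "nr_conversations": nr_conversations,
        "nr_questions": nr_questions,
        "avg_questions_per_conversation": (
            nr_questions / nr_conversations if nr_conversations else 0.0
        ),
    }
```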

2. Quality of Performance KPIs

  • % of Resolved Conversations:

    • The percentage of sessions where the user's issue was fully addressed without human help.
    • Why it matters: This is the "North Star" metric for ROI. Using DAIR (Direct AI Resolution) surveys provides ground-truth data rather than just guessing based on a closed window.
  • % of Good vs. Bad Answers:

    • A ratio based on user "thumbs up/down" or AI-automated QA scoring.
    • Why it matters: Helps pinpoint specific "Knowledge Gaps" or hallucination risks in the AI's training data.
  • % of Missed (Fallback Rate):

    • How often the AI triggers an "I don't know" or "I didn't understand that" response.
    • Why it matters: Directly tracks Intent Recognition health. High missed rates mean your AI needs more training on specific synonyms or topics.
  • % of Escalations / Deflection Rate:

    • The rate at which the AI hands off a session to a human agent.
    • Why it matters: Essential for capacity planning. If this is high, the AI is merely a "gatekeeper" rather than a "solver," which can frustrate customers.
  • % Errors:

    • Technical failures (API timeouts, crashes, or incorrect UI elements).
    • Why it matters: Monitors the stability of the integration. Even a smart AI is useless if the system connection is broken.
  • Hallucination Rate:

    • Percentage of answers flagged by AI auditors as factually inconsistent with your docs.
    • Why it matters: Protects your brand from legal or reputational damage caused by "made up" policies.

3. Cost KPIs

  • Credits/Monetary Cost per Answer:

    • The specific cost (API fees + compute) for a single response.
    • Why it matters: Helps you identify if specific complex topics (requiring more RAG/search) are becoming too expensive to automate.
  • Credits/Monetary Cost per Conversation:

    • The total cost from the "Hello" to the "Resolved."
    • Why it matters: Used to compare against Cost Per Ticket for human agents. If an AI session costs $2.00 and a human costs $5.00, your ROI is clearly defined.
  • Input/Output Tokens per Answer:

    • The raw data volume processed by the LLM.
    • Why it matters: Critical for latency and optimization. More tokens usually mean slower response times and higher costs. (A cost sketch follows this list.)
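
A minimal sketch of how these cost KPIs can be derived from token counts, assuming your platform exposes per-answer token usage and you know your model's per-token prices (the field names are assumptions):

```python
def cost_kpis(conversations: list[dict],
              price_per_input_token: float,
              price_per_output_token: float) -> dict:
    """Token-based cost KPIs. Assumes each conversation dict has an
    "answers" list of {"input_tokens": int, "output_tokens": int}."""
    answers = [a for c in conversations for a in c["answers"]]
    total_cost = sum(
        a["input_tokens"] * price_per_input_token
        + a["output_tokens"] * price_per_output_token
        for a in answers
    )
    return {
        "cost_per_answer": total_cost / len(answers) if answers else 0.0,
        "cost_per_conversation":
            total_cost / len(conversations) if conversations else 0.0,
    }
```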

4. AI Insights

  • 1. Topic & Issue Emergence

    • This KPI tracks the specific themes, product features, or pain points that trigger customer inquiries. It moves beyond simple keyword counting to identify the "Why" behind the contact.
    • Why it matters: This acts as an early warning system. By identifying new questions or recurring issues in real-time, your product and marketing teams can address the root cause of customer friction before it scales.
  • 2. Sentiment Analysis (Conversation vs. User)

    • Sentiment analysis measures the emotional tone of an interaction. Analyzing it by conversation shows how a specific issue was handled, while analyzing it by user tracks the long-term emotional health and loyalty of a specific customer across multiple touchpoints.
    • Why it matters: It provides a "gut check" on AI performance. A fast resolution is a failure if the customer leaves feeling frustrated; tracking sentiment ensures your AI is maintaining brand voice and customer satisfaction.
  • 3. Potential Lead & Customer Tagging

    • This involves the AI identifying "buying signals" or specific high-value personas during a support chat and automatically flagging them for the sales or account management teams.
    • Why it matters: It transforms your support center from a cost center into a revenue generator. By instantly routing potential leads to the right humans, you ensure that no growth opportunities are lost in a sea of support tickets.

Extra tips and best practices for optimizing performance and instructions

  • Use powerful models. Customer support ops are all about RELIABILITY. Based on my experience, I strongly recommend using only agents that are based on the best possible models, for example Gemini 3.0 or Claude 4.5. You can use smaller, lighter, open-source models, but as I said, it is a matter of reliability. What matters is reducing the margin of error down to the smallest possible percentage. It is not because we need power for the sake of having brainpower in the agent; it is because we are dealing with customers and we want our agents to produce as few hallucinations and mistakes as possible. Top frontier models are good at that.

  • Whether to use reasoning mostly depends on how complex your customer service operations are. In my experience, it is not really needed for straightforward Q&A support agents and can be traded away for faster answers (better latency). However, if operating procedures are involved, we definitely want reasoning power in our agent, because missing a step in a procedure could mean botching the entire case. We don't want that.

  • Provide strong context to your AI agents.

    • About the company and products: Always provide good context about your company and products. This gives the AI support agent the relevant information to produce answers that are better contextualized to the company it is working for.
    • About its role: Clearly define and describe the AI customer support agent's role.
    • About the channel it is working on: Clearly specify which channel the AI agent is working on, so that it can adapt the style of conversations based on it (e.g., email vs. chat vs. phone vs. Slack…).
  • Add a glossary to the AI. It helps the AI agent understand company-specific terms and tone of voice.

  • Separate AI support channels from human support channels, for example AI on chat and humans on phone or email. I found:

    • Better satisfaction, because users clearly know where they can get AI support and where they can find human support
    • Better processes and escalations, and more effective human performance, thanks to the separation
    • No misunderstanding from the user's perspective about who is providing support, AI or humans
    • Elevation of human support to a true concierge experience via calls and dedicated channels

Alright, this is everything I know about AI customer support agents.

If you've read this far, it means you are really, really interested in building one for your business too. Maybe with Orygo!

Let's have a chat if you want. We would love to help!

About this guide: keep coming back every now and then because I am going to add more and more content every couple of weeks. I will fill it with case studies, new lessons, and much more.

Cheers, Davide