Building a customer service agent that does no harm

Customer service is the first place everyone thinks they can drop in an AI agent. It is also the first place everyone gets burnt. Let me describe what has worked for us in production, what we had to dismantle, and the six controls we always keep on.

The spectrum: front-line vs co-pilot

First strategic decision: does the agent talk to the customer directly, or does it talk to the team that talks to the customer?

Front-line (the agent answers the customer itself). Pros: 24/7 coverage, near-zero marginal cost per ticket, native multilingual. Cons: every mistake goes out the door, the customer sees it, the screenshot lands on LinkedIn.

Co-pilot (the agent prepares the draft for a human colleague, who then sends it). Pros: minimal reputational risk because nothing goes out unreviewed, the colleague can always step in, the team learns from the agent as much as the agent learns from the team. Cons: it does not scale beyond the number of colleagues you have.

For most of our SME clients we start in co-pilot. After 4-6 weeks, once we have data on the colleague's intervention patterns ("on types A, B, C the agent is right 95% of the time"), we promote only those types to front-line. Never everything and never straight away.
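The promotion rule can be sketched in a few lines. This is a minimal illustration, not our production code: it assumes "the agent is right" is approximated by the colleague sending the draft unchanged, and the names `types_ready_for_front_line` and `min_samples` are hypothetical (the minimum sample size in particular is an added assumption, so that a rare ticket type is not promoted on a handful of lucky drafts).

```python
from collections import defaultdict


def types_ready_for_front_line(reviews, min_accuracy=0.95, min_samples=50):
    """Return the ticket types whose co-pilot drafts were accepted often enough.

    reviews: iterable of (ticket_type, sent_unchanged) pairs taken from the
    co-pilot logs, where sent_unchanged is True when the colleague sent the
    agent's draft without editing it.
    """
    totals = defaultdict(int)
    unchanged = defaultdict(int)
    for ticket_type, sent_unchanged in reviews:
        totals[ticket_type] += 1
        if sent_unchanged:
            unchanged[ticket_type] += 1
    # Promote a type only when it has enough data AND clears the accuracy bar.
    return sorted(
        t for t, n in totals.items()
        if n >= min_samples and unchanged[t] / n >= min_accuracy
    )
```

Everything not returned by this function stays in co-pilot, which is the "never everything" half of the rule.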

The six controls we always keep on

For every customer service agent we put into production — front-line or co-pilot does not matter — there are six controls that stay on. Removing one, even temporarily, is a decision that requires CEO-level sponsorship.

1. Whitelist of callable tools. The agent cannot perform actions you have not explicitly authorised. No "do whatever you need". Mutating actions (refunds, cancellations, reassignments) are ALWAYS on a small whitelist, and typically require human-in-the-loop for the first few weeks.

2. Hard thresholds on values. The agent can authorise a refund up to €X. Above €X, mandatory human escalation. €X starts low and grows over time.

3. Allowed languages/channels. The agent speaks Italian and English fluently — for other languages we respond with a human fallback until we have verified the quality. No "automatic language selection" on the first round.

4. Per-customer frequency throttling. If the same customer writes 8 times in 20 minutes, they are probably either frustrated or a bot. The agent hands off to a human past a threshold.

5. Blocklist of words and patterns. Insults, legal references, sensitive words (suicide, severe illness, grief): automatic triggers for a human. The agent does not respond on its own.

6. Full logging + randomised audit. Every conversation recorded. One week in four, the team takes a 5% sample and reviews the agent's responses one by one, the way you would review code. You find the patterns that break, and you update the prompts.
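To make the controls concrete, here is a minimal sketch of a gate that runs before the agent is allowed to answer or act, plus the audit sampler. Treat all of it as assumption: the tool names, the €50 limit standing in for €X, the example blocklist words, and the function names `gate` and `audit_sample` are hypothetical placeholders, and a real blocklist would be far longer than four words.

```python
import random
import re
import time
from collections import defaultdict, deque

ALLOWED_TOOLS = {"lookup_order", "refund", "cancel_order"}  # control 1 (hypothetical names)
REFUND_LIMIT_EUR = 50.0                                     # control 2 (placeholder for €X)
ALLOWED_LANGUAGES = {"it", "en"}                            # control 3
MAX_MESSAGES, WINDOW_SECONDS = 8, 20 * 60                   # control 4
BLOCK_PATTERNS = re.compile(                                # control 5 (tiny example list)
    r"\b(lawyer|lawsuit|suicide|funeral)\b", re.IGNORECASE
)

_recent = defaultdict(deque)  # customer_id -> timestamps of recent messages


def gate(customer_id, text, language, tool=None, amount_eur=0.0, now=None):
    """Return 'agent' if the agent may proceed, otherwise 'human:<reason>'."""
    now = time.time() if now is None else now
    # Record every message first, so blocked messages still count for throttling.
    stamps = _recent[customer_id]
    stamps.append(now)
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()
    if BLOCK_PATTERNS.search(text):
        return "human:blocklist"
    if language not in ALLOWED_LANGUAGES:
        return "human:language"
    if len(stamps) >= MAX_MESSAGES:
        return "human:throttle"
    if tool is not None and tool not in ALLOWED_TOOLS:
        return "human:tool_not_allowed"
    if tool == "refund" and amount_eur > REFUND_LIMIT_EUR:
        return "human:refund_over_limit"
    return "agent"


def audit_sample(conversation_ids, rate=0.05, seed=None):
    """Control 6: draw a random sample (5% by default) of conversations to review."""
    ids = list(conversation_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, k)
```

The design choice worth copying is that every failed check returns a reason, not just a refusal: the reason goes into the log and into the hand-off, so the colleague knows which control fired.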

Patterns that work

Tier-1 responses. Questions like "where is my order", "how do I return this", "what are your opening hours". Typically 50-70% of volume. The agent is excellent here, and the human only sees the exceptions.

Contextual summary before escalation. When the agent hands off to a colleague, it gives the context in a few lines: who the customer is, what they asked, what was tried, why it is escalating now. The colleague responds in half the time.
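One way to keep the hand-off note disciplined is to give it a fixed shape. The sketch below is illustrative only: the `Handoff` class and its field names are hypothetical, chosen to mirror the four questions above.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    customer: str                                       # who the customer is
    request: str                                        # what they asked
    attempted: list[str] = field(default_factory=list)  # what was tried
    reason: str = ""                                    # why it is escalating now

    def render(self) -> str:
        """Format the note as one short line per question."""
        tried = "; ".join(self.attempted) or "nothing yet"
        return "\n".join([
            f"Customer: {self.customer}",
            f"Asked: {self.request}",
            f"Tried: {tried}",
            f"Escalating because: {self.reason}",
        ])
```

For example, `Handoff("Maria Rossi, 3 prior orders", "Refund for damaged item", ["sent return label"], "refund above limit").render()` gives the colleague everything needed without opening the transcript.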

Multilingual on inbound tickets, not on delicate outbound responses. The agent understands and classifies the ticket in any language. The final answer, if delicate, is reviewed by a native-speaker human.

Light up-sell, never pushy. "The order arrives tomorrow. Interested in express shipping for future purchases, with a 10% discount?" — fine once, never twice in the same ticket. If we push, the customer pushes back.

Anti-patterns (things we do NOT do)

No "self-healing" automation on value decisions. If the agent gets a refund wrong and the customer complains, the agent does not reverse it on its own. The case goes into the human queue. Auto-correction on these flows is a landmine.

No fine-tuning on real conversations without consent. The customer did not sign up to train your model. Use prompt + RAG on FAQs you wrote yourselves. If you really must fine-tune, use synthetic data or explicit consent.

No "generic agent" for everything. One agent for customer service, one for pre-sales, a third for technical support. Same technology, different prompts, different tools, different thresholds. Mixing them makes them all mediocre.

No bot disguised as a human. We always say "you are talking to our automated assistant, if you prefer a human type HUMAN or press 1". Transparent, no ambiguity. Companies that pretend lose trust the day the pretence breaks.
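The disclosure and the opt-out are cheap to implement, which is part of why there is no excuse to skip them. A minimal sketch, with hypothetical function names `first_reply` and `wants_human`:

```python
DISCLOSURE = (
    "You are talking to our automated assistant. "
    "If you prefer a human, type HUMAN or press 1."
)


def wants_human(message: str) -> bool:
    """True when the customer used one of the advertised opt-out keywords."""
    return message.strip().upper() in {"HUMAN", "1"}


def first_reply(answer: str) -> str:
    """Prefix the first message of a conversation with the disclosure."""
    return f"{DISCLOSURE}\n\n{answer}"
```

The check runs on every inbound message before anything else, so the opt-out works at any point in the conversation, not only at the start.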

The case that taught us the most

A hotel group, 40 properties, six languages. We wanted to start front-line to demonstrate scalability. Fortunately we started in co-pilot instead, if only for two weeks. We noticed that in 12% of conversations the agent produced technically correct but cold answers, and customers reacted worse than usual to them (NPS dropped in A/B tests).

We rewrote the system prompt placing heavy weight on tone ("you are calm, generous, conversational, not bureaucratic — always greet by name, thank them, close with a wish appropriate to the occasion"). Three days of tuning. Promoted to front-line after another two weeks of observation. Today it handles 70% of conversations without a human.

If we had started front-line from the beginning, we would have had two weeks of NPS in free fall before noticing. The client sponsor would have shut it down.

A final rule

A customer service agent in production is measured, not believed. No "it seems to be working". A dashboard with: resolution rate without escalation, post-conversation NPS, average times, frequency of human interventions. Weekly, shared with the client sponsor. If the numbers do not improve after 4 weeks of tuning, the agent was not ready, and it gets shut down.
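As a rough sketch, the four dashboard numbers can be computed from the conversation log like this. The `Conversation` fields and function names are hypothetical. NPS uses the standard definition (share of 9-10 scores minus share of 0-6 scores). The shutdown check is one possible reading of "do not improve after 4 weeks", and it looks only at resolution rate for brevity.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Conversation:
    escalated: bool             # handed off to a human
    human_edits: int            # how many times a colleague intervened
    handle_seconds: float       # time from first message to resolution
    nps: Optional[int] = None   # 0-10 survey score, None if not answered


def weekly_metrics(conversations):
    """Compute the four dashboard numbers for one week of conversations."""
    n = len(conversations)
    if n == 0:
        raise ValueError("no conversations this week")
    scores = [c.nps for c in conversations if c.nps is not None]
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return {
        "resolution_rate": sum(not c.escalated for c in conversations) / n,
        "nps": 100 * (promoters - detractors) / len(scores) if scores else None,
        "avg_handle_seconds": mean(c.handle_seconds for c in conversations),
        "human_intervention_rate": sum(c.human_edits > 0 for c in conversations) / n,
    }


def should_shut_down(weekly_resolution_rates, weeks=4):
    """True if the latest rate is no better than the rate `weeks` weeks earlier."""
    if len(weekly_resolution_rates) <= weeks:
        return False  # not enough history to judge yet
    return weekly_resolution_rates[-1] <= weekly_resolution_rates[-1 - weeks]
```

Whatever the exact formulas, the point is that they are fixed before launch and computed the same way every week, so nobody can argue with the trend.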

It is an uncomfortable position to take. You take it before LinkedIn takes it for you.