Agent washing: how to tell a real AI agent from an expensive chatbot
Most products sold as “AI agents” are rebranded chatbots — “agent washing.” How to tell a real AI agent from a fake, and what you should actually pay.
The phrase “AI agent” is on almost every software product sold in 2026. Most of the things wearing it are not agents. There is a name for the gap now, “agent washing,” and learning to see through it can spare you a five-figure mistake.
The term is Gartner’s. In mid-2025 the firm defined agent washing as the rebranding of existing products — chatbots, robotic process automation, simple assistants — as agentic, without the substance to back the claim. This is not a hype cycle that quietly corrected itself, either: Gartner was still formally warning buyers about agent washing in May 2026, a full year on. The label outran the technology, and the gap between them is exactly where buyers lose money.
What actually separates an agent from a chatbot
A chatbot responds. You send a message, it makes one call to a language model, it sends a reply, and it is finished. It has no plan, and no memory of why it is doing anything. That is not an insult — for plenty of jobs it is exactly the right tool. It is simply not an agent.
A real agent pursues a goal. Given an objective, it breaks the work into steps, uses tools (your CRM, a database, an API) to act on real systems, checks whether each step worked, and adapts when one does not. The technical name for that cycle is the ReAct loop: reason, act, observe, then reason again, repeating until the goal is met. It comes from research published by Princeton and Google in 2022.
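For readers who want to see the shape of that loop, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product is built: `fake_model`, `lookup_order` and the order data are all invented stand-ins, and a real agent would call a language model where `fake_model` is called.

```python
from typing import Callable

# Hypothetical tool: the name and the data are invented for illustration.
def lookup_order(order_id: str) -> str:
    orders = {"A17": "shipped"}
    return orders.get(order_id, "not found")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_order": lookup_order}

def fake_model(goal: str, history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for one paid model call. Returns (action, argument)."""
    if not history:
        # Nothing observed yet, so the next step is to look the order up.
        return ("lookup_order", "A17")
    last_observation = history[-1][2]
    return ("finish", f"Order A17 is {last_observation}")

def run_agent(goal: str, max_steps: int = 5) -> tuple[str, int]:
    """Reason, act, observe, repeat. Returns (answer, number of model calls)."""
    history: list[tuple[str, str, str]] = []
    for calls in range(1, max_steps + 1):
        action, arg = fake_model(goal, history)      # reason
        if action == "finish":
            return arg, calls
        observation = TOOLS[action](arg)             # act
        history.append((action, arg, observation))   # observe
    return "gave up: step limit reached", max_steps

answer, model_calls = run_agent("Tell me the status of order A17")
print(answer, model_calls)
```

Even this toy task takes two model calls: one to decide to look the order up, one to turn the observation into an answer. The step limit is a design choice worth asking a vendor about, because it is what stops a confused loop from running up a bill.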
The number that gives it away
The cleanest tell, and one you do not need an engineer to check, is how many model calls a single task makes. A chatbot makes one — ask, answer. An agent makes many: three, ten, twenty, looping until it is done. That is also why agents genuinely cost more to run — every turn of the loop is a paid model call. If a vendor’s “agent” makes one call and returns a reply, you are looking at a chatbot, whatever the invoice says.
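The arithmetic behind that cost difference is simple enough to sketch. The figures below are placeholders chosen only to show the calculation: the task volume, the calls per task and the price per call are all assumptions, not quotes from any provider.

```python
def monthly_model_cost_cents(tasks_per_month: int,
                             calls_per_task: int,
                             cents_per_call: int) -> int:
    """Model spend scales with volume times the number of loop turns."""
    return tasks_per_month * calls_per_task * cents_per_call

# Assumed figures for illustration only.
TASKS = 10_000          # tasks handled per month
CENTS_PER_CALL = 1      # hypothetical average price of one model call

chatbot_cents = monthly_model_cost_cents(TASKS, calls_per_task=1, cents_per_call=CENTS_PER_CALL)
agent_cents = monthly_model_cost_cents(TASKS, calls_per_task=10, cents_per_call=CENTS_PER_CALL)

print(f"chatbot: ${chatbot_cents / 100:.2f} per month")
print(f"agent:   ${agent_cents / 100:.2f} per month")
print(f"ratio:   {agent_cents // chatbot_cents}x")
```

At the same volume and the same price per call, ten turns of the loop cost ten times what a single call does. That is the number to ask for when a vendor quotes a monthly running cost.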
The test that exposes it
You do not need to read code to catch agent washing. The litmus test is recovery. Ask the vendor a single question: when this handles a request, can it take an action in another system, notice that the action failed, and try a different path on its own? A genuine agent can — the observe step in its loop exists for precisely that. A chatbot cannot, because it has no steps to recover between.
Make them break it
Take it further in the demo. Ask the vendor to make a tool call fail on purpose — a wrong password, an API that times out — and watch what happens. A real agent observes the failure and re-plans. A dressed-up chatbot stalls, or worse, invents a confident answer. Then ask to see the decision log: what the agent decided, why, and what it did at each step. A real agent produces one as a matter of course. A chatbot has nothing to log, because it never decided anything.
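What you are asking to see in that demo can be sketched in a few lines. This is a simplified illustration, assuming two invented tools (`primary_crm_update`, `queue_for_retry`) and a fixed fallback order; a genuine agent would ask the model to choose the next path after observing the failure rather than walk a hard-coded list.

```python
def primary_crm_update(record: str) -> str:
    # The failure forced on purpose for the demo.
    raise TimeoutError("CRM API timed out")

def queue_for_retry(record: str) -> str:
    return f"queued {record} for later sync"

def handle(record: str) -> tuple[str, list[dict]]:
    """Try each path in turn, logging what was tried, why, and what happened."""
    decision_log: list[dict] = []
    plan = [
        ("primary_crm_update", primary_crm_update, "fastest path"),
        ("queue_for_retry", queue_for_retry, "fallback after primary failed"),
    ]
    for name, tool, why in plan:
        try:
            result = tool(record)
        except Exception as exc:
            # Observe the failure, record it, and move to a different path.
            decision_log.append({"step": name, "why": why, "outcome": f"failed: {exc}"})
            continue
        decision_log.append({"step": name, "why": why, "outcome": result})
        return result, decision_log
    return "escalated to a human", decision_log

result, decision_log = handle("lead-42")
for entry in decision_log:
    print(entry)
```

The log is the point. Each entry records the step, the reason for it and the outcome, including the failure, which is the minimum you should expect to be shown when you ask a vendor for a decision log.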
Why the distinction costs real money
This is not pedantry about vocabulary. Agent washing matters because you are very likely being quoted agent prices for chatbot capability.
Rough industry build costs make the gap concrete — treat these as estimates rather than gospel, but the orders of magnitude are right. A scripted chatbot or FAQ bot is a modest build. A retrieval bot that answers from your own documents is a real project. A genuine multi-step agent that plans and acts across systems is a different scale of work again. The same two words on a proposal — “AI agent” — can sit on top of any of the three.
And the step up from a single agent to a multi-agent system is not double the cost — it is commonly five to ten times, because the orchestration, the failure handling between agents, the shared memory and the evaluation framework are the actual work. A vendor quoting multi-agent money for a single scripted flow is not selling you capability. They are selling you a label.
A chatbot answers questions. An agent gets the job done. The price should follow the capability — and under agent washing, it too often follows the label instead.
When you actually need an agent — and when you do not
Here is the thing the agent-washing vendor will never volunteer: most businesses asking for an “AI agent” do not need one. There are three tools, and only the third is an agent.
- A chatbot is the right — and far cheaper — tool when the job is answering: predictable questions, FAQ deflection, routing, scripted support flows.
- A retrieval bot is right when people need accurate answers drawn from your own documents — policy lookups, internal knowledge search. It answers well; it still does not do anything.
- A genuine agent earns its price only when the work is truly multi-step, judgment-based, and crosses systems — planning a path, acting on several tools, checking results, and adapting when something breaks.
The honest vendor tells you which of the three your problem needs, even when the answer is the cheap one. The agent-washing vendor sells you the expensive one regardless. Whether a vendor will ever talk you down to the simpler tool is the single most reliable signal you will get about who you are dealing with.
A buyer’s checklist
Bring these to any conversation that has the words “AI agent” in it. None of them needs a technical background to ask, and the answers sort the real from the rebranded quickly.
- Show me the product handling a goal that needs three or more steps and a real action in another system. Live, not on a slide.
- Make a step fail on purpose. Does it recover, or does it stop?
- Show me a decision log — what did it decide, why, and what did it do?
- How many model calls does a typical task make?
- Is this genuinely an agent, or a chatbot or retrieval bot — and why is that the right tool for my problem?
- What does it cost to run each month at my volume, and what moves that number?
Red flags
A few patterns should make you slow down. A demo that dwells on how naturally the thing chats rather than what it actually does. Vague metrics, and vaguer answers to technical questions. Unrealistic timelines and promises of a return in the first month. No decision logging and no audit trail. And a vendor who cannot clearly diagram their own system — where the data comes in, where the reasoning happens, how actions are logged. A real builder can draw it on request. A reseller cannot.
One genuinely new thing worth knowing
If you want a single current question that signals you have done your homework, ask a vendor whether they build on MCP, the Model Context Protocol. Anthropic introduced it in late 2024, and by 2026 it has become the industry-standard way for agents to use tools. Before it, every tool integration was bespoke plumbing; MCP is closer to a universal adapter, which means the tool integrations an agent depends on are not locked to a single model provider. A vendor who can talk fluently about it is usually a builder. A vendor who looks blank is usually a reseller.
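To make "universal adapter" concrete: MCP messages are JSON-RPC 2.0, and a request to run a tool uses the `tools/call` method with the tool's name and arguments. The sketch below only builds such a message; it does not talk to a server, and the tool name `crm_update_contact` and its arguments are hypothetical. Treat it as the general shape and check the current specification for details.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool name and arguments, invented for illustration.
request = build_tool_call(
    1,
    "crm_update_contact",
    {"contact_id": "c-1001", "status": "qualified"},
)
parsed = json.loads(request)
print(parsed["method"], parsed["params"]["name"])
```

Because every tool is called through the same envelope, swapping the model behind the agent does not mean rewriting each integration, which is the practical benefit a fluent vendor should be able to explain.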
How Zaibex approaches it
We build all three — chatbots, retrieval systems, and genuine agents — and the first thing we do on any project is tell you which one your problem actually needs, even when that is the cheaper one. The free discovery call and the free audit exist precisely so that decision is made in the open, with your real workflow on the table, before anyone quotes you for a capability you may not need. You should pay agent prices for agent work — and not a dollar more.