I Gave an AI Agent the Keys to My Blog's SEO. Its First Report Lied to Me.
It called my sitemap broken (it was not) and my best query busywork (it converts at 20%). What LLM SEO agents get wrong about proportion, and the one thing mine got right.
I built a small agent to run the SEO on this blog. Nothing exotic: once a week it pulls Search Console, fetches the sitemaps, checks robots.txt and a sample of canonical tags, then writes a report with a tidy "Auto-Actions" section and hands it to me. The dream is obvious. Wake up, read a briefing, sip coffee, feel like I have a team.

The first report arrived. It opened with this:
CRITICAL: Sitemap is broken. sitemap-tags.xml contains 316 URLs while sitemap-posts.xml has only 87. The site is likely missing 200+ pages from indexing.
My stomach dropped for roughly as long as it took to open the sitemap myself.
What was actually true
All 87 posts were in the sitemap. Every single one, including the two I had published that week. The "missing 200 pages" were tag archive pages: /tag/hpe/, /tag/ilo/, and three hundred-odd others. Ghost generates one page per tag. That is not a broken sitemap. That is a blog with a lot of tags. The agent had seen a number it disliked and pattern-matched its way to a disaster.
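Sorting sitemap URLs by path is the quickest way to see whether a big number means missing pages or just tag archives. A minimal sketch, assuming a standard `<urlset>` sitemap and that tag archives live under `/tag/` as they do here; the function name and the `blog.example` URLs are illustrative, not what the agent ran.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def split_sitemap_urls(xml_text):
    """Return (tag_urls, other_urls) from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    tag_urls, other_urls = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Assumption: every tag archive path starts with /tag/.
        if urlparse(url).path.startswith("/tag/"):
            tag_urls.append(url)
        else:
            other_urls.append(url)
    return tag_urls, other_urls


# Tiny illustrative sitemap, not the real one.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://blog.example/tag/hpe/</loc></url>
  <url><loc>https://blog.example/tag/ilo/</loc></url>
  <url><loc>https://blog.example/some-post/</loc></url>
</urlset>"""

tag_urls, other_urls = split_sitemap_urls(SAMPLE)
print(len(tag_urls), "tag archives,", len(other_urls), "other URLs")
```

Run against the tag sitemap, a count where everything lands in the first bucket says "lots of tags", not "pages missing".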
There is a real issue buried in there, to be fair. 233 of my 322 tags have exactly one post each, which is thin-content bloat worth pruning. But that is the opposite of "200 pages missing." The agent got the direction backwards and then set the severity to CRITICAL for good measure.
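For scale, 233 of 322 is roughly 72% of all tags. Finding the single-post tags is a small counting job. A sketch, assuming you already have a mapping of tag slug to post count (how you export it from your CMS is up to you); `thin_tag_share` and the sample counts are made up for illustration.

```python
def thin_tag_share(post_counts):
    """Return (single-post tag slugs, their share of all tags)."""
    singles = sorted(tag for tag, n in post_counts.items() if n == 1)
    share = len(singles) / len(post_counts) if post_counts else 0.0
    return singles, share


# Hypothetical counts; the real export would have a few hundred entries.
counts = {"hpe": 12, "ilo": 1, "bios": 1, "ghost": 3}
singles, share = thin_tag_share(counts)
print(singles, f"{share:.0%}")
```

The list of singles is the pruning candidate list; the share is the number worth putting in a report instead of CRITICAL.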

Then it argued with itself
The same report dismissed my single best search query as garbage:
MAJOR: The query p52574_001_spp-gen9... is a filename, not a keyword. Optimizing for it is busywork.
That "filename" is an HPE Gen9 firmware ISO. People search the exact string, land on my BIOS-update post, and click through at twenty percent. By clicks it is the most valuable query the site has. And three sections later, in its own opportunities list, the same agent wrote: "Target this explicitly. High intent. 50% CTR." It held both the right and the wrong answer in one document and put the wrong one in the summary, which is the only part anyone reads.
The one thing it got right was admitting what it did not know
Buried near the bottom was the most useful sentence it produced all week:
Canonical probes only cover 4 URLs. The model falsely claims it scanned the whole site. This is a hallucination of scope.
It caught its own earlier overreach. That is the behaviour I actually want from an automated analyst. Not confidence, but a clear note of how little it really checked. I probed a dozen URLs by hand afterwards and the canonicals were fine, but the agent flagging its own thin sample was worth more than all of its CRITICALs put together.

What this taught me about LLM agents and SEO
- They hallucinate scope. "I scanned the site" tends to mean "I looked at four URLs." Make them report sample sizes, then refuse any verdict that outruns the sample.
- They confuse noise and signal at the wrong granularity. Mine wrote off pages as "5 impressions, ignore" when those pages had 80 to 300 impressions. It was reading query-level numbers and reasoning as though they were page-level.
- They flatten weird-but-real data. A query that does not look like a keyword (a filename, say, or a part number) gets filed under "accident," even when it is quietly converting.
- Proportion is the hard part, not volume. The agent can produce forty observations without breaking a sweat. Knowing which one matters is the entire job, and it is exactly the thing it is worst at.
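The first two points can be enforced mechanically before the model writes a word. A rough sketch: one helper rolls query-level rows up to page-level totals, the other decides how far a verdict may reach given how much was actually checked. The row shape, the function names, and the 0.8 coverage threshold are all assumptions for illustration, as is using 87 posts as the denominator.

```python
from collections import defaultdict


def page_impressions(rows):
    """Sum query-level rows into page-level impression totals."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["page"]] += row["impressions"]
    return dict(totals)


def allowed_scope(checked, total, min_coverage=0.8):
    """Return 'site-wide' only if enough of the site was actually checked."""
    if total <= 0:
        raise ValueError("total must be positive")
    coverage = checked / total
    return "site-wide" if coverage >= min_coverage else "sample-only"


# Hypothetical rows: one page reached through several low-volume queries.
rows = [
    {"page": "/bios-update/", "query": "query a", "impressions": 5},
    {"page": "/bios-update/", "query": "query b", "impressions": 120},
    {"page": "/other-post/", "query": "query c", "impressions": 5},
]
print(page_impressions(rows))
print(allowed_scope(checked=4, total=87))
```

A page that looks like "5 impressions, ignore" at query level can carry far more in total, and four probes out of 87 only ever earn a sample-only verdict.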
What I changed
I did not fire it. I rewrote its instructions. It now gets page-level and query-level data side by side. The prompt states plainly that tag pages are normal in Ghost and that a tag-heavy sitemap is not an emergency. It is forbidden from dismissing any query that has clicks. And the summary has to reconcile with the detailed sections instead of contradicting them.
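The last two rules are also easy to check after the fact, rather than trusting the prompt alone. A sketch, assuming the agent emits its findings as structured records with a section, a query, and a verdict; that record shape, the verdict labels, and the placeholder query are hypothetical.

```python
def check_findings(findings, clicks_by_query):
    """Return violations of two rules:
    1. a query with clicks must not be dismissed;
    2. one query must not get different verdicts in different sections.
    """
    violations = []
    verdicts = {}
    for f in findings:
        query, verdict = f["query"], f["verdict"]
        if verdict == "dismiss" and clicks_by_query.get(query, 0) > 0:
            violations.append(f"{query}: dismissed in {f['section']} but has clicks")
        verdicts.setdefault(query, set()).add(verdict)
    for query, seen in verdicts.items():
        if len(seen) > 1:
            violations.append(f"{query}: conflicting verdicts {sorted(seen)}")
    return violations


# Placeholder query and click count, standing in for the firmware filename.
findings = [
    {"section": "summary", "query": "example-firmware.iso", "verdict": "dismiss"},
    {"section": "opportunities", "query": "example-firmware.iso", "verdict": "target"},
]
for problem in check_findings(findings, {"example-firmware.iso": 10}):
    print(problem)
```

A report that fails either check goes back for a rewrite before it reaches the inbox.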
The agent is a good intern. It reads everything, never gets bored, and produces a neat report. It also says CRITICAL about a non-problem with total confidence, and it will gently talk you out of your best traffic if you let it. So it drafts, and I decide. Which, now that I write it down, is exactly the arrangement I ought to disclose for this very post.
