I Stopped Updating Sitemaps by Hand. Here is What Happened.
Technical SEO has a reputation for being boring. It is not the exciting part of marketing. You do not get applause for fixing a sitemap. But if your sitemap is wrong, search engines miss pages that could have brought you traffic.
I learned this the hard way.
Our website runs on WordPress. We publish pages, landing pages, blog posts, and documentation. Over time, the structure became messy. Old pages stayed indexed. New pages took days to show up in Google. Some pages never showed up at all.
The problem was not the content. The problem was discovery.
Google cannot rank what it cannot find.
What a sitemap actually does
A sitemap is a list. It tells search engines which pages exist on your website, where they live, and when they were last changed.
Think of it like a table of contents for a very large book. If the table of contents is missing pages, the reader will never find those chapters. Even if those chapters are the best part of the book.
Search engines crawl the web by following links. But websites are not always linked well. Some pages are orphans. They exist, but nothing points to them. A sitemap gives those pages an address that search engines can read.
Without a sitemap, Google has to guess. It crawls what it finds through links. Important pages can sit unnoticed for weeks.
With a sitemap, you raise your hand and say: “Here are the pages that matter.”
Why manual sitemaps break
Most WordPress sites use plugins to generate sitemaps. That works fine at first. But once your site grows, the plugin becomes a black box you stop thinking about.
Here is what kept going wrong for us.
First, the plugin generated one giant XML file. It included everything. Draft pages. Test pages. Old landing pages we forgot about. Search engines wasted crawl budget on junk instead of focusing on the pages that actually moved the business.
Second, the sitemap never matched reality. We deleted pages. We merged pages. We changed URLs. The plugin updated eventually, but not always in the right order. We would find 404 pages still listed in the sitemap weeks later.
Third, we had no visual view of the site structure. The XML file was just a list of URLs. I could not see which pages linked where. I could not spot orphan pages. I could not show the structure to our content team in a way they understood.
At some point, I realized we were treating the sitemap like a chore. Update the site. Hope the plugin catches it. Submit to Google Search Console. Wait.
That is a bad loop.
The automation idea
I wanted a different flow.
What if the sitemap updated itself automatically whenever the site changed?
What if we could crawl our own site, compare what we found to what should exist, and generate a clean sitemap from that?
What if we could also visualize the structure, so humans could actually read it?
That became our project: an automated visual sitemap generator built from our own crawler.
The stack
We already had three pieces in place.
- WordPress runs the public website.
- FastAPI handles background jobs and APIs.
- React Flow renders interactive diagrams in the browser.
We did not add a new platform. We connected what we already had.
WordPress stays the source of truth for content. FastAPI does the crawling and processing. React Flow turns the result into a map we can look at, share, and act on.
How the crawler works
Every night, FastAPI starts a crawl from the homepage. It visits each internal link it finds. It records the URL, the title, the last modified date, and the links on that page.
The crawler does not just make a list. It builds a graph. Page A links to Page B. Page B links to Page C and Page D. Page D links nowhere.
That graph becomes the input for everything else.
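Here is a minimal sketch of that kind of crawler, using only the Python standard library. It is an illustration, not our production code. The `fetch` callable is a hypothetical stand-in for whatever HTTP client you use, and is assumed to return a status code and the page HTML. The sketch leaves out the last modified date to stay short.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch):
    """Breadth-first crawl of internal links.

    `fetch(url)` is a hypothetical callable returning (status_code, html).
    Returns {url: {"status": int, "title": str, "links": [internal urls]}}.
    """
    host = urlparse(start_url).netloc
    graph = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in graph:
            continue  # already visited
        status, html = fetch(url)
        parser = LinkParser()
        if status == 200:
            parser.feed(html)
        internal = []
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in internal:
                internal.append(absolute)
        graph[url] = {
            "status": status,
            "title": parser.title.strip(),
            "links": internal,
        }
        queue.extend(link for link in internal if link not in graph)
    return graph
```

The returned dictionary is the graph: each key is a page, and its `links` list holds the outgoing edges. Passing `fetch` in as an argument also makes the crawler easy to test against a fake site.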
Once the crawl finishes, we run a few checks.
- Does this page still exist, or did it return a 404?
- Is this page reachable through internal links, or is it an orphan?
- Has the content changed since the last crawl?
- Should this page be in the sitemap at all?
Only pages that pass those checks make it into the final sitemap.
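A rough sketch of three of those checks follows. It assumes two inputs: the crawl result, and a set of URLs the CMS says are published (a hypothetical `published` set). A crawl that only follows links cannot see an orphan, so the orphan check compares what should exist with what was reachable. Reporting orphans separately instead of adding them to the sitemap is a choice made for this example.

```python
def select_sitemap_urls(crawled, published, home, excluded_prefixes=()):
    """Return (sitemap_urls, orphans).

    crawled:   {url: {"status": int, "links": [url, ...]}} from the crawl
    published: set of URLs the CMS says should exist (assumed input)
    excluded_prefixes: URL prefixes to keep out of the sitemap (assumed input)
    """
    # Walk the link graph from the homepage to find reachable pages.
    reachable, stack = set(), [home]
    while stack:
        url = stack.pop()
        if url in reachable:
            continue
        reachable.add(url)
        stack.extend(crawled.get(url, {}).get("links", []))

    # Published but never reached through links: these are the orphans.
    orphans = sorted(published - reachable)

    # Keep pages that are live, published, and not explicitly excluded.
    keep = sorted(
        url
        for url in reachable
        if crawled.get(url, {}).get("status") == 200
        and url in published
        and not url.startswith(tuple(excluded_prefixes))
    )
    return keep, orphans
```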
Why we generate instead of update
A static sitemap file is a snapshot. It represents the website at one moment.
A generated sitemap is a decision. It answers the question: “Based on what the crawler found today, what should Google care about?”
That difference matters.
When we merge two blog posts into one, the old URLs should leave the sitemap. When we publish a new landing page, it should enter immediately. When we change a title, the lastmod date should reflect that.
A generated sitemap reflects reality because it is built from reality.
We still schedule the generation. It runs every night. But we can also trigger it manually after a big content push. The output is always fresh.
What the visual map gives you
The XML sitemap is for Google. The visual map is for us.
React Flow draws the site as a network. Each page is a node. Each internal link is a line. You can zoom in, drag nodes around, and click into any page for details.
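React Flow takes an array of nodes (each with an id, a position, and a data object) and an array of edges (each with an id, a source, and a target). A backend can produce that shape directly from the link graph. The sketch below is one way to do it. The function name and the simple grid layout are assumptions made for the example. A real map would likely use a proper layout algorithm.

```python
def to_react_flow(graph, columns=4, x_gap=250, y_gap=120):
    """Convert {url: [linked urls]} into React Flow style nodes and edges.

    Nodes are placed on a plain grid (an illustrative layout only).
    """
    urls = sorted(graph)
    nodes = [
        {
            "id": url,
            "position": {"x": (i % columns) * x_gap, "y": (i // columns) * y_gap},
            "data": {"label": url},
        }
        for i, url in enumerate(urls)
    ]
    edges = [
        {"id": f"{source}->{target}", "source": source, "target": target}
        for source in urls
        for target in graph[source]
        if target in graph  # skip links to pages that were never crawled
    ]
    return {"nodes": nodes, "edges": edges}
```

The result is plain dictionaries and lists, so it can be returned as JSON from an API endpoint and handed to the React Flow component in the browser.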
This changed how our team talks about the website.
Content writers can see which posts are well connected and which ones are stranded. Designers can see where the navigation creates dead ends. SEO people can find orphan pages in seconds instead of hours.
Before this, we argued about site structure in spreadsheets. Spreadsheets hide relationships. A visual map shows them.
The map also helps us spot patterns. We noticed one cluster of pages that all linked to each other, but nothing outside the cluster linked to them. They were a walled garden. We added three internal links from high-traffic pages, and those pages started ranking within weeks.
We would not have seen that in a text file.
The technical flow in plain terms
If you want to build something similar, here is the shape.
Your crawler starts at the homepage and follows internal links until it has visited every reachable page. It stores what it finds in a structured format, not just raw HTML.
Then a processor cleans the data. It removes duplicates, fixes trailing slashes, filters out URLs you do not want indexed, and marks pages that changed.
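The cleaning step can be sketched like this. The rules here are assumptions for illustration: the sketch strips trailing slashes, drops query strings and fragments, and blocks two example path prefixes. Pick whichever trailing-slash form matches your canonical URLs, and keep query strings if your site depends on them.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop query and fragment, strip the trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"  # keep a bare "/" for the homepage
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def clean_urls(urls, blocked_prefixes=("/wp-admin", "/tag/")):
    """Normalize, filter, and de-duplicate URLs while preserving order.

    blocked_prefixes are example values, not a recommended list.
    """
    seen, cleaned = set(), []
    for url in urls:
        norm = normalize(url)
        if urlsplit(norm).path.startswith(tuple(blocked_prefixes)):
            continue  # a path we do not want indexed
        if norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned
```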
Next, a generator creates the XML sitemap. This is the file Google reads. It includes the URL, the last modified date, and a priority value if you use one. Google has said it ignores priority, so lastmod is the field worth getting right.
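A minimal sketch of the generator step, using Python's built-in `xml.etree.ElementTree`. The function name and the input shape (pairs of URL and date) are assumptions for the example. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML as UTF-8 bytes.

    pages: iterable of (url, lastmod), where lastmod is a datetime.date or None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
        if lastmod is not None:
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Calling `build_sitemap([("https://example.com/", date(2024, 5, 1))])` returns bytes you can write straight to `sitemap.xml`. The protocol caps a single file at 50,000 URLs, so a larger site would need to split the output and add a sitemap index.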
Finally, a renderer builds the visual map for humans. React Flow works well here because it handles large graphs without you writing custom canvas code.
The whole thing lives behind a simple admin page. One button triggers a crawl. Another button downloads the sitemap. The map updates automatically.
What changed for our SEO
The biggest change was speed.
New pages started getting indexed faster. Not because we hacked Google, but because we stopped making Google hunt for them. The sitemap caught up within a day of a page going live, or right away when we triggered a crawl by hand.
We also saw fewer crawl errors. Old 404 pages stopped appearing in the sitemap. Redirected pages were handled correctly. The signal we sent to search engines got cleaner.
Internal linking improved because the map made problems visible. Pages that had been orphans for months finally got parent pages pointing to them.
And honestly, the team spent less time on sitemap drama. No more manual exports. No more forgetting to resubmit after a launch. The system handled the boring part.
What I would do differently
If I rebuilt this today, I would make the crawler incremental from the start.
Right now it crawls the whole site every time. That is fine for our size, but it does not scale. An incremental crawl would only revisit pages that changed or pages linked from changed pages.
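The selection rule for an incremental crawl is small enough to sketch. This assumes you already have the link graph from the previous crawl and some way of knowing which pages changed, for example from the CMS's modification timestamps. Both inputs are assumptions for the example.

```python
def pages_to_revisit(graph, changed):
    """Pick the pages an incremental crawl should fetch again.

    graph:   {url: [urls it links to]} from the previous crawl
    changed: set of URLs known to have changed since then
    Returns the changed pages plus every page a changed page links to.
    """
    revisit = set(changed)
    for url in changed:
        # A brand-new page is not in the old graph yet, so default to no links.
        revisit.update(graph.get(url, []))
    return revisit
```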
I would also add change detection at the content level. Right now we check whether a page changed, but we do not always know what changed. Knowing whether the change is a title tweak or a full rewrite would help us decide which pages to reprioritize.
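One possible shape for that, sketched with the standard library's `difflib`. The labels and the 0.5 similarity threshold are arbitrary assumptions, and comparing full page text this way can be slow on long pages.

```python
from difflib import SequenceMatcher


def classify_change(old, new, rewrite_threshold=0.5):
    """Label the difference between two versions of a page.

    old, new: dicts with "title" and "body" text.
    rewrite_threshold: similarity below this counts as a rewrite (assumed value).
    """
    if old == new:
        return "unchanged"
    if old["body"] == new["body"]:
        return "title_only"
    # ratio() is 1.0 for identical text and approaches 0.0 for unrelated text.
    similarity = SequenceMatcher(None, old["body"], new["body"]).ratio()
    return "rewrite" if similarity < rewrite_threshold else "minor_edit"
```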
And I would add notifications. When the crawler finds a new orphan page, it should ping Slack. When a 404 appears in the sitemap, it should flag it before we notice it ourselves.
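A sketch of what that could look like. Slack incoming webhooks accept a JSON body with a `text` field. The function names, the message wording, and the webhook URL you pass in are assumptions for the example.

```python
import json
import urllib.request


def build_alert(new_orphans, dead_in_sitemap):
    """Return a Slack-style payload dict, or None when there is nothing to report."""
    lines = [f"New orphan page: {url}" for url in sorted(new_orphans)]
    lines += [f"404 listed in sitemap: {url}" for url in sorted(dead_in_sitemap)]
    return {"text": "\n".join(lines)} if lines else None


def send_alert(webhook_url, payload):
    """POST the payload to an incoming webhook URL (supplied by you)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # passing data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the message building separate from the sending means the wording can be tested without touching the network.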
Small improvements. But they would make the system feel less like a tool and more like a teammate.
Should you build this?
Maybe not exactly this.
If you run a small site and a plugin works, keep the plugin. Automation is only worth it when the manual version starts breaking often enough to annoy you.
But if you are in this situation:
- Your site has hundreds of pages and keeps growing
- Your team changes content often
- You keep finding orphan pages or outdated URLs in your sitemap
- You want the structure of your site to be visible, not hidden in XML
Then building your own crawler-backed sitemap starts to make sense.
You do not need a big team. You need a crawler, a scheduler, and a way to see the output. WordPress, FastAPI, and React Flow worked for us. Other stacks work too.
What to do next
Pick one thing to fix this week.
Open your current sitemap. Find three URLs that should not be there. A test page. A redirect. A dead page.
Remove them.
Then look at your three most important pages. Are they in the sitemap? Are other pages linking to them? If not, that is your real work.
Sitemaps are not sexy. But they are the floor. And if the floor is uneven, everything you build on it wobbles.