SEO: indexing, robots.txt, sitemap.xml
The platform does not synthesize robots.txt or sitemap.xml. Authoring
them is your job, and you should do it when production is ready to be indexed
— a half-built portfolio in search results helps nobody.
What the platform handles automatically:
- Previews are never indexable. Every response from a preview facet (any
branch other than main) carries X-Robots-Tag: noindex, nofollow. You don't
need to add anything to a preview to keep it out of search engines and AI
crawlers; work freely on branches.
- Production is unblocked by the worker: it never filters bots or injects
robots rules, so what you commit is what the worker serves. (Caveat: a
zone-level operator setting such as Cloudflare's Managed robots.txt or AI
Crawl Control can prepend or override rules at the edge, ahead of the worker
and outside this repo. If your committed robots.txt must be honored
verbatim, confirm those are off for the production hostname.)
When to ship SEO files
Add robots.txt and sitemap.xml once the production space is real (real
content, real URLs you want indexed). Until then, leaving them off is fine.
Without explicit signals, search engines fall back to their defaults and
discover pages only by following links, and you won't have stale sitemap
entries to clean up later.
Authoring robots.txt
Commit it to the workspace root. It's a static asset, served straight off R2
by handleAssetRequest — the worker is never invoked for it.
# Open to search engines and AI/LLM crawlers.
User-agent: *
Allow: /
# Content Signals (https://contentsignals.org) — explicit consent for AI use.
# search=yes — appear in search results
# ai-input=yes — AI assistants may use this page when answering questions
# ai-train=yes — this content may be used to train AI models
# Use `no` instead of `yes` to deny any signal.
Content-Signal: search=yes, ai-input=yes, ai-train=yes
Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml
For a custom domain, swap the host in the Sitemap: line.
If you want to deny AI training but still appear in search results:
User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no
Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml
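If you'd rather generate robots.txt from a build script than hand-edit it, a minimal sketch follows. buildRobotsTxt and its option names are hypothetical (the platform neither requires nor provides them), and example.unfolder.space stands in for your real host. It emits the same directives as the examples above; here it produces the deny-training variant.

```typescript
// Hypothetical build-time helper; a hand-written, committed robots.txt works just as well.
type Signal = "yes" | "no";

interface RobotsOptions {
  search: Signal;   // appear in search results
  aiInput: Signal;  // AI assistants may use pages when answering
  aiTrain: Signal;  // content may be used to train AI models
  sitemapUrl: string;
}

function buildRobotsTxt(opts: RobotsOptions): string {
  const lines = [
    "User-agent: *",
    "Allow: /",
    // Content Signals line, same key order as the examples above.
    `Content-Signal: search=${opts.search}, ai-input=${opts.aiInput}, ai-train=${opts.aiTrain}`,
    `Sitemap: ${opts.sitemapUrl}`,
  ];
  // robots.txt is line-oriented; finish with a trailing newline.
  return lines.join("\n") + "\n";
}

// Placeholder host: swap in https://<your-space-slug>.unfolder.space or your custom domain.
const robots = buildRobotsTxt({
  search: "yes",
  aiInput: "yes",
  aiTrain: "no",
  sitemapUrl: "https://example.unfolder.space/sitemap.xml",
});
console.log(robots);
```

Write the result to robots.txt at the workspace root and commit it like any other static asset.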
Authoring sitemap.xml
Two approaches; pick the one that fits your site.
Static — commit the file
Simple, and a good fit for content that only changes when you deploy.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://<your-space-slug>.unfolder.space/</loc></url>
<url><loc>https://<your-space-slug>.unfolder.space/about</loc></url>
<url><loc>https://<your-space-slug>.unfolder.space/work/portrait-series</loc></url>
</urlset>
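The same file can be produced by a small script before you commit, which saves hand-editing as pages are added. This is a sketch under assumptions: buildSitemapXml, escapeXml and the SitemapEntry shape are hypothetical, and the optional <lastmod> element comes from the sitemaps.org protocol rather than from this platform. Values are XML-escaped before being placed in the document, since characters like & would otherwise make the XML invalid.

```typescript
// Hypothetical build-script helper that writes a static sitemap.xml.
interface SitemapEntry {
  path: string;     // site-relative path, e.g. "/about"
  lastmod?: string; // optional W3C date, e.g. "2024-05-01"
}

// Escape XML special characters before placing text inside an element.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(origin: string, entries: SitemapEntry[]): string {
  const urls = entries.map(({ path, lastmod }) => {
    // new URL() resolves the path against the origin and percent-encodes it.
    const loc = escapeXml(new URL(path, origin).href);
    const mod = lastmod ? `<lastmod>${escapeXml(lastmod)}</lastmod>` : "";
    return `\t<url><loc>${loc}</loc>${mod}</url>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    ...urls,
    `</urlset>`,
    "", // trailing newline
  ].join("\n");
}

// Placeholder host; use your space's production origin.
const xml = buildSitemapXml("https://example.unfolder.space", [
  { path: "/" },
  { path: "/about", lastmod: "2024-05-01" },
  { path: "/work/portrait-series" },
]);
console.log(xml);
```

Save the returned string as sitemap.xml at the workspace root and commit it.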
Dynamic — generate it in your App worker
If your URLs come from data (e.g. blog posts in durable-sqlite), match
/sitemap.xml in App.fetch and return XML built from your data. Don't
commit a static sitemap.xml in this case — the static asset would shadow
the worker route.
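async fetch(request: Request) {
  const url = new URL(request.url);
  if (url.pathname === "/sitemap.xml") {
    // listPostSlugs() is your own data-access method (e.g. a durable-sqlite query).
    const slugs = await this.listPostSlugs();
    // Escape XML special characters so a slug containing "&" or "<" can't break the document.
    const esc = (s: string) =>
      s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
    const urls = ["/", "/blog/", ...slugs.map((s) => `/blog/${s}`)]
      .map((p) => `\t<url><loc>${esc(url.origin + p)}</loc></url>`)
      .join("\n");
    return new Response(
      `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`,
      { headers: { "content-type": "application/xml; charset=utf-8" } },
    );
  }
  // ...
}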
Verifying
After deploy, fetch the production host:
https://<your-space-slug>.unfolder.space/robots.txt
https://<your-space-slug>.unfolder.space/sitemap.xml
On previews, what you commit doesn't matter: crawlers can still fetch both
files, but the platform's X-Robots-Tag: noindex, nofollow rides on every
preview response, so nothing on a preview should end up indexed.
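To script the check instead of eyeballing it, something like the following fetches both files and reports status, content type and whether an X-Robots-Tag noindex is present. hasNoindex and checkSeoFiles are hypothetical names, and hasNoindex is a deliberately simplified parser (comma-separated directives, with an optional bot-name prefix such as googlebot:), not a complete implementation of every X-Robots-Tag form.

```typescript
// Simplified X-Robots-Tag check: true if any directive is "noindex" or "none".
function hasNoindex(xRobotsTag: string | null): boolean {
  if (!xRobotsTag) return false;
  return xRobotsTag
    .split(",")
    .map((d) => d.trim().toLowerCase())
    // Drop an optional "botname:" prefix, e.g. "googlebot: noindex".
    .map((d) => d.slice(d.lastIndexOf(":") + 1).trim())
    .some((d) => d === "noindex" || d === "none");
}

// Fetch both SEO files from a host and log what a crawler would see.
async function checkSeoFiles(origin: string): Promise<void> {
  for (const file of ["/robots.txt", "/sitemap.xml"]) {
    const res = await fetch(new URL(file, origin));
    console.log(
      file,
      res.status,
      res.headers.get("content-type"),
      hasNoindex(res.headers.get("x-robots-tag")) ? "noindex" : "indexable",
    );
  }
}
```

Pointed at https://<your-space-slug>.unfolder.space, both files should report indexable; pointed at a preview, both should report noindex.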
Reference: file placement
robots.txt and sitemap.xml are ordinary static assets. Per
unfolder://docs/bundling, anything that isn't .ts/.tsx/.js/.jsx/.json (or
that lives under public/) is treated as a static asset, so committing them
at the workspace root is enough — no config needed.
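The placement rule can be written as a predicate, which is handy when reasoning about whether a given file ships as an asset. isStaticAsset is a hypothetical mirror of the rule as stated here, not the bundler's actual code; unfolder://docs/bundling remains the authoritative description.

```typescript
// Hypothetical mirror of the documented rule; not the bundler's implementation.
const CODE_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx", ".json"];

function isStaticAsset(path: string): boolean {
  // Normalize separators and drop a leading "./" or "/".
  const p = path.replace(/\\/g, "/").replace(/^\.?\//, "");
  // Anything under public/ is an asset, whatever its extension.
  if (p.startsWith("public/")) return true;
  // Otherwise, only non-code extensions are assets.
  return !CODE_EXTENSIONS.some((ext) => p.endsWith(ext));
}

for (const f of ["robots.txt", "sitemap.xml", "src/app.ts", "public/data.json"]) {
  console.log(f, isStaticAsset(f) ? "static asset" : "bundled code");
}
```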