SEO: indexing, robots.txt, sitemap.xml

The platform does not synthesize robots.txt or sitemap.xml. Authoring them is your job, and you should do it when production is ready to be indexed — a half-built portfolio in search results helps nobody.

What the platform handles automatically:

  • Previews are never indexable. Every response from a preview facet (any branch other than main) carries X-Robots-Tag: noindex, nofollow. You don't need to add anything to a preview to keep it out of search engines and AI crawlers — work freely on branches.
  • Production is unblocked by the worker — it never filters bots or injects robots rules, so what you commit is what the worker serves. (Caveat: a zone-level operator setting such as Cloudflare's Managed robots.txt or AI Crawl Control can prepend/override rules at the edge, ahead of the worker — outside this repo. If your committed robots.txt must be honored verbatim, confirm those are off for the production hostname.)

When to ship SEO files

Add robots.txt and sitemap.xml once the production space is real (real content, real URLs you want indexed). Until then, leaving them off is fine: with no robots.txt, crawlers treat the site as unrestricted and find pages only by following links, and you won't have stale sitemap entries to clean up later.

Authoring robots.txt

Commit it to the workspace root. It's a static asset, served straight off R2 by handleAssetRequest, so your App worker is never invoked for it.

# Open to search engines and AI/LLM crawlers.
User-agent: *
Allow: /

# Content Signals (https://contentsignals.org) — explicit consent for AI use.
#   search=yes   — appear in search results
#   ai-input=yes — AI assistants may use this page when answering questions
#   ai-train=yes — this content may be used to train AI models
# Use `no` instead of `yes` to deny any signal.
Content-Signal: search=yes, ai-input=yes, ai-train=yes

Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml

For a custom domain, swap the host in the Sitemap: line.

If you want to deny AI training but still appear in search results:

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no
Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml
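If you maintain both a platform hostname and a custom domain, or want to flip a signal without hand-editing the file, the two variants above can be generated from one small function at build time. This is only a sketch: buildRobotsTxt and RobotsOptions are hypothetical names, not platform APIs, and the output still has to be committed at the workspace root as robots.txt.

```typescript
// Hypothetical helper: none of these names come from the platform.
type Signal = "yes" | "no";

interface RobotsOptions {
	host: string; // production hostname, platform or custom domain
	search: Signal; // appear in search results
	aiInput: Signal; // AI assistants may use the page when answering
	aiTrain: Signal; // content may be used to train AI models
}

// Builds the same robots.txt shape shown above: allow everything,
// declare Content Signals, and point at the sitemap on the given host.
function buildRobotsTxt(opts: RobotsOptions): string {
	return [
		"User-agent: *",
		"Allow: /",
		`Content-Signal: search=${opts.search}, ai-input=${opts.aiInput}, ai-train=${opts.aiTrain}`,
		`Sitemap: https://${opts.host}/sitemap.xml`,
		"",
	].join("\n");
}
```

Calling it with aiTrain set to "no" reproduces the deny-training variant; changing host covers the custom-domain case.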

Authoring sitemap.xml

Two approaches; pick the one that fits your site.

Static — commit the file

Simple, and a good fit for content that changes only when you deploy.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url><loc>https://<your-space-slug>.unfolder.space/</loc></url>
	<url><loc>https://<your-space-slug>.unfolder.space/about</loc></url>
	<url><loc>https://<your-space-slug>.unfolder.space/work/portrait-series</loc></url>
</urlset>

Dynamic — generate it in your App worker

If your URLs come from data (e.g. blog posts in durable-sqlite), match /sitemap.xml in App.fetch and return XML built from your data. Don't commit a static sitemap.xml in this case — the static asset would shadow the worker route.

async fetch(request: Request) {
	const url = new URL(request.url);
	if (url.pathname === "/sitemap.xml") {
		// listPostSlugs() stands in for your own data-access method.
		const slugs = await this.listPostSlugs();
		// Percent-encode each slug so <loc> stays a valid URL and valid XML.
		const urls = ["/", "/blog/", ...slugs.map((s) => `/blog/${encodeURIComponent(s)}`)]
			.map((p) => `\t<url><loc>${url.origin}${p}</loc></url>`)
			.join("\n");
		return new Response(
			`<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`,
			{ headers: { "content-type": "application/xml; charset=utf-8" } },
		);
	}
	// ...
}
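
The XML assembly in the handler above can also be pulled out into a pure function, which makes it easy to test without running a worker. The sketch below is one possible shape, not the platform's method: buildSitemapXml and escapeXml are hypothetical names. It resolves each path against the origin with the standard URL constructor and escapes XML-special characters in the result.

```typescript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
	return text
		.replace(/&/g, "&amp;")
		.replace(/</g, "&lt;")
		.replace(/>/g, "&gt;")
		.replace(/"/g, "&quot;")
		.replace(/'/g, "&apos;");
}

// Hypothetical helper: builds a sitemap document from an origin and a list of paths.
function buildSitemapXml(origin: string, paths: string[]): string {
	const urls = paths
		// new URL() resolves the path against the origin and percent-encodes
		// characters such as spaces; escapeXml then handles "&" and friends.
		.map((p) => `\t<url><loc>${escapeXml(new URL(p, origin).href)}</loc></url>`)
		.join("\n");
	return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`;
}
```

Inside App.fetch, the sitemap branch would then reduce to returning a Response whose body is buildSitemapXml(url.origin, paths) with the same content-type header as above.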

Verifying

After deploy, fetch the production host:

  • https://<your-space-slug>.unfolder.space/robots.txt
  • https://<your-space-slug>.unfolder.space/sitemap.xml

For previews, both files may still be served, but they stay out of search indexes regardless of what you commit: every preview response carries the platform's X-Robots-Tag: noindex, nofollow header.
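
These checks can be scripted. The sketch below uses the standard fetch API (available in Node 18+ and in workers); hasNoindex and checkIndexability are hypothetical names. It assumes the header value is a plain comma-separated directive list, as in the noindex, nofollow value described above, and does not handle values prefixed with a specific user agent.

```typescript
// Returns true when an X-Robots-Tag header value contains the noindex directive.
// Assumes a plain comma-separated list such as "noindex, nofollow".
function hasNoindex(headerValue: string | null): boolean {
	if (headerValue === null) return false;
	return headerValue
		.split(",")
		.map((directive) => directive.trim().toLowerCase())
		.includes("noindex");
}

// Fetches a URL and reports its status and whether the response is marked noindex.
async function checkIndexability(
	url: string,
): Promise<{ status: number; noindex: boolean }> {
	const res = await fetch(url);
	return {
		status: res.status,
		noindex: hasNoindex(res.headers.get("x-robots-tag")),
	};
}
```

Run against the production robots.txt and sitemap.xml URLs, the expected result is a 200 status with noindex false; run against any preview URL, noindex should be true.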

Reference: file placement

robots.txt and sitemap.xml are ordinary static assets. Per unfolder://docs/bundling, anything that isn't .ts/.tsx/.js/.jsx/.json (or that lives under public/) is treated as a static asset, so committing them at the workspace root is enough — no config needed.
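
To make the rule concrete, here is a literal reading of that sentence as a predicate. It is an illustration only, not the platform's actual bundling logic; isStaticAsset and BUNDLED_EXTENSIONS are hypothetical names, paths are assumed to be relative to the workspace root, and unfolder://docs/bundling remains the authoritative description.

```typescript
// Extensions the sentence above lists as NOT being static assets.
const BUNDLED_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx", ".json"];

// Illustrative predicate: a workspace-relative path counts as a static asset
// if it lives under public/ or does not end in one of the bundled extensions.
function isStaticAsset(path: string): boolean {
	if (path.startsWith("public/")) return true;
	return !BUNDLED_EXTENSIONS.some((ext) => path.endsWith(ext));
}
```

Under this reading, robots.txt and sitemap.xml at the workspace root both qualify, while a source file such as src/app.ts does not.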