Your Terms: robots.txt, indexing, sitemap.xml

A space is private until you opt in. If you have not authored a robots.txt, the platform synthesizes a deny-everyone default on GET /robots.txt:

User-agent: *
Disallow: /

This keeps a half-built space from being indexed before it's ready. To open up, write your own robots.txt: it is served verbatim off R2 and always wins over the default. Your job flips from "write robots.txt to lock things down" to "write robots.txt to open up when production is ready to be indexed."

What the platform does for you:

Synthesizes a deny-all robots.txt only when you haven't authored one. The moment you publish your own public/robots.txt, it takes over (the default applies only on the miss). The synthesized response is tagged X-Unfolder-Robots: default-deny so you can tell it apart from an authored file.
Never stamps indexability onto your pages. Page responses carry no X-Robots-Tag; what your worker/assets serve is what the visitor sees. (Caveat: a zone-level operator setting such as Cloudflare's Managed robots.txt or AI Crawl Control can override this at the edge, before the request reaches the worker. That configuration lives outside this repo. For the worker-served robots.txt to be honored verbatim, those settings must be off for the production hostname.)

Opening up — author your `robots.txt`

Save it under public/ (or the workspace root — either works for a .txt). It's a static asset, served straight off R2 by handleAssetRequest; the worker is never invoked for it.

# Open to search engines and AI/LLM crawlers.
User-agent: *
Allow: /

# Content Signals (https://contentsignals.org) — explicit consent for AI use.
#   search=yes   — appear in search results
#   ai-input=yes — AI assistants may use this page when answering questions
#   ai-train=yes — this content may be used to train AI models
# Use `no` instead of `yes` to deny any signal.
Content-Signal: search=yes, ai-input=yes, ai-train=yes

Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml

For a custom domain, swap the host in the Sitemap: line.

Deny AI training but still appear in search results:

User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=no
Sitemap: https://<your-space-slug>.unfolder.space/sitemap.xml

Keep the space fully private (the same as authoring nothing, but explicit):

User-agent: *
Disallow: /

When to open up

Leave the deny-all default in place while the space is a work in progress — search engines and AI crawlers stay out, and you have no stale entries to clean up later. Author your own robots.txt once production is real (real content, real URLs you want indexed). Until you do, the space stays private by default.

Authoring `sitemap.xml`

There are two approaches; pick the one that fits your site.

Static — save the file

Simple, and a good fit for content that changes only when you publish.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url><loc>https://<your-space-slug>.unfolder.space/</loc></url>
	<url><loc>https://<your-space-slug>.unfolder.space/about</loc></url>
	<url><loc>https://<your-space-slug>.unfolder.space/work/portrait-series</loc></url>
</urlset>

Dynamic — generate it in your `App` worker

If your URLs come from data (e.g. blog posts in durable-sqlite), match /sitemap.xml in App.fetch and return XML built from your data. Don't ship a static sitemap.xml in this case — the static asset would shadow the worker route.

async fetch(request: Request) {
	const url = new URL(request.url);
	if (url.pathname === "/sitemap.xml") {
		// listPostSlugs() stands in for however your App reads its post slugs.
		const slugs = await this.listPostSlugs();
		// Fixed pages first, then one entry per post, all on the request's own origin.
		// Slugs are interpolated as-is, so they must already be URL- and XML-safe.
		const urls = ["/", "/blog/", ...slugs.map((s) => `/blog/${s}`)]
			.map((p) => `\t<url><loc>${url.origin}${p}</loc></url>`)
			.join("\n");
		return new Response(
			`<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>\n`,
			{ headers: { "content-type": "application/xml; charset=utf-8" } },
		);
	}
	// ...
}
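
If your slugs can contain characters such as & or <, the generated XML needs entity escaping (the sitemap protocol requires it for URL values). The sketch below shows one way to factor that out. escapeXml and buildSitemap are hypothetical helper names, not platform APIs, and the sketch assumes your paths are already URL-encoded.

```typescript
// Hypothetical helper (not a platform API): escape the five XML special characters.
// "&" is replaced first so the entities produced by later steps are not re-escaped.
function escapeXml(text: string): string {
	return text
		.replace(/&/g, "&amp;")
		.replace(/</g, "&lt;")
		.replace(/>/g, "&gt;")
		.replace(/"/g, "&quot;")
		.replace(/'/g, "&apos;");
}

// Hypothetical helper: build a sitemap body from an origin and a list of paths.
function buildSitemap(origin: string, paths: string[]): string {
	const urls = paths
		.map((p) => `\t<url><loc>${escapeXml(origin + p)}</loc></url>`)
		.join("\n");
	return (
		`<?xml version="1.0" encoding="UTF-8"?>\n` +
		`<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
		`${urls}\n</urlset>\n`
	);
}
```

In App.fetch you could then return new Response(buildSitemap(url.origin, paths), ...) with the same content-type header as above.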

A worker can also serve robots.txt dynamically the same way — match /robots.txt in App.fetch and return it. A worker-served robots.txt wins over the synthesized default too (the default only applies when nothing answers).
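
A minimal sketch of that route, assuming no static robots.txt is published (a static asset is served without invoking the worker). RobotsOptions, buildRobotsTxt, and handleRobots are hypothetical names chosen for illustration; the directive lines mirror the authored examples earlier on this page.

```typescript
// Hypothetical options shape; the signal names follow the Content-Signal examples above.
interface RobotsOptions {
	allowCrawling: boolean;
	search: boolean;
	aiInput: boolean;
	aiTrain: boolean;
}

const yesNo = (flag: boolean): string => (flag ? "yes" : "no");

// Build the robots.txt body for a given origin.
function buildRobotsTxt(origin: string, opts: RobotsOptions): string {
	if (!opts.allowCrawling) {
		// Explicit equivalent of the deny-all default.
		return "User-agent: *\nDisallow: /\n";
	}
	return [
		"User-agent: *",
		"Allow: /",
		`Content-Signal: search=${yesNo(opts.search)}, ai-input=${yesNo(opts.aiInput)}, ai-train=${yesNo(opts.aiTrain)}`,
		`Sitemap: ${origin}/sitemap.xml`,
		"", // trailing newline
	].join("\n");
}

// Returns a Response for /robots.txt, or undefined so the caller can keep routing.
function handleRobots(request: Request, opts: RobotsOptions): Response | undefined {
	const url = new URL(request.url);
	if (url.pathname !== "/robots.txt") return undefined;
	return new Response(buildRobotsTxt(url.origin, opts), {
		headers: { "content-type": "text/plain; charset=utf-8" },
	});
}
```

Calling handleRobots at the top of App.fetch and returning its result when it is defined keeps the rest of the routing untouched. Because the Sitemap: line is built from the request origin, it follows a custom domain without edits.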

Verifying

After publish, fetch the production host:

https://<your-space-slug>.unfolder.space/robots.txt — yours if authored, else the deny-all default (look for X-Unfolder-Robots: default-deny).
https://<your-space-slug>.unfolder.space/sitemap.xml
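
The header check can be scripted. The following is a sketch, not a platform tool: classifyRobots and checkRobots are hypothetical names, and the only platform detail it relies on is the X-Unfolder-Robots: default-deny header described above. It assumes a runtime with a global fetch.

```typescript
// Minimal structural type so the helper accepts fetch's Headers or a test stub.
interface HeaderReader {
	get(name: string): string | null;
}

// "default-deny" when the synthesized file was served; otherwise "authored"
// (which here also covers a robots.txt returned by your worker).
function classifyRobots(headers: HeaderReader): "default-deny" | "authored" {
	return headers.get("X-Unfolder-Robots") === "default-deny" ? "default-deny" : "authored";
}

// Hypothetical usage: pass your production host, e.g. "<your-space-slug>.unfolder.space".
async function checkRobots(host: string): Promise<void> {
	const res = await fetch(`https://${host}/robots.txt`);
	console.log(res.status, classifyRobots(res.headers));
	console.log(await res.text());
}
```

If the result is "authored" but the body is not what you published, an edge-level setting on the zone is one thing to rule out.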

Reference: file placement

robots.txt and sitemap.xml are ordinary static assets. Per unfolder://docs/bundling, anything under public/ is served as a static asset (prefix stripped), and so is anything at the workspace root that isn't .ts/.tsx/.js/.jsx/.json. Saving them under public/ (or at the root) is therefore enough; no config is needed. Binary files belong in media; web text like robots.txt belongs in the workspace.
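
The placement rule can be read as a small function. This is an illustrative sketch of the two cases stated here, not the platform's bundler: staticAssetPath is a hypothetical name, and paths in nested folders outside public/ are deliberately left unmodeled (the function returns null for them).

```typescript
// Extensions that are NOT served as static assets at the workspace root.
const CODE_EXTENSIONS = [".ts", ".tsx", ".js", ".jsx", ".json"];

// Returns the URL path a workspace file would be served at under the two rules
// above, or null when neither rule applies.
function staticAssetPath(workspacePath: string): string | null {
	// Rule 1: anything under public/ is static, with the prefix stripped.
	if (workspacePath.startsWith("public/")) {
		return "/" + workspacePath.slice("public/".length);
	}
	// Rule 2: a root-level file is static unless it has a code/JSON extension.
	const isRootFile = !workspacePath.includes("/");
	const isCode = CODE_EXTENSIONS.some((ext) => workspacePath.endsWith(ext));
	if (isRootFile && !isCode) {
		return "/" + workspacePath;
	}
	return null; // not covered by this sketch
}
```

Under this reading, public/robots.txt and a root-level robots.txt both resolve to /robots.txt, which is why either location works.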