Glossary

robots.txt

Robots.txt tells search engines and AI crawlers which paths they may fetch and which they may not. Served as plain text at the site root, the file is the oldest contract on the web between site owners and crawlers.

What it means in operation

The file lives at mobitaste.com/robots.txt. Each block names a user agent (Googlebot, Bingbot, PerplexityBot, ClaudeBot) and lists allow and disallow rules per path. A typical marketing site allows everything except admin routes, draft pages, and internal search-result pages whose query parameters can generate an effectively unlimited number of URLs. The MobiTaste robots.txt also points crawlers at the sitemap location and at the llms.txt file. The file is a request, not an enforcement mechanism: well-behaved crawlers respect it, but it is not a security control. Anything that must be private goes behind authentication, not behind a disallow line.
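To make the block structure concrete, here is a minimal sketch that parses a sample file with Python's standard-library urllib.robotparser and asks which paths a crawler may fetch. The rules, the example.com host, and the is_allowed helper are illustrative assumptions, not MobiTaste's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The paths and the sitemap URL are
# illustrative assumptions, not MobiTaste's real rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
Disallow: /search
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def is_allowed(agent: str, path: str) -> bool:
    """Return True if the sample rules let `agent` fetch `path` (hypothetical helper)."""
    return parser.can_fetch(agent, "https://example.com" + path)


for path in ["/menu", "/admin/users", "/search?q=tacos"]:
    verdict = "allowed" if is_allowed("Googlebot", path) else "disallowed"
    print(f"{path}: {verdict}")

# Sitemap lines are exposed separately (Python 3.8+).
print(parser.site_maps())
```

This only reports what a compliant crawler would do. Nothing here stops a client that ignores the file, which is why private content belongs behind authentication.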

Why it matters

The main case for a clean robots.txt is crawl budget. Search engines allot each site a finite number of requests in a given period. If a crawler spends that budget on staging URLs, internal search pages, or duplicate filter combinations, your real pages get crawled less often. A clean robots.txt directs the budget at the URLs that matter. For AI answer engines, robots.txt is also where you choose whether to allow training crawlers (ClaudeBot, GPTBot) and answer-only crawlers (PerplexityBot). These are two separate decisions: allow or block each group as your policy dictates.
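The two AI-crawler decisions can be kept independent by rendering one rule group per crawler class. The sketch below assumes the groupings named in the paragraph above and uses a hypothetical render_policy helper. Vendors change user-agent tokens, so check each vendor's current documentation before relying on these names.

```python
from urllib.robotparser import RobotFileParser

# Crawler groupings as described in the text; treat the tokens as
# assumptions and confirm them against each vendor's documentation.
TRAINING_CRAWLERS = ["ClaudeBot", "GPTBot"]
ANSWER_CRAWLERS = ["PerplexityBot"]


def render_policy(allow_training: bool, allow_answers: bool) -> str:
    """Render a robots.txt body with one group per crawler class (hypothetical helper)."""

    def group(agents: list[str], allowed: bool) -> str:
        lines = [f"User-agent: {agent}" for agent in agents]
        lines.append("Allow: /" if allowed else "Disallow: /")
        return "\n".join(lines)

    groups = [
        group(TRAINING_CRAWLERS, allow_training),
        group(ANSWER_CRAWLERS, allow_answers),
        group(["*"], True),  # every other crawler stays allowed
    ]
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


# Example policy: block training crawlers, keep answer crawlers.
policy = render_policy(allow_training=False, allow_answers=True)
print(policy)

# Verify the rendered text with the standard-library parser.
checker = RobotFileParser()
checker.parse(policy.splitlines())
for agent in TRAINING_CRAWLERS + ANSWER_CRAWLERS + ["Googlebot"]:
    print(agent, checker.can_fetch(agent, "https://example.com/menu"))
```

Keeping the two flags separate means either decision can be flipped later without touching the other group or the default rules.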

  • Sitemap: the file robots.txt points to.
  • llms.txt: the AI-specific companion file.
  • Schema markup: the per-page metadata crawlers read after entry.
