
Robots.txt Generator: Control How Search Engines Crawl Your Site

Mar 10, 2026 — SEO, Web Development, Tools

Every website has a secret conversation with search engines. It happens through a small but powerful file called robots.txt.

This unassuming text file tells search engine crawlers like Googlebot which pages they can and cannot access on your site. Get it right, and you control your crawl budget, protect sensitive pages, and improve your SEO. Get it wrong, and you might accidentally block your entire site from being indexed.

In this guide, we’ll cover everything you need to know about robots.txt files—what they are, why they matter, and how to create one that works for your site.

What is Robots.txt?

Robots.txt is a plain text file that lives at the root of your website (e.g., https://yoursite.com/robots.txt). It uses the Robots Exclusion Protocol to communicate with web crawlers.

Here’s what a basic robots.txt file looks like:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml

Let’s break this down:

  1. User-agent: * applies the rules to every crawler.
  2. Allow: / permits crawling by default.
  3. Disallow: /admin/ and Disallow: /private/ ask crawlers to skip those directories.
  4. Sitemap: points crawlers to your sitemap file.
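
If you have Python handy, you can sanity-check rules like these with the standard library’s robotparser. A quick sketch: the sample rules mirror the file above, minus the blanket Allow: / (the stdlib parser applies rules in file order rather than Google’s longest-match rule, and anything not disallowed is allowed by default anyway):

```python
from urllib.robotparser import RobotFileParser

# Sample rules mirroring the file above. "Allow: /" is implicit:
# anything not disallowed is allowed by default.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/post"))    # True
print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/login"))  # False
```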

Why Robots.txt Matters for SEO

1. Crawl Budget Optimization

Search engines allocate each site a specific “crawl budget”: the number of pages they’ll crawl in a given time period. Robots.txt helps you spend that budget on the pages you actually want indexed instead of on low-value URLs.

Example: An e-commerce site with 10,000 product pages might want to block filter pages like /search/* or /sort/* to focus crawl budget on actual product pages.

2. Protect Sensitive Content

While robots.txt isn’t a security measure, it helps keep low-value or sensitive pages, such as login screens, internal search results, and staging areas, out of search results.

Important: Robots.txt doesn’t prevent access—it only asks crawlers politely to stay away. For true security, use authentication. Also note that a blocked URL can still appear in search results (without a snippet) if other sites link to it; to keep a page out of the index entirely, use a noindex meta tag and leave the page crawlable.

3. Prevent Duplicate Content Issues

Many CMS platforms create multiple URLs for the same content, such as print versions, tag archives, and parameter-based variants. Robots.txt can block these duplicate paths so crawlers concentrate on the canonical URLs.

4. Server Resource Management

Blocking unnecessary crawls reduces server load, which matters most on shared hosting, high-traffic sites, and sites that attract aggressive third-party bots.

Common Robots.txt Directives

User-agent

Specifies which crawler the rules apply to:

# Apply to all crawlers
User-agent: *
# Google-specific
User-agent: Googlebot
# Bing-specific
User-agent: Bingbot

Allow & Disallow

Control access to paths:

# Allow everything
Allow: /
# Block specific directory
Disallow: /admin/
# Block specific file type
Disallow: /*.pdf$
# Block URL parameters
Disallow: /*?*
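
Wildcard behavior is worth understanding precisely. As a rough illustration (a simplified sketch, not Google’s actual implementation), Google-style * and $ patterns can be translated into regular expressions:

```python
import re

def google_style_match(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt pattern with * and $.

    Simplified sketch of Google-style matching: * matches any run of
    characters, and a trailing $ anchors the pattern to the end of
    the path. Not an official implementation.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore * as "match anything".
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

print(google_style_match("/*.pdf$", "/files/report.pdf"))      # True
print(google_style_match("/*.pdf$", "/files/report.pdf?v=2"))  # False
print(google_style_match("/admin/", "/admin/login"))           # True
```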

Crawl-delay

Sets a minimum delay, in seconds, between successive requests. Support varies: Bing honors it, while Google ignores the directive entirely:

User-agent: *
Crawl-delay: 10
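
Python’s robotparser exposes this directive too, which is handy if you’re writing a polite crawler of your own (a minimal sketch):

```python
from urllib.robotparser import RobotFileParser

sample = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(sample.splitlines())

# A polite crawler would sleep this many seconds between requests.
# The * group applies to every agent, so any bot name works here.
print(parser.crawl_delay("MyBot"))  # 10
```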

Sitemap

Points to your sitemap location:

Sitemap: https://yoursite.com/sitemap.xml

Common Robots.txt Mistakes to Avoid

Mistake 1: Blocking Your Entire Site

# DON'T DO THIS
User-agent: *
Disallow: /

This blocks everything from being crawled. We’ve seen this happen on production sites after someone copied a staging robots.txt file.

Mistake 2: Conflicting Rules

# Confusing for crawlers
User-agent: *
Allow: /blog/
Disallow: /blog/private/
Allow: /blog/private/public/

Keep rules simple. When rules conflict, Google applies the most specific (longest) matching rule, but other crawlers may resolve conflicts differently, so avoid deep nesting like this.

Mistake 3: Relying on Robots.txt for Security

Robots.txt is publicly accessible. Anyone can see which pages you’re trying to hide. Use proper authentication instead.

Mistake 4: Blocking CSS and JavaScript

# DON'T block these
Disallow: /wp-content/
Disallow: /assets/

Modern search engines need to render pages. Blocking CSS/JS can hurt your rankings.

Mistake 5: No Sitemap Reference

Always include your sitemap URL to help crawlers discover your content faster.

Robots.txt Templates for Common Platforms

WordPress

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /search/
Disallow: /*?s=*
Sitemap: https://yoursite.com/sitemap.xml

Shopify

User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /search
Disallow: /*?sort*
Sitemap: https://yoursite.com/sitemap.xml

Static Site (Astro, Next.js, etc.)

User-agent: *
Allow: /
# Block common generated files
Disallow: /api/
Disallow: /_astro/
Disallow: /_next/
Sitemap: https://yoursite.com/sitemap.xml

How to Create a Robots.txt File

Option 1: Use Our Free Generator

We built a Robots.txt Generator that makes this process stupidly simple:

  1. Enter your website URL
  2. Select which paths to allow/block
  3. Add your sitemap URL
  4. Copy the generated robots.txt
  5. Upload to your site’s root directory

The tool validates your rules, warns about conflicts, and provides ready-to-use output.
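
Under the hood, generating the file is just string assembly. A minimal sketch of the idea (a hypothetical helper, not the generator’s actual code):

```python
def build_robots_txt(sitemap_url, disallow=(), allow=("/",), user_agent="*"):
    """Assemble a robots.txt string from a few inputs (illustrative sketch)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(
    "https://yoursite.com/sitemap.xml",
    disallow=("/admin/", "/private/"),
))
```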

Option 2: Manual Creation

  1. Create a new text file named robots.txt
  2. Add your rules using the directives above
  3. Upload to your site’s root directory (e.g., public/ folder)
  4. Test using the robots.txt report in Google Search Console (the standalone robots.txt Tester has been retired)

Option 3: Platform-Specific Methods

Many platforms manage robots.txt for you. WordPress SEO plugins such as Yoast include a robots.txt editor in the dashboard, Shopify generates the file automatically (customizable via the robots.txt.liquid template), and Next.js can serve one from a robots.ts file in the app directory.

Testing Your Robots.txt

After creating your file, always test it:

  1. Check syntax: Visit https://yoursite.com/robots.txt in your browser
  2. Google Search Console: Review the robots.txt report to confirm your file was fetched and parsed
  3. Test specific URLs: Use the URL inspection tool to see if a page is blocked
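
You can also script step 3 before you even deploy, asserting the outcomes you expect with Python’s standard library (a sketch; the draft rules and URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Draft rules to verify before uploading. "Allow: /" is left implicit
# because the stdlib parser evaluates rules in file order.
draft = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout
"""

parser = RobotFileParser()
parser.parse(draft.splitlines())

# URLs paired with the crawl result you expect for each.
checks = {
    "https://yoursite.com/products/widget": True,
    "https://yoursite.com/admin/settings": False,
}
for url, expected in checks.items():
    assert parser.can_fetch("Googlebot", url) == expected, url
print("all checks passed")
```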

Advanced Robots.txt Techniques

Block Specific File Types

# Block PDFs
Disallow: /*.pdf$
# Block images
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$

Allow Only Specific Crawlers

# Allow only Google and Bing
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Disallow: /

Handle Multiple Subdomains

Robots.txt is per-host: the file at blog.yoursite.com/robots.txt governs only that subdomain, and the file on yoursite.com does not apply to it. Place a separate robots.txt at the root of each subdomain you want to control.

When to Update Your Robots.txt

Update your robots.txt file when:

  1. You launch, rename, or remove sections of your site
  2. You migrate platforms or change your URL structure
  3. Server logs show crawlers wasting budget on low-value URLs
  4. You add, move, or rename a sitemap

The Bottom Line

Robots.txt is a small file with a big impact on your SEO. Take 5 minutes to:

  1. Check if you have a robots.txt file
  2. Review the rules to ensure nothing important is blocked
  3. Add your sitemap URL
  4. Test using Google Search Console

A well-configured robots.txt file helps search engines crawl your site efficiently, protects sensitive areas, and ensures your best content gets indexed.


Need help with technical SEO? Check out our other SEO tools or reach out on X.