Robots.txt Generator: Control How Search Engines Crawl Your Site
Every website has a secret conversation with search engines. It happens through a small but powerful file called robots.txt.
This unassuming text file tells search engine crawlers like Googlebot which pages they can and cannot access on your site. Get it right, and you control your crawl budget, protect sensitive pages, and improve your SEO. Get it wrong, and you might accidentally block your entire site from being indexed.
In this guide, we’ll cover everything you need to know about robots.txt files—what they are, why they matter, and how to create one that works for your site.
Robots.txt is a plain text file that lives at the root of your website (e.g., https://yoursite.com/robots.txt). It uses the Robots Exclusion Protocol to communicate with web crawlers.
Here’s what a basic robots.txt file looks like:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://yoursite.com/sitemap.xml
```

Let’s break this down:
- User-agent: Specifies which crawler the rules apply to (`*` means all crawlers)
- Allow: Permits access to specific paths
- Disallow: Blocks access to specific paths
- Sitemap: Tells crawlers where to find your sitemap
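If you want to sanity-check rules like these, Python’s standard library includes a robots.txt parser. A minimal sketch (note that `urllib.robotparser` matches plain path prefixes in file order and does not implement Google’s `*`/`$` wildcard extensions, so the blanket `Allow: /` is omitted here):

```python
# Check which URLs a crawler may fetch, using Python's stdlib parser.
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://yoursite.com/blog/post"))    # True
print(parser.can_fetch("*", "https://yoursite.com/admin/login"))  # False
```

Paths not matched by any `Disallow` rule default to allowed, which is why the blog URL passes.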
Search engines allocate a specific “crawl budget” to each site—the number of pages they’ll crawl in a given time period. Robots.txt helps you:
- Block low-value pages (admin panels, search results, tag pages)
- Prioritize important content for crawling
- Ensure your best pages get indexed faster
Example: An e-commerce site with 10,000 product pages might want to block filter pages like `/search/*` or `/sort/*` to focus crawl budget on actual product pages.
While robots.txt isn’t a security measure, it helps keep certain pages out of search results:
- Admin dashboards (`/admin/`, `/wp-admin/`)
- Staging environments
- Private user profiles
- Thank-you pages
- Internal search results
Important: Robots.txt doesn’t prevent access—it only asks crawlers politely to stay away. For true security, use authentication.
Many CMS platforms create multiple URLs for the same content. Robots.txt can block:
- Parameter-based URLs (`/*?sort=*`)
- Archive pages
- Tag and category feeds
- Print-friendly versions
Blocking unnecessary crawls reduces server load, which is especially important for:
- High-traffic sites
- Limited hosting plans
- Database-heavy applications
Now let’s look at each directive. `User-agent` specifies which crawler the rules apply to:
```
# Apply to all crawlers
User-agent: *

# Google-specific
User-agent: Googlebot

# Bing-specific
User-agent: Bingbot
```

`Allow` and `Disallow` control access to paths:
```
# Allow everything
Allow: /

# Block specific directory
Disallow: /admin/

# Block specific file type
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*
```

`Crawl-delay` sets a delay between requests (not supported by all crawlers):
```
User-agent: *
Crawl-delay: 10
```

`Sitemap` points to your sitemap location:
```
Sitemap: https://yoursite.com/sitemap.xml
```

A common mistake is blocking your entire site:

```
# DON'T DO THIS
User-agent: *
Disallow: /
```

This blocks everything from being crawled. We’ve seen this happen on production sites after someone copied a staging robots.txt file.
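You can see this failure mode directly with Python’s stdlib parser; a quick sketch:

```python
# Demonstrate that "Disallow: /" blocks every URL for every crawler.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

print(parser.can_fetch("*", "https://yoursite.com/"))              # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog"))  # False
```

Every path starts with `/`, so the single rule matches everything.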
Another common mistake is writing conflicting, deeply nested rules:

```
# Confusing for crawlers
User-agent: *
Allow: /blog/
Disallow: /blog/private/
Allow: /blog/private/public/
```

Keep rules simple and avoid deep nesting.
Don’t use robots.txt to hide sensitive URLs: the file is publicly accessible, so anyone can see exactly which pages you’re trying to hide. Use proper authentication instead.
Don’t block your CSS and JavaScript assets either:

```
# DON'T block these
Disallow: /wp-content/
Disallow: /assets/
```

Modern search engines need to render pages. Blocking CSS/JS can hurt your rankings.
Always include your sitemap URL to help crawlers discover your content faster.
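Both the Sitemap and Crawl-delay directives can also be read back programmatically. A sketch with Python’s stdlib parser (`site_maps()` requires Python 3.8+):

```python
# Read Crawl-delay and Sitemap values back out of a robots.txt file.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Sitemap: https://yoursite.com/sitemap.xml",
])

print(parser.crawl_delay("*"))  # 10
print(parser.site_maps())       # ['https://yoursite.com/sitemap.xml']
```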
A complete example for a WordPress site:

```
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /search/
Disallow: /*?s=*

Sitemap: https://yoursite.com/sitemap.xml
```

For an e-commerce site:

```
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /search
Disallow: /*?sort*

Sitemap: https://yoursite.com/sitemap.xml
```

For a static site:

```
User-agent: *
Allow: /

# Block common generated files
Disallow: /api/
Disallow: /_astro/
Disallow: /_next/

Sitemap: https://yoursite.com/sitemap.xml
```

We built a Robots.txt Generator that makes this process stupidly simple:
- Enter your website URL
- Select which paths to allow/block
- Add your sitemap URL
- Copy the generated robots.txt
- Upload to your site’s root directory
The tool validates your rules, warns about conflicts, and provides ready-to-use output.
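Under the hood, a generator like this is mostly careful string assembly. A toy sketch (a hypothetical helper for illustration, not our tool’s actual code):

```python
# Toy robots.txt generator: assembles directives into a valid file body.
def generate_robots_txt(allow=("/",), disallow=(), sitemap=None, user_agent="*"):
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"

print(generate_robots_txt(
    disallow=["/admin/", "/private/"],
    sitemap="https://yoursite.com/sitemap.xml",
))
```

A real tool adds validation on top (conflicting rules, missing leading slashes, sitemap URL format), but the output format itself is just these newline-joined directives.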
- Create a new text file named `robots.txt`
- Add your rules using the directives above
- Upload to your site’s root directory (e.g., the `public/` folder)
- Test using Google’s robots.txt Tester
- WordPress: Use plugins like Yoast SEO or RankMath
- Shopify: Automatically generated in your store settings
- Static sites: Place in your `public/` or `static/` folder
After creating your file, always test it:
- Check syntax: Visit `https://yoursite.com/robots.txt` in your browser
- Google Search Console: Use the robots.txt Tester tool
- Test specific URLs: Use the URL inspection tool to see if a page is blocked
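You can also script these URL checks, for example as a small pre-deploy test. A sketch with Python’s stdlib parser (the rules and URL list here are placeholders; remember that `urllib.robotparser` ignores Google-style `*`/`$` wildcards):

```python
# Batch-check that important URLs are still crawlable under your rules.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
"""

MUST_BE_CRAWLABLE = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/my-post",
]

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = [url for url in MUST_BE_CRAWLABLE
           if not parser.can_fetch("Googlebot", url)]
print(blocked)  # an empty list means every important URL is still crawlable
```

Running a check like this in CI catches the “accidentally blocked the whole site” mistake before it reaches production.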
You can block specific file types:

```
# Block PDFs
Disallow: /*.pdf$

# Block images
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
```

Or allow only specific crawlers:

```
# Allow only Google and Bing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
```

Create separate robots.txt files for each subdomain:

```
https://yoursite.com/robots.txt
https://blog.yoursite.com/robots.txt
https://shop.yoursite.com/robots.txt
```
Update your robots.txt file when:
- ✅ You add new sections to your site
- ✅ You change your URL structure
- ✅ You launch a staging or development environment
- ✅ You notice crawl budget issues in Google Search Console
- ✅ You add a new sitemap
Robots.txt is a small file with a big impact on your SEO. Take 5 minutes to:
- Check if you have a robots.txt file
- Review the rules to ensure nothing important is blocked
- Add your sitemap URL
- Test using Google Search Console
A well-configured robots.txt file helps search engines crawl your site efficiently, protects sensitive areas, and ensures your best content gets indexed.
Need help with technical SEO? Check out our other SEO tools or reach out on X.