Robots.txt Generator: Control How Search Engines Crawl Your Site
Every website has a secret conversation with search engines. It happens through a small but powerful file called robots.txt.
This unassuming text file tells search engine crawlers like Googlebot which pages they can and cannot access on your site. Get it right, and you control your crawl budget, protect sensitive pages, and improve your SEO. Get it wrong, and you might accidentally block your entire site from being indexed.
In this guide, we’ll cover everything you need to know about robots.txt files—what they are, why they matter, and how to create one that works for your site.
Robots.txt is a plain text file that lives at the root of your website (e.g., https://yoursite.com/robots.txt). It uses the Robots Exclusion Protocol to communicate with web crawlers.
Here’s what a basic robots.txt file looks like:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://yoursite.com/sitemap.xml
```

Let’s break this down:
- User-agent: Specifies which crawler the rules apply to (`*` means all crawlers)
- Allow: Permits access to specific paths
- Disallow: Blocks access to specific paths
- Sitemap: Tells crawlers where to find your sitemap
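If you want to sanity-check rules like these, Python’s standard library includes a robots.txt parser. A minimal sketch (note that `urllib.robotparser` matches plain path prefixes in file order and does not implement Google’s `*`/`$` wildcard extensions, so the blanket `Allow: /` is omitted here):

```python
# Check which URLs a crawler may fetch, using Python's stdlib parser.
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://yoursite.com/blog/post"))    # True
print(parser.can_fetch("*", "https://yoursite.com/admin/login"))  # False
```

Paths not matched by any `Disallow` rule default to allowed, which is why the blog URL passes.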
Search engines allocate a specific “crawl budget” to each site—the number of pages they’ll crawl in a given time period. Robots.txt helps you:
- Block low-value pages (admin panels, search results, tag pages)
- Prioritize important content for crawling
- Ensure your best pages get indexed faster
Example: An e-commerce site with 10,000 product pages might want to block filter pages like `/search/*` or `/sort/*` to focus crawl budget on actual product pages.
While robots.txt isn’t a security measure, it helps keep certain pages out of search results:
- Admin dashboards (`/admin/`, `/wp-admin/`)
- Staging environments
- Private user profiles
- Thank-you pages
- Internal search results
Important: Robots.txt doesn’t prevent access—it only asks crawlers politely to stay away. For true security, use authentication.
Many CMS platforms create multiple URLs for the same content. Robots.txt can block:
- Parameter-based URLs (`/*?sort=*`)
- Archive pages
- Tag and category feeds
- Print-friendly versions
Blocking unnecessary crawls reduces server load, which is especially important for:
- High-traffic sites
- Limited hosting plans
- Database-heavy applications
Now let’s look at each directive. `User-agent` specifies which crawler the rules apply to:
```
# Apply to all crawlers
User-agent: *

# Google-specific
User-agent: Googlebot

# Bing-specific
User-agent: Bingbot
```

`Allow` and `Disallow` control access to paths:
```
# Allow everything
Allow: /

# Block specific directory
Disallow: /admin/

# Block specific file type
Disallow: /*.pdf$

# Block URL parameters
Disallow: /*?*
```

`Crawl-delay` sets a delay between requests (not supported by all crawlers):
```
User-agent: *
Crawl-delay: 10
```

`Sitemap` points to your sitemap location:
```
Sitemap: https://yoursite.com/sitemap.xml
```

A common mistake is blocking your entire site:

```
# DON'T DO THIS
User-agent: *
Disallow: /
```

This blocks everything from being crawled. We’ve seen this happen on production sites after someone copied a staging robots.txt file.
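You can see this failure mode directly with Python’s stdlib parser; a quick sketch:

```python
# Demonstrate that "Disallow: /" blocks every URL for every crawler.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

print(parser.can_fetch("*", "https://yoursite.com/"))              # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog"))  # False
```

Every path starts with `/`, so the single rule matches everything.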
Another common mistake is writing conflicting, deeply nested rules:

```
# Confusing for crawlers
User-agent: *
Allow: /blog/
Disallow: /blog/private/
Allow: /blog/private/public/
```

Keep rules simple and avoid deep nesting.
Don’t use robots.txt to hide sensitive URLs: the file is publicly accessible, so anyone can see exactly which pages you’re trying to hide. Use proper authentication instead.
Don’t block your CSS and JavaScript assets either:

```
# DON'T block these
Disallow: /wp-content/
Disallow: /assets/
```

Modern search engines need to render pages. Blocking CSS/JS can hurt your rankings.
Always include your sitemap URL to help crawlers discover your content faster.
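Both the Sitemap and Crawl-delay directives can also be read back programmatically. A sketch with Python’s stdlib parser (`site_maps()` requires Python 3.8+):

```python
# Read Crawl-delay and Sitemap values back out of a robots.txt file.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Sitemap: https://yoursite.com/sitemap.xml",
])

print(parser.crawl_delay("*"))  # 10
print(parser.site_maps())       # ['https://yoursite.com/sitemap.xml']
```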
A complete example for a WordPress site:

```
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /search/
Disallow: /*?s=*

Sitemap: https://yoursite.com/sitemap.xml
```

For an e-commerce site:

```
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /search
Disallow: /*?sort*

Sitemap: https://yoursite.com/sitemap.xml
```

For a static site:

```
User-agent: *
Allow: /

# Block common generated files
Disallow: /api/
Disallow: /_astro/
Disallow: /_next/

Sitemap: https://yoursite.com/sitemap.xml
```

We built a Robots.txt Generator that makes this process stupidly simple:
- Enter your website URL
- Select which paths to allow/block
- Add your sitemap URL
- Copy the generated robots.txt
- Upload to your site’s root directory
The tool validates your rules, warns about conflicts, and provides ready-to-use output.
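Under the hood, a generator like this is mostly careful string assembly. A toy sketch (a hypothetical helper for illustration, not our tool’s actual code):

```python
# Toy robots.txt generator: assembles directives into a valid file body.
def generate_robots_txt(allow=("/",), disallow=(), sitemap=None, user_agent="*"):
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"

print(generate_robots_txt(
    disallow=["/admin/", "/private/"],
    sitemap="https://yoursite.com/sitemap.xml",
))
```

A real tool adds validation on top (conflicting rules, missing leading slashes, sitemap URL format), but the output format itself is just these newline-joined directives.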
- Create a new text file named `robots.txt`
- Add your rules using the directives above
- Upload to your site’s root directory (e.g., the `public/` folder)
- Test using Google’s robots.txt Tester
- WordPress: Use plugins like Yoast SEO or RankMath
- Shopify: Automatically generated in your store settings
- Static sites: Place in your `public/` or `static/` folder
After creating your file, always test it:
- Check syntax: Visit `https://yoursite.com/robots.txt` in your browser
- Google Search Console: Use the robots.txt Tester tool
- Test specific URLs: Use the URL inspection tool to see if a page is blocked
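You can also script these URL checks, for example as a small pre-deploy test. A sketch with Python’s stdlib parser (the rules and URL list here are placeholders; remember that `urllib.robotparser` ignores Google-style `*`/`$` wildcards):

```python
# Batch-check that important URLs are still crawlable under your rules.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
"""

MUST_BE_CRAWLABLE = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/my-post",
]

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = [url for url in MUST_BE_CRAWLABLE
           if not parser.can_fetch("Googlebot", url)]
print(blocked)  # an empty list means every important URL is still crawlable
```

Running a check like this in CI catches the “accidentally blocked the whole site” mistake before it reaches production.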
You can block specific file types:

```
# Block PDFs
Disallow: /*.pdf$

# Block images
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.gif$
```

Or allow only specific crawlers:

```
# Allow only Google and Bing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
```

Create separate robots.txt files for each subdomain:

```
https://yoursite.com/robots.txt
https://blog.yoursite.com/robots.txt
https://shop.yoursite.com/robots.txt
```
Update your robots.txt file when:
- ✅ You add new sections to your site
- ✅ You change your URL structure
- ✅ You launch a staging or development environment
- ✅ You notice crawl budget issues in Google Search Console
- ✅ You add a new sitemap
Robots.txt is a small file with a big impact on your SEO. Take 5 minutes to:
- Check if you have a robots.txt file
- Review the rules to ensure nothing important is blocked
- Add your sitemap URL
- Test using Google Search Console
A well-configured robots.txt file helps search engines crawl your site efficiently, protects sensitive areas, and ensures your best content gets indexed.
Need help with technical SEO? Check out our other SEO tools or reach out on X.