
Crawl Settings Explained: A Beginner’s Guide to Configuring Screaming Frog

Unlock the full power of Screaming Frog with this beginner-friendly guide by Dr. Anubhav Gupta. Discover how to configure crawl settings for precise audits, avoid common pitfalls, and optimize your site for both search engines and answer engines. Perfect for Indian webmasters and SEO enthusiasts looking to strengthen their technical SEO foundation.

Search-engine crawlers and answer engines only reward content they can reach and understand. A single misconfigured checkbox in Screaming Frog can hide hundreds of valuable URLs from your audit, inflate crawl time, or even crash the tool, which is why correct crawl settings deserve attention before anything else. This guide, authored for Elgorythm.in by Dr Anubhav Gupta, breaks down each core crawl setting so even first-time users can run accurate, resource-friendly crawls that reflect how Googlebot and emerging AEO systems really see your site.

1 Why Crawl Settings Matter

Incorrect limits and filters skew every downstream insight—link equity flow, duplicate detection, Core Web Vitals sampling, and AI-driven content audits all rely on a clean crawl dataset. Screaming Frog’s hybrid RAM/disk engine can index millions of URLs when tuned correctly, but runs out of memory or misses JavaScript links when left on defaults[1][2][3]. The result is wasted crawl budget and blind spots that hurt organic growth.

2 Seven Settings You Must Master

| Setting | Where to Configure | What It Controls | Beginner-Safe Value |
|---|---|---|---|
| Storage Mode | File → Settings → Storage | RAM vs. SSD database; stability on large sites | Database on SSD for >200k URLs[2][4] |
| Memory Allocation | File → Settings → Memory | How much RAM Frog may use | 4 GB for ≤2M URLs; never allocate all system RAM[1][2] |
| Crawl Depth & Limits | Config → Spider → Limits / Preferences | Max clicks from start page, folders, redirects | Depth ≤3 for key pages to mimic the “3-click” rule[5][6] |
| Include / Exclude Regex | Config → Include / Exclude | Narrow or omit URL patterns | Start with folder-level regex like ^https://example.com/blog/.*[7][8] |
| Query-String Handling | Config → Spider → Limits | Parameter-explosion control | Limit query strings to 0–1 for e-commerce facets[9][10] |
| User-Agent & Throttling | Config → User-Agent / Speed | Bypass firewalls; respect servers | Googlebot-Smartphone + 2–4 threads on shared hosting[11][12][13] |
| JavaScript Rendering | Config → Spider → Rendering | Headless Chromium DOM crawl | “JavaScript” + Googlebot mobile viewport for modern sites[14][3] |

2.1 Storage Mode & Memory

Database mode autosaves to disk and survives power loss, preventing “crawl-crash anxiety” when tackling 100 k+ URLs[2][4]. Allocate RAM conservatively—Screaming Frog needs only 4 GB for ~2 M URLs; over-allocation slows the OS[1][2].

2.2 Crawl Depth

Pages deeper than three clicks often receive less crawl budget and PageRank. Use Internal → HTML and sort the Crawl Depth column; anything >3 is flagged under Links → Pages with High Crawl Depth[5][6]. Adding internal links or hub pages usually fixes the issue.
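To make the “3-click” idea concrete, here is a minimal Python sketch of how click depth falls out of a breadth-first walk from the start page; this is an illustration, not how Screaming Frog works internally, and the example.com URLs in the tiny link map are placeholders.

# Minimal sketch: compute click depth via breadth-first search from the
# start page. First discovery of a URL is its shortest click path.
from collections import deque

def click_depths(start, links):
    """links maps each URL to the internal URLs it links to."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:  # first discovery = shortest path
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

site = {
    "https://example.com/": ["https://example.com/blog/", "https://example.com/about/"],
    "https://example.com/blog/": ["https://example.com/blog/post-1/"],
    "https://example.com/blog/post-1/": ["https://example.com/blog/post-1/comments/"],
}
for url, depth in click_depths("https://example.com/", site).items():
    flag = "  <- deeper than 3 clicks" if depth > 3 else ""
    print(depth, url, flag)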

2.3 Include / Exclude & URL Rewriting

Exclude infinite calendars or faceted variations before they flood RAM. For example, this pattern skips WordPress core directories and any parameterised URL:

/wp-(content|admin|includes)/.*|/\?.*
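Before pasting patterns like this into Config → Exclude, it helps to dry-run them. Below is a minimal Python sketch; Screaming Frog matches exclude regexes against the full URL, hence the leading and trailing .* wildcards, and the sample URLs are placeholders.

# Minimal sketch: dry-run exclude patterns against sample URLs before a crawl.
import re

excludes = [
    r".*/wp-(content|admin|includes)/.*",  # WordPress core directories
    r".*\?.*",                             # any URL with a query string
]
samples = [
    "https://example.com/blog/post-1/",
    "https://example.com/wp-admin/options.php",
    "https://example.com/shop/?color=red&size=m",
]
for url in samples:
    blocked = any(re.fullmatch(p, url) for p in excludes)
    print(("EXCLUDE" if blocked else "crawl  "), url)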

2.4 Parameter & Facet Control

Set Limit Number of Query Strings to 0 to ignore every ?color= or ?size= URL[9]. Alternatively, tick Configuration → URL Rewriting → Remove All to drop parameters entirely during the crawl[10].
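For intuition, the snippet below mimics in plain Python what Remove All effectively does during a crawl: every query string is stripped so faceted variants collapse into one URL. The saree URLs are placeholders.

# Minimal sketch of the effect of "URL Rewriting -> Remove All":
# strip every query parameter so faceted variants collapse to one URL.
from urllib.parse import urlsplit, urlunsplit

def remove_all_params(url):
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

facets = [
    "https://example.com/sarees/?color=red",
    "https://example.com/sarees/?color=red&size=m",
    "https://example.com/sarees/",
]
print({remove_all_params(u) for u in facets})  # one canonical URL survives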

2.5 User-Agent Switching & Speed

Firewalls can return HTTP 423 “Locked” for Screaming Frog’s default UA[15]. Swap to Googlebot-Smartphone or Chrome in two clicks and lower threads if the server struggles[11][16][13]. This avoids analytics pollution while reflecting mobile-first indexing.
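If you want to confirm a firewall block outside the tool, a quick check like the following Python sketch can help; both User-Agent strings are illustrative, so copy the exact values from Config → User-Agent.

# Minimal sketch: request one URL under two User-Agent strings. A 423 or
# 403 under one UA but not the other points to a firewall rule, not a
# site problem. Requires the requests package; the URL is a placeholder.
import requests

AGENTS = {
    "screaming-frog-like": "Screaming Frog SEO Spider/20.0",
    "googlebot-smartphone": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Mobile "
        "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ),
}
url = "https://example.com/"  # placeholder: use a page from your own site
for name, ua in AGENTS.items():
    status = requests.get(url, headers={"User-Agent": ua}, timeout=10).status_code
    print(f"{name}: HTTP {status}")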

2.6 JavaScript Rendering

Enable JavaScript rendering plus Store HTML & Rendered HTML to compare DOM differences; Screaming Frog’s “Show Differences” visually highlights JS-only elements[17][3]. Increase the AJAX timeout to 10–15 s if content loads slowly[3].
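Outside the GUI, you can run the same comparison on the exported files. A minimal Python sketch, assuming you saved one raw and one rendered HTML export (the file names are placeholders; requires beautifulsoup4):

# Minimal sketch: compare stored (raw) and rendered HTML for a URL,
# listing links that only exist after JavaScript runs.
from bs4 import BeautifulSoup

def hrefs(path):
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    return {a["href"] for a in soup.find_all("a", href=True)}

raw = hrefs("page-raw.html")            # "Store HTML" export
rendered = hrefs("page-rendered.html")  # "Store Rendered HTML" export
for link in sorted(rendered - raw):
    print("JS-only link:", link)        # invisible to a non-rendering crawler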

2.7 Custom Extraction & AI

The Custom JavaScript library now pipes crawl data into OpenAI for tasks such as intent classification or auto-generating alt text—all configured under Config → Custom JavaScript[18]. Keep this disabled on first crawls to conserve credits.
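For context, the same intent-classification idea can be reproduced outside the tool with the OpenAI Python client; the model name below is an assumption, and the page titles are placeholders.

# Minimal sketch: classify exported page titles by search intent.
# Requires the openai package and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
titles = ["Buy Banarasi Sarees Online", "How to Drape a Saree in 5 Steps"]
for title in titles:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any small chat model works
        messages=[{
            "role": "user",
            "content": "Classify the search intent of this page title as "
                       f"informational, navigational, or transactional: {title}",
        }],
    )
    print(title, "->", resp.choices[0].message.content.strip())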

3 Quick-Start Workflow for First-Time Audits

  1. Switch to Database mode and allocate 4 GB RAM.
  2. Enter start URL in Spider mode; add staging credentials if needed[19].
  3. Pre-crawl checklist:
    • Set UA to Googlebot-Smartphone.
    • Enable JavaScript rendering only if the site is JS-heavy.
    • Exclude obvious traps (search results, calendars, wp-admin).
  4. Run a 10-URL test crawl to verify screenshots, blocked resources, and server load.
  5. Launch the full crawl; monitor Response Codes and Memory dashboard.
  6. Export Crawl Overview and All Issues reports for dev hand-off.
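Once these settings are saved as a configuration file, the whole workflow can be templated with Screaming Frog’s command-line interface. A minimal Python sketch follows; the config file name and paths are placeholders, and you should confirm the flags for your version with screamingfrogseospider --help.

# Minimal sketch: run a headless, templated crawl via the CLI and export
# the Internal tab for dev hand-off.
import subprocess

subprocess.run([
    "screamingfrogseospider",
    "--crawl", "https://example.com/",
    "--headless",                              # no GUI
    "--config", "audit-template.seospiderconfig",
    "--save-crawl",
    "--output-folder", "exports/",
    "--export-tabs", "Internal:All",
], check=True)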

4 GEO Tips for Indian Sites

  • Many Indian hosts throttle aggressive crawlers. Keep threads ≤4 and delays ≥200 ms to avoid 5xx bans[12] (see the sketch after this list).
  • If Cloudflare or Sucuri blocks your IP, request whitelisting and provide the exact UA string to IT[20][15].
  • Test regional CDNs; sometimes *.in versions route to different stacks that serve alternate robots.txt rules—crawl both variations.
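The throttling advice in the first bullet translates directly into code. A minimal Python sketch, with placeholder URLs, of a fetcher capped at 4 workers and a ≥200 ms delay per request:

# Minimal sketch: polite fetching with <= 4 threads and >= 200 ms delay.
# Requires the requests package.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

DELAY_SECONDS = 0.2  # >= 200 ms between requests per worker

def polite_fetch(url):
    status = requests.get(url, timeout=10).status_code
    time.sleep(DELAY_SECONDS)
    return url, status

urls = [f"https://example.com/page-{i}/" for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:  # <= 4 threads
    for url, status in pool.map(polite_fetch, urls):
        print(status, url)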

5 AEO Optimisation Inside Screaming Frog

  1. Structured Data Validation – enable JSON-LD extraction and fix all Validation Errors so answers qualify for rich results[1][18] (a minimal extraction sketch follows this list).
  2. Answer Snippet Audit – use Custom Search to locate “FAQ,” “How to,” and “Definition” patterns missing <h2> markup; internal linking elevates them to SGE-style answers.
  3. People-Also-Ask Mapping – integrate Google Search Console API to overlay Clicks and Position onto crawl data, prioritising pages that own featured snippets but have thin headings[1].
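As referenced in item 1, here is a minimal Python sketch of the structured-data check: pull JSON-LD blocks from one page and flag any that fail to parse. The URL is a placeholder; requires requests and beautifulsoup4.

# Minimal sketch: extract JSON-LD blocks and report parse failures,
# which would also fail rich-result eligibility.
import json
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/faq/", timeout=10).text
soup = BeautifulSoup(html, "html.parser")
for i, tag in enumerate(soup.find_all("script", type="application/ld+json")):
    try:
        data = json.loads(tag.string or "")
        types = data.get("@type", "missing") if isinstance(data, dict) else "list of nodes"
        print(f"block {i}: @type =", types)
    except json.JSONDecodeError as e:
        print(f"block {i}: invalid JSON-LD ({e})")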

6 Troubleshooting Cheatsheet

| Symptom | Likely Cause | Fix |
|---|---|---|
| Memory error at 10% | Parameter explosion | Exclude facets; limit query strings to 0[9][21] |
| 0 URLs crawled, 423 status | Firewall blocks UA | Change UA; throttle speed[15] |
| Blank screenshots | Resources blocked | Check Pages with Blocked Resources; allowlist CDN[3] |
| Only homepage crawled | Include regex too narrow | Broaden pattern or start crawl at a higher directory[22] |

7 Trust Checklist Before Hitting “Start”

☑ Database mode on SSD

☑ UA set to Googlebot-Smartphone

☑ Depth, parameter, and folder exclusions reviewed

☑ JavaScript rendering verified with 10-URL test

☑ Crawl speed aligned with server capacity

☑ API integrations (GA4, GSC) connected for business context[1]

Use this list every time—you’ll prevent 90% of audit-breaking errors.

8 Closing Thoughts by Dr Anubhav Gupta

Mastering these foundational settings ensures that Screaming Frog mirrors real-world crawl behaviour, reveals the right issues, and safeguards your machine resources. Configure once, template your settings, and your future audits will start trustworthy and data-rich from the first URL.

Ready to explore advanced modules like vector embeddings and crawl automation? Head over to our pillar guide, “The Ultimate Guide to Screaming Frog SEO Spider,” and keep fine-tuning. Your site—and every answer engine—will thank you.

Frequently Asked Questions (FAQs)

1. What are the most important crawl settings to configure in Screaming Frog for beginners?

The most crucial crawl settings for beginners include selecting the right storage mode (preferably database for large sites), setting appropriate memory allocation, controlling crawl depth, using include/exclude regex to focus the crawl, managing query-string handling, adjusting the user-agent and crawl speed, and enabling JavaScript rendering for modern sites. These settings ensure efficient, accurate, and resource-friendly crawls.

2. How can I prevent Screaming Frog from crashing during large site crawls?

To avoid crashes, use database storage mode (especially for sites with over 200,000 URLs), allocate a reasonable amount of RAM (e.g., 4 GB for up to 2 million URLs), and exclude unnecessary URL patterns or parameters that could cause a parameter explosion. Running a small test crawl before the full audit also helps identify potential issues early.

3. Why is it important to limit crawl depth, and what value should I use?

Limiting crawl depth ensures that Screaming Frog mirrors how search engines, like Googlebot, typically access a site—most important content should be within three clicks from the homepage. Setting a crawl depth of three helps focus the audit on key pages and prevents wasting resources on deep, low-value URLs.

4. How do I handle sites with lots of URL parameters or facets?

For sites with many parameters or faceted navigation (like e-commerce), limit the number of query strings to zero or one, or use URL rewriting to remove parameters entirely. This prevents Screaming Frog from crawling thousands of near-duplicate URLs, saving memory and crawl time.

5. What should I do if Screaming Frog is blocked by my website’s firewall or CDN?

If you encounter issues like HTTP 423 errors or the crawl stops at the homepage, try changing the user-agent to Googlebot-Smartphone or Chrome, reduce crawl speed, and request your IT team or CDN provider to whitelist your crawler’s IP and user-agent. This helps ensure the crawler can access all intended pages without being blocked.

  1. Screaming-Frog-SEO-Spider-Guide.docx     
  2. https://www.screamingfrog.co.uk/seo-spider/tutorials/how-to-crawl-large-websites/    
  3. https://www.youtube.com/watch?v=Uvm-21sQAzM    
  4. https://www.youtube.com/watch?v=nTC0S9vwjLU 
  5. https://gofishdigital.com/blog/crawl-depth-audit-optimization-guide/ 
  6. https://www.screamingfrog.co.uk/seo-spider/issues/links/pages-with-high-crawl-depth/ 
  7. https://learndigitaladvertising.com/includes-and-excludes/
  8. https://www.youtube.com/watch?v=7PNV7jrBSx4
  9. https://www.reddit.com/r/TechSEO/comments/10vzmqc/excluding_parameter_urls_in_screaming_frog_regex/  
  10. https://www.screamingfrog.co.uk/advanced-screaming-frog-crawling-tips-and-use-cases-for-ecommerce-auditing/ 
  11. https://www.linkedin.com/posts/chris-long-marketing_technical-seo-tip-if-screaming-frog-is-forbidden-activity-7247957136100487168-I29Z 
  12. https://bravr.com/increase-memory-speed-screaming-frog-seo-spider/ 
  13. https://gofishdigital.com/blog/how-to-crawl-as-googlebot-smartphone-with-screaming-frog/ 
  14. https://ca.linkedin.com/posts/chris-long-marketing_technical-seo-tip-screaming-frogs-javascript-activity-6985571388254224384-ikId
  15. https://www.linkedin.com/posts/gerald-fauter_for-my-seo-analysis-i-use-the-screaming-activity-7272892656597229568-b66P  
  16. https://seopulses.ru/kak-pomenyat-user-agent-v-screaming-frog-seo-spider/
  17. https://web.swipeinsight.app/posts/technical-seo-tip-screaming-frog-s-show-differences-highlights-javascript-loaded-content-6738
  18. https://zeo.org/resources/blog/how-to-use-ai-in-screaming-frog 
  19. https://www.screamingfrog.co.uk/seo-spider/tutorials/how-to-crawl-a-staging-website/
  20. https://www.reddit.com/r/TechSEO/comments/17dnkwy/how_to_block_screaming_frog_crawls/
  21. https://webmasters.stackexchange.com/questions/137448/screaming-frog-runs-out-of-memory-will-this-affect-crawling-and-ranking
  22. https://www.youtube.com/watch?v=4KxSwc5pjk8