Search-engine crawlers and answer engines only reward content they can reach and understand. A single misconfigured checkbox in Screaming Frog can hide hundreds of valuable URLs from your audit, inflate crawl time, or even crash the tool, which makes correct crawl settings the foundation of any reliable audit. This guide, authored for Elgorythm.in by Dr Anubhav Gupta, breaks down each core crawl setting so even first-time users can run accurate, resource-friendly crawls that reflect how Googlebot and emerging AEO systems really see your site.
1 Why Crawl Settings Matter
Incorrect limits and filters skew every downstream insight—link equity flow, duplicate detection, Core Web Vitals sampling, and AI-driven content audits all rely on a clean crawl dataset. Screaming Frog’s hybrid RAM/disk engine can index millions of URLs when tuned correctly, but runs out of memory or misses JavaScript links when left on defaults[1][2][3]. The result is wasted crawl budget and blind spots that hurt organic growth.
2 Seven Settings You Must Master
| Setting | Where to Configure | What It Controls | Beginner-Safe Value |
| --- | --- | --- | --- |
| Storage Mode | File → Settings → Storage | RAM vs. SSD database; stability on large sites | Database on SSD for >200k URLs[2][4] |
| Memory Allocation | File → Settings → Memory | How much RAM Frog may use | 4 GB for ≤2M URLs; never allocate all system RAM[1][2] |
| Crawl Depth & Limits | Config → Spider → Limits / Preferences | Max clicks from start page, folders, redirects | Depth ≤3 for key pages to mimic the “3-click” rule[5][6] |
| Include / Exclude Regex | Config → Include / Exclude | Narrow or omit URL patterns | Start with folder-level regex like ^https://example.com/blog/.*[7][8] |
| Query-String Handling | Config → Spider → Limits | Parameter explosion control | Limit query strings to 0–1 for e-commerce facets[9][10] |
| User-Agent & Throttling | Config → User-Agent / Speed | Bypass firewalls; respect servers | Googlebot-Smartphone + 2–4 threads on shared hosting[11][12][13] |
| JavaScript Rendering | Config → Spider → Rendering | Headless Chromium DOM crawl | “JavaScript” + Googlebot mobile viewport for modern sites[14][3] |
2.1 Storage Mode & Memory
Database mode autosaves to disk and survives power loss, preventing “crawl-crash anxiety” when tackling 100k+ URLs[2][4]. Allocate RAM conservatively: Screaming Frog needs only 4 GB for roughly 2 million URLs, and over-allocation slows the whole operating system[1][2].
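If the UI setting will not stick, or you manage installs with scripts, the allocation can also be set in Screaming Frog’s launcher config. On Windows this is the ScreamingFrogSEOSpider.l4j.ini file in the install directory (verify the file name and location for your version and OS); the heap size is a standard Java -Xmx flag. A 4 GB allocation, as an illustrative sketch:

```
-Xmx4g
```

Edit only the -Xmx line and leave the rest of the file intact.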
2.2 Crawl Depth
Pages deeper than three clicks often receive less crawl budget and PageRank. Use Internal → HTML and sort the Crawl Depth column; anything >3 is flagged in Links → Pages with High Crawl Depth[5][6]. Adding internal links from hub pages usually fixes the issue.
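The same check can be run outside the UI against the Internal → HTML export. A minimal sketch, assuming the export is saved as internal_html.csv and that your version’s column headers are Address and Crawl Depth (verify both before running):

```python
import csv

# Flag exported URLs sitting deeper than three clicks from the start page.
# "internal_html.csv" and the column names are assumptions based on a
# standard Internal -> HTML export; adjust to match your version.
with open("internal_html.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

deep_pages = [
    (row["Address"], int(row["Crawl Depth"]))
    for row in rows
    if row.get("Crawl Depth", "").isdigit() and int(row["Crawl Depth"]) > 3
]

# Deepest pages first: prime candidates for new hub-page links.
for url, depth in sorted(deep_pages, key=lambda x: -x[1]):
    print(f"{depth:>2}  {url}")
```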
2.3 Include / Exclude & URL Rewriting
Exclude infinite calendars or faceted variations before they flood RAM. Screaming Frog matches exclude patterns against the full URL, so anchor them with .* where needed. Example to skip WordPress admin assets and every parameterised URL:
.*/wp-(content|admin|includes)/.*|.*\?.*
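Before trusting a pattern, dry-run it against sample URLs. A quick sketch using Python’s re module; fullmatch mirrors matching against the full URL, and the test URLs are illustrative:

```python
import re

# The exclude pattern from above: WordPress system folders plus any query string.
pattern = re.compile(r".*/wp-(content|admin|includes)/.*|.*\?.*")

test_urls = [
    "https://example.com/blog/seo-guide/",           # expect: crawl
    "https://example.com/wp-admin/options.php",      # expect: exclude
    "https://example.com/wp-content/uploads/a.png",  # expect: exclude
    "https://example.com/?s=search+term",            # expect: exclude
]

for url in test_urls:
    verdict = "EXCLUDE" if pattern.fullmatch(url) else "crawl"
    print(f"{verdict:>7}  {url}")
```

Screaming Frog’s Exclude window also has a built-in test field, which is worth using as the final check.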
2.4 Parameter & Facet Control
Set Limit Number of Query Strings to 0 to ignore every ?color= or ?size= URL[9]. Alternatively, tick Configuration → URL Rewriting → Remove All to drop parameters entirely during the crawl[10].
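To estimate how much a parameter rule will shrink the crawl, deduplicate a URL sample after stripping query strings. A small sketch; urls.txt is a hypothetical file with one URL per line (for example, from a previous export):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_params(url: str) -> str:
    """Drop query string and fragment, approximating what parameter removal does."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# "urls.txt" is a hypothetical sample file, one URL per line.
with open("urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

unique = {strip_params(u) for u in urls}
print(f"{len(urls)} URLs collapse to {len(unique)} once parameters are removed")
```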
2.5 User-Agent Switching & Speed
Firewalls can return HTTP 423 “Locked” for Screaming Frog’s default UA[15]. Swap to Googlebot-Smartphone or Chrome in two clicks and lower threads if the server struggles[11][16][13]. This avoids analytics pollution while reflecting mobile-first indexing.
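To confirm a block is UA-based before re-crawling, request the homepage with both User-Agent strings and compare status codes. A sketch using the third-party requests library; both UA strings are illustrative and should be copied from Config → User-Agent and Google’s documentation respectively:

```python
import requests

URL = "https://example.com/"  # placeholder: your start URL

# Illustrative UA strings; copy the exact values from Screaming Frog and Google.
user_agents = {
    "Default spider UA": "Screaming Frog SEO Spider/20.0",
    "Googlebot-Smartphone": (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Mobile Safari/537.36 "
        "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ),
}

for name, ua in user_agents.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    print(f"{name}: HTTP {resp.status_code}")
```

If the default UA returns 403 or 423 while the Googlebot UA returns 200, the firewall is filtering on User-Agent. Note that some CDNs verify Googlebot via reverse DNS, so spoofing is a diagnostic, not a permanent fix.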
2.6 JavaScript Rendering
Enable JavaScript rendering plus Store HTML & Rendered HTML to compare DOM differences; Screaming Frog’s “Show Differences” visually highlights JS-only elements[17][3]. Increase the AJAX timeout to 10–15 s if content loads slowly[3].
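The same comparison can be scripted against saved copies of the two HTML versions. A rough sketch that diffs anchor hrefs; raw.html and rendered.html are hypothetical file names, and a real parser such as BeautifulSoup would be more robust than this regex:

```python
import re

def hrefs(path: str) -> set[str]:
    # Crude href extraction: adequate for a quick diff, not production parsing.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return set(re.findall(r'href="([^"#]+)"', f.read()))

raw, rendered = hrefs("raw.html"), hrefs("rendered.html")
js_only = rendered - raw  # links that exist only after JavaScript execution

print(f"{len(js_only)} links appear only in the rendered DOM:")
for link in sorted(js_only)[:20]:
    print(" ", link)
```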
2.7 Custom Extraction & AI
The Custom JavaScript library now pipes crawl data into OpenAI for tasks such as intent classification or auto-generating alt text—all configured under Config → Custom JavaScript[18]. Keep this disabled on first crawls to conserve credits.
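The built-in snippets run inside Screaming Frog itself; if you would rather experiment offline against an exported title list first, a rough equivalent with the OpenAI Python SDK looks like this (the model name and prompt are illustrative, and OPENAI_API_KEY must be set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_intent(title: str) -> str:
    """Label a page title's search intent: informational, transactional, or navigational."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Classify the search intent of the page title as "
                        "informational, transactional, or navigational. "
                        "Reply with one word."},
            {"role": "user", "content": title},
        ],
    )
    return resp.choices[0].message.content.strip()

print(classify_intent("How to Configure Screaming Frog Crawl Settings"))
```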
3 Quick-Start Workflow for First-Time Audits
- Switch to Database mode and allocate 4 GB RAM.
- Enter start URL in Spider mode; add staging credentials if needed[19].
- Pre-crawl checklist:
  - Set UA to Googlebot-Smartphone.
  - Enable JavaScript rendering only if the site is JS-heavy.
  - Exclude obvious traps (search results, calendars, wp-admin).
- Run a 10-URL test crawl to verify screenshots, blocked resources, and server load.
- Launch the full crawl; monitor Response Codes and the Memory dashboard.
- Export Crawl Overview and All Issues reports for dev hand-off; the sketch below shows one way to summarise status codes from an export before sharing.
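A minimal sketch for that status-code summary, assuming an Internal: All export saved as internal_all.csv with a Status Code column (check both names against your version):

```python
import csv
from collections import Counter

# Tally HTTP status codes from an exported crawl for the dev hand-off.
with open("internal_all.csv", newline="", encoding="utf-8") as f:
    codes = Counter(row.get("Status Code", "?") for row in csv.DictReader(f))

for code, count in codes.most_common():
    print(f"HTTP {code}: {count} URLs")
```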
4 GEO Tips for Indian Sites
- Many Indian hosts throttle aggressive crawlers. Keep threads ≤4 and delays ≥200 ms to avoid 5xx bans[12].
- If Cloudflare or Sucuri blocks your IP, request whitelisting and provide the exact UA string to IT[20][15].
- Test regional CDNs; sometimes *.in versions route to different stacks that serve alternate robots.txt rules, so crawl both variations. A quick robots.txt diff is sketched below.
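A minimal robots.txt diff using only the standard library; both hostnames are placeholders for your global and regional domains:

```python
import difflib
from urllib.request import urlopen

def fetch_robots(host: str) -> list[str]:
    # Fetch and split a robots.txt file for line-by-line comparison.
    with urlopen(f"https://{host}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="ignore").splitlines()

# Placeholder hostnames: substitute your .com and .in domains.
a, b = fetch_robots("example.com"), fetch_robots("example.in")

for line in difflib.unified_diff(a, b, fromfile="example.com",
                                 tofile="example.in", lineterm=""):
    print(line)
```

No output means the two stacks serve identical rules and one crawl configuration will do.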
5 AEO Optimisation Inside Screaming Frog
- Structured Data Validation – enable JSON-LD extraction and fix all Validation Errors so answers qualify for rich results[1][18]; a standalone spot-check is sketched after this list.
- Answer Snippet Audit – use Custom Search to locate “FAQ,” “How to,” and “Definition” content that lacks <h2> markup; internal links can elevate these blocks into SGE-style answers.
- People-Also-Ask Mapping – integrate Google Search Console API to overlay Clicks and Position onto crawl data, prioritising pages that own featured snippets but have thin headings[1].
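The standalone JSON-LD spot-check, sketched with the standard library; the URL is a placeholder, and the regex-based extraction is fine for a sanity check but not for exotic markup:

```python
import json
import re
from urllib.request import Request, urlopen

URL = "https://example.com/faq-page/"  # placeholder

req = Request(URL, headers={"User-Agent": "Mozilla/5.0"})
html = urlopen(req, timeout=10).read().decode("utf-8", "ignore")

# Pull every JSON-LD block and report its schema.org @type.
blocks = re.findall(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', html, re.S | re.I
)
for block in blocks:
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        print("Invalid JSON-LD block")  # would also fail rich-result validation
        continue
    for item in (data if isinstance(data, list) else [data]):
        if isinstance(item, dict):
            print("@type:", item.get("@type", "missing"))
```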
6 Troubleshooting Cheatsheet
| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Memory error at 10% | Parameter explosion | Exclude facets; limit query strings to 0[9][21] |
| 0 URLs crawled, 423 status | Firewall blocks UA | Change UA; throttle speed[15] |
| Blank screenshots | Resources blocked | Check Pages with Blocked Resources; allowlist CDN[3] |
| Only homepage crawled | Include regex too narrow | Broaden pattern or start crawl at higher directory[22] |
7 Trust Checklist Before Hitting “Start”
☑ Database mode on SSD
☑ UA set to Googlebot-Smartphone
☑ Depth, parameter, and folder exclusions reviewed
☑ JavaScript rendering verified with 10-URL test
☑ Crawl speed aligned with server capacity
☑ API integrations (GA4, GSC) connected for business context[1]
Use this list every time; it prevents the vast majority of audit-breaking errors.
8 Closing Thoughts by Dr Anubhav Gupta
Mastering these foundational settings ensures that Screaming Frog mirrors real-world crawl behaviour, reveals the right issues, and safeguards your machine resources. Configure once, template your settings, and your future audits will start trustworthy and data-rich from the first URL.
Ready to explore advanced modules like vector embeddings and crawl automation? Head over to our pillar guide, “The Ultimate Guide to Screaming Frog SEO Spider,” and keep fine-tuning. Your site—and every answer engine—will thank you.
Frequently Asked Questions (FAQs)
1. What are the most important crawl settings to configure in Screaming Frog for beginners?
The most crucial crawl settings for beginners include selecting the right storage mode (preferably database for large sites), setting appropriate memory allocation, controlling crawl depth, using include/exclude regex to focus the crawl, managing query-string handling, adjusting the user-agent and crawl speed, and enabling JavaScript rendering for modern sites. These settings ensure efficient, accurate, and resource-friendly crawls.
2. How can I prevent Screaming Frog from crashing during large site crawls?
To avoid crashes, use database storage mode (especially for sites with over 200,000 URLs), allocate a reasonable amount of RAM (e.g., 4 GB for up to 2 million URLs), and exclude unnecessary URL patterns or parameters that could cause a parameter explosion. Running a small test crawl before the full audit also helps identify potential issues early.
3. Why is it important to limit crawl depth, and what value should I use?
Limiting crawl depth ensures that Screaming Frog mirrors how search engines, like Googlebot, typically access a site—most important content should be within three clicks from the homepage. Setting a crawl depth of three helps focus the audit on key pages and prevents wasting resources on deep, low-value URLs.
4. How do I handle sites with lots of URL parameters or facets?
For sites with many parameters or faceted navigation (like e-commerce), limit the number of query strings to zero or one, or use URL rewriting to remove parameters entirely. This prevents Screaming Frog from crawling thousands of near-duplicate URLs, saving memory and crawl time.
5. What should I do if Screaming Frog is blocked by my website’s firewall or CDN?
If you encounter issues like HTTP 423 errors or the crawl stops at the homepage, try changing the user-agent to Googlebot-Smartphone or Chrome, reduce crawl speed, and request your IT team or CDN provider to whitelist your crawler’s IP and user-agent. This helps ensure the crawler can access all intended pages without being blocked.
References
[1] Screaming-Frog-SEO-Spider-Guide.docx
[2] https://www.screamingfrog.co.uk/seo-spider/tutorials/how-to-crawl-large-websites/
[3] https://www.youtube.com/watch?v=Uvm-21sQAzM
[4] https://www.youtube.com/watch?v=nTC0S9vwjLU
[5] https://gofishdigital.com/blog/crawl-depth-audit-optimization-guide/
[6] https://www.screamingfrog.co.uk/seo-spider/issues/links/pages-with-high-crawl-depth/
[7] https://learndigitaladvertising.com/includes-and-excludes/
[8] https://www.youtube.com/watch?v=7PNV7jrBSx4
[9] https://www.reddit.com/r/TechSEO/comments/10vzmqc/excluding_parameter_urls_in_screaming_frog_regex/
[10] https://www.screamingfrog.co.uk/advanced-screaming-frog-crawling-tips-and-use-cases-for-ecommerce-auditing/
[11] https://www.linkedin.com/posts/chris-long-marketing_technical-seo-tip-if-screaming-frog-is-forbidden-activity-7247957136100487168-I29Z
[12] https://bravr.com/increase-memory-speed-screaming-frog-seo-spider/
[13] https://gofishdigital.com/blog/how-to-crawl-as-googlebot-smartphone-with-screaming-frog/
[14] https://ca.linkedin.com/posts/chris-long-marketing_technical-seo-tip-screaming-frogs-javascript-activity-6985571388254224384-ikId
[15] https://www.linkedin.com/posts/gerald-fauter_for-my-seo-analysis-i-use-the-screaming-activity-7272892656597229568-b66P
[16] https://seopulses.ru/kak-pomenyat-user-agent-v-screaming-frog-seo-spider/
[17] https://web.swipeinsight.app/posts/technical-seo-tip-screaming-frog-s-show-differences-highlights-javascript-loaded-content-6738
[18] https://zeo.org/resources/blog/how-to-use-ai-in-screaming-frog
[19] https://www.screamingfrog.co.uk/seo-spider/tutorials/how-to-crawl-a-staging-website/
[20] https://www.reddit.com/r/TechSEO/comments/17dnkwy/how_to_block_screaming_frog_crawls/
[21] https://webmasters.stackexchange.com/questions/137448/screaming-frog-runs-out-of-memory-will-this-affect-crawling-and-ranking
[22] https://www.youtube.com/watch?v=4KxSwc5pjk8