Search-engine crawlers and answer engines only reward content they can reach and understand. A single misconfigured checkbox in Screaming Frog can hide hundreds of valuable URLs from your audit, inflate crawl time, or even crash the tool, which is why getting the crawl settings right should be the first priority. This guide, authored for Elgorythm.in by Dr Anubhav Gupta, breaks down each core crawl setting so even first-time users can run accurate, resource-friendly crawls that reflect how Googlebot and emerging AEO systems really see your site.
Incorrect limits and filters skew every downstream insight—link equity flow, duplicate detection, Core Web Vitals sampling, and AI-driven content audits all rely on a clean crawl dataset. Screaming Frog’s hybrid RAM/disk engine can index millions of URLs when tuned correctly, but runs out of memory or misses JavaScript links when left on defaults[1][2][3]. The result is wasted crawl budget and blind spots that hurt organic growth.
| Setting | Where to Configure | What It Controls | Beginner-Safe Value |
| --- | --- | --- | --- |
| Storage Mode | File → Settings → Storage | RAM vs. SSD database; stability on large sites | Database on SSD for >200k URLs[2][4] |
| Memory Allocation | File → Settings → Memory | How much RAM Screaming Frog may use | 4 GB for ≤2 M URLs; never allocate all system RAM[1][2] |
| Crawl Depth & Limits | Config → Spider → Limits / Preferences | Max clicks from start page, folders, redirects | Depth ≤3 for key pages to mimic the "3-click" rule[5][6] |
| Include / Exclude Regex | Config → Include / Exclude | Narrow or omit URL patterns | Start with folder-level regex like ^https://example.com/blog/.*[7][8] |
| Query-String Handling | Config → Spider → Limits | Parameter explosion control | Limit query strings to 0–1 for e-commerce facets[9][10] |
| User-Agent & Throttling | Config → User-Agent / Speed | Bypass firewalls; respect servers | Googlebot-Smartphone + 2–4 threads on shared hosting[11][12][13] |
| JavaScript Rendering | Config → Spider → Rendering | Headless Chromium DOM crawl | "JavaScript" + Googlebot mobile viewport for modern sites[14][3] |
Database mode autosaves to disk and survives power loss, preventing “crawl-crash anxiety” when tackling 100 k+ URLs[2][4]. Allocate RAM conservatively—Screaming Frog needs only 4 GB for ~2 M URLs; over-allocation slows the OS[1][2].
Pages deeper than three clicks often receive less crawl budget and PageRank. Use the Internal → HTML tab and sort by the Crawl Depth column; anything deeper than three is also flagged under Links → Pages with High Crawl Depth[5][6]. Adding internal links or building hub pages usually fixes the issue.
Exclude infinite calendars and faceted variations before they flood RAM. Screaming Frog matches exclude regex against the full URL, so patterns need a leading .* unless you spell out the whole address. For example, to skip WordPress system folders and any parameterised URL:
.*/wp-(content|admin|includes)/.*|.*\?.*
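A few more illustrative exclude patterns for the calendar and facet traps mentioned above. The paths and parameter names here (events, color, size, sort, search) are placeholders rather than Screaming Frog defaults, so adapt them to your own URL structure and enter one regex per line in the Exclude window:

```
.*/events/\d{4}-\d{2}.*
.*[?&](color|size|sort)=.*
.*/search/.*
```

The first pattern catches date-based calendar archives, the second catches faceted parameters wherever they appear in the query string, and the third drops internal site-search results.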
Set Limit Number of Query Strings to 0 to ignore every ?color= or ?size= URL[9]. Alternatively, tick Configuration → URL Rewriting → Remove All to drop parameters entirely during the crawl[10].
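If you only need to strip specific facet or tracking parameters rather than all of them, the Remove Parameters tab in the same URL Rewriting window takes a list of parameter names to drop before URLs are crawled. The names below are common examples, not a definitive list; swap in the parameters your own site actually uses:

```
color
size
sort
sessionid
utm_source
utm_medium
utm_campaign
```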
Firewalls can return HTTP 423 "Locked" to Screaming Frog's default user-agent[15]. Switching to Googlebot-Smartphone or Chrome takes two clicks under Config → User-Agent, and lowering the thread count helps if the server struggles[11][16][13]. Crawling as Googlebot-Smartphone also reflects mobile-first indexing, and most analytics platforms filter known bot user-agents, so your traffic reports stay clean.
Enable JavaScript rendering plus Store HTML & Rendered HTML to compare DOM differences; Screaming Frog’s “Show Differences” visually highlights JS-only elements[17][3]. Increase the AJAX timeout to 10–15 s if content loads slowly[3].
The Custom JavaScript snippet library can now pipe crawl data into OpenAI for tasks such as intent classification or auto-generating alt text, all configured under Config → Custom → Custom JavaScript[18]. Keep this disabled on first crawls to conserve API credits.
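For orientation, here is a minimal sketch of what a custom extraction snippet can look like once you do enable the feature. It simply counts the links present in the rendered DOM of each page; the bare return and the seoSpider.data() helper follow the bundled snippet templates, so treat them as assumptions and check against the templates shipped with your version before relying on this:

```javascript
// Minimal custom-extraction sketch for Config → Custom → Custom JavaScript.
// Assumption: snippets run inside a wrapper supplied by the tool and report
// results back through the seoSpider helper, as in the bundled templates.

// Count anchor tags present in the rendered DOM of the current page.
var renderedLinkCount = document.querySelectorAll('a[href]').length;

// Hand the value back to Screaming Frog as a custom extraction column.
return seoSpider.data(renderedLinkCount);
```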
| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Memory error at ~10% of the crawl | Parameter explosion | Exclude facets; limit query strings to 0[9][21] |
| 0 URLs crawled, 423 status | Firewall blocks the user-agent | Change the UA; throttle crawl speed[15] |
| Blank screenshots | Resources blocked | Check Pages with Blocked Resources; allowlist the CDN[3] |
| Only the homepage crawled | Include regex too narrow | Broaden the pattern or start the crawl at a higher directory[22] |
☑ Database mode on SSD
☑ UA set to Googlebot-Smartphone
☑ Depth, parameter, and folder exclusions reviewed
☑ JavaScript rendering verified with 10-URL test
☑ Crawl speed aligned with server capacity
☑ API integrations (GA4, GSC) connected for business context[1]
Use this checklist every time and you will prevent the vast majority of audit-breaking errors.
Mastering these foundational settings ensures that Screaming Frog mirrors real-world crawl behaviour, reveals the right issues, and safeguards your machine's resources. Configure once, save the setup as a reusable configuration profile, and your future audits will start trustworthy and data-rich from the first URL.
Ready to explore advanced modules like vector embeddings and crawl automation? Head over to our pillar guide, “The Ultimate Guide to Screaming Frog SEO Spider,” and keep fine-tuning. Your site—and every answer engine—will thank you.
Which crawl settings matter most for beginners?
The most crucial crawl settings for beginners include selecting the right storage mode (preferably database for large sites), setting appropriate memory allocation, controlling crawl depth, using include/exclude regex to focus the crawl, managing query-string handling, adjusting the user-agent and crawl speed, and enabling JavaScript rendering for modern sites. These settings ensure efficient, accurate, and resource-friendly crawls.

How do I stop Screaming Frog from crashing on large sites?
To avoid crashes, use database storage mode (especially for sites with over 200,000 URLs), allocate a reasonable amount of RAM (e.g., 4 GB for up to 2 million URLs), and exclude unnecessary URL patterns or parameters that could cause a parameter explosion. Running a small test crawl before the full audit also helps identify potential issues early.

Why limit crawl depth to three clicks?
Limiting crawl depth ensures that Screaming Frog mirrors how search engines, like Googlebot, typically access a site: most important content should be within three clicks from the homepage. Setting a crawl depth of three helps focus the audit on key pages and prevents wasting resources on deep, low-value URLs.

How should I handle query strings and faceted navigation?
For sites with many parameters or faceted navigation (like e-commerce), limit the number of query strings to zero or one, or use URL rewriting to remove parameters entirely. This prevents Screaming Frog from crawling thousands of near-duplicate URLs, saving memory and crawl time.

What should I do if the crawl is blocked or stops at the homepage?
If you encounter issues like HTTP 423 errors or the crawl stops at the homepage, try changing the user-agent to Googlebot-Smartphone or Chrome, reduce crawl speed, and request your IT team or CDN provider to whitelist your crawler's IP and user-agent. This helps ensure the crawler can access all intended pages without being blocked.