Search-engine crawlers and answer engines only reward content they can reach and understand. A single misconfigured checkbox in Screaming Frog can hide hundreds of valuable URLs from your audit, inflate crawl time, or even crash the tool, which is why getting the crawl settings right should be the first priority. This guide, authored for Elgorythm.in by Dr Anubhav Gupta, breaks down each core crawl setting so even first-time users can run accurate, resource-friendly crawls that reflect how Googlebot and emerging AEO systems really see your site.
Incorrect limits and filters skew every downstream insight—link equity flow, duplicate detection, Core Web Vitals sampling, and AI-driven content audits all rely on a clean crawl dataset. Screaming Frog’s hybrid RAM/disk engine can index millions of URLs when tuned correctly, but runs out of memory or misses JavaScript links when left on defaults[1][2][3]. The result is wasted crawl budget and blind spots that hurt organic growth.
| Setting | Where to Configure | What It Controls | Beginner-Safe Value |
| --- | --- | --- | --- |
| Storage Mode | File → Settings → Storage | RAM vs. SSD database; stability on large sites | Database on SSD for >200k URLs[2][4] |
| Memory Allocation | File → Settings → Memory | How much RAM Screaming Frog may use | 4 GB for ≤2 M URLs; never allocate all system RAM[1][2] |
| Crawl Depth & Limits | Config → Spider → Limits / Preferences | Max clicks from start page, folders, redirects | Depth ≤3 for key pages to mimic the "3-click" rule[5][6] |
| Include / Exclude Regex | Config → Include / Exclude | Narrow or omit URL patterns | Start with folder-level regex like ^https://example.com/blog/.*[7][8] |
| Query-String Handling | Config → Spider → Limits | Parameter explosion control | Limit query strings to 0–1 for e-commerce facets[9][10] |
| User-Agent & Throttling | Config → User-Agent / Speed | Bypass firewalls; respect servers | Googlebot-Smartphone + 2–4 threads on shared hosting[11][12][13] |
| JavaScript Rendering | Config → Spider → Rendering | Headless Chromium DOM crawl | "JavaScript" + Googlebot mobile viewport for modern sites[14][3] |
Database mode autosaves to disk and survives power loss, preventing “crawl-crash anxiety” when tackling 100 k+ URLs[2][4]. Allocate RAM conservatively—Screaming Frog needs only 4 GB for ~2 M URLs; over-allocation slows the OS[1][2].
Pages deeper than three clicks often receive less crawl budget and PageRank. Use the Internal → HTML tab and sort by the Crawl Depth column; anything deeper than three is also flagged under Links → Pages with High Crawl Depth[5][6]. Adding internal links or building hub pages usually fixes the issue.
Exclude infinite calendars and faceted variations before they flood RAM. Screaming Frog matches exclude regex against the full URL, so patterns need a leading .* unless you spell out the whole address. For example, to skip WordPress system folders and any parameterised URL:
.*/wp-(content|admin|includes)/.*|.*\?.*
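A few more illustrative exclude patterns for the calendar and facet traps mentioned above. The paths and parameter names here (events, color, size, sort, search) are placeholders rather than Screaming Frog defaults, so adapt them to your own URL structure and enter one regex per line in the Exclude window:

```
.*/events/\d{4}-\d{2}.*
.*[?&](color|size|sort)=.*
.*/search/.*
```

The first pattern catches date-based calendar archives, the second catches faceted parameters wherever they appear in the query string, and the third drops internal site-search results.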
Set Limit Number of Query Strings to 0 to ignore every ?color= or ?size= URL[9]. Alternatively, tick Configuration → URL Rewriting → Remove All to drop parameters entirely during the crawl[10].
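If you only need to strip specific facet or tracking parameters rather than all of them, the Remove Parameters tab in the same URL Rewriting window takes a list of parameter names to drop before URLs are crawled. The names below are common examples, not a definitive list; swap in the parameters your own site actually uses:

```
color
size
sort
sessionid
utm_source
utm_medium
utm_campaign
```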
Firewalls can return HTTP 423 "Locked" to Screaming Frog's default user-agent[15]. Switching to Googlebot-Smartphone or Chrome takes two clicks under Config → User-Agent, and lowering the thread count helps if the server struggles[11][16][13]. Crawling as Googlebot-Smartphone also reflects mobile-first indexing, and most analytics platforms filter known bot user-agents, so your traffic reports stay clean.
Enable JavaScript rendering plus Store HTML & Rendered HTML to compare DOM differences; Screaming Frog’s “Show Differences” visually highlights JS-only elements[17][3]. Increase the AJAX timeout to 10–15 s if content loads slowly[3].
The Custom JavaScript snippet library can now pipe crawl data into OpenAI for tasks such as intent classification or auto-generating alt text, all configured under Config → Custom → Custom JavaScript[18]. Keep this disabled on first crawls to conserve API credits.
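For orientation, here is a minimal sketch of what a custom extraction snippet can look like once you do enable the feature. It simply counts the links present in the rendered DOM of each page; the bare return and the seoSpider.data() helper follow the bundled snippet templates, so treat them as assumptions and check against the templates shipped with your version before relying on this:

```javascript
// Minimal custom-extraction sketch for Config → Custom → Custom JavaScript.
// Assumption: snippets run inside a wrapper supplied by the tool and report
// results back through the seoSpider helper, as in the bundled templates.

// Count anchor tags present in the rendered DOM of the current page.
var renderedLinkCount = document.querySelectorAll('a[href]').length;

// Hand the value back to Screaming Frog as a custom extraction column.
return seoSpider.data(renderedLinkCount);
```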
| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Memory error at ~10% of the crawl | Parameter explosion | Exclude facets; limit query strings to 0[9][21] |
| 0 URLs crawled, 423 status | Firewall blocks the user-agent | Change the UA; throttle crawl speed[15] |
| Blank screenshots | Resources blocked | Check Pages with Blocked Resources; allowlist the CDN[3] |
| Only the homepage crawled | Include regex too narrow | Broaden the pattern or start the crawl at a higher directory[22] |
☑ Database mode on SSD
☑ UA set to Googlebot-Smartphone
☑ Depth, parameter, and folder exclusions reviewed
☑ JavaScript rendering verified with 10-URL test
☑ Crawl speed aligned with server capacity
☑ API integrations (GA4, GSC) connected for business context[1]
Use this checklist every time and you will prevent the vast majority of audit-breaking errors.
Mastering these foundational settings ensures that Screaming Frog mirrors real-world crawl behaviour, reveals the right issues, and safeguards your machine's resources. Configure once, save the setup as a reusable configuration profile, and your future audits will start trustworthy and data-rich from the first URL.
Ready to explore advanced modules like vector embeddings and crawl automation? Head over to our pillar guide, “The Ultimate Guide to Screaming Frog SEO Spider,” and keep fine-tuning. Your site—and every answer engine—will thank you.
Which crawl settings matter most for beginners?
The most crucial crawl settings for beginners include selecting the right storage mode (preferably database for large sites), setting appropriate memory allocation, controlling crawl depth, using include/exclude regex to focus the crawl, managing query-string handling, adjusting the user-agent and crawl speed, and enabling JavaScript rendering for modern sites. These settings ensure efficient, accurate, and resource-friendly crawls.

How do I stop Screaming Frog from crashing on large sites?
To avoid crashes, use database storage mode (especially for sites with over 200,000 URLs), allocate a reasonable amount of RAM (e.g., 4 GB for up to 2 million URLs), and exclude unnecessary URL patterns or parameters that could cause a parameter explosion. Running a small test crawl before the full audit also helps identify potential issues early.

Why limit crawl depth to three clicks?
Limiting crawl depth ensures that Screaming Frog mirrors how search engines, like Googlebot, typically access a site: most important content should be within three clicks from the homepage. Setting a crawl depth of three helps focus the audit on key pages and prevents wasting resources on deep, low-value URLs.

How should I handle query strings and faceted navigation?
For sites with many parameters or faceted navigation (like e-commerce), limit the number of query strings to zero or one, or use URL rewriting to remove parameters entirely. This prevents Screaming Frog from crawling thousands of near-duplicate URLs, saving memory and crawl time.

What should I do if the crawl is blocked or stops at the homepage?
If you encounter issues like HTTP 423 errors or the crawl stops at the homepage, try changing the user-agent to Googlebot-Smartphone or Chrome, reduce crawl speed, and request your IT team or CDN provider to whitelist your crawler's IP and user-agent. This helps ensure the crawler can access all intended pages without being blocked.