MEDIUM · asked at 5 companies

Web Crawler Multithreaded

A medium-tier problem with 50% community acceptance, tagged Depth-First Search, Breadth-First Search, and Concurrency. Reported in interviews at MongoDB and 4 others.

Founder's read

A multithreaded web crawler hits different than your standard graph traversal. MongoDB, Dropbox, and OpenAI all ask this one, and the acceptance rate hovers around 50 percent. Most candidates nail the BFS or DFS logic, then panic when threads enter the picture. The trick isn't the crawl itself. It's the synchronization. You need to know how to fence off shared state, manage a thread pool, and handle the race conditions that sink half the submissions. If you freeze on thread safety during the live OA, StealthCoder surfaces a working solution invisible to the proctor.

Companies asking: 5
Difficulty: MEDIUM
Acceptance: 50%


If this hits your live OA

Web Crawler Multithreaded is the kind of problem that decides whether you pass. StealthCoder reads the problem on screen and surfaces a working solution in under 2 seconds. Invisible to screen share. The proctor sees nothing. Made by an Amazon engineer who watched the leaked-problem repo become an industry secret. He decided you should have it too.

What this means

The problem forces you to merge two skill sets candidates often keep separate: graph traversal and concurrent execution. You pick DFS or BFS for the crawl pattern, but the real complexity lives in the synchronization layer. A naive approach (spawn a thread per URL) tanks on resource exhaustion once the site gets large. You need a bounded thread pool, a thread-safe queue or work-stealing deque, and careful locking around the visited set. Common failures: deadlocks from improper lock ordering, race conditions on the visited check-and-mark, and threads that never signal completion, so the crawl either hangs or returns before in-flight fetches finish. The depth-first versus breadth-first choice matters less than your concurrency primitives. If this pattern blindsides you mid-assessment, StealthCoder runs invisibly and gives you the synchronization scaffold in seconds, letting you code it out and move on.
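The three pieces named above (bounded pool, locked visited set, completion signal) fit in one short function. This is a minimal sketch, not a reference solution. It assumes the LeetCode rule that only URLs sharing the start URL's hostname are crawled, and `FakeHtmlParser`, `crawl`, `fetch`, and `max_workers=8` are hypothetical names standing in for the judge's `HtmlParser.getUrls` interface.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from urllib.parse import urlparse


class FakeHtmlParser:
    """Hypothetical stand-in for the judge's HtmlParser: maps a URL to its links."""

    def __init__(self, graph):
        self.graph = graph

    def getUrls(self, url):
        return self.graph.get(url, [])


def crawl(start_url, html_parser, max_workers=8):
    # Assumption: only same-hostname URLs are in scope.
    host = urlparse(start_url).hostname
    visited = {start_url}
    lock = threading.Lock()  # guards every read-and-write of `visited`

    def fetch(url):
        # Runs on a worker thread; returns only the links this call newly claimed.
        claimed = []
        for nxt in html_parser.getUrls(url):
            if urlparse(nxt).hostname != host:
                continue
            with lock:  # check-and-mark as one atomic step
                if nxt in visited:
                    continue
                visited.add(nxt)
            claimed.append(nxt)
        return claimed

    # Bounded pool: at most max_workers fetches run at once, however many URLs exist.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch, start_url)}
        # Completion signal: the crawl is done when no fetch is still in flight.
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                for url in fut.result():
                    pending.add(pool.submit(fetch, url))
    return list(visited)
```

Only the main thread submits work and tracks `pending`, so that set needs no lock; workers share nothing but `visited`. With a graph where `http://news.yahoo.com/news/topics/` links to `http://news.google.com`, the Google URL is dropped by the hostname filter and never fetched.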


The honest play

You know the problem. Make sure you actually pass it.

Web Crawler Multithreaded recycles across companies for a reason. It's medium-tier, and most candidates blank under the timer. StealthCoder is the hedge: an AI overlay invisible during screen share. It reads the problem and surfaces a working solution in under 2 seconds. Made by an Amazon engineer who watched the leaked-problem repo become an industry secret. He decided you should have it too. Works on HackerRank, CodeSignal, CoderPad, and Karat.

Web Crawler Multithreaded interview FAQ

Is this still asked at top companies?

Yes. MongoDB, Dropbox, OpenAI, Databricks, and Rubrik all appear in the reports. The 50 percent acceptance rate suggests it's a real filter, not a warm-up. You'll see it more often on infrastructure and backend teams than in full-stack or frontend roles.

What's the trick that catches people?

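Thread safety on the visited set. Candidates write correct BFS, add threads, then miss the race condition where two threads check visited simultaneously, both conclude a URL is new, and spawn duplicate work. The membership test and the insert must happen as one atomic step (under the same lock, or through a single atomic operation such as Java's ConcurrentHashMap.newKeySet().add), not as two separate calls. Many candidates don't get this right under time pressure.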
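One way to make the check-and-set atomic is to hide the set behind a single method that tests and inserts under one lock. A sketch, assuming Python's `threading.Lock`; `VisitedSet`, `try_claim`, and `claim_all` are hypothetical names, not part of the problem's interface.

```python
import threading


class VisitedSet:
    """Thread-safe visited set whose only mutator is an atomic check-and-set."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def try_claim(self, url):
        # The racy version, `if url not in seen: seen.add(url)` without the lock,
        # lets two threads both pass the membership test before either one adds.
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True


def claim_all(visited, urls, winners):
    # Each worker records only the URLs it claimed first.
    for url in urls:
        if visited.try_claim(url):
            winners.append(url)
```

If eight threads each call `claim_all` on the same 1,000 URLs, every URL should be claimed by exactly one thread, which is the property that prevents duplicate fetches.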

Should I use DFS or BFS for the crawl pattern?

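BFS is the safer default. An iterative BFS with an explicit queue never recurses, so it can't overflow the call stack; its memory is bounded by the size of the current frontier. A recursive DFS can overflow the stack on deep link chains, and an iterative DFS with an explicit stack fixes that but gains nothing over BFS here. The problem doesn't prescribe a traversal order, so BFS with a thread pool is the standard hedge.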
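A simple BFS variant runs level by level: fetch the whole frontier in parallel, then deduplicate on the coordinating thread. A minimal sketch under assumptions: `crawl_by_level` and its `get_urls` callable are hypothetical (with the judge you would pass `htmlParser.getUrls`), and only same-hostname URLs are kept.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def crawl_by_level(start_url, get_urls, max_workers=8):
    """Level-synchronous BFS: fetch one frontier in parallel, merge on this thread."""
    host = urlparse(start_url).hostname
    visited = {start_url}
    frontier = [start_url]  # explicit queue per level: no recursion anywhere
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # Workers only call get_urls; only this thread touches `visited`,
            # so the dedup step needs no lock.
            for links in pool.map(get_urls, frontier):
                for url in links:
                    if urlparse(url).hostname == host and url not in visited:
                        visited.add(url)
                        next_frontier.append(url)
            frontier = next_frontier
    return visited
```

The design trade-off: avoiding shared mutable state removes the race entirely, but one slow page stalls the whole level, whereas a work-queue crawler keeps every thread busy.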

How does Concurrency relate to the other topics here?

Concurrency is the wrinkle that separates this from a basic graph problem. DFS and BFS are your skeleton; concurrency is the muscle. You need to know the thread lifecycle, synchronization primitives (locks, semaphores, barriers), how to avoid deadlock, and how to detect that the crawl has finished so every thread can shut down cleanly.
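Termination detection is where "threads that never signal completion" bites. One stdlib approach uses `queue.Queue`, whose `join()` blocks until every `put()` has a matching `task_done()`, then shuts workers down with sentinel values. A sketch under assumptions: `crawl_with_workers`, `get_urls`, and `num_workers=4` are hypothetical, and same-hostname filtering is assumed.

```python
import queue
import threading
from urllib.parse import urlparse


def crawl_with_workers(start_url, get_urls, num_workers=4):
    host = urlparse(start_url).hostname
    visited = {start_url}
    lock = threading.Lock()
    work = queue.Queue()
    work.put(start_url)

    def worker():
        while True:
            url = work.get()
            if url is None:  # sentinel: shut this worker down
                work.task_done()
                return
            try:
                for nxt in get_urls(url):
                    if urlparse(nxt).hostname != host:
                        continue
                    # The lock covers only the set check, never get_urls() or put(),
                    # so there is a single lock and no ordering to get wrong.
                    with lock:
                        if nxt in visited:
                            continue
                        visited.add(nxt)
                    work.put(nxt)  # put() before task_done() keeps the count above zero
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.join()  # returns once every queued URL has been fully processed
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

The ordering detail is the whole trick: each child is enqueued before its parent calls `task_done()`, so the unfinished-task count cannot reach zero while work remains.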

Does language choice matter?

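Java or Python is safest. Java ships strong concurrency libraries (ExecutorService, ConcurrentHashMap). Python's GIL prevents threads from executing Python bytecode in parallel, but that matters less for a crawler: the slow part is waiting on the network, and blocking I/O releases the GIL, so threads still overlap their waits. Most interviewers accept thread-based solutions. C++ is doable but adds complexity, since you manage std::thread, std::mutex, and std::condition_variable yourself. Ask your recruiter what they prefer before the OA.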
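A quick way to see why the GIL is less of a problem here: simulate fetch latency with `time.sleep`, which releases the GIL, and compare one worker against eight. This is an illustrative sketch only; `slow_get_urls`, `timed`, and the 50 ms delay are hypothetical stand-ins for real network I/O, and exact timings vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_get_urls(url):
    """Hypothetical fetch: sleeps to mimic network latency, which releases the GIL."""
    time.sleep(0.05)
    return []


def timed(urls, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(slow_get_urls, urls))
    return time.perf_counter() - start


urls = [f"http://example.com/{i}" for i in range(16)]
serial = timed(urls, workers=1)    # roughly 16 x 50 ms
parallel = timed(urls, workers=8)  # roughly 2 x 50 ms, because sleeping threads don't hold the GIL
```

CPU-bound parsing would not speed up this way under the GIL; the win comes from overlapping waits.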

Want the actual problem statement? View "Web Crawler Multithreaded" on LeetCode.

Frequency and company-tag data sourced from public community-maintained interview-report repos. Problem, description, and trademark © LeetCode. StealthCoder is not affiliated with LeetCode.