Anansi

Create a web crawler, get your favorite comic book!

About Anansi

Welcome to Anansi

Build a web crawler that can traverse the vast expanse of the World Wide Web, collecting data from the darkest corners of the internet. Then choose your favorite comic book (under $10) from your favorite bookstore! ☕

Finally, once this YSWS is over, we will open-source the collected datasets for the world to use!

📋 Project Requirements

Web Crawler Must:

• Crawl websites and parse HTML to extract data (not use an API)
• Store data in a structured directory (include an outline of this hierarchy in your README)
• Include error handling and logging
• Identify itself with a user-agent string and respect robots.txt (do not crawl anything a site disallows)
• Provide documentation and setup instructions
• Have a clear objective or reason (e.g., find the most popular movie every day)
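
As a rough illustration of the robots.txt, user-agent, and error-handling requirements above, here is a minimal Python sketch that uses only the standard library. The `USER_AGENT` string, the logger name, and the helper names `load_robots` and `fetch` are assumptions made for this example, not part of the program rules; adapt them to your own project.

```python
import logging
import urllib.error
import urllib.request
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical identifier: replace with your own bot name and contact URL.
USER_AGENT = "AnansiExampleBot/0.1 (+https://example.com/contact)"

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("crawler")


def load_robots(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch(url: str, robots: RobotFileParser, timeout: float = 10.0) -> Optional[str]:
    """Return the page body, or None if the URL is disallowed or the request fails."""
    if not robots.can_fetch(USER_AGENT, url):
        log.warning("Skipping %s: disallowed by robots.txt", url)
        return None
    # Send the identifying User-Agent header with every request.
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError) as exc:
        log.error("Failed to fetch %s: %s", url, exc)
        return None
```

In a real crawler you would first download each site's `/robots.txt`, pass its text to `load_robots`, and then route every page request through `fetch` so that disallowed URLs are skipped and failures are logged instead of crashing the run.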

Example:

This project scrapes the De Anza College course listings for each department and writes each department's list of courses to its own JSON file.

GitHub | Kaggle
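
To make the "one JSON file per department" idea concrete, here is a small Python sketch. It is not the example project's actual code: the page layout (a table whose rows hold a course code and a title in `<td>` cells), the `code` and `title` field names, and the names `CourseTableParser` and `save_department` are all assumptions made for illustration. It also assumes the department name is safe to use as a file name.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class CourseTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being built, if any
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def save_department(department, html, out_dir):
    """Parse one department page and write its courses to <out_dir>/<department>.json."""
    parser = CourseTableParser()
    parser.feed(html)
    # Assumed layout: first cell is the course code, second is the title.
    courses = [{"code": r[0], "title": r[1]} for r in parser.rows if len(r) >= 2]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{department}.json"
    path.write_text(json.dumps(courses, indent=2), encoding="utf-8")
    return path
```

Calling `save_department` once per department page yields a flat `data/<department>.json` layout, which is easy to describe in the README hierarchy outline the requirements ask for.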

❓ Frequently Asked Questions

What programming languages are accepted?

Any language! Python, JavaScript, Go, Rust, Java, C#, PHP, etc.

Is there a deadline?

The program ends on July 4th, 2025, at 11:59 pm.

Can I use existing web scraping libraries?

Absolutely! Libraries like Scrapy (Python), Beautiful Soup (Python), or Puppeteer (JavaScript) are encouraged. The goal is to build a functional and reliable crawler.

Can I work in a team?

No, this YSWS is an individual program.