A complete tutorial on automating web scraping and data collection
This tutorial will show you how to use OpenClaw for web scraping and automated data collection. You'll learn to extract data from websites, monitor prices, collect content, and build reusable scraping workflows. Estimated time: 25-35 minutes.
By the end of this tutorial, you'll have:
- One-off extraction prompts that pull structured data from a page
- Automated price monitoring with notifications and price history
- Content collection and archiving workflows
- Scheduled scraping jobs that run on their own

Before starting, make sure OpenClaw is installed and able to browse the web.
Start with simple data extraction requests:
"Go to https://example.com/product and extract:
- Product name
- Price
- Description
- Rating
- Availability
Save this data to product.json"
OpenClaw will navigate to the page, identify the elements, extract the data, and save it in your preferred format.
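The saved product.json might look something like this (a sketch; the exact fields and value formats depend on what the page exposes, and all values below are placeholders):

```json
{
  "name": "Example Product",
  "price": "$24.99",
  "description": "Short product description extracted from the page.",
  "rating": "4.5 out of 5",
  "availability": "In stock"
}
```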
Set up automated price monitoring for products you're interested in:
"Every day at 9am, check the price of [product URL]:
1. Visit the product page
2. Extract the current price
3. Compare with previous price
4. Send me a notification if the price drops below $X
5. Save price history to prices.csv"
OpenClaw will create a cron job to monitor prices automatically and notify you of changes.
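Under the hood, this corresponds to a cron job entry like the sketch below; the full configuration format is covered later in this tutorial, and the [product URL] placeholder stands in for a real address:

```json
{
  "schedule": "0 9 * * *",
  "command": "agent --message 'Check the price of [product URL], compare it with the saved history, and notify me if it dropped'"
}
```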
Monitor multiple products:
"Monitor these products daily:
- https://example.com/product1
- https://example.com/product2
- https://example.com/product3
Create a price comparison report each day"
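The daily report might take a shape like this (illustrative only; describe whatever layout you prefer in the prompt):

```
Price Comparison Report - 2025-01-15

Product                        Today    Yesterday   Change
https://example.com/product1   $24.99   $27.99      -10.7%
https://example.com/product2   $12.50   $12.50       0.0%
https://example.com/product3   $89.00   $85.00      +4.7%
```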
Collect content from websites for research or archiving:
"Visit https://example.com/articles and:
1. Extract all article links
2. Visit each article page
3. Extract title, author, content, date
4. Save each article as a markdown file
5. Create an index file with all articles"
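The index file can be as simple as a markdown list that links each saved article (a sketch; titles, authors, and filenames are placeholders following the YYYY-MM-DD-title.md convention used later in this tutorial):

```markdown
# Article Index

- [First Article Title](2025-01-15-first-article-title.md) - Author Name, 2025-01-15
- [Second Article Title](2025-01-14-second-article-title.md) - Author Name, 2025-01-14
```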
"Collect today's top stories from:
- https://news.ycombinator.com
- https://www.reddit.com/r/programming
- https://example.com/tech-news
Extract headlines, summaries, and links
Create a daily news digest document"
Automate scraping across multiple pages:
"Scrape all products from this e-commerce site:
1. Visit the first page
2. Extract all product links
3. Click 'Next' button
4. Repeat for all pages
5. Visit each product page
6. Extract product details
7. Save all data to products.json"
OpenClaw will handle pagination, navigation, and data extraction automatically.
Process and store scraped data effectively:
"Extract all products and save as JSON:
- Format: array of product objects
- Include: name, price, description, url
- Save to: ~/data/products.json"
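With that prompt, ~/data/products.json would contain an array of objects along these lines (placeholder values):

```json
[
  {
    "name": "Example Product",
    "price": 24.99,
    "description": "Short description from the product page.",
    "url": "https://example.com/product1"
  },
  {
    "name": "Another Product",
    "price": 12.50,
    "description": "Short description from the product page.",
    "url": "https://example.com/product2"
  }
]
```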
"Extract price history and save as CSV:
- Columns: date, price, product_name
- Append to: ~/data/prices.csv"
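Each check appends one row, so the file grows into a history like this (placeholder values):

```csv
date,price,product_name
2025-01-15,24.99,Example Product
2025-01-16,22.99,Example Product
```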
"Extract articles and save as markdown files:
- One file per article
- Filename: YYYY-MM-DD-title.md
- Include: title, content, source, date"
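A saved article might look like the sketch below; the metadata list is one option, and plain headings work just as well:

```markdown
# Article Title

- Source: https://example.com/articles/article-title
- Date: 2025-01-15

Article content goes here...
```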
Set up automated, scheduled scraping tasks:
```json
{
  "cron": {
    "jobs": [
      {
        "schedule": "0 9 * * *",
        "command": "agent --message 'Check prices for monitored products and send me a summary'"
      },
      {
        "schedule": "0 12 * * *",
        "command": "agent --message \"Collect today's news articles and create a digest\""
      }
    ]
  }
}
```
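The `schedule` field uses standard five-field cron syntax (minute, hour, day of month, month, day of week), so `0 9 * * *` runs daily at 9:00 AM and `0 12 * * *` runs daily at noon.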
"Scrape job listings from [job board]:
- Extract: title, company, location, salary, description
- Filter: remote jobs only, salary > $X
- Save: matching jobs to jobs.json
- Send: daily summary of new jobs"
"Collect research papers on [topic]:
- Visit academic database
- Search for papers
- Extract: title, authors, abstract, PDF link
- Download PDFs
- Create bibliography document"
"Track stock prices every hour:
- Visit stock quote page
- Extract current price
- Compare with previous price
- Calculate percentage change
- Alert if change > 5%
- Log price history"
Now that you can scrape websites, combine scraping with other OpenClaw features for more powerful workflows: use memory to track scraping history, automation for scheduled tasks, and skills for specialized scraping needs.