- Python Automation Cookbook
- Jaime Buelta
- 2025-04-04 16:17:47
Getting ready
This recipe builds on the concepts introduced previously: it downloads and parses pages, searches them for links, and continues downloading from the pages those links point to.
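Extracting links is the core step of that loop. The following is a minimal sketch that uses only the standard library's html.parser and urllib.parse. That choice is an assumption on our part and not necessarily the parser this recipe uses. The names LinkExtractor and extract_links are hypothetical. The sketch resolves relative hrefs against the page URL and drops #fragment parts, so the same page isn't queued twice.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL.

    Hypothetical helper; assumes the stdlib html.parser is good enough.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Make relative links absolute and strip any '#fragment'
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```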
When crawling the web, remember to set limits on how much you download. It's very easy to end up crawling far too many pages; as anyone who has browsed Wikipedia can confirm, the internet is potentially limitless.
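One way to enforce such limits is a breadth-first crawl that stops after a fixed number of pages and never leaves the starting host. This is a sketch under assumptions. The crawl function, its max_pages parameter and the get_links callback are hypothetical names chosen for illustration. The callback should download a page and return the absolute URLs it links to.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_pages=10):
    """Breadth-first crawl that stops after visiting max_pages pages.

    get_links(url) is a hypothetical, caller-supplied function that
    downloads a page and returns the absolute URLs it links to.
    """
    allowed_host = urlparse(start_url).netloc  # stay on the starting site
    visited = []            # pages actually processed, in crawl order
    seen = {start_url}      # pages already queued, to avoid repeats
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Skip pages already queued and anything on another host
            if link not in seen and urlparse(link).netloc == allowed_host:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing get_links as a parameter keeps the limiting logic separate from the downloading. You can then test it against a fake site (a dictionary mapping each URL to its links) before pointing it at a real server.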
As an example, we'll use a prepared site available in the GitHub repo: https://github.com/PacktPublishing/Python-Automation-Cookbook/tree/master/Chapter03/test_site. Download the whole site and run the included script:
$ python simple_delay_server.py
This serves the site at http://localhost:8000, which you can check in a browser. It's a simple blog with three entries. Most of it is uninteresting, but we added a couple of paragraphs that contain the keyword python.
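To check which paragraphs mention python, you could collect the text of each <p> element and filter it. This sketch again assumes the standard library's html.parser. ParagraphCollector and paragraphs_with_keyword are hypothetical names, and the match is case-insensitive. The sketch only captures paragraphs that have an explicit closing </p> tag.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        # Assumption: paragraphs are closed with an explicit </p>
        if tag == 'p' and self._in_p:
            self.paragraphs.append(''.join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> is kept as well
        if self._in_p:
            self._buffer.append(data)


def paragraphs_with_keyword(html, keyword):
    """Return the paragraphs whose text contains keyword, ignoring case."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    keyword = keyword.lower()
    return [p for p in collector.paragraphs if keyword in p.lower()]
```

With the server running, you could obtain html by reading and decoding the response of urllib.request.urlopen('http://localhost:8000'), assuming the pages are UTF-8. You would then call paragraphs_with_keyword(html, 'python').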