Scraping Google’s Search Engine With Python — A Step-by-Step Tutorial
Provision your free Custom Search Engine on Google Cloud Platform
In principle, accessing a website via Python is not that hard. You import the requests module, define the URL you want to access, and send an HTTP request. For google.com, that might look something like:
import requests

# Reuse one session so cookies and connections persist across requests
session = requests.Session()
url = 'https://www.google.com/search?q=Will+it+rain+today+in+Amsterdam'
result = session.get(url)
output = result.text  # raw HTML of the response
You could try, but you will quickly find that the ‘Accept cookies’ button is in the way and you won’t get any meaningful results. In fact, Google, like many other websites, deliberately sets up roadblocks to prevent automated abuse of its search engine. Some workarounds circulate on the web, like clicking the cookie button with selenium or manually importing your cookie settings. However, there is a good reason to stay away from such practices: the principle of responsible use. Web scraping can easily overload a server with large numbers of automated requests (although Google’s servers, which handle 3.5 billion searches daily, can probably take the hit), and it circumvents the ads that bring in money. Thus, we should always check the API documentation and ensure the traffic load…
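One lightweight way to act on the responsible-use principle is to check a site's robots.txt rules before sending automated requests. The sketch below is only an illustration: the helper name is_allowed and the SAMPLE_ROBOTS_TXT excerpt are assumptions made for this example, not a copy of any site's actual file. It uses Python's standard urllib.robotparser module and parses the rules from a string, so it runs without touching the network.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt only (an assumption for this sketch). A real
# robots.txt is usually longer and can change at any time.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no HTTP request is made here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = "https://www.google.com/search?q=Will+it+rain+today+in+Amsterdam"
print(is_allowed(SAMPLE_ROBOTS_TXT, search_url))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.google.com/"))
```

With these sample rules, the search URL is reported as disallowed while the home page is allowed. Against a live site you would instead point the parser at the real file with set_url() and read(), and a robots.txt check complements, rather than replaces, reading the site's terms and API documentation.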