CodeX

Everything connected with Tech & Code. Follow to join our 1M+ monthly readers

Scraping Google’s Search Engine With Python — A Step-by-Step Tutorial

Wouter van Heeswijk, PhD
Published in CodeX
6 min read · May 29, 2021

Photo by Marten Newhall on Unsplash

In principle, accessing a website via Python is not that hard. You import the requests module, define the URL you want to access, and simply send an HTTP request. For google.com, that might look something like:

import requests

session = requests.Session()
url = 'https://www.google.com/search?q=Will+it+rain+today+in+Amsterdam'
result = session.get(url)
output = result.text
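
The URL above has the search phrase already encoded, with spaces written as plus signs. As a minimal sketch of how such a URL could be built from an arbitrary phrase, the standard library's urllib.parse.urlencode can do the encoding. The helper name build_search_url and the constant SEARCH_ENDPOINT are assumptions made for illustration and do not come from the original article.

```python
from urllib.parse import urlencode

# Hypothetical constant for illustration; the endpoint is taken from the URL used above.
SEARCH_ENDPOINT = "https://www.google.com/search"


def build_search_url(query: str) -> str:
    """Return the search URL for a plain-text query.

    urlencode escapes reserved characters and turns spaces into '+',
    which matches the hand-written URL in the snippet above.
    """
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


url = build_search_url("Will it rain today in Amsterdam")
```

The resulting url can be passed to session.get in place of the hand-written string.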

You could try, but you will quickly find that the ‘Accept cookies’ button is in the way and you won’t get any meaningful results. In fact, Google, like many other websites, deliberately sets up roadblocks to prevent automated abuse of its search engine. Some workarounds circulate on the web, such as clicking the cookie button with Selenium or manually importing your cookie settings. However, there is a reason to stay away from such practices: the principle of responsible use. Web scraping can easily overload a server with large numbers of automated requests (although Google’s servers, which handle 3.5 billion searches daily, can probably take the hit), and it circumvents the ads that bring in money. Thus, we should always check the API documentation and ensure the traffic load…
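
One common way to keep the traffic load modest is to enforce a minimum pause between consecutive requests. The sketch below is an illustration of that idea only, not the author's method: the class name Throttle, its parameters, and the two-second interval are all assumptions. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait().

    Hypothetical helper for illustration; not part of the original article.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds required between calls
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Assumed interval of two seconds; pick a value that suits the target service.
throttle = Throttle(min_interval=2.0)
```

Calling throttle.wait() immediately before each request would then space the requests at least two seconds apart.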

Written by Wouter van Heeswijk, PhD

Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.
