Anonymous Web Scraping using Python and Tor

Requirements

Python

The socks module (installed from the PySocks package) provides a standard socket-like interface for tunneling Python connections through SOCKS proxies.
$ sudo pip install pysocks

Requests is an HTTP library written in Python. It is a wrapper over urllib3 and is very Pythonic to use.
$ sudo pip install requests

Fedora

$ sudo yum install tor

Ubuntu

$ sudo apt-get install tor

After completing the required installations, start Tor and leave it running in the background with the following command.
$ tor &

By default, Tor listens for SOCKS connections on port 9050 unless configured otherwise. You can check that the process is listening with the netstat command.
$ netstat -tupln

Look for a tor process listening on port 9050 in the netstat output.
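The same check can be done from Python before a scraper starts. The following is a minimal sketch, not part of the original setup: the helper name is_port_open is hypothetical, and it only confirms that something accepts TCP connections on the port, not that the listener is actually Tor.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError when the connection is
        # refused or does not complete within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open() with no arguments tests 127.0.0.1:9050, the default address assumed throughout this post.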

Now for the programming part: open the Python interpreter and run the following commands.
>>> import socks
>>> import socket
>>> socks.setdefaultproxy(proxy_type=socks.PROXY_TYPE_SOCKS5, addr="127.0.0.1", port=9050)

socks.setdefaultproxy sets a default proxy that all subsequent socksocket objects will use unless explicitly changed.
>>> socket.socket = socks.socksocket

socks.socksocket is the proxy-aware socket class. Assigning it to socket.socket replaces the standard socket class, so every connection the script opens from this point on is made through the proxy.
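Because this assignment changes socket.socket for the whole process, it can be useful to limit the change to one part of a program. Here is a minimal sketch of one way to do that; the context manager name patched_socket is hypothetical and not part of PySocks.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(replacement_cls):
    """Temporarily replace socket.socket, restoring the original on exit."""
    original = socket.socket
    socket.socket = replacement_cls
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks out.
        socket.socket = original


# Intended use with PySocks (assumes the default proxy is already set
# with socks.setdefaultproxy as shown earlier):
#
#     import socks
#     with patched_socket(socks.socksocket):
#         ...  # sockets created here use the proxy
```

Sockets created before the block, or after it exits, use the standard class again.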

>>> import requests
>>> print(requests.get("http://icanhazip.com").text)
176.10.99.203

Try opening http://icanhazip.com in your browser as well. This website shows your public IP address, and you should see a different IP address in the browser than in the program output. You can now adapt the script above and write your web scraping or web crawling program around it, so that your Python program reaches the internet through Tor instead of from your own IP address.
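An alternative that avoids replacing socket.socket is to pass the proxy to Requests directly. The sketch below assumes Requests was installed with SOCKS support (pip install requests[socks], which pulls in PySocks) and that Tor is listening on 127.0.0.1:9050. The helper names tor_proxies and fetch_via_tor are hypothetical. The socks5h scheme asks the proxy to resolve hostnames, whereas plain socks5 resolves them locally; names that only Tor can resolve, such as .onion addresses, need the former.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a Requests-style proxies dict pointing at a local SOCKS5 proxy."""
    # "socks5h" means hostnames are resolved by the proxy, not locally.
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def fetch_via_tor(url, timeout=30):
    """Fetch url through the local Tor SOCKS proxy and return the body text."""
    # Imported here so tor_proxies() can be used without Requests installed.
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

With Tor running, fetch_via_tor("http://icanhazip.com") should return the address of a Tor exit node rather than your own.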

13 thoughts on “Anonymous Web Scraping using Python and Tor”

  1. How can I make requests to .onion sites?

    When I try it, I get an error:
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='zqktlwi4fecvo6ri.onion', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd35ba09b10>: Failed to establish a new connection: [Errno -2] Name or service not known',))

  2. I got a subscription to a supposedly premium VPN service last September, and since then I have had issues with its desktop client. Whenever I update the app, an error pops up that I am not too sure about, and after that the client doesn’t run. I talked with support; they took 10 hours to respond, and when they did, they asked me to reinstall the app and then update. I tried that and got the same issue again, though updates worked fine on my Android phone. They said there may be a problem in their app and that they would look into it. A whole month passed and I didn’t hear from them. I switched to FastestVPN a couple of months back and am really happy with their desktop client: no server connection issues, no client issues.
