Anonymous Web Scraping using Python and Tor

Requirements

Python

The socks module (installed by the PySocks package) provides a standard socket-like interface for tunneling Python connections through SOCKS proxies.
$ sudo pip install pysocks

Requests is an HTTP library written in Python. It is a wrapper over urllib3 and is very Pythonic to use.
$ sudo pip install requests

Fedora

$ sudo yum install tor

Ubuntu

$ sudo apt-get install tor

After completing the required installations, start Tor and let it run in the background with the following command.
$ tor &

By default, Tor listens for SOCKS connections on port 9050 unless configured otherwise. You can check that the process is listening with the netstat command.
$ netstat -tupln

Look for a process listening on port 9050.
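The same check can be sketched from Python, which is handy if a scraper should confirm that Tor is up before it starts. This is a minimal sketch assuming Python 3; the function name is_port_open and its default arguments are illustrative, not part of any library.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Tor SOCKS port reachable:", is_port_open())
```

A successful connection only shows that something is listening on the port. It does not prove that the listener is Tor.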

Now for the programming part. Open the Python interpreter and run the following commands.
>>> import socks
>>> import socket
>>> socks.setdefaultproxy(proxy_type=socks.PROXY_TYPE_SOCKS5, addr="127.0.0.1", port=9050)

socks.setdefaultproxy sets a default proxy which all further socksocket objects will use, unless explicitly changed.
>>> socket.socket = socks.socksocket

socks.socksocket is a socket class that connects through the configured proxy. Assigning it to socket.socket replaces the standard socket class, so every connection the script opens from now on goes through this proxy.

>>> import requests
>>> print(requests.get("http://icanhazip.com").text)
176.10.99.203

Try opening http://icanhazip.com in your browser as well. This website shows your public IP address, so you should see a different IP address in the browser than in the program output. You can now adapt the script above and build your web scraping or web crawling program around it, so that your Python program runs anonymously on the internet.
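As a different technique from patching socket.socket globally, Requests can also be pointed at the Tor SOCKS port on a per-request basis through its proxies argument. The sketch below assumes Python 3 and Requests installed with SOCKS support (for example via pip install "requests[socks]"). The helper names tor_proxies and fetch_via_tor are hypothetical. The socks5h scheme asks the proxy to resolve hostnames, which matters for names that only Tor can resolve.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a Requests-style proxies mapping for a local Tor SOCKS port."""
    # socks5h (rather than socks5) lets the proxy resolve hostnames,
    # so DNS lookups also go through Tor.
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def fetch_via_tor(url, timeout=30):
    """Fetch url through Tor and return the response body as text."""
    # Imported lazily so tor_proxies() is usable without Requests installed.
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text
```

With Tor running, fetch_via_tor("http://icanhazip.com") should return the exit node's address rather than your own. Because nothing is patched globally, other sockets in the same program keep connecting directly.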


11 thoughts on “Anonymous Web Scraping using Python and Tor”

  1. How can I make requests to .onion sites?

    When I try it, I get an error:
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='zqktlwi4fecvo6ri.onion', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd35ba09b10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
