Scraping TOR Part 1
I enjoy working with data, mostly for threat intelligence. One of the things I wanted to understand was how to scrape data from TOR. In this series, we will first see how to interact with ‘TOR’ through command-line tools.
Tor, short for The Onion Router, is free and open-source software for enabling anonymous communication. It directs Internet traffic through a free, worldwide, volunteer overlay network, consisting of more than seven thousand relays, to conceal a user’s location and usage from anyone performing network surveillance or traffic analysis. Using Tor makes it more difficult to trace a user’s Internet activity. Tor’s intended use is to protect the personal privacy of its users, as well as their freedom and ability to communicate confidentially through IP address anonymity using Tor exit nodes.
Source: Tor (network) - Wikipedia
Let's start with installing TOR on our operating system:
sudo apt install tor
Next, we need to edit the torrc config file: uncomment the ControlPort 9051 line, then find CookieAuthentication 1, uncomment it, and change the 1 to 0:
sudo vi /etc/tor/torrc
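If you would rather script the two edits than make them by hand, here is a minimal sketch. It assumes the stock torrc ships the lines as `#ControlPort 9051` and `#CookieAuthentication 1` and that GNU sed is available; `enable_control_port` is a hypothetical helper name, not something that comes with Tor.

```shell
# Hypothetical helper (not part of Tor): apply the two torrc edits to the file given as $1.
# Assumes the commented-out defaults "#ControlPort 9051" and "#CookieAuthentication 1".
enable_control_port() {
    # -i.bak edits in place and keeps the original as FILE.bak
    sed -i.bak \
        -e 's/^#[[:space:]]*ControlPort[[:space:]]\{1,\}9051/ControlPort 9051/' \
        -e 's/^#[[:space:]]*CookieAuthentication[[:space:]]\{1,\}1/CookieAuthentication 0/' \
        "$1"
}
```

One way to use it is to work on a copy and install the result: `cp /etc/tor/torrc /tmp/torrc && enable_control_port /tmp/torrc && sudo cp /tmp/torrc /etc/tor/torrc`. Afterwards the file should contain the uncommented lines `ControlPort 9051` and `CookieAuthentication 0`.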
Now we can go ahead and restart the service:
sudo /etc/init.d/tor restart
Torify is a tool that can be used to run any command through TOR. To verify that your commands are actually going through TOR, run the following two commands and make sure they print different IP addresses:
torify curl ifconfig.me 2>/dev/null
curl ifconfig.me
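The comparison can also be scripted so you don't have to eyeball the two addresses. This is only a sketch: `tor_ip_differs` is a hypothetical helper name, and it simply compares two strings that you pass in.

```shell
# Hypothetical helper: succeed only if both addresses are non-empty and differ.
# $1 = address seen without TOR, $2 = address seen through torify.
tor_ip_differs() {
    direct_ip=$1
    tor_ip=$2
    if [ -z "$direct_ip" ] || [ -z "$tor_ip" ]; then
        echo "could not determine one of the addresses" >&2
        return 2
    fi
    if [ "$direct_ip" = "$tor_ip" ]; then
        echo "same address ($direct_ip): traffic is NOT going through TOR" >&2
        return 1
    fi
    echo "direct: $direct_ip, via TOR: $tor_ip"
}
```

A possible invocation, using the same service as above: `tor_ip_differs "$(curl -s ifconfig.me)" "$(torify curl -s ifconfig.me 2>/dev/null)"`. A non-zero exit status means the setup needs another look.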
Now you should be able to run commands through TOR and have an environment set up for Part 2 of this series. I stop the service whenever I’m not using it, so do as you please.
- If you are experiencing weird issues, restart the service
- If both commands return the same IP address (your ISP-given one), something is wrong; restart the service
- If you want a new circuit (and usually a new exit IP address), send NEWNYM to the control port:
echo -e 'AUTHENTICATE ""\r\nsignal NEWNYM\r\nQUIT' | nc 127.0.0.1 9051
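To tell whether the control port actually accepted the request, you can check its replies. The sketch below assumes each successful command is answered with a line starting with `250` (for example `250 OK`), which is how the control protocol reports success; `control_reply_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: read control-port replies on stdin and succeed only if
# there is at least one reply and every reply line starts with 250.
control_reply_ok() {
    # Replies end in CRLF; drop the carriage returns before matching.
    replies=$(tr -d '\r')
    [ -n "$replies" ] || return 1
    # grep -v finds any line that is NOT a 250 reply; negate its result.
    ! printf '%s\n' "$replies" | grep -qv '^250'
}
```

For example: `printf 'AUTHENTICATE ""\r\nSIGNAL NEWNYM\r\nQUIT\r\n' | nc 127.0.0.1 9051 | control_reply_ok && echo "new circuit requested"`. If nothing is printed, look at the raw nc output; an authentication error usually means the torrc changes were not picked up.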
See you in the next tutorial, where we will start running tools through this setup and scraping for keywords that we won’t necessarily see on the ‘surface web’.