In this post I’ll be testing the proxy service provided by NetNut. For a bit of context, take a look at my What is a Proxy? post.
If you’re interested in trying out NetNut proxies then you can register for a free trial. You’ll be provided with a username and password, which you can use to log in to your dashboard.
The dashboard provides a lot of information on your proxy usage and billing.
There are two separate proxy URLs:
- `gw.netnut.net` for most sites; and
- `gw-open.netnut.net` for `.gov` sites.
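If you scrape a mix of sites, choosing the gateway can be automated. The sketch below is hypothetical (`gateway_for` is my own name, not part of any NetNut library) and assumes the only rule is the one stated above: hosts ending in `.gov` go through `gw-open.netnut.net` and everything else goes through `gw.netnut.net`.

```python
from urllib.parse import urlparse

# Gateway hosts as listed above.
DEFAULT_GATEWAY = "gw.netnut.net"
GOV_GATEWAY = "gw-open.netnut.net"


def gateway_for(target_url: str) -> str:
    """Return the NetNut gateway host for target_url (hypothetical helper).

    Assumption: .gov hosts use gw-open.netnut.net, all others gw.netnut.net.
    """
    host = (urlparse(target_url).hostname or "").lower()
    if host.endswith(".gov"):
        return GOV_GATEWAY
    return DEFAULT_GATEWAY
```

For example, `gateway_for("https://www.usa.gov/")` gives `gw-open.netnut.net`, while `gateway_for("http://httpbin.org/ip")` gives `gw.netnut.net`.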
NetNut also provides three proxy types:
- `dc`: data center proxy;
- `res`: rotating residential proxy; and
- `stc`: static residential proxy.
You select the proxy type by modifying your username. For example, use `datawookie-res-any` for residential proxies and `datawookie-dc-any` for data center proxies.
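Since all of the configuration lives in the username, it can help to build it programmatically. This is a minimal sketch with a hypothetical `build_username` helper; the `account-type-country` pattern and the optional `-sid-<number>` suffix are inferred from the example usernames in this post (the country and session-ID parts are described in later sections), not from an official NetNut specification.

```python
from typing import Optional

# Proxy type codes as listed above.
PROXY_TYPES = {"dc", "res", "stc"}


def build_username(account: str, proxy_type: str = "res",
                   country: str = "any", session_id: Optional[int] = None) -> str:
    """Compose a username such as 'datawookie-res-any-sid-13577'.

    Hypothetical helper: the pattern is inferred from the examples in this post.
    """
    if proxy_type not in PROXY_TYPES:
        raise ValueError(f"unknown proxy type: {proxy_type!r}")
    username = f"{account}-{proxy_type}-{country}"
    if session_id is not None:
        # Sticky session: pin the exit IP to this session ID.
        username += f"-sid-{session_id}"
    return username
```

With the defaults, `build_username("datawookie")` returns `datawookie-res-any`, and `build_username("datawookie", "dc")` returns `datawookie-dc-any`.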
The simplest application for these proxies is adding them to your browser configuration. I tested this and it works well. My main interest in proxies, however, is web scraping, and here too the NetNut proxies work well. You can find code samples for using NetNut proxies in various programming languages here. There’s also good documentation.
Testing
Rotating IP
The first thing I’d like to test is IP rotation. In principle my requests should be routed at random through a variety of proxy exit points. I used the script below to test this behaviour.
```python
import argparse
import os

import requests
from dotenv import load_dotenv

# Read the proxy settings from a .env file in the working directory.
load_dotenv()

PROXY_URL = os.getenv("PROXY_URL")  # e.g. gw.netnut.net
PROXY_PORT = os.getenv("PROXY_PORT")
PROXY_USER = os.getenv("PROXY_USER")  # e.g. datawookie-res-any
PROXY_PASSWORD = os.getenv("PROXY_PASSWORD")


def proxy_httpbin(endpoint="ip"):
    """Send a GET request to httpbin.org via the proxy and return the JSON response."""
    url = "http://httpbin.org/" + endpoint
    proxy = f"http://{PROXY_USER}:{PROXY_PASSWORD}@{PROXY_URL}:{PROXY_PORT}"
    # The same proxy handles both HTTP and HTTPS targets.
    proxies = {"http": proxy, "https": proxy}
    # requests.get() opens a fresh connection on each call (no Session).
    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
    return response.json()


parser = argparse.ArgumentParser(exit_on_error=False)
parser.add_argument("-n", "--iterations", type=int, default=1)
args = parser.parse_args()

for _ in range(args.iterations):
    print(proxy_httpbin())
```
🚨 You can’t use a session object here because it maintains a persistent connection, which in turn means that requests are routed through the same proxy.
The script sends a request to http://httpbin.org/ip, which responds with the IP address from which the request originated. First, let’s check my actual IP without using a proxy.

```bash
curl http://httpbin.org/ip
```

```
{
  "origin": "92.10.157.203"
}
```
Now let’s try a request through the proxy.
```bash
python3 test-proxy-httpbin.py
```

```
{'origin': '74.77.64.93'}
```
Looks promising. Now run it a few times in succession to see if the IP changes.
```bash
for i in $(seq 10); do python3 test-proxy-httpbin.py; done
```

```
{'origin': '88.226.76.120'}
{'origin': '177.19.26.100'}
{'origin': '73.41.151.172'}
{'origin': '190.238.105.211'}
{'origin': '66.24.228.100'}
{'origin': '186.171.17.39'}
{'origin': '73.143.41.35'}
{'origin': '157.35.27.200'}
{'origin': '172.73.46.197'}
{'origin': '179.235.248.198'}
```
Excellent! Each of the ten requests originated from a different IP address. Proxies rotate as advertised.
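With ten requests it’s easy to check by eye, but for larger runs it’s handier to count. This is a small sketch with a hypothetical `summarise_origins` helper; it assumes you have collected the parsed httpbin responses (dicts like `{'origin': '74.77.64.93'}`) into a list.

```python
from collections import Counter


def summarise_origins(responses):
    """Summarise exit IPs from a list of httpbin /ip responses.

    Returns (number of requests, number of distinct IPs, {ip: count} for
    any IP seen more than once).
    """
    counts = Counter(r["origin"] for r in responses)
    repeated = {ip: n for ip, n in counts.items() if n > 1}
    return len(responses), len(counts), repeated
```

For the ten requests above this would report ten requests, ten distinct IPs and no repeats.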
Sticky Sessions
What if you don’t want your proxy IP to change? You can keep a specific proxy IP by adding a session ID to your username. The session ID can be any number with 1 to 8 digits (although it should generally have at least 4 digits).
For example, the username `datawookie-res-any-sid-13577` gives me one specific IP.

```
{'origin': '5.31.235.75'}
```
Requests made with this username will always return the same IP. If, however, I change the username to `datawookie-res-any-sid-6999`, which has a different session ID, then I get a different IP.

```
{'origin': '5.107.173.160'}
```
You can manually rotate your proxy IP by explicitly changing the session ID.
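Manual rotation then amounts to picking a fresh session ID. Here’s a minimal sketch; `random_session_id` and `sticky_username` are hypothetical names, the 4 to 8 digit range follows the guidance above, and it is an assumption that any number in that range is accepted.

```python
import random
from typing import Optional


def random_session_id(rng: Optional[random.Random] = None,
                      min_digits: int = 4, max_digits: int = 8) -> int:
    """Return a random session ID with between min_digits and max_digits digits."""
    if rng is None:
        rng = random.Random()
    n_digits = rng.randint(min_digits, max_digits)
    # Draw from [10**(n-1), 10**n - 1] so the ID has no leading zero.
    return rng.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)


def sticky_username(account: str, proxy_type: str = "res", country: str = "any",
                    rng: Optional[random.Random] = None) -> str:
    """Build a username pinned to a new session ID, e.g. 'datawookie-res-any-sid-13577'."""
    return f"{account}-{proxy_type}-{country}-sid-{random_session_id(rng)}"
```

Calling `sticky_username("datawookie")` whenever you want a new exit IP, and reusing the result while you want to keep it, gives you explicit control over rotation.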
Speed
My next concern is how much extra latency is being added to each request by routing it through a proxy. I did a quick and rather unsophisticated test: sending 1000 simple `GET` requests to Google (https://www.google.com) either through the proxy or directly. I gathered the statistics and generated some plots.
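I haven’t shown the code behind this test, but a timing loop along these lines would produce comparable data. It is a sketch, not the script I used: `time_requests` is a hypothetical helper that takes any zero-argument function, so the same loop can time direct and proxied requests.

```python
import time
from typing import Callable, List


def time_requests(fetch: Callable[[], object], n: int = 1000) -> List[float]:
    """Call fetch() n times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_requests(lambda: requests.get("https://www.google.com", timeout=30))` times direct requests, and adding `proxies=proxies` (built as in the earlier script) times proxied ones. Use a timeout long enough not to cut off the slow tail of proxied requests.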
First let’s compare the response times as a time series. The panels below show the response times as a series of observations routed either through the proxy (bottom panel) or directly (top panel). The average response time for each treatment is superimposed as a dashed horizontal line.
If you look carefully at the plot you will see that the times for the direct routing are generally lower than those for the proxy. There’s also a lot less dispersion or variability in the latency from one request to the next. This is to be expected since the direct requests should be following the same route between client and server, while those via the proxy will be travelling a variety of different routes.
A histogram will give a clearer comparison of the response times. Below are histograms of the response times for the same two treatments.
The average for each treatment is again indicated by a dashed line. The average response time for direct routing is 0.082 s, while for routing through the proxy it’s 1.422 s. On average the proxy routing is around 17.2 times slower. For most of my applications this difference is not significant. The only time this would really become a problem is when I am sending out an enormous number of requests, and that seldom happens.
The distribution for proxied requests has a rather long tail: a few very slow requests inflate the average response time, which is why the proxy mean (1.422 s) sits well above its median (1.139 s).

| Treatment | Mean (s) | Median (s) | Min (s) | Max (s) |
|-----------|----------|------------|---------|---------|
| direct    | 0.0824   | 0.0811     | 0.0668  | 0.2339  |
| proxy     | 1.4218   | 1.1391     | 0.3717  | 19.7515 |
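If you want to reproduce these summaries from your own timings, here is a small Python sketch using the standard `statistics` module; `summarise` and `slowdown` are hypothetical names. The slowdown figures quoted in this section are simply ratios of means: 1.42184984 / 0.08243253 ≈ 17.2 for Google and 1.013 / 0.303 ≈ 3.3 for httpbin.org.

```python
import statistics


def summarise(times):
    """Return the mean, median, min and max of a list of response times (seconds)."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
    }


def slowdown(proxy_times, direct_times):
    """Ratio of mean proxied response time to mean direct response time."""
    return statistics.mean(proxy_times) / statistics.mean(direct_times)
```

A mean noticeably larger than the median, as for the proxied requests, is a quick signal of a long right tail.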
I repeated the analysis with data gathered for http://httpbin.org/ip. The corresponding plots are below.
With this target the average response time for direct routing is 0.303 s, while for routing through the proxy it’s 1.013 s. On average the proxy routing is around 3.3 times slower. The ratio is a lot smaller than when we targeted Google.
I think that the main reason for the disparity between the latencies with Google and httpbin.org is server location. I believe that there’s just a single httpbin.org server (on AWS in Ashburn, Virginia), while Google has highly distributed servers via CDNs. The requests all originate in the UK. The baseline for direct requests to httpbin.org is higher because all of those requests are being routed across the Atlantic, whereas the requests to Google might be served by closer locations in Europe.
Proxy Location
Normally you’d want not only a series of different IPs but also a variety of locations. By default the proxy IPs will be randomly located. You can also be more specific about the geographic location of the proxies. This is very handy if your requests need to originate from (or avoid originating from) a specific country. Once again this is achieved by modifying your username. For example, `datawookie-res-any` yields proxy locations anywhere in the world, while `datawookie-res-uk` confines them to the UK.
To generate data for proxy locations I used http://httpbin.org/ip to get the proxy IP and http://ip-api.com/ to get the corresponding locations. First let’s look at the distribution of proxy locations when we don’t specify any particular geographic location.
They are heavily dominated by proxies in the US. However, if you specifically limit the locations to a country then you get exit IPs distributed across that country. There’s a list of supported countries here.
Here are the locations for 100 requests via proxies in the UK.
And these are the locations of 100 proxy exit IPs in South Africa.
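The location data can be gathered with a short script. This is a sketch rather than the code behind the plots: it assumes ip-api.com’s free JSON endpoint (`http://ip-api.com/json/<ip>`) and its `status` and `country` response fields, and `locate` and `tally_countries` are hypothetical names. It uses only the standard library.

```python
import json
import urllib.request
from collections import Counter


def locate(ip: str, timeout: float = 10) -> dict:
    """Look up an IP address via ip-api.com's JSON endpoint (plain HTTP on the free tier)."""
    with urllib.request.urlopen(f"http://ip-api.com/json/{ip}", timeout=timeout) as resp:
        return json.load(resp)


def tally_countries(records) -> Counter:
    """Count successful lookups by country name, skipping failed ones."""
    return Counter(r["country"] for r in records if r.get("status") == "success")
```

Feeding the `origin` values collected through the proxy into `locate` and then `tally_countries` gives the per-country counts. The free endpoint is rate limited, so space out lookups for large batches.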
Conclusion
I’m satisfied with the proxy services provided by NetNut. They satisfy my primary requirements:
- easy access to rotating proxy IPs;
- relatively low latency; and
- good geographic distribution.