I decided it would be interesting to investigate the most popular web servers in use today. Let's see if anything interesting shakes out.
I think it's useful to stay on top of trends in the industry, and the best way I know to stay engaged is to entertain these passing curiosities. In trying to answer one question I inevitably learn a host of other things unrelated to it. This is, for me, the fastest way to get a holistic view of things.
I'm basing "popularity" on a subset of Alexa's report of the top million most popular sites. You can follow along with the same data set if you would like. I realize there's a certain level of faith placed in Alexa here. A better alternative might be to collate lists from multiple sources like Quantcast, but Alexa's data set is more readily available.
I am only interested in the self-reported server type for each of the sites. This ignores any supporting infrastructure beyond the top-level web server, but we're running fast and loose here.
The easiest way I can think of to check is with cURL, using the -I flag to request only the response headers. I had to adjust expectations when I set things running against the full million URLs and came to realize it would take approximately 120 hours to complete. Instead I ran against the top 10,000 sites and finished up in about an hour.
head -n 10000 top-1m.csv | xargs -P4 -I{} curl -I {} >> output.txt
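If your copy of the list has rows in rank,domain form (an assumption about the file layout, since the command above passes each line to cURL as-is), the domain column needs to be pulled out first. A minimal sketch, with hypothetical helper names, that also silences the progress meter and caps each request so one slow site does not stall a worker:

```shell
# Hypothetical helper: keep only the second comma-separated field,
# assuming input rows look like "1,google.com".
extract_domains() {
  cut -d, -f2
}

# Hypothetical helper: send a HEAD request to each domain read from stdin,
# four at a time. -s hides the progress meter, --max-time limits each
# request to 10 seconds (an arbitrary choice).
fetch_headers() {
  xargs -P4 -I{} curl -s -I --max-time 10 {}
}

# Usage (needs network access, so it is left commented out):
# head -n 10000 top-1m.csv | extract_domains | fetch_headers >> output.txt
```

The timeout value is a guess; a tighter limit finishes sooner at the cost of dropping slow responders from the sample.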
I'm not following redirects because it is unnecessary for this particular experiment. A sample of the output:
HTTP/1.1 301 Moved Permanently
Pragma: no-cache
Location: https://facebook.com/
Cache-Control: private, no-cache, no-store, must-revalidate
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Vary: Accept-Encoding
Content-Type: text/html
X-FB-Debug: weUIry9+s+K9742KJTrA6mFu/tNRbhIx2kphr/7eyl1il6OmXwa3XoAoF9XrVSTQUJymhOY9r3NK3QpKN7lsXA==
Date: Fri, 18 Mar 2016 14:36:58 GMT
Connection: keep-alive
Content-Length: 0
HTTP/1.1 301 TLS Redirect
Server: Varnish
Location: https://wikipedia.org/
Content-Length: 0
Accept-Ranges: bytes
Date: Fri, 18 Mar 2016 14:36:59 GMT
X-Varnish: 3360237012
Age: 0
Via: 1.1 varnish
Connection: close
X-Cache: cp2001 frontend int(0)
Set-Cookie: WMF-Last-Access=18-Mar-2016;Path=/;HttpOnly;Expires=Tue, 19 Apr 2016 12:00:00 GMT
X-Client-IP: 75.53.203.26
Set-Cookie: GeoIP=US:TX:Chireno:31.48:-94.40:v4; Path=/; Domain=.wikipedia.org
For me the only thing of interest here is the reported Server: name. Stripping that out from the rest is easy enough, along with a rough count of each type:
awk '/^Server: / { print $2 }' output.txt | sort | uniq -c | sort -rn | head -n20
1686 nginx
1596 Apache
1200 cloudflare-nginx
414 Microsoft-IIS/7.5
323 BigIP
209 Microsoft-IIS/8.5
200 nginx/1.6.2
186 AkamaiGHost
165 Apache/2.2.15
143 nginx/1.8.0
134 Varnish
112 gws
95 Microsoft-IIS/6.0
94 nginx/1.8.1
88 nginx/1.4.6
86 Apache/2.2.22
83 Tengine
74 LiteSpeed
70 Apache-Coyote/1.1
67 Apache/2.4.7
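One caveat with the pattern above: HTTP header names are case-insensitive, so a site that answers with a lowercase server: header would slip past a match anchored on the exact capitalization. A small sketch of a more forgiving count (the function name count_servers is made up for illustration):

```shell
# Hypothetical helper: count Server header values regardless of how the
# header name is capitalized. Reads the named files, or stdin if none given.
count_servers() {
  awk 'tolower($1) == "server:" { print $2 }' "$@" | sort | uniq -c | sort -rn
}

# Usage: count_servers output.txt | head -n20
```

Whether this changes the totals depends on how many sites in the sample actually send a lowercase header name, which I have not checked.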
The most obvious issue is the mixing of specific versions of the same server software. I derived a more general list with the following:
awk '/^Server: / { split($2, arr, "/"); print arr[1] }' output.txt | sort | uniq -c | sort -rn | head -n20
1686 nginx
1596 Apache
1200 cloudflare-nginx
1030 nginx
826 Microsoft-IIS
797 Apache
323 BigIP
186 AkamaiGHost
134 Varnish
112 gws
83 Tengine
74 LiteSpeed
70 Apache-Coyote
64 AmazonS3
45 UltraDNS
41 Tengine
38 QRATOR
37 openresty
36 DNSME
31 sffe
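This reveals another issue in the responses that cURL has written to the output file: stray control characters (HTTP header lines end in a carriage return plus line feed) result in repeated entries for the same server types. Let's strip those out with sed, where ^M stands for a literal carriage return (typed as Ctrl-V then Ctrl-M in most terminals), not a caret followed by the letter M: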
sed 's/^M//g' output.txt > filtered_output.txt
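If typing a literal carriage return is awkward, tr can do the same job with an escape sequence. This is an alternative to the sed command, not what was run for this post; the helper name strip_cr is hypothetical:

```shell
# Hypothetical helper: delete every carriage return from stdin.
strip_cr() {
  tr -d '\r'
}

# Two lines that look identical on screen but differ by a trailing \r
# are counted separately...
printf 'Server: nginx\r\nServer: nginx\n' | sort | uniq -c

# ...and collapse into one entry once the carriage returns are gone.
printf 'Server: nginx\r\nServer: nginx\n' | strip_cr | sort | uniq -c

# Usage: strip_cr < output.txt > filtered_output.txt
```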
Running the same count against the filtered file gives:

count | server name |
---|---|
2716 | nginx |
2393 | Apache |
1200 | cloudflare-nginx |
827 | Microsoft-IIS |
323 | BigIP |
186 | AkamaiGHost |
135 | Varnish |
124 | Tengine |
112 | gws |
74 | LiteSpeed |
70 | Apache-Coyote |
64 | AmazonS3 |
51 | openresty |
45 | UltraDNS |
41 | lighttpd |
38 | QRATOR |
36 | DNSME |
31 | sffe |
28 | Server |
26 |
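Raw counts are easier to compare as shares. A rough sketch that appends a percentage to each "count name" row; note that the denominator here is the sum of the rows fed in, which is an assumption and not the same thing as the 10,000 sites requested (the helper name with_share is made up):

```shell
# Hypothetical helper: read "count name" rows from stdin and print
# "name count share%", where share is relative to the sum of all rows read.
with_share() {
  awk '{ count[NR] = $1; name[NR] = $2; total += $1 }
       END {
         for (i = 1; i <= NR; i++)
           printf "%s %d %.1f%%\n", name[i], count[i], 100 * count[i] / total
       }'
}

# Small made-up input to show the shape of the output:
printf '3 nginx\n1 Apache\n' | with_share
```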
It is interesting to note just how many times Nginx shows up in this list: what could be described as "vanilla nginx", Cloudflare's custom implementation, which uses Lua, and openresty, which, like Cloudflare's version, uses Lua throughout.
I was surprised to see just how many sites are powered by Cloudflare's CDN. It would be interesting to visualize the relative concentration of these sites within the top-n most popular sites. I would guess they cluster at the top of the range, indicating sites forced to optimize content delivery under load.
I was also interested to learn (as seen in the case of Facebook) that the HTTP specification does not require a Server header to be sent at all.
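To get a feel for how common that is, one could count the responses that carry no Server header. A sketch, assuming each response in the file starts with an HTTP/ status line and that responses from the parallel cURL processes did not interleave (neither of which I have verified for this data set); the function name is hypothetical:

```shell
# Hypothetical helper: treat each "HTTP/" status line as the start of a
# response and report how many responses contain no Server header.
count_missing_server() {
  awk '
    /^HTTP\// {
      if (in_resp && !has_server) missing++   # close out the previous response
      in_resp = 1; has_server = 0; total++
    }
    tolower($1) == "server:" { has_server = 1 }
    END {
      if (in_resp && !has_server) missing++   # close out the last response
      printf "%d of %d responses lack a Server header\n", missing, total
    }
  ' "$@"
}

# Usage: count_missing_server filtered_output.txt
```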
If we don't strip out the version numbers, a different picture comes to light, specifically for Nginx:
awk '/^Server: / { split($2, arr, "/"); print arr[1], arr[2] }' filtered_output.txt \
| sort | uniq -c | sort -k3,3 | awk '/ nginx / { print $1, $3 }'
count | version |
---|---|
1708 | n/a |
200 | 1.6.2 |
144 | 1.8.0 |
95 | 1.8.1 |
90 | 1.4.6 |
38 | 1.2.1 |
34 | 1.0.15 |
30 | 1.1.19 |
29 | 1.6.0 |
28 | 1.6.3 |
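The table is ordered by count and shows n/a where no version was reported, which takes a little more than the pipeline above prints on its own. One way to get that shape directly, as a sketch and not necessarily how the table was produced (nginx_versions is a hypothetical name):

```shell
# Hypothetical helper: list nginx version strings by frequency, printing
# "n/a" when the Server header carries no version after the slash.
nginx_versions() {
  awk '/^Server: / {
         split($2, arr, "/")
         if (arr[1] == "nginx") print (arr[2] == "" ? "n/a" : arr[2])
       }' "$@" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage: nginx_versions filtered_output.txt | head -n10
```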
It is unsurprising that the top versions of Nginx basically follow the most popular package repository versions. In this case:
Version Number | Likely Source |
---|---|
1.6.2 | Ubuntu 15.04 and Debian Jessie |
1.8.x | the stable branch of Nginx's hosted repositories |
1.4.6 | Ubuntu 14.04 LTS |
It is also unsurprising that most sites (this one included) mask the version of Nginx being used, most likely in an effort to obscure attack vectors. I was going to give credit to the site running the single oldest version of Nginx, 0.6.32, but it appears to be a malware site, which is no fun.
It seems the lion's share of internet hosting is held by a relative few and, more interestingly, that Nginx and Lua scale very well, as Cloudflare's adoption suggests. I had previously thought of Openresty as a cute proof-of-concept. Looking at these results, I think I will have to seriously consider it as a platform for application development.