[nolan@nprescott.com] $>  cat blog archive feed

Top Web Servers

2016-03-18

I decided it would be interesting to investigate the most popular web servers in use today. Let's see if anything interesting shakes out.

Why?

I think it's useful to stay on top of trends in the industry, and the best way I know to stay engaged is to entertain these passing curiosities. In trying to answer one question I inevitably learn a host of other, unrelated things. This is, for me, the fastest way to get a holistic view of things.

How?

I'm basing "popularity" on a subset of Alexa's list of the top million most popular sites. You can follow along with the same data set if you would like. I realize this places a certain amount of faith in Alexa. A better alternative might be to collate lists from multiple sources, such as Quantcast, but Alexa's data set is more readily available.

First Things First

I am only interested in the self-reported server type for each site. This ignores any supporting infrastructure beyond the top-level web server, but we're playing fast and loose here.

The easiest way I can think of to check is with cURL's -I flag, which sends a HEAD request and prints only the response headers. I had to adjust expectations when I set things running against the full million URLs and realized it would take approximately 120 hours to complete. Instead I ran against the top 10,000 sites and finished up in about an hour.

# cut drops the leading rank column from Alexa's "rank,domain" lines
head -n 10000 top-1m.csv | cut -d, -f2 | xargs -P4 -I{} curl -I {} >> output.txt
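A slightly more defensive variant of the same idea, as a sketch only. It assumes each line of top-1m.csv has the form rank,domain; the function names and the 10-second timeout are illustrative choices, not what was run for this post.

```shell
#!/bin/sh
# Sketch only: assumes each CSV line looks like "1,google.com".

# Print the domain column for the first N rows read from stdin.
top_domains() {
    head -n "$1" | cut -d, -f2
}

# Issue HEAD requests, four at a time. -s hides the progress meter and
# --max-time keeps one unresponsive host from stalling a worker.
fetch_headers() {
    xargs -P4 -I{} curl -s -I --max-time 10 {}
}

# Usage (hits the network):
#   top_domains 10000 < top-1m.csv | fetch_headers >> output.txt
```

Capping each request is probably the cheapest way to shorten a long run, since a handful of slow hosts can otherwise hold a worker for minutes at a time.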

I'm not following redirects because it is unnecessary for this particular experiment. The output:

 HTTP/1.1 301 Moved Permanently
 Pragma: no-cache
 Location: https://facebook.com/
 Cache-Control: private, no-cache, no-store, must-revalidate
 Expires: Sat, 01 Jan 2000 00:00:00 GMT
 Vary: Accept-Encoding
 Content-Type: text/html
 X-FB-Debug: weUIry9+s+K9742KJTrA6mFu/tNRbhIx2kphr/7eyl1il6OmXwa3XoAoF9XrVSTQUJymhOY9r3NK3QpKN7lsXA==
 Date: Fri, 18 Mar 2016 14:36:58 GMT
 Connection: keep-alive
 Content-Length: 0

 HTTP/1.1 301 TLS Redirect
 Server: Varnish
 Location: https://wikipedia.org/
 Content-Length: 0
 Accept-Ranges: bytes
 Date: Fri, 18 Mar 2016 14:36:59 GMT
 X-Varnish: 3360237012
 Age: 0
 Via: 1.1 varnish
 Connection: close
 X-Cache: cp2001 frontend int(0)
 Set-Cookie: WMF-Last-Access=18-Mar-2016;Path=/;HttpOnly;Expires=Tue, 19 Apr 2016 12:00:00 GMT
 X-Client-IP: 75.53.203.26
 Set-Cookie: GeoIP=US:TX:Chireno:31.48:-94.40:v4; Path=/; Domain=.wikipedia.org

For me the only thing of interest here is the reported Server: value. Pulling that out from the rest is easy enough, along with a rough count of each type:

awk '/^Server: / { print $2 }' output.txt | sort | uniq -c | sort -rn | head -n20
     1686 nginx
     1596 Apache
     1200 cloudflare-nginx
      414 Microsoft-IIS/7.5
      323 BigIP
      209 Microsoft-IIS/8.5
      200 nginx/1.6.2
      186 AkamaiGHost
      165 Apache/2.2.15
      143 nginx/1.8.0
      134 Varnish
      112 gws
       95 Microsoft-IIS/6.0
       94 nginx/1.8.1
       88 nginx/1.4.6
       86 Apache/2.2.22
       83 Tengine
       74 LiteSpeed
       70 Apache-Coyote/1.1
       67 Apache/2.4.7
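One caveat with the match itself: HTTP header field names are case-insensitive, so a site answering with "server: nginx" would slip past the /^Server: / pattern. A minimal sketch of a case-insensitive count follows; count_servers is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count Server header values, matching the field name in any case.
# Reads curl -I output on stdin; count_servers is a hypothetical helper name.
count_servers() {
    awk 'tolower($1) == "server:" { print $2 }' | sort | uniq -c | sort -rn
}

# Usage:
#   count_servers < output.txt | head -n20
```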

The most obvious issue is the mixing of specific versions of the same server software. I derived a more general list with the following:

awk '/^Server: / { split($2, arr, "/"); print arr[1] }' output.txt | sort | uniq -c | sort -rn | head -n20
     1686 nginx
     1596 Apache
     1200 cloudflare-nginx
     1030 nginx
      826 Microsoft-IIS
      797 Apache
      323 BigIP
      186 AkamaiGHost
      134 Varnish
      112 gws
       83 Tengine
       74 LiteSpeed
       70 Apache-Coyote
       64 AmazonS3
       45 UltraDNS
       41 Tengine
       38 QRATOR
       37 openresty
       36 DNSME
       31 sffe

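This reveals another issue in the responses cURL wrote to the output file: HTTP header lines end in a carriage return, and those stray control characters produce repeated entries for the same server type. A value such as nginx/1.8.0 loses the carriage return when split at the slash, while a bare nginx keeps it, so the two are counted separately. Let's strip the carriage returns out with sed: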

# \r is the carriage return (typed literally as ^M via Ctrl-V Ctrl-M); needs GNU sed
sed 's/\r//g' output.txt > filtered_output.txt
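The cleanup and the count can also be folded into a single pass. This is a sketch rather than the exact command behind the tally below; tally_servers is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: strip the trailing carriage return and the version suffix in one
# awk pass, then count. tally_servers is a hypothetical helper name.
tally_servers() {
    awk '/^Server: / {
        sub(/\r$/, "")            # drop the CR that ends each header line
        split($2, arr, "/")       # "nginx/1.8.0" -> "nginx"
        print arr[1]
    }' | sort | uniq -c | sort -rn
}

# Usage:
#   tally_servers < output.txt | head -n20
```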

Final Tally on Top Web Servers

count  server name
 2716  nginx
 2393  Apache
 1200  cloudflare-nginx
  827  Microsoft-IIS
  323  BigIP
  186  AkamaiGHost
  135  Varnish
  124  Tengine
  112  gws
   74  LiteSpeed
   70  Apache-Coyote
   64  AmazonS3
   51  openresty
   45  UltraDNS
   41  lighttpd
   38  QRATOR
   36  DNSME
   31  sffe
   28  Server
   26

[Figure: top 10 web servers]

Takeaways

It is interesting to note just how many times Nginx appears in this list: what could be described as "vanilla nginx", Cloudflare's custom implementation, which uses Lua, and openresty, which, like Cloudflare's version, uses Lua throughout.

I was surprised to see just how many sites are served through Cloudflare's CDN. It would be interesting to visualize the relative concentration of these sites within the top-n most popular sites. I would guess they cluster at the top of the range, indicating sites forced to optimize content delivery under load.
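One way to check that guess, sketched under a loud assumption: it needs a two-column input of "rank server" pairs, which output.txt does not contain, since the parallel curl runs discard which site produced which response. The input format, the function name, and the bucket size are all hypothetical.

```shell
#!/bin/sh
# Sketch: count how many sites in each rank bucket report a given server.
# Input on stdin: "<rank> <server>" per line (a hypothetical format).
# $1 = server name to match, $2 = bucket size (e.g. 1000).
bucket_counts() {
    awk -v name="$1" -v size="$2" '
        $2 == name { n[int(($1 - 1) / size)]++ }
        END {
            for (b in n)
                printf "%d-%d %d\n", b * size + 1, (b + 1) * size, n[b]
        }' | sort -n
}

# Usage (ranked_servers.txt is a hypothetical file):
#   bucket_counts cloudflare-nginx 1000 < ranked_servers.txt
```

If the counts fall off as the rank range climbs, that would support the clustering guess. A flat distribution would argue against it.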

I was also interested to learn (as seen in the case of Facebook) that the HTTP specification does not require a Server header to be sent at all.
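To get a rough sense of how common that is, the responses can be counted in awk's paragraph mode. This is a sketch that assumes carriage returns have already been stripped and that responses are separated by blank lines, which parallel curl output may not always guarantee; count_missing_server is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count responses that carry no Server header at all.
# Expects CR-stripped curl -I output on stdin, with a blank line between
# responses; count_missing_server is a hypothetical helper name.
count_missing_server() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         {
             found = 0
             for (i = 1; i <= NF; i++)
                 if (tolower($i) ~ /^ *server:/) found = 1
             if (!found) missing++
         }
         END { printf "%d of %d responses had no Server header\n", missing, NR }'
}

# Usage:
#   count_missing_server < filtered_output.txt
```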

Nginx Version Security

If we don't strip out the version numbers, a different picture comes to light, specifically for Nginx:

awk '/^Server: / { split($2, arr, "/"); print arr[1], arr[2] }' filtered_output.txt \
| sort | uniq -c | sort -k3,3 | awk '/ nginx / { print $1, $3 }'

Results

count  version
 1708  n/a
  200  1.6.2
  144  1.8.0
   95  1.8.1
   90  1.4.6
   38  1.2.1
   34  1.0.15
   30  1.1.19
   29  1.6.0
   28  1.6.3

[Figure: top Nginx versions]
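The results are listed by count, while the pipeline above sorts on the version string. A sketch that yields the count-ordered view directly, printing "n/a" where no version is reported, might look like this. It is not claimed to reproduce the exact figures; nginx_versions is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: tally nginx versions by count, printing "n/a" when none is given.
# Expects CR-stripped curl -I output on stdin; nginx_versions is a
# hypothetical helper name.
nginx_versions() {
    awk '/^Server: nginx/ {
        n = split($2, arr, "/")
        if (arr[1] != "nginx") next        # skip e.g. "nginx-custom"
        print (n > 1 && arr[2] != "" ? arr[2] : "n/a")
    }' | sort | uniq -c | sort -rn
}

# Usage:
#   nginx_versions < filtered_output.txt | head -n10
```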

It is unsurprising that the top versions of Nginx basically follow the most popular package repository versions. In this case:

Version Number  Likely Source
1.6.2           Ubuntu 15.04 and Debian Jessie
1.8.x           the stable branch of Nginx's hosted repositories
1.4.6           Ubuntu 14.04 LTS

It is also unsurprising that most sites (this one included) mask the version of Nginx being used, most likely in an effort to obscure attack vectors. I was going to give credit to the site running the single oldest version of Nginx, 0.6.32, but it appears to be a malware site, which is no fun.

The Web Server Landscape

It seems the lion's share of internet hosting is held by a relative few and, more interestingly, that Nginx and Lua seem to scale very well, judging by Cloudflare's adoption. I had previously thought of Openresty as a cute proof-of-concept. Looking at these results, I think I will have to seriously consider it as a platform for application development.
