Top Million Web Sites
A large-scale scan of the top million web sites (per Alexa traffic data) was performed in early 2010 using the Nmap Security Scanner and its scripting engine.
We retrieved each site’s icon by first parsing the HTML for a link tag and then falling back to /favicon.ico if that failed. 328,427 unique icons were collected, of which 288,945 were proper images. The remaining 39,482 were error strings and other non-image files. Our original goal was just to improve our http-favicon.nse script, but we had enough fun browsing so many icons that we used them to create the visualization below.
The area of each icon is proportional to the sum of the reach of all sites using that icon. When both a bare domain name and its “www.” counterpart used the same icon, only one of them was counted. The smallest icons–those corresponding to sites with approximately 0.0001% reach–are scaled to 16×16 pixels. The largest icon (Google) is 11,936 x 11,936 pixels, and the whole diagram is 37,440 x 37,440. Since your web browser would choke on that, we have created the interactive viewer below (click and drag to pan, double-click to zoom, or type in a site name to go right to it).