A group of researchers at the Information Sciences Institute has conducted what they are calling a census of the Internet, a project they started in 2003. Their methodology was to ping each address in turn and map out the types of response they received. In the end they don’t seem to have received much at all: only 4.4% of addresses responded to a ping.
This has not prevented them from producing a snazzy “map” and a research paper. I feel bad saying it, given how much time they have spent on it, but this data is virtually useless, because the results only show whether or not a host responded. Their data shows about 180 million active hosts, a number that includes servers. If we look at some alternative Internet usage statistics, though, we see there are actually more like 1.2 billion users, with business users on top of that.
If this study had gathered more information about the hosts, the sample would be big enough to have high confidence in the data. But given that their purpose was to calculate how many IP addresses were active, getting only around 10 to 15% of the number you would expect means there is a significant hole in the methodology. I think they realise this at some level, given the weasel words in the paper: in the abstract, “..there is much to be learned..”, and in the Introduction, “Yet there is much to be learned..”. Meanwhile the results and conclusion offer no information beyond a conjectured trend in firewall usage and some motherhood statements about their methodology “..broaden[ing] the field of Internet measurements..”.
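As a quick sanity check on the figures quoted above, here is a small Python sketch of the arithmetic. The host, user and response-rate numbers are the approximate ones mentioned in this post. The comparison with the full 2**32 IPv4 address space is my own assumption for illustration, since I have not confirmed exactly which address ranges the census probed.

```python
# Approximate figures quoted in this post.
responding_hosts = 180_000_000     # active hosts reported by the census
estimated_users = 1_200_000_000    # alternative Internet usage estimate
response_rate = 0.044              # share of pinged addresses that answered

# Share of the expected population that the census actually saw.
coverage = responding_hosts / estimated_users

# Assumption (not from the paper): every one of the 2**32 IPv4
# addresses was probed. Used only to see whether 4.4% and
# "about 180 million" are in the same ballpark.
ipv4_space = 2 ** 32
implied_responders = response_rate * ipv4_space

print(f"Coverage of estimated users: {coverage:.0%}")
print(f"4.4% of the IPv4 space: {implied_responders / 1e6:.0f} million addresses")
# Prints:
# Coverage of estimated users: 15%
# 4.4% of the IPv4 space: 189 million addresses
```

So the census saw at best about 15% of the expected population, and less once business users are counted, while 4.4% of the full address space lands close to the 180 million hosts reported.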
I hope they revise their methodology from scratch if they intend to produce further iterations of this study.