This time I have prepared a short analysis comparing languages on Wikipedia. We will check which languages are most widely used around the world and which countries have the most Internet users.
Nowadays English is a lingua franca, as we can find English speakers in every country. On the other hand, there is a huge number of languages that are used only locally (in one country or even one region).
I would expect languages used globally to show smaller differences between night and day, because in every time zone there are some speakers. Compare English to Czech page view patterns to get the idea:
Of course, this approach is far from perfect. The main reason is that it is based on time of day, so it misses migration within the same time zone, e.g. from Latin America to the U.S. Moreover, Spanish is definitely a global language, as it is used in many countries. But, as you will see, this method doesn’t show it. The population of native Spanish speakers in Europe is too small, compared to Latin America, to significantly raise page views during “European” daytime hours.
I decided to compute the index as follows:
min number of PV per hour on a given day / average number of PV per hour on a given day
I tried several variants of it (e.g. using the 20th percentile instead of the min), but the results were similar.
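The index above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function names `global_index` and `percentile_index` are made up here, the input is assumed to be a plain list of 24 hourly page view counts for one day, and the percentile variant uses a simple nearest-rank rule, which may differ from whatever was actually used.

```python
import math
from statistics import mean


def global_index(hourly_views):
    """Minimum hourly page views divided by average hourly page views.

    `hourly_views` is assumed to hold the page view counts for each
    hour of a single day (hypothetical input format).
    """
    if not hourly_views:
        raise ValueError("need at least one hourly value")
    avg = mean(hourly_views)
    if avg == 0:
        return 0.0  # no traffic at all; define the index as 0
    return min(hourly_views) / avg


def percentile_index(hourly_views, q=0.2):
    """Variant: use the q-th percentile (nearest rank) instead of the min."""
    if not hourly_views:
        raise ValueError("need at least one hourly value")
    ordered = sorted(hourly_views)
    avg = mean(ordered)
    if avg == 0:
        return 0.0
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1] / avg


# Made-up numbers, only to show the behaviour of the index:
flat_day = [100] * 24                  # same traffic around the clock
peaked_day = [10] * 12 + [30] * 12     # quiet night, busy day

print(global_index(flat_day))
print(global_index(peaked_day))
print(percentile_index(peaked_day))
```

A perfectly flat day gives an index of 1, and the deeper the night-time dip relative to the daily average, the closer the index gets to 0.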
This is the language global index for the 20 most popular languages on Wikipedia. Higher values mean smaller differences between night and day:
     language  glob_idx
1     English 0.7981971
2   Norwegian 0.5851168
3     Persian 0.5177893
4      Arabic 0.4909870
5     Chinese 0.4857803
6      Korean 0.4816817
7     Swedish 0.4765534
8   Ukrainian 0.4445452
9      Polish 0.4108401
10    Spanish 0.4013519
11      Dutch 0.3936351
12     French 0.3602967
13    Italian 0.3544782
14 Portuguese 0.3493829
15    Turkish 0.3481315
16   Japanese 0.3357735
17      Czech 0.3345046
18 Indonesian 0.3308224
19    Russian 0.3130690
20     German 0.2860656
English comes out as the most widely used language around the world, which is no surprise. Then we can see Norwegian and Persian, which I find hard to explain. Anyway, their values (roughly 0.5 to 0.6) are much lower than that of English (0.8), meaning that they can’t really be compared.
At the other end of the scale we have German and Russian, which can also be surprising. In the case of Russian, it probably means that most of the Internet users are in one part of the country.
Of course, there are more controversies about this measure. I mentioned Spanish, which is definitely used widely in many countries, but here it is in the middle of the scale. In fact, it has a similar value to Polish (which is definitely not a global language).
But there is also something else I wanted to check: the number of page views compared with the number of native speakers. This could tell us about Internet users in a given country. From the chart below we can see that this relation is quite linear: the more native speakers, the more daily page views. But some languages are closer to the bottom-right corner, meaning that there are not so many page views on Wikipedia (and probably not so many Internet users). The colour is the “language global index” described above: lighter means the language is more “spread” around the globe.