Information about wikiroute and the graph structure of Wikipedia

Usage of the default wikiroute client

You can enter the source, destination and language in the respective input fields, then click the route button or press enter to get the shortest paths.

The source and destination are case sensitive, except for the first character. Neither may be a redirect page, and both must be in the main namespace.

You can click on the show graph checkbox for a more graphical representation of the paths.

You can click on the pages in the path to only show the paths that go via that page. In order to get back to the overview of all the paths, simply press enter or click on the route button again.

There is also a button to swap the source and destination.

If you want multiple sources or destinations, separate them with a "|". In this case it will still only find the shortest paths overall. For example, if you have "Beetle|dog" as the source and "wolf" as the destination, it will only show the path from dog to wolf, since that one is shorter than the paths from beetle to wolf.
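The multi-source behaviour can be illustrated with a breadth-first search that starts from all sources at once. This is only a minimal sketch on a made-up toy graph, not the actual wikiroute implementation; the function name `shortest_paths` and the link structure below are assumptions for illustration.

```python
from collections import deque

def shortest_paths(graph, sources, dests):
    """Return every shortest path from any source to any destination."""
    sources = list(dict.fromkeys(sources))  # drop duplicates, keep order
    dist = {s: 0 for s in sources}
    parents = {s: [] for s in sources}      # sources have no parents
    queue = deque(sources)
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                parents[nxt] = [page]
                queue.append(nxt)
            elif dist[nxt] == dist[page] + 1:
                parents[nxt].append(page)   # another equally short way in
    reachable = [d for d in dests if d in dist]
    if not reachable:
        return []
    best = min(dist[d] for d in reachable)

    def expand(page):
        # Walk the parent links back to a source and rebuild the paths.
        if not parents[page]:
            return [[page]]
        return [path + [page] for p in parents[page] for path in expand(p)]

    return [path for d in reachable if dist[d] == best for path in expand(d)]

# Hypothetical toy link graph (not real Wikipedia data).
graph = {
    "Beetle": ["Insect"],
    "Insect": ["Animal"],
    "Animal": ["Wolf"],
    "Dog": ["Wolf"],
    "Wolf": [],
}
print(shortest_paths(graph, ["Beetle", "Dog"], ["Wolf"]))
```

In this toy graph only the one-hop route from Dog is returned, because the three-hop route from Beetle is longer than the overall shortest one.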

API

GET /wikiroute

URL params: source, dest, lang

The source and dest parameters contain the sources and destinations. When there are multiple sources or destinations, they should be delimited by a "|".
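As a rough sketch, a GET URL with these parameters could be built as follows. The host `https://example.org` is a placeholder (the real host is not stated here), and `build_route_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org"  # placeholder host, replace with the real one

def build_route_url(sources, dests, lang="en", base_url=BASE_URL):
    """Build a GET /wikiroute URL; multiple pages are joined with '|'."""
    query = urlencode({
        "source": "|".join(sources),
        "dest": "|".join(dests),
        "lang": lang,
    })
    return f"{base_url}/wikiroute?{query}"

print(build_route_url(["Beetle", "Dog"], ["Wolf"]))
```

`urlencode` percent-encodes the "|" delimiter and any spaces in page titles, so the titles can be passed in as plain strings.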

This request returns a JSON object with the following properties:

POST /wikiroute

URL params: lang

Request body: a JSON object containing the sources and destinations. JSON format: {"sources":["example page 1","example page 2"],"dests":["example page 3"]}

Returns the same response as the GET API, but is not subject to restrictions on URL length.
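A POST request with this body could be prepared along these lines. Again this is only a sketch: the host is a placeholder, `build_route_request` is a hypothetical helper, and the Content-Type header is an assumption, since the required headers are not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.org"  # placeholder host, replace with the real one

def build_route_request(sources, dests, lang="en", base_url=BASE_URL):
    """Prepare (but do not send) a POST /wikiroute request."""
    body = json.dumps({"sources": list(sources), "dests": list(dests)})
    return Request(
        f"{base_url}/wikiroute?{urlencode({'lang': lang})}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )

req = build_route_request(["example page 1", "example page 2"], ["example page 3"])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted because the host is a placeholder.
```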

Datasets

Last update:

The raw data was downloaded from dumps.wikimedia.org (pagelinks.sql). In case you want to download the parsed datasets, please contact me.

Statistics

English statistics are based on the January 2024 version and Dutch (nl) statistics on the April 2023 version, unless mentioned otherwise.

Longest route:

For the English Wikipedia the diameter of the graph was 35 (as of January 2024). The two articles that were the farthest apart were "Operating capacity" and "Billboard Top Hits: 1995". In the new dataset the article "1874 Missouri State Auditor election" is further away. The longest shortest path for nl.wikipedia.org was a bit shorter, at only 18 steps.

The longest route was determined in the following way: first, a good guess was created using Dijkstra's algorithm. This was done by picking a random article and checking which article is the farthest away from it. Then the starting article was optimized. After that, it was checked that the destination article is also optimal.
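The guessing procedure can be sketched roughly as below. This is a sketch under assumptions rather than the code that was actually used: it uses a plain breadth-first search, which gives the same distances as Dijkstra's algorithm when every link counts as one step, and the helper names and toy graph are made up for illustration.

```python
from collections import deque

def bfs_distances(adj, start):
    """Distance in hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def reverse(adj):
    """Flip every link, so a search follows links backwards."""
    rev = {node: [] for node in adj}
    for node, targets in adj.items():
        for target in targets:
            rev.setdefault(target, []).append(node)
    return rev

def guess_longest_route(adj, seed):
    """Heuristic guess for a longest shortest path (not a proof)."""
    rev = reverse(adj)
    from_seed = bfs_distances(adj, seed)
    dest = max(from_seed, key=from_seed.get)          # farthest from the seed
    while True:
        to_dest = bfs_distances(rev, dest)
        source = max(to_dest, key=to_dest.get)        # optimize the start
        from_source = bfs_distances(adj, source)
        new_dest = max(from_source, key=from_source.get)
        if from_source[new_dest] == to_dest[source]:  # destination is optimal too
            return source, dest, to_dest[source]
        dest = new_dest                               # strictly longer, try again

# Hypothetical toy link graph.
adj = {"A": ["B"], "B": ["C"], "C": ["D", "A"], "D": ["E"], "E": ["C"]}
print(guess_longest_route(adj, "C"))
```

The loop stops because every round either confirms the current pair or finds a strictly longer route, and route lengths are bounded in a finite graph. The result is still only a guess, which is why a separate verification step is needed.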

To verify that the route is indeed the longest, a random well-connected article ("root") was chosen. Then the maximal distance to that article was determined (taking the root as destination, resulting in distance 5). Any route can be no longer than a detour via the root, so its length is at most 5 plus the distance from the root to its endpoint. Therefore every article at a distance of less than (expected longest path) - (max distance to the root) = 35 - 5 = 30 from the root was disregarded. The few remaining articles were then examined as possible other endpoints.
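The pruning step relies on the fact that a shortest path from X to Y is never longer than the detour via the root: dist(X, Y) <= dist(X, root) + dist(root, Y). A minimal sketch of the filter, assuming the distances from the root have already been computed and that every starting article can reach the root (the bound says nothing about articles that cannot); the function name and the toy numbers are made up:

```python
def remaining_candidates(dist_from_root, max_dist_to_root, expected_longest):
    """Articles that could still be the endpoint of a route of the expected length.

    dist_from_root: hops from the root to each article.
    max_dist_to_root: largest distance from any article to the root.
    """
    threshold = expected_longest - max_dist_to_root
    # Anything closer to the root than the threshold cannot end such a route.
    return sorted(a for a, d in dist_from_root.items() if d >= threshold)

# Hypothetical toy numbers, not taken from the real dataset.
toy_dist_from_root = {"Root": 0, "A": 1, "B": 2, "C": 6, "D": 7}
print(remaining_candidates(toy_dist_from_root, max_dist_to_root=2, expected_longest=8))
```

With a well-connected root the maximal distance to the root is small, so the threshold stays high and only a handful of articles survive the filter.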

Average route length:

Route length histogram

Based on all routes starting from a random sample of 2750 (en) / 10000 (nl) articles. The proportion gives the probability that the shortest path between two random Wikipedia articles has a given distance. It is calculated by dividing the number of paths with the given distance by the total number of paths (including no route found) starting from the sample.

Distance    Proportion (en)    Proportion (nl)
0           0.000000174        0.000000472
1           0.0000132          0.0000161
2           0.00153            0.00117
3           0.0768             0.0376
4           0.474              0.2152
5           0.317              0.2925
6           0.0483             0.1579
7           0.00859            0.0796
8           0.00084            0.0198
9           0.0000812          0.00318
10          0.0000192          0.000287
>=11        0.0000229          0.0000372
No route    0.07260            0.1926
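The proportions are simple relative frequencies. As a sketch of the calculation, assuming each sampled route is recorded as a distance in hops, or `None` when no route was found (this representation and the function name are assumptions, not the actual data format):

```python
from collections import Counter

def distance_proportions(distances):
    """Share of routes per distance; None is counted as 'No route'."""
    counts = Counter("No route" if d is None else d for d in distances)
    total = sum(counts.values())  # includes the routes that were not found
    return {key: count / total for key, count in counts.items()}

# Hypothetical toy sample of eight routes.
sample = [0, 1, 2, 2, 3, 3, 3, None]
print(distance_proportions(sample))
```

Because the routes without a result are included in the total, the proportions for all rows together add up to 1.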

Extension

I have also made a userscript that you can use on Wikipedia. It marks the links you have to follow to get to the destinations in the fewest hops.

This userscript requires Tampermonkey or a similar extension in order to run.

To find out about all of the features, you should look at the userscript and reverse-engineer the latest ones.

Code

Most of the code is available on GitHub