git.maemo.org Git - routino/blob - doc/ALGORITHM.txt

   1                              Routino : Algorithm
   2                              ===================
   3
   4
   5    This page describes the development of the algorithm that is used in
   6    Routino for finding routes.
   7
   8
   9 Simplest Algorithm
  10 ------------------
  11
  12    The algorithm to find a route is fundamentally simple: Start at the
  13    beginning, follow all possible routes and keep going until you reach
  14    the end.
  15
  16    While this method does work, it isn't fast. To be able to find a route
  17    quickly needs a different algorithm, one that can find the correct
  18    answer without wasting time on routes that lead nowhere.
  19
  20
  21 Improved Algorithm
  22 ------------------
  23
  24    The simplest way to do this is to follow all possible segments from the
  25    starting node to the next nearest node (an intermediate node in the
  26    complete journey). For each node that is reached store the shortest
  27    route from the starting node and the length of that route. The list of
  28    intermediate nodes needs to be maintained in order of shortest overall
  29    route on the assumption that there is a straight line route from here
  30    to the end node.
  31    At each point the intermediate node that has the shortest potential
  32    overall journey time is processed before any other node. From the first
  33    node in the list follow all possible segments and place the newly
  34    discovered nodes into the same list ordered in the same way. This will
  35    tend to constrain the list of nodes examined to be the ones that are
  36    between the start and end nodes. If at any point you reach a node that
  37    has already been reached by a longer route then you can discard that
  38    route since the newly discovered route is shorter. Conversely if the
  39    previously discovered route is shorter then discard the new route.
  40    At some point the end node will be reached and then any routes with
  41    potential lengths longer than this actual route can be immediately
  42    discarded. The few remaining potential routes must be continued until
  43    they are found to be shorter or have no possibility of being shorter.
  44    The shortest possible route is then found.
  45
  46    At all times when looking at a node only those segments that are
  47    possible by the chosen means of transport are followed. This allows the
  48    type of transport to be handled easily. When finding the quickest route
  49    the same rules apply except that criterion for sorting is the shortest
  50    potential route (assuming that from each node to the end is the fastest
  51    possible type of highway).
  52
  53    This method also works, but again it isn't very fast. The problem is
  54    that the complexity is proportional to the number of nodes or segments
  55    in all routes examined between the start and end nodes. Maintaining the
  56    list of intermediate nodes in order is the most complex part.
  57
  58
  59 Final Algorithm
  60 ---------------
  61
  62    The final algorithm that is implemented in the router is basically the
  63    one above but with an important difference. Instead of finding a long
  64    route among a data set of 8,000,000 nodes (number of highway nodes in
  65    UK at beginning of 2010) it finds one long route in a data set of
  66    1,000,000 nodes and a few hundred very short routes in the full data
  67    set. Since the time taken to find a route is proportional to the number
  68    of nodes the main route takes 1/10th of the time and the very short
  69    routes take almost no time at all.
  70
  71    The solution to making the algorithm fast is therefore to discard most
  72    of the nodes and only keep the interesting ones. In this case a node is
  73    deemed to be interesting if it is the junction of two segments with
  74    different properties. In the algorithm these are classed as
  75    super-nodes. Starting at each super-node a super-segment is generated
  76    that finishes on another super-node and contains the shortest path
  77    along segments with identical properties (and these properties are
  78    inherited by the super-segment). The point of choosing the shortest
  79    route is that since all segments considered have identical properties
  80    they will be treated identically when properties are taken into
  81    account. This decision making process can be repeated until the only
  82    the most important and interesting nodes remain.
  83
  84    To find a route between a start and finish point now comprises the
  85    following steps (assuming a shortest route is required):
  86
  87     1. Find all shortest routes from the start point along normal segments
  88        and stopping when super-nodes are reached.
  89     2. Find all shortest routes from the end point backwards along normal
  90        segments and stopping when super-nodes are reached.
  91     3. Find the shortest route along super-segments from the set of
  92        super-nodes in step 1 to the set of super-nodes in step 2 (taking
  93        into account the lengths found in steps 1 and 2 between the
  94        start/finish super-nodes and the ultimate start/finish point).
  95     4. For each super-segment in step 3 find the shortest route between
  96        the two end-point super-nodes.
  97
  98    This multi-step process is considerably quicker than using all nodes
  99    but gives a result that still contains the full list of nodes that are
 100    visited. There are some special cases though, for example very short
 101    routes that do not pass through any super-nodes, or routes that start
 102    or finish on a super-node. In these cases one or more of the steps
 103    listed can be removed or simplified.
 104
 105 Routing Preferences
 106
 107    One of the important features of Routino is the ability to select a
 108    route that is optimum for a set of criteria such as preferences for
 109    each type of highway, speed limits and other restrictions and highway
 110    properties.
 111
 112    All of these features are handled by assigning a score to each segment
 113    while calculating the route and trying to minimise the score rather
 114    than simply minimising the length.
 115
 116    Segment length
 117           When calculating the shortest route the length of the segment is
 118           the starting point for the score.
 119
 120    Speed preference
 121           When calculating the quickest route the time taken calculated
 122           from the length of the segment and the lower of the highway's
 123           own speed limit and the user's speed preference for the type of
 124           highway is the starting point for the score.
 125
 126    Oneway restriction
 127           If a highway has the oneway property in the opposite direction
 128           to the desired travel and the user's preference is to obey
 129           oneway restrictions then the segment is ignored.
 130
 131    Weight, height, width & length limits
 132           If a highway has one of these limits and its value is less than
 133           the user's specified requirement then the segment is ignored.
 134
 135    Highway preference
 136           The highway preference specified by the user is a percentage,
 137           these are scaled so that the most preferred highway type has a
 138           weighted preference of 1.0 (0% always has a weighted preference
 139           of 0.0). The calculated score for a segment is divided by this
 140           weighted preference.
 141
 142    Highway properties
 143           The other highway properties are specified by the user as a
 144           percentage and each highway either has that property or not. The
 145           user's property preference is scaled into the range 0.0 (for 0%)
 146           to 1.0 (for 100%) to give a weighted preference, a second
 147           "non-property" weighted preference is calcuated in the same way
 148           after subtracting the user's preference from 100%. If a segment
 149           has a particular property then the calculated score is divided
 150           by the weighted preference for that property, if the segment
 151           does not have this property then it is divided by the
 152           non-property weighted preference. To ensure that setting
 153           property preferences near 50% do not cause large variations in
 154           routes the highway's preference is found by taking the square
 155           root of the property preference.
 156
 157 Implementation
 158 --------------
 159
 160    The hardest part of implementing this router is the data organisation.
 161    The arrangement of the data to minimise the number of operations
 162    required to follow a route from one node to another is much harder than
 163    designing the algorithm itself.
 164
 165    The final implementation uses a separate table for nodes, segments and
 166    ways. Each table individually is implemented as a C-language data
 167    structure that is written to disk by a program which parses the
 168    OpenStreetMap XML data file. In the router these data structures are
 169    memory mapped so that the operating system handles the problems of
 170    loading the needed data blocks from disk.
 171
 172    Each node contains a latitude and longitude and they are sorted
 173    geographically so that converting a latitude and longitude coordinate
 174    to a node is fast as well as looking up the coordinate of a node. The
 175    node also contains the location in the array of segments for the first
 176    segment that uses that node.
 177    Each segment contains the location of the two nodes as well as the way
 178    that the segment came from. The location of the next segment that uses
 179    one of the two nodes is also stored; the next segment for the other
 180    node is the following one in the array. The length of the segment is
 181    also pre-computed and stored.
 182    Each way has a name, a highway type, a list of allowed types of
 183    traffic, a speed limit, any weight, height, width or length
 184    restrictions and the highway properties.
 185
 186    The super-nodes are mixed in with the nodes and the super-segments are
 187    mixed in with the segments. For the nodes they are the same as the
 188    normal nodes, so just a flag is needed to indicate that they are super.
 189    The super-segments are in addition to the normal segments so they
 190    increase the database size (by about 10%) and are also marked with a
 191    flag.
 192
 193
 194 Practicalities
 195 --------------
 196
 197    At the time of writing (April 2010) the OpenStreetMap data for Great
 198    Britain (taken from GeoFabrik) contains:
 199      * 14,675,098 nodes
 200           + 8,767,521 are highway nodes
 201           + 1,120,297 are super-nodes
 202      * 1,876,822 ways
 203           + 1,412,898 are highways
 204                o 9,316,328 highway segments
 205                o 1,641,009 are super-segments
 206      * 60,572 relations
 207
 208    The database files when generated are 41.5 MB for nodes, 121.6 MB for
 209    segments and 12.6 MB for ways and are stored uncompressed. By having at
 210    least 200 MB or RAM available the routing can be performed with no disk
 211    accesses (once the data has been read once).
 212
 213
 214 --------
 215
 216 Copyright 2008-2010 Andrew M. Bishop.