Routing on Geospatial Reference Information for Transport Networks of Spain

The Spanish National Geographic Institute (IGN) published the first release of the Geographic Reference Information on Transport Networks (GRI-TN) in March 2017. Its main goal was to fulfil the requirements of the INSPIRE Directive and to become the main data source for other IGN products on this theme. In the years following that first release, the focus has been on updating and improving the data. This work has encouraged new use cases that differ from those initially planned; they have made it possible to detect problems in the data and have highlighted the need to evolve the data model and the way the data are provided to users (not only the formats, but also the update frequency). One of those use cases is finding the shortest paths between points in the area covered by the Road Transport Network. This article presents the methodology used to do so, together with the setbacks encountered during the process and the current limitations of the GRI-TN datasets that stand in the way of more accurate results.


Introduction
In July 2020, staff from the Ministry of Ecological Transition and Demographic Challenge (hereinafter MITECO) contacted the IGN to explore the possibility of calculating the travel time and the distance in kilometres from the capital of each Spanish municipality to the nearest hospital, the nearest municipality with more than 20,000 inhabitants and the nearest municipality with more than 50,000 inhabitants. The request arose from a study that sought to assess the risk of depopulation in Spanish municipalities and that took the distance to various socio-cultural and health facilities and services as one of the factors that could, a priori, increase that risk.

Approach
To identify the most appropriate data set and tools among those available at the IGN, the main particularities of the request were analysed in detail:
- The route calculation had to be carried out on a massive scale (8,131 municipalities, according to data from the National Statistics Institute (INE) for December 2019).
- For each starting point (the centroid of each municipal capital), it was necessary to identify the point of interest to be reached for each of the three types of route requested.
- The cost of each route had to be obtained not only in kilometres but also in minutes. This required estimating the time needed to traverse each of the sections that make up the road network, since speed data are not available for each section.
After evaluating the available tools and the state of the data, it was decided to use the road network sections stored in the GRI-TN database, hosted on a PostgreSQL server. The attributes of these sections, such as the type of section (trunk, interchange, service road, etc.), the class of road (motorway, highway, conventional road, etc.) and the type of carriageway (dual or single), were used to estimate the average speed on each section, and the functions of the pgRouting extension were used to calculate the routes and their associated costs.
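As a rough illustration of this attribute-based estimate, the following Python sketch converts a section's length and road class into a travel time in minutes. The class names, the speeds and the non-trunk reduction factor are hypothetical placeholders chosen only for the example; they are not the values adopted in the study.

```python
# Hypothetical average speeds in km/h. These are illustrative assumptions,
# NOT the speeds used in the study.
ASSUMED_SPEED_KMH = {
    "motorway": 110.0,
    "conventional": 80.0,
    "urban": 30.0,
    "path": 15.0,
}

# Assumed slowdown applied to non-trunk sections (links, roundabouts,
# service roads). Also a placeholder value.
NON_TRUNK_FACTOR = 0.5


def travel_minutes(length_m: float, road_class: str, is_trunk: bool = True) -> float:
    """Return the estimated time in minutes needed to traverse one section."""
    speed_kmh = ASSUMED_SPEED_KMH[road_class]
    if not is_trunk:
        speed_kmh *= NON_TRUNK_FACTOR
    # length in km divided by speed in km/h gives hours; convert to minutes
    return (length_m / 1000.0) / speed_kmh * 60.0


if __name__ == "__main__":
    print(travel_minutes(8000.0, "conventional"))
    print(travel_minutes(1100.0, "motorway", is_trunk=False))
```

In a database setting the same arithmetic would normally be stored as a cost column on the segment table, so that the routing functions can read it directly.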

Input Data
The process described below was based on the following data:
- Hospitals: National Catalogue of Hospitals, provided by MITECO. It is an alphanumeric catalogue to which a geometry was added so that spatial analysis could be performed.

Process
The figure below, which is described in detail in the following sections, shows schematically the stages followed to obtain the requested routes. The linear elements (segments) of the GRI-TN road network have an implicit topology: they are connected, but these connections are not described or materialized in any way in the database.
To generate routes from the data in the road network segment table, this implicit topology must be converted into an explicit one, i.e. nodes must be created at all points where two or more segments connect. This is done with the pgRouting function pgr_createTopology, which not only generates these nodes but also identifies the start and end node of each segment.
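The idea behind this step can be sketched in plain Python, independently of pgRouting. The function below is a simplified analogue, not the pgRouting implementation: it assigns one node identifier to every distinct segment endpoint and records the source and target node of each segment. Endpoints are matched by rounding to a grid of size `tolerance`, an assumption made for brevity (two points closer than the tolerance can still fall into different grid cells).

```python
def create_topology(segments, tolerance=0.001):
    """Build an explicit topology from segment geometries.

    segments: dict mapping a segment id to its list of (x, y) vertices.
    Returns (node_ids, edges): node_ids maps a snapped endpoint key to a
    node id, and edges maps each segment id to (source_node, target_node).
    """
    node_ids = {}
    edges = {}

    def node_for(point):
        # Snap the endpoint to a grid cell so coincident ends share a node.
        key = (round(point[0] / tolerance), round(point[1] / tolerance))
        if key not in node_ids:
            node_ids[key] = len(node_ids) + 1
        return node_ids[key]

    for seg_id, coords in segments.items():
        edges[seg_id] = (node_for(coords[0]), node_for(coords[-1]))
    return node_ids, edges


if __name__ == "__main__":
    # Three segments meeting at the point (1, 0)
    demo = {
        1: [(0.0, 0.0), (1.0, 0.0)],
        2: [(1.0, 0.0), (1.0, 1.0)],
        3: [(1.0, 0.0), (2.0, 0.0)],
    }
    nodes, edges = create_topology(demo)
    print(len(nodes), edges)
```

In this example the three segments share a single node at their common endpoint, which is exactly the connection that the implicit topology left undescribed.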

Figure 2: Production of explicit topology
Once the information that describes how the network is connected had been obtained, the next step was to calculate the time required to travel each segment, based on its length and the estimated average speed on each type of road. This estimate was obtained in two phases:
- In the first phase, a preliminary calculation was made, supported by several studies that propose methodologies and average speeds for the different types of road. It was based on the speeds proposed for each type of road (distinguishing between motorway/highway, conventional road, crossing, urban road, and road or path), which were then adjusted according to whether or not the section in question is a trunk section (as opposed to a link, roundabout or service road).
- In the second phase, the speeds obtained were evaluated by calculating the cost in kilometres and minutes of various routes from the GRI-TN data and comparing them with the results obtained for the same routes in commercial navigation programs. Based on this comparison, the preliminary speeds were adjusted, and the estimates shown in the table below were finally used (see notes 1 and 2).

Table 1: Travel speed estimation by type of road
1 Non-trunk segment: junctions, entry or exit roads, roundabouts, and service roads.
2 The distinction between named roads and unnamed roads is due to the fact that the latter usually correspond to roads with no known title. These are usually forest roads or tracks that have been paved to enable the passage of vehicles, but they receive hardly any maintenance and follow a very winding route, both of which greatly limit the speed.
In addition, a subsequent estimate of the speed on roads was made based on their slope and the sinuosity of their route, although these data were not finally used to calculate the requested routes.

Definition of influence areas
Once the spatial data necessary for the calculation of routes had been obtained (network sections, the cost of each one, and connection nodes), the start and end points of each route were identified. As indicated above, the origin point of every route was the point describing the capital of each municipality in the Geographical Nomenclature for Municipalities and Population Entities published by the IGN.
Since these points are generally not located on GRI-TN segments, and therefore do not coincide with any of the nodes created in the process described above, each had to be assigned a node. At first, each point was assigned to its closest node. It was then found that in many cases the closest node had poor connectivity (it belonged to roads with a very high travel cost or with no connection to other roads in the network), which led to clearly excessive route costs. To avoid this type of error, a second version of the table of linear geometries and nodes was made, from which the segments with poor connectivity and high travel times were removed (mainly tracks and paths, to force the algorithm to route only along roads and urban roads). From this new table of segments, the nodes with optimal connectivity were identified within 100-200 metres of the point given by the nomenclature as the centre of the municipal capital. The Floyd-Warshall algorithm (implemented in the pgRouting function pgr_floydWarshall), which calculates the minimum cost between every pair of nodes in a network, was used for this purpose. The same process was carried out, for the same reasons, with the points representing the hospitals.
Any shortest path algorithm requires the start and end nodes to be identified. Consequently, for each starting point it was necessary to find the closest points of interest, that is, according to the objectives of the study, the closest hospital and the closest municipalities with more than 20,000 and 50,000 inhabitants. The problem was that this was precisely part of the information being sought.
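The node assignment described in this section can be illustrated with a small, self-contained Python sketch. It runs a textbook Floyd-Warshall on a toy undirected network and then, among the nodes within a given radius of a point, picks the one that reaches the most nodes, breaking ties by the lowest total cost. This scoring rule is only one possible reading of "optimal connectivity" and is an assumption of the example; the function names and the toy data are hypothetical.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs minimum cost on an undirected graph.

    nodes: list of node ids; edges: iterable of (u, v, cost) tuples.
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v, cost in edges:
        if cost < dist[u][v]:
            dist[u][v] = cost
            dist[v][u] = cost
    for k in nodes:
        for i in nodes:
            for j in nodes:
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


def best_connected_node(point, coords, dist, radius_m=200.0):
    """Among nodes within radius_m of point, return the best connected one.

    Assumed criterion: most reachable nodes first, then lowest total cost.
    Returns None when no node lies within the radius.
    """
    def score(node):
        reachable = [c for c in dist[node].values() if c < math.inf]
        return (-len(reachable), sum(reachable))

    candidates = [
        n for n, (x, y) in coords.items()
        if math.hypot(x - point[0], y - point[1]) <= radius_m
    ]
    return min(candidates, key=score) if candidates else None


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = [("B", "C", 1.0), ("C", "D", 2.0)]  # node A is isolated
    coords = {"A": (10.0, 0.0), "B": (150.0, 0.0),
              "C": (1000.0, 0.0), "D": (2000.0, 0.0)}
    dist = floyd_warshall(nodes, edges)
    print(best_connected_node((0.0, 0.0), coords, dist))
```

In the toy data the node nearest to the point is isolated, so the sketch selects the slightly more distant node that is actually connected to the rest of the network, which mirrors the problem described for the closest-node assignment.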
To find the closest point of interest for each origin, territorial models were generated that show the cost in minutes of travelling from each point of the territory to the closest point of interest. These models were generated by interpolating the nodes located at most 55 minutes from any hospital or 100 minutes from any municipality with more than 20,000 or 50,000 inhabitants. To obtain these nodes, the pgRouting function pgr_drivingDistance was used, which calculates the cost from a given node to all nodes that can be reached within a given maximum cost.
The result was three raster files, one for each set of items of interest. The following figures show part of these cost models.
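The cost calculation behind these models can be sketched in plain Python as a Dijkstra search that starts from all points of interest at once and stops expanding beyond a maximum cost. This is an illustrative analogue of a driving-distance query, not the pgRouting code; the function name, the adjacency structure and the toy network are assumptions of the example.

```python
import heapq


def cost_to_nearest(adjacency, sources, max_cost):
    """Cost from every reachable node to its nearest source, up to max_cost.

    adjacency: dict mapping node -> list of (neighbour, cost) pairs.
    sources: iterable of source nodes (e.g. nodes assigned to hospitals).
    Nodes farther than max_cost from every source are left out.
    """
    best = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in best]
    heapq.heapify(heap)
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if new_cost <= max_cost and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return best


if __name__ == "__main__":
    # Toy network, costs in minutes: h1 - a - b - c - h2
    adjacency = {
        "h1": [("a", 10.0)],
        "a": [("h1", 10.0), ("b", 30.0)],
        "b": [("a", 30.0), ("c", 30.0)],
        "c": [("b", 30.0), ("h2", 5.0)],
        "h2": [("c", 5.0)],
    }
    print(cost_to_nearest(adjacency, ["h1", "h2"], max_cost=55.0))
```

The resulting node costs are the kind of point values that could then be interpolated into a continuous cost surface.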

Generation of Routes
Once the necessary data were available, the Dijkstra algorithm (pgr_dijkstra in pgRouting) was run to obtain the final routes. At this stage, the main problem was database performance: the segment and node tables had 7 and 5 million records respectively, and calculating each route was too slow to make the study feasible. This was solved by working on views of the segment and node tables, generated dynamically for each route from a buffer whose size depended on the distance between the start and end points of the route. With this approach, the calculation time was under 5 seconds for more than 90% of the routes, rising to 30-35 seconds for the routes involving the largest number of segments.
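The effect of restricting each calculation to a buffer can be illustrated with the following Python sketch. It keeps only the edges whose two end nodes fall inside a bounding box around the start and end points, enlarged by a margin proportional to the straight-line distance between them, and then runs Dijkstra on that subset. The box shape, the `factor` parameter and the toy data are assumptions of the example; the study's actual buffer definition is not reproduced here.

```python
import heapq
import math


def edges_in_buffer(edges, coords, start, end, factor=0.5):
    """Keep edges whose end nodes lie in an enlarged box around start/end.

    The margin is factor times the straight-line start-end distance
    (assumed rule, chosen only for illustration).
    """
    (x1, y1), (x2, y2) = coords[start], coords[end]
    margin = factor * math.hypot(x2 - x1, y2 - y1)
    xmin, xmax = min(x1, x2) - margin, max(x1, x2) + margin
    ymin, ymax = min(y1, y2) - margin, max(y1, y2) + margin

    def inside(node):
        x, y = coords[node]
        return xmin <= x <= xmax and ymin <= y <= ymax

    return [(u, v, c) for u, v, c in edges if inside(u) and inside(v)]


def dijkstra(edges, start, end):
    """Minimum cost from start to end on an undirected graph, or None."""
    adjacency = {}
    for u, v, cost in edges:
        adjacency.setdefault(u, []).append((v, cost))
        adjacency.setdefault(v, []).append((u, cost))
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == end:
            return cost
        if cost > best[node]:
            continue  # stale queue entry
        for neighbour, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return None


if __name__ == "__main__":
    coords = {"s": (0.0, 0.0), "a": (50.0, 10.0),
              "e": (100.0, 0.0), "far": (50.0, 500.0)}
    edges = [("s", "a", 4.0), ("a", "e", 4.0),
             ("s", "far", 10.0), ("far", "e", 10.0)]
    subset = edges_in_buffer(edges, coords, "s", "e")
    print(len(subset), dijkstra(subset, "s", "e"))
```

A buffer that is too tight could exclude the truly cheapest route, so tying its size to the distance between the two points is a trade-off between speed and completeness.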

Review of Routes obtained
After the routes were calculated, those with the highest cost were reviewed, as well as the municipalities where it was not possible to calculate any of the routes.
Regarding the higher-cost routes, errors were sometimes detected in the network or in the definition of the start or end point of the route, and these were corrected. Likewise, since the table that excluded the tracks and paths of the network was the one finally used, the results were compared with those that would have been obtained using them, in order to identify the cases where this choice could have produced a significantly more expensive route (more than 5 minutes slower). These cases were analysed and the route considered most reliable was chosen (usually after comparing it with the results of commercial route calculation programs).
The municipalities for which no route could be calculated fell into two groups:
a) Municipalities on islands that have no town of more than 20,000 or 50,000 inhabitants. In these cases the route cannot be calculated from the road network alone; it would be possible to calculate the route to the nearest port or heliport and, from there, estimate the time to heliports or ports on other islands that do have towns of more than 20,000 or 50,000 inhabitants. As an example, the figure below shows that the island of Fuerteventura has no municipality with more than 50,000 inhabitants.
b) The singular case of the municipality of Llivia (Gerona), which is surrounded by French territory. Since the IGN only has road network data for Spanish territory, the cost of its routes could not be obtained from the GRI-TN database.

Results
The research produced two main sets of data:
- On the one hand, the cost maps described in section 2.2.2, from which other derivative products have been obtained, such as isochrone maps (lines joining the points of the territory that lie at the same travel time from the nearest point of interest).

Conclusions
This research presents a particular use case of the GRI-TN road network data. Its dissemination aims to show one of the possible uses of the information collected in GRI-TN, as well as the issues detected and how they were solved. However, the limitations of the source data, which may affect the results obtained, must be taken into account:
- The GRI-TN segments carry no information on traffic direction or turn restrictions. As a result, the costs obtained (both in kilometres and in time) are, in general, slightly optimistic.
- The update of the GRI-TN road data is currently being completed, so new data may soon be available that would change some of the times obtained. Although in practically all cases the effect would not be significant, it should be noted that, while this study was being carried out, the road network in the regions of Catalonia and Andalusia was being updated, including the incorporation of several new high-speed roads into the GRI-TN.
In general terms, the results have been positive, insofar as the study has allowed the road network data collected in the GRI-TN to be tested and, above all, opens the door to offering new services to citizens: not so much those related to calculating the optimal route (for which numerous programs already exist, both free and commercial), but mainly those concerning the calculation of isochrones, which can be useful for many users and institutions.