CartAGen: an Open Source Research Platform for Map Generalization

Abstract: Automatic map generalization is a complex task that is still a research problem and requires the development of research prototypes before being usable in map production processes. In the meantime, reproducible research principles are becoming a standard. Publishing reproducible research means that researchers share their code and their data so that other researchers can reproduce the published experiments, in order to check them, extend them, or compare them to their own experiments. Open source software is a key tool to share code and software, and CartAGen is the first open source research platform that tackles the overall map generalization problem: not only the building blocks that are generalization algorithms, but also methods to chain them, and the spatial analysis tools necessary for data enrichment. This paper presents the CartAGen platform, its architecture and its components. The main component of the platform is the implementation of several multi-agent based models from the literature such as AGENT, CartACom, GAEL, CollaGen, or DIOGEN. The paper also explains and discusses the different ways in which a researcher can use or contribute to CartAGen.

There is no unique way to generalize a map, and past attempts to automate this complex process led to very different prototypes which, most of the time, can be neither compared nor combined. This limitation hardly allows map generalization research to build upon past research: even if we know past proposals of models and algorithms, we cannot reuse the prototypes and their code. Open science can be a solution to this problem by promoting reproducibility of research and collaboration (Munafò et al., 2017). Reproducible research requires that the data and code used in a published study be available to other researchers, which might be challenging regarding the code in computational science (Peng, 2011), and regarding both the data (Ostermann & Granell, 2017) and the code (Singleton et al., 2016) in geographical information science.

In map generalization, sharing software components to foster reproducibility is not a new idea (Edwardes et al., 2007; Stoter et al., 2009). Past research mainly focused on the use of web services for map generalization, with the development of the WebGen platform (Neun & Burghardt, 2005; Edwardes et al., 2007), which was further developed and enriched by several research projects (Foerster et al., 2008; Foerster et al., 2010; Klammer, 2013). The web services provided by the WebGen platform are generalization algorithms and enrichment processes, i.e. the building blocks of an automated generalization process. Other studies then focused on the orchestration of these web services to build a fully automated process (Gould, 2012; Regnauld et al., 2014). A service-oriented approach based on moving code was also proposed for map generalization (Müller et al., 2012). However, very few studies promote open source platforms for map generalization beyond an isolated release of the code on a repository. We only found the work of Bergenheim et al. (2009)

Presentation of the CartAGen Platform
CartAGen is an open source Java platform developed at IGN France. CartAGen was initially a platform to capitalize on the research code on agent-based generalization (Renard et al., 2010). It was later extended and released as open source software. This section describes the platform architecture, and its content regarding agent-based generalization, algorithms and data enrichment.

Platform Architecture
The CartAGen platform is a GIS library focused on map generalization, but it was not built entirely from scratch. It is built upon the GeOxygene Java library (Bucher et al., 2012), which implements OGC/ISO specifications and contains advanced GIS capabilities for research, such as data matching, conflation or data quality assessment. It is also based on very common open source Java libraries, such as JTS for geometry computing, or GeoTools for handling GIS files and coordinate systems. The CartAGen architecture is based on a centralized data schema, following the principles of multiple representation database modeling (Balley et al., 2004; Mustière & van Smaalen, 2007). Generic feature types are created for the usual map features, e.g. buildings, roads, rivers (Figure 2), with attributes that usually describe such features (e.g. building height and nature in Figure 2), and attributes that can be useful in a map generalization process (e.g. the fact that the building can be eliminated at some point of the generalization process in Figure 2). With such a centralized schema, algorithms and generalization processes are coded to work on instances of the schema, whether they come from national mapping agency data or from OpenStreetMap, which makes them generic and reusable. The matching of a user dataset to this centralized data schema is done only once, when datasets are imported (i.e. use of the delegation design pattern).
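The idea of a centralized schema with a one-time mapping at import can be illustrated with a minimal sketch. It is written in Python for brevity, whereas CartAGen itself is a Java platform, and every name in it (GenericBuilding, from_nma, from_osm, the HAUTEUR and NATURE fields) is a hypothetical stand-in rather than an actual CartAGen class or source schema. The point is only that the processing function sees the generic type and never the source format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Ring = List[Tuple[float, float]]


@dataclass
class GenericBuilding:
    """Hypothetical generic feature type; not the actual CartAGen class."""
    ring: Ring                 # outer boundary as (x, y) vertices
    height: float              # descriptive attribute
    nature: str                # descriptive attribute
    eliminated: bool = False   # attribute used by generalization processes


def from_nma(record: Dict[str, Any]) -> GenericBuilding:
    """Map a record from an invented NMA schema (field names are assumptions)."""
    return GenericBuilding(record["geom"], float(record["HAUTEUR"]), record["NATURE"])


def from_osm(record: Dict[str, Any]) -> GenericBuilding:
    """Map an OSM-like record; assumes a purely numeric 'height' tag."""
    tags = record["tags"]
    return GenericBuilding(record["geom"], float(tags.get("height", 0.0)),
                           tags.get("building", "yes"))


def polygon_area(ring: Ring) -> float:
    """Shoelace formula; works for open or closed rings."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def flag_small_buildings(buildings: List[GenericBuilding], min_area: float) -> int:
    """Mark buildings smaller than min_area as eliminated; return how many."""
    count = 0
    for building in buildings:
        if polygon_area(building.ring) < min_area:
            building.eliminated = True
            count += 1
    return count
```

Once both sources are mapped to the generic type, flag_small_buildings can be applied to a mixed list of buildings without any knowledge of where each one came from.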

Agent-Based Generalization
The main feature of the CartAGen platform is its agent-based generalization model and its implementation for buildings and roads (Ruas & Duchêne, 2007). It is composed of an implementation of the AGENT model (Ruas, 1999; Barrault et al., 2001), a partial implementation of the CartACom model (Duchêne et al., 2012), as well as very partial implementations of the GAEL (Gaffuri, 2007) and DIOGEN (Maudet et al., 2017) models. These agent-based generalization models are partly integrated together (Duchêne et al., 2018) and can be used as standalone processes or jointly within two orchestration frameworks: CollaGen (Touya & Duchêne, 2011) and ScaleMaster2.0 (Touya & Girres, 2013). CartAGen also proposes an implementation of the least-squares generalization models from Harrie (1999) and Sester (2005). Results of automated generalization with CartAGen can be found on its website.
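To give a flavour of constraint-driven agent generalization, the following is a heavily simplified sketch of a trial-and-error loop in the spirit of the AGENT literature: an agent evaluates the satisfaction of its constraints, tries the actions proposed by the unsatisfied ones, and keeps a result only if the overall satisfaction improves. It is not the CartAGen implementation. The Constraint class, the run_lifecycle function, the reduction of a building to its area, and the thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    """Hypothetical constraint: a satisfaction score plus candidate actions."""
    name: str
    satisfaction: Callable[[float], float]    # state -> score in [0, 1]
    propose: Callable[[float], List[float]]   # state -> candidate new states


def overall_satisfaction(state: float, constraints: List[Constraint]) -> float:
    """Mean satisfaction over all constraints (one simple aggregation choice)."""
    return sum(c.satisfaction(state) for c in constraints) / len(constraints)


def run_lifecycle(state: float, constraints: List[Constraint], max_steps: int = 20) -> float:
    """Greedy loop: try actions of unsatisfied constraints, keep improvements."""
    for _ in range(max_steps):
        current = overall_satisfaction(state, constraints)
        if current >= 1.0:
            break  # every constraint is fully satisfied
        candidates = [cand for c in constraints if c.satisfaction(state) < 1.0
                      for cand in c.propose(state)]
        better = [cand for cand in candidates
                  if overall_satisfaction(cand, constraints) > current]
        if not better:
            break  # no action improves the state: stop with the best known one
        state = max(better, key=lambda s: overall_satisfaction(s, constraints))
    return state


# Toy example: the state is just a building area in map units (assumed values).
MIN_AREA, MAX_AREA = 100.0, 200.0
size = Constraint("minimum size",
                  lambda area: min(1.0, area / MIN_AREA),
                  lambda area: [area * 1.5])          # action: enlarge
cap = Constraint("maximum size",
                 lambda area: 1.0 if area <= MAX_AREA else 0.0,
                 lambda area: [])                      # no action proposed
```

Calling run_lifecycle(40.0, [size, cap]) enlarges the toy building step by step until the minimum size constraint is satisfied. The published models are richer than this greedy loop, so the sketch only conveys the general principle.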

Generalization Algorithms
Generalization algorithms are the building blocks of automatic generalization processes; see for instance Stanislawski et al. (2014) for a list of existing algorithms. Many of these algorithms are implemented in the CartAGen platform, with the possibility to select parameter values in dialog boxes (Table 1). Experiments were also carried out to parallelize the CartAGen algorithms with a third-party framework (Touya et al., 2017b), to show that they can be applied to very large datasets.
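As an example of what such a building block looks like, here is a minimal sketch of the classic Douglas-Peucker line simplification algorithm. It is chosen only because it is widely known; the code is an illustration in Python, not an excerpt of CartAGen, and the function names are hypothetical. The single tolerance parameter is typical of the values a user would set in a dialog box.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment [a, b] (segment variant of the test)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p on the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Keep only the vertices that deviate more than tolerance from the trend."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Split at the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

For instance, douglas_peucker([(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)], 0.5) drops the vertex (1, 0.05), which deviates by less than the tolerance, and keeps the sharp peak at (3, 3).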

Data Enrichment
To automate generalization processes, data enrichment methods are essential to characterize the initial data and identify key patterns, spatial relations, or complex features. This is why many data enrichment techniques are available in CartAGen: creating blocks and cities out of roads and buildings, creating road complex features (Touya, 2010) such as strokes, roundabouts (Figure 3), dual carriageways, or finding groups of aligned buildings. The centralized schema allows the storage of such enrichment as persistent features of the dataset.
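To make the notion of enrichment more concrete, the sketch below flags roundabout candidates among the faces of a road network using a shape compactness measure, 4πA/P², which equals 1 for a circle. This is one plausible criterion written in Python for illustration; the function names and both thresholds are assumptions, and the actual CartAGen enrichment may rely on different or additional criteria.

```python
import math


def ring_area_and_perimeter(ring):
    """Area (shoelace formula) and perimeter of a polygon given as (x, y) list."""
    twice_area, perimeter = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def compactness(ring):
    """4*pi*A / P^2: 1 for a circle, lower for elongated shapes."""
    area, perimeter = ring_area_and_perimeter(ring)
    return 4.0 * math.pi * area / perimeter ** 2 if perimeter > 0 else 0.0


def is_roundabout_candidate(face, min_compactness=0.95, max_area=5000.0):
    """Small, nearly circular network faces are candidates (assumed thresholds)."""
    area, _ = ring_area_and_perimeter(face)
    return area <= max_area and compactness(face) >= min_compactness
```

A face approximating a circle of radius 10 passes the test, while an elongated 40 by 5 block does not, even though both are below the area threshold.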

CartAGen as an Open Generalization Platform
CartAGen is meant to become an open generalization system, and this section discusses the current capabilities of CartAGen against the requirements (internal, external, and common) of an open generalization system defined by Edwardes et al. (2007). There is an OGC SLD/SE implementation of styling rules, and constraints can be customized with XML files. To summarize this section, only three of the eleven requirements from Edwardes et al. (2007) are not yet met by CartAGen: reusing existing web generalization services, real-time generalization, and environment-independent research. We plan to meet these three requirements with future improvements of the platform.

CartAGen for Open Science
Reproducible research requires that the data and code used in a published study be available to other researchers. Other researchers should be able to reproduce the experiments to contradict them, extend them, or compare them to other approaches. CartAGen only addresses the availability of code and methods, and only this part of the topic is discussed here. Ostermann & Granell (2017) state that we should distinguish reproducibility and replicability. Reproducibility is the possibility to reproduce exactly the same experiments. Replicability is the possibility to reproduce a similar experiment but at a different scale. To enable replicability, pseudo-code and formulas, or the source code, are required. So publishing papers with experiments carried out with CartAGen enables replicability, at least regarding the code/method side of the problem. To enable reproducibility, we need executable tools or precise step-by-step information, which goes beyond access to the source code. Both are possible with CartAGen, but giving a URL to the CartAGen source code is not enough to make an experiment reproducible: the code should be given with a step-by-step tutorial to exactly reproduce the experiment.
Ostermann & Granell (2017) make another interesting point on the age of the research to reproduce: the older the research is, the harder it is to reproduce or replicate, because the code constantly changes and old versions might become unavailable. This is the case with CartAGen, and a good practice for reproducibility would be to tag the versions of the code used for a given experiment with the DOIs of the papers that used that code.

How to Use CartAGen
Publishing code in open source is not just a matter of compliance with open science standards; it is also a simple way to foster collaborations among researchers in map generalization. This section briefly explains how people interested in map generalization can use CartAGen, and what it is not usable for.

Use CartAGen as a Contributor
The preferred way to use CartAGen is to use it as a developer, and thus as a contributor. The CartAGen open source project is managed by a project steering committee (PSC) that defines how the project should be extended and grants commit rights to potential contributors. There are two ways to be a contributor to CartAGen. The first way is the one favoured for new users: a new user simply forks the CartAGen project to develop their own extensions. The user can benefit from the updates in the core project and ask for new features and bug fixes by raising "issues" on the GitHub website of the platform. Then, when the user feels that their own extensions are worth integrating into the core project (e.g. after coding a new algorithm or providing missing documentation), they can send a "pull request" that will be studied and accepted (or not) by the PSC. The second way to contribute to the project is to be even more committed and ask the PSC for extended rights on the project administration. In this case, applications to become a member of the PSC are welcome.

Simple Access to CartAGen
Even if CartAGen is mainly intended for developers, given the maturity of the code, a simpler access to its functionalities is also possible. CartAGen can be used with a graphical user interface (GUI) that is quite similar to a GIS interface (Figure 4). Geographic data can be imported with the GUI and generalization can be performed using the default menu. This GUI was mainly designed for developer users, but tutorials show how to trigger some algorithms or an agent-based generalization on user data. Another possibility for a simple access to CartAGen is to use a QGIS plugin that runs CartAGen code from QGIS. Few plugins exist so far, but a tutorial is provided to create a QGIS plugin in Python for a given CartAGen algorithm.

Using CartAGen in National Mapping Agencies
Research in map generalization has long been driven by the needs of National Mapping Agencies (NMAs) and, for a few years now, several automatic processes have been used in production with commercial software solutions (Duchêne et al., 2014). Providing automatic solutions for NMAs is not the purpose of the CartAGen platform, which is dedicated to research. But CartAGen is already fully integrated with PostGIS, and loosely with QGIS, both of which are open source software used in some NMAs to produce maps. Moreover, many of the CartAGen algorithms and processes are not available in the existing commercial software, and could improve the existing automatic productions presented by Duchêne et al. (2014). This is why we believe that automatic processes for NMAs could be built on CartAGen in the future. To demonstrate the feasibility, we need to develop prototypes that use a subset of the generalization capabilities in a fully automated process for very large datasets.

Research Agenda Supported by CartAGen
As CartAGen is a research platform, its evolution is mainly driven by the current and future research projects that will use the platform to build prototypes and experiments. This section describes some on-going research projects.

Machine Learning
As map generalization is a complex task that requires some specific cartographic knowledge that is not easy to reproduce in a computer program, machine learning techniques were tried in the early days of automated map generalization research (Weibel et al., 1995). Since then, machine learning has regularly been used for data enrichment or for the orchestration of the generalization process (see Karsznia & Weibel (2018) for a good review). The main recent advances in the domain of machine learning are due to the success of deep neural networks, and such methods are also being tested on map generalization problems, with a focus on building generalization for now (Ma, 2017; Sester et al., 2018). CartAGen is not meant to be a machine (or deep) learning library like open source libraries such as PyTorch, TensorFlow or Keras, which can handle map images as well as other images. But CartAGen can be used to feed such libraries with training examples obtained with the map generalization processes available in the platform. For instance, Figure 5 shows two images, one before and one after an agent-based building generalization, which were generated from CartAGen to train a U-Net (Ronneberger et al., 2015) kind of deep neural network.
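How such training examples might be assembled can be sketched as follows. The code rasterizes a building polygon before and after generalization into two binary masks of the same size, which form one (input, target) pair for an image-to-image network. It is a pure-Python illustration under strong assumptions: the function names are hypothetical, the grid size is arbitrary, and the way CartAGen actually exports images is not described here.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test for a polygon given as a list of (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, width, height, cell=1.0):
    """Binary mask: 1 where the cell centre falls inside the polygon."""
    return [[1 if point_in_polygon((col + 0.5) * cell, (row + 0.5) * cell, ring) else 0
             for col in range(width)]
            for row in range(height)]


def training_pair(before, after, width, height):
    """One (input, target) example: masks before and after generalization."""
    return rasterize(before, width, height), rasterize(after, width, height)
```

With an L-shaped building as input and its squared-off generalized version as target, training_pair returns two masks that differ only in the cells of the filled notch.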

OpenStreetMap Generalization
While most generalization processes were designed to make topographic maps with data from national mapping agencies, OpenStreetMap (OSM) is now becoming the major data source to make topographic maps at multiple scales all over the world. However, most existing generalization processes are not really adapted to OSM characteristics, mainly its heterogeneity in level of detail, accuracy or completeness (Touya et al., 2017a). For instance, Figure 6 shows buildings lying inside a river bed because of an inconsistent capture of the features. Here, consistency is restored prior to generalization thanks to a least squares adjustment, similar to the models proposed for generalization (Harrie, 1999; Sester, 2005). More generally, the existing algorithms still apply, even if the data diversity requires more specific algorithms (Touya et al., 2017a). But the orchestration processes need to be even more adaptive to take the heterogeneity into account.

Figure 6. OSM buildings displaced by least squares to restore some consistency with the river (Touya et al., 2017a).

Combining Generalization and Stylization
Styling, i.e. the set of symbols used to render the data in a map, is an obvious input of map generalization, as the symbol sizes in the map are required for the generalization system to calibrate the transformations. So, a first approximation is to define the style of a map prior to generalization. But when advanced styling is used, such as the watercolor style of Figure 7 (Christophe et al., 2016), or textures to depict relief, there are often generalization requirements, and generalization becomes an input of the styling step. In Figure 7 for instance, the roads and the forest boundaries were highly smoothed to make the watercolor effect legible.

Figure 7. A watercolor styled map created with advanced computer graphics techniques integrated in GeOxygene, and thus available from CartAGen (Christophe et al., 2016).

This interdependency calls for an integrated management of the generalization and styling processes in mapmaking. CartAGen enables both generalization and styling, and will be used to implement future research on this integrated mapmaking, with a specific focus on tactile maps for visually impaired people, which require both generalization and styling (Touya et al., 2018).

Generalization and 3D Visualizations
3D visualization of geographic data is not just a nice way to present the information; it can also be very useful for professional practitioners, for instance for flood management (Figure 8). Even in 3D, abstraction and generalization are required to enhance visualization and user experience. Though 3D generalization (Meng & Forberg, 2007) is a real issue, as can be seen in Figure 8 where some 3D buildings should be simplified, it is not the only issue. The flood management use case requires the visualization of 2D vector layers on top of (or below) the 3D data, and the oblique view requires some generalization of such data, particularly when 2D features are far from the camera. We plan to use CartAGen to generate multiple representation databases that could be used to improve such 3D oblique generalization.

Conclusion and Future Work
To conclude, this paper presented CartAGen, an open generalization platform for research developed by researchers from IGN France. We showed that it meets most of the requirements of an open generalization platform, and we hope that it will be used and collaboratively improved by other researchers in the field. We presented on-going projects where we use CartAGen, but we hope that this paper encourages researchers to develop their own projects with CartAGen, or to participate in these projects with us.

Apart from the on-going research topics presented in the previous sections, many improvements remain to be made to transform CartAGen into an open generalization platform used by many researchers. The first one is the documentation of this sprawling platform. If we look at very successful open source platforms (QGIS) or libraries (PyTorch for deep learning, for example), their success is mainly based on the quality of the documentation, with many tutorials to help developers get started with the platform or library. If we want students and researchers in map generalization to use this open platform, the documentation should be our main focus. We also plan to provide completely automated workflows based on CollaGen and/or ScaleMaster2.0 to demonstrate how building blocks can work together, because this is where CartAGen goes well beyond the commercial software. Finally, the main missing requirement for an open platform is interoperability with web service techniques, such as WebGen, and we would like to develop this capability for CartAGen.
that proposes generalization algorithms in the GRASS open source platform. This is why, despite all the past initiatives to develop an open generalization system, the assessment is the same in 2018 as in 2007: it is still extremely complex to share generalization algorithms or processes, and we are still far from the open science standards. This paper presents the CartAGen platform, which is yet another attempt towards an open generalization platform. Compared to the previous attempts, CartAGen differs in that it is an open source library, open to all researchers in the field who want to contribute. The remainder of the paper is structured as follows. Section 2 briefly presents the CartAGen platform and its capabilities. Then, Section 3 discusses CartAGen as an open generalization platform against the requirements from Edwardes et al. (2007). Section 4 briefly discusses why a platform such as CartAGen can favor reproducibility and open science. Section 5 describes how CartAGen can be used by researchers interested in map generalization. In Section 6, some current research topics that use CartAGen are highlighted, before concluding the paper in Section 7.

Figure 1. Joint use of AGENT and CartACom generalization with CollaGen at three intermediate scales between 1:25k and 1:100k.

Figure 2. UML class diagram of the generic feature types for building and road section features in CartAGen.

Figure 3. Several road network enrichment algorithms are available in CartAGen (roundabouts in red and branching crossroads in dark green).

Figure 4. The GUI of CartAGen, quite similar to any GIS.

Figure 5. Two images generated from CartAGen to train a deep learning model for building generalization. The building is generalized by an agent-based process with constraints on size, granularity, squareness, and shape preservation.

On the other hand, we still do not know how to integrate a trained deep model such as the one from Sester et al. (2018) with an orchestration framework (e.g. CollaGen or ScaleMaster2.0) that can trigger agent-based processes. CartAGen could be used to experiment with such a coupling.

Figure 8. 3D oblique visualization of the potential impact of a flood on an energy network with iTowns (http://www.itowns-project.org/). The impacts are represented with 2D geometries put on top of 3D data.