Mapping roads through deep learning and weakly supervised training

7/23/2019

Creating accurate maps today is a painstaking, time-consuming manual process, even with access to satellite imagery and mapping software. Many regions, particularly in the developing world, remain largely unmapped. To help close this gap, Facebook AI researchers and engineers have developed a new method that uses deep learning and weakly supervised training to predict road networks from commercially available high-resolution satellite imagery. The resulting model sets a new state of the art for accuracy, and because it accommodates regional differences in road networks, it can effectively predict roads around the globe.

We are now sharing the details of our model and making data available to the global mapping community through Map With AI, a new set of specialized map-editing services and tools. Map With AI includes an editor interface, RapiD, which allows mapping experts to easily review, verify, and adjust the map as needed.

We used this system to map all the previously unmapped roads in Thailand — more than 300,000 miles’ worth — in OpenStreetMap (OSM), a community-based effort to create freely available, editable maps of the world. We were able to complete this project in 18 months — less than half the time it would have taken a team of 100 mapping experts to do it manually.

Accurate mapping data helps us better serve people everywhere with products such as Facebook Marketplace and Facebook Local. Map With AI also aligns with our core goals: to connect people and ensure everyone is represented on the map. As with Facebook AI’s population density maps project, these maps will also be publicly available as a resource for disaster response, urban planning, development projects, and many other use cases. When floods hit Kerala, India, in 2018, for example, Map With AI expedited mapping of the region by the Humanitarian OpenStreetMap Team (HOT) to assist in relief efforts. We hope that RapiD will accelerate OSM and HOT volunteers’ work to create freely available maps of regions around the world.

To use RapiD, a user selects a predicted road to bring it onto the map. From there, the road can be edited further as needed before it is submitted to OSM. White lines represent existing OSM roads. Magenta lines represent RapiD’s predictions. Maxar satellite imagery serves as the background in this image and the ones that follow.

Leveraging new techniques for more efficient, accurate mapping

We've pushed our mapping research forward on several fronts. At CVPR 2018, we helped organize the DeepGlobe Satellite Challenge, advancing the state of the art in satellite image analysis by providing datasets and a competition platform to host and evaluate computer vision and machine learning solutions. We are also developing new learning techniques and architectures suited to the problem space of remote sensing; investigating weakly supervised learning techniques to apply our road mapping work at a global scale; and working with our mapping team to test these approaches at scale and build the right tooling.

Road Segmentation

In extracting roads from satellite imagery, we’ve leveraged recent advances in using fully convolutional neural networks for semantic segmentation in conjunction with large-scale weakly supervised learning. Road detection is a straightforward application of semantic segmentation where the road is the foreground and the rest of the image is the background. As shown in the graphic below, the output of this process is a rasterized map showing how confidently the model can predict whether each pixel of the input satellite imagery is a road. For our road segmentation, we’ve used a modified version of the D-LinkNet architecture that won the DeepGlobe Satellite Road Extraction Challenge. Vectorization and postprocessing techniques can then take these outputs and convert them into road vectors compatible with geospatial databases such as OSM.
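As a minimal sketch of the first postprocessing step, the per-pixel confidence map can be turned into a binary road mask by thresholding. The function name `threshold_mask` and the 0.5 cutoff are assumptions made for illustration; the article does not say which threshold or vectorization method is used.

```python
def threshold_mask(prob_map, threshold=0.5):
    """Turn per-pixel road probabilities into a binary mask (1 = road).

    The 0.5 default is an illustrative assumption, not a value from the article.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# A tiny 3x3 "confidence map" with a diagonal road running through it.
probs = [
    [0.05, 0.10, 0.92],
    [0.08, 0.81, 0.40],
    [0.77, 0.30, 0.02],
]
mask = threshold_mask(probs)
```

A real pipeline would follow this with vectorization (for example, thinning the mask to centerlines and tracing them into polylines) before the result could be loaded into a geospatial database.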

Left: results of the segmentation model per-pixel predictions; bright magenta means higher probability of the pixel belonging to a road. Right: Conflation of the vectorized roads data with the existing OSM roads (in white). (Satellite images provided by Maxar.)

Global scale with weakly supervised training

As part of our Thailand road-mapping project, we had human experts review and correct the road networks that the AI system identified. We then used these manually corrected maps as training data for the model. The Thailand project mapped the country’s entire road network, so we could be confident of the accuracy and completeness of the data. We found that training on this dataset produced highly accurate validation results for Thailand, but accuracy dropped sharply for other regions. Because the project aims to be able to map roads across the globe, we investigated ways to use additional OSM data from other regions to train a new model.

Maps of many other countries still contain substantial gaps; therefore, we explored new ways to get high-quality, geographically diverse training data. Drawing inspiration from our previous work on weakly supervised image classification and training building detection models on OSM data, we experimented with translating these weakly supervised training ideas from classification to semantic segmentation. This experiment required identifying regions with adequate, accurate data coverage and then converting the OSM database’s road vectors into rasterized semantic segmentation labels. For both of these challenges, we took a straightforward approach that at first generated noisy, imperfect training data.

We collected our training data as a set of 2,048-by-2,048-pixel tiles, with a resolution of approximately 24 inches per pixel. We discarded tiles where fewer than 25 roads had been mapped, because we found that they often included only major roads (with no examples of smaller roads that would be more challenging to label correctly). For each remaining tile, we rasterized the road vectors and used the resulting mask as our training label. To work at the same resolution as the DeepGlobe dataset, we randomly cropped each image to 1,024 by 1,024 pixels, thereby producing roughly 1.8 million tiles covering more than 700,000 square miles of terrain. That is more than 1,000 times the roughly 630 square miles that the DeepGlobe dataset covers. To create segmentation masks from the road vectors, we simply rasterized each vector to a width of five pixels. Semantic segmentation labels tend to be pixel-perfect, but the labels we create with this heuristic are not. Roads vary in width and contour in ways that these rasterized vectors cannot capture perfectly. Furthermore, roads in different regions around the globe are mapped from different satellite imagery sources and thus do not always align completely with the imagery we use for our training data.
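The labeling heuristic described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the production pipeline: roads are assumed to arrive as polylines in pixel coordinates, "five pixels" is read as the stroke width, and the helper names (`keep_tile`, `rasterize_roads`, `random_crop`) are hypothetical.

```python
import random

MIN_ROADS_PER_TILE = 25  # from the text: sparser tiles are discarded
ROAD_WIDTH_PX = 5        # from the text, read here as the stroke width
CROP_SIZE = 1024         # from the text: crops are 1,024 by 1,024 pixels


def keep_tile(roads):
    """Return True if the tile has enough mapped roads to be used for training."""
    return len(roads) >= MIN_ROADS_PER_TILE


def dist_sq_to_segment(px, py, ax, ay, bx, by):
    """Squared distance from the point (px, py) to the segment from a to b."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2


def rasterize_roads(roads, size, width=ROAD_WIDTH_PX):
    """Burn polylines into a size-by-size binary mask (1 = road, 0 = background)."""
    mask = [[0] * size for _ in range(size)]
    half = width / 2.0
    for road in roads:
        for (ax, ay), (bx, by) in zip(road, road[1:]):
            # Only visit pixels inside the segment's padded bounding box.
            x0 = max(0, int(min(ax, bx) - half))
            x1 = min(size - 1, int(max(ax, bx) + half) + 1)
            y0 = max(0, int(min(ay, by) - half))
            y1 = min(size - 1, int(max(ay, by) + half) + 1)
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    if dist_sq_to_segment(x, y, ax, ay, bx, by) <= half * half:
                        mask[y][x] = 1
    return mask


def random_crop(image, mask, crop=CROP_SIZE, rng=random):
    """Cut the same randomly placed crop-by-crop window out of image and mask."""
    size = len(mask)
    top = rng.randint(0, size - crop)
    left = rng.randint(0, size - crop)
    rows, cols = slice(top, top + crop), slice(left, left + crop)
    return [r[cols] for r in image[rows]], [r[cols] for r in mask[rows]]


# Small demonstration: one horizontal road on a 32x32 tile.
demo_mask = rasterize_roads([[(5, 10), (20, 10)]], size=32)
```

Because every road gets the same fixed width regardless of its true extent, labels produced this way are noisy by construction, which is the imperfection the paragraph above describes.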

Visualization of the geographic distribution of training data for the OSM road segmentation model. Some areas are missing because satellite imagery was unavailable at the time of the experiments.

Using only the noisy labels that our data collection process generated, we were able to produce results competitive with many entrants in the DeepGlobe challenge. After fine-tuning on the training data in the DeepGlobe challenge dataset, our model achieved state-of-the-art results.

What is more noteworthy than these fine-tuned results is that the model performs well on a global scale, even when trained only on OSM data. Most datasets available for training road segmentation models are heavily biased toward particular regions or levels of development. For example, the DeepGlobe roads dataset contains data only from India, Indonesia, and Thailand, and the SpaceNet Road Extraction Challenge dataset focuses only on major cities. The dataset we created spans six continents and all levels of development, providing much more data to train on than the available alternatives. To measure how a larger, more diverse dataset affects generalizability, we compared our OSM-trained model with the DeepGlobe model (trained on DeepGlobe data) on several other datasets (Las Vegas, Paris, Shanghai, etc.; see our paper for details) that lie outside the geographic distribution of the DeepGlobe dataset. Across these test sets, the mean Intersection over Union (IoU) score of the DeepGlobe model is 0.218, and the mean IoU score of the OSM-trained model is 0.355. That is a 62 percent relative improvement and an absolute improvement of 13.7 percentage points.
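For readers unfamiliar with the metric, here is a minimal sketch of how IoU is computed for a pair of binary road masks. The function names are hypothetical, and two details are assumptions rather than facts from the article: an empty union is scored as 1.0, and the mean is taken as a simple average over mask pairs.

```python
def iou(pred, truth):
    """Intersection over Union of two equally sized binary masks (nested lists)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over an iterable of (prediction, ground_truth) mask pairs."""
    scores = [iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = iou(pred, truth)  # 2 shared road pixels out of 4 marked in either mask
```

IoU penalizes both missed roads and hallucinated ones, which makes it sensitive to the slight misalignment that noisy OSM-derived labels introduce.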

Road extraction from a relatively well-mapped area in Kampala, Uganda. From left to right: Maxar satellite imagery, OSM (manually mapped), THA/IND/IDN trained model, Global OSM trained model. The model trained on DeepGlobe draws numerous nonexistent roads through the middle of houses, whereas the globally trained model performs well.

AI-powered tools to efficiently create new maps

Once the model identifies potential roads, we need to validate the roads and submit them to OSM. Bringing this data to the community is an important part of our process; our model's results, though powerful, are not perfect. Local or regional differences can affect whether roads are classified correctly. Some results mistakenly trace other satellite image features, such as dry riverbeds, narrow beaches, and canals. Furthermore, the model may not find all roads within an area, or may overlook connection points and potential roads that would be obvious to an expert human mapper. Therefore, our next step is to pair the model's results with capable mappers who have received specialized training in how to validate them. To do this, we leverage tools that are already familiar to the mapping community (iD, the Java OpenStreetMap Editor (JOSM), and Tasking Manager).

Our efforts are focused on building RapiD, an open source extension of the widely used web-based iD map editor. Additionally, we built a system that combines the model's results with data already available in OSM. This process, called conflation, both advises on how to join new roads with existing data and prevents overwriting existing road data with suggested roads. It is our hope that RapiD will allow people in the mapping community to improve and leverage these tools for their own use cases.
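To make the idea of conflation concrete, here is a deliberately simplified sketch of one part of it: suppressing predicted roads that duplicate roads already in the map, while leaving existing data untouched. Everything specific here is an assumption for illustration (planar coordinates in meters, the 10-meter tolerance, the 50 percent overlap rule, and the function names); the actual system is not described in that detail.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_roads(p, roads):
    """Distance from p to the nearest segment of any road (inf if there are none)."""
    return min(
        (point_segment_distance(p, a, b)
         for road in roads
         for a, b in zip(road, road[1:])),
        default=math.inf,
    )


def conflate(predicted, existing, tolerance=10.0, max_overlap=0.5):
    """Return predicted roads that are not already covered by existing roads.

    Existing roads are only read, never modified. A predicted road is dropped
    when more than max_overlap of its vertices lie within tolerance of an
    existing road (both thresholds are illustrative assumptions).
    """
    suggestions = []
    for road in predicted:
        if not road:
            continue
        near = sum(1 for p in road if distance_to_roads(p, existing) <= tolerance)
        if near / len(road) <= max_overlap:
            suggestions.append(road)
    return suggestions


existing = [[(0, 0), (100, 0)]]
predicted = [
    [(0, 2), (50, 3), (100, 1)],     # retraces the existing road
    [(50, 0), (50, 60), (50, 120)],  # new side road that meets it
]
suggestions = conflate(predicted, existing)
```

In this toy example only the side road survives as a suggestion. The part of conflation that advises on how to join new roads with existing ones (for instance, snapping the side road's first vertex onto the existing road) is not covered by this sketch.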

This video demonstrates the RapiD mapping experience in comparison with the traditional iD editor.

The RapiD editor allows human reviewers to visualize the conflated roads, highlight new changes, and use new commands and shortcuts for the most common data cleanup tasks, such as adjusting the road's classification to fit in the surrounding context. Because we extended an existing editing tool, iD, mappers are able to use familiar tooling to work with new data. To ensure that high-quality data is submitted to OSM, we incorporated integrity checks to catch potential issues with the model's results.

Early feedback from leaders in the mapping community has been encouraging:

“The tool strikes a good balance between suggesting machine-generated features and manual mapping. It gives mappers the final say in what ends up in the map, but helps just enough to both be useful and draw attention to undermapped places. It could benefit from a more interactive walk-through to get casual mappers started. The tweaks to iD and the added shortcut keys make it powerful enough for mappers who want to use it more than casually,” said Martijn van Exel, a longtime contributor to OSM. “This is definitely going to be a key part of the future of OSM. We can never map the world, and keep it mapped, without assistance from machines. The trick is to find the sweet spot. OSM is a people project, and the map is a reflection of mappers' interests, skills, biases, etc. That core tenet can never be lost, but it can and must travel along with new horizons in mapping.”

“In my opinion, the most unique benefit of RapiD is that it’s available for some of the world’s most complex geographies, where automation is most desperately needed. Most modern algorithms, training sets, and techniques were invented to work for the areas with highly developed infrastructure. In the developing world — for example, Africa, Southeast Asia, Latin America — where roads are not well-defined, maintained, or developed, even the best-trained human eye can struggle to identify and properly classify features,” said Dmitry Kuzhanov, a geospatial manager in the ride-sharing industry.

“RapiD is a significant step forward because it combines the scale that AI enables with the general intelligence and contextual understanding that humans innately have. We humans are still involved, but we become far more efficient as a consequence. It will be interesting to see if RapiD becomes to OpenStreetMap what the bicycle is to commuting,” said Edoardo Neerhut, strategic partnerships manager at Mapillary.

Altogether, good tooling empowers mappers, reduces the tedious and time-consuming parts of drawing roads based on satellite data, increases road shape accuracy, and provides options for identifying suggested roads — even if mappers choose not to make use of those suggestions. It was important to provide tooling that did not limit the capabilities and judgment of professional mappers. We will continuously improve RapiD based on feedback from these mappers to make the process smoother. We believe the resulting tooling improves the utility of satellite imagery for mapping.

To learn more about the Map With AI service and our partners’ experiences using it, check out our Facebook Tech@ blog post. To browse our machine learning road predictions or start mapping with RapiD, please visit mapwith.ai.