Mapping for humanitarian aid and development with weakly and semi-supervised learning

April 09, 2019

Written by Derrick Bonafilia, James Gill, Danil Kirsanov, and Jason Sundram

When disaster or disease strikes, relief agencies respond more effectively when they have detailed mapping tools to know exactly where to deliver assistance. But extremely reliable and precise maps often are not available. So, our team, composed of artificial intelligence researchers and data scientists in Facebook's Boston office, used our computer vision expertise to create and share population density maps that are more accurate and higher resolution than any of their predecessors.

Building on our previous publication of similar high-resolution population maps for 22 countries, we're now releasing new maps of the majority of the African continent, and the project will eventually map nearly the whole world’s population. When it is completed, humanitarian agencies will be able to determine how populations are distributed even in remote areas, so that health care workers can better reach households and relief workers can better distribute aid. Offering open data for free in a responsible way also enables Facebook researchers to better understand the many applications of their work and to guide their research in the right directions. No Facebook data has been or will be used in the project. The census and satellite data used contain no personally identifiable information.

Using a mixture of machine learning techniques, high-resolution satellite imagery, and population data, we mapped hundreds of millions of structures distributed across vast areas and then used that to extrapolate the local population density. The satellite maps used in this project were generated using commercially available satellite images from DigitalGlobe — the same type of imagery made available via publicly accessible mapping services. The other major data source for the maps is national census data for each country that was shared with Columbia University’s Center for International Earth Science Information Network (CIESIN), which collaborated with Facebook researchers on this project.

Since we released the first set of maps two years ago, they have improved how nonprofits do their work, how researchers learn, and how policies are developed. For example, in Malawi, the Red Cross and the Missing Maps program, in partnership with the Malawi Ministry of Health, used Facebook maps to inform a measles and rubella campaign. By showing that 97 percent of land space was uninhabited, the Red Cross was able to deploy 3,000 trained local volunteers to specific areas in need.

Along with today’s release of a new set of high-resolution maps, we are sharing details here on how we have approached this project.

A challenge suited for deep learning

A country’s census shows how many people live in a particular census tract, but it doesn’t indicate where people live in these tracts — and sometimes the tracts encompass hundreds of square miles. Africa alone has 1.2 billion people across nearly 16 million square miles; its largest census tract is 150,000 square miles with 55,000 people. If researchers knew where the houses or other buildings were located in these tracts, they could create extremely accurate density maps by allocating the population proportionally to each one. This sort of granularity is crucial for efficient allocation of resources for efforts such as vaccination campaigns. Since it is not feasible to find these buildings by hand, we rely on deep learning to find them.
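The proportional allocation described above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the function name `allocate_population`, the cell identifiers, and the building counts are hypothetical, and only the 55,000-person tract size is taken from the text.

```python
def allocate_population(tract_population, building_counts):
    """Spread a tract's census population over grid cells in proportion
    to the number of detected buildings in each cell.

    building_counts maps a (hypothetical) cell id to its building count.
    Returns a dict mapping each cell id to an estimated population.
    """
    total_buildings = sum(building_counts.values())
    if total_buildings == 0:
        # Assumption of this sketch: with no detected buildings,
        # nothing is allocated to any cell.
        return {cell: 0.0 for cell in building_counts}
    people_per_building = tract_population / total_buildings
    return {
        cell: count * people_per_building
        for cell, count in building_counts.items()
    }


# A tract of 55,000 people whose buildings fall in two of three cells.
estimates = allocate_population(
    55_000, {"cell_a": 30, "cell_b": 10, "cell_c": 0}
)
```

Cells without buildings receive no population, which is what lets a map of this kind show large parts of a tract as uninhabited.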

High-resolution (50 cm per pixel) satellite images of the entire globe take up roughly 1.5 petabytes of storage. One early challenge in working with these images is the massive imbalance in the data set: Most of the world’s land does not contain a building, so we often dealt with negative-to-positive class imbalances of 100,000-to-1. To discard most areas that did not contain a building, we applied a preprocessing step built on classical computer vision techniques, tuned for near-perfect recall at the cost of low precision. This left us with candidate patches of satellite imagery, each roughly 30x30 meters (64x64 pixels).
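To make the recall-versus-precision trade-off concrete, here is a minimal sketch of tuning a cheap filter so that it keeps nearly every building. The post does not say which classical techniques were used, so the per-patch score, the sample data, and both function names are assumptions for illustration only.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Return the highest score threshold that still keeps at least
    target_recall of the positive (building) patches.

    scores are hypothetical per-patch values from a cheap filter;
    labels are 1 for building, 0 for no building.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sorted(s for s, y in zip(scores, labels) if y == 1)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must survive the filter.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    return positives[len(positives) - needed]


def precision_recall(scores, labels, threshold):
    """Precision and recall of keeping every patch with score >= threshold."""
    kept_labels = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(kept_labels)
    total_positives = sum(labels)
    precision = true_positives / len(kept_labels) if kept_labels else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Made-up scores for eight patches, three of which contain a building.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

threshold = threshold_for_recall(scores, labels, target_recall=1.0)
precision, recall = precision_recall(scores, labels, threshold)
```

In this toy sample the filter keeps every building while discarding three of the five empty patches. Precision stays low, which is acceptable because a later classifier makes the final decision.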

We then faced the challenge of classifying which patches contained a building, a task still complicated by class imbalance. Although preprocessing greatly reduced it, the ratio of empty patches to those with buildings still ranged from 10-to-1 to as high as 1,000-to-1. Because this is an imbalanced binary classification problem, we evaluated our results using the F1 score, which is the harmonic mean of precision and recall. To avoid regional biases in our results, we calculated results region by region.
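As a small worked example of this metric, the sketch below computes F1 separately for each region and then averages the per-region scores. The region names and confusion counts are invented for illustration and are not results from the project.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 is the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (true positive, false positive, false negative) counts.
region_counts = {
    "region_a": (90, 10, 10),
    "region_b": (40, 10, 30),
}

# Score each region on its own, then average the per-region scores.
region_f1 = {name: f1_score(*counts) for name, counts in region_counts.items()}
mean_f1 = sum(region_f1.values()) / len(region_f1)

# For comparison: pooling all counts lets the larger region dominate.
pooled_counts = [sum(column) for column in zip(*region_counts.values())]
pooled_f1 = f1_score(*pooled_counts)
```

Here the per-region average is lower than the pooled score because the weaker region counts equally, which is why scoring region by region guards against regional bias.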

We’ve worked with labelers to develop an extensive test set for every country in the project. The breadth of the test set ensures that we maintain high accuracy across different regions. We also work with third-party groups, such as the World Bank, that have conducted on-the-ground validations to ensure that our results correctly reflect the ground reality. Our methodology has been developed in close coordination with experts in geographic and demographic data at CIESIN, and we’ve worked closely with partners such as the Humanitarian OpenStreetMap Team to make sure we’re focusing our efforts in the right direction. Our collaborations with these partners help ensure that we take a cross-disciplinary approach and avoid the many pitfalls of attempting this type of global-scale work in isolation. Finally, we review the list of countries we release publicly with the domestic political context in mind and have avoided the release of country data in a number of policy- and conflict-related circumstances.

The initial iteration of our population density maps was built by performing semantic segmentation using a fully convolutional neural net and then converting the resulting segmentation maps into binary classification results. With an increased training corpus and many advances made by the machine learning research community over the past few years, we’ve been able to simplify the problem to a straightforward binary classification task using residual neural nets. This simplification is both computational and conceptual. Now, given an input image, a single neural net predicts whether the given image contains a building. This approach to classification is also significantly less computationally expensive than a segmentation-based approach because it allows us to use smaller neural nets and produce outputs with a smaller memory footprint. It allows us to build data sets for more places with less computation — a key component of scaling to a truly global data set. In the case of Africa, the process is reduced to classifying 11.5 billion 64x64-pixel images. While this is a large number, the infrastructure at Facebook — in particular, FBLearner and Presto’s Geospatial operations — made this practical. After switching to this classification approach and training a ResNet18 on around 1 million images, we significantly improved results in 66 of the 73 regions that we used both approaches on, with the average F1 score increasing from .818 to .907. You can see this process at work in the image below, showcasing our model predicting on Africa.
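To illustrate why direct classification is conceptually simpler, here is a sketch of the extra step a segmentation-based pipeline needs: reducing a per-pixel mask to a single yes-or-no answer per patch. The conversion rule, both thresholds, and the function name are assumptions, since the post does not describe how the original conversion was done.

```python
def patch_contains_building(mask, pixel_threshold=0.5, min_building_pixels=1):
    """Reduce a segmentation mask to one binary label for the patch.

    mask is a 2D list of per-pixel building probabilities. Both
    thresholds are illustrative assumptions, not the project's values.
    """
    building_pixels = sum(
        1 for row in mask for probability in row if probability >= pixel_threshold
    )
    return building_pixels >= min_building_pixels


# A tiny 4x4 stand-in for a 64x64 patch, with a building in the middle.
mask_with_building = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
empty_mask = [[0.1] * 4 for _ in range(4)]

has_building = patch_contains_building(mask_with_building)
is_empty = not patch_contains_building(empty_mask)
```

A classifier that outputs one score per patch skips this reduction entirely and never has to produce or store the full per-pixel mask.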

Our pipeline first sets aside locations that couldn't contain a building. Then the neural net ranks each remaining location according to the likelihood that it does contain a building. The high-ranking locations are shown here as blue dots. Each is assigned population from census data (shown here as the glowing map). Finally, we overlay our distributed population onto the locations on the map. (Background image courtesy of DigitalGlobe.)

Utilizing large-scale open data

Another obstacle to building a global model is acquiring training data sampled from the entire world. We turned to OpenStreetMap (OSM), a free editable map of the world that is being built by volunteers and released with an open-content license. OSM has an extremely large number of labeled features, is open for all to use, and has data for almost every region in the world. The regional diversity of OSM allows us to avoid the developed-world bias found in many other training sets. (For example, systems trained only on brick or concrete buildings might overlook other kinds of structures.) By using the data in OSM, we were able to collect more than 100 million labeled examples to add to our training data set. However, using OSM data for labels presented several challenges that required novel approaches to overcome.

Here is a sample of nearly 500 patches that were marked by our low-precision preprocessing step as potentially containing a building.

Weakly supervised approach

The first challenge here was the quality and correctness of available data, along with the temporal and spatial consistency of OSM data. We solved these problems with our weakly supervised approach to collecting positive examples.

Weakly supervised learning has led to large improvements in modeling accuracy in recent work. For example, a team here at Facebook leveraged weakly supervised labels from publicly available Instagram hashtags to outperform state-of-the-art results on ImageNet. A key lesson from this work is that training on larger but noisier data sets can drastically improve results.

Following these insights, we used the tags in OSM to weakly label positive examples of buildings in our imagery. If a given patch of imagery overlaps with a building in OSM, we labeled that patch as containing a building. Due to the issues of spatial and temporal alignment (such as inaccurate mapping or outdated satellite imagery), this is not always correct. We then cleaned up these labels by throwing out all positively labeled examples that were marked as obviously not containing a building in our initial preprocessing step. After this cleanup, on a sample of 1,000 positively labeled examples, we found that 996 patches did contain a building, giving us a robust 99.6 percent labeling accuracy for positive examples.
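The labeling rule and the cleanup step can be sketched as follows. This simplified version assumes patches and OSM buildings are axis-aligned bounding boxes in a shared coordinate system; real footprints are polygons, and the function names, patch ids, and coordinates here are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def weak_positive_patches(patches, osm_buildings, rejected_by_preprocessing):
    """Label a patch positive if it overlaps any OSM building, unless the
    preprocessing step marked it as obviously not containing a building."""
    positives = []
    for patch_id, patch_box in patches.items():
        if patch_id in rejected_by_preprocessing:
            continue  # cleanup: drop likely misaligned or outdated labels
        if any(boxes_overlap(patch_box, building) for building in osm_buildings):
            positives.append(patch_id)
    return positives


# Three adjacent 32-meter patches and two mapped buildings (made-up data).
patches = {
    "p1": (0, 0, 32, 32),
    "p2": (32, 0, 64, 32),
    "p3": (64, 0, 96, 32),
}
osm_buildings = [(10, 10, 20, 20), (70, 5, 80, 15)]
rejected_by_preprocessing = {"p3"}

positives = weak_positive_patches(patches, osm_buildings, rejected_by_preprocessing)
```

Patch `p3` overlaps a mapped building but is dropped because the preprocessing step rejected it, which mirrors the case of a building that no longer appears in the imagery.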

Here are images that our weakly supervised labeling approach identified as containing a building.

Semi-supervised approach

Another challenge is that OSM-tagged features have high precision but extremely low recall. While most labels in OSM are accurate, the lack of a label could mean either that no building is present or that the area has not yet been mapped. This made collecting negative examples at scale a bit more complex. We used a semi-supervised technique that combines elements of bootstrapping (or self-training) and data distillation.

We first ran our existing model on all patches of imagery that remained after our preprocessing step. Combining the model's output score, our evaluation of the model on labeled validation images, and a particular threshold on the score, we estimated the probability that a given image would be erroneously labeled as not containing a structure. We then used our uniformly sampled and manually labeled data to find the probability of a random image patch containing a building. Using these two probabilities and setting the output threshold of our model accordingly, we could use the outputs of our old model to label a large number of image patches as negative, or not containing a building, while bounding our expected labeling error rate to below 1 percent.
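One way to read this procedure is as an application of Bayes' rule, sketched below. This is an interpretation under stated assumptions rather than the authors' exact method: the validation scores, the 10 percent building prior, the candidate thresholds, and both function names are made up for illustration.

```python
def negative_label_error(threshold, pos_scores, neg_scores, building_prior):
    """Estimate P(building | score < threshold) using Bayes' rule.

    pos_scores and neg_scores are model scores on validation patches
    with and without buildings; building_prior is the probability that
    a random patch contains a building.
    """
    miss_rate = sum(s < threshold for s in pos_scores) / len(pos_scores)
    true_negative_rate = sum(s < threshold for s in neg_scores) / len(neg_scores)
    wrongly_negative = miss_rate * building_prior
    correctly_negative = true_negative_rate * (1 - building_prior)
    labeled_negative = wrongly_negative + correctly_negative
    if labeled_negative == 0:
        return 0.0  # no patch would be labeled negative at this threshold
    return wrongly_negative / labeled_negative


def pick_threshold(candidates, pos_scores, neg_scores, building_prior, max_error=0.01):
    """Largest candidate threshold whose estimated error is below max_error."""
    acceptable = [
        t for t in candidates
        if negative_label_error(t, pos_scores, neg_scores, building_prior) < max_error
    ]
    return max(acceptable) if acceptable else None


# Hypothetical validation scores from the existing model.
pos_scores = [0.25, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.98, 0.99]
neg_scores = [0.01, 0.02, 0.03, 0.05, 0.08, 0.1, 0.2, 0.3, 0.5, 0.7]

chosen = pick_threshold([0.04, 0.1, 0.3, 0.5], pos_scores, neg_scores, building_prior=0.1)
```

Patches scoring below the chosen threshold would be labeled negative. A higher threshold labels more patches but lets more real buildings slip into the negative set, so the 1 percent bound caps how far the threshold can rise.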

This image shows tiles that our system labeled as not containing a building. A few mistakes are visible, but the approach overall proves to be accurate.

Real-world results

To obtain our production model, we trained a ResNet50 on this new data set and fine-tuned it on our original data set. The new model outperforms the old model in 75 of our 79 territories and raised the average F1 from the .818 baseline to .920, a relative improvement of more than 12 percent. Even more exciting than the absolute accuracy increases is the ability to apply these models to a much larger range of geographic regions. This allows for more effective humanitarian efforts in more of the world.
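The relative improvement can be checked directly from the average F1 scores reported in this post. The helper name `relative_improvement` is just for illustration.

```python
def relative_improvement(old_score, new_score):
    """Fractional gain of new_score over old_score."""
    return (new_score - old_score) / old_score


# Average F1 scores reported in this post.
baseline_f1 = 0.818    # original segmentation-based approach
resnet18_f1 = 0.907    # first classification model
production_f1 = 0.920  # ResNet50 production model

resnet18_gain = relative_improvement(baseline_f1, resnet18_f1)      # roughly 0.109
production_gain = relative_improvement(baseline_f1, production_f1)  # roughly 0.125
```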

The results of this computer vision problem are joined with the same census results used to create CIESIN’s Gridded Population of the World. The end result is a set of the world’s most accurate, highest-resolution population density maps. Rigorous evaluations, both on the ground and through high-resolution satellite imagery, by our internal teams and through third-party partners have confirmed the unprecedented accuracy of our initial release in 2016, and we have made significant improvements on our already state-of-the-art results over the past two years. The unprecedented resolution, scale, and accuracy of our newest offerings should continue to aid humanitarian relief and development efforts around the world.

The data set is available to download here. We plan to release high-resolution population maps of additional countries over the coming months, and we look forward to our partners using them in more places to help people in need. For a view into how our partners are using our maps in their work, please read this companion blog post.

Written by

Derrick Bonafilia

Research Engineer, Facebook

James Gill

Software Engineer, Facebook

Danil Kirsanov

Software Engineer, Facebook

Jason Sundram

Engineering Manager, Facebook