4 Try MOSAIKS – MOSAIKS Training Manual

4.1 Overview

This demo replicates key results from the original MOSAIKS publication (Rolf et al. 2021). While MOSAIKS has great potential to improve access to satellite-based prediction in data-sparse environments, the original paper focused on demonstrating performance in the United States, where high-quality training data was readily available.

The US served as an ideal testing ground for several reasons:

Extensive ground truth data available across multiple variables
Reliable spatial referencing of data
Diverse landscapes and built environments
Ability to benchmark against existing methods
Systematic validation of predictions

This validation in a data-rich environment was crucial for establishing MOSAIKS as a reliable tool before deploying it in contexts where ground truth data is scarce or unreliable.

4.2 Demonstration code

4.2.1 Workflow

Below is a link to a Jupyter notebook that demonstrates practical use of MOSAIKS with real data. The notebook uses the original input data and features from Rolf et al. 2021. The code demonstrates:

Loading pre-computed MOSAIKS features and labels
Merging the features and labels
Training a ridge regression model
Evaluating predictions
Visualizing results
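The workflow steps above can be sketched in a few lines of NumPy. This is a minimal, self-contained illustration under stated assumptions, not the notebook's actual code. The synthetic "features" and "labels", the helper names (`merge_on_ids`, `fit_ridge`, `r2_score`, `run_demo`), the shared grid-cell ID used for merging, and the fixed penalty `lam` are all hypothetical stand-ins. A real workflow would load the pre-computed MOSAIKS features and labels and tune the ridge penalty (for example, by cross-validation). Visualization is omitted.

```python
import numpy as np


def merge_on_ids(feat_ids, X, label_ids, y):
    """Keep only grid cells present in both tables, aligned by a shared ID (hypothetical key)."""
    common, fi, li = np.intersect1d(feat_ids, label_ids, return_indices=True)
    return common, X[fi], y[li]


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc rather than forming an explicit inverse.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def run_demo(seed=0, n=2000, k=200, lam=1.0):
    rng = np.random.default_rng(seed)
    # 1. "Load" features: synthetic stand-ins for pre-computed MOSAIKS features.
    feat_ids = np.arange(n)
    X = rng.normal(size=(n, k))
    true_w = rng.normal(size=k)
    # Labels exist for only a subset of cells, in a different order.
    label_ids = rng.permutation(n)[: int(0.8 * n)]
    y = X[label_ids] @ true_w + rng.normal(scale=0.5, size=label_ids.size)

    # 2. Merge features and labels on the shared ID.
    ids, Xm, ym = merge_on_ids(feat_ids, X, label_ids, y)

    # 3. Train ridge regression on a random 80% of the merged cells.
    order = rng.permutation(ids.size)
    split = int(0.8 * ids.size)
    train, test = order[:split], order[split:]
    w, b = fit_ridge(Xm[train], ym[train], lam)

    # 4. Evaluate predictions on the held-out 20%.
    pred = Xm[test] @ w + b
    return r2_score(ym[test], pred)
```

Calling `run_demo()` returns a held-out R² for the synthetic data. Merging by ID before splitting keeps each label paired with the features of its own grid cell, which matters because the two tables need not share the same row order.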

4.2.2 Label data

The demo shows MOSAIKS predicting several of the variables from the original paper, using a subset of the original data. The variables include:

Figure 4.1: Forest cover input data (left) from Global Land Analysis & Discovery (GLAD) Global 2010 Tree Cover (30 m)

Figure 4.2: Elevation input data (left) provided by Mapzen, and accessed via the Amazon Web Services (AWS) Terrain Tile service. Download code can be found here.

Figure 4.3: Population density input data (left) from the Gridded Population of the World (GPW) dataset. These data can be accessed here.

Figure 4.4: Nighttime lights luminosity input data (left) generated from nighttime satellite imagery, which is provided by the Earth Observations Group at the National Oceanic and Atmospheric Administration (NOAA) and the National Geophysical Data Center (NGDC). These data can be accessed here.

Figure 4.5: Income input data (left) from the American Community Survey (ACS) 5-year estimates of median annual household income in 2015. These data are accessible using the acs package in R (48), table number B19013.

Figure 4.6: Road length input data (left) from the United States Geological Survey (USGS) National Transportation Dataset, which is based on TIGER/Line data provided by the US Census Bureau in 2016. These data can be accessed here.

Users only need to select the variable they would like to predict. No other changes to the code are needed. All data have been preprocessed, and the code downloads the necessary files from Zenodo.

4.2.3 Constraints

To stay within the memory limits of the Colab free tier, we subset the data. Compared to the original paper, we randomly sample roughly half of both the features (K=4,000 instead of 8,192) and the observations (N=50,000 instead of 100,000). Even with this reduced dataset, the demo still achieves strong predictive performance, which highlights MOSAIKS’s efficiency.
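The arithmetic behind this subsetting can be checked directly. The sketch below assumes the feature matrix is held as a dense array of 8-byte (float64) values, which is an assumption and not something this chapter states. Under that assumption, the full matrix takes 100,000 × 8,192 × 8 bytes ≈ 6.6 GB, and the subset takes 50,000 × 4,000 × 8 bytes = 1.6 GB, a roughly four-fold reduction. The helper names `subsample` and `matrix_gb` are hypothetical and do not come from the demo notebook.

```python
import numpy as np


def subsample(X, y, n_obs, n_feat, seed=0):
    """Randomly keep n_obs rows and n_feat feature columns, without replacement."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(X.shape[0], size=n_obs, replace=False)
    cols = rng.choice(X.shape[1], size=n_feat, replace=False)
    # np.ix_ selects the row/column cross-product in one step;
    # labels are subset with the same row indices so they stay aligned.
    return X[np.ix_(rows, cols)], y[rows]


def matrix_gb(n_rows, n_cols, bytes_per_value=8):
    """Size in gigabytes (1e9 bytes) of a dense n_rows x n_cols array."""
    return n_rows * n_cols * bytes_per_value / 1e9


# Assumed float64 storage; pass bytes_per_value=4 for float32.
full_gb = matrix_gb(100_000, 8_192)
subset_gb = matrix_gb(50_000, 4_000)
```

Halving both dimensions shrinks memory by about a factor of four because the size of the feature matrix scales with N × K. Storing features as float32 would halve the footprint again.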

4.3 Run the code!

Click the badge to run the demonstration!


Remember to click File -> Save a copy in Drive to save any changes you make.

Or to view a static version of the code on GitHub, click the badge below.

For instructions and tips on using Google Colab, please refer to Chapter 1.

4.4 Don’t want to run code?

Consider watching this demonstration instead!

Figure 4.7: An overview of MOSAIKS and a live demonstration of generating novel predictions with the system. Video recorded at the CGIAR Generalized Planetary Remote Sensing session, 2020 Convention. Presented by Esther Rolf and Tamma Carleton.

4.5 What’s next?

After establishing MOSAIKS’s capabilities in the US context, the MOSAIKS development team has successfully demonstrated the system in many additional settings. These include global-scale applications and settings with sparse or low-quality data. In the coming chapters, we will explore some of these applications and show how MOSAIKS can help fill data gaps in regions where traditional data collection is challenging or costly.

Looking forward

In the next section, we will take a closer look at the label data that can be used with MOSAIKS.