3  Access MOSAIKS

This chapter is under review and may need revisions.

3.1 Introduction

At its core, MOSAIKS requires two main inputs: satellite features and ground truth data. Our aim is to make these features as accessible as possible so that the majority of users do not have to worry about the technical details of satellite imagery processing.

To this end, we have worked to develop multiple ways to access MOSAIKS features:

Option Imagery Source Spatial Coverage Spatial Resolution Temporal Resolution Weighting
MOSAIKS API Planet Labs Visual Basemap Global land areas 0.01° 2019 Q3 Unweighted
MOSAIKS API Planet Labs Visual Basemap Global land areas 0.1°, 1° 2019 Q3 Area & population
MOSAIKS API Planet Labs Visual Basemap Global land areas ADM0, ADM1, ADM2 2019 Q3 Area & population
Rolf et al 2021 Google Static Maps Continental United States (~100k locations) 0.01° 2019 unweighted
Chapter 14 Any - see Chapter 9 User-defined User-defined User-defined User-defined
Table 3.1: Summary of MOSAIKS feature sources. The weighting column indicates that the features were generated at 0.01 degree resolution and are weighted when they are aggregated up to lower resolutions. The available weighting schemes are based on area of the aggregation polygon, or by the population of the the aggregation polygon.
  • Chapters 1 through 8 focus on publicly available features
  • Chapters 9 through 15 cover computing custom features from imagery

The MOSAIKS API should be considered the primary way to access features. It is a user-friendly interface that allows you to download features for any location on Earth. The API is designed to be accessible to users with a range of technical backgrounds, from beginners to experts. For more details on what features are available on the API, see Chapter 13.

However, there are many settings where users will want or need to compute their own customized features. Chapter 14 guides readers through this process.

3.2 MOSAIKS API

MOSAIKS API Link

The MOSAIKS API is a user-friendly interface that allows you to download features for any land location on Earth. The API is designed to be accessible to users with a range of technical backgrounds, from beginners to experts. To take advantage of the API, you will need to register for an account.

3.2.1 Register for an account

Visit api.mosaiks.org.

Figure 3.1: Login page for the MOSAIKS API.

Select Register to create an account. You will need to provide a username, an email, and a password.

Figure 3.2: Registration page for the MOSAIKS API.

Once registered, you can log in to begin downloading MOSAIKS features.

Figure 3.3: MOSAIKS API landing page.

3.2.2 API resources

From the landing page, you can read additional information about MOSAIKS and access resources to help you get started.

This book is developed to provide you with all the information you need to use MOSAIKS.

The API contains the following pages:

Page Description
Home Landing page for the API. Contains general information about using MOSAIKS and the API
Precomputed Files Precomputed features at administrative boundary scales
HDI Global Human Development Index (HDI) estimates at the municipality and grid levels
Global Grids Precomputed and area or population features at 0.1° and 1° resolution
Map Query Precomputed features at 0.01 degree resolution, user defines bounding box over area of interest
File Query Precomputed features at 0.01 degree resolution, user uploads file with latitude and longitude coordinates
My Files Files you have queried from the API, available to download
Resources Example Python and R notebooks for using the MOSAIKS framework
Table 3.2: MOSAIKS API pages and descriptions.

3.2.3 API features

Currently the MOSAIKS API has a single set of global features (with several aggregation levels). The features are freely available to the public for download; this is the fastest and easiest way to begin using MOSAIKS.

The API features use input imagery from Planet Labs, Inc. Visual Basemap Global Quarterly 2019 (quarter 3) product. Image quality, and therefore feature quality, may be affected by local conditions. For example, an area undergoing a rainy season in the third quarter (July to September), is likely to contain image artifacts from cloud cover. This will in turn effect the feature calculations. For more details on the input imagery Chapter 9.

Given the static nature of the API, the easiest way to get started with MOSAIKS is to have label data for a recent time period (ideally from 2019 for fast changing labels, or a close year for more steady labels).

Figure 3.4: Planet Labs basemap imagery (left) and 4 of 4,000 MOSAIKS features downloaded from the API (right).

Using MOSAIKS for time series data is possible and can work well, however this currently requires computing your own custom features. See Chapter 14 and Chapter 18 for more information.

The MOSAIKS features are created using a 0.01 x 0.01 degree latitude-longitude global land grid. These are the native features available for download from the API, but it also offers pre-aggregated features at 0.1 and 1 degree resolution, as well as administrative boundaries (ADM0, ADM1, and ADM2).

3.2.4 High resolution features

The file query and map query are the two methods to obtain the high resolution (0.01 degree) features through the API. For simplicity, we store these features in a tabular format with latitude and longitude coordinates. These coordinates are the center of each grid cell.

When you download the high resolution features (0.01 degree), you will receive them in a tabular .csv format where:

  • Each row (N) represents a unique grid cell
  • The first two columns contain latitude and longitude coordinates (grid centroids)
  • The remaining columns represent K features (currently K = 4000 features)

Note: There is a limit of N = 100,000 records per query

3.2.4.1 Map query

  • Create rectangular boxes by specifying latitude and longitude coordinates
  • Multiple boxes can be created
  • The system displays an estimated number of records for each box
  • Note that estimates are based on box area and may not reflect actual record numbers, especially for areas containing seas and oceans
Figure 3.5: MOSAIKS API Map Query page.

Use geojson.io to find the bounding box coordinates for your area of interest.

3.2.4.2 File query

  • Submit a file with custom latitude and longitude coordinates
  • The API returns features for grid cells closest to your input coordinates
  • Points are allocated to the nearest grid point if they don’t exactly match
  • The output file may have a different number of rows than your input
  • Point ordering may change in the output
Figure 3.6: API File Query

3.2.5 Aggregated features

Many users may find it easier to work with features aggregated to some level. The MOSAIKS API offers pre-aggregated features to accommodate these needs. The API offers several levels, including larger grid cells (0.1 and 1 degree) or summarized to administrative boundaries (ADM0, ADM1, and ADM2). These files are available for download as either single or chunked files depending on the resolution.

Figure 3.7: Example showing of 3 representative random convolutional features (rows). Features are downloaded from the MOSAIKS API at 0.01° resolution (the native resolution) and aggregated to 3 levels, including (A) larger grid cells (0.1°), (B) counties, and (C) states.

3.2.5.1 Weighting schemes

At each level of aggregation, we offer area weighted features and population weighted features. Population weights are from the Gridded Population of the World (GPWv4) population density dataset. The area weighting scheme is based on the area of the high resolution grid cells.

3.2.5.2 When to use aggregated features

The aggregated features are particularly useful in a few scenarios:

  1. You have data at a scale larger than the 0.01 degree grid cells. Many datasets come at the country, state, or county level.

  2. Your data has a lot of noise that can be smoothed out by aggregating to a larger scale. A common example of this might be household survey data that is noisy at the individual level but smooths out when aggregated to the village or district level.

  3. You want to do global analysis and don’t have the computational resources to work with the full 0.01 degree grid cells.

In all cases using the pre-aggregated features can save you time and computational resources.

Scenario: You are working with a Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA). This dataset has survey data with geographic coordinates at the household level. To protect, the privacy of the respondents, the data is jittered within a 5 km radius but it always remains within local administrative boundaries. You can therefore summarize your labels to the administrative units and build a model with the aggregated features.

If you want to make high resolution predictions, with low resolution label data, you can build your model with aggregated features and use the high resolution features to make predictions. This will be covered in Chapter 17.

3.3 Using MOSAIKS features for prediction

This is a brief overview. Detailed instructions appear later in the manual (Chapter 16).

Basic workflow:

  1. Obtain ground truth measurements (“labels”; see Chapter 5)
  2. Download matching features (see Chapter 13 for more details).
  3. Spatially merge labels and features (see Chapter 7)
  4. Use regression to model relationship between imagery and outcome (see Chapter 16)
  5. Use regression results to predict outcome in new locations (see Chapter 16)

You can experiment with various machine learning approaches in the regression step. For beginners, we recommend starting with our example Jupyter notebook (Chapter 4) that demonstrates a simple ridge regression approach (suitable for both R and Python users).

Figure 3.8: Using MOSAIKS, a simplified workflow design.

This topic will be covered in greater depth in later chapters (see Chapter 16). In the next chapter, you will see a simple MOSAIKS workflow which replicates the results of Rolf et al. 2021.

3.4 Citation requirements

When referring to the MOSAIKS methodology or when generating MOSAIKS features, please reference: Rolf et al. “A generalizable and accessible approach to machine learning with global satellite imagery.” Nature Communications (2021).

You can use the following Bibtex:

@article{article,
    author = {Rolf, Esther and Proctor, Jonathan and Carleton, Tamma and Bolliger, Ian and Shankar, Vaishaal and Ishihara, Miyabi and Recht, Benjamin and Hsiang, Solomon},
    year = {2021},
    month = {07},
    pages = {},
    title = {A generalizable and accessible approach to machine learning with global satellite imagery},
    volume = {12},
    journal = {Nature Communications},
    doi = {10.1038/s41467-021-24638-z}
}

If using features downloaded from the API, please reference, in addition to the publication above, the MOSAIKS API.

You can cite the API using the following Bibtex:

 @misc{MOSAIKS API,
    author = {{Carleton, Tamma and Chong, Trinetta and Druckenmiller, Hannah and Noda, Eugenio and Proctor, Jonathan and Rolf, Esther and Hsiang, Solomon}},
    title = {{Multi-Task Observation Using Satellite Imagery and Kitchen Sinks (MOSAIKS) API}},
    howpublished = "\url{ https://api.mosaiks.org }",
    version = {1.0},
    year = {2022},
}
Looking forward

In the next chapter you will have a chance to try MOSAIKS on Google Colab with the data from Rolf et al 2021.