Label data

This chapter is under review and may need revisions.

Overview

This section explores the ground truth data (labels) used to train predictive models with MOSAIKS. Although MOSAIKS is designed to predict many different types of outcomes, success depends on understanding what makes good label data and how to prepare it properly.

Label data represents the “truth” that MOSAIKS attempts to predict: crop yields, population density, economic indicators, or any other variable that may be visible, directly or indirectly, in satellite imagery. The quality and characteristics of this label data strongly influence model performance.
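To make the last point concrete, here is a minimal synthetic sketch. It assumes a ridge regression of labels on image-derived features, the kind of linear model MOSAIKS fits, but the features here are random Gaussian stand-ins rather than real MOSAIKS features. The sample sizes, penalty `lam`, and noise levels are arbitrary choices for illustration. One synthetic outcome is measured with increasing label noise, and held-out R² is reported for each noise level.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) w = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def holdout_r2(noise_sd, n=1000, k=50, lam=1.0, seed=0):
    """Held-out R^2 when labels carry Gaussian noise with std noise_sd.

    Synthetic toy setup (an assumption, not real MOSAIKS data):
    X stands in for per-location image features, and the outcome
    is an exact linear function of X before label noise is added.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))           # stand-in feature matrix
    w_true = rng.normal(size=k)           # hypothetical true weights
    y_clean = X @ w_true                  # hypothetical true outcome
    y_label = y_clean + rng.normal(scale=noise_sd, size=n)  # noisy labels
    train, test = slice(0, 800), slice(800, n)
    w = ridge_fit(X[train], y_label[train], lam)
    # Score against held-out labels, as a practitioner would.
    return r2(y_label[test], X[test] @ w)


for sd in (0.0, 2.0, 10.0):
    print(f"label noise sd={sd}: held-out R^2={holdout_r2(sd):.3f}")
```

Because the seed fixes the features and true weights across runs, only the label noise changes between the three fits, so any drop in R² in this toy setup comes from label quality alone.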

What makes good label data?

For optimal performance with MOSAIKS, label data should have several key characteristics:

Accurate geographic location information
Appropriate spatial resolution (typically ≥1 km²)
Reasonable temporal alignment with imagery features
Sufficient sample size (generally ≥300 observations)
Observable connection to surface features
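Several of these characteristics can be screened automatically before any modeling. The sketch below is a hypothetical helper, not part of any MOSAIKS package: `LabelRecord`, `check_labels`, and the two-year `max_year_gap` default are illustrative assumptions, while the 300-observation and 1 km² thresholds come from the list above. The last characteristic, an observable connection to surface features, requires domain judgment and is not checked.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabelRecord:
    """One labeled observation. Field names are hypothetical, not a MOSAIKS API."""
    lat: float
    lon: float
    value: Optional[float]
    year: Optional[int] = None        # year the label was measured, if known
    area_km2: Optional[float] = None  # area of the label's spatial unit, if known


def check_labels(records: List[LabelRecord], min_n: int = 300,
                 imagery_year: Optional[int] = None, max_year_gap: int = 2,
                 min_area_km2: float = 1.0) -> List[str]:
    """Return warnings; an empty list means no rule of thumb was violated."""
    warnings = []
    # Usable records have a non-missing value and valid WGS84-style coordinates.
    usable = [r for r in records
              if r.value is not None and not math.isnan(r.value)
              and -90.0 <= r.lat <= 90.0 and -180.0 <= r.lon <= 180.0]
    bad = len(records) - len(usable)
    if bad:
        warnings.append(f"{bad} record(s) have missing values or invalid coordinates")
    # Sample size: roughly 300 or more observations.
    if len(usable) < min_n:
        warnings.append(f"only {len(usable)} usable records (< {min_n})")
    # Spatial resolution: flag label units smaller than about 1 km^2.
    small = sum(1 for r in usable
                if r.area_km2 is not None and r.area_km2 < min_area_km2)
    if small:
        warnings.append(f"{small} record(s) cover less than {min_area_km2} km^2")
    # Temporal alignment: flag labels measured far from the imagery year.
    if imagery_year is not None:
        stale = sum(1 for r in usable
                    if r.year is not None
                    and abs(r.year - imagery_year) > max_year_gap)
        if stale:
            warnings.append(f"{stale} record(s) are more than {max_year_gap} "
                            f"years from {imagery_year}")
    return warnings
```

A quick usage example with a deliberately small, partly invalid dataset:

```python
recs = [LabelRecord(lat=0.5, lon=30.0, value=1.2, year=2019, area_km2=4.0)
        for _ in range(10)]
recs.append(LabelRecord(lat=95.0, lon=30.0, value=2.0))  # latitude out of range
for w in check_labels(recs, imagery_year=2019):
    print(w)
```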

Section outline

The following chapters will guide you through key considerations for working with label data in MOSAIKS:

Chapter	Title	Key topics
5	What labels work?	Example applications, performance analysis, validation
6	Survey data	Survey integration, sampling design, geographic referencing
7	Preparing labels	Data cleaning, spatial joining, quality control
8	Label data demo	Hands-on example, practical workflow, troubleshooting

Table 1: Outline of the label data section

These chapters provide both practical guidance for preparing your own label data and a deeper understanding of which types of outcomes MOSAIKS can predict effectively.

Looking forward

In the next chapter, we’ll explore over 100 different outcomes that have been tested with MOSAIKS, examining what works well and what doesn’t.