Label data
This chapter is under review and may need revisions.
Overview
This section explores the ground truth data (labels) that can be used to train a predictive model with MOSAIKS. While the system is designed to be flexible with respect to the types of outcomes it can predict, understanding what makes good label data and how to prepare it properly is crucial for success.
Label data represents the “truth” that MOSAIKS attempts to predict, whether that is crop yields, population density, economic indicators, or any other variable that is visible, directly or indirectly, in satellite imagery. The quality and characteristics of this label data significantly influence model performance.
What makes good label data?
For optimal performance with MOSAIKS, label data should have several key characteristics:
- Accurate geographic location information
- Appropriate spatial resolution (typically ≥1 km²)
- Reasonable temporal alignment with imagery features
- Sufficient sample size (generally ≥300 observations)
- Observable connection to surface features
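Some of these characteristics can be checked mechanically before any modeling starts. The following is a minimal sketch in plain Python, not part of MOSAIKS itself: the function name `check_labels`, the record layout (`lat`, `lon`, `label` keys), and the helper `_is_number` are all assumptions made for illustration. It covers only the location and sample-size items from the list above; spatial resolution, temporal alignment, and the connection to surface features require judgment about the specific dataset.

```python
import math

# Rule of thumb taken from the list above; adjust for your application.
MIN_OBSERVATIONS = 300


def _is_number(value):
    """True for finite ints/floats (hypothetical helper, excludes bools and NaN)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def check_labels(records, min_observations=MIN_OBSERVATIONS):
    """Split label records into usable rows and a list of problem messages.

    Each record is assumed to be a dict such as
    {"lat": 12.34, "lon": 56.78, "label": 3.2}.
    """
    valid, problems = [], []
    for i, rec in enumerate(records):
        lat, lon, label = rec.get("lat"), rec.get("lon"), rec.get("label")
        if not _is_number(lat) or not -90 <= lat <= 90:
            problems.append(f"row {i}: invalid latitude {lat!r}")
        elif not _is_number(lon) or not -180 <= lon <= 180:
            problems.append(f"row {i}: invalid longitude {lon!r}")
        elif not _is_number(label):
            problems.append(f"row {i}: missing or non-numeric label {label!r}")
        else:
            valid.append(rec)

    # Sample-size check is applied to the rows that survived the checks above.
    if len(valid) < min_observations:
        problems.append(
            f"only {len(valid)} usable observations; "
            f"at least {min_observations} recommended"
        )
    return valid, problems
```

Calling `valid, problems = check_labels(records)` on a list of label records returns the rows that passed along with a human-readable list of issues; an empty `problems` list means these basic checks passed.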
Section outline
The following chapters will guide you through key considerations for working with label data in MOSAIKS:
| Chapter | Key Topics |
|---|---|
| 5 What labels work? | Example applications, performance analysis, validation |
| 6 Survey data | Survey integration, sampling design, geographic referencing |
| 7 Preparing labels | Data cleaning, spatial joining, quality control |
| 8 Label data demo | Hands-on example, practical workflow, troubleshooting |
These chapters provide both practical guidance for preparing your own label data and a deeper understanding of what types of outcomes MOSAIKS can effectively predict.
In the next chapter, we’ll explore over 100 different outcomes that have been tested with MOSAIKS, examining what works well and what doesn’t.