14 Computing features
This chapter is in early draft form and may be incomplete.
14.1 Overview
While the MOSAIKS API provides pre-computed features for many applications, some use cases require computing custom features. This chapter covers the technical details of generating MOSAIKS features from satellite imagery.
14.2 Requirements
To compute MOSAIKS features, you’ll need:
- Satellite imagery (see Chapter 10)
- GPU-enabled computing environment (recommended)
- Python with deep learning libraries (PyTorch recommended)
- Sufficient storage for features
14.3 Implementation
There are several ways to implement MOSAIKS feature extraction:
14.3.1 torchgeo implementation
The torchgeo library provides a PyTorch implementation of random convolutional features:
import torch
from torchgeo.models import RCF

# Define model parameters
patch_size = 3      # Size of random patches
in_channels = 4     # Number of input image channels
num_filters = 4000  # Number of features to generate

# With mode='empirical', supply a custom PyTorch dataset whose samples are
# dictionaries with an 'image' key. Patches are sampled from this dataset
# to set the model weights. With mode='gaussian', do not supply a dataset.
# CustomDataset is a placeholder for your own dataset; it is not defined here.

# Initialize RCF model
model = RCF(
    in_channels=in_channels,
    features=num_filters,
    kernel_size=patch_size,
    bias=-1.0,
    seed=42,
    mode='empirical',
    dataset=CustomDataset,
)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
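To make concrete what a random convolutional feature model computes, here is a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the torchgeo implementation: the filters are drawn from a standard Gaussian, a single ReLU is applied, and the function name `random_conv_features` and the synthetic image are hypothetical.

```python
import numpy as np


def random_conv_features(image, num_filters=8, patch_size=3, bias=-1.0, seed=42):
    """Toy random convolutional features for one image of shape (C, H, W).

    Each random filter is slid over the image, a bias is added, ReLU is
    applied, and the activation map is averaged to a single number.
    Returns a vector of length num_filters.
    """
    rng = np.random.default_rng(seed)
    channels = image.shape[0]
    filters = rng.standard_normal((num_filters, channels, patch_size, patch_size))

    # All patch_size x patch_size windows: shape (C, H', W', p, p)
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (patch_size, patch_size), axis=(1, 2)
    )
    # Response of every filter at every window position: shape (K, H', W')
    responses = np.einsum("chwij,kcij->khw", windows, filters)
    activations = np.maximum(responses + bias, 0.0)  # ReLU
    return activations.mean(axis=(1, 2))  # average pooling over the image


# Synthetic 4-band image used purely for illustration
image = np.random.default_rng(0).random((4, 32, 32))
features = random_conv_features(image, num_filters=8)
print(features.shape)
```

Because the filters depend only on the seed, the same seed yields the same features for the same image, which is why the seed is worth recording alongside the other parameters.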
14.4 Feature parameters
Several key parameters influence feature extraction:
14.4.1 Number of features (K)
- Controls feature vector dimensionality
- More features capture more information but increase computation and storage needs
- Typical range: 1,000-8,192
- Diminishing returns above ~4,000
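The storage side of this trade-off can be estimated with simple arithmetic. The sketch below assumes a dense matrix of 32-bit floats with one row per location; the helper name `feature_storage_bytes` and the figure of one million locations are illustrative assumptions.

```python
def feature_storage_bytes(num_locations, num_features, bytes_per_value=4):
    """Size in bytes of a dense feature matrix (locations x features)."""
    return num_locations * num_features * bytes_per_value


# Compare a few values of K for a hypothetical one million locations
for k in (1_000, 4_000, 8_192):
    gib = feature_storage_bytes(1_000_000, k) / 1024**3
    print(f"K={k}: {gib:.1f} GiB (float32)")
```

Under these assumptions storage grows linearly in K, so doubling the number of features doubles the size of the matrix.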
14.4.2 Patch size
- Determines spatial context captured
- Larger patches see more context but increase computation
- Typical size: 3x3 or 5x5 pixels
- Match patch size to imagery resolution
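One way to reason about matching patch size to resolution is to compute the ground footprint of a patch. The sketch below is a back-of-the-envelope helper; the function name and the example resolutions are assumptions chosen for illustration, not values from this chapter.

```python
def patch_footprint_m(patch_size, resolution_m):
    """Side length in meters of the ground area covered by one patch."""
    return patch_size * resolution_m


# Hypothetical resolutions in meters per pixel
for resolution_m in (0.5, 10.0, 30.0):
    for patch_size in (3, 5):
        side = patch_footprint_m(patch_size, resolution_m)
        print(f"{patch_size}x{patch_size} patch at {resolution_m} m/pixel: {side} m per side")
```

The same 3x3 patch spans very different ground areas at different resolutions, so the spatial context it captures changes with the imagery.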
14.4.3 Input channels
- Depends on available spectral bands
- RGB = 3 channels
- Can use additional bands
- More bands provide richer spectral information but increase computation
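The effect of channel count on computation can be approximated by counting filter weights. This is a rough cost proxy under the assumption of a plain convolution with K filters of shape (channels, patch, patch); the helper name `filter_bank_weights` is hypothetical.

```python
def filter_bank_weights(num_features, patch_size, in_channels):
    """Number of weights in K filters of shape (C, p, p).

    The same number is the count of multiply-adds needed per output
    location, so it serves as a rough proxy for compute cost.
    """
    return num_features * patch_size * patch_size * in_channels


rgb = filter_bank_weights(4000, 3, 3)
four_band = filter_bank_weights(4000, 3, 4)
print(rgb, four_band, four_band / rgb)
```

By this measure cost scales linearly with the number of bands and quadratically with the patch side length.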
14.5 Practical considerations
14.5.1 Memory management
When processing large imagery datasets:
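One common approach is to process images in fixed-size batches and write results into a preallocated array, so that only one batch of intermediate results is held at a time. The following is a minimal sketch under that assumption; `featurize_in_batches` and the stand-in extractor `band_means` are hypothetical names, and a real pipeline would call the feature model instead.

```python
import numpy as np


def iter_batches(num_items, batch_size):
    """Yield (start, stop) index pairs covering range(num_items)."""
    for start in range(0, num_items, batch_size):
        yield start, min(start + batch_size, num_items)


def featurize_in_batches(images, featurize, batch_size=64):
    """Apply `featurize` batch by batch, filling a preallocated output array."""
    num_features = featurize(images[:1]).shape[1]  # probe the output width
    out = np.empty((len(images), num_features), dtype=np.float32)
    for start, stop in iter_batches(len(images), batch_size):
        out[start:stop] = featurize(images[start:stop])
    return out


def band_means(batch):
    """Stand-in feature extractor: mean of each band, shape (N, C)."""
    return batch.mean(axis=(2, 3))


images = np.random.default_rng(0).random((10, 4, 16, 16)).astype(np.float32)
features = featurize_in_batches(images, band_means, batch_size=4)
print(features.shape)
```

Here `images` is an in-memory array for simplicity. The same loop works with any object that supports slicing, such as a memory-mapped array, which is where batching actually saves memory.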
14.5.2 Storage formats
Efficient formats for large feature matrices:
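As one illustrative option (the choice of format here is an assumption, and columnar or chunked array formats are alternatives), features can be stored as 32-bit floats in a compressed NumPy archive together with their location identifiers. The helper names `save_features` and `load_features` are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_features(path, features, ids):
    """Write features as float32 plus their location ids to a compressed .npz file."""
    np.savez_compressed(
        path, features=features.astype(np.float32), ids=np.asarray(ids)
    )


def load_features(path):
    """Read back the (features, ids) pair written by save_features."""
    with np.load(path) as data:
        return data["features"], data["ids"]


# Synthetic feature matrix: 100 locations x 16 features
features = np.random.default_rng(0).random((100, 16))
ids = np.arange(100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.npz")
    save_features(path, features, ids)
    loaded_features, loaded_ids = load_features(path)

print(loaded_features.dtype, loaded_features.shape)
```

Storing identifiers next to the features keeps rows traceable to locations, and casting to float32 halves the size relative to NumPy's default float64.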
14.5.3 Parallel processing
For large-scale feature extraction:
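A simple pattern is to split the work into independent tiles and map a per-tile function over them with a worker pool. The sketch below uses Python's standard `concurrent.futures` with threads; `featurize_tile` is a hypothetical stand-in for loading one tile and computing its features, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def featurize_tile(tile_id):
    """Stand-in for loading one tile and computing its features."""
    return tile_id, [float(tile_id), float(tile_id) ** 2]


def featurize_tiles(tile_ids, max_workers=4):
    """Process tiles concurrently and return {tile_id: features}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of tile_ids in its results
        return dict(pool.map(featurize_tile, tile_ids))


results = featurize_tiles(range(8))
print(results[3])
```

`ProcessPoolExecutor` offers the same `map` interface and may suit CPU-bound work better. When a single GPU does the feature extraction, parallelism is often more useful for loading and decoding imagery than for the model itself.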
14.6 Quality control
Important checks during feature extraction:
- Input validation
  - Image dimensions
  - Value ranges
  - Missing data
  - Band ordering
- Feature statistics
  - Distribution checks
  - Zero/missing values
  - Correlation analysis
  - Feature importance
- Performance monitoring
  - Memory usage
  - Processing speed
  - GPU utilization
  - Storage efficiency
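Some of the input and feature checks above can be automated. The following is a minimal sketch, assuming images are arrays of shape (bands, height, width) scaled to a known value range; the function names and the default range are assumptions. Band ordering cannot be verified from the array alone and still needs to be checked against the data source.

```python
import numpy as np


def validate_image(image, expected_bands, value_range=(0.0, 1.0)):
    """Return a list of problems found in an image of shape (bands, H, W)."""
    if image.ndim != 3:
        return [f"expected 3 dimensions, got {image.ndim}"]
    problems = []
    if image.shape[0] != expected_bands:
        problems.append(f"expected {expected_bands} bands, got {image.shape[0]}")
    if np.isnan(image).any():
        problems.append("contains NaN (missing data)")
    low, high = value_range
    if np.nanmin(image) < low or np.nanmax(image) > high:
        problems.append(f"values outside [{low}, {high}]")
    return problems


def summarize_features(features):
    """Basic statistics for a feature matrix of shape (locations, K)."""
    return {
        "fraction_zero": float((features == 0).mean()),
        "fraction_nan": float(np.isnan(features).mean()),
        "constant_columns": int((np.nanstd(features, axis=0) == 0).sum()),
    }


good = np.random.default_rng(0).random((4, 8, 8))
bad = good.copy()
bad[0, 0, 0] = np.nan
print(validate_image(good, expected_bands=4))
print(validate_image(bad, expected_bands=3))
print(summarize_features(np.array([[0.0, 1.0], [0.0, 2.0]])))
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue for a tile in a single pass.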
14.7 Best practices
- Documentation
  - Record all parameters
  - Track data sources
  - Document processing steps
  - Note any issues
- Testing
  - Unit tests for functions
  - Integration tests
  - Performance benchmarks
  - Validation checks
- Version control
  - Code versioning
  - Feature versioning
  - Parameter tracking
  - Result logging
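Parameter tracking and feature versioning can be combined by deriving a version tag from the parameters themselves. The sketch below is one possible scheme, not an established convention: it hashes a canonical JSON encoding of the parameter dictionary. The helper name `parameter_fingerprint` and the 12-character tag length are assumptions.

```python
import hashlib
import json


def parameter_fingerprint(params):
    """Stable short hash of a parameter dictionary, usable as a version tag."""
    canonical = json.dumps(params, sort_keys=True)  # key order does not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Parameters mirroring the RCF configuration used earlier in this chapter
params = {
    "in_channels": 4,
    "features": 4000,
    "kernel_size": 3,
    "bias": -1.0,
    "seed": 42,
    "mode": "empirical",
}
metadata = {"parameters": params, "version": parameter_fingerprint(params)}
print(json.dumps(metadata, indent=2))
```

Saving this metadata next to the feature files means any change to a parameter, including the seed, produces a different tag.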
In the next chapter, we’ll work through a complete example of computing custom MOSAIKS features.