14 Computing features

This chapter is in early draft form and may be incomplete.

14.1 Overview

While the MOSAIKS API provides pre-computed features for many applications, some use cases require computing custom features. This chapter covers the technical details of generating MOSAIKS features from satellite imagery.

14.2 Requirements

To compute MOSAIKS features, you’ll need:

- Satellite imagery (see Chapter 10)
- A GPU-enabled computing environment (recommended)
- Python with a deep learning library (PyTorch recommended)
- Sufficient storage for the computed features

14.3 Implementation

MOSAIKS feature extraction can be implemented in several ways. This section covers the torchgeo implementation.

14.3.1 `torchgeo` implementation

The torchgeo library provides a PyTorch implementation of random convolutional features (RCF), the featurization that MOSAIKS is built on:

import torch
from torchgeo.models import RCF

# Define model parameters
patch_size = 3  # Side length of each random patch (convolution kernel), in pixels
in_channels = 4  # Number of input image channels (bands)
num_filters = 4000  # Number of features to generate (must be even)

# mode='empirical' samples the filter weights from image patches in a dataset,
# so `dataset` must be an *instance* of a PyTorch dataset whose items are
# dictionaries with an 'image' key.
# mode='gaussian' draws the weights from a normal distribution instead and
# needs no dataset.
# `CustomDataset` is a placeholder for your own dataset class: define it
# first and pass whatever constructor arguments it requires.
custom_dataset = CustomDataset()

# Initialize RCF model
model = RCF(
    in_channels=in_channels,
    features=num_filters,
    kernel_size=patch_size,
    bias=-1.0,
    seed=42,
    mode='empirical',
    dataset=custom_dataset,
)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
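
To make concrete what a random convolutional features model computes, here is a minimal NumPy sketch of the idea: convolve the image with random filters, apply a ReLU, and average over all positions. It is a simplified illustration and not torchgeo's implementation; the function name `random_conv_features` and the use of Gaussian filters are assumptions made for this example.

```python
import numpy as np


def random_conv_features(image, num_filters=8, patch_size=3, bias=-1.0, seed=0):
    """Compute one feature vector for a single image of shape (C, H, W).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    channels, height, width = image.shape
    # Random Gaussian filters, one per output feature.
    filters = rng.standard_normal((num_filters, channels, patch_size, patch_size))

    # Every patch_size x patch_size window: shape (C, H', W', k, k).
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (patch_size, patch_size), axis=(1, 2)
    )
    # Dot product of each window with each filter: shape (K, H', W').
    responses = np.einsum("chwij,kcij->khw", windows, filters) + bias
    # ReLU, then average over all window positions.
    return np.maximum(responses, 0.0).mean(axis=(1, 2))


image = np.random.default_rng(1).random((4, 32, 32))  # 4-band toy image
features = random_conv_features(image, num_filters=8)
print(features.shape)  # (8,)
```

Each image is reduced to a fixed-length vector regardless of its height and width, which is what allows the features to be stored as a simple table with one row per location.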

14.4 Feature parameters

Several key parameters influence feature extraction:

14.4.1 Number of features (K)

- Sets the dimensionality of the feature vector
- More features capture more information about each image
- Computation and storage needs grow with K
- Typical range: 1,000 to 8,192
- Returns diminish above roughly 4,000 features
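
Because storage grows linearly with K, a quick estimate before a large run can be useful. The sketch below assumes features are stored as a dense matrix of 32-bit floats; `feature_storage_gb` and the count of one million locations are hypothetical choices for illustration.

```python
def feature_storage_gb(num_locations, num_features, bytes_per_value=4):
    """Size in gigabytes (10**9 bytes) of a dense feature matrix."""
    return num_locations * num_features * bytes_per_value / 1e9


# Hypothetical example: one million locations at several values of K.
for k in (1000, 4000, 8192):
    print(k, feature_storage_gb(1_000_000, k))
```

Under these assumptions, one million locations need 4 GB at K = 1,000 and 16 GB at K = 4,000, before any compression.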

14.4.2 Patch size

- Determines how much spatial context each feature captures
- Larger patches see more context but increase computation
- Typical size: 3x3 or 5x5 pixels
- Match the patch size to the imagery resolution

14.4.3 Input channels

- Depends on the spectral bands available in the imagery
- RGB imagery has 3 channels
- Additional bands can be used
- More bands give richer spectral information but increase computation
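
The number of features, the patch size, and the number of input channels jointly determine the cost of the convolution step. As a rough sketch, assuming a stride of 1 and no padding, the number of multiply-accumulate operations per image can be estimated as follows. `conv_macs` is a hypothetical helper, and actual runtimes also depend on hardware and on the library implementation.

```python
def conv_macs(height, width, in_channels, patch_size, num_filters):
    """Approximate multiply-accumulates for one image (stride 1, no padding)."""
    out_h = height - patch_size + 1
    out_w = width - patch_size + 1
    return out_h * out_w * patch_size**2 * in_channels * num_filters


base = conv_macs(256, 256, in_channels=3, patch_size=3, num_filters=4000)
larger_patch = conv_macs(256, 256, in_channels=3, patch_size=5, num_filters=4000)
more_bands = conv_macs(256, 256, in_channels=4, patch_size=3, num_filters=4000)

print(f"5x5 vs 3x3 patches: {larger_patch / base:.2f}x")
print(f"4 vs 3 channels:    {more_bands / base:.2f}x")
```

In this estimate, moving from 3x3 to 5x5 patches costs roughly 2.7 times as much on a 256x256 image, while adding a fourth band costs one third more.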

14.5 Practical considerations

14.5.1 Memory management

When processing large imagery datasets:
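
One common way to keep memory bounded, sketched here as an assumption rather than a prescribed MOSAIKS workflow, is to featurize images in fixed-size batches and write each batch into a preallocated output array. The names `iter_batches` and `featurize_batch` are hypothetical; `featurize_batch` stands in for whatever model call produces the features.

```python
import numpy as np


def iter_batches(num_items, batch_size):
    """Yield (start, stop) index pairs covering range(num_items)."""
    for start in range(0, num_items, batch_size):
        yield start, min(start + batch_size, num_items)


def featurize_batch(images):
    """Stand-in featurizer: per-channel means, shape (batch, channels)."""
    return images.mean(axis=(2, 3))


num_images, channels = 10, 4
# Toy data held in memory; in practice each batch would be read from disk.
images = np.random.default_rng(0).random((num_images, channels, 16, 16), dtype=np.float32)
features = np.empty((num_images, channels), dtype=np.float32)

for start, stop in iter_batches(num_images, batch_size=4):
    # Only one batch of images is featurized at a time.
    features[start:stop] = featurize_batch(images[start:stop])
```

The batch size is the main knob: it can be lowered until a batch of images plus its intermediate activations fits in GPU or system memory.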

14.5.2 Storage formats

Efficient formats for large feature matrices:
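
As one possible option (an assumption for illustration, not a format the source recommends), NumPy's `.npy` format can hold a dense feature matrix and be memory-mapped, so that rows are written or read without loading the whole file. The file name and matrix shape below are hypothetical.

```python
import os
import tempfile

import numpy as np

num_locations, num_features = 1000, 64  # hypothetical sizes
path = os.path.join(tempfile.mkdtemp(), "features.npy")

# Create the file on disk and fill it in chunks of rows.
out = np.lib.format.open_memmap(
    path, mode="w+", dtype=np.float32, shape=(num_locations, num_features)
)
rng = np.random.default_rng(0)
for start in range(0, num_locations, 250):
    out[start:start + 250] = rng.random((250, num_features), dtype=np.float32)
out.flush()
del out

# Later: open read-only and pull a slice without reading the full matrix.
features = np.load(path, mmap_mode="r")
subset = np.asarray(features[:10])
print(features.shape, features.dtype, subset.shape)
```

Storing 32-bit rather than 64-bit floats halves the file size; whether that precision is sufficient for a given downstream model is worth checking.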

14.5.3 Parallel processing

For large-scale feature extraction:
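
A simple way to parallelize, sketched here under the assumption that tiles can be featurized independently of one another, is to split the tile indices into disjoint shards and give each worker or job one shard. `shard_indices` is a hypothetical helper.

```python
def shard_indices(num_items, num_shards, shard_id):
    """Return the item indices assigned to one shard (round-robin split)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(range(shard_id, num_items, num_shards))


# Hypothetical example: 10 tiles split across 3 workers.
shards = [shard_indices(10, 3, worker) for worker in range(3)]
for worker, indices in enumerate(shards):
    print(worker, indices)
```

Each shard can then be handed to a separate process, GPU, or cluster job that writes its own output file, with the files combined once all shards finish.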

14.6 Quality control

Important checks during feature extraction:

- Input validation
  - Image dimensions
  - Value ranges
  - Missing data
  - Band ordering
- Feature statistics
  - Distribution checks
  - Zero/missing values
  - Correlation analysis
  - Feature importance
- Performance monitoring
  - Memory usage
  - Processing speed
  - GPU utilization
  - Storage efficiency
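
A few of these checks can be automated. The sketch below assumes images arrive as NumPy arrays of shape (channels, height, width) and features as a (locations, K) matrix; the function names and the expected value range are hypothetical and should be adapted to the imagery in use.

```python
import numpy as np


def validate_image(image, expected_channels, value_range=(0.0, 1.0)):
    """Return a list of problems found in one (C, H, W) image array."""
    problems = []
    if image.ndim != 3 or image.shape[0] != expected_channels:
        problems.append(f"unexpected shape {image.shape}")
    if np.isnan(image).any():
        problems.append("contains missing (NaN) values")
    low, high = value_range
    if np.nanmin(image) < low or np.nanmax(image) > high:
        problems.append("values outside expected range")
    return problems


def summarize_features(features):
    """Basic statistics for a (locations, K) feature matrix."""
    return {
        "num_nan": int(np.isnan(features).sum()),
        "num_all_zero_features": int((features == 0).all(axis=0).sum()),
        "mean": float(np.nanmean(features)),
    }


good = np.full((4, 8, 8), 0.5)
bad = good.copy()
bad[0, 0, 0] = np.nan
print(validate_image(good, expected_channels=4))  # []
print(validate_image(bad, expected_channels=4))
```

Features that are zero at every location carry no information, so counting them is a cheap signal that the bias, value scaling, or band ordering may need attention.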

14.7 Best practices

- Documentation
  - Record all parameters
  - Track data sources
  - Document processing steps
  - Note any issues
- Testing
  - Unit tests for functions
  - Integration tests
  - Performance benchmarks
  - Validation checks
- Version control
  - Code versioning
  - Feature versioning
  - Parameter tracking
  - Result logging
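
Recording parameters can be as simple as writing a small JSON record next to each feature file. This sketch uses only the Python standard library; the field names, the data source label, and the idea of tagging a run with a hash of its parameters are assumptions for illustration.

```python
import hashlib
import json


def make_run_record(params, data_source):
    """Bundle parameters with a short hash that identifies the configuration."""
    canonical = json.dumps(params, sort_keys=True)
    config_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {"config_id": config_id, "params": params, "data_source": data_source}


params = {
    "in_channels": 4,
    "features": 4000,
    "kernel_size": 3,
    "bias": -1.0,
    "seed": 42,
    "mode": "empirical",
}
record = make_run_record(params, data_source="hypothetical-imagery-v1")
print(json.dumps(record, indent=2))
```

Because the hash is computed from the sorted parameters, two runs with the same settings get the same `config_id`, which makes it easy to tell which feature files are comparable.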

Looking forward

In the next chapter, we’ll work through a complete example of computing custom MOSAIKS features.
