14  Computing features

This chapter is in early draft form and may be incomplete.

14.1 Overview

While the MOSAIKS API provides pre-computed features for many applications, some use cases require computing custom features. This chapter covers the technical details of generating MOSAIKS features from satellite imagery.

14.2 Requirements

To compute MOSAIKS features, you’ll need:

  • Satellite imagery (see Chapter 10)
  • GPU-enabled computing environment (recommended)
  • Python with deep learning libraries (pytorch recommended)
  • Sufficient storage for features

14.3 Implementation

There are several ways to implement MOSAIKS feature extraction:

14.3.1 torchgeo implementation

The torchgeo library provides a PyTorch implementation of random convolutional features:

import torch
from torchgeo.models import RCF

# Define model parameters
patch_size = 3  # Size of random patches
in_channels = 4  # Number of input image channels
num_filters = 4000  # Number of features to generate

# When empirical, supply a custom pytorch dataset class that returns 
# a dictionary with 'image' key. This samples the dataset for model 
# weights. If gaussian do not supply a dataset class.

# Initialize RCF model
model = RCF(
    in_channels=in_channels, 
    features=num_filters, 
    kernel_size=3, 
    bias=-1.0, 
    seed=42, 
    mode='empirical',
    dataset=CustomDataset,
)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

14.4 Feature parameters

Several key parameters influence feature extraction:

14.4.1 Number of features (K)

  • Controls feature vector dimensionality
  • More features capture more information
  • Increases computation and storage needs
  • Typical range: 1,000-8,192
  • Diminishing returns above ~4,000

14.4.2 Patch size

  • Determines spatial context captured
  • Larger patches see more context
  • But increase computation
  • Typical size: 3x3 or 5x5 pixels
  • Match to imagery resolution

14.4.3 Input channels

  • Depends on available spectral bands
  • RGB = 3 channels
  • Can use additional bands
  • More bands = richer spectral info
  • But increases computation

14.5 Practical considerations

14.5.1 Memory management

When processing large imagery datasets:

14.5.2 Storage formats

Efficient formats for large feature matrices:

14.5.3 Parallel processing

For large-scale feature extraction:

14.6 Quality control

Important checks during feature extraction:

  1. Input validation
    • Image dimensions
    • Value ranges
    • Missing data
    • Band ordering
  2. Feature statistics
    • Distribution checks
    • Zero/missing values
    • Correlation analysis
    • Feature importance
  3. Performance monitoring
    • Memory usage
    • Processing speed
    • GPU utilization
    • Storage efficiency

14.7 Best practices

  1. Documentation
    • Record all parameters
    • Track data sources
    • Document processing steps
    • Note any issues
  2. Testing
    • Unit tests for functions
    • Integration tests
    • Performance benchmarks
    • Validation checks
  3. Version control
    • Code versioning
    • Feature versioning
    • Parameter tracking
    • Result logging
Looking forward

In the next chapter, we’ll work through a complete example of computing custom MOSAIKS features.