1  Compute setup

1.1 Overview

This course primarily uses Google Colaboratory (Colab) for our computational needs. Colab is a free, cloud-based platform that allows you to write and execute Python code through your browser. It comes with many pre-installed libraries and provides free access to computing resources, including GPUs.


1.2 Requirements

To participate in the coding portions of this course, you’ll need:

  • A laptop or desktop computer
  • A reliable internet connection
  • A Google account (if you don’t have one, create one at accounts.google.com)
  • A web browser (Chromium based browsers recommended)

1.3 Getting started with Google Colab

1.3.1 Accessing Colab

  1. Go to colab.research.google.com
  2. Sign in with your Google account
  3. Click “New Notebook” to create your first Colab notebook

1.3.2 Understanding the interface

The Colab interface is similar to Jupyter notebooks, with a few key components:

  • Menu Bar: Contains options for File, Edit, View, Insert, Runtime, Tools, and Help.
  • Toolbar: Quick access to common actions like adding code/text cells.
  • Cell Area: Where you write and execute code or text.
  • Runtime Status: Shows the state of your notebook’s connection to Google’s servers.

1.3.3 Basic operations

  1. Creating Cells:
    • Code cells: Click + Code. Supports Python or R code depending on the selected runtime
    • Text cells: Click + Text. Supports Markdown and HTML tags for documentation
  2. Running Cells:
    • Click the play button next to the cell or use Shift+Enter
    • Can also select Runtime > Run the focused cell (or another Run option) from the menu

1.3.4 Important features

  1. Runtime Type:
    • Click Runtime > Change runtime type
    • Select Python 3 as the runtime
    • For GPU access: Change the hardware accelerator to one of the offered GPU types when needed
  2. File Management:
    • Files uploaded to Colab are temporary and will be lost when the runtime disconnects

    • Connect to Google Drive and save outputs there for persistent storage:

      from google.colab import drive
      drive.mount('/content/drive')
  3. Package Installation:

Install additional packages using:

# Warning: using "!conda install" is not recommended. 
# As a general rule use the magic command "%conda install" instead.
%conda install <package_name>
# Warning: using "!pip install" is not recommended. 
# As a general rule use the magic command "%pip install" instead.
%pip install <package_name>

1.3.5 Best practices

  1. Save Your Work:
    • The links in this book will make a fresh copy of a notebook as they are saved on GitHub.
    • To save any changes you make, click File > Save a copy in Drive
    • Download important notebooks locally as backups
  2. Resource Management:
    • Close unused notebooks to free up resources
    • Be aware of idle timeouts (notebooks disconnect after extended inactivity)
  3. Memory Usage:
    • Monitor memory usage through Runtime > View resources
    • The free tier of Colab provides very limited memory (12GB) and may not be sufficient for large datasets or complex models

1.3.6 Keyboard shortcuts

Here are some useful keyboard shortcuts for working in Colab:

Shortcut Action
Ctrl+M H View keyboard shortcuts
Ctrl+Enter Run current cell
Shift+Enter Run cell and move to next
Alt+Enter Run cell and insert below
Ctrl+M A Insert code cell above
Ctrl+M B Insert code cell below
Ctrl+M M Convert to text cell
Ctrl+M Y Convert to code cell
Ctrl+M D Delete current cell
Ctrl+M L Toggle line numbers
Ctrl+M O Toggle output
Ctrl+M X Cut cell
Ctrl+M C Copy cell
Ctrl+M V Paste cell below
Shift+Up/Down Select multiple cells
Ctrl+F Find and replace
Ctrl+S Save notebook
Table 1.1: Windows/Linux keyboard shortcuts
Shortcut Action
⌘+M H View keyboard shortcuts
⌘+Enter Run current cell
Shift+Enter Run cell and move to next
Option+Enter Run cell and insert below
⌘+M A Insert code cell above
⌘+M B Insert code cell below
⌘+M M Convert to text cell
⌘+M Y Convert to code cell
⌘+M D Delete current cell
⌘+M L Toggle line numbers
⌘+M O Toggle output
⌘+M X Cut cell
⌘+M C Copy cell
⌘+M V Paste cell below
Shift+Up/Down Select multiple cells
⌘+F Find and replace
⌘+S Save notebook
Table 1.2: Mac keyboard shortcuts

1.3.7 Common issues and solutions

  1. Runtime Disconnections:
    • Click “Reconnect” when prompted
    • Your variables will be reset, but saved code remains
  2. Package Installation Issues:
    • Restart the runtime after installing new packages
    • Use Runtime > Restart runtime
  3. Memory Errors:
    • Clear unnecessary variables as you go
    • Consider using smaller data samples during development

Memory errors are common when working with large datasets or complex models on the free tier of Colab. If you encounter these issues, consider using a paid version of Colab or connecting a Google Cloud Platform virtual machine (VM).

1.3.8 Getting help

  • Access Colab’s documentation: Help > Frequently Asked Questions
  • Try using Google Gemini for AI assistance.

1.4 AI assistance in Colab

Google Gemini is a powerful AI assistant seamlessly integrated with Google Colab. You can use it to generate code, comments, or markdown text to improve your notebooks. Gemini can be accessed in several ways in Colab, all starting by selecting the Gemini icon in different parts of the notebook editor.

Gemini icon

Look for this icon to indicate where you can click to access Gemini in Colab.

Here are a few ways you can use Google Gemini effectively in Colab:

1.4.1 Chat support

Click the Gemini button in the top-right corner to open a chat interface where you can ask questions about your code, debug issues, or get explanations of concepts. This option is especially useful for beginners or for tackling complex problems.

1.4.2 Code generation

Use the “Generate code” option (the sparkle icon) above any empty code cell to generate new code based on your description. You can ask it to do many different things including:

  • Loading a dataset called my_data.csv
  • Plotting a histogram of the data
  • Building a model to predict y from X

1.4.3 Code explanation

Use the “Explain code” option (the sparkle icon) above any complete code cell to open a chat interface that will automatically explain the code in the cell. This is useful for understanding code written by someone else, learning new concepts, or getting a second opinion on your work.

1.4.4 Code completion

Colab provides intelligent autocomplete as you type:

  • Press Tab to accept suggestions
  • Use Ctrl+Space (Cmd+Space on Mac) to manually trigger suggestions
  • Get real-time documentation and parameter hints

While these AI tools are helpful, always review and understand the code they suggest before using it in your work.


1.5 Accessing course notebooks

All course notebooks are hosted on GitHub and can be accessed directly in Google Colab. There are two ways to open them:

1.5.2 Method 2: Clone the notebook

To select a notebook from the repository Notebook repository:

  1. Open Google Colab (colab.research.google.com)
  2. Click File > Open Notebook
  3. Select the GitHub tab
  4. Enter the repository URL: https://github.com/[username]/[repo] (UPDATE WITH REPO)
  5. Select the notebook you want to open

1.5.3 Saving your work

When you open a notebook from GitHub in Colab, it creates a temporary copy. To save your work:

  1. Click File > Save a copy in Drive
  2. This creates your own editable copy in your Google Drive
  3. All future changes will be saved to your copy

1.5.4 Notebook organization

The course notebooks are organized into:

  • demos/: Complete demonstration notebooks
  • exercises/: Interactive notebooks with exercises to complete
  • solutions/: Complete versions of exercise notebooks

Each notebook includes:

  • Clear instructions and explanations in markdown cells
  • Code cells with examples or exercises
  • TO DO sections for exercises
  • Validation cells to check your work

1.6 Data access and management

There are several ways to access data in Colab notebooks. Here are the main approaches:

1.6.1 Direct downloads

For data hosted on repositories like Zenodo, you can download directly using wget:

# Download the data
!wget https://zenodo.org/records/14040658/files/Data.zip

# Unzip the data
!unzip Data.zip

1.6.2 Google Drive integration

1.6.2.1 Mount Google Drive

For data stored in Google Drive:

  1. First, mount your Google Drive:

    from google.colab import drive
    drive.mount('/content/drive')
  2. Access your data using the mounted path:

    drive_path = "/content/drive/MyDrive/<project_folder>"

1.6.2.2 Copy data to the VM (optional)

For better performance, make local copies of the data on the virtual machine (VM):

import os
import shutil

# Create local directory
local_dir = "/content/data/"
os.makedirs(local_dir, exist_ok=True)

# Copy data from Drive to VM
drive_data = os.path.join(drive_path, "my_data") 
shutil.copytree(drive_data, local_dir, dirs_exist_ok=True)

Remember that the VM’s storage is temporary - files will be deleted when the runtime disconnects. Always keep a backup of your data in Drive or another permanent storage location.

1.6.2.2.1 Why copy data to the VM?

When working with data in Colab, copying files from Google Drive to the virtual machine (VM) can significantly improve performance:

  1. Faster Access: Reading directly from Google Drive requires data to be transferred over the network for each operation. Local VM storage provides much faster read/write speeds.
  2. Reduced Latency: Network latency between Colab and Google Drive can slow down operations that require multiple data accesses. Local data eliminates this latency.
  3. More Reliable: Network connectivity issues or Drive access problems won’t interrupt your analysis once data is copied locally.
  4. Better for Iterative Processing: If your code needs to read the same data multiple times (like in machine learning training loops), local access is much more efficient.

For example, reading a 1 GB dataset from Drive might take 30 seconds, while reading from local VM storage could take just a few seconds. The time spent copying data once at the start of your session can save significant time during analysis. This is especially true in a notebook environment where a user may develop code that repeatedly accesses the same data files, but cannot store it all in memory (e.g., many image files).

1.6.2.3 Save outputs to Google Drive

To save outputs or models to Google Drive:

# Set the output directory
output_dir = "/content/drive/MyDrive/project_folder/output"

# Save outputs
shutil.copytree(local_output, output_dir, dirs_exist_ok=True)

This ensures that any work done in the notebook is saved to your Google Drive for future reference. If output files are not copied and remain in the VM, they will be lost when the runtime disconnects.


1.7 Local environment setup

While this book’s primary approach is to use Google Colab, some learners may prefer or need to run code locally. The book is largely setup to do this, though the user will need to manage their own computing environment. For that purpose, we provide an environment.yml file (located in the environment directory of this book). Below are the steps to get you set up with Miniconda and create a local environment.

Though local environments can offer more control, we strongly recommend Google Colab for consistency and free cloud-based resources. This local setup is purely optional and might be more suitable for those with particular dependencies or advanced setups.

1.7.1 Downloading and installing Miniconda

Miniconda is a minimal installer for conda. Choose the installer for your operating system from the links below and follow the prompts.

  1. Go to the Miniconda Windows Installer.
  2. Download the .exe installer for your Windows system (64-bit recommended).
  3. Double-click the installer and follow the on-screen instructions.
  4. When prompted, check the option to Add Miniconda to PATH or select “Install for All Users” which typically adds conda to PATH automatically.
  1. Go to the Miniconda macOS Installer.
  2. Download the .pkg (or .sh if you prefer) installer for macOS (64-bit).
  3. Double-click the installer and follow the on-screen instructions.
  4. When prompted, check the option to Add Miniconda to PATH or add the appropriate path lines to your ~/.zshrc or ~/.bash_profile file manually.
  1. Go to the Miniconda Linux Installer.
  2. Download the .sh installer for your Linux distribution (64-bit recommended).
  3. Open a terminal and run bash Miniconda3-latest-Linux-x86_64.sh.
  4. Follow the prompts; consider allowing the installer to initialize Miniconda for your shell (adding conda to your PATH).

1.7.2 Adding conda to your PATH

If you did not add conda to your PATH during installation, you can manually do so by adding a line to your shell configuration file (~/.bashrc, ~/.zshrc, or similar):

# Example for Linux/macOS users
export PATH="$HOME/miniconda3/bin:$PATH"

For Windows, ensure that you selected the option to add conda to PATH during installation, or run the Anaconda Prompt (which automatically has conda available) to manage your environment.

1.7.3 Creating a local environment from environment.yml

In the environment directory of the course repository, you will find a file named environment.yml. This file lists all the packages needed for the local setup.

  1. Clone or download the book repository to your local machine.

  2. Open a terminal (or Anaconda Prompt on Windows).

  3. Navigate to the folder containing environment.yml.

    cd path/to/MOSAIKS-Training-Manual/environment
  4. Create the environment:

    conda env create -f environment.yml
  5. Activate the environment:

    conda activate <environment_name>

    Where <environment_name> is the name specified in environment.yml (check the name: field in the file). In this case the name is mosaiks.

1.7.4 Using the new environment in VS Code

Visual Studio Code (VS Code) can detect and use your new conda environment for Python development.

  1. Open VS Code.
  2. Install the Python extension (if not already installed).
  3. Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) and type “Python: Select Interpreter”.
  4. Select the interpreter associated with your newly created environment (it should be listed by name or path).
  5. Open or create a new Python file or notebook, and verify that VS Code is using the correct environment (you can see the chosen environment in the bottom-right corner of VS Code).

1.7.5 Other environment managers

While conda is a common tool for managing Python environments, there are other popular options such as:

Each has its own configuration files and setup instructions. If you prefer these tools or already use them, you can typically replicate the packages listed in environment.yml. Check the respective tool’s documentation for specific instructions on how to translate the dependencies.


Looking forward

In the next chapter, we will take a closer look at the MOSAIKS framework, its core concepts, and how it can be applied to solve real-world problems.