1 Compute setup
1.1 Overview
This course primarily uses Google Colaboratory (Colab) for our computational needs. Colab is a free, cloud-based platform that allows you to write and execute Python code through your browser. It comes with many pre-installed libraries and provides free access to computing resources, including GPUs.
1.2 Requirements
To participate in the coding portions of this course, you’ll need:
- A laptop or desktop computer
- A reliable internet connection
- A Google account (if you don’t have one, create one at accounts.google.com)
- A web browser (Chromium-based browsers recommended)
1.3 Getting started with Google Colab
1.3.1 Accessing Colab
- Go to colab.research.google.com
- Sign in with your Google account
- Click “New Notebook” to create your first Colab notebook
1.3.2 Understanding the interface
The Colab interface is similar to Jupyter notebooks, with a few key components:
- Menu Bar: Contains options for File, Edit, View, Insert, Runtime, Tools, and Help.
- Toolbar: Quick access to common actions like adding code/text cells.
- Cell Area: Where you write and execute code or text.
- Runtime Status: Shows the state of your notebook’s connection to Google’s servers.
1.3.3 Basic operations
- Creating Cells:
- Code cells: Click + Code. Supports Python or R code depending on the selected runtime
- Text cells: Click + Text. Supports Markdown and HTML tags for documentation
- Running Cells:
- Click the play button next to the cell or use Shift+Enter
- Can also select Runtime > Run the focused cell (or another Run option) from the menu
1.3.4 Important features
- Runtime Type:
- Click Runtime > Change runtime type
- Select Python 3 as the runtime
- For GPU access: Change the hardware accelerator to one of the offered GPU types when needed
- File Management:
Files uploaded to Colab are temporary and will be lost when the runtime disconnects
Connect to Google Drive and save outputs there for persistent storage:
from google.colab import drive
drive.mount('/content/drive')
- Package Installation:
Install additional packages using:
# Warning: using "!conda install" is not recommended.
# As a general rule use the magic command "%conda install" instead.
%conda install <package_name>
# Warning: using "!pip install" is not recommended.
# As a general rule use the magic command "%pip install" instead.
%pip install <package_name>
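Before installing anything, it can help to check what the runtime already provides. The following is a minimal sketch using only the standard library. The helper names `missing_packages` and `has_nvidia_gpu` are illustrative and not part of Colab, and the GPU check assumes that a GPU runtime exposes the `nvidia-smi` command on its PATH.

```python
import importlib.util
import shutil


def missing_packages(names):
    """Return the import names from `names` that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_nvidia_gpu():
    """Assumption: a GPU runtime exposes the `nvidia-smi` command on PATH."""
    return shutil.which("nvidia-smi") is not None


# "json" ships with Python; the second name is a deliberately fake package.
print(missing_packages(["json", "surely_not_a_real_package_xyz"]))
print("GPU detected:", has_nvidia_gpu())
```

Note that the check uses import names, which can differ from the name you install (for example, `scikit-learn` is imported as `sklearn`).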
1.3.5 Best practices
- Save Your Work:
- The links in this book open a fresh copy of each notebook as it is saved on GitHub.
- To save any changes you make, click File > Save a copy in Drive
- Download important notebooks locally as backups
- Resource Management:
- Close unused notebooks to free up resources
- Be aware of idle timeouts (notebooks disconnect after extended inactivity)
- Memory Usage:
- Monitor memory usage through Runtime > View resources
- The free tier of Colab provides very limited memory (12 GB) and may not be sufficient for large datasets or complex models
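Memory can also be checked from inside a notebook cell. The sketch below assumes the runtime is a Linux machine that exposes `/proc/meminfo`. The function names are illustrative, and `memory_summary_gb` returns `None` on systems without that file.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kibibytes}."""
    values = {}
    for line in text.splitlines():
        field, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[field.strip()] = int(parts[0])
    return values


def memory_summary_gb(meminfo_path="/proc/meminfo"):
    """Return (total, available) memory in GiB, or None if the file is absent."""
    path = Path(meminfo_path)
    if not path.exists():
        return None
    info = parse_meminfo(path.read_text())
    kib_per_gib = 1024 ** 2
    return (
        info.get("MemTotal", 0) / kib_per_gib,
        info.get("MemAvailable", 0) / kib_per_gib,
    )


print(memory_summary_gb())
```

Running this before and after loading a dataset gives a rough idea of how much headroom remains.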
1.3.6 Keyboard shortcuts
Here are some useful keyboard shortcuts for working in Colab. The first table lists the Windows/Linux shortcuts and the second lists the macOS equivalents:
Shortcut | Action |
---|---|
Ctrl+M H | View keyboard shortcuts |
Ctrl+Enter | Run current cell |
Shift+Enter | Run cell and move to next |
Alt+Enter | Run cell and insert below |
Ctrl+M A | Insert code cell above |
Ctrl+M B | Insert code cell below |
Ctrl+M M | Convert to text cell |
Ctrl+M Y | Convert to code cell |
Ctrl+M D | Delete current cell |
Ctrl+M L | Toggle line numbers |
Ctrl+M O | Toggle output |
Ctrl+M X | Cut cell |
Ctrl+M C | Copy cell |
Ctrl+M V | Paste cell below |
Shift+Up/Down | Select multiple cells |
Ctrl+F | Find and replace |
Ctrl+S | Save notebook |
Shortcut | Action |
---|---|
⌘+M H | View keyboard shortcuts |
⌘+Enter | Run current cell |
Shift+Enter | Run cell and move to next |
Option+Enter | Run cell and insert below |
⌘+M A | Insert code cell above |
⌘+M B | Insert code cell below |
⌘+M M | Convert to text cell |
⌘+M Y | Convert to code cell |
⌘+M D | Delete current cell |
⌘+M L | Toggle line numbers |
⌘+M O | Toggle output |
⌘+M X | Cut cell |
⌘+M C | Copy cell |
⌘+M V | Paste cell below |
Shift+Up/Down | Select multiple cells |
⌘+F | Find and replace |
⌘+S | Save notebook |
1.3.7 Common issues and solutions
- Runtime Disconnections:
- Click “Reconnect” when prompted
- Your variables will be reset, but saved code remains
- Package Installation Issues:
- Restart the runtime after installing new packages
- Use Runtime > Restart runtime (labelled Restart session in recent versions of Colab)
- Memory Errors:
- Clear unnecessary variables as you go
- Consider using smaller data samples during development
Memory errors are common when working with large datasets or complex models on the free tier of Colab. If you encounter these issues, consider using a paid version of Colab or connecting a Google Cloud Platform virtual machine (VM).
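The two suggestions above (work on a smaller sample, and clear variables you no longer need) might look like the following sketch. The helper `read_first_rows` is hypothetical, and the in-memory text stands in for a large CSV file on disk.

```python
import csv
import gc
import io
from itertools import islice


def read_first_rows(csv_file, n_rows):
    """Read the header plus at most `n_rows` data rows from an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(islice(reader, n_rows))
    return header, rows


# Stand-in for a large file; in practice pass open("my_data.csv", newline="").
fake_file = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")
header, sample = read_first_rows(fake_file, n_rows=2)
print(header, sample)

# Drop a large object you no longer need, then ask Python to collect garbage.
del sample
gc.collect()
```

Because only the requested rows are read, the rest of the file never has to fit in memory during development.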
1.3.8 Getting help
- Access Colab’s documentation: Help > Frequently Asked Questions
- Try using Google Gemini for AI assistance.
1.4 AI assistance in Colab
Google Gemini is a powerful AI assistant seamlessly integrated with Google Colab. You can use it to generate code, comments, or markdown text to improve your notebooks. Gemini can be accessed in several ways in Colab, all starting by selecting the Gemini icon in different parts of the notebook editor.
Look for the Gemini icon (a sparkle) to see where you can click to access Gemini in Colab.
Here are a few ways you can use Google Gemini effectively in Colab:
1.4.1 Chat support
Click the Gemini button in the top-right corner to open a chat interface where you can ask questions about your code, debug issues, or get explanations of concepts. This option is especially useful for beginners or for tackling complex problems.
1.4.2 Code generation
Use the “Generate code” option (the sparkle icon) above any empty code cell to generate new code based on your description. You can ask it to do many different things including:
- Loading a dataset called my_data.csv
- Plotting a histogram of the data
- Building a model to predict y from X
1.4.3 Code explanation
Use the “Explain code” option (the sparkle icon) above any complete code cell to open a chat interface that will automatically explain the code in the cell. This is useful for understanding code written by someone else, learning new concepts, or getting a second opinion on your work.
1.4.4 Code completion
Colab provides intelligent autocomplete as you type:
- Press Tab to accept suggestions
- Use Ctrl+Space (Cmd+Space on Mac) to manually trigger suggestions
- Get real-time documentation and parameter hints
While these AI tools are helpful, always review and understand the code they suggest before using it in your work.
1.5 Accessing course notebooks
All course notebooks are hosted on GitHub and can be accessed directly in Google Colab. There are two ways to open them:
1.5.1 Method 1: Direct links
Each section of this book includes direct “Open in Colab” links for relevant notebooks. Simply click the badge to open the notebook.
This method will open a fresh copy of the notebook as it is saved on GitHub. If you have already clicked the badge once, made changes, and saved your notebook, then you will need to navigate to your drive folder where it is saved to access those changes.
Clicking the badge in this book will always open a fresh copy.
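“Open in Colab” badges conventionally point at a Colab URL that encodes the GitHub location of the notebook. As a rough sketch of that pattern, the hypothetical helper below builds such a link. The user, repository, branch, and path values are placeholders, not the real course repository.

```python
from urllib.parse import quote


def colab_github_url(user, repo, branch, notebook_path):
    """Build a Colab link for a notebook stored in a public GitHub repository."""
    path = quote(notebook_path.lstrip("/"))
    return f"https://colab.research.google.com/github/{user}/{repo}/blob/{branch}/{path}"


# All four values below are placeholders, not the real course repository.
print(colab_github_url("some-user", "some-repo", "main", "demos/example.ipynb"))
```

Opening such a link loads the notebook as it currently exists on GitHub, which is why it always starts from a fresh copy.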
1.5.2 Method 2: Clone the notebook
To select a notebook from the notebook repository:
- Open Google Colab (colab.research.google.com)
- Click File > Open Notebook
- Select the GitHub tab
- Enter the repository URL: https://github.com/[username]/[repo] (UPDATE WITH REPO)
- Select the notebook you want to open
1.5.3 Saving your work
When you open a notebook from GitHub in Colab, it creates a temporary copy. To save your work:
- Click
File
>Save a copy in Drive
- This creates your own editable copy in your Google Drive
- All future changes will be saved to your copy
1.5.4 Notebook organization
The course notebooks are organized into:
- demos/: Complete demonstration notebooks
- exercises/: Interactive notebooks with exercises to complete
- solutions/: Complete versions of exercise notebooks
Each notebook includes:
- Clear instructions and explanations in markdown cells
- Code cells with examples or exercises
- TO DO sections for exercises
- Validation cells to check your work
1.6 Data access and management
There are several ways to access data in Colab notebooks. Here are the main approaches:
1.6.1 Direct downloads
For data hosted on repositories like Zenodo, you can download directly using wget:
# Download the data
!wget https://zenodo.org/records/14040658/files/Data.zip
# Unzip the data
!unzip Data.zip
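If you prefer to stay in Python rather than shell commands, the same download-and-extract step can be sketched with the standard library. The helper names `download` and `unzip` are illustrative. The demonstration builds a tiny archive offline so that nothing is fetched when the cell runs.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download(url, dest):
    """Download `url` to `dest` unless the file is already there."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def unzip(zip_path, out_dir):
    """Extract every member of `zip_path` into `out_dir` and list the names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Offline demonstration: build a tiny archive in a temporary folder and extract it.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("readme.txt", "hello")
    print(unzip(demo_zip, Path(tmp) / "out"))
```

For the archive in the wget example, the call would be `unzip(download("https://zenodo.org/records/14040658/files/Data.zip", "Data.zip"), "data")`, where the output folder name `data` is an arbitrary choice.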
1.6.2 Google Drive integration
1.6.2.1 Mount Google Drive
For data stored in Google Drive:
First, mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Access your data using the mounted path:
drive_path = "/content/drive/MyDrive/<project_folder>"
1.6.2.2 Copy data to the VM (optional)
For better performance, make local copies of the data on the virtual machine (VM):
import os
import shutil
# Create local directory
local_dir = "/content/data/"
os.makedirs(local_dir, exist_ok=True)
# Copy data from Drive to VM
drive_data = os.path.join(drive_path, "my_data")
shutil.copytree(drive_data, local_dir, dirs_exist_ok=True)
Remember that the VM’s storage is temporary - files will be deleted when the runtime disconnects. Always keep a backup of your data in Drive or another permanent storage location.
1.6.2.2.1 Why copy data to the VM?
When working with data in Colab, copying files from Google Drive to the virtual machine (VM) can significantly improve performance:
- Faster Access: Reading directly from Google Drive requires data to be transferred over the network for each operation. Local VM storage provides much faster read/write speeds.
- Reduced Latency: Network latency between Colab and Google Drive can slow down operations that require multiple data accesses. Local data eliminates this latency.
- More Reliable: Network connectivity issues or Drive access problems won’t interrupt your analysis once data is copied locally.
- Better for Iterative Processing: If your code needs to read the same data multiple times (like in machine learning training loops), local access is much more efficient.
For example, reading a 1 GB dataset from Drive might take 30 seconds, while reading from local VM storage could take just a few seconds. The time spent copying data once at the start of your session can save significant time during analysis. This is especially true in a notebook environment where a user may develop code that repeatedly accesses the same data files, but cannot store it all in memory (e.g., many image files).
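To see how large the difference is in your own session, you could time a full read of each location. The sketch below is one way to do that. The helper `time_read` is hypothetical, and the demonstration uses a throwaway folder rather than real Drive data.

```python
import tempfile
import time
from pathlib import Path


def time_read(folder):
    """Read every file under `folder` once; return (seconds, total_bytes)."""
    start = time.perf_counter()
    total = 0
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            total += len(path.read_bytes())
    return time.perf_counter() - start, total


# Demonstration on a throwaway folder; in Colab you might compare the Drive
# folder and the local copy from the previous step instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.bin").write_bytes(b"x" * 1000)
    seconds, n_bytes = time_read(tmp)
    print(f"read {n_bytes} bytes in {seconds:.4f} s")
```

Treat the numbers as indicative only: files that were read recently may be served from a cache, which makes repeat measurements look faster.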
1.6.2.3 Save outputs to Google Drive
To save outputs or models to Google Drive:
# Set the output directory
output_dir = "/content/drive/MyDrive/project_folder/output"
# Save outputs (local_output is the VM folder that holds your results)
shutil.copytree(local_output, output_dir, dirs_exist_ok=True)
This ensures that any work done in the notebook is saved to your Google Drive for future reference. If output files are not copied and remain in the VM, they will be lost when the runtime disconnects.
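Since files left on the VM are lost, it may be worth confirming that the copy actually completed. The following sketch copies a folder and reports any files that did not arrive. The helper names are illustrative, and temporary folders stand in for the VM output folder and the Drive destination.

```python
import shutil
import tempfile
from pathlib import Path


def relative_files(folder):
    """Return the set of file paths under `folder`, relative to it."""
    root = Path(folder)
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def copy_and_verify(src, dst):
    """Copy `src` into `dst`, then return the files that failed to arrive."""
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return relative_files(src) - relative_files(dst)


# Demonstration with temporary folders standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    local_output = Path(tmp) / "local_output"
    local_output.mkdir()
    (local_output / "model.txt").write_text("weights")
    missing = copy_and_verify(local_output, Path(tmp) / "drive" / "output")
    print("missing files:", missing)
```

An empty result means every file in the source folder has a counterpart at the destination. The check compares file names only, not file contents.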
1.7 Local environment setup
While this book’s primary approach is to use Google Colab, some learners may prefer or need to run code locally. The book is largely set up to support this, though you will need to manage your own computing environment. For that purpose, we provide an environment.yml file (located in the environment directory of this book). Below are the steps to get you set up with Miniconda and create a local environment.
Though local environments can offer more control, we strongly recommend Google Colab for consistency and free cloud-based resources. This local setup is purely optional and might be more suitable for those with particular dependencies or advanced setups.
1.7.1 Downloading and installing Miniconda
Miniconda is a minimal installer for conda. Choose the installer for your operating system from the links below and follow the prompts.
Windows:
- Go to the Miniconda Windows Installer.
- Download the .exe installer for your Windows system (64-bit recommended).
- Double-click the installer and follow the on-screen instructions.
- When prompted, check the option to Add Miniconda to PATH or select “Install for All Users” which typically adds conda to PATH automatically.
macOS:
- Go to the Miniconda macOS Installer.
- Download the .pkg (or .sh if you prefer) installer for macOS (64-bit).
- Double-click the installer and follow the on-screen instructions.
- When prompted, check the option to Add Miniconda to PATH or add the appropriate path lines to your ~/.zshrc or ~/.bash_profile file manually.
Linux:
- Go to the Miniconda Linux Installer.
- Download the .sh installer for your Linux distribution (64-bit recommended).
- Open a terminal and run bash Miniconda3-latest-Linux-x86_64.sh.
- Follow the prompts; consider allowing the installer to initialize Miniconda for your shell (adding conda to your PATH).
1.7.2 Adding conda to your PATH
If you did not add conda to your PATH during installation, you can manually do so by adding a line to your shell configuration file (~/.bashrc, ~/.zshrc, or similar):
# Example for Linux/macOS users
export PATH="$HOME/miniconda3/bin:$PATH"
For Windows, ensure that you selected the option to add conda to PATH during installation, or run the Anaconda Prompt (which automatically has conda available) to manage your environment.
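A quick way to confirm that the PATH change took effect is to ask Python where it finds the `conda` command. This is a small sketch using the standard library. The helper name `find_tool` is illustrative.

```python
import shutil


def find_tool(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


conda_path = find_tool("conda")
if conda_path:
    print("conda found at", conda_path)
else:
    print("conda is not on PATH; check your shell configuration")
```

Open a new terminal before running the check, since changes to a shell configuration file only apply to shells started afterwards.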
1.7.3 Creating a local environment from environment.yml
In the environment directory of the course repository, you will find a file named environment.yml. This file lists all the packages needed for the local setup.
- Clone or download the book repository to your local machine.
- Open a terminal (or Anaconda Prompt on Windows).
- Navigate to the folder containing environment.yml:
cd path/to/MOSAIKS-Training-Manual/environment
- Create the environment:
conda env create -f environment.yml
- Activate the environment:
conda activate <environment_name>
Where <environment_name> is the name specified in environment.yml (check the name: field in the file). In this case the name is mosaiks.
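If you want to look up the environment name without opening the file by hand, a few lines of Python are enough. The sketch below reads the top-level `name:` field from the file's text. The helper is hypothetical, the example contents are illustrative rather than the real environment.yml, and the parsing is deliberately simple (it does not handle trailing comments on the `name:` line).

```python
def environment_name(yml_text):
    """Return the value of the top-level `name:` field, or None if absent."""
    for line in yml_text.splitlines():
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip().strip("'\"")
    return None


# Illustrative file contents; the real environment.yml lists other packages.
example = "name: mosaiks\nchannels:\n  - conda-forge\ndependencies:\n  - python\n"
print(environment_name(example))
```

To use it on the real file, pass in the result of `open("environment.yml").read()` from inside the environment directory.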
1.7.4 Using the new environment in VS Code
Visual Studio Code (VS Code) can detect and use your new conda environment for Python development.
- Open VS Code.
- Install the Python extension (if not already installed).
- Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) and type “Python: Select Interpreter”.
- Select the interpreter associated with your newly created environment (it should be listed by name or path).
- Open or create a new Python file or notebook, and verify that VS Code is using the correct environment (you can see the chosen environment in the bottom-right corner of VS Code).
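To double-check the last step from inside Python, you can print which interpreter is running. The sketch below is one way to do that. The helper name is illustrative, and the `CONDA_DEFAULT_ENV` variable is only set when conda has activated an environment, so it may be empty in some editor-launched sessions.

```python
import os
import sys


def active_environment():
    """Describe the interpreter that is running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in active_environment().items():
    print(f"{key}: {value}")
```

If the executable path points inside your Miniconda installation and includes the environment name, VS Code is using the environment you created.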
1.7.5 Other environment managers
While conda is a common tool for managing Python environments, other popular options exist.
Each has its own configuration files and setup instructions. If you prefer these tools or already use them, you can typically replicate the packages listed in environment.yml. Check the respective tool’s documentation for specific instructions on how to translate the dependencies.
In the next chapter, we will take a closer look at the MOSAIKS framework, its core concepts, and how it can be applied to solve real-world problems.