How-To Guide: From Shapefile to Calibrated Model

This guide walks through the complete SWIM-RS workflow starting with just a shapefile of fields or polygons.

What You Get

After running SWIM-RS, you receive daily time series for each field including:

Evapotranspiration (ET, mm/day) — calibrated to satellite observations
Snow water equivalent (SWE, mm) — accumulation and melt
Soil moisture (root zone depletion, mm)
Deep percolation (groundwater recharge, mm/day)
Runoff (mm/day)
Irrigation (simulated applied water, mm/day)
Crop coefficients (Kcb, Ke, Ks)

SWIM-RS calibrates the model against satellite ET fraction (ETf) from OpenET models, so the outputs stay consistent with remote sensing observations while providing complete daily coverage and physically based partitioning.

Prerequisites

SWIM-RS installed — see Installation Guide
Earth Engine account — sign up at https://earthengine.google.com/
EE authenticated — run earthengine authenticate and complete OAuth

Verify your setup:

source .venv/bin/activate
swim --help
python -c "import ee; ee.Initialize(); print('EE OK')"

Overview

The SWIM-RS workflow has four main steps:

swim extract → swim prep → swim calibrate → swim evaluate

Step	What it does	Time
`extract`	Exports NDVI, ETf, met, properties from EE/GridMET	Hours (EE queue)
`prep`	Builds `.swim` container, computes dynamics	Minutes
`calibrate`	Runs PEST++ IES parameter estimation	Minutes to hours
`evaluate`	Runs calibrated model, writes output CSVs	Seconds

Step 1: Prepare Your Shapefile

Your shapefile needs:

Unique ID column — each polygon must have a unique identifier (e.g., site_id, field_id)
Valid geometries — no self-intersections or null geometries
Projected or WGS84 — SWIM-RS will reproject as needed

Optional but recommended:

State column (US state codes, e.g., MT, OR), used for automatic irrigation mask selection

Example shapefile structure:

site_id	state	geometry
field_001	MT	POLYGON(...)
field_002	MT	POLYGON(...)
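The unique-ID requirement above can be checked before any Earth Engine work starts. The following is a minimal, standard-library sketch that assumes the ID values have already been read from the shapefile's attribute table with a reader of your choice. The function names `find_duplicate_ids` and `find_missing_ids` are hypothetical and are not part of SWIM-RS.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: count} for every ID value that occurs more than once."""
    counts = Counter(ids)
    return {value: n for value, n in counts.items() if n > 1}


def find_missing_ids(ids):
    """Return the row positions whose ID is None or blank."""
    return [
        i for i, value in enumerate(ids)
        if value is None or str(value).strip() == ""
    ]


# Hypothetical values of the `site_id` column, one per polygon.
site_ids = ["field_001", "field_002", "field_002", ""]
print(find_duplicate_ids(site_ids))  # {'field_002': 2}
print(find_missing_ids(site_ids))    # [3]
```

Both results should be empty before the shapefile is used as input.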

Step 2: Create Project Directory

Set up the standard directory structure:

mkdir -p my_project/data/gis
cp /path/to/my_fields.shp my_project/data/gis/
cp /path/to/my_fields.shx my_project/data/gis/
cp /path/to/my_fields.dbf my_project/data/gis/
cp /path/to/my_fields.prj my_project/data/gis/
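The same setup can be scripted. The sketch below is one possible way to copy a shapefile together with whichever sidecar files exist next to it; the helper name `copy_shapefile` and the inclusion of the optional `.cpg` extension are assumptions made for illustration, not SWIM-RS functionality.

```python
import shutil
from pathlib import Path

# Extensions copied in the shell commands above, plus the optional .cpg file.
SIDECAR_EXTS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def copy_shapefile(src_shp, dest_dir):
    """Copy a shapefile and any sidecar files found beside it into dest_dir.

    Returns the names of the files that were actually copied.
    """
    src = Path(src_shp)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    copied = []
    for ext in SIDECAR_EXTS:
        part = src.with_suffix(ext)
        if part.exists():
            shutil.copy2(part, dest / part.name)
            copied.append(part.name)
    return copied
```

A call such as `copy_shapefile("/path/to/my_fields.shp", "my_project/data/gis")` would mirror the commands above. Checking that the returned list contains the `.shp`, `.shx`, `.dbf`, and `.prj` names is a quick way to catch a missing sidecar.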

Step 3: Create the TOML Config

Copy the template and customize it:

cp /path/to/swim-rs/docs/template.toml my_project/my_project.toml

Edit the TOML to set your project name, shapefile path, date range, and other options. The template includes comments explaining each setting. Key fields to customize:

project — your project name
fields_shapefile — path to your shapefile
feature_id — the unique ID column in your shapefile
start_date / end_date — your study period
etf_target_model — which OpenET model to calibrate against
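A quick pre-flight check can confirm that these fields are present before running any command. The sketch below is illustrative only: the table locations follow the template in this guide, while `REQUIRED_KEYS`, `missing_keys`, and `load_config` are hypothetical names rather than SWIM-RS APIs. It relies on `tomllib`, which is in the standard library from Python 3.11.

```python
# (section, key) pairs for the fields listed above; None means top level.
REQUIRED_KEYS = [
    (None, "project"),
    ("paths", "fields_shapefile"),
    ("ids", "feature_id"),
    ("date_range", "start_date"),
    ("date_range", "end_date"),
    ("calibration", "etf_target_model"),
]


def missing_keys(config):
    """Return 'section.key' names from REQUIRED_KEYS absent from a parsed config."""
    missing = []
    for section, key in REQUIRED_KEYS:
        table = config if section is None else config.get(section, {})
        if key not in table:
            missing.append(key if section is None else f"{section}.{key}")
    return missing


def load_config(path):
    """Parse a TOML file into a dict."""
    import tomllib  # imported lazily so missing_keys also works before 3.11

    with open(path, "rb") as f:
        return tomllib.load(f)


# Hypothetical, partially edited config, written as the dict tomllib returns.
config = {
    "project": "my_project",
    "paths": {"fields_shapefile": "{gis}/fields.shp"},
    "ids": {"feature_id": "site_id"},
    "date_range": {"start_date": "2000-01-01"},
}
print(missing_keys(config))
# ['date_range.end_date', 'calibration.etf_target_model']
```

For a real project, `missing_keys(load_config("my_project.toml"))` should return an empty list.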

The full annotated template is shown below. You can also download template.toml directly.

template.toml

# SWIM-RS Project Configuration Template
# Copy this file to your project directory and customize.
# See docs/how_to.md for detailed documentation.

# --------------------------------------------------------------------------
# Project Identity
# --------------------------------------------------------------------------
project = "my_project"  # Project name (used in output filenames)
root = "."              # Root directory for path resolution

# --------------------------------------------------------------------------
# Paths
# All paths support {variable} substitution from earlier definitions.
# --------------------------------------------------------------------------
[paths]
project_workspace = "{root}"
data = "{project_workspace}/data"

# SwimContainer (Zarr-based data store)
container = "{data}/{project}.swim"

# Remote sensing data
remote_sensing = "{data}/remote_sensing"
landsat = "{remote_sensing}/landsat"
sentinel = "{remote_sensing}/sentinel"  # Optional: for Sentinel-2 NDVI

# Meteorology
met = "{data}/met_timeseries/gridmet"

# GIS files
gis = "{data}/gis"
fields_shapefile = "{gis}/fields.shp"  # REQUIRED: your input shapefile

# GridMET mapping (created by swim extract)
gridmet_centroids = "{gis}/gridmet_centroids.shp"
gridmet_mapping = "{gis}/fields_gfid.shp"
gridmet_factors = "{gis}/fields_gfid.json"

# Bias correction rasters (optional)
correction_tifs = "{data}/bias_correction_tif"

# Properties (soils, land cover, irrigation)
properties = "{data}/properties"
irr = "{properties}/{project}_irr.csv"
ssurgo = "{properties}/{project}_ssurgo.csv"
lulc = "{properties}/{project}_landcover.csv"
properties_json = "{properties}/{project}_properties.json"

# Snow data
snodas_in = "{data}/snow/snodas/extracts"

# --------------------------------------------------------------------------
# Earth Engine (optional)
# --------------------------------------------------------------------------
[earth_engine]
# bucket = "my-gcs-bucket"  # Uncomment to export to GCS instead of Drive

# --------------------------------------------------------------------------
# Field Identifiers
# --------------------------------------------------------------------------
[ids]
feature_id = "site_id"      # REQUIRED: unique ID column in your shapefile
gridmet_join_id = "GFID"    # GridMET cell ID column (created by extract)
gridmet_id = "GFID"
state_col = "state"         # Optional: US state codes for irrigation mask

# --------------------------------------------------------------------------
# Data Sources
# --------------------------------------------------------------------------
[data_sources]
met_source = "gridmet"      # "gridmet" (CONUS) or "era5" (global)
snow_source = "snodas"      # "snodas" (CONUS) or "era5" (global)
soil_source = "ssurgo"      # "ssurgo" (CONUS) or "hwsd" (global)
mask_mode = "irrigation"    # "irrigation" or "none"

# --------------------------------------------------------------------------
# Model Settings
# --------------------------------------------------------------------------
[misc]
irrigation_threshold = 0.3  # Fraction of field area irrigated
elev_units = "m"            # Elevation units in shapefile
refet_type = "eto"          # Reference ET: "eto" (grass) or "etr" (alfalfa)
runoff_process = "cn"       # "cn" (Curve Number) or "ier" (infiltration-excess)

# --------------------------------------------------------------------------
# Date Range
# --------------------------------------------------------------------------
[date_range]
start_date = "2000-01-01"
end_date = "2023-12-31"

# --------------------------------------------------------------------------
# Crop Coefficient Settings
# --------------------------------------------------------------------------
[crop_coefficient]
kc_proxy = "etf"            # Use ETf to derive crop coefficients
cover_proxy = "ndvi"        # Use NDVI for cover fraction

# --------------------------------------------------------------------------
# Calibration (required for swim calibrate)
# --------------------------------------------------------------------------
[calibration]
pest_run_dir = "{project_workspace}/data/pestrun"

# ETf calibration target:
#   - Single model: "ptjpl", "ssebop", "sims", "geesebal", "eemetric", "disalexi"
#   - Ensemble mean: "ensemble" (averages all available models in container)
etf_target_model = "ptjpl"

# Optional: ensemble members for uncertainty-weighted calibration
# When provided, observation weights are computed from inter-model spread:
#   weight = obsval / (std + 0.1)
# etf_ensemble_members = ["ssebop", "sims", "geesebal"]

workers = 6                 # Parallel workers for calibration
realizations = 20           # PEST++ IES ensemble size
calibration_dir = "{pest_run_dir}/pest/mult"
obs_folder = "{pest_run_dir}/obs"
initial_values_csv = "{pest_run_dir}/params.csv"
spinup = "{pest_run_dir}/spinup.json"

# Optional: custom forward model script
# python_script = "{project_workspace}/custom_forward_run.py"

# --------------------------------------------------------------------------
# Forecast (optional)
# --------------------------------------------------------------------------
# [forecast]
# forecast_parameters = "{pest_run_dir}/pest/archive/{project}.3.par.csv"

See Config Schema Reference below for detailed documentation of all options.

Step 4: Extract Data from Earth Engine

Run the extraction command:

cd my_project
swim extract my_project.toml

This exports to Google Drive by default. To use a GCS bucket:

swim extract my_project.toml --export bucket --bucket my-gcs-bucket

What gets extracted

Data	Source	Destination
Landsat NDVI	EE Landsat 8/9	`data/remote_sensing/landsat/`
ET fraction	OpenET (PT-JPL default)	`data/remote_sensing/etf/`
Meteorology	GridMET THREDDS	`data/met_timeseries/gridmet/`
Snow (SWE)	SNODAS	`data/snow/snodas/`
Soils	SSURGO	`data/properties/`
Land cover	CDL/NLCD	`data/properties/`
Irrigation	LANID/IrrMapper	`data/properties/`

Monitor EE tasks

Check progress at: https://code.earthengine.google.com/tasks

Download from Drive

Once tasks complete, download the exported CSVs:

# Using rclone (configure gdrive remote first)
rclone sync gdrive:swim/landsat data/remote_sensing/landsat/extracts/
rclone sync gdrive:swim/etf data/remote_sensing/etf/extracts/
rclone sync gdrive:swim/snodas data/snow/snodas/extracts/

Or download manually from Drive and place in the appropriate directories.

Optional: Add Sentinel-2

For higher temporal resolution NDVI (2017+):

swim extract my_project.toml --add-sentinel

Optional: Multiple ETf models

For ensemble calibration:

swim extract my_project.toml --etf-models ssebop,ptjpl,sims

Step 5: Build the Container

Once extraction is complete, build the .swim container:

swim prep my_project.toml

This command:

1. Creates data/my_project.swim (Zarr-based container)
2. Ingests all extracted data with provenance tracking
3. Computes merged NDVI and crop dynamics
4. Exports model-ready inputs (prepped_input.json, swim_input.h5)

Prep options

# Include Sentinel-2 NDVI
swim prep my_project.toml --add-sentinel

# Overwrite existing container
swim prep my_project.toml --overwrite

# International mode (no irrigation masks)
swim prep my_project.toml --international

# Limit to specific sites for testing
swim prep my_project.toml --sites field_001,field_002

Inspect the container

swim inspect data/my_project.swim --detailed

Step 6: Calibrate the Model

Run PEST++ IES calibration:

swim calibrate my_project.toml

Calibration adjusts soil and vegetation parameters so that modeled ET matches satellite-observed ETf on clear-sky days. The calibrated model then fills gaps and provides physically consistent partitioning of ET into evaporation and transpiration.
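To make the partitioning concrete, the sketch below applies the standard FAO-56 dual crop coefficient relation, ET = (Ks · Kcb + Ke) · ETref, to the three coefficients SWIM-RS reports. Treating this relation as the model's exact internal formulation is an assumption, and `partition_et` is a hypothetical helper, not a SWIM-RS function.

```python
def partition_et(kcb, ke, ks, et_ref):
    """Split ET into (transpiration, evaporation) in mm/day.

    Assumes the FAO-56 dual crop coefficient form
    ET = (Ks * Kcb + Ke) * ETref; SWIM-RS internals may differ in detail.
    """
    transpiration = ks * kcb * et_ref  # stress-adjusted basal (crop) component
    evaporation = ke * et_ref          # soil surface evaporation component
    return transpiration, evaporation


# Hypothetical mid-season day: dense canopy, mild water stress,
# and 6 mm/day of reference ET.
t, e = partition_et(kcb=1.0, ke=0.1, ks=0.9, et_ref=6.0)
print(f"transpiration={t:.2f} evaporation={e:.2f} total={t + e:.2f}")
```

Under this assumption, the ET fraction is the sum Ks · Kcb + Ke, which is the quantity compared with the satellite ETf during calibration.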

Single model vs ensemble calibration

Single model (default): Calibrate against one ETf model.

[calibration]
etf_target_model = "ptjpl"

Ensemble mean: Calibrate against the average of all available ETf models in the container.

[calibration]
etf_target_model = "ensemble"

Uncertainty-weighted: Calibrate against one model, but use multiple models to weight observations. Observations where models agree get higher weight; observations where models diverge get lower weight.

[calibration]
etf_target_model = "ptjpl"
etf_ensemble_members = ["ssebop", "sims", "geesebal"]

For ensemble or uncertainty-weighted calibration, extract multiple ETf models during Step 4:

swim extract my_project.toml --etf-models ssebop,ptjpl,sims,geesebal
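The template comment in Step 3 gives the weighting rule as weight = obsval / (std + 0.1). The sketch below evaluates that rule for a single observation to show how agreement between models changes the weight. Using the population standard deviation of the member values for `std` is an assumption, and `observation_weight` is a hypothetical name rather than a SWIM-RS function.

```python
from statistics import pstdev


def observation_weight(obsval, member_values, offset=0.1):
    """Weight one ETf observation as obsval / (std + offset).

    std is the spread of the ensemble members on that date. The offset keeps
    the weight finite when all members agree exactly.
    """
    std = pstdev(member_values)  # assumption: population standard deviation
    return obsval / (std + offset)


# Hypothetical ETf values for one field on two dates with the same target value.
agree = observation_weight(0.80, [0.78, 0.80, 0.82])
diverge = observation_weight(0.80, [0.50, 0.80, 1.10])
print(agree > diverge)  # True
```

Because `obsval` is in the numerator, the rule as written also gives larger weights to larger ETf values at a fixed spread.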

Calibration options

# More realizations for better uncertainty quantification
swim calibrate my_project.toml --realizations 300

# More parallel workers (faster calibration)
swim calibrate my_project.toml --workers 12

# Both
swim calibrate my_project.toml --workers 12 --realizations 300

What calibration produces

data/pestrun/spinup.json — initial state from spinup run
data/pestrun/params.csv — calibrated parameters
data/pestrun/pest/ — full PEST++ project files

Step 7: Run the Calibrated Model

Generate the output time series:

swim evaluate my_project.toml

Output files

For each field, SWIM-RS writes a CSV with daily values:

Column	Description	Units
`date`	Date	YYYY-MM-DD
`eta`	Actual evapotranspiration	mm/day
`etf`	ET fraction (ET/ETref)	-
`kcb`	Basal crop coefficient	-
`ke`	Evaporation coefficient	-
`ks`	Stress coefficient	-
`swe`	Snow water equivalent	mm
`rain`	Rainfall (liquid precip)	mm/day
`snow`	Snowfall (solid precip)	mm/day
`melt`	Snowmelt	mm/day
`runoff`	Surface runoff	mm/day
`dperc`	Deep percolation	mm/day
`depl_root`	Root zone depletion	mm
`irr_sim`	Simulated irrigation	mm/day
`ndvi`	NDVI (interpolated)	-

Outputs are written to the project directory, or to the directory given by --out-dir if specified.
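As an example of working with these files, the sketch below sums a daily flux column by calendar year using only the standard library. It assumes a comma-separated file with the `date` and `eta` columns from the table above; the helper name `annual_totals` and the file name used afterwards are hypothetical.

```python
import csv
from collections import defaultdict


def annual_totals(csv_path, column="eta"):
    """Sum a daily mm/day column by calendar year for one output CSV."""
    totals = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(column)
            if value is None or value.strip() == "":
                continue  # skip missing days instead of guessing a value
            year = row["date"][:4]  # dates are written as YYYY-MM-DD
            totals[year] += float(value)
    return dict(totals)
```

For instance, `annual_totals("field_001.csv")` would return annual ET in mm keyed by year, and passing `column="dperc"` or `column="irr_sim"` would give annual recharge or simulated irrigation in the same way. State variables such as `swe` and `depl_root` are stocks in mm and should not be summed.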

International Workflows

For sites outside CONUS, use ERA5-Land and HWSD:

[data_sources]
met_source = "era5"
snow_source = "era5"
soil_source = "hwsd"
mask_mode = "none"  # no irrigation masking

Extract with:

swim extract my_project.toml --international

Troubleshooting

EE quota exceeded

Split extraction across multiple days or limit sites:

swim extract my_project.toml --sites field_001,field_002,field_003
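For larger shapefiles, the batches can be generated rather than typed by hand. The sketch below only builds command strings using the `--sites` flag shown above and does not run anything; `batch_sites`, `extract_commands`, and the batch size of 2 are illustrative choices, not SWIM-RS features.

```python
def batch_sites(site_ids, batch_size):
    """Split site IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        site_ids[i:i + batch_size]
        for i in range(0, len(site_ids), batch_size)
    ]


def extract_commands(config, site_ids, batch_size):
    """Build one `swim extract ... --sites` command line per batch."""
    return [
        f"swim extract {config} --sites {','.join(batch)}"
        for batch in batch_sites(site_ids, batch_size)
    ]


# Hypothetical site IDs field_001 ... field_005.
ids = [f"field_{n:03d}" for n in range(1, 6)]
for cmd in extract_commands("my_project.toml", ids, batch_size=2):
    print(cmd)
# swim extract my_project.toml --sites field_001,field_002
# swim extract my_project.toml --sites field_003,field_004
# swim extract my_project.toml --sites field_005
```

Each printed command can then be run on a different day to stay within quota.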

Missing data for some fields

Check container coverage:

swim inspect data/my_project.swim

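Re-extract the missing data for specific components, adding --overwrite where existing exports should be replaced. For example, to re-run only the remote sensing exports: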

swim extract my_project.toml --no-met --no-properties  # only RS data

Calibration not converging

Increase realizations: --realizations 500
Check ETf target quality in the container
Try a different ETf model: change etf_target_model in TOML

Memory issues

Limit parallel workers:

swim calibrate my_project.toml --workers 2

Config Schema Reference

Each project uses a TOML file with a small, consistent set of keys. The required and common optional entries are listed below.

Top level

project (string): Project identifier, used in output filenames
root (string): Root directory for resolving paths (usually ".")

`[paths]` (required)

project_workspace (string): Usually {root} or {root}/{project}
data (string): Data directory under workspace
container (string): Path to the .swim container file
landsat, sentinel (strings): Remote sensing directories
met (string): GridMET/ERA5 time series directory
gis (string): GIS files directory
fields_shapefile (string, REQUIRED): Path to your input shapefile
gridmet_mapping (string): Shapefile for precomputed GridMET cell mapping
gridmet_factors (string): JSON written by mapping step
correction_tifs (string): Folder of monthly ETo/ETr bias correction rasters
properties (string): Directory for properties CSV/JSON
irr, ssurgo, lulc, properties_json (strings): Property file paths
snodas_in (string): SNODAS extract directory

`[earth_engine]` (optional)

bucket (string): GCS bucket for exports (defaults to Google Drive if omitted)

`[ids]` (required)

feature_id (string, REQUIRED): Unique field identifier column in your shapefile
gridmet_join_id (string): ID column in gridmet_mapping shapefile
state_col (string): US state codes column for irrigation mask selection

`[data_sources]` (required)

met_source (string): "gridmet" (CONUS) or "era5" (global)
snow_source (string): "snodas" (CONUS) or "era5" (global)
soil_source (string): "ssurgo" (CONUS) or "hwsd" (global)
mask_mode (string): "irrigation" or "none"

`[misc]` (required)

irrigation_threshold (float): Fraction of field area irrigated to trigger irrigation mode (0.0–1.0)
elev_units (string): Elevation units in shapefile ("m" or "ft")
refet_type (string): Reference ET type — "eto" (grass) or "etr" (alfalfa)
runoff_process (string): "cn" (Curve Number) or "ier" (infiltration-excess)

`[date_range]` (required)

start_date, end_date (string): YYYY-MM-DD format
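A malformed or reversed date range is easy to catch before a long extraction. The sketch below is one way to validate the two strings with the standard library; `parse_date_range` is a hypothetical helper, and treating the end date as inclusive is an assumption about how SWIM-RS interprets the range.

```python
from datetime import date


def parse_date_range(start_date, end_date):
    """Parse two ISO dates and return (start, end, number_of_days).

    Raises ValueError if either string is not a valid ISO date or if the
    end date precedes the start date. The day count treats the end date
    as inclusive.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError(f"end_date {end} is before start_date {start}")
    return start, end, (end - start).days + 1


# Template defaults: 24 calendar years, six of them leap years.
start, end, n_days = parse_date_range("2000-01-01", "2023-12-31")
print(n_days)  # 8766
```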

`[crop_coefficient]` (required)

kc_proxy (string): Usually "etf"
cover_proxy (string): Usually "ndvi"

`[calibration]` (required for `swim calibrate`)

pest_run_dir (string): Directory for PEST++ files
etf_target_model (string): Calibration target — "ptjpl", "ssebop", "sims", "geesebal", "eemetric", "disalexi", or "ensemble" (mean of all available models)
etf_ensemble_members (array of strings, optional): Additional models for uncertainty-weighted calibration. When provided, observation weights are computed from inter-model spread: observations where models agree get higher weight, observations where models diverge get lower weight.
workers (int): Parallel workers for calibration
realizations (int): PEST++ IES ensemble size
calibration_dir, obs_folder, initial_values_csv, spinup (strings): PEST file paths
python_script (string, optional): Custom forward model script

`[forecast]` (optional)

forecast_parameters (string): Path to calibrated parameter CSV for forecasting

Notes

All paths support {variable} substitution from earlier definitions
Shapefile is the canonical source; EE exporters convert it to a FeatureCollection automatically
See the template.toml embedded above for a complete working example
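The {variable} substitution mentioned in the first note can be pictured as a single pass over the path entries in file order, where each entry may refer only to names defined before it. The sketch below illustrates that idea; it is not the SWIM-RS implementation, and `resolve_paths` is a hypothetical name.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_paths(top_level, paths):
    """Expand {variable} references in order, using only earlier definitions."""
    known = dict(top_level)  # e.g. project and root from the top of the file
    resolved = {}
    for key, template in paths.items():  # dicts preserve definition order

        def lookup(match):
            name = match.group(1)
            if name not in known:
                raise KeyError(f"{key!r} references undefined variable {name!r}")
            return known[name]

        value = _PLACEHOLDER.sub(lookup, template)
        known[key] = value  # later entries may now refer to this one
        resolved[key] = value
    return resolved


paths = resolve_paths(
    {"project": "my_project", "root": "."},
    {
        "project_workspace": "{root}",
        "data": "{project_workspace}/data",
        "container": "{data}/{project}.swim",
    },
)
print(paths["container"])  # ./data/my_project.swim
```

Under this reading, reordering entries so that a path refers to a name defined later in the file would fail, which is why the template lists `project_workspace` and `data` first.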

Next Steps

Explore the Examples for more detailed workflows
Read Algorithm Description for model physics
See Container Architecture for data structure details
Check CLI Cheatsheet for quick reference