How-To Guide: From Shapefile to Calibrated Model¶
This guide walks through the complete SWIM-RS workflow starting with just a shapefile of fields or polygons.
What You Get¶
After running SWIM-RS, you receive daily time series for each field including:
- Evapotranspiration (ET, mm/day) — calibrated to satellite observations
- Snow water equivalent (SWE, mm) — accumulation and melt
- Soil moisture (root zone depletion, mm)
- Deep percolation (groundwater recharge, mm/day)
- Runoff (mm/day)
- Irrigation (simulated applied water, mm/day)
- Crop coefficients (Kcb, Ke, Ks)
All outputs are calibrated against satellite ET fraction (ETf) from OpenET models, ensuring consistency with remote sensing observations while providing complete daily coverage and physically-based partitioning.
Prerequisites¶
- SWIM-RS installed — see Installation Guide
- Earth Engine account — sign up at https://earthengine.google.com/
- EE authenticated: run `earthengine authenticate` and complete the OAuth flow
Verify your setup:
Overview¶
The SWIM-RS workflow has four main steps:
| Step | What it does | Time |
|---|---|---|
| `extract` | Exports NDVI, ETf, met, properties from EE/GridMET | Hours (EE queue) |
| `prep` | Builds `.swim` container, computes dynamics | Minutes |
| `calibrate` | Runs PEST++ IES parameter estimation | Minutes to hours |
| `evaluate` | Runs calibrated model, writes output CSVs | Seconds |
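The four steps can also be driven from a script. The sketch below only builds and runs command lines; it assumes every step is invoked as `swim <step> <config>`, a form this guide shows explicitly for `prep` and `calibrate` and which is assumed here for `extract` and `evaluate`. The helper names `build_commands` and `run_workflow` are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Step names come from the overview table. The "swim <step> <config>" form is
# an assumption for extract and evaluate.
STEPS = ("extract", "prep", "calibrate", "evaluate")


def build_commands(config_path: str, steps: Sequence[str] = STEPS) -> List[List[str]]:
    """Return one argv list per requested step, in workflow order."""
    unknown = [s for s in steps if s not in STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {unknown}")
    # Keep the canonical order regardless of how the caller listed the steps.
    ordered = [s for s in STEPS if s in steps]
    return [["swim", step, config_path] for step in ordered]


def run_workflow(config_path: str, steps: Sequence[str] = STEPS) -> None:
    """Run each step in turn, stopping at the first failure."""
    for argv in build_commands(config_path, steps):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
```

Because `extract` queues Earth Engine tasks that can take hours and whose results must be downloaded first, it is usually run on its own; something like `run_workflow("my_project.toml", steps=("prep", "calibrate", "evaluate"))` would then cover the rest.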
Step 1: Prepare Your Shapefile¶
Your shapefile needs:
- Unique ID column: each polygon must have a unique identifier (e.g., `site_id`, `field_id`)
- Valid geometries: no self-intersections or null geometries
- Projected or WGS84: SWIM-RS will reproject as needed
Optional but recommended:
- State column — US state codes (e.g., MT, OR) for automatic irrigation mask selection
Example shapefile structure:
| site_id | state | geometry |
|---|---|---|
| field_001 | MT | POLYGON(...) |
| field_002 | MT | POLYGON(...) |
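The unique-ID requirement is easy to check before any extraction is queued. The following is a minimal sketch, not part of SWIM-RS: `check_feature_ids` is a hypothetical helper that takes the values of the ID column, however you choose to read them, and reports blank and repeated identifiers.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def check_feature_ids(ids: Iterable[object]) -> Tuple[List[int], List[str]]:
    """Return (row indices with a missing ID, sorted list of duplicated IDs)."""
    values = list(ids)
    # None and blank strings both count as missing identifiers.
    missing_rows = [i for i, v in enumerate(values) if v is None or not str(v).strip()]
    present = [str(v).strip() for v in values if v is not None and str(v).strip()]
    counts = Counter(present)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return missing_rows, duplicates


missing, dupes = check_feature_ids(["field_001", "field_002", "field_001"])
print(missing, dupes)  # [] ['field_001']
```

If geopandas is available (an assumption about your environment), the column can be passed in directly, for example `check_feature_ids(gpd.read_file("my_fields.shp")["site_id"])`.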
Step 2: Create Project Directory¶
Set up the standard directory structure:
mkdir -p my_project/data/gis
cp /path/to/my_fields.shp my_project/data/gis/
cp /path/to/my_fields.shx my_project/data/gis/
cp /path/to/my_fields.dbf my_project/data/gis/
cp /path/to/my_fields.prj my_project/data/gis/
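The same staging step can be scripted so that a missing sidecar file is caught early. This is a sketch under the directory layout shown above; `stage_shapefile` is a hypothetical helper, and treating `.prj` and `.cpg` as optional is a choice made here rather than a SWIM-RS rule.

```python
import shutil
from pathlib import Path
from typing import List

REQUIRED = (".shp", ".shx", ".dbf")  # mandatory parts of a shapefile
OPTIONAL = (".prj", ".cpg")          # projection and encoding sidecars


def stage_shapefile(src_shp: str, project_dir: str) -> List[Path]:
    """Copy a shapefile and its sidecars into <project_dir>/data/gis."""
    src = Path(src_shp)
    # Fail before copying anything if a mandatory component is absent.
    for ext in REQUIRED:
        if not src.with_suffix(ext).exists():
            raise FileNotFoundError(f"missing shapefile component: {src.with_suffix(ext)}")
    gis_dir = Path(project_dir) / "data" / "gis"
    gis_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ext in REQUIRED + OPTIONAL:
        part = src.with_suffix(ext)
        if part.exists():
            copied.append(Path(shutil.copy2(part, gis_dir / part.name)))
    return copied
```

Calling `stage_shapefile("/path/to/my_fields.shp", "my_project")` would reproduce the `mkdir` and `cp` commands above and additionally pick up a `.cpg` file if one exists.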
Step 3: Create the TOML Config¶
Copy the template and customize it:
Edit the TOML to set your project name, shapefile path, date range, and other options. The template includes comments explaining each setting. Key fields to customize:
- `project`: your project name
- `fields_shapefile`: path to your shapefile
- `feature_id`: the unique ID column in your shapefile
- `start_date` / `end_date`: your study period
- `etf_target_model`: which OpenET model to calibrate against
The full annotated template is shown below. You can also download template.toml directly.
template.toml:
# SWIM-RS Project Configuration Template
# Copy this file to your project directory and customize.
# See docs/how_to.md for detailed documentation.
# --------------------------------------------------------------------------
# Project Identity
# --------------------------------------------------------------------------
project = "my_project" # Project name (used in output filenames)
root = "." # Root directory for path resolution
# --------------------------------------------------------------------------
# Paths
# All paths support {variable} substitution from earlier definitions.
# --------------------------------------------------------------------------
[paths]
project_workspace = "{root}"
data = "{project_workspace}/data"
# SwimContainer (Zarr-based data store)
container = "{data}/{project}.swim"
# Remote sensing data
remote_sensing = "{data}/remote_sensing"
landsat = "{remote_sensing}/landsat"
sentinel = "{remote_sensing}/sentinel" # Optional: for Sentinel-2 NDVI
# Meteorology
met = "{data}/met_timeseries/gridmet"
# GIS files
gis = "{data}/gis"
fields_shapefile = "{gis}/fields.shp" # REQUIRED: your input shapefile
# GridMET mapping (created by swim extract)
gridmet_centroids = "{gis}/gridmet_centroids.shp"
gridmet_mapping = "{gis}/fields_gfid.shp"
gridmet_factors = "{gis}/fields_gfid.json"
# Bias correction rasters (optional)
correction_tifs = "{data}/bias_correction_tif"
# Properties (soils, land cover, irrigation)
properties = "{data}/properties"
irr = "{properties}/{project}_irr.csv"
ssurgo = "{properties}/{project}_ssurgo.csv"
lulc = "{properties}/{project}_landcover.csv"
properties_json = "{properties}/{project}_properties.json"
# Snow data
snodas_in = "{data}/snow/snodas/extracts"
# --------------------------------------------------------------------------
# Earth Engine (optional)
# --------------------------------------------------------------------------
[earth_engine]
# bucket = "my-gcs-bucket" # Uncomment to export to GCS instead of Drive
# --------------------------------------------------------------------------
# Field Identifiers
# --------------------------------------------------------------------------
[ids]
feature_id = "site_id" # REQUIRED: unique ID column in your shapefile
gridmet_join_id = "GFID" # GridMET cell ID column (created by extract)
gridmet_id = "GFID"
state_col = "state" # Optional: US state codes for irrigation mask
# --------------------------------------------------------------------------
# Data Sources
# --------------------------------------------------------------------------
[data_sources]
met_source = "gridmet" # "gridmet" (CONUS) or "era5" (global)
snow_source = "snodas" # "snodas" (CONUS) or "era5" (global)
soil_source = "ssurgo" # "ssurgo" (CONUS) or "hwsd" (global)
mask_mode = "irrigation" # "irrigation" or "none"
# --------------------------------------------------------------------------
# Model Settings
# --------------------------------------------------------------------------
[misc]
irrigation_threshold = 0.3 # Fraction of field area irrigated
elev_units = "m" # Elevation units in shapefile
refet_type = "eto" # Reference ET: "eto" (grass) or "etr" (alfalfa)
runoff_process = "cn" # "cn" (Curve Number) or "ier" (infiltration-excess)
# --------------------------------------------------------------------------
# Date Range
# --------------------------------------------------------------------------
[date_range]
start_date = "2000-01-01"
end_date = "2023-12-31"
# --------------------------------------------------------------------------
# Crop Coefficient Settings
# --------------------------------------------------------------------------
[crop_coefficient]
kc_proxy = "etf" # Use ETf to derive crop coefficients
cover_proxy = "ndvi" # Use NDVI for cover fraction
# --------------------------------------------------------------------------
# Calibration (required for swim calibrate)
# --------------------------------------------------------------------------
[calibration]
pest_run_dir = "{project_workspace}/data/pestrun"
# ETf calibration target:
# - Single model: "ptjpl", "ssebop", "sims", "geesebal", "eemetric", "disalexi"
# - Ensemble mean: "ensemble" (averages all available models in container)
etf_target_model = "ptjpl"
# Optional: ensemble members for uncertainty-weighted calibration
# When provided, observation weights are computed from inter-model spread:
# weight = obsval / (std + 0.1)
# etf_ensemble_members = ["ssebop", "sims", "geesebal"]
workers = 6 # Parallel workers for calibration
realizations = 20 # PEST++ IES ensemble size
calibration_dir = "{pest_run_dir}/pest/mult"
obs_folder = "{pest_run_dir}/obs"
initial_values_csv = "{pest_run_dir}/params.csv"
spinup = "{pest_run_dir}/spinup.json"
# Optional: custom forward model script
# python_script = "{project_workspace}/custom_forward_run.py"
# --------------------------------------------------------------------------
# Forecast (optional)
# --------------------------------------------------------------------------
# [forecast]
# forecast_parameters = "{pest_run_dir}/pest/archive/{project}.3.par.csv"
See Config Schema Reference below for detailed documentation of all options.
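To make the `{variable}` substitution rule concrete, here is a minimal sketch of how placeholders could be expanded in definition order. It illustrates the semantics described in the template comments and is not SWIM-RS's actual resolver; `resolve_paths` is a hypothetical name, and the real implementation may handle cross-section references differently.

```python
import re
from typing import Dict

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_paths(top_level: Dict[str, str], paths: Dict[str, str]) -> Dict[str, str]:
    """Expand {name} placeholders using only values defined earlier."""
    known = dict(top_level)  # e.g. project and root
    resolved = {}
    for key, template in paths.items():  # dicts preserve definition order

        def lookup(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in known:
                raise KeyError(f"{key!r} refers to {name!r} before it is defined")
            return known[name]

        value = _PLACEHOLDER.sub(lookup, template)
        known[key] = value  # later entries may refer to this one
        resolved[key] = value
    return resolved


paths = resolve_paths(
    {"project": "my_project", "root": "."},
    {
        "project_workspace": "{root}",
        "data": "{project_workspace}/data",
        "container": "{data}/{project}.swim",
    },
)
print(paths["container"])  # ./data/my_project.swim
```

The practical consequence is ordering: a key such as `container` can only use `{data}` because `data` appears above it in the file.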
Step 4: Extract Data from Earth Engine¶
Run the extraction command:
This exports to Google Drive by default. To use a GCS bucket:
What gets extracted¶
| Data | Source | Destination |
|---|---|---|
| Landsat NDVI | EE Landsat 8/9 | data/remote_sensing/landsat/ |
| ET fraction | OpenET (PT-JPL default) | data/remote_sensing/etf/ |
| Meteorology | GridMET THREDDS | data/met_timeseries/gridmet/ |
| Snow (SWE) | SNODAS | data/snow/snodas/ |
| Soils | SSURGO | data/properties/ |
| Land cover | CDL/NLCD | data/properties/ |
| Irrigation | LANID/IrrMapper | data/properties/ |
Monitor EE tasks¶
Check progress at: https://code.earthengine.google.com/tasks
Download from Drive¶
Once tasks complete, download the exported CSVs:
# Using rclone (configure gdrive remote first)
rclone sync gdrive:swim/landsat data/remote_sensing/landsat/extracts/
rclone sync gdrive:swim/etf data/remote_sensing/etf/extracts/
rclone sync gdrive:swim/snodas data/snow/snodas/extracts/
Or download manually from Drive and place in the appropriate directories.
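Before moving on, it can help to confirm that each extract directory actually received files. The sketch below counts CSVs under the three destination paths used in the `rclone` commands above; `count_extract_csvs` is a hypothetical helper, and the assumption that every export arrives with a `.csv` extension comes from the description of the exports as CSVs.

```python
from pathlib import Path
from typing import Dict

# Destination directories taken from the rclone commands in this guide.
EXTRACT_DIRS = (
    "data/remote_sensing/landsat/extracts",
    "data/remote_sensing/etf/extracts",
    "data/snow/snodas/extracts",
)


def count_extract_csvs(project_dir: str) -> Dict[str, int]:
    """Map each extract directory to the number of CSV files found in it."""
    root = Path(project_dir)
    counts = {}
    for rel in EXTRACT_DIRS:
        folder = root / rel
        # A missing directory is reported as zero files rather than an error.
        counts[rel] = len(list(folder.rglob("*.csv"))) if folder.is_dir() else 0
    return counts


for rel, n in count_extract_csvs(".").items():
    print(f"{n:6d}  {rel}")
```

A count of zero for any directory suggests the corresponding Earth Engine tasks have not finished or the download landed somewhere else.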
Optional: Add Sentinel-2¶
For higher temporal resolution NDVI (2017+):
Optional: Multiple ETf models¶
For ensemble calibration:
Step 5: Build the Container¶
Once extraction is complete, build the .swim container:
This:
1. Creates data/my_project.swim (Zarr-based container)
2. Ingests all extracted data with provenance tracking
3. Computes merged NDVI and crop dynamics
4. Exports model-ready inputs (prepped_input.json, swim_input.h5)
Prep options¶
# Include Sentinel-2 NDVI
swim prep my_project.toml --add-sentinel
# Overwrite existing container
swim prep my_project.toml --overwrite
# International mode (no irrigation masks)
swim prep my_project.toml --international
# Limit to specific sites for testing
swim prep my_project.toml --sites field_001,field_002
Inspect the container¶
Step 6: Calibrate the Model¶
Run PEST++ IES calibration:
Calibration adjusts soil and vegetation parameters so that modeled ET matches satellite-observed ETf on clear-sky days. The calibrated model then fills gaps and provides physically consistent partitioning of ET into evaporation and transpiration.
Single model vs ensemble calibration¶
Single model (default): Calibrate against one ETf model.
Ensemble mean: Calibrate against the average of all available ETf models in the container.
Uncertainty-weighted: Calibrate against one model, but use multiple models to weight observations. Observations where models agree get higher weight; observations where models diverge get lower weight.
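The weighting rule quoted in the template, `weight = obsval / (std + 0.1)`, can be illustrated with a few lines of Python. This is a sketch of the formula only: it assumes `std` is the population standard deviation across the listed ensemble members on each observation date and that `obsval` is the target model's ETf on that date. How SWIM-RS actually computes the spread is not stated in this guide, and `uncertainty_weights` is a hypothetical name.

```python
from statistics import pstdev
from typing import Dict, List


def uncertainty_weights(
    target: List[float],
    members: Dict[str, List[float]],
    floor: float = 0.1,
) -> List[float]:
    """Weight each target observation by obsval / (inter-model std + floor)."""
    weights = []
    for i, obsval in enumerate(target):
        # Spread of the ensemble members on the same date (assumed population std).
        spread = pstdev([series[i] for series in members.values()])
        weights.append(obsval / (spread + floor))
    return weights


# Illustrative values only: the members agree on day 0 and diverge on day 1.
target = [0.8, 0.8]
members = {"ssebop": [0.8, 0.6], "sims": [0.8, 1.0]}
w = uncertainty_weights(target, members)
print(w[0] > w[1])  # True
```

The constant in the denominator keeps the weight finite when the models agree exactly. Note that, as written, the weight also scales with the observed value itself, so low-ETf observations receive less weight than high-ETf ones at the same spread.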
For ensemble or uncertainty-weighted calibration, extract multiple ETf models during Step 4:
Calibration options¶
# More realizations for better uncertainty quantification
swim calibrate my_project.toml --realizations 300
# More parallel workers (faster calibration)
swim calibrate my_project.toml --workers 12
# Both
swim calibrate my_project.toml --workers 12 --realizations 300
What calibration produces¶
- `data/pestrun/spinup.json`: initial state from spinup run
- `data/pestrun/params.csv`: calibrated parameters
- `data/pestrun/pest/`: full PEST++ project files
Step 7: Run the Calibrated Model¶
Generate the output time series:
Output files¶
For each field, SWIM-RS writes a CSV with daily values:
| Column | Description | Units |
|---|---|---|
| `date` | Date | YYYY-MM-DD |
| `eta` | Actual evapotranspiration | mm/day |
| `etf` | ET fraction (ET/ETref) | - |
| `kcb` | Basal crop coefficient | - |
| `ke` | Evaporation coefficient | - |
| `ks` | Stress coefficient | - |
| `swe` | Snow water equivalent | mm |
| `rain` | Rainfall (liquid precip) | mm/day |
| `snow` | Snowfall (solid precip) | mm/day |
| `melt` | Snowmelt | mm/day |
| `runoff` | Surface runoff | mm/day |
| `dperc` | Deep percolation | mm/day |
| `depl_root` | Root zone depletion | mm |
| `irr_sim` | Simulated irrigation | mm/day |
| `ndvi` | NDVI (interpolated) | - |
Output location: project directory or --out-dir if specified.
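As an example of working with these files, the sketch below sums daily `eta` into annual totals using only the standard library. The column names `date` and `eta` come from the table above; the sample rows are made-up values for illustration, and the output file naming is not assumed here, so the function takes any open file handle.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def annual_eta(handle: TextIO) -> Dict[int, float]:
    """Sum daily actual ET (mm/day) into annual totals (mm/year)."""
    totals: Dict[int, float] = defaultdict(float)
    for row in csv.DictReader(handle):
        year = int(row["date"][:4])  # dates are YYYY-MM-DD
        totals[year] += float(row["eta"])
    return dict(totals)


# Made-up sample rows standing in for one field's output CSV.
sample = io.StringIO(
    "date,eta,etf\n"
    "2020-07-01,5.0,0.8\n"
    "2020-07-02,4.5,0.75\n"
    "2021-07-01,6.0,0.9\n"
)
print(annual_eta(sample))  # {2020: 9.5, 2021: 6.0}
```

For a real file, open it with `open(path, newline="")` and pass the handle in; the same pattern works for `dperc`, `runoff`, or `irr_sim`.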
International Workflows¶
For sites outside CONUS, use ERA5-Land and HWSD:
[data_sources]
met_source = "era5"
snow_source = "era5"
soil_source = "hwsd"
mask_mode = "none" # no irrigation masking
Extract with:
Troubleshooting¶
EE quota exceeded¶
Split extraction across multiple days or limit sites:
Missing data for some fields¶
Check container coverage:
Re-extract missing data with --overwrite for specific components:
Calibration not converging¶
- Increase realizations: `--realizations 500`
- Check ETf target quality in the container
- Try a different ETf model: change `etf_target_model` in TOML
Memory issues¶
Limit parallel workers:
Config Schema Reference¶
Each project uses a TOML with a small, consistent set of keys. These are the required and common optional entries.
Top level¶
- `project` (string): Project identifier, used in output filenames
- `root` (string): Root directory for resolving paths (usually `"."`)
[paths] (required)¶
- `project_workspace` (string): Usually `{root}` or `{root}/{project}`
- `data` (string): Data directory under workspace
- `container` (string): Path to the `.swim` container file
- `landsat`, `sentinel` (strings): Remote sensing directories
- `met` (string): GridMET/ERA5 time series directory
- `gis` (string): GIS files directory
- `fields_shapefile` (string, REQUIRED): Path to your input shapefile
- `gridmet_mapping` (string): Shapefile for precomputed GridMET cell mapping
- `gridmet_factors` (string): JSON written by mapping step
- `correction_tifs` (string): Folder of monthly ETo/ETr bias correction rasters
- `properties` (string): Directory for properties CSV/JSON
- `irr`, `ssurgo`, `lulc`, `properties_json` (strings): Property file paths
- `snodas_in` (string): SNODAS extract directory
[earth_engine] (optional)¶
- `bucket` (string): GCS bucket for exports (defaults to Google Drive if omitted)
[ids] (required)¶
- `feature_id` (string, REQUIRED): Unique field identifier column in your shapefile
- `gridmet_join_id` (string): ID column in `gridmet_mapping` shapefile
- `state_col` (string): US state codes column for irrigation mask selection
[data_sources] (required)¶
- `met_source` (string): `"gridmet"` (CONUS) or `"era5"` (global)
- `snow_source` (string): `"snodas"` (CONUS) or `"era5"` (global)
- `soil_source` (string): `"ssurgo"` (CONUS) or `"hwsd"` (global)
- `mask_mode` (string): `"irrigation"` or `"none"`
[misc] (required)¶
- `irrigation_threshold` (float): Fraction of field area irrigated to trigger irrigation mode (0.0–1.0)
- `elev_units` (string): Elevation units in shapefile (`"m"` or `"ft"`)
- `refet_type` (string): Reference ET type, `"eto"` (grass) or `"etr"` (alfalfa)
- `runoff_process` (string): `"cn"` (Curve Number) or `"ier"` (infiltration-excess)
[date_range] (required)¶
- `start_date`, `end_date` (string): YYYY-MM-DD format
[crop_coefficient] (required)¶
- `kc_proxy` (string): Usually `"etf"`
- `cover_proxy` (string): Usually `"ndvi"`
[calibration] (required for swim calibrate)¶
- `pest_run_dir` (string): Directory for PEST++ files
- `etf_target_model` (string): Calibration target: `"ptjpl"`, `"ssebop"`, `"sims"`, `"geesebal"`, `"eemetric"`, `"disalexi"`, or `"ensemble"` (mean of all available models)
- `etf_ensemble_members` (array of strings, optional): Additional models for uncertainty-weighted calibration. When provided, observation weights are computed from inter-model spread: observations where models agree get higher weight, observations where models diverge get lower weight.
- `workers` (int): Parallel workers for calibration
- `realizations` (int): PEST++ IES ensemble size
- `calibration_dir`, `obs_folder`, `initial_values_csv`, `spinup` (strings): PEST file paths
- `python_script` (string, optional): Custom forward model script
[forecast] (optional)¶
- `forecast_parameters` (string): Path to calibrated parameter CSV for forecasting
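The enumerated values in this reference lend themselves to a quick pre-flight check. The sketch below validates an already-parsed config dictionary against the options listed for `[data_sources]`, `[misc]`, and `[date_range]`. It is not SWIM-RS's own validation; `validate_config` is a hypothetical helper, and the config is assumed to have been loaded with a TOML parser such as `tomllib` (Python 3.11+).

```python
from datetime import date
from typing import Any, Dict, List

# Allowed values as listed in the schema reference.
ALLOWED = {
    ("data_sources", "met_source"): {"gridmet", "era5"},
    ("data_sources", "snow_source"): {"snodas", "era5"},
    ("data_sources", "soil_source"): {"ssurgo", "hwsd"},
    ("data_sources", "mask_mode"): {"irrigation", "none"},
    ("misc", "elev_units"): {"m", "ft"},
    ("misc", "refet_type"): {"eto", "etr"},
    ("misc", "runoff_process"): {"cn", "ier"},
}


def validate_config(cfg: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = cfg.get(section, {}).get(key)
        if value not in allowed:
            problems.append(f"[{section}] {key} = {value!r}; expected one of {sorted(allowed)}")
    threshold = cfg.get("misc", {}).get("irrigation_threshold")
    if not isinstance(threshold, (int, float)) or not 0.0 <= threshold <= 1.0:
        problems.append(f"[misc] irrigation_threshold = {threshold!r}; expected 0.0 to 1.0")
    dates = cfg.get("date_range", {})
    try:
        start = date.fromisoformat(dates.get("start_date", ""))
        end = date.fromisoformat(dates.get("end_date", ""))
        if start > end:
            problems.append("[date_range] start_date is after end_date")
    except (TypeError, ValueError):
        problems.append("[date_range] start_date and end_date must be YYYY-MM-DD strings")
    return problems
```

With `tomllib`, usage would look like `validate_config(tomllib.load(open("my_project.toml", "rb")))`, printing any returned messages before running the extraction.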
Notes¶
- All paths support `{variable}` substitution from earlier definitions
- Shapefile is the canonical source; EE exporters convert it to a FeatureCollection automatically
- See the template.toml embedded above for a complete working example
Next Steps¶
- Explore the Examples for more detailed workflows
- Read Algorithm Description for model physics
- See Container Architecture for data structure details
- Check CLI Cheatsheet for quick reference