Calibrate Package (swimrs.calibrate)
PEST++ IES integration for parameter estimation and inverse modeling.
PestBuilder
Builder for PEST++ IES calibration control files.
Constructs PEST++ control files, observation files, and parameter templates for calibrating SWIM-RS model parameters against ET fraction and SWE observations.
The builder handles:
- Parameter setup with prior information from soil and vegetation data
- Observation file generation from remote sensing ET and SNODAS SWE
- Localization matrix construction for ensemble methods
- Forward run script generation
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | ProjectConfig instance with calibration settings. |
| pest_run_dir | | Root directory for PEST++ files. |
| pest_dir | | Directory containing the .pst control file. |
| master_dir | | Directory for PEST++ master process. |
| pst_file | | Path to the generated .pst control file. |
Example

    from swimrs.swim import ProjectConfig
    from swimrs.calibrate import PestBuilder

    config = ProjectConfig()
    config.read_config("project.toml", calibrate=True)

    # container is required: a SwimContainer instance or a path to a .swim directory
    # ("project.swim" is a placeholder path)
    with PestBuilder(config, container="project.swim") as builder:
        builder.spinup()
        builder.build_pest(target_etf='ssebop')
        builder.build_localizer()
        builder.write_control_settings(noptmax=4, reals=250)

Instance attributes:
- config = config
- project_ws = config.project_ws
- pest_run_dir = config.pest_run_dir
- _container = None
- _container_path = None
- _owns_container = False
- observation_index = {}
- masks = ['inv_irr', 'irr', 'no_mask']
- pest = None
- etf_std = None
- etf_capture_indexes = []
- params_file = os.path.join(self.pest_run_dir, 'params.csv')
- prior_contstraint = prior_constraint
- conflicted_obs = conflicted_obs
- pest_dir = os.path.join(config.pest_run_dir, 'pest')
- master_dir = os.path.join(config.pest_run_dir, 'master')
- workers_dir = os.path.join(config.pest_run_dir, 'workers')
- obs_dir = os.path.join(config.pest_run_dir, 'obs')
- pst_file = os.path.join(self.pest_dir, f'{self.config.project_name}.pst')
- obs_idx_file = os.path.join(self.pest_dir, f'{self.config.project_name}.idx.csv')
- pest_args = self.get_pest_builder_args()
- verbose = verbose
- python_script = python_script
- overwrite_build = False
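
The path attributes above all derive from config.pest_run_dir and config.project_name. The following is a minimal sketch of that layout using only os.path.join. The helper name pest_layout and the example values are hypothetical and not part of swimrs.

```python
import os


def pest_layout(pest_run_dir: str, project_name: str) -> dict:
    """Mirror the path attributes listed above (hypothetical helper, not a swimrs API)."""
    pest_dir = os.path.join(pest_run_dir, "pest")
    return {
        "pest_dir": pest_dir,
        "master_dir": os.path.join(pest_run_dir, "master"),
        "workers_dir": os.path.join(pest_run_dir, "workers"),
        "obs_dir": os.path.join(pest_run_dir, "obs"),
        "params_file": os.path.join(pest_run_dir, "params.csv"),
        "pst_file": os.path.join(pest_dir, f"{project_name}.pst"),
        "obs_idx_file": os.path.join(pest_dir, f"{project_name}.idx.csv"),
    }


# Example values are placeholders.
layout = pest_layout("pest_run", "my_project")
for name, path in layout.items():
    print(f"{name}: {path}")
```

Note that the control file and the observation index live under pest_dir, while params.csv sits directly in pest_run_dir.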
__init__(config, container, use_existing: bool = False, python_script: str | None = None, prior_constraint: dict | None = None, conflicted_obs: str | None = None, verbose: bool = True) -> None
Initialize PestBuilder for PEST++ calibration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | | ProjectConfig instance | required |
| container | | SwimContainer instance or path to .swim directory. Required - all data is sourced from the container. | required |
| use_existing | bool | If True, use existing PEST++ setup | False |
| python_script | str \| None | Path to custom forward run script | None |
| prior_constraint | dict \| None | Prior constraint settings | None |
| conflicted_obs | str \| None | Path to conflicted observations file | None |
| verbose | bool | If False, suppress pyemu/PstFrom output. Default True. | True |
_init_container(container) -> None
Initialize container from instance or path.
_load_data_from_container() -> None
Load all data from container (replaces SamplePlots).
Populates:
- self.plot_order: field UIDs
- self.plot_properties: field properties dict
- self.irr: irrigation data dict
- self.ke_max: bare soil evaporation coefficient dict
- self.kc_max: max crop coefficient dict
- self.date_range: (start_date, end_date) tuple
_get_etf_data(fid: str, model: str = 'ssebop') -> pd.DataFrame
Get ETf data for a field from container.
Returns DataFrame with columns like '{model}etf{mask}' for each mask.
If model='ensemble', computes the mean across all available ETf models.
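
As an illustration of the model='ensemble' behavior, the sketch below averages per-model ETf frames element-wise with pandas. The model names, the column names, and the sample values are made up for the example; the actual container layout and column naming in swimrs may differ.

```python
import pandas as pd

dates = pd.date_range("2020-06-01", periods=3, freq="D")

# Hypothetical per-model ETf frames sharing the same mask columns.
frames = {
    "ssebop": pd.DataFrame({"irr": [0.8, 0.9, None], "no_mask": [0.5, 0.6, 0.7]}, index=dates),
    "ptjpl": pd.DataFrame({"irr": [0.6, 0.7, 0.9], "no_mask": [0.7, 0.6, 0.5]}, index=dates),
}

# Stack the frames under a new 'model' index level, then average across
# models for each date and mask column. Missing values are skipped, so a
# date covered by only one model keeps that model's value.
stacked = pd.concat(frames, names=["model", "date"])
ensemble = stacked.groupby(level="date").mean()
print(ensemble)
```

Skipping missing values is a design choice of this sketch. A stricter variant could require every model to be present before reporting an ensemble value.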
_discover_etf_models() -> list[str]
Discover available ETf models in the container.
_get_swe_data(fid: str) -> pd.DataFrame
Get SWE data for a field from container.
close() -> None
Close container if we own it.
__enter__() -> PestBuilder
__exit__(exc_type, exc_val, exc_tb) -> bool
get_pest_builder_args() -> dict
build_pest(target_etf: str = 'openet', members: list[str] | None = None) -> None
Build the PEST++ control file and supporting files.
Creates the .pst control file, observation files, parameter templates, and forward run script in the pest directory.
Uses the process package with portable swim_input.h5 file. Workers are fully self-contained and can run without shared storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_etf | str | ET model to use as calibration target ('ssebop', 'ptjpl', etc.). | 'openet' |
| members | list[str] \| None | Optional list of ensemble member models for uncertainty weighting. If provided, observation weights are computed from inter-model spread. | None |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | If use_existing=True was set in constructor. |
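
One common way to turn inter-model spread into observation weights is to take the standard deviation across member models at each date and use its inverse as the weight. The sketch below does that with pandas. The inverse-standard-deviation rule, the floor value, and the sample data are assumptions for illustration and are not confirmed to match what build_pest does internally.

```python
import pandas as pd

dates = pd.date_range("2020-06-01", periods=3, freq="D")

# Hypothetical ETf values from three member models (columns) on three dates (rows).
members = pd.DataFrame(
    {
        "ssebop": [0.80, 0.50, 0.30],
        "ptjpl": [0.60, 0.50, 0.35],
        "sims": [0.70, 0.50, 0.25],
    },
    index=dates,
)

# Spread across members at each date (sample standard deviation along columns).
std = members.std(axis=1)

# Floor the spread so that perfect agreement does not produce an infinite weight.
std_floor = 0.05  # assumed minimum standard deviation
weights = 1.0 / std.clip(lower=std_floor)
print(weights)
```

Dates where the members agree closely receive larger weights, so those observations pull harder on the calibration than dates where the models disagree.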
print_build_diagnostics(max_groups: int = 25) -> pd.DataFrame
Print a compact diagnostics table after building the PEST++ project.
This is meant to make it obvious whether calibration is actually using the intended observations/weights (e.g., ETf weights not all zero).
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Per-observation-group summary table (also printed). |
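
A table of this kind can be approximated with a pandas groupby on the observation data. The sketch below assumes pyemu-style observation columns (obsnme, obsval, weight, obgnme). The group names, the sample rows, and the summary columns are assumptions, and the columns actually reported by print_build_diagnostics may differ.

```python
import pandas as pd

# Hypothetical observation table with one ETf group and one SWE group.
obs = pd.DataFrame(
    {
        "obsnme": ["etf_0", "etf_1", "etf_2", "swe_0", "swe_1"],
        "obsval": [0.8, 0.6, 0.7, 12.0, 0.0],
        "weight": [10.0, 0.0, 5.0, 0.0, 0.0],
        "obgnme": ["etf", "etf", "etf", "swe", "swe"],
    }
)

# Per-group counts: total observations, observations with nonzero weight,
# and the largest weight in the group.
diag = obs.groupby("obgnme").agg(
    n_obs=("weight", "size"),
    n_nonzero=("weight", lambda w: int((w > 0).sum())),
    max_weight=("weight", "max"),
)
print(diag)
```

A group whose n_nonzero is 0 contributes nothing to the objective function, which is the situation the diagnostics are meant to expose.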
_build_obs_diagnostics_table(obs: pd.DataFrame) -> pd.DataFrame
staticmethod
Build per-observation-group diagnostics for a PEST++ observation table.
_write_forward_run_script() -> None
Generate custom_forward_run.py with portable relative paths.
Uses the process package with swim_input.h5 for fully portable workers. All paths are relative to the worker directory - no shared storage needed.
build_localizer() -> None
Build the localization matrix for ensemble Kalman methods.
Creates a sparse matrix that restricts parameter-observation correlations to physically meaningful relationships: ET observations update only ET-related parameters, and SWE observations update only snow parameters.
Writes loc.mat and localizer_summary.json to the pest directory.
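
To make the idea concrete, the sketch below builds a small 0/1 localization table with observation groups as rows and parameters as columns. The parameter names, the group names, and the prefix-based matching rule are hypothetical; the real loc.mat layout produced by build_localizer may differ.

```python
import pandas as pd

# Hypothetical parameter names, split by the process they control.
et_params = ["kc_max_f1", "ndvi_k_f1"]
snow_params = ["swe_alpha", "swe_beta"]
obs_groups = ["etf_f1", "swe_f1"]

# Start from all zeros (no observation may update any parameter).
loc = pd.DataFrame(0.0, index=obs_groups, columns=et_params + snow_params)

# Allow each observation group to update only the parameters of its own process.
for group in obs_groups:
    allowed = et_params if group.startswith("etf") else snow_params
    loc.loc[group, allowed] = 1.0

print(loc)
```

A zero entry tells the ensemble update to ignore any correlation between that observation group and that parameter, which suppresses spurious correlations that arise from a finite ensemble.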
write_control_settings(noptmax: int = -2, reals: int = 250) -> None
Write PEST++ IES control settings to the .pst file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| noptmax | int | Maximum optimization iterations. Use -2 for parameter estimation mode, positive values for optimization. | -2 |
| reals | int | Number of realizations in the ensemble. | 250 |
initial_parameter_dict() -> OrderedDict
dry_run(exe: str = 'pestpp-ies') -> None
spinup(overwrite: bool = False) -> None
Run model spinup to initialize state variables.
Runs the model with initial parameters and saves the final state to the spinup JSON file for warm-starting calibration runs.
This method also creates the initial swim_input.h5 file (without spinup state). After spinup completes, _build_swim_input() rebuilds the h5 with the spinup state baked in.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| overwrite | bool | If True, regenerate spinup even if file exists. | False |
_build_swim_input() -> str
Build portable swim_input.h5 file for workers with spinup state.
Creates a self-contained HDF5 file with all input data needed for model execution, including spinup state if available. This file is copied to each PEST++ worker for isolated execution.
If spinup() was called first, this rebuilds the h5 with spinup state baked in. The rebuild is necessary because spinup creates the h5 without spinup state (since it's generating it).
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Path to the created swim_input.h5 file. |
_write_params() -> None
_write_swe_obs(count: int) -> None
_write_etf_obs(target: str, members: list[str] | None) -> int
_finalize_obs() -> None
Write std to the observation dataframes.
Ideally the std values would be written to the observation dataframes in the ETf and SWE writers, but they are lost in the pest build call, so they are written here instead.
add_regularization() -> None
_drop_conflicts(i: int, fid: str) -> None
PestResults
Parse, summarize, and clean up PEST++ IES calibration results.
Provides utilities for checking calibration success, extracting summary metrics, and cleaning up intermediate files after PEST++ runs.
Attributes:
| Name | Type | Description |
|---|---|---|
| pest_dir | | Path to the pest/ directory containing .pst and master/. |
| master_dir | | Path to the master/ directory with output files. |
| project_name | | Name of the project (matches .pst filename stem). |
Example

    from swimrs.calibrate import PestResults

    results = PestResults("./pest_run/pest", "my_project")
    success, issues = results.is_successful()

    if success:
        summary = results.get_summary()
        print(f"Phi reduction: {summary['phi_reduction_pct']:.1f}%")
        results.cleanup(archive_dir="./archive")
    else:
        for issue in issues:
            print(f"Issue: {issue}")

Class attributes:
- ARCHIVE_FILES = ['{project}.pst', '{project}.rec', '{project}.phi.meas.csv', '{project}.phi.composite.csv', 'params.csv', 'localizer_summary.json', 'loc.mat']
- DEBUG_FILES = ['panther_master.rec', '{project}.*.obs.csv', '{project}.*.par.csv', '{project}.*.pdc.csv', '{project}.*.pcs.csv']
- CLEANUP_PATTERNS = ['*.jcb', '*.jco', '*.rei', '*.rst']
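
The '{project}' placeholder in these lists suggests the entries are formatted with the project name and then matched as glob patterns. The sketch below illustrates that reading with str.format and fnmatch. The helper name expand_patterns and the sample file names are hypothetical, and the actual matching logic in PestResults is not shown in this reference.

```python
from fnmatch import fnmatch

# Subset of the DEBUG_FILES entries listed above.
DEBUG_FILES = ["panther_master.rec", "{project}.*.obs.csv", "{project}.*.par.csv"]


def expand_patterns(patterns: list[str], project: str) -> list[str]:
    """Substitute the project name into each pattern (hypothetical helper)."""
    return [p.format(project=project) for p in patterns]


patterns = expand_patterns(DEBUG_FILES, "my_project")

# Made-up directory listing to match against.
files = [
    "my_project.0.par.csv",
    "my_project.3.obs.csv",
    "my_project.pst",
    "panther_master.rec",
]
matched = [f for f in files if any(fnmatch(f, p) for p in patterns)]
print(matched)
```

Entries without a placeholder, such as panther_master.rec, pass through str.format unchanged.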
Instance attributes:
- pest_dir = Path(pest_dir)
- master_dir = self.pest_dir / 'master'
- project_name = project_name
- pest_run_dir = self.pest_dir.parent
- workers_dir = self.pest_run_dir / 'workers'
- _rec_content = None
- _phi_data = None
- _noptmax = None
rec_file: Path
property
Path to main record file.
pst_file: Path
property
Path to control file.
__init__(pest_dir: str, project_name: str)
Initialize results handler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pest_dir | str | Path to pest/ directory (contains master/, pst file, etc.) | required |
| project_name | str | Project name (e.g., '2_Fort_Peck') | required |
_read_rec_file() -> str
Read and cache record file content.
_read_phi_data() -> pd.DataFrame | None
Read and cache phi measurement data.
_get_noptmax() -> int | None
Extract noptmax from control file or record.
_get_par_files() -> list[Path]
Get all parameter CSV files sorted by iteration.
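
Sorting these files as plain strings would put iteration 10 before iteration 2, so a numeric key is needed. The sketch below assumes file names of the form '{project}.{iteration}.par.csv', consistent with the '{project}.*.par.csv' pattern in DEBUG_FILES. The helper name and the regular expression are illustrative, not the library's implementation.

```python
import re
from pathlib import Path


def sort_par_files(paths: list[Path], project: str) -> list[Path]:
    """Return parameter CSVs ordered by iteration number (hypothetical helper)."""
    pattern = re.compile(rf"^{re.escape(project)}\.(\d+)\.par\.csv$")
    numbered = []
    for path in paths:
        match = pattern.match(path.name)
        if match:  # skip files without a numeric iteration in the name
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered, key=lambda item: item[0])]


# Made-up file names; no files are read from disk.
paths = [
    Path("master/my_project.10.par.csv"),
    Path("master/my_project.2.par.csv"),
    Path("master/my_project.0.par.csv"),
]
ordered = sort_par_files(paths, "my_project")
print([p.name for p in ordered])
```

With this ordering, the last element corresponds to the highest iteration found.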
is_successful() -> tuple[bool, list[str]]
Check if calibration succeeded.
Returns:
| Type | Description |
|---|---|
| tuple[bool, list[str]] | Tuple of (success, issues): success is a bool and issues is a list of issue strings. |
get_summary() -> dict
Extract key metrics from calibration results.
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with summary metrics. |
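
The class example reads a phi_reduction_pct entry from the summary. A plausible definition is the relative drop in the objective function (phi) between the first and last iteration. The sketch below computes that, with the formula, the function name, and the sample values all being assumptions rather than the documented behavior of get_summary.

```python
def phi_reduction_pct(phi_initial: float, phi_final: float) -> float:
    """Percent reduction in phi from the initial to the final iteration (assumed definition)."""
    if phi_initial <= 0:
        raise ValueError("phi_initial must be positive")
    return 100.0 * (phi_initial - phi_final) / phi_initial


# Made-up phi values for illustration.
reduction = phi_reduction_pct(phi_initial=2500.0, phi_final=1000.0)
print(f"Phi reduction: {reduction:.1f}%")
```

Under this definition, a negative value would indicate that phi increased over the run.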
get_final_parameters() -> pd.DataFrame | None
Get final calibrated parameter ensemble.
Returns:
| Type | Description |
|---|---|
| DataFrame \| None | DataFrame with parameter values, or None if not found. |
_calculate_dir_size(path: Path) -> float
Calculate directory size in MB.
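
Directory size in MB can be computed by summing file sizes under the directory. The following sketch uses pathlib. Whether the library counts 1024 * 1024 or 10**6 bytes per MB is not stated here, so the divisor is an assumption, as is the function name.

```python
import tempfile
from pathlib import Path


def dir_size_mb(path: Path) -> float:
    """Sum the sizes of all regular files under path, in MB (1 MB = 1024 * 1024 bytes, assumed)."""
    total_bytes = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


# Build a throwaway directory tree with 1.5 MB of data to measure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "master").mkdir()
    (root / "master" / "a.bin").write_bytes(b"\0" * 1024 * 1024)
    (root / "b.bin").write_bytes(b"\0" * 512 * 1024)
    size_mb = dir_size_mb(root)

print(f"{size_mb:.2f} MB")
```

Using rglob means files in nested worker or master subdirectories are included in the total.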
cleanup(archive_dir: str | None = None, keep_debug: bool = False, dry_run: bool = False) -> dict
Clean up calibration files based on success status.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| archive_dir | str \| None | Directory to archive important files (None = pest_dir/archive). | None |
| keep_debug | bool | Force keeping debug files even on success. | False |
| dry_run | bool | If True, report what would be done without doing it. | False |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with cleanup report. |
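
The dry_run flag follows a familiar pattern: collect the files that would be affected, act on them only when dry_run is False, and return a report either way. The sketch below shows that pattern with the CLEANUP_PATTERNS globs. The helper name, the report keys, and the sample files are hypothetical and do not reflect the actual report structure returned by cleanup.

```python
import tempfile
from pathlib import Path

CLEANUP_PATTERNS = ["*.jcb", "*.jco", "*.rei", "*.rst"]


def remove_matching(directory: Path, patterns: list[str], dry_run: bool = False) -> dict:
    """Delete files matching the patterns, or only list them when dry_run is True."""
    targets = sorted(p for pattern in patterns for p in directory.glob(pattern))
    if not dry_run:
        for target in targets:
            target.unlink()
    return {"dry_run": dry_run, "removed": [t.name for t in targets]}


with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp)
    for name in ["run.jcb", "run.rei", "run.rec"]:
        (master / name).touch()

    # First pass only reports; the files must still exist afterwards.
    preview = remove_matching(master, CLEANUP_PATTERNS, dry_run=True)
    still_there = sorted(p.name for p in master.iterdir())

    # Second pass actually deletes the matching files.
    report = remove_matching(master, CLEANUP_PATTERNS)
    remaining = sorted(p.name for p in master.iterdir())

print(preview, remaining)
```

Running a dry run first makes it possible to inspect the report before anything is deleted.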
_get_recommendations(issues: list[str]) -> list[str]
Generate debugging recommendations based on issues.
print_summary() -> None
Print a formatted summary to stdout.
run_pst(_dir: str, _cmd: str, pst_file: str, num_workers: int, worker_root: str, master_dir: str | None = None, verbose: bool = True, cleanup: bool = True) -> None
Run PEST++ calibration with parallel workers.
Launches the PEST++ master and worker processes using pyemu's os_utils. Workers execute the forward model in parallel across multiple cores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| _dir | str | Directory containing the .pst control file. | required |
| _cmd | str | PEST++ executable command (e.g., 'pestpp-ies'). | required |
| pst_file | str | Name of the .pst control file. | required |
| num_workers | int | Number of parallel worker processes. | required |
| worker_root | str | Directory for worker process files. | required |
| master_dir | str \| None | Directory for master process output. Defaults to None. | None |
| verbose | bool | Print progress messages. Defaults to True. | True |
| cleanup | bool | Clean up worker directories on completion. Defaults to True. | True |
Raises:
| Type | Description |
|---|---|
| ValueError | If the pest directory does not exist. |
Example

    run_pst(
        _dir='/path/to/pest',
        _cmd='pestpp-ies',
        pst_file='project.pst',
        num_workers=4,
        worker_root='/path/to/workers',
        master_dir='/path/to/master',
    )