Data Extraction (`swimrs.data_extraction`)¶

Earth Engine exports and meteorology ingest helpers.

Earth Engine¶

`export_ptjpl_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 60, file_prefix: str = 'swim') -> None` ¶

Export per-scene PT-JPL ET fraction zonal means for polygons to GCS CSVs.

Parameters¶

- `shapefile` : str. Path to polygon shapefile with feature IDs.
- `bucket` : str. GCS bucket name (no scheme).
- `feature_id` : str, optional. Field name for feature identifier.
- `select` : list, optional. Optional list of feature IDs to process.
- `start_yr` : int, optional. Inclusive start year (default: 2000).
- `end_yr` : int, optional. Inclusive end year (default: 2024).
- `mask_type` : {'no_mask', 'irr', 'inv_irr'}, optional. Irrigation masking strategy (default: 'no_mask').
- `check_dir` : str, optional. If set, skip exports when CSV already exists at check_dir/.csv.
- `state_col` : str, optional. Column with state abbreviation for mask source selection.
- `buffer` : float, optional. Buffer distance in meters to apply to geometries.
- `batch_size` : int, optional. Number of scenes to process per export batch (default: 60). PT-JPL runs fast, so larger batches are efficient.
- `file_prefix` : str, optional. Bucket path prefix, typically project name (default: 'swim').
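To make the `batch_size` and inclusive `start_yr`/`end_yr` semantics concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `iter_batches` and the scene IDs are hypothetical names used only for illustration, and the real scene list would come from Earth Engine.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices holding at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical scene IDs; real IDs would come from an Earth Engine collection.
scenes = [f"scene_{i:03d}" for i in range(130)]

# Both bounds are inclusive, so end_yr needs +1 when used with range().
years = list(range(2000, 2024 + 1))

batches = list(iter_batches(scenes, batch_size=60))
print(len(years), [len(b) for b in batches])
```

With the documented PT-JPL default of 60, the 130 placeholder scenes split into three batches; a batch size of 15, the default of the other exporters, would give nine smaller ones.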

`export_ssebop_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None` ¶

Export per-scene SSEBop ET fraction zonal means for polygons to GCS CSVs.

Parameters¶

- `shapefile` : str. Path to polygon shapefile with feature IDs.
- `bucket` : str. GCS bucket name (no scheme).
- `feature_id` : str, optional. Field name for feature identifier.
- `select` : list, optional. Optional list of feature IDs to process.
- `start_yr` : int, optional. Inclusive start year (default: 2000).
- `end_yr` : int, optional. Inclusive end year (default: 2024).
- `mask_type` : {'no_mask', 'irr', 'inv_irr'}, optional. Irrigation masking strategy (default: 'no_mask').
- `check_dir` : str, optional. If set, skip exports when CSV already exists at check_dir/.csv.
- `state_col` : str, optional. Column with state abbreviation for mask source selection.
- `buffer` : float, optional. Buffer distance in meters to apply to geometries.
- `batch_size` : int, optional. Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
- `file_prefix` : str, optional. Bucket path prefix, typically project name (default: 'swim').
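The `check_dir` option describes a resume pattern: features whose CSV already exists locally are skipped. The sketch below shows that pattern under stated assumptions. `pending_features` is a hypothetical helper, and the file-naming rule is passed in as a callable because this page does not spell out the exact CSV name; the `f"{fid}.csv"` rule in the demo is a placeholder, not the library's convention.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def pending_features(
    feature_ids: Iterable[str],
    check_dir: Optional[str],
    csv_name: Callable[[str], str],
) -> List[str]:
    """Return the feature IDs that still need an export.

    `csv_name` maps a feature ID to the CSV file name expected in
    `check_dir`; the mapping is an assumption supplied by the caller.
    """
    if check_dir is None:
        # No directory to check, so nothing can be skipped.
        return list(feature_ids)
    root = Path(check_dir)
    return [fid for fid in feature_ids if not (root / csv_name(fid)).exists()]


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        # Pretend feature "A" was exported on an earlier run.
        (Path(tmp) / "A.csv").write_text("placeholder\n")
        # Placeholder naming rule, for illustration only.
        return pending_features(["A", "B", "C"], tmp, lambda fid: f"{fid}.csv")


print(demo())
```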

`export_sims_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None` ¶

Export per-scene SIMS ET fraction zonal means for polygons to GCS CSVs.

Parameters¶

- `shapefile` : str. Path to polygon shapefile with feature IDs.
- `bucket` : str. GCS bucket name (no scheme).
- `feature_id` : str, optional. Field name for feature identifier.
- `select` : list, optional. Optional list of feature IDs to process.
- `start_yr` : int, optional. Inclusive start year (default: 2000).
- `end_yr` : int, optional. Inclusive end year (default: 2024).
- `mask_type` : {'no_mask', 'irr', 'inv_irr'}, optional. Irrigation masking strategy (default: 'no_mask').
- `check_dir` : str, optional. If set, skip exports when CSV already exists at check_dir/.csv.
- `state_col` : str, optional. Column with state abbreviation for mask source selection.
- `buffer` : float, optional. Buffer distance in meters to apply to geometries.
- `batch_size` : int, optional. Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
- `file_prefix` : str, optional. Bucket path prefix, typically project name (default: 'swim').

`export_geesebal_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None` ¶

Export per-scene geeSEBAL ET fraction zonal means for polygons to GCS CSVs.

Parameters¶

- `shapefile` : str. Path to polygon shapefile with feature IDs.
- `bucket` : str. GCS bucket name (no scheme).
- `feature_id` : str, optional. Field name for feature identifier.
- `select` : list, optional. Optional list of feature IDs to process.
- `start_yr` : int, optional. Inclusive start year (default: 2000).
- `end_yr` : int, optional. Inclusive end year (default: 2024).
- `mask_type` : {'no_mask', 'irr', 'inv_irr'}, optional. Irrigation masking strategy (default: 'no_mask').
- `check_dir` : str, optional. If set, skip exports when CSV already exists at check_dir/.csv.
- `state_col` : str, optional. Column with state abbreviation for mask source selection.
- `buffer` : float, optional. Buffer distance in meters to apply to geometries.
- `batch_size` : int, optional. Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
- `file_prefix` : str, optional. Bucket path prefix, typically project name (default: 'swim').

GridMET / ERA5¶

`GridMet` ¶

Bases: Thredds

University of Idaho gridMET

Return as numpy array per met variable in daily stack unless modified.

`['bi', 'elev', 'erc', 'fm100', 'fm1000', 'pdsi', 'pet', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn', 'tmmx', 'vs']`

Observation elements to access. Currently available elements:
- 'bi' : burning index [-]
- 'elev' : elevation above sea level [m]
- 'erc' : energy release component [-]
- 'fm100' : 100-hour dead fuel moisture [%]
- 'fm1000' : 1000-hour dead fuel moisture [%]
- 'pdsi' : Palmer Drought Severity Index [-]
- 'pet' : daily reference potential evapotranspiration [mm]
- 'pr' : daily accumulated precipitation [mm]
- 'rmax' : daily maximum relative humidity [%]
- 'rmin' : daily minimum relative humidity [%]
- 'sph' : daily mean specific humidity [kg/kg]
- 'prcp' : daily total precipitation [mm]
- 'srad' : daily mean downward shortwave radiation at surface [W m-2]
- 'th' : daily mean wind direction clockwise from North [degrees]
- 'tmmn' : daily minimum air temperature [K]
- 'tmmx' : daily maximum air temperature [K]
- 'vs' : daily mean wind speed [m s-1]
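Because `tmmn` and `tmmx` are listed in kelvin, downstream code that expects degrees Celsius needs a conversion. A minimal sketch with made-up sample values; `kelvin_to_celsius` is a hypothetical helper and not part of the `GridMet` class:

```python
from typing import Iterable, List

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def kelvin_to_celsius(values: Iterable[float]) -> List[float]:
    """Convert temperatures from kelvin to degrees Celsius."""
    return [v - KELVIN_OFFSET for v in values]


# Made-up daily maximum temperatures in kelvin, for illustration only.
tmmx_k = [273.15, 293.15, 300.0]
tmmx_c = kelvin_to_celsius(tmmx_k)
print([round(t, 2) for t in tmmx_c])
```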

:param start: start of period of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param end: end of period of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param variables: list of available variables; at least one is required
:param date: date of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param bbox: bounds.GeoBounds object representing spatial bounds
:return: numpy.ndarray

Must have either start and end, or date. Must have at least one valid variable. Invalid variables will be excluded gracefully.
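The two rules above (a period given either as `start` and `end` or as a single `date`, and invalid variables dropped rather than raising) can be sketched as follows. This is an illustration, not the class's actual validation code: `split_variables` and `has_valid_period` are hypothetical helpers, and `AVAILABLE` is copied from the `available` attribute listed on this page.

```python
from typing import List, Optional, Sequence, Tuple

# Copied from the `available` attribute documented for GridMet.
AVAILABLE = ['elev', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn',
             'tmmx', 'pet', 'vs', 'erc', 'bi', 'fm100', 'pdsi']


def split_variables(requested: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Separate requested names into (valid, invalid), keeping input order."""
    valid = [v for v in requested if v in AVAILABLE]
    invalid = [v for v in requested if v not in AVAILABLE]
    if not valid:
        raise ValueError("at least one valid variable is required")
    return valid, invalid


def has_valid_period(date: Optional[str], start: Optional[str],
                     end: Optional[str]) -> bool:
    """True when either a single date or both start and end are given."""
    return date is not None or (start is not None and end is not None)


valid, invalid = split_variables(['pr', 'not_a_variable', 'tmmx'])
print(valid, invalid)
print(has_valid_period(None, '2020-01-01', '2020-12-31'))
```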

NetCDF dates are in Excel '1900' date format, i.e., number of days since 1899-12-31 23:59

xlrd.xldate handles this for the time being
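For readers without `xlrd` at hand, the conversion can be approximated with the standard library. The sketch below assumes the usual Excel 1900 serial convention, including the shifted epoch that works around Excel's fictitious 1900-02-29; whether the GridMet time axis follows this convention exactly is an assumption, and `excel1900_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def excel1900_to_datetime(serial: float) -> datetime:
    """Convert an Excel 1900-system serial day number to a datetime.

    Serial 60 is Excel's nonexistent 1900-02-29, so later serials are
    counted from an epoch one day earlier to stay aligned with real dates.
    """
    epoch = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    return epoch + timedelta(days=serial)


print(excel1900_to_datetime(1))      # first day of the 1900 system
print(excel1900_to_datetime(43831))  # a modern date
```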

`date = date` `instance-attribute` ¶

`start = start` `instance-attribute` ¶

`end = end` `instance-attribute` ¶

`variable = variable` `instance-attribute` ¶

`bbox = bbox` `instance-attribute` ¶

`target_profile = target_profile` `instance-attribute` ¶

`clip_feature = clip_feature` `instance-attribute` ¶

`lat = lat` `instance-attribute` ¶

`lon = lon` `instance-attribute` ¶

`service = 'thredds.northwestknowledge.net:8080'` `instance-attribute` ¶

`scheme = 'http'` `instance-attribute` ¶

`temp_dir = mkdtemp()` `instance-attribute` ¶

`available = ['elev', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn', 'tmmx', 'pet', 'vs', 'erc', 'bi', 'fm100', 'pdsi']` `instance-attribute` ¶

`kwords = {'bi': 'daily_mean_burning_index_g', 'elev': '', 'erc': 'energy_release_component-g', 'fm100': 'dead_fuel_moisture_100hr', 'fm1000': 'dead_fuel_moisture_1000hr', 'pdsi': 'daily_mean_palmer_drought_severity_index', 'etr': 'daily_mean_reference_evapotranspiration_alfalfa', 'pet': 'daily_mean_reference_evapotranspiration_grass', 'pr': 'precipitation_amount', 'rmax': 'daily_maximum_relative_humidity', 'rmin': 'daily_minimum_relative_humidity', 'sph': 'daily_mean_specific_humidity', 'srad': 'daily_mean_shortwave_radiation_at_surface', 'th': 'daily_mean_wind_direction', 'tmmn': 'daily_minimum_temperature', 'tmmx': 'daily_maximum_temperature', 'vs': 'daily_mean_wind_speed', 'vpd': 'daily_mean_vapor_pressure_deficit'}` `instance-attribute` ¶

`single_year = False` `instance-attribute` ¶

`__init__(variable: str | None = None, date=None, start=None, end=None, bbox=None, target_profile=None, clip_feature=None, lat: float | None = None, lon: float | None = None) -> None` ¶

`subset_daily_tif(out_filename: str | None = None) -> np.ndarray` ¶

`subset_nc(out_filename: str | None = None, return_array: bool = False)` ¶

`get_point_timeseries() -> DataFrame` ¶

Retrieve daily time series for a point location.

Downloads meteorological data for the point specified by lat/lon coordinates over the date range defined at initialization.

Returns:

Type	Description
`DataFrame`	DataFrame with datetime index and single column for the variable.

Example

    >>> gm = GridMet(variable='etr', lat=45.5, lon=-116.5,
    ...              start='2020-01-01', end='2020-12-31')
    >>> df = gm.get_point_timeseries()
    >>> print(df.head())

`get_point_elevation() -> float` ¶

`_build_url() -> str` ¶

`write_netcdf(outputroot: str) -> None` ¶
