Data Extraction (swimrs.data_extraction)

Earth Engine exports and meteorology ingest helpers.

Earth Engine

export_ptjpl_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 60, file_prefix: str = 'swim') -> None

Export per-scene PT-JPL ET fraction zonal means for polygons to GCS CSVs.

Parameters

shapefile : str
    Path to polygon shapefile with feature IDs.
bucket : str
    GCS bucket name (no scheme).
feature_id : str, optional
    Field name for feature identifier.
select : list, optional
    Optional list of feature IDs to process.
start_yr : int, optional
    Inclusive start year (default: 2000).
end_yr : int, optional
    Inclusive end year (default: 2024).
mask_type : {'no_mask', 'irr', 'inv_irr'}, optional
    Irrigation masking strategy (default: 'no_mask').
check_dir : str, optional
    If set, skip exports when CSV already exists at check_dir/.csv.
state_col : str, optional
    Column with state abbreviation for mask source selection.
buffer : float, optional
    Buffer distance in meters to apply to geometries.
batch_size : int, optional
    Number of scenes to process per export batch (default: 60). PT-JPL runs fast, so larger batches are efficient.
file_prefix : str, optional
    Bucket path prefix, typically project name (default: 'swim').
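As a hedged illustration of the `mask_type` options, the sketch below submits one export per masking strategy. The helper `export_all_masks` is hypothetical and not part of swimrs. The exporter is passed in as a callable so the block runs without Earth Engine credentials, and the shapefile path and bucket name are placeholders.

```python
from typing import Any, Callable

# The three values listed for the mask_type parameter.
MASK_TYPES = ("no_mask", "irr", "inv_irr")


def export_all_masks(
    exporter: Callable[..., None], shapefile: str, bucket: str, **common: Any
) -> list[str]:
    """Hypothetical helper: call `exporter` once per mask_type.

    Returns the mask types that were submitted, in order.
    """
    submitted = []
    for mask in MASK_TYPES:
        exporter(shapefile, bucket, mask_type=mask, **common)
        submitted.append(mask)
    return submitted


# Stand-in exporter that only records its arguments (no Earth Engine calls).
calls = []


def fake_exporter(shapefile: str, bucket: str, **kwargs: Any) -> None:
    calls.append((shapefile, bucket, kwargs))


done = export_all_masks(
    fake_exporter, "fields.shp", "my-bucket",
    feature_id="FID", start_yr=2018, end_yr=2020,
)
print(done)
```

In real use, `export_ptjpl_zonal_stats` would be passed in place of `fake_exporter`, assuming it is importable from `swimrs.data_extraction` and Earth Engine has been initialized.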

export_ssebop_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None

Export per-scene SSEBop ET fraction zonal means for polygons to GCS CSVs.

Parameters

shapefile : str
    Path to polygon shapefile with feature IDs.
bucket : str
    GCS bucket name (no scheme).
feature_id : str, optional
    Field name for feature identifier.
select : list, optional
    Optional list of feature IDs to process.
start_yr : int, optional
    Inclusive start year (default: 2000).
end_yr : int, optional
    Inclusive end year (default: 2024).
mask_type : {'no_mask', 'irr', 'inv_irr'}, optional
    Irrigation masking strategy (default: 'no_mask').
check_dir : str, optional
    If set, skip exports when CSV already exists at check_dir/.csv.
state_col : str, optional
    Column with state abbreviation for mask source selection.
buffer : float, optional
    Buffer distance in meters to apply to geometries.
batch_size : int, optional
    Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
file_prefix : str, optional
    Bucket path prefix, typically project name (default: 'swim').
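To make the effect of `batch_size` concrete, here is a minimal sketch of splitting a scene list into fixed-size batches. It is an illustration only and not necessarily how the exporters batch scenes internally. The `batches` helper and the scene IDs are made up for the example.

```python
from typing import Iterator, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


scene_ids = [f"scene_{i:03d}" for i in range(100)]  # made-up scene IDs

# Compare the PT-JPL default (60) with the default used here (15).
counts = {size: sum(1 for _ in batches(scene_ids, size)) for size in (60, 15)}
print(counts)  # {60: 2, 15: 7}
```

With 100 scenes, a batch size of 60 yields 2 export batches and a batch size of 15 yields 7, so smaller batches trade more export tasks for less work per task.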

export_sims_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None

Export per-scene SIMS ET fraction zonal means for polygons to GCS CSVs.

Parameters

shapefile : str
    Path to polygon shapefile with feature IDs.
bucket : str
    GCS bucket name (no scheme).
feature_id : str, optional
    Field name for feature identifier.
select : list, optional
    Optional list of feature IDs to process.
start_yr : int, optional
    Inclusive start year (default: 2000).
end_yr : int, optional
    Inclusive end year (default: 2024).
mask_type : {'no_mask', 'irr', 'inv_irr'}, optional
    Irrigation masking strategy (default: 'no_mask').
check_dir : str, optional
    If set, skip exports when CSV already exists at check_dir/.csv.
state_col : str, optional
    Column with state abbreviation for mask source selection.
buffer : float, optional
    Buffer distance in meters to apply to geometries.
batch_size : int, optional
    Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
file_prefix : str, optional
    Bucket path prefix, typically project name (default: 'swim').

export_geesebal_zonal_stats(shapefile: str, bucket: str, feature_id: str = 'FID', select: list[str] | None = None, start_yr: int = 2000, end_yr: int = 2024, mask_type: str = 'no_mask', check_dir: str | None = None, state_col: str = 'state', buffer: float | None = None, batch_size: int = 15, file_prefix: str = 'swim') -> None

Export per-scene geeSEBAL ET fraction zonal means for polygons to GCS CSVs.

Parameters

shapefile : str
    Path to polygon shapefile with feature IDs.
bucket : str
    GCS bucket name (no scheme).
feature_id : str, optional
    Field name for feature identifier.
select : list, optional
    Optional list of feature IDs to process.
start_yr : int, optional
    Inclusive start year (default: 2000).
end_yr : int, optional
    Inclusive end year (default: 2024).
mask_type : {'no_mask', 'irr', 'inv_irr'}, optional
    Irrigation masking strategy (default: 'no_mask').
check_dir : str, optional
    If set, skip exports when CSV already exists at check_dir/.csv.
state_col : str, optional
    Column with state abbreviation for mask source selection.
buffer : float, optional
    Buffer distance in meters to apply to geometries.
batch_size : int, optional
    Number of scenes to process per export batch (default: 15). Smaller batches reduce server-side memory usage.
file_prefix : str, optional
    Bucket path prefix, typically project name (default: 'swim').

GridMET / ERA5

GridMet

Bases: Thredds

University of Idaho (U of I) GridMET.

Returns a numpy array per meteorological variable, stacked by day, unless modified.

['bi', 'elev', 'erc', 'fm100', 'fm1000', 'pdsi', 'pet', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn', 'tmmx', 'vs']

Observation elements to access. Currently available elements:
- 'bi' : burning index [-]
- 'elev' : elevation above sea level [m]
- 'erc' : energy release component [-]
- 'fm100' : 100-hour dead fuel moisture [%]
- 'fm1000' : 1000-hour dead fuel moisture [%]
- 'pdsi' : Palmer Drought Severity Index [-]
- 'pet' : daily reference potential evapotranspiration [mm]
- 'pr' : daily accumulated precipitation [mm]
- 'rmax' : daily maximum relative humidity [%]
- 'rmin' : daily minimum relative humidity [%]
- 'sph' : daily mean specific humidity [kg/kg]
- 'prcp' : daily total precipitation [mm]
- 'srad' : daily mean downward shortwave radiation at surface [W m-2]
- 'th' : daily mean wind direction clockwise from North [degrees]
- 'tmmn' : daily minimum air temperature [K]
- 'tmmx' : daily maximum air temperature [K]
- 'vs' : daily mean wind speed [m s-1]
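Because 'tmmn' and 'tmmx' are listed in kelvin, a conversion is usually needed before comparing against station data in degrees Celsius. The sketch below shows the arithmetic on a plain list of made-up values; `kelvin_to_celsius` is a hypothetical helper, not part of this module. With a numpy array the same subtraction can be applied to the array directly.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def kelvin_to_celsius(values: list[float]) -> list[float]:
    """Convert temperatures from kelvin to degrees Celsius."""
    return [v - KELVIN_OFFSET for v in values]


tmmx_k = [273.15, 300.0, 310.5]  # made-up daily maxima in kelvin
tmmx_c = kelvin_to_celsius(tmmx_k)
print([round(v, 2) for v in tmmx_c])
```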

:param start: start of period of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param end: end of period of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param variables: list of variables to request; at least one is required
:param date: date of data, datetime.datetime object or string in 'YYYY-MM-DD' format
:param bbox: bounds.GeoBounds object representing spatial bounds
:return: numpy.ndarray

Must have either start and end, or date. Must have at least one valid variable. Invalid variables will be excluded gracefully.
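The rules above can be sketched as a small validation function. This is an illustration of the stated contract, not the class's actual implementation. `check_request` is a hypothetical name, and the `AVAILABLE` list is copied from the `available` attribute documented for this class.

```python
from typing import Optional

# Copied from the `available` attribute listed for GridMet.
AVAILABLE = ['elev', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn',
             'tmmx', 'pet', 'vs', 'erc', 'bi', 'fm100', 'pdsi']


def check_request(variables: list[str], date: Optional[str] = None,
                  start: Optional[str] = None,
                  end: Optional[str] = None) -> list[str]:
    """Hypothetical check: require a date or a start/end pair, keep valid variables."""
    has_range = start is not None and end is not None
    if date is None and not has_range:
        raise ValueError("provide either `date` or both `start` and `end`")
    # Invalid names are dropped rather than raising, mirroring "excluded gracefully".
    valid = [v for v in variables if v in AVAILABLE]
    if not valid:
        raise ValueError("at least one valid variable is required")
    return valid


kept = check_request(['pr', 'not_a_variable', 'tmmx'], date='2020-06-01')
print(kept)  # ['pr', 'tmmx']
```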

NetCDF dates are in the Excel ('xl') 1900 date system, i.e., the number of days since 1899-12-31 23:59.

xlrd.xldate handles this for the time being.
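For readers unfamiliar with the Excel 1900 date system, the following standard-library sketch converts a serial day number to a datetime. It is not the code this class uses (the note above says xlrd.xldate does the work), and whether a given netCDF time value maps exactly onto an Excel serial is an assumption to verify against the file's time units. Excel counts a nonexistent 1900-02-29, so for serials of 61 and above the effective epoch is 1899-12-30.

```python
from datetime import datetime, timedelta

# Effective epoch for Excel 1900-system serials >= 61 (after the phantom leap day).
EXCEL_1900_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel 1900-system serial day number to a datetime.

    Only valid for serial >= 61; earlier values are affected by Excel's
    fictitious 1900-02-29.
    """
    if serial < 61:
        raise ValueError("serials below 61 are ambiguous in the 1900 date system")
    return EXCEL_1900_EPOCH + timedelta(days=serial)


converted = excel_serial_to_datetime(43831)
print(converted.date())  # 2020-01-01
```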

date = date instance-attribute

start = start instance-attribute

end = end instance-attribute

variable = variable instance-attribute

bbox = bbox instance-attribute

target_profile = target_profile instance-attribute

clip_feature = clip_feature instance-attribute

lat = lat instance-attribute

lon = lon instance-attribute

service = 'thredds.northwestknowledge.net:8080' instance-attribute

scheme = 'http' instance-attribute

temp_dir = mkdtemp() instance-attribute

available = ['elev', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn', 'tmmx', 'pet', 'vs', 'erc', 'bi', 'fm100', 'pdsi'] instance-attribute

kwords = {'bi': 'daily_mean_burning_index_g', 'elev': '', 'erc': 'energy_release_component-g', 'fm100': 'dead_fuel_moisture_100hr', 'fm1000': 'dead_fuel_moisture_1000hr', 'pdsi': 'daily_mean_palmer_drought_severity_index', 'etr': 'daily_mean_reference_evapotranspiration_alfalfa', 'pet': 'daily_mean_reference_evapotranspiration_grass', 'pr': 'precipitation_amount', 'rmax': 'daily_maximum_relative_humidity', 'rmin': 'daily_minimum_relative_humidity', 'sph': 'daily_mean_specific_humidity', 'srad': 'daily_mean_shortwave_radiation_at_surface', 'th': 'daily_mean_wind_direction', 'tmmn': 'daily_minimum_temperature', 'tmmx': 'daily_maximum_temperature', 'vs': 'daily_mean_wind_speed', 'vpd': 'daily_mean_vapor_pressure_deficit'} instance-attribute
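The `kwords` mapping has more keys than the `available` list. A quick set difference, using values copied from the two attribute listings above, shows which names appear only in `kwords`. Whether those extra names are accepted as the `variable` argument is not stated here, so treat the result as a prompt to check rather than a guarantee.

```python
# Values copied from the `available` and `kwords` attribute listings.
available = ['elev', 'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn',
             'tmmx', 'pet', 'vs', 'erc', 'bi', 'fm100', 'pdsi']
kwords_keys = ['bi', 'elev', 'erc', 'fm100', 'fm1000', 'pdsi', 'etr', 'pet',
               'pr', 'rmax', 'rmin', 'sph', 'srad', 'th', 'tmmn', 'tmmx',
               'vs', 'vpd']

# Names that have a netCDF keyword but are missing from `available`.
only_in_kwords = sorted(set(kwords_keys) - set(available))
print(only_in_kwords)  # ['etr', 'fm1000', 'vpd']

# Every name in `available` has a corresponding keyword entry.
missing_keywords = sorted(set(available) - set(kwords_keys))
print(missing_keywords)  # []
```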

single_year = False instance-attribute

__init__(variable: str | None = None, date=None, start=None, end=None, bbox=None, target_profile=None, clip_feature=None, lat: float | None = None, lon: float | None = None) -> None

subset_daily_tif(out_filename: str | None = None) -> np.ndarray

subset_nc(out_filename: str | None = None, return_array: bool = False)

get_point_timeseries() -> DataFrame

Retrieve daily time series for a point location.

Downloads meteorological data for the point specified by lat/lon coordinates over the date range defined at initialization.

Returns:

Type        Description
DataFrame   DataFrame with datetime index and single column for the variable.

Example

gm = GridMet(variable='etr', lat=45.5, lon=-116.5,
             start='2020-01-01', end='2020-12-31')
df = gm.get_point_timeseries()
print(df.head())

get_point_elevation() -> float

_build_url() -> str

write_netcdf(outputroot: str) -> None