stackstac.stack#
- stackstac.stack(items, assets=frozenset({'image/jp2', 'image/tiff', 'image/vnd.stac.geotiff', 'image/x.geotiff'}), epsg=None, resolution=None, bounds=None, bounds_latlon=None, snap_bounds=True, resampling=Resampling.nearest, chunksize=1024, dtype=dtype('float64'), fill_value=nan, rescale=True, sortby_date='asc', xy_coords='topleft', properties=True, band_coords=True, gdal_env=None, errors_as_nodata=(RasterioIOError('HTTP response code: 404'), ), reader=<class 'stackstac.rio_reader.AutoParallelRioReader'>)#
Create an xarray.DataArray of all the STAC items, reprojected to the same grid and stacked by time.

The DataArray's dimensions will be ("time", "band", "y", "x"). It's backed by a lazy Dask array, so you can manipulate it without touching any data.

We'll try to choose the output coordinate reference system, resolution, and bounds based on the metadata in the STAC items. However, if not all items have the necessary metadata, or aren't in the same coordinate reference system, you'll have to specify these yourself: epsg and resolution are the two parameters you'll set most often.

Examples
>>> import stackstac
>>> import satsearch
>>> items = satsearch.Search(...).items()

>>> # Use default CRS, resolution, bounding box, etc.
>>> xr_stack = stackstac.stack(items)

>>> # Reproject to 100-meter resolution in Web Mercator
>>> xr_stack = stackstac.stack(items, epsg=3857, resolution=100)

>>> # Only use specific asset IDs
>>> xr_stack = stackstac.stack(items, assets=["B01", "B03", "B02"])

>>> # Clip to a custom bounding box
>>> xr_stack = stackstac.stack(items, bounds_latlon=[-106.2, 35.6, -105.6, 36])

>>> # Turn off all metadata if you don't need it
>>> xr_stack = stackstac.stack(
...     items, properties=False, band_coords=False, xy_coords=False, sortby_date=False
... )

>>> # Custom dtype and fill_value
>>> xr_stack = stackstac.stack(items, rescale=False, fill_value=0, dtype="uint16")
Note
Don’t be scared of all the parameters!
Though there are lots of options, you can leave nearly all of them as their defaults.
- Parameters:
items (Union[SatstacItemCollection, PystacCatalog, PystacItemCollection, Sequence[ItemDict], SatstacItem, PystacItem, ItemDict, Sequence[PystacItem], Sequence[SatstacItem]]) – The STAC items to stack. Can be a plain Python list of dicts following the STAC JSON specification, or objects from the satstac or pystac libraries.

assets (Union[List[str], AbstractSet[str], None]) – Which asset IDs to use. Any Items missing a particular Asset will return an array of fill_value for that Asset. By default, returns all assets with a GeoTIFF or JPEG2000 type.

If None, all assets are used.

If a list of strings, those asset IDs are used.

If a set, only assets compatible with those mimetypes are used (according to the type field on each asset). Note that if you give assets={"image/tiff"}, and the asset B1 has type="image/tiff" on some items but type="image/png" on others, then B1 will not be included. Mimetype structure is respected, so image/tiff will also match image/tiff; application=geotiff; image will match image/tiff and image/jp2, etc. See the STAC common media types for ideas. Assets that don't have type specified on any item will be dropped in this case.

Note: each asset's data must contain exactly one band. Multi-band assets (like an RGB GeoTIFF) are not yet supported.
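To make the mimetype matching above concrete, here's a small self-contained sketch of that matching behavior. This is a hypothetical illustration written for this doc, not stackstac's actual implementation:

```python
def mimetype_matches(pattern: str, asset_type: str) -> bool:
    # Hypothetical helper illustrating the structured matching described above.
    # Parameters after ";" (like "application=geotiff") are ignored, and a bare
    # top-level type (like "image") matches any of its subtypes.
    base = asset_type.split(";")[0].strip()
    return base == pattern or base.startswith(pattern + "/")

# "image/tiff" matches the parameterized GeoTIFF mimetype:
assert mimetype_matches("image/tiff", "image/tiff; application=geotiff")
# A bare top-level type matches any subtype:
assert mimetype_matches("image", "image/jp2")
# A different subtype does not match:
assert not mimetype_matches("image/tiff", "image/png")
```

So `assets={"image/tiff"}` would keep `image/tiff; application=geotiff` assets but drop `image/png` ones.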
epsg (Optional[int]) – Reproject into this coordinate reference system, as given by an EPSG code. If None (default), uses whatever CRS is set on all the items. In this case, all Items/Assets must have the proj:epsg field, and it must be the same value for all of them.

resolution (Union[int, float, Tuple[Union[int, float], Union[int, float]], None]) – Output resolution. Careful: this must be given in the output CRS's units! For example, with epsg=4326 (meaning lat-lon), the units are degrees of latitude/longitude, not meters. Giving resolution=20 in that case would mean each pixel is 20ºx20º (probably not what you wanted). You can also give a pair of (x_resolution, y_resolution).

If None (default), we try to calculate each Asset's resolution based on whatever metadata is available, and pick the minimum of all the resolutions, meaning that by default, all data will be upscaled to match the "finest" or "highest-resolution" Asset.

To estimate resolution, these combinations of fields must be set on each Asset or Item (in order of preference):

- The proj:transform and proj:epsg fields
- The proj:shape and one of the proj:bbox or bbox fields
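A quick arithmetic sketch of the units caveat: to target roughly 30-meter pixels with epsg=4326, you'd convert meters to degrees yourself. The conversion factor below is an assumed rough approximation (about 111,320 m per degree, valid near the equator), not something stackstac computes for you:

```python
# Rough illustration only: degrees per pixel needed for ~30 m pixels in
# a lat-lon CRS (epsg=4326), using an approximate equator-scale factor.
METERS_PER_DEGREE = 111_320  # assumed approximation

target_meters = 30
resolution_deg = target_meters / METERS_PER_DEGREE
print(resolution_deg)  # roughly 0.00027 degrees per pixel
```

Passing resolution=30 directly with epsg=4326 would instead produce 30º pixels, which is almost certainly not what you want.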
bounds (Optional[Tuple[Union[int, float], Union[int, float], Union[int, float], Union[int, float]]]) – Output spatial bounding box, as a tuple of (min_x, min_y, max_x, max_y). This defines the (west, south, east, north) rectangle the output array will cover. Values must be in the same coordinate reference system as epsg.

If None (default), the bounding box of all the input items is automatically used. (This only requires the bbox field to be set on each Item, which is a required field in the STAC specification, so only in rare cases will auto-calculating the bounds fail.) So in most cases, you can leave bounds as None. You'd only need to set it when you want to use a custom bounding box.

When bounds is given, any assets that don't overlap those bounds are dropped.

bounds_latlon (Optional[Tuple[Union[int, float], Union[int, float], Union[int, float], Union[int, float]]]) – Same as bounds, but given in degrees (latitude and longitude) for convenience. Only one of bounds and bounds_latlon can be used.

snap_bounds (bool) – Whether to snap the bounds to whole-number intervals of resolution to prevent fraction-of-a-pixel offsets. Default: True. This is equivalent to the -tap (target-aligned pixels) argument in GDAL.
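The snapping behavior can be sketched as simple arithmetic. This is a hypothetical helper illustrating the idea, not stackstac's internal code: each edge is pushed outward to the nearest whole multiple of the resolution, so the grid origin lands on pixel boundaries:

```python
import math

def snap_to_resolution(bounds, resolution):
    # Hypothetical sketch of bounds snapping: expand each edge outward to the
    # nearest whole multiple of `resolution`, as snap_bounds=True is described.
    min_x, min_y, max_x, max_y = bounds
    return (
        math.floor(min_x / resolution) * resolution,
        math.floor(min_y / resolution) * resolution,
        math.ceil(max_x / resolution) * resolution,
        math.ceil(max_y / resolution) * resolution,
    )

print(snap_to_resolution((10.3, -5.7, 98.2, 47.1), 10))  # (10, -10, 100, 50)
```

Without snapping, a bounding box like the one above would place pixel edges at fractional offsets relative to the resolution grid.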
resampling (Resampling) – The rasterio resampling method to use when reprojecting or rescaling data to a different CRS or resolution. Default: rasterio.enums.Resampling.nearest.

chunksize (Union[int, Literal['auto'], str, None, Tuple[Union[int, Literal['auto'], str, None], ...], Dict[int, Union[int, Literal['auto'], str, None]]]) – The chunksize to use for the Dask array. Default: 1024. Picking a good chunksize will have significant effects on performance!

Can be given in any format Dask understands, such as 1024, (1024, 2048), (10, "auto", 512, 512), "15 MB", etc.

If only 1 or 2 sizes are given, like 2048 or (512, 1024), this is used to chunk just the spatial dimensions (the last two). The time and band dimensions will have a chunksize of 1, meaning that every STAC Asset will be its own chunk. (This is the default.)

If you'll be filtering items somewhat randomly (like stack[stack["eo:cloud_cover"] < 20]), you want the chunksize to be like (1, X, X, X). Otherwise, if you had a chunksize like (3, 1, X, X), Dask would always load three items per chunk, even if two of them would be immediately thrown away because they didn't match the cloud-cover filter.

However, when your graph starts getting too large for Dask to handle, using a larger chunksize for the time or band dimensions can help a lot. For example, chunksize=(10, 1, 512, 512) would process in 512x512-pixel tiles, loading 10 assets at a time per tile. chunksize=(-1, 1, 512, 512) would load every item within the 512x512 tile into the chunk. If you'll be immediately compositing the data (like .median("time")), doing this is often a good idea, because you'll be flattening the assets together anyway.
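The rule for expanding a 1- or 2-element chunksize to all four dimensions can be sketched as follows. This is a hypothetical helper written for illustration, not part of the stackstac API:

```python
def expand_chunksize(cs):
    # Hypothetical helper mirroring the rule described above: 1 or 2 sizes
    # chunk only the spatial dimensions; time and band default to chunksize 1.
    if isinstance(cs, int):
        cs = (cs, cs)
    if len(cs) == 2:
        return (1, 1) + tuple(cs)
    return tuple(cs)

print(expand_chunksize(1024))         # (1, 1, 1024, 1024)
print(expand_chunksize((512, 1024)))  # (1, 1, 512, 1024)
```

A full 4-tuple like (10, 1, 512, 512) is passed through unchanged, which is how you opt into larger time/band chunks.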
dtype (dtype) – The NumPy data type of the output array. Default: float64. Must be a data type that's compatible with fill_value.

fill_value (Union[int, float]) – Value to fill nodata/masked pixels with. Default: np.nan.

Using NaN is generally the best approach, since many functions already know how to handle/propagate NaNs, or have NaN-equivalents (dask.array.nanmean vs dask.array.mean, for example). However, NaN requires a floating-point dtype. If you know the data can be represented in a smaller data type (like uint16), using a different fill_value (like 0) and managing it yourself could save a lot of memory.

rescale (bool) – Whether to rescale pixel values by the scale and offset present in the raster:bands metadata for each asset. Default: True. Note that this could produce floating-point data when the original values are ints, so set dtype accordingly. Raises ValueError if the specified dtype can't hold the data after rescaling: for example, when loading data with dtype=int, rescale=True where the scaling factor is 1.5, the rescaled data would be floating-point, and couldn't be stored as an integer.
sortby_date (Literal['asc', 'desc', False]) – Whether to pre-sort the items by date (from the properties["datetime"] field). One of "asc", "desc", or False to disable sorting. Default: "asc". Note that if you set sortby_date=False, selecting date ranges from the DataArray (like da.loc["2020-01":"2020-04"]) may behave strangely, because xarray/pandas needs indexes to be sorted.

xy_coords (Literal['center', 'topleft', False]) – Whether to add geospatial coordinate labels for the x and y dimensions of the DataArray, allowing for spatial indexing. The coordinates will be in the coordinate reference system given by epsg.

If "topleft" (default), the coordinates are for each pixel's upper-left corner, following raster conventions.

If "center", the coordinates are for each pixel's centroid, following xarray conventions.

If False, x and y will just be indexed by row/column numbers, saving a small amount of time and local memory.

properties (Union[bool, str, Sequence[str]]) – Which fields from each STAC Item's properties to add as coordinates to the DataArray, indexing the "time" dimension.

If True (default), all properties will be used. If a string or sequence of strings, only those fields will be used. For each Item missing a particular field, its value for that Item will be None.

If False, no properties will be added.
band_coords (bool) – Whether to include Asset-level metadata as coordinates for the band dimension.

If True (default), for each asset ID, the field(s) that have the same value across all Items will be added as coordinates.

The eo:bands field is also unpacked if present, and sar:polarizations is renamed to polarization for convenience.

gdal_env (Optional[LayeredEnv]) – Advanced use: a LayeredEnv of GDAL configuration options to use while opening and reading datasets. If None (default), DEFAULT_GDAL_ENV is used. See rio_reader.py for notes on why these default options were chosen.

errors_as_nodata (Tuple[Exception, ...]) – Exception patterns to ignore when opening datasets or reading data. Exceptions matching a pattern will be logged as warnings, and just produce nodata (fill_value).

The exception patterns should be instances of an Exception type to catch, where str(exception_pattern) is a regex pattern to match against str(raised_exception). For example, RasterioIOError("HTTP response code: 404") (the default). Or IOError(r"HTTP response code: 4\d\d") to catch any 4xx HTTP error. Or Exception(".*") to catch absolutely anything (that one's probably a bad idea).
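The matching rule can be illustrated without rasterio. The sketch below uses plain OSError (which IOError aliases) in place of the default RasterioIOError, so it runs standalone:

```python
import re

# str(pattern) is treated as a regex and matched against str(raised_exception).
pattern = OSError(r"HTTP response code: 4\d\d")
raised = OSError("HTTP response code: 404")

# The raised exception must be an instance of the pattern's type...
assert isinstance(raised, type(pattern))
# ...and its message must match the pattern's message as a regex.
assert re.search(str(pattern), str(raised))
```

So the r"4\d\d" pattern treats any 4xx HTTP error as nodata, while a 500 error would still be raised.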
reader (Type[Reader]) – Advanced use: the Reader type to use. Currently there is only one real reader type: AutoParallelRioReader. However, there's also FakeReader (which doesn't read data at all, just returns random numbers), which can be helpful for isolating whether performance issues are due to IO and GDAL, or inherent to Dask.
- Returns:
xarray DataArray, backed by a Dask array. No IO will happen until calling .compute() or accessing .values. The dimensions will be ("time", "band", "y", "x").

time will be equal in length to the number of items you pass in, and indexed by STAC Item datetime. Note that this means multiple entries could have the same index. Note also that datetime strings are cast to UTC, but passed to xarray without timezone information (dtype='datetime64[ns]').

band will be equal in length to the number of asset IDs used (see the assets parameter for more).

The size of y and x will be determined by resolution and bounds, which in many cases are automatically computed from the items you pass in.
- Return type:
DataArray