Basic example

We’ll search for two months of Sentinel-2 data overlapping our area of interest—in this case, the Santa Fe ski area in New Mexico, USA (Google Maps).

We use stackstac to create an xarray of all the data. From there, it’s easy to filter out cloudy scenes from the array based on their metadata, then create a median composite for each month.

[1]:
import stackstac
import satsearch
[2]:
lon, lat = -105.78, 35.79

We use satsearch to find the relevant STAC (Spatio-Temporal Asset Catalog) items. These provide metadata about each matching scene, along with links to its data files.

We’ll use Element 84’s search endpoint to look for items from the sentinel-s2-l2a-cogs collection on AWS.

[3]:
%%time
items = satsearch.Search(
    url="https://earth-search.aws.element84.com/v0",
    intersects=dict(type="Point", coordinates=[lon, lat]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2020-04-01/2020-05-01"
).items()
len(items)
CPU times: user 43.7 ms, sys: 5.74 ms, total: 49.4 ms
Wall time: 648 ms
[3]:
13
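
Each returned item is a STAC Item: essentially a GeoJSON feature whose `properties` hold scene metadata and whose `assets` link to the actual data files. As a rough sketch of that structure—with illustrative, made-up field values, not the real results of the search above:

```python
# A minimal, hypothetical STAC Item (illustrative values only; the real
# items come from the satsearch query above).
item = {
    "id": "S2B_13SDV_20200401_0_L2A",
    "geometry": {"type": "Polygon", "coordinates": []},  # scene footprint (elided)
    "properties": {
        "datetime": "2020-04-01T18:04:04Z",
        "eo:cloud_cover": 12.5,  # percent cloud cover, set by the data provider
        "proj:epsg": 32613,      # UTM zone 13N
    },
    "assets": {
        "B04": {"href": "https://example.com/B04.tif"},  # red band COG (hypothetical URL)
    },
}

# Metadata fields like cloud cover are plain dictionary lookups:
too_cloudy = item["properties"]["eo:cloud_cover"] > 20
print(too_cloudy)
```

stackstac reads these per-item properties and exposes them as coordinates on the resulting array, which is what makes the metadata-based filtering later in this example possible.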

Use stackstac to turn those STAC items into a lazy xarray. Using all the defaults, our data will be in its native coordinate reference system, at the finest resolution of all the assets.

[4]:
%time stack = stackstac.stack(items)
CPU times: user 13.1 ms, sys: 1.54 ms, total: 14.7 ms
Wall time: 13.7 ms
[5]:
stack
[5]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 13, band: 17, y: 10980, x: 10980)>
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/23)
  * time                        (time) datetime64[ns] 2020-04-01T18:04:04 ......
    id                          (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
  * band                        (band) <U8 'overview' 'visual' ... 'WVP' 'SCL'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    platform                    (time) <U11 'sentinel-2b' ... 'sentinel-2b'
    ...                          ...
    view:off_nadir              int64 0
    sentinel:data_coverage      (time) float64 33.85 100.0 33.9 ... 100.0 34.29
    gsd                         int64 10
    sentinel:product_id         (time) <U60 'S2B_MSIL2A_20200401T174909_N0214...
    sentinel:sequence           <U1 '0'
    title                       (band) object None ... 'Scene Classification ...
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

Well, that’s really all there is to it. Now you have an xarray DataArray, and you can do whatever you like to it!

Here, we’ll filter out scenes with >20% cloud coverage (according to the eo:cloud_cover field set by the data provider). Then we’ll pick the bands corresponding to red, green, and blue, and use xarray’s resample to create 1-month median composites.

[6]:
lowcloud = stack[stack["eo:cloud_cover"] < 20]
rgb = lowcloud.sel(band=["B04", "B03", "B02"])
monthly = rgb.resample(time="MS").median("time", keep_attrs=True)
[7]:
monthly
[7]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 2, band: 3, y: 10980, x: 10980)>
dask.array<stack, shape=(2, 3, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/15)
  * time                        (time) datetime64[ns] 2020-04-01 2020-05-01
  * band                        (band) <U8 'B04' 'B03' 'B02'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    proj:epsg                   int64 32613
    sentinel:latitude_band      <U1 'S'
    ...                          ...
    sentinel:grid_square        <U2 'DV'
    sentinel:valid_cloud_cover  bool True
    view:off_nadir              int64 0
    gsd                         int64 10
    sentinel:sequence           <U1 '0'
    title                       (band) object 'Band 4 (red)' ... 'Band 2 (blue)'
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

So that we don’t pull all ~200 GB of data down to our local machine, let’s slice out a little region around our area of interest.

We convert our lat-lon point to the data’s UTM coordinate reference system, then use that to slice the x and y dimensions, which are indexed by their UTM coordinates.

[8]:
import pyproj
x_utm, y_utm = pyproj.Proj(monthly.crs)(lon, lat)
buffer = 2000  # meters
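
The `pyproj.Proj(monthly.crs)(lon, lat)` call above projects our lon/lat point into the data’s UTM CRS. With pyproj 2.x+, the same conversion can also be written with the `Transformer` API; here’s a small sketch, hard-coding EPSG:32613 (the CRS shown in the array’s attributes above):

```python
from pyproj import Transformer

# Transform lon/lat (EPSG:4326) into the stack's UTM zone 13N CRS (EPSG:32613).
# always_xy=True means coordinates are passed and returned in (lon, lat) /
# (x, y) order, regardless of each CRS's axis conventions.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32613", always_xy=True)
x_utm, y_utm = transformer.transform(-105.78, 35.79)
print(round(x_utm), round(y_utm))  # easting and northing, in meters
```

Either form works here; `Transformer` just makes the source and destination CRSs explicit.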
[9]:
aoi = monthly.loc[..., y_utm+buffer:y_utm-buffer, x_utm-buffer:x_utm+buffer]
aoi
[9]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 2, band: 3, y: 400, x: 400)>
dask.array<getitem, shape=(2, 3, 400, 400), dtype=float64, chunksize=(1, 1, 387, 317), chunktype=numpy.ndarray>
Coordinates: (12/15)
  * time                        (time) datetime64[ns] 2020-04-01 2020-05-01
  * band                        (band) <U8 'B04' 'B03' 'B02'
  * x                           (x) float64 4.275e+05 4.275e+05 ... 4.315e+05
  * y                           (y) float64 3.963e+06 3.963e+06 ... 3.959e+06
    proj:epsg                   int64 32613
    sentinel:latitude_band      <U1 'S'
    ...                          ...
    sentinel:grid_square        <U2 'DV'
    sentinel:valid_cloud_cover  bool True
    view:off_nadir              int64 0
    gsd                         int64 10
    sentinel:sequence           <U1 '0'
    title                       (band) object 'Band 4 (red)' ... 'Band 2 (blue)'
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0
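
Note the order of the `y` bounds in the slice above (`y_utm+buffer:y_utm-buffer`): because `y` is UTM northing, it decreases from the top of the image to the bottom, and xarray’s label-based `.loc` slicing follows the coordinate’s own ordering. A tiny synthetic example (with made-up coordinates, not the real data) shows the behavior:

```python
import numpy as np
import xarray as xr

# y decreases from top to bottom, just like the satellite stack's northing.
arr = xr.DataArray(
    np.arange(16).reshape(4, 4),
    coords={"y": [40.0, 30.0, 20.0, 10.0], "x": [0.0, 1.0, 2.0, 3.0]},
    dims=["y", "x"],
)

# With a descending coordinate, label slices must also run high -> low:
sub = arr.loc[35.0:15.0, 1.0:2.0]
print(sub.coords["y"].values)
```

Slicing `15.0:35.0` instead would select nothing, since no labels fall in an ascending range on a descending index.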
[10]:
import dask.diagnostics
with dask.diagnostics.ProgressBar():
    data = aoi.compute()
[###########################             ] | 69% Completed | 10.1s/Users/gabe/Library/Caches/pypoetry/virtualenvs/stackstac-FdRcOknL-py3.8/lib/python3.8/site-packages/numpy/lib/nanfunctions.py:1113: RuntimeWarning: All-NaN slice encountered
  r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
[########################################] | 100% Completed | 13.3s
[11]:
data.plot.imshow(row="time", rgb="band", robust=True, size=6)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x12a848130>
[Image: true-color RGB median composites of the area of interest, one panel per month (April and May 2020)]