Note

You can view & download the original notebook on GitHub.

Basic example#

We’ll search for two months of Sentinel-2 data overlapping our area of interest—in this case, the Santa Fe ski area in New Mexico, USA (Google maps).

We use stackstac to create an xarray of all the data. From there, it’s easy to filter out cloudy scenes from the array based on their metadata, then create a median composite for each month.

[1]:
import stackstac
[2]:
lon, lat = -105.78, 35.79

We use pystac-client to find the relevant STAC (Spatio-Temporal Asset Catalog) items. These basically provide metadata about the relevant scenes, and links to their data.

We’ll use element84’s search endpoint to look for items from the sentinel-2-l2a collection on AWS.

[3]:
import pystac_client
URL = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(URL)
[4]:
%%time
items = catalog.search(
    intersects=dict(type="Point", coordinates=[lon, lat]),
    collections=["sentinel-2-l2a"],
    datetime="2020-03-01/2020-06-01"
).item_collection()
len(items)
CPU times: user 107 ms, sys: 17 ms, total: 124 ms
Wall time: 1.27 s
[4]:
42

Use stackstac to turn those STAC items into a lazy xarray. Using all the defaults, our data will be in its native coordinate reference system, at the finest resolution of all the assets.

[5]:
%time stack = stackstac.stack(items)
CPU times: user 115 ms, sys: 3.43 ms, total: 119 ms
Wall time: 117 ms
[6]:
stack
[6]:
<xarray.DataArray 'stackstac-3a21164f19f81366a5242c365d798d31' (time: 42,
                                                                band: 32,
                                                                y: 10980,
                                                                x: 10980)>
dask.array<fetch_raster_window, shape=(42, 32, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/52)
  * time                                     (time) datetime64[ns] 2020-03-02...
    id                                       (time) <U24 'S2B_13SDV_20200302_...
  * band                                     (band) <U12 'aot' ... 'wvp-jp2'
  * x                                        (x) float64 4e+05 ... 5.098e+05
  * y                                        (y) float64 4e+06 ... 3.89e+06
    s2:product_type                          <U7 'S2MSI2A'
    ...                                       ...
    raster:bands                             (band) object [{'nodata': 0, 'da...
    gsd                                      (band) object None 10 ... None None
    common_name                              (band) object None 'blue' ... None
    center_wavelength                        (band) object None 0.49 ... None
    full_width_half_max                      (band) object None 0.098 ... None
    epsg                                     int64 32613
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

Well, that’s really all there is to it. Now you have an xarray DataArray, and you can do whatever you like to it!

Here, we’ll filter out scenes with >20% cloud coverage (according to the eo:cloud_cover field set by the data provider). Then, pick the bands corresponding to red, green, and blue, and use xarray’s resample to create 1-month median composites.

[7]:
lowcloud = stack[stack["eo:cloud_cover"] < 20]
rgb = lowcloud.sel(band=["red", "green", "blue"])
monthly = rgb.resample(time="MS").median("time", keep_attrs=True)
[8]:
monthly
[8]:
<xarray.DataArray 'stackstac-3a21164f19f81366a5242c365d798d31' (time: 3,
                                                                band: 3,
                                                                y: 10980,
                                                                x: 10980)>
dask.array<stack, shape=(3, 3, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/21)
  * band                                     (band) <U12 'red' 'green' 'blue'
  * x                                        (x) float64 4e+05 ... 5.098e+05
  * y                                        (y) float64 4e+06 ... 3.89e+06
    s2:product_type                          <U7 'S2MSI2A'
    constellation                            <U10 'sentinel-2'
    mgrs:grid_square                         <U2 'DV'
    ...                                       ...
    gsd                                      (band) object 10 10 10
    common_name                              (band) object 'red' 'green' 'blue'
    center_wavelength                        (band) object 0.665 0.56 0.49
    full_width_half_max                      (band) object 0.038 0.045 0.098
    epsg                                     int64 32613
  * time                                     (time) datetime64[ns] 2020-03-01...
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

So we don’t pull all ~200 GB of data down to our local machine, let’s slice out a little region around our area of interest.

We convert our lat-lon point to the data’s UTM coordinate reference system, then use that to slice the x and y dimensions, which are indexed by their UTM coordinates.

[9]:
import pyproj
x_utm, y_utm = pyproj.Proj(monthly.crs)(lon, lat)
buffer = 2000  # meters
[10]:
aoi = monthly.loc[..., y_utm+buffer:y_utm-buffer, x_utm-buffer:x_utm+buffer]
aoi
[10]:
<xarray.DataArray 'stackstac-3a21164f19f81366a5242c365d798d31' (time: 3,
                                                                band: 3,
                                                                y: 400, x: 400)>
dask.array<getitem, shape=(3, 3, 400, 400), dtype=float64, chunksize=(1, 1, 387, 316), chunktype=numpy.ndarray>
Coordinates: (12/21)
  * band                                     (band) <U12 'red' 'green' 'blue'
  * x                                        (x) float64 4.275e+05 ... 4.315e+05
  * y                                        (y) float64 3.963e+06 ... 3.959e+06
    s2:product_type                          <U7 'S2MSI2A'
    constellation                            <U10 'sentinel-2'
    mgrs:grid_square                         <U2 'DV'
    ...                                       ...
    gsd                                      (band) object 10 10 10
    common_name                              (band) object 'red' 'green' 'blue'
    center_wavelength                        (band) object 0.665 0.56 0.49
    full_width_half_max                      (band) object 0.038 0.045 0.098
    epsg                                     int64 32613
  * time                                     (time) datetime64[ns] 2020-03-01...
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0
[11]:
import dask.diagnostics
with dask.diagnostics.ProgressBar():
    data = aoi.compute()
[########################################] | 100% Completed | 12.97 s
[13]:
data.plot.imshow(row="time", rgb="band", robust=True, size=6);
_images/basic_17_0.png