Note

You can run this notebook interactively: Binder, or view & download the original on GitHub.

Basic example#

We’ll search for two months of Sentinel-2 data overlapping our area of interest—in this case, the Santa Fe ski area in New Mexico, USA (Google maps).

We use stackstac to create an xarray of all the data. From there, it’s easy to filter out cloudy scenes from the array based on their metadata, then create a median composite for each month.

[1]:
import stackstac
[2]:
lon, lat = -105.78, 35.79

We use pystac-client to find the relevant STAC (Spatio-Temporal Asset Catalog) items. These basically provide metadata about the relevant scenes, and links to their data.

We’ll use element84’s search endpoint to look for items from the sentinel-s2-l2a-cogs collection on AWS.

[3]:
import pystac_client
URL = "https://earth-search.aws.element84.com/v0"
catalog = pystac_client.Client.open(URL)
[4]:
%%time
items = catalog.search(
    intersects=dict(type="Point", coordinates=[lon, lat]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2020-04-01/2020-05-01"
).get_all_items()
len(items)
CPU times: user 12.6 ms, sys: 3.74 ms, total: 16.4 ms
Wall time: 462 ms
[4]:
13

Use stackstac to turn those STAC items into a lazy xarray. Using all the defaults, our data will be in its native coordinate reference system, at the finest resolution of all the assets.

[5]:
%time stack = stackstac.stack(items)
CPU times: user 16.8 ms, sys: 1.9 ms, total: 18.7 ms
Wall time: 17.8 ms
[6]:
stack
[6]:
<xarray.DataArray 'stackstac-0f471b3bb92fa5774b7eddff84c2ca4d' (time: 13,
                                                                band: 17,
                                                                y: 10980,
                                                                x: 10980)>
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/24)
  * time                        (time) datetime64[ns] 2020-04-01T18:04:04 ......
    id                          (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
  * band                        (band) <U8 'overview' 'B11' ... 'visual' 'SCL'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    gsd                         int64 10
    ...                          ...
    updated                     (time) <U24 '2020-09-05T06:23:47.836Z' ... '2...
    sentinel:utm_zone           int64 13
    proj:epsg                   int64 32613
    sentinel:valid_cloud_cover  bool True
    title                       (band) <U31 'True color image' ... 'Scene Cla...
    epsg                        int64 32613
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

Well, that’s really all there is to it. Now you have an xarray DataArray, and you can do whatever you like to it!

Here, we’ll filter out scenes with >20% cloud coverage (according to the eo:cloud_cover field set by the data provider). Then, pick the bands corresponding to red, green, and blue, and use xarray’s resample to create 1-month median composites.

[7]:
lowcloud = stack[stack["eo:cloud_cover"] < 20]
rgb = lowcloud.sel(band=["B04", "B03", "B02"])
monthly = rgb.resample(time="MS").median("time", keep_attrs=True)
[8]:
monthly
[8]:
<xarray.DataArray 'stackstac-0f471b3bb92fa5774b7eddff84c2ca4d' (time: 2,
                                                                band: 3,
                                                                y: 10980,
                                                                x: 10980)>
dask.array<stack, shape=(2, 3, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/16)
  * time                        (time) datetime64[ns] 2020-04-01 2020-05-01
  * band                        (band) <U8 'B04' 'B03' 'B02'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    gsd                         int64 10
    instruments                 <U3 'msi'
    ...                          ...
    view:off_nadir              int64 0
    sentinel:utm_zone           int64 13
    proj:epsg                   int64 32613
    sentinel:valid_cloud_cover  bool True
    title                       (band) <U31 'Band 4 (red)' ... 'Band 2 (blue)'
    epsg                        int64 32613
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

So we don’t pull all ~200 GB of data down to our local machine, let’s slice out a little region around our area of interest.

We convert our lat-lon point to the data’s UTM coordinate reference system, then use that to slice the x and y dimensions, which are indexed by their UTM coordinates.

[9]:
import pyproj
x_utm, y_utm = pyproj.Proj(monthly.crs)(lon, lat)
buffer = 2000  # meters
[10]:
aoi = monthly.loc[..., y_utm+buffer:y_utm-buffer, x_utm-buffer:x_utm+buffer]
aoi
[10]:
<xarray.DataArray 'stackstac-0f471b3bb92fa5774b7eddff84c2ca4d' (time: 2,
                                                                band: 3,
                                                                y: 400, x: 400)>
dask.array<getitem, shape=(2, 3, 400, 400), dtype=float64, chunksize=(1, 1, 387, 316), chunktype=numpy.ndarray>
Coordinates: (12/16)
  * time                        (time) datetime64[ns] 2020-04-01 2020-05-01
  * band                        (band) <U8 'B04' 'B03' 'B02'
  * x                           (x) float64 4.275e+05 4.275e+05 ... 4.315e+05
  * y                           (y) float64 3.963e+06 3.963e+06 ... 3.959e+06
    gsd                         int64 10
    instruments                 <U3 'msi'
    ...                          ...
    view:off_nadir              int64 0
    sentinel:utm_zone           int64 13
    proj:epsg                   int64 32613
    sentinel:valid_cloud_cover  bool True
    title                       (band) <U31 'Band 4 (red)' ... 'Band 2 (blue)'
    epsg                        int64 32613
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0
[11]:
import dask.diagnostics
with dask.diagnostics.ProgressBar():
    data = aoi.compute()
[########################################] | 100% Completed | 7.13 s
[12]:
data.plot.imshow(row="time", rgb="band", robust=True, size=6);
_images/basic_16_0.png