Note
You can view & download the original notebook on Github.
Or, click here to run these notebooks on Coiled with access to Dask clusters.
Basic example¶
We’ll search for two months of Sentinel-2 data overlapping our area of interest—in this case, the Santa Fe ski area in New Mexico, USA (Google maps).
We use stackstac
to create an xarray of all the data. From there, it’s easy to filter out cloudy scenes from the array based on their metadata, then create a median composite for each month.
[1]:
import stackstac
import satsearch
[2]:
lon, lat = -105.78, 35.79
We use satsearch
to find the relevant STAC (Spatio-Temporal Asset Catalog) items. These basically provide metadata about the relevant scenes, and links to their data.
We’ll use element84’s search endpoint to look for items from the sentinel-s2-l2a-cogs
collection on AWS.
[3]:
%%time
items = satsearch.Search(
url="https://earth-search.aws.element84.com/v0",
intersects=dict(type="Point", coordinates=[lon, lat]),
collections=["sentinel-s2-l2a-cogs"],
datetime="2020-04-01/2020-05-01"
).items()
len(items)
CPU times: user 44.9 ms, sys: 6.67 ms, total: 51.5 ms
Wall time: 4.5 s
[3]:
13
Use stackstac
to turn those STAC items into a lazy xarray. Using all the defaults, our data will be in its native coordinate reference system, at the finest resolution of al the assets.
[4]:
%time stack = stackstac.stack(items)
CPU times: user 49.3 ms, sys: 2.18 ms, total: 51.4 ms
Wall time: 50.2 ms
[5]:
stack
[5]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 13, band: 17, y: 10980, x: 10980)> dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray> Coordinates: (12/23) * time (time) datetime64[ns] 2020-04-01T18:04:04 ...... id (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S... * band (band) <U8 'overview' 'visual' ... 'WVP' 'SCL' * x (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05 * y (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06 constellation <U10 'sentinel-2' ... ... sentinel:valid_cloud_cover bool True data_coverage (time) object 33.85 100 33.9 ... 32.84 100 34.29 sentinel:grid_square <U2 'DV' sentinel:product_id (time) <U60 'S2B_MSIL2A_20200401T174909_N0214... gsd int64 10 title (band) object None ... 'Scene Classification ... Attributes: spec: RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0... crs: epsg:32613 transform: | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0... resolution: 10.0
- time: 13
- band: 17
- y: 10980
- x: 10980
- dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
Array Chunk Bytes 213.15 GB 8.39 MB Shape (13, 17, 10980, 10980) (1, 1, 1024, 1024) Count 27304 Tasks 26741 Chunks Type float64 numpy.ndarray 13 1 10980 10980 17 - time(time)datetime64[ns]2020-04-01T18:04:04 ... 2020-05-...
array(['2020-04-01T18:04:04.000000000', '2020-04-03T17:54:07.000000000', '2020-04-06T18:04:04.000000000', '2020-04-08T17:54:08.000000000', '2020-04-11T18:04:03.000000000', '2020-04-13T17:54:10.000000000', '2020-04-16T18:04:06.000000000', '2020-04-18T17:54:06.000000000', '2020-04-21T18:04:01.000000000', '2020-04-23T17:54:12.000000000', '2020-04-26T18:04:09.000000000', '2020-04-28T17:54:06.000000000', '2020-05-01T18:04:03.000000000'], dtype='datetime64[ns]')
- id(time)<U24'S2B_13SDV_20200401_0_L2A' ... '...
array(['S2B_13SDV_20200401_0_L2A', 'S2A_13SDV_20200403_0_L2A', 'S2A_13SDV_20200406_0_L2A', 'S2B_13SDV_20200408_0_L2A', 'S2B_13SDV_20200411_0_L2A', 'S2A_13SDV_20200413_0_L2A', 'S2A_13SDV_20200416_0_L2A', 'S2B_13SDV_20200418_0_L2A', 'S2B_13SDV_20200421_0_L2A', 'S2A_13SDV_20200423_0_L2A', 'S2A_13SDV_20200426_0_L2A', 'S2B_13SDV_20200428_0_L2A', 'S2B_13SDV_20200501_0_L2A'], dtype='<U24')
- band(band)<U8'overview' 'visual' ... 'WVP' 'SCL'
array(['overview', 'visual', 'B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'AOT', 'WVP', 'SCL'], dtype='<U8')
- x(x)float644e+05 4e+05 ... 5.097e+05 5.098e+05
array([399965., 399975., 399985., ..., 509735., 509745., 509755.])
- y(y)float644e+06 4e+06 ... 3.89e+06 3.89e+06
array([4000025., 4000015., 4000005., ..., 3890255., 3890245., 3890235.])
- constellation()<U10'sentinel-2'
array('sentinel-2', dtype='<U10')
- eo:cloud_cover(time)float6429.24 1.16 27.26 ... 87.33 5.41
array([ 29.24, 1.16, 27.26, 1.15, 9.37, 100. , 4.53, 35. , 14.97, 31.78, 99.42, 87.33, 5.41])
- instruments()<U3'msi'
array('msi', dtype='<U3')
- sentinel:latitude_band()<U1'S'
array('S', dtype='<U1')
- view:off_nadir()int640
array(0)
- proj:epsg()int6432613
array(32613)
- platform(time)<U11'sentinel-2b' ... 'sentinel-2b'
array(['sentinel-2b', 'sentinel-2a', 'sentinel-2a', 'sentinel-2b', 'sentinel-2b', 'sentinel-2a', 'sentinel-2a', 'sentinel-2b', 'sentinel-2b', 'sentinel-2a', 'sentinel-2a', 'sentinel-2b', 'sentinel-2b'], dtype='<U11')
- updated(time)<U24'2020-09-05T06:23:47.836Z' ... '...
array(['2020-09-05T06:23:47.836Z', '2020-09-23T15:43:10.502Z', '2020-09-05T11:18:48.722Z', '2020-08-31T15:23:56.855Z', '2020-09-23T16:16:43.693Z', '2020-09-24T06:31:45.567Z', '2020-09-24T05:31:29.634Z', '2020-09-19T07:13:25.290Z', '2020-09-05T11:20:20.745Z', '2020-08-23T09:50:31.532Z', '2020-09-05T13:56:08.881Z', '2020-08-31T13:13:14.956Z', '2020-09-24T07:38:40.841Z'], dtype='<U24')
- sentinel:utm_zone()int6413
array(13)
- created(time)<U24'2020-09-05T06:23:47.836Z' ... '...
array(['2020-09-05T06:23:47.836Z', '2020-09-23T15:43:10.502Z', '2020-09-05T11:18:48.722Z', '2020-08-31T15:23:56.855Z', '2020-09-23T16:16:43.693Z', '2020-09-24T06:31:45.567Z', '2020-09-24T05:31:29.634Z', '2020-09-19T07:13:25.290Z', '2020-09-05T11:20:20.745Z', '2020-08-23T09:50:31.532Z', '2020-09-05T13:56:08.881Z', '2020-08-31T13:13:14.956Z', '2020-09-24T07:38:40.841Z'], dtype='<U24')
- sentinel:data_coverage(time)float6433.85 100.0 33.9 ... 100.0 34.29
array([ 33.85, 100. , 33.9 , 100. , 33.87, 100. , 33.37, 100. , 34.09, 100. , 32.84, 100. , 34.29])
- sentinel:sequence()<U1'0'
array('0', dtype='<U1')
- sentinel:valid_cloud_cover()boolTrue
array(True)
- data_coverage(time)object33.85 100 33.9 ... 32.84 100 34.29
array([33.85, 100, 33.9, 100, 33.87, 100, 33.37, 100, 34.09, None, 32.84, 100, 34.29], dtype=object)
- sentinel:grid_square()<U2'DV'
array('DV', dtype='<U2')
- sentinel:product_id(time)<U60'S2B_MSIL2A_20200401T174909_N021...
array(['S2B_MSIL2A_20200401T174909_N0214_R141_T13SDV_20200401T220155', 'S2A_MSIL2A_20200403T173901_N0214_R098_T13SDV_20200403T220105', 'S2A_MSIL2A_20200406T174901_N0214_R141_T13SDV_20200406T221027', 'S2B_MSIL2A_20200408T173859_N0214_R098_T13SDV_20200408T215856', 'S2B_MSIL2A_20200411T174909_N0214_R141_T13SDV_20200411T220443', 'S2A_MSIL2A_20200413T173901_N0214_R098_T13SDV_20200413T235616', 'S2A_MSIL2A_20200416T174911_N0214_R141_T13SDV_20200416T221101', 'S2B_MSIL2A_20200418T173859_N0214_R098_T13SDV_20200418T214953', 'S2B_MSIL2A_20200421T174859_N0214_R141_T13SDV_20200421T214716', 'S2A_MSIL2A_20200423T173911_N0214_R098_T13SDV_20200423T234141', 'S2A_MSIL2A_20200426T174911_N0214_R141_T13SDV_20200426T221124', 'S2B_MSIL2A_20200428T173859_N0214_R098_T13SDV_20200428T214309', 'S2B_MSIL2A_20200501T174909_N0214_R141_T13SDV_20200501T220131'], dtype='<U60')
- gsd()int6410
array(10)
- title(band)objectNone ... 'Scene Classification M...
array([None, 'True color image', 'Band 1 (coastal)', 'Band 2 (blue)', 'Band 3 (green)', 'Band 4 (red)', 'Band 5', 'Band 6', 'Band 7', 'Band 8 (nir)', 'Band 8A', 'Band 9', 'Band 11 (swir16)', 'Band 12 (swir22)', 'Aerosol Optical Thickness (AOT)', 'Water Vapour (WVP)', 'Scene Classification Map (SCL)'], dtype=object)
- spec :
- RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0, 4000020.0), resolutions_xy=(10.0, 10.0))
- crs :
- epsg:32613
- transform :
- | 10.00, 0.00, 399960.00| | 0.00,-10.00, 4000020.00| | 0.00, 0.00, 1.00|
- resolution :
- 10.0
Well, that’s really all there is to it. Now you have an xarray DataArray, and you can do whatever you like to it!
Here, we’ll filter out scenes with >20% cloud coverage (according to the eo:cloud_cover
field set by the data provider). Then, pick the bands corresponding to red, green, and blue, and use xarray’s resample
to create 1-month median composites.
[6]:
lowcloud = stack[stack["eo:cloud_cover"] < 20]
rgb = lowcloud.sel(band=["B04", "B03", "B02"])
monthly = rgb.resample(time="MS").median("time", keep_attrs=True)
[7]:
monthly
[7]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 2, band: 3, y: 10980, x: 10980)> dask.array<stack, shape=(2, 3, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray> Coordinates: (12/15) * time (time) datetime64[ns] 2020-04-01 2020-05-01 * band (band) <U8 'B04' 'B03' 'B02' * x (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05 * y (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06 constellation <U10 'sentinel-2' instruments <U3 'msi' ... ... sentinel:utm_zone int64 13 sentinel:sequence <U1 '0' sentinel:valid_cloud_cover bool True sentinel:grid_square <U2 'DV' gsd int64 10 title (band) object 'Band 4 (red)' ... 'Band 2 (blue)' Attributes: spec: RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0... crs: epsg:32613 transform: | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0... resolution: 10.0
- time: 2
- band: 3
- y: 10980
- x: 10980
- dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
Array Chunk Bytes 5.79 GB 8.39 MB Shape (2, 3, 10980, 10980) (1, 1, 1024, 1024) Count 46323 Tasks 726 Chunks Type float64 numpy.ndarray 2 1 10980 10980 3 - time(time)datetime64[ns]2020-04-01 2020-05-01
array(['2020-04-01T00:00:00.000000000', '2020-05-01T00:00:00.000000000'], dtype='datetime64[ns]')
- band(band)<U8'B04' 'B03' 'B02'
array(['B04', 'B03', 'B02'], dtype='<U8')
- x(x)float644e+05 4e+05 ... 5.097e+05 5.098e+05
array([399965., 399975., 399985., ..., 509735., 509745., 509755.])
- y(y)float644e+06 4e+06 ... 3.89e+06 3.89e+06
array([4000025., 4000015., 4000005., ..., 3890255., 3890245., 3890235.])
- constellation()<U10'sentinel-2'
array('sentinel-2', dtype='<U10')
- instruments()<U3'msi'
array('msi', dtype='<U3')
- sentinel:latitude_band()<U1'S'
array('S', dtype='<U1')
- view:off_nadir()int640
array(0)
- proj:epsg()int6432613
array(32613)
- sentinel:utm_zone()int6413
array(13)
- sentinel:sequence()<U1'0'
array('0', dtype='<U1')
- sentinel:valid_cloud_cover()boolTrue
array(True)
- sentinel:grid_square()<U2'DV'
array('DV', dtype='<U2')
- gsd()int6410
array(10)
- title(band)object'Band 4 (red)' ... 'Band 2 (blue)'
array(['Band 4 (red)', 'Band 3 (green)', 'Band 2 (blue)'], dtype=object)
- spec :
- RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0, 4000020.0), resolutions_xy=(10.0, 10.0))
- crs :
- epsg:32613
- transform :
- | 10.00, 0.00, 399960.00| | 0.00,-10.00, 4000020.00| | 0.00, 0.00, 1.00|
- resolution :
- 10.0
So we don’t pull all ~200 GB of data down to our local machine, let’s slice out a little region around our area of interest.
We convert our lat-lon point to the data’s UTM coordinate reference system, then use that to slice the x
and y
dimensions, which are indexed by their UTM coordinates.
[8]:
import pyproj
x_utm, y_utm = pyproj.Proj(monthly.crs)(lon, lat)
buffer = 2000 # meters
[9]:
aoi = monthly.loc[..., y_utm+buffer:y_utm-buffer, x_utm-buffer:x_utm+buffer]
aoi
[9]:
<xarray.DataArray 'stackstac-9918e037162926aaa5ec5d240d1a9174' (time: 2, band: 3, y: 400, x: 400)> dask.array<getitem, shape=(2, 3, 400, 400), dtype=float64, chunksize=(1, 1, 387, 317), chunktype=numpy.ndarray> Coordinates: (12/15) * time (time) datetime64[ns] 2020-04-01 2020-05-01 * band (band) <U8 'B04' 'B03' 'B02' * x (x) float64 4.275e+05 4.275e+05 ... 4.315e+05 * y (y) float64 3.963e+06 3.963e+06 ... 3.959e+06 constellation <U10 'sentinel-2' instruments <U3 'msi' ... ... sentinel:utm_zone int64 13 sentinel:sequence <U1 '0' sentinel:valid_cloud_cover bool True sentinel:grid_square <U2 'DV' gsd int64 10 title (band) object 'Band 4 (red)' ... 'Band 2 (blue)' Attributes: spec: RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0... crs: epsg:32613 transform: | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0... resolution: 10.0
- time: 2
- band: 3
- y: 400
- x: 400
- dask.array<chunksize=(1, 1, 387, 317), meta=np.ndarray>
Array Chunk Bytes 7.68 MB 981.43 kB Shape (2, 3, 400, 400) (1, 1, 387, 317) Count 46347 Tasks 24 Chunks Type float64 numpy.ndarray 2 1 400 400 3 - time(time)datetime64[ns]2020-04-01 2020-05-01
array(['2020-04-01T00:00:00.000000000', '2020-05-01T00:00:00.000000000'], dtype='datetime64[ns]')
- band(band)<U8'B04' 'B03' 'B02'
array(['B04', 'B03', 'B02'], dtype='<U8')
- x(x)float644.275e+05 4.275e+05 ... 4.315e+05
array([427515., 427525., 427535., ..., 431485., 431495., 431505.])
- y(y)float643.963e+06 3.963e+06 ... 3.959e+06
array([3962935., 3962925., 3962915., ..., 3958965., 3958955., 3958945.])
- constellation()<U10'sentinel-2'
array('sentinel-2', dtype='<U10')
- instruments()<U3'msi'
array('msi', dtype='<U3')
- sentinel:latitude_band()<U1'S'
array('S', dtype='<U1')
- view:off_nadir()int640
array(0)
- proj:epsg()int6432613
array(32613)
- sentinel:utm_zone()int6413
array(13)
- sentinel:sequence()<U1'0'
array('0', dtype='<U1')
- sentinel:valid_cloud_cover()boolTrue
array(True)
- sentinel:grid_square()<U2'DV'
array('DV', dtype='<U2')
- gsd()int6410
array(10)
- title(band)object'Band 4 (red)' ... 'Band 2 (blue)'
array(['Band 4 (red)', 'Band 3 (green)', 'Band 2 (blue)'], dtype=object)
- spec :
- RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0, 4000020.0), resolutions_xy=(10.0, 10.0))
- crs :
- epsg:32613
- transform :
- | 10.00, 0.00, 399960.00| | 0.00,-10.00, 4000020.00| | 0.00, 0.00, 1.00|
- resolution :
- 10.0
[10]:
import dask.diagnostics
with dask.diagnostics.ProgressBar():
data = aoi.compute()
[########################################] | 100% Completed | 42.4s
[11]:
data.plot.imshow(row="time", rgb="band", robust=True, size=6)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x1272ddbb0>
