Note
You can run this notebook interactively, or view & download the original on GitHub.
Stacking 6 years of imagery into a GIF#
We’ll load all the Landsat-8 (Collection 2, Level 2) data that’s available from Microsoft’s Planetary Computer over a small region on the coast of Cape Cod, Massachusetts, USA.
Using nothing but standard xarray syntax, we’ll mask cloudy pixels with the Landsat QA band and reduce the data down to biannual median composites.
Animated as a GIF, we can watch the coastline move over the years due to longshore drift.
Planetary Computer is Microsoft's open Earth data initiative. It's particularly nice to use, since they also maintain a STAC API for searching all the data, as well as a browseable data catalog. It's free for anyone to use, though you have to sign your requests with the planetary_computer package to prevent abuse. If you sign up, you'll get faster reads.
[1]:
import coiled
import distributed
import dask
import pystac_client
import planetary_computer as pc
import ipyleaflet
import IPython.display as dsp
import geogif
import stackstac
Using a cluster will make this much faster, particularly if you're not in Europe, which is where this data is stored.
You can sign up for a Coiled account and run clusters for free at https://cloud.coiled.io/ — no credit card or username required, just sign in with your GitHub or Google account.
[2]:
cluster = coiled.Cluster(
name="stackstac-eu",
software="gjoseph92/stackstac",
backend_options={"region": "eu-central-1"},
# ^ Coiled doesn't yet support Azure's West Europe region, so instead we'll run on a nearby AWS data center in Frankfurt
n_workers=20,
protocol="wss", # remove this line when not running on Binder
)
client = distributed.Client(cluster)
client
[2]:
Client
Client-3526dae4-fd9e-11ec-b3cd-acde48001122
Connection method: Cluster object | Cluster type: coiled.ClusterBeta
Dashboard: http://3.71.109.191:8787
Cluster Info
ClusterBeta
stackstac-eu
Dashboard: http://3.71.109.191:8787 | Workers: 14
Total threads: 28 | Total memory: 52.85 GiB
Scheduler Info
Scheduler
Scheduler-3d0a1dc1-7b08-4928-a510-4a64c7935be2
Comm: tls://10.13.9.5:8786 | Workers: 14
Dashboard: http://10.13.9.5:8787/status | Total threads: 28
Started: Just now | Total memory: 52.85 GiB
Workers
Worker: stackstac-eu-worker-201e769700
Comm: tls://10.13.11.126:39537 | Total threads: 2
Dashboard: http://10.13.11.126:40201/status | Memory: 3.78 GiB
Nanny: tls://10.13.11.126:41091
Local directory: /scratch/dask-worker-space/worker-khp9ga_k
(13 more workers with the same configuration)
Interactively pick the area of interest from a map. Just move the map around and re-run all cells to generate the timeseries somewhere else!
[3]:
m = ipyleaflet.Map(scroll_wheel_zoom=True)
m.center = 41.64933994767867, -69.94438630063088
m.zoom = 12
m.layout.height = "800px"
m
[4]:
bbox = (m.west, m.south, m.east, m.north)
Search for STAC items#
Use pystac-client to connect to Microsoft’s STAC API endpoint and search for Landsat-8 scenes.
[5]:
catalog = pystac_client.Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
search = catalog.search(
collections=['landsat-8-c2-l2'],
bbox=bbox,
)
Load and sign all the STAC items with a token from Planetary Computer. Without this, loading the data will fail.
[6]:
%%time
items = pc.sign(search)
len(items)
CPU times: user 595 ms, sys: 38.1 ms, total: 633 ms
Wall time: 4.64 s
[6]:
399
These are the footprints of all the items we’ll use:
[7]:
dsp.GeoJSON(items.to_dict())
<IPython.display.GeoJSON object>
Create an xarray with stackstac#
Set bounds_latlon=bbox to automatically clip to our area of interest (instead of using the full footprints of the scenes).
[8]:
%%time
stack = stackstac.stack(items, bounds_latlon=bbox)
stack
CPU times: user 305 ms, sys: 8.05 ms, total: 313 ms
Wall time: 312 ms
[8]:
<xarray.DataArray 'stackstac-539bd2f8e840894b8d42e26c2e913c32' (time: 399, band: 19, y: 774, x: 1233)>
dask.array<fetch_raster_window, shape=(399, 19, 774, 1233), dtype=float64, chunksize=(1, 1, 774, 1024), chunktype=numpy.ndarray>
Coordinates: (12/27)
  * time                 (time) datetime64[ns] 2013-03-22T15:19:00.54...
    id                   (time) <U31 'LC08_L2SP_011031_20130322_02_T1...
  * band                 (band) <U13 'SR_B1' 'SR_B2' ... 'SR_QA_AEROSOL'
  * x                    (x) float64 4.028e+05 4.029e+05 ... 4.398e+05
  * y                    (y) float64 4.623e+06 4.623e+06 ... 4.6e+06
    eo:cloud_cover       (time) float64 92.94 0.77 11.81 ... 4.92 94.53
    ...                   ...
    gsd                  (band) float64 30.0 30.0 30.0 ... 30.0 30.0
    title                (band) <U46 'Coastal/Aerosol Band (B1)' ... ...
    common_name          (band) object 'coastal' 'blue' ... None None
    center_wavelength    (band) object 0.44 0.48 0.56 ... None None None
    full_width_half_max  (band) object 0.02 0.06 0.06 ... None None None
    epsg                 int64 32619
Attributes:
    spec:        RasterSpec(epsg=32619, bounds=(402840.0, 4599690.0, 439830.0...
    crs:         epsg:32619
    transform:   | 30.00, 0.00, 402840.00|\n| 0.00,-30.00, 4622910.00|\n| 0.0...
    resolution:  30.0
And that’s it for stackstac! Everything from here on is just standard xarray operations.
[9]:
# use common_name for bands
stack = stack.assign_coords(band=stack.common_name.fillna(stack.band).rename("band"))
stack.band
[9]:
<xarray.DataArray 'band' (band: 19)>
array(['coastal', 'blue', 'green', 'red', 'nir08', 'swir16', 'swir22',
       'ST_QA', 'lwir11', 'ST_DRAD', 'ST_EMIS', 'ST_EMSD', 'ST_TRAD',
       'ST_URAD', 'QA_PIXEL', 'ST_ATRAN', 'ST_CDIST', 'QA_RADSAT',
       'SR_QA_AEROSOL'], dtype=object)
Coordinates: (12/16)
  * band                       (band) object 'coastal' ... 'SR_QA_AEROSOL'
    description                (band) <U91 'Collection 2 Level-2 Coastal/Aero...
    landsat:wrs_type           <U1 '2'
    platform                   <U9 'landsat-8'
    landsat:collection_number  <U2 '02'
    landsat:wrs_row            <U3 '031'
    ...                         ...
    gsd                        (band) float64 30.0 30.0 30.0 ... 30.0 30.0 30.0
    title                      (band) <U46 'Coastal/Aerosol Band (B1)' ... 'A...
    common_name                (band) object 'coastal' 'blue' ... None None
    center_wavelength          (band) object 0.44 0.48 0.56 ... None None None
    full_width_half_max        (band) object 0.02 0.06 0.06 ... None None None
    epsg                       int64 32619
See how much input data there is for just RGB. This is the amount of data we'll end up processing.
[10]:
stack.sel(band=["red", "green", "blue"])
[10]:
<xarray.DataArray 'stackstac-539bd2f8e840894b8d42e26c2e913c32' (time: 399, band: 3, y: 774, x: 1233)>
dask.array<getitem, shape=(399, 3, 774, 1233), dtype=float64, chunksize=(1, 1, 774, 1024), chunktype=numpy.ndarray>
Coordinates: (12/27)
  * time                 (time) datetime64[ns] 2013-03-22T15:19:00.54...
    id                   (time) <U31 'LC08_L2SP_011031_20130322_02_T1...
  * band                 (band) object 'red' 'green' 'blue'
  * x                    (x) float64 4.028e+05 4.029e+05 ... 4.398e+05
  * y                    (y) float64 4.623e+06 4.623e+06 ... 4.6e+06
    eo:cloud_cover       (time) float64 92.94 0.77 11.81 ... 4.92 94.53
    ...                   ...
    gsd                  (band) float64 30.0 30.0 30.0
    title                (band) <U46 'Red Band (B4)' ... 'Blue Band (...
    common_name          (band) object 'red' 'green' 'blue'
    center_wavelength    (band) object 0.65 0.56 0.48
    full_width_half_max  (band) object 0.04 0.06 0.06
    epsg                 int64 32619
Attributes:
    spec:        RasterSpec(epsg=32619, bounds=(402840.0, 4599690.0, 439830.0...
    crs:         epsg:32619
    transform:   | 30.00, 0.00, 402840.00|\n| 0.00,-30.00, 4622910.00|\n| 0.0...
    resolution:  30.0
Mask cloudy pixels using the QA band#
Use the bit values of the Landsat-8 QA band to mask out bad pixels. We’ll mask pixels labeled as dilated cloud, cirrus, cloud, or cloud shadow. (By “mask”, we mean just replacing those pixels with NaNs).
See page 14 of this PDF for the data table describing which bit means what.
[11]:
# Make a bitmask: when we bitwise-AND it with the data, it leaves just the 4 bits we care about
mask_bitfields = [1, 2, 3, 4] # dilated cloud, cirrus, cloud, cloud shadow
bitmask = 0
for field in mask_bitfields:
bitmask |= 1 << field
bin(bitmask)
[11]:
'0b11110'
[12]:
qa = stack.sel(band="QA_PIXEL").astype("uint16")
bad = qa & bitmask # just look at those 4 bits
good = stack.where(bad == 0)  # mask pixels where any of those bits is set
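To see what that bitwise test does, here's a small pure-Python sketch with two hypothetical QA_PIXEL values (the clear and cloudy variables are invented for illustration):

```python
# Rebuild the same bitmask as above: bits 1-4 of QA_PIXEL
# (dilated cloud, cirrus, cloud, cloud shadow)
mask_bitfields = [1, 2, 3, 4]
bitmask = 0
for field in mask_bitfields:
    bitmask |= 1 << field

# Two hypothetical QA_PIXEL values:
clear = 0b0000000000000000   # no flags set
cloudy = 0b0000000000001000  # bit 3 (cloud) set

print(clear & bitmask)   # 0 -> pixel is kept
print(cloudy & bitmask)  # 8 (nonzero) -> pixel is replaced with NaN
```

Any nonzero result means at least one of the four flag bits is set, so `bad == 0` selects exactly the pixels with none of them set.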
[13]:
# What's the typical interval between scenes?
good.time.diff("time").dt.days.plot.hist();
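The `.diff("time").dt.days` pattern works the same way on a plain pandas Series of timestamps, which makes it easy to see what the histogram above is counting (the dates here are made up for illustration):

```python
import pandas as pd

# Hypothetical acquisition dates, 8 days apart
times = pd.Series(pd.to_datetime(["2013-01-01", "2013-01-09", "2013-01-17"]))

# .diff() gives timedeltas between consecutive scenes;
# .dt.days extracts the interval in whole days
intervals = times.diff().dt.days
print(intervals.tolist())  # [nan, 8.0, 8.0]
```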
Make biannual median composites#
The Landsat-8 scenes are typically 5-15 days apart. Let's composite those down to a 6-month interval. Since the cloudy pixels we masked with NaNs are ignored by the median, this should give us a decent cloud-free-ish image for each period.
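The NaN-skipping behavior is the key here: xarray's .median() defaults to skipna=True, which is equivalent to NumPy's nanmedian. A minimal sketch with one hypothetical pixel's time series (values invented for illustration):

```python
import numpy as np

# Five observations of one pixel; two were cloud-masked to NaN
pixel = np.array([0.11, np.nan, 0.13, np.nan, 0.12])

# The median is taken over the three valid observations only
print(np.nanmedian(pixel))  # 0.12
```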
[14]:
# Make biannual median composites (`2Q` means 2 quarters)
composites = good.resample(time="2Q").median("time")
composites
[14]:
<xarray.DataArray 'stackstac-539bd2f8e840894b8d42e26c2e913c32' (time: 19, band: 19, y: 774, x: 1233)>
dask.array<stack, shape=(19, 19, 774, 1233), dtype=float64, chunksize=(1, 1, 774, 512), chunktype=numpy.ndarray>
Coordinates: (12/13)
  * time                      (time) datetime64[ns] 2013-03-31 ... 2022-03-31
  * band                      (band) object 'coastal' ... 'SR_QA_AEROSOL'
  * x                         (x) float64 4.028e+05 4.029e+05 ... 4.398e+05
  * y                         (y) float64 4.623e+06 4.623e+06 ... 4.6e+06
    landsat:wrs_type          <U1 '2'
    platform                  <U9 'landsat-8'
    ...                        ...
    landsat:wrs_row           <U3 '031'
    landsat:processing_level  <U4 'L2SP'
    proj:epsg                 int64 32619
    view:off_nadir            int64 0
    instruments               object {'oli', 'tirs'}
    epsg                      int64 32619
Pick the red-green-blue bands to make a true-color image.
[15]:
rgb = composites.sel(band=["red", "green", "blue"])
rgb
[15]:
<xarray.DataArray 'stackstac-539bd2f8e840894b8d42e26c2e913c32' (time: 19, band: 3, y: 774, x: 1233)>
dask.array<getitem, shape=(19, 3, 774, 1233), dtype=float64, chunksize=(1, 1, 774, 512), chunktype=numpy.ndarray>
Coordinates: (12/13)
  * time                      (time) datetime64[ns] 2013-03-31 ... 2022-03-31
  * band                      (band) object 'red' 'green' 'blue'
  * x                         (x) float64 4.028e+05 4.029e+05 ... 4.398e+05
  * y                         (y) float64 4.623e+06 4.623e+06 ... 4.6e+06
    landsat:wrs_type          <U1 '2'
    platform                  <U9 'landsat-8'
    ...                        ...
    landsat:wrs_row           <U3 '031'
    landsat:processing_level  <U4 'L2SP'
    proj:epsg                 int64 32619
    view:off_nadir            int64 0
    instruments               object {'oli', 'tirs'}
    epsg                      int64 32619
Some final cleanup to make a nicer-looking animation:
Forward-fill any NaN pixels from the previous frame, to make the animation look less jumpy.
Also skip the first frame, since its NaNs can’t be filled from anywhere.
[16]:
cleaned = rgb.ffill("time")[1:]
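Forward-fill along time behaves just like pandas' ffill: each NaN takes the most recent valid value before it, and a leading NaN has nothing to take, which is why we drop the first frame. A tiny sketch with one hypothetical pixel across four frames (values invented for illustration):

```python
import numpy as np
import pandas as pd

# One pixel's value across 4 composite frames; frames 0 and 2 are cloud-masked
frames = pd.Series([np.nan, 0.10, np.nan, 0.12])

filled = frames.ffill()
print(filled.tolist())  # [nan, 0.1, 0.1, 0.12] -- the leading NaN can't be filled
```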
Render the GIF#
Use GeoGIF to turn the stack into an animation. We’ll use dgif to render the GIF on the cluster, so there’s less data to send back. (GIFs are a lot smaller than NumPy arrays!)
[17]:
client.wait_for_workers(20)
[18]:
%%time
gif_img = geogif.dgif(cleaned).compute()
CPU times: user 2.98 s, sys: 287 ms, total: 3.27 s
Wall time: 55.3 s
[19]:
# we turned ~7 GiB of data into a ~5 MiB GIF!
dask.utils.format_bytes(len(gif_img.data))
[19]:
'4.73 MiB'
[20]:
gif_img
[20]: