Commit d2b0597

Merge pull request #96 from fish-pace/copilot/update-docs-home-sidebar
Improve docs: dynamic Examples sidebar on home page + styled DataFrame tables
2 parents 0f9e920 + 64c2f91 commit d2b0597

15 files changed

Lines changed: 7318 additions & 295 deletions
Lines changed: 222 additions & 37 deletions
Large diffs are not rendered by default.
Lines changed: 161 additions & 91 deletions
@@ -1,8 +1,8 @@
1-
# L2 matchups with PACE data -- RRS
1+
# Level 2 matchups with PACE data
22

33
* Create a plan of the files to use with `pc.plan()`
44
* Print the plan to check it with `print(plan.summary())`
5-
* Do the plan and get matchups `pc.matchup(plan, geometry="swath")`
5+
* Run the plan and get matchups with `pc.matchup(plan, open_method="datatree-merge", spatial_method="xoak")`
66

77
## Prerequisite -- Log in to Earthdata
88

@@ -18,7 +18,7 @@ earthaccess.login()
1818

1919

2020

21-
<earthaccess.auth.Auth at 0x7f66b86c5880>
21+
<earthaccess.auth.Auth at 0x7fc30a694d70>
2222

2323

2424

@@ -45,14 +45,13 @@ print(short_names)
4545

4646

4747
```python
48-
from pathlib import Path
49-
import earthaccess
50-
import point_collocation as pc
5148
import pandas as pd
52-
53-
HERE = Path.cwd()
54-
POINTS_CSV = HERE / "fixtures" / "points.csv"
55-
df_points = pd.read_csv(POINTS_CSV) # lat, lon, date columns
49+
url = (
50+
"https://raw.githubusercontent.com/"
51+
"fish-pace/point-collocation/main/"
52+
"examples/fixtures/points.csv"
53+
)
54+
df_points = pd.read_csv(url)
5655
print(len(df_points))
5756
df_points.head()
5857
```
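
Before building a plan, it can help to confirm that the points table is well formed. The following is a minimal sketch on an in-memory stand-in for `points.csv`; the column names `lat`, `lon` and `date` are an assumption here, so adjust them to match the actual file.

```python
import io

import pandas as pd

# Stand-in for points.csv. ASSUMPTION: the columns are named lat, lon, date.
sample_csv = io.StringIO(
    "lat,lon,date\n"
    "27.3835,-82.7375,2024-06-13\n"
    "27.1190,-82.7125,2024-06-14\n"
    "95.0000,-82.8170,2024-06-14\n"  # deliberately invalid latitude
)
df_check = pd.read_csv(sample_csv, parse_dates=["date"])

# Flag rows whose coordinates fall outside the valid geographic ranges.
in_range = df_check["lat"].between(-90, 90) & df_check["lon"].between(-180, 180)
bad_rows = df_check[~in_range]
print(f"{len(bad_rows)} of {len(df_check)} points have out-of-range coordinates")
```

Rows flagged this way can be dropped or corrected before they are passed to `pc.plan()`.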
@@ -123,15 +122,15 @@ df_points.head()
123122

124123

125124

126-
## Get a plan for matchups from PACE data
125+
## Get a plan for matchups for the first 50 points from PACE data
127126

128127

129128
```python
130129
%%time
131-
# bounding_box = (lon_min, lat_min, lon_max, lat_max)
130+
# time 11 s / 5 s
132131
import point_collocation as pc
133132
plan = pc.plan(
134-
df_points[0:100], # -82.7375, 27.3835
133+
df_points[0:50],
135134
data_source="earthaccess",
136135
source_kwargs={
137136
"short_name": "PACE_OCI_L2_AOP",
@@ -140,18 +139,18 @@ plan = pc.plan(
140139
)
141140
```
142141

143-
CPU times: user 336 ms, sys: 19.5 ms, total: 356 ms
144-
Wall time: 2.03 s
142+
CPU times: user 640 ms, sys: 61 ms, total: 701 ms
143+
Wall time: 9.02 s
145144

146145

147146

148147
```python
149148
plan.summary()
150149
```
151150

152-
Plan: 100 points → 24 unique granule(s)
151+
Plan: 50 points → 13 unique granule(s)
153152
Points with 0 matches : 0
154-
Points with >1 matches: 20
153+
Points with >1 matches: 10
155154
Time buffer: 0 days 12:00:00
156155

157156
First 5 point(s):
@@ -170,56 +169,73 @@ plan.summary()
170169

171170

172171
```python
173-
plan.show_variables(geometry="swath")
172+
%%time
173+
# This uses open_method="auto". It first tries xr.open_dataset,
174+
# finds no lat/lon, and then falls back to xr.open_datatree + merge.
175+
# If you know the netCDF files are grouped, you can pass
176+
# open_method="datatree-merge" yourself.
177+
plan.show_variables()
174178
```
175179

176-
geometry : 'swath'
177-
open_method : 'datatree-merge'
178-
Dimensions : {'number_of_bands': 286, 'number_of_reflective_bands': 286, 'wavelength_3d': 172, 'number_of_lines': 1710, 'pixels_per_line': 1272}
179-
Variables : ['wavelength', 'vcal_gain', 'vcal_offset', 'F0', 'aw', 'bbw', 'k_oz', 'k_no2', 'Tau_r', 'year', 'day', 'msec', 'time', 'detnum', 'mside', 'slon', 'clon', 'elon', 'slat', 'clat', 'elat', 'csol_z', 'Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags', 'longitude', 'latitude', 'tilt']
180+
open_method: {'xarray_open': 'datatree', 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge': 'all', 'merge_kwargs': {}}
181+
182+
Dimensions: {'number_of_bands': 286, 'number_of_reflective_bands': 286, 'wavelength_3d': 172, 'number_of_lines': 1710, 'pixels_per_line': 1272}
183+
184+
Variables: ['wavelength', 'vcal_gain', 'vcal_offset', 'F0', 'aw', 'bbw', 'k_oz', 'k_no2', 'Tau_r', 'year', 'day', 'msec', 'time', 'detnum', 'mside', 'slon', 'clon', 'elon', 'slat', 'clat', 'elat', 'csol_z', 'Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags', 'longitude', 'latitude', 'tilt']
180185

181186
Geolocation: ('longitude', 'latitude') — lon dims=('number_of_lines', 'pixels_per_line'), lat dims=('number_of_lines', 'pixels_per_line')
182187

183188
DataTree groups (detail):
184189
/
185-
Dimensions : {}
186-
Variables : []
190+
Dimensions: {}
191+
Variables: []
187192
/sensor_band_parameters
188-
Dimensions : {'number_of_bands': 286, 'number_of_reflective_bands': 286, 'wavelength_3d': 172}
189-
Variables : ['wavelength', 'vcal_gain', 'vcal_offset', 'F0', 'aw', 'bbw', 'k_oz', 'k_no2', 'Tau_r']
193+
Dimensions: {'number_of_bands': 286, 'number_of_reflective_bands': 286, 'wavelength_3d': 172}
194+
Variables: ['wavelength', 'vcal_gain', 'vcal_offset', 'F0', 'aw', 'bbw', 'k_oz', 'k_no2', 'Tau_r']
190195
/scan_line_attributes
191-
Dimensions : {'number_of_lines': 1710}
192-
Variables : ['year', 'day', 'msec', 'time', 'detnum', 'mside', 'slon', 'clon', 'elon', 'slat', 'clat', 'elat', 'csol_z']
196+
Dimensions: {'number_of_lines': 1710}
197+
Variables: ['year', 'day', 'msec', 'time', 'detnum', 'mside', 'slon', 'clon', 'elon', 'slat', 'clat', 'elat', 'csol_z']
193198
/geophysical_data
194-
Dimensions : {'number_of_lines': 1710, 'pixels_per_line': 1272, 'wavelength_3d': 172}
195-
Variables : ['Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags']
199+
Dimensions: {'number_of_lines': 1710, 'pixels_per_line': 1272, 'wavelength_3d': 172}
200+
Variables: ['Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags']
196201
/navigation_data
197-
Dimensions : {'number_of_lines': 1710, 'pixels_per_line': 1272}
198-
Variables : ['longitude', 'latitude', 'tilt']
202+
Dimensions: {'number_of_lines': 1710, 'pixels_per_line': 1272}
203+
Variables: ['longitude', 'latitude', 'tilt']
199204
/processing_control
200-
Dimensions : {}
201-
Variables : []
205+
Dimensions: {}
206+
Variables: []
202207
/processing_control/input_parameters
203-
Dimensions : {}
204-
Variables : []
208+
Dimensions: {}
209+
Variables: []
205210
/processing_control/flag_percentages
206-
Dimensions : {}
207-
Variables : []
211+
Dimensions: {}
212+
Variables: []
213+
CPU times: user 513 ms, sys: 40.7 ms, total: 554 ms
214+
Wall time: 1.25 s
208215

209216

210217
## Get the matchups using that plan
211218

219+
`pc.matchup()` with `open_method="datatree-merge"` opens each L2 granule as a DataTree and merges all groups into a flat dataset. Use `spatial_method="xoak"` for 2-D swath geolocation. Passing `batch_size=5` and `silent=False` lets you watch the progress.
220+
221+
Notice that point 0 is matched to 2 granules, so it has 2 rows with the same `pc_id`.
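
One way to see which points have several matches, and to reduce the result to one row per point, is to group on `pc_id`. The sketch below uses a small stand-in table rather than the real `res`, with times copied from the output shown in this example. Treating the point times as UTC and keeping the granule closest in time are assumptions made for illustration, not behaviour of `point-collocation`.

```python
import pandas as pd

# Stand-in for the first rows of `res`; values are copied from this example's output.
res_demo = pd.DataFrame(
    {
        "pc_id": [0, 0, 1, 2, 3],
        "time": pd.to_datetime(
            ["2024-06-13 12:00:00"] * 2 + ["2024-06-14 12:00:00"] * 3
        ),
        "granule_time": pd.to_datetime(
            [
                "2024-06-13 17:18:49+00:00",
                "2024-06-13 18:52:08+00:00",
                "2024-06-14 17:53:34+00:00",
                "2024-06-14 17:53:34+00:00",
                "2024-06-14 17:53:34+00:00",
            ],
            utc=True,
        ),
    }
)

# Points that matched more than one granule appear on several rows.
rows_per_point = res_demo.groupby("pc_id").size()
multi_match = rows_per_point[rows_per_point > 1]

# ASSUMPTION: point times are UTC, so they can be compared with granule_time.
offset = (res_demo["granule_time"] - res_demo["time"].dt.tz_localize("UTC")).abs()
res_demo["offset_s"] = offset.dt.total_seconds()

# Keep only the row whose granule is closest in time to each point.
closest = res_demo.loc[res_demo.groupby("pc_id")["offset_s"].idxmin()]
print(multi_match)
print(closest[["pc_id", "granule_time"]])
```

Whether the closest granule is the right one to keep depends on the analysis; averaging over the matched rows is another option.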
222+
212223

213224
```python
214225
%%time
215-
res = pc.matchup(plan[0:5], geometry="swath", variables=["Rrs"])
216-
res
226+
# Takes about 1 min
227+
res = pc.matchup(plan, spatial_method="xoak", variables=["Rrs"])
217228
```
218229

219-
CPU times: user 10.8 s, sys: 806 ms, total: 11.7 s
220-
Wall time: 18.8 s
230+
CPU times: user 32.3 s, sys: 2.28 s, total: 34.6 s
231+
Wall time: 58.9 s
232+
221233

222234

235+
```python
236+
res.head()
237+
```
238+
223239

224240

225241

@@ -244,13 +260,13 @@ res
244260
<th>lat</th>
245261
<th>lon</th>
246262
<th>time</th>
263+
<th>pc_id</th>
247264
<th>granule_id</th>
265+
<th>granule_time</th>
266+
<th>granule_lat</th>
267+
<th>granule_lon</th>
248268
<th>Rrs_346</th>
249269
<th>Rrs_348</th>
250-
<th>Rrs_351</th>
251-
<th>Rrs_353</th>
252-
<th>Rrs_356</th>
253-
<th>Rrs_358</th>
254270
<th>...</th>
255271
<th>Rrs_706</th>
256272
<th>Rrs_707</th>
@@ -270,11 +286,11 @@ res
270286
<td>27.3835</td>
271287
<td>-82.7375</td>
272288
<td>2024-06-13 12:00:00</td>
289+
<td>0</td>
273290
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
274-
<td>NaN</td>
275-
<td>NaN</td>
276-
<td>NaN</td>
277-
<td>NaN</td>
291+
<td>2024-06-13 17:18:49+00:00</td>
292+
<td>27.443144</td>
293+
<td>-82.612923</td>
278294
<td>NaN</td>
279295
<td>NaN</td>
280296
<td>...</td>
@@ -294,11 +310,11 @@ res
294310
<td>27.3835</td>
295311
<td>-82.7375</td>
296312
<td>2024-06-13 12:00:00</td>
313+
<td>0</td>
297314
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
298-
<td>NaN</td>
299-
<td>NaN</td>
300-
<td>NaN</td>
301-
<td>NaN</td>
315+
<td>2024-06-13 18:52:08+00:00</td>
316+
<td>27.383293</td>
317+
<td>-82.721527</td>
302318
<td>NaN</td>
303319
<td>NaN</td>
304320
<td>...</td>
@@ -318,13 +334,13 @@ res
318334
<td>27.1190</td>
319335
<td>-82.7125</td>
320336
<td>2024-06-14 12:00:00</td>
337+
<td>1</td>
321338
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
339+
<td>2024-06-14 17:53:34+00:00</td>
340+
<td>27.101389</td>
341+
<td>-82.717186</td>
322342
<td>0.01299</td>
323343
<td>0.012946</td>
324-
<td>0.013148</td>
325-
<td>0.013172</td>
326-
<td>0.012918</td>
327-
<td>0.012968</td>
328344
<td>...</td>
329345
<td>0.000238</td>
330346
<td>0.000228</td>
@@ -342,11 +358,11 @@ res
342358
<td>26.9435</td>
343359
<td>-82.8170</td>
344360
<td>2024-06-14 12:00:00</td>
361+
<td>2</td>
345362
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
346-
<td>NaN</td>
347-
<td>NaN</td>
348-
<td>NaN</td>
349-
<td>NaN</td>
363+
<td>2024-06-14 17:53:34+00:00</td>
364+
<td>26.954554</td>
365+
<td>-82.810219</td>
350366
<td>NaN</td>
351367
<td>NaN</td>
352368
<td>...</td>
@@ -366,35 +382,11 @@ res
366382
<td>26.6875</td>
367383
<td>-82.8065</td>
368384
<td>2024-06-14 12:00:00</td>
385+
<td>3</td>
369386
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
370-
<td>NaN</td>
371-
<td>NaN</td>
372-
<td>NaN</td>
373-
<td>NaN</td>
374-
<td>NaN</td>
375-
<td>NaN</td>
376-
<td>...</td>
377-
<td>NaN</td>
378-
<td>NaN</td>
379-
<td>NaN</td>
380-
<td>NaN</td>
381-
<td>NaN</td>
382-
<td>NaN</td>
383-
<td>NaN</td>
384-
<td>NaN</td>
385-
<td>NaN</td>
386-
<td>NaN</td>
387-
</tr>
388-
<tr>
389-
<th>5</th>
390-
<td>26.6675</td>
391-
<td>-82.6455</td>
392-
<td>2024-06-14 12:00:00</td>
393-
<td>https://obdaac-tea.earthdatacloud.nasa.gov/ob-...</td>
394-
<td>NaN</td>
395-
<td>NaN</td>
396-
<td>NaN</td>
397-
<td>NaN</td>
387+
<td>2024-06-14 17:53:34+00:00</td>
388+
<td>26.703817</td>
389+
<td>-82.817726</td>
398390
<td>NaN</td>
399391
<td>NaN</td>
400392
<td>...</td>
@@ -411,11 +403,89 @@ res
411403
</tr>
412404
</tbody>
413405
</table>
414-
<p>6 rows × 176 columns</p>
406+
<p>5 rows × 180 columns</p>
415407
</div>
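
The result is wide, with one `Rrs_<number>` column per band. For plotting a spectrum, a long layout is often easier to work with. This is a minimal sketch on a one-row stand-in, using the `Rrs` values shown for point 1 in the table; reading the column suffix as a wavelength in nm is an assumption.

```python
import pandas as pd

# One-row stand-in for `res`, with values taken from the row for pc_id 1.
wide = pd.DataFrame(
    {
        "pc_id": [1],
        "Rrs_346": [0.01299],
        "Rrs_348": [0.012946],
        "Rrs_706": [0.000238],
        "Rrs_707": [0.000228],
    }
)

# Select only columns of the form Rrs_<digits>, so a name like Rrs_unc is skipped.
rrs_cols = list(wide.columns[wide.columns.str.fullmatch(r"Rrs_\d+")])

# Reshape to one row per (point, band).
spectrum = wide.melt(
    id_vars="pc_id", value_vars=rrs_cols, var_name="band", value_name="Rrs"
)

# ASSUMPTION: the numeric suffix is the band wavelength in nm.
spectrum["wavelength_nm"] = (
    spectrum["band"].str.replace("Rrs_", "", regex=False).astype(int)
)
spectrum = (
    spectrum.drop(columns="band")
    .sort_values(["pc_id", "wavelength_nm"])
    .reset_index(drop=True)
)
print(spectrum)
```

With the long table, a spectrum for one point is a plot of `Rrs` against `wavelength_nm` for that `pc_id`.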
416408

417409

418410

411+
## Predefined profiles for opening granules
412+
413+
Granules that have groups can be opened with `xr.open_datatree()`, but you then need to specify how the groups are merged so that the latitude, longitude and data variables can be found. `point-collocation` has predefined profiles that you can use or modify.
414+
415+
416+
```python
417+
import point_collocation.profiles as pf
418+
pf.pace_l2
419+
```
420+
421+
422+
423+
424+
{'xarray_open': 'datatree', 'merge': 'all'}
425+
426+
427+
428+
You could modify this for PACE Level 2 netCDF files by telling it to merge only the relevant groups. In this case the change makes little difference to run time (52 s versus 58.9 s wall time in the runs shown here).
429+
430+
431+
```python
432+
test = dict(pf.pace_l2)  # copy, so the predefined profile itself is left unchanged
433+
test['merge'] = ['/geophysical_data', '/navigation_data']
434+
```
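
Since the profile prints as a plain dict, it is worth remembering that plain assignment only binds a second name to the same object, whereas a copy leaves the predefined profile untouched. A minimal sketch with a stand-in dict (`pace_l2_demo` is hypothetical and is assumed to behave like `pf.pace_l2`):

```python
# Stand-in for pf.pace_l2. ASSUMPTION: the real profile behaves like this plain dict.
pace_l2_demo = {"xarray_open": "datatree", "merge": "all"}

# Plain assignment creates an alias: editing it edits the original as well.
alias = pace_l2_demo
alias["merge"] = ["/geophysical_data"]
changed_original = pace_l2_demo["merge"] == ["/geophysical_data"]  # True

# Restore the original, then work on a copy instead.
pace_l2_demo["merge"] = "all"
custom = dict(pace_l2_demo)
custom["merge"] = ["/geophysical_data", "/navigation_data"]

print(pace_l2_demo["merge"])  # all
print(custom["merge"])
```

`dict(...)` makes a shallow copy, which is enough when a top-level key is replaced as here; `copy.deepcopy` is the safer choice if nested entries are edited in place.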
435+
436+
Pass the modified profile to `open_method`:
437+
438+
439+
```python
440+
%%time
441+
out = pc.matchup(plan, open_method=test, variables=["Rrs"],
442+
spatial_method="xoak")
443+
```
444+
445+
CPU times: user 28 s, sys: 1.41 s, total: 29.4 s
446+
Wall time: 52 s
447+
448+
449+
450+
```python
451+
plan.show_variables(open_method=test)
452+
```
453+
454+
open_method: {'xarray_open': 'datatree', 'merge': ['/geophysical_data', '/navigation_data'], 'open_kwargs': {'chunks': {}, 'engine': 'h5netcdf', 'decode_timedelta': False}, 'coords': 'auto', 'set_coords': True, 'dim_renames': None, 'auto_align_phony_dims': None, 'merge_kwargs': {}}
455+
456+
Dimensions: {'number_of_lines': 1710, 'pixels_per_line': 1272, 'wavelength_3d': 172}
457+
458+
Variables: ['Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags', 'longitude', 'latitude', 'tilt']
459+
460+
Geolocation: ('longitude', 'latitude') — lon dims=('number_of_lines', 'pixels_per_line'), lat dims=('number_of_lines', 'pixels_per_line')
461+
462+
DataTree groups (detail):
463+
/
464+
Dimensions: {}
465+
Variables: []
466+
/sensor_band_parameters
467+
Dimensions: {'number_of_bands': 286, 'number_of_reflective_bands': 286, 'wavelength_3d': 172}
468+
Variables: ['wavelength', 'vcal_gain', 'vcal_offset', 'F0', 'aw', 'bbw', 'k_oz', 'k_no2', 'Tau_r']
469+
/scan_line_attributes
470+
Dimensions: {'number_of_lines': 1710}
471+
Variables: ['year', 'day', 'msec', 'time', 'detnum', 'mside', 'slon', 'clon', 'elon', 'slat', 'clat', 'elat', 'csol_z']
472+
/geophysical_data
473+
Dimensions: {'number_of_lines': 1710, 'pixels_per_line': 1272, 'wavelength_3d': 172}
474+
Variables: ['Rrs', 'Rrs_unc', 'aot_865', 'angstrom', 'avw', 'nflh', 'l2_flags']
475+
/navigation_data
476+
Dimensions: {'number_of_lines': 1710, 'pixels_per_line': 1272}
477+
Variables: ['longitude', 'latitude', 'tilt']
478+
/processing_control
479+
Dimensions: {}
480+
Variables: []
481+
/processing_control/input_parameters
482+
Dimensions: {}
483+
Variables: []
484+
/processing_control/flag_percentages
485+
Dimensions: {}
486+
Variables: []
487+
488+
419489

420490
```python
421491
