Feasibility Report: Virtual Stores for NASA #318

@abarciauskas-bgse

To support adoption by ESDIS, we want to report on:

  1. Why they would want to use this technology
    1. What does it enable?
    2. What does the future look like?
    3. How does it help NASA data users?
    4. How does it help NASA save money?
  2. Compatibility + Feasibility Assessment: What data can it be applied to now? What are some current limitations, and what are the plans to address them, if any?
    1. What works now:
      a. Consistently gridded data, aggregated into collection-level chunk manifests (PODAAC)
      b. Displacement data (ASF)
      c. Ongoing data: appending via Icechunk or overwriting kerchunk JSON
    2. Future: different grids and compression schemes, L2/orbital swath data
    3. No plans:
      a. Appending to kerchunk
      b. Other format support besides what exists
  3. Known limitations:
    1. Icechunk is very Python- and Rust-centric
    2. Structural decisions about data formatting, chunking, and chunk manifests made early on affect performance for different use cases, and it is still not possible to optimize for all use cases simultaneously. Ideally, chunk manifests could be aggregated dynamically depending on the use case. For NISAR, for example, the current chunk manifest design assumes users will work with individual frames and so optimizes for loading one chunk manifest per frame, which makes it hard to load data across frames.
  4. Governance decisions that need to be made
    1. Standards for where to put metadata (collection-level and frame-level)
  5. Established best practices
    1. Typical use case patterns should be considered when designing the files/chunks and aggregated chunk manifests. For example, NISAR chunk manifests are organized by frame because time series analysis within a frame is the typical use case.
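
To make the per-frame versus cross-frame trade-off in the limitations above concrete, here is a minimal sketch of a chunk manifest as a plain mapping from chunk-grid index to byte range, with a helper that concatenates manifests along one axis. It is not the VirtualiZarr or Icechunk API. `ChunkRef`, `concat_manifests`, the bucket paths, and the byte offsets are all hypothetical, and the sketch assumes every frame shares the same chunk grid, which real NISAR frames may not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Location of one chunk inside an archival file (hypothetical fields)."""
    path: str
    offset: int
    length: int


# A manifest maps a chunk-grid index, here (time, y, x), to a byte range.
Manifest = dict[tuple[int, ...], ChunkRef]


def concat_manifests(manifests: list[Manifest], axis: int) -> Manifest:
    """Concatenate manifests along one chunk-grid axis by shifting indices."""
    combined: Manifest = {}
    shift = 0
    for manifest in manifests:
        if not manifest:
            continue
        for index, ref in manifest.items():
            shifted = index[:axis] + (index[axis] + shift,) + index[axis + 1:]
            combined[shifted] = ref
        # Advance by the number of chunks this manifest spans along the axis.
        shift += max(index[axis] for index in manifest) + 1
    return combined


# Two hypothetical per-frame manifests: two time steps, one spatial chunk each.
frame_a: Manifest = {
    (0, 0, 0): ChunkRef("s3://example-bucket/frame_a_t0.h5", 4096, 1000),
    (1, 0, 0): ChunkRef("s3://example-bucket/frame_a_t1.h5", 4096, 1000),
}
frame_b: Manifest = {
    (0, 0, 0): ChunkRef("s3://example-bucket/frame_b_t0.h5", 4096, 1200),
    (1, 0, 0): ChunkRef("s3://example-bucket/frame_b_t1.h5", 4096, 1200),
}

# Cross-frame view: place frame_b next to frame_a along the x axis (axis=2).
mosaic = concat_manifests([frame_a, frame_b], axis=2)
```

A per-frame time series only needs `frame_a`, while a cross-frame query needs `mosaic`. Building views like `mosaic` on demand, rather than fixing one layout up front, is roughly what dynamic aggregation would involve.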

This report would also include links to:

  1. An onboarding guide for anyone who is looking to get started virtualizing data.
  2. An initial library of virtualization examples. These examples would represent the variety of data and use case patterns which have already been solved and serve as a resource for virtual layer producers.
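
As one illustration of what an entry in such an example library might look like, the sketch below shows the "overwriting kerchunk JSON" pattern for ongoing data: the whole reference file is rewritten with one extra time step. It uses only the standard library and the version 1 kerchunk reference layout (a top-level `refs` mapping whose chunk entries are `[url, offset, length]`). The helper name `append_time_step`, the variable `sst`, the bucket paths, and the byte offsets are hypothetical, and it assumes time is the first axis and the existing time length is an exact multiple of the time chunk size.

```python
import json
import os
import tempfile


def append_time_step(refs_path, var, new_chunks):
    """Rewrite a kerchunk reference file with one extra time chunk for `var`.

    `new_chunks` maps a spatial chunk suffix such as "0.0" to
    [url, offset, length]. Assumes time is axis 0 (an assumption of this sketch).
    """
    with open(refs_path) as f:
        doc = json.load(f)
    refs = doc["refs"]

    # Zarr v2 array metadata is stored as a JSON string inside the references.
    zarray = json.loads(refs[f"{var}/.zarray"])
    time_chunk = zarray["shape"][0] // zarray["chunks"][0]  # index of the new chunk
    zarray["shape"][0] += zarray["chunks"][0]
    refs[f"{var}/.zarray"] = json.dumps(zarray)

    for suffix, ref in new_chunks.items():
        refs[f"{var}/{time_chunk}.{suffix}"] = ref

    # Write to a temporary file, then replace the old JSON in one step.
    directory = os.path.dirname(os.path.abspath(refs_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(doc, f)
    os.replace(tmp_path, refs_path)
    return doc


# Hypothetical starting point: one variable with a single time step.
initial = {
    "version": 1,
    "refs": {
        "sst/.zarray": json.dumps({
            "shape": [1, 10, 10], "chunks": [1, 10, 10], "dtype": "<f4",
            "compressor": None, "fill_value": None, "filters": None,
            "order": "C", "zarr_format": 2,
        }),
        "sst/0.0.0": ["s3://example-bucket/day1.nc", 8192, 400],
    },
}
workdir = tempfile.mkdtemp()
refs_path = os.path.join(workdir, "refs.json")
with open(refs_path, "w") as f:
    json.dump(initial, f)

updated = append_time_step(
    refs_path, "sst", {"0.0": ["s3://example-bucket/day2.nc", 8192, 400]}
)
```

Because the file is replaced wholesale, readers see either the old references or the new ones, but there is no history of earlier versions. This is the main contrast with appending via Icechunk.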
