201: create a SSOT doc for Cirrus payloads #203
ping-e84 wants to merge 5 commits into stac-utils:main from
Conversation
jkeifer
left a comment
This is so great, I love it. I have some strong feelings about some of the details, so I've left some pretty nitpicky comments. I did call out a few things that I think are a little inaccurate and would like to see tweaked a bit. Let me know if you have any questions about how to address any of my feedback, or think I'm wrong about any of it (because I very well could be).
docs/source/payload.rst
Outdated
    2. Define the workflow:

       - How should the workflow be structured? (*e.g.* single STAC task? series
         of STAC tasks?)
This feels misleading to me, like the payload alone defines the workflow and the tasks executed therein. And there's a part of me deep down that wishes this were in fact true, that we could define a workflow in a payload ad hoc. But for various reasons that idea falls apart, and isn't even attractive at scale, such that we end up with workflow definitions to define the workflows, and payloads merely contain a pointer to a definition (the workflow name) that should be executed.
per in-person convo, changed the wording of this section to be less about defining and more about configuring the workflow.
docs/source/payload.rst
Outdated
    - What global parameters should be applied to all tasks in the workflow?
    - How should each task be configured? (*i.e.*, what parameters are required
      for each task?)
These are the same thing, in my opinion, so I'd try to use the same language for both, if it is even worth calling them out separately. That is, maybe this is just one item, "Input argument specification for task parameters (via both task-specific and globally accessible argument lists)." Or something. That's not great either, but maybe it can spark something???
per in-person convo, changed wording to combine the ideas of task and workflow parameters.
docs/source/payload.rst
Outdated
    - What global parameters should be applied to all tasks in the workflow?
    - How should each task be configured? (*i.e.*, what parameters are required
      for each task?)
    3. How should Collections be assigned to the output STAC Item(s)?
I'm being very nitpicky, but it is both which collections and how they are assigned.
per in-person convo, changed wording per suggestion.
docs/source/payload.rst
Outdated
    - How should each task be configured? (*i.e.*, what parameters are required
      for each task?)
    3. How should Collections be assigned to the output STAC Item(s)?
    4. How and where should Items and Item Assets be uploaded?
Now I'm going to be overly nitpicky in the other direction: what does the payload specify about how? How, to me, indicates via what mechanism... maybe it's perfectly valid though, referring to arguments to the upload mechanism; I guess I could accept the reasoning that they change the behavior and thus the how?
I'm really getting nitpicky here. Feel free to ignore this one entirely.
per in-person convo, changed to only indicate that the where concept is key here.
docs/source/payload.rst
Outdated
    * - features
      - [Item]
      - An array of STAC Items
Zero or more STAC items...and this is technically not required, per our current validation. Except it is required by the GeoJSON spec for FeatureCollections, so we probably should make it required here too if we are going to stick with this payload format much longer...but that's a separate problem, and the genie is likely out of that bottle. For now let's just not call attention to the fact that it is optional and pretend it is required.
But the zero or more STAC items piece is important. I'd honestly like to see a section discussing that case, where you don't have valid STAC items or Features. Say for non-STACified data coming in, what do you have? Often just a reference to a file somewhere, right? So it is perfectly valid to have an empty features array, but what do you do with that reference?
per in-person convo, made the suggested change -- also added a payload examples section at the end of the doc to demo this particular case.
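To make that case concrete, here is a minimal sketch of such a payload (the task name and parameter shown are hypothetical illustrations, not from the doc or the stac-task API):

```python
# Sketch of the "no STAC items yet" case: the features array is empty
# because nothing has been STACified, so the reference to the raw data
# travels as a task parameter instead. "stacify" and "source_href" are
# illustrative names only.
payload = {
    "type": "FeatureCollection",
    "features": [],  # valid: zero or more STAC Items
    "process": [
        {
            "tasks": {
                "stacify": {"source_href": "s3://bucket/raw/scene.tif"}
            }
        }
    ],
}
```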
docs/source/payload.rst
Outdated
    ProcessDefinition Object
    ========================

    The ``process`` block (array) must include a *single* ``ProcessDefinition``
- The ``process`` block (array) must include a *single* ``ProcessDefinition``
+ The ``process`` array must include a *single* ``ProcessDefinition`` at the beginning. [more about chaining and that constraint...]
But maybe start with what one ProcessDefinition is, then explain the process array?
Per in-person convo, changed wording to reflect the larger conversation about the process block...
docs/source/payload.rst
Outdated
      - ``Map<string, Map<string, Any>>``
      - Dictionary of collection-specific configuration options

    .. _workflow-options:
I personally would want to explain these after introducing tasks in the next section. I think the tasks description is more illustrative of the concepts here, and I also think it's generally preferable for people to use task options over workflow options (explicit is better than implicit).
I think if you introduce the task options first, then you can also show an example here.
Per in-person convo, switched the order of the tasks and workflow_options sub-sections to improve flow.
docs/source/payload.rst
Outdated
    --------------------

    An ``UploadOptions`` object includes parameters that define how Items and Item
    Assets should be uploaded to the cloud (note that presently only AWS S3 is
Maybe this is correct, but is only AWS S3 supported? Or is any S3-compatible object store potentially supported? Recent boto3 versions allow changing the AWS endpoint via an env var (AWS_ENDPOINT_URL). And if you just want to target S3 and leave other AWS services with their default endpoints that's possible using AWS_ENDPOINT_URL_S3.
Of course we need a better story here, like the ability to have per-bucket endpoints. But I'm not certain we are strictly limited to AWS S3 here.
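For context, overriding the endpoint is a one-liner in boto3; a minimal sketch (the endpoint URL is a placeholder):

```python
import os

import boto3

# Recent botocore versions honor AWS_ENDPOINT_URL / AWS_ENDPOINT_URL_S3,
# so an S3-compatible store can be targeted without code changes:
os.environ["AWS_ENDPOINT_URL_S3"] = "http://localhost:9000"  # placeholder

# Or set it explicitly when constructing the client:
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")
```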
This comment is old, but I think it still applies: #20 (comment)
For upload, it looks like, yes, one could upload to non-S3, but stac-task doesn't make it easy:

- the code still relies on boto3utils.s3() as the global client
- URL generation assumes S3 URIs
Because of the use of stac-asset, downloads can use whatever fsspec can do (so multiple cloud providers) but you have to use stac_asset.Config which is not locally documented in any way and is, frankly, a bit opaque for a non-expert user.
For both downloading and uploading, there would be a lot of customization involved for non-S3, so, yes, non-S3 is possible, but I think to say "supported" is a stretch to be honest.
I removed the specific "only AWS S3 is supported" language and added another info box with a little more high-level detail...
docs/source/payload.rst
Outdated
    complete) - consider this when uploading Items to S3 or otherwise writing them
    *prior* to returning them. Also note that ``collection_matchers`` are mutually
It is effectively required to assign collections prior to upload, no? I feel like the language here could be stronger. And that maybe we should have some gate in place to prevent people from doing an upload without doing the collection assignment? Or perhaps I'm getting too close to #186 with that latter idea.
Per in-person convo...
> It is effectively required to assign collections prior to upload, no?
Yes, it is required; however, stac-task only assigns collections automatically after the process method completes. Given that, the suggestion here is to assign collections somewhere in the process method prior to uploading (even though, unfortunately, collections might then be assigned twice).
I added another info box to clarify this and hopefully provide some useful guidance to users...
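That guidance amounts to something like this minimal sketch (`upload` is a stand-in for whatever upload helper the task actually uses, not a stac-task API):

```python
from typing import Any


def upload(item: dict[str, Any]) -> None:
    """Stand-in for the task's real upload helper."""


def process(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Assign the collection *before* uploading; stac-task will also assign
    # collections automatically after process() returns, so the assignment
    # may happen twice, but the uploaded copy is correct either way.
    for item in items:
        item["collection"] = "my-collection"  # illustrative value
        upload(item)
    return items
```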
docs/source/payload.rst
Outdated
    @@ -0,0 +1,435 @@
    ======================
    Cirrus Process payload
This isn't actually a cirrus payload...we should keep this doc generic, i.e., non-cirrus-specific. Once we have this doc we can revise the cirrus documentation to reference this doc and extend it with the cirrus-specific bits.
per in-person convo, changed title to STAC Task Payload.
jkeifer
left a comment
Happy to chat again if you have questions. I'm still being quite nitpicky, so if we just need to go ahead and merge this I am happy to approve as-is and we can iterate on it sometime in the future. I feel the current state could still be improved, but it is really good, and profoundly better than anything we have, so I'd be happy to take it as-is for the win.
    ProcessDefinition Object
    ========================

    The ``process`` block (array) should include a *single* ``ProcessDefinition``
I would use stronger language: "should" typically is a recommendation, versus "must" where something is strictly required.
- The ``process`` block (array) should include a *single* ``ProcessDefinition``
+ The ``process`` block (array) must include a *single* ``ProcessDefinition``
    objects, ``stac-task`` only supports a single ``ProcessDefinition`` and will
    only read the first one in the ``process`` array. **\***
I still find this language a bit misleading. stac-task supports multiple process definitions in the sense that it will not error if there are additional ones.
What do you think about something like this?
- objects, ``stac-task`` only supports a single ``ProcessDefinition`` and will
- only read the first one in the ``process`` array. **\***
+ objects, ``stac-task`` reads only the first ``ProcessDefinition`` in the
+ ``process`` array. **\***
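In other words, the behavior is roughly this (a sketch of what is described, not the actual stac-task code):

```python
from typing import Any


def get_process_definition(payload: dict[str, Any]) -> dict[str, Any]:
    # Extra entries in the "process" array are ignored rather than
    # rejected; only the first one configures the task.
    return payload["process"][0]
```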
    Here is an example of a ``workflow_options`` dictionary with a single global
    parameter:

    .. code-block:: json

       {
         "workflow_options": {
           "global_param": "global_value"
         }
       }
I was thinking maybe an example of how this works with task parameters would be valuable. Consider:
- Here is an example of a ``workflow_options`` dictionary with a single global
- parameter:
-
- .. code-block:: json
-
-    {
-      "workflow_options": {
-        "global_param": "global_value"
-      }
-    }
+ Here is an example of a ``workflow_options`` dictionary with a single global
+ parameter, and how ``workflow_options`` interacts with task-specific parameter
+ values:
+
+ .. code-block:: json
+
+    {
+      "workflow_options": {
+        "param1": "global_value"
+      },
+      "tasks": {
+        "task-a": {
+          "param1": "value1"
+        },
+        "task-c": {
+          "param2": "value2"
+        }
+      }
+    }
+
+ In this case, ``task-a`` would get one keyword argument ``param1='value1'``,
+ where ``task-c`` would get two: ``param1='global_value'`` and
+ ``param2='value2'``. If another task ``task-b`` also runs, it would get one
+ keyword argument ``param1='global_value'``.
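The merge semantics in that example come down to task-specific values overriding the globals; roughly this (a sketch, not stac-task's actual implementation):

```python
from typing import Any


def task_kwargs(process: dict[str, Any], task_name: str) -> dict[str, Any]:
    # Task-specific parameters take precedence over workflow_options.
    merged = dict(process.get("workflow_options", {}))
    merged.update(process.get("tasks", {}).get(task_name, {}))
    return merged


process = {
    "workflow_options": {"param1": "global_value"},
    "tasks": {"task-a": {"param1": "value1"}, "task-c": {"param2": "value2"}},
}
assert task_kwargs(process, "task-a") == {"param1": "value1"}
assert task_kwargs(process, "task-c") == {
    "param1": "global_value",
    "param2": "value2",
}
assert task_kwargs(process, "task-b") == {"param1": "global_value"}
```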
    In the example above, a task named ``task-a`` would have the ``param1=value1``
    key-value pair passed as a keyword, while ``task-c`` would have
    ``param2=value2`` passed in. If there were a ``task-b`` to be run, it would not be
    passed any keywords.
I would put quotes around the values to indicate they are strings:
- In the example above, a task named ``task-a`` would have the ``param1=value1``
- key-value pair passed as a keyword, while ``task-c`` would have
- ``param2=value2`` passed in. If there were a ``task-b`` to be run, it would not be
- passed any keywords.
+ In the example above, a task named ``task-a`` would have the ``param1='value1'``
+ key-value pair passed as a keyword, while ``task-c`` would have
+ ``param2='value2'`` passed in. If there were a ``task-b`` to be run, it would not be
+ passed any keywords.
| "features": [], | ||
| "process": [ | ||
| { | ||
| "workflow": "image-ingest", |
Workflow is an orchestration thing, not stac-task. I don't think this belongs here.
    ],
    "process": [
      {
        "workflow": "cog-generation",
Again, workflow is not part of the stac-task payload model, and is currently only added by cirrus (and swoop if you can count it).
    ],
    "process": [
      {
        "description": "NDVI COG Workflow",
Where is description modeled? Is anything modeling this? I suppose I could go look myself, but I am lazy...
Description feels super weird to me though. What are we describing? The payload? The process definition? It can't be the workflow, because the payload doesn't define the workflow; it merely contains a pointer to it in systems like cirrus where we have a workflow concept. stac-task does not.
      ]
    }

    The workflow described by this payload configures the `asset-fetcher` task to
I take issue with framing this as describing a workflow. What is running this? Not stac-task. stac-task only supports single tasks. That multiple tasks can be present in the payload is a function of the utility of stac-task in workflow situations like cirrus.
I think you could reframe this as something like "multiple stac-tasks are often composed together into workflows by STAC Workflow orchestration frameworks such as cirrus." My main concern here is to make it clear that a stac-task is concerned with a single task, itself, and something above stac-task is required to define and execute a workflow.
    2. Configure the STAC workflow:

       - What is the proposed workflow (*e.g.* single task? series of tasks?)
I am still suspect of this explanation, because the payload is not currently used to define a workflow anywhere. I don't think it reasonably can be. The payload is merely the input to a single task. Yes, it may contain extra configuration to be passed along to successive tasks, but that's outside the scope of stac-task. Granted, you invoke "STAC workflows" here, which is conceptually a thing meant to convey the possibility of having multiple stac-tasks composed together into a workflow, but even then the payload does not define what tasks are executed. A payload may include task parameters for tasks that will never run. A payload may not include task parameters for tasks that do not require any.
In other words, one cannot look at a payload and have any conception of what tasks will or will not be executed by a STAC workflow.