Commit 7f5e50e

move around docs
1 parent a3efcaa commit 7f5e50e

4 files changed

+339
-342
lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
MIT License
22

3-
Copyright (c) 2022-2023 LLNS, LLC and other HPCIC DevTools Developers.
3+
Copyright (c) 2022-2023 The Snakemake team, LLNS, LLC and other HPCIC DevTools Developers.
44

55
Permission is hereby granted, free of charge, to any person obtaining a copy
66
of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 2 additions & 341 deletions
@@ -1,346 +1,7 @@
1-
# Snakemake Executor Google Batch
2-
3-
> This is currently a skeleton and not ready for use, but come back soon! 🎃️
1+
# Snakemake executor plugin: google-batch
42

53
This is the [Google Batch](https://cloud.google.com/batch/docs/get-started) external executor plugin for snakemake.
6-
If you are migrating from Google Life Sciences see [this documentation](https://cloud.google.com/batch/docs/migrate-to-batch-from-cloud-life-sciences). For the underlying Python SDK, see [google-cloud-batch](https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-batch) on GitHub.
7-
8-
## Usage
9-
10-
### Setup
11-
12-
You'll likely want to start by setting up [application default credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to)
13-
The easiest thing to do is run:
14-
15-
```bash
16-
gcloud auth application-default login
17-
```
18-
19-
### Quick Start
20-
21-
The basic usage is, from a directory with your Snakefile, to ask for `googlebatch` as the
22-
executor.
23-
24-
```bash
25-
$ snakemake --jobs 1 --executor googlebatch
26-
```
27-
28-
You are minimally required to provide a project and region, and can do this through the environment or command line:
29-
30-
```bash
31-
export SNAKEMAKE_GOOGLEBATCH_PROJECT=myproject
32-
export SNAKEMAKE_GOOGLEBATCH_REGION=us-central1
33-
snakemake --jobs 1 --executor googlebatch
34-
```
35-
36-
or
37-
38-
```bash
39-
export SNAKEMAKE_GOOGLEBATCH_PROJECT=myproject
40-
export SNAKEMAKE_GOOGLEBATCH_REGION=us-central1
41-
snakemake --jobs 1 --executor googlebatch --googlebatch-project myproject --googlebatch-region us-central1
42-
```
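As a rough illustration of the two ways to supply the required settings, here is a minimal Python sketch. It is not the plugin's implementation: the helper name `resolve_required` is hypothetical, and the precedence (an explicit command line value first, then the environment) is an assumption made for the example. Only the environment variable and flag names come from this README.

```python
import os

# Environment variable names as documented in this README.
ENV_VARS = {
    "project": "SNAKEMAKE_GOOGLEBATCH_PROJECT",
    "region": "SNAKEMAKE_GOOGLEBATCH_REGION",
}


def resolve_required(name, cli_value=None, environ=None):
    """Return the command line value if given, else the environment value.

    Hypothetical helper: the precedence shown here is an assumption,
    not confirmed behavior of the plugin.
    """
    environ = os.environ if environ is None else environ
    value = cli_value or environ.get(ENV_VARS[name])
    if not value:
        raise ValueError(
            f"{name} is required: pass --googlebatch-{name} or set {ENV_VARS[name]}"
        )
    return value
```

For example, `resolve_required("region", None, {"SNAKEMAKE_GOOGLEBATCH_REGION": "us-central1"})` falls back to the environment, while a missing value in both places raises a `ValueError`.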
43-
44-
You can provide one or more custom arguments, as shown in the table below, to customize your batch run.
45-
Note that Batch offers setup snippets to help with more complex setups (e.g., MPI). See [batch snippets](#batch-snippets)
46-
for more information.
47-
48-
49-
### Logging
50-
51-
For an interactive run from the command line, we provide status updates in your local console. For full logs, you can
52-
go to the [Google Cloud Batch interface](https://console.cloud.google.com/batch/jobs?project=llnl-flux) and click
53-
on your job of interest, and then the "Logs" tab. If you don't see logs, look in the "Events" tab, as usually there
54-
is an error with your configuration (e.g., an unknown image or family).
55-
56-
#### Isolated Logs
57-
58-
If you need to retrieve logs for a job outside of this context (e.g., after a run or in a Pythonic test) you can use the provided script in [example](example).
59-
Here is how to run it using the local poetry environment. You can either provide `--project` and `--region` or export the environment variables for them
60-
described above.
61-
62-
```bash
63-
# <jobid>
64-
poetry run python example/show-logs.py a-898674
65-
```
66-
67-
Note that this is currently provided as a helper script because the [Google Cloud API limits](https://cloud.google.com/logging/quotas#api-limits)
68-
set a rate limit of 60/minute.
69-
70-
> Number of entries.list requests 60 per minute, per Google Cloud project
71-
72-
For some perspective, a "hello world" job will produce over 3K lines of logs, and (without a sleep between calls)
73-
the rate limit is hit very easily. We are currently assessing strategies to deliver full logs to .snakemake logging files
74-
without hitting issues with this rate limit. It looks possible to create "[sinks](https://cloud.google.com/logging/docs/routing/overview#sinks)" using
75-
Pub/Sub; however, this would add an extra API dependency (and cost).
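To make the "sleep between calls" idea concrete, below is a minimal sketch of a client-side limiter that keeps calls under 60 per minute. It is an illustration only, not what `example/show-logs.py` does: the class name `MinuteRateLimiter` is hypothetical, and the `clock` and `sleep` parameters exist only so the logic can be exercised without waiting.

```python
import time


class MinuteRateLimiter:
    """Allow at most `max_calls` calls in any sliding window of `period` seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the sliding window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call expires.
            self.sleep(self.period - (now - self.calls[0]))
```

A log reader would call `limiter.wait()` before each request that counts against the quota, so the delay is only paid once the window is actually full.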
76-
77-
### Arguments
78-
79-
Custom arguments can be any of the following, provided either on the command line or in the environment.
80-
81-
| Name | Description | Flag | Type | Environment Variable | Required | Default |
82-
|------|-------------|------|------|----------------------|----------|---------|
83-
| project | The name of the Google Project | `--googlebatch-project` | str | `SNAKEMAKE_GOOGLEBATCH_PROJECT` | True | unset |
84-
| region | The name of the Google Project region (e.g., us-central1) | `--googlebatch-region` | str | `SNAKEMAKE_GOOGLEBATCH_REGION` | True | unset |
85-
| machine_type | Google Cloud machine type or VM (mpitune configurations are on c2 and c2d family) | `--googlebatch-machine-type` | str | | False | c2-standard-4 |
86-
| image_family | Google Cloud image family (defaults to hpc-centos-7) | `--googlebatch-image-family` | str | | False | hpc-centos-7 |
87-
| image_project | The project the selected image belongs to (defaults to cloud-hpc-image-public) | `--googlebatch-image-project` | str | | False | cloud-hpc-image-public |
88-
| bucket | A bucket to mount with snakemake data | `--googlebatch-bucket` | str | `SNAKEMAKE_GOOGLEBATCH_BUCKET` | True | unset |
89-
| mount_path | The mount path for a bucket (if provided) | `--googlebatch-mount-path` | str | | False | /mnt/share |
90-
| work_tasks | The default number of work tasks (these are NOT MPI ranks) | `--googlebatch-work-tasks` | int | | False | 1 |
91-
| cpu_milli | Milliseconds per cpu-second (in Google Batch, 1000 corresponds to one vCPU per task) | `--googlebatch-cpu-milli` | int | | False | 1000 |
92-
| work_tasks_per_node | The default number of work tasks per node (Google Batch calls these tasks) | `--googlebatch-work-tasks-per-node` | int | | False | 1 |
93-
| memory | Memory in MiB | `--googlebatch-memory` | int | | False | 1000 |
94-
| retry_count | Retry count (default to 1) | `--googlebatch-retry-count` | int | | False | 1 |
95-
| max_run_duration | Maximum run duration, string (e.g., 3600s) | `--googlebatch-max-run-duration` | str | | False | "3600s" |
96-
| labels | Comma separated key value pairs to label job (e.g., model=a3,stage=test) |`--googlebatch-labels` | str | | False | unset|
97-
| container | Container to use (only when image_family is batch-cos*) [see here](https://cloud.google.com/batch/docs/vm-os-environment-overview#supported_vm_os_images) for families/projects | `--googlebatch-container` | str | | False | unset|
98-
| keep_source_cache | Cache workflows in your Google Cloud Storage Bucket | `--googlebatch-keep-source-cache` | bool | | False | False |
99-
| snippet | A comma separated list of one or more snippets to add to your setup | `--googlebatch-snippets` | str | | False | unset |
100-
101-
For machine type, note that for MPI workloads, mpitune configurations are validated on c2 and c2d instances only.
102-
Also note that you can customize the machine type on the level of the step (see [Step Options](#step-options) below).
103-
104-
#### Choosing an Image
105-
106-
You can read about how to choose an image [here](https://cloud.google.com/batch/docs/view-os-images). Note that
107-
the image family and project must match or you'll see that your job does not run (but has an event that indicates a mismatch in the online table).
108-
Since this set changes over time, we do not validate it; however, we suggest checking before you run so that you do not waste time.
109-
I am not entirely sure how to choose correctly, because there is some information [here]() but this listing offers
110-
different information:
111-
112-
```bash
113-
gcloud compute images list | grep cos
114-
```
115-
116-
### Batch Snippets
117-
118-
Because Batch runs on virtual machines, it can support more complex custom setups and run steps, such as running MPI.
119-
However, the setups here are non trivial, so if you choose, a custom snippet can be added. There are
120-
two types of snippets:
121-
122-
- named, built-in snippets provided by the googlebatch executor plugin here
123-
- your custom snippet provided via a script file (not implemented yet)
124-
125-
For each named snippet, depending on the functionality it might add custom logic to the setup or final runnable step.
126-
Examples for providing both are shown below. A snippet is treated as custom if it is a JSON or YAML file that
127-
exists. The order in which you provide snippets is the order in which they
128-
are added. To provide more than one, provide them via a comma separated list.
129-
130-
```bash
131-
$ snakemake --jobs 1 --executor googlebatch --googlebatch-bucket snakemake-cache-dinosaur --googlebatch-snippets intel-mpi
132-
```
133-
134-
### Additional Environment Variables
135-
136-
The following environment variables are available within any Google batch run:
137-
138-
- `BATCH_TASK_INDEX`: The index of the workflow step (which Google Batch calls a "task")
139-
140-
### Step Options
141-
142-
The following options are allowed for batch steps. They cover most of the arguments listed above.
143-
144-
#### googlebatch_machine_type
145-
146-
This will define the machine type for a particular step, overriding the default from the command line.
147-
148-
```console
149-
rule hello_world:
150-
    output:
151-
        "...",
152-
    resources:
153-
        googlebatch_machine_type="c3-standard-112"
154-
    shell:
155-
        "..."
156-
```
157-
158-
#### googlebatch_image_family
159-
160-
This will define the image family for a particular step, overriding the default from the command line.
161-
162-
```console
163-
rule hello_world:
164-
    output:
165-
        "...",
166-
    resources:
167-
        googlebatch_image_family="hpc-centos-7"
168-
    shell:
169-
        "..."
170-
```
171-
172-
173-
#### googlebatch_image_project
174-
175-
This will define the image project for a particular step, overriding the default from the command line.
176-
177-
```console
178-
rule hello_world:
179-
    output:
180-
        "...",
181-
    resources:
182-
        googlebatch_image_project="cloud-hpc-image-public"
183-
    shell:
184-
        "..."
185-
```
186-
187-
188-
#### googlebatch_bucket
189-
190-
This will define the bucket for a particular step, overriding the default from the command line.
191-
192-
```console
193-
rule hello_world:
194-
    output:
195-
        "...",
196-
    resources:
197-
        googlebatch_bucket="my-snakemake-batch-bucket"
198-
    shell:
199-
        "..."
200-
```
201-
202-
#### googlebatch_mount_path
203-
204-
This will define the mount path for a bucket for a particular step, overriding the default from the command line.
205-
206-
```console
207-
rule hello_world:
208-
    output:
209-
        "...",
210-
    resources:
211-
        googlebatch_mount_path="/mnt/workflow"
212-
    shell:
213-
        "..."
214-
```
215-
216-
217-
#### googlebatch_work_tasks
218-
219-
This will define the work tasks for a particular step, overriding the default from the command line.
220-
221-
```console
222-
rule hello_world:
223-
    output:
224-
        "...",
225-
    resources:
226-
        googlebatch_work_tasks=1
227-
    shell:
228-
        "..."
229-
```
230-
231-
#### googlebatch_cpu_milli
232-
233-
This will define the milliseconds per cpu-second for a particular step, overriding the default from the command line.
234-
235-
```console
236-
rule hello_world:
237-
    output:
238-
        "...",
239-
    resources:
240-
        googlebatch_cpu_milli=2000
241-
    shell:
242-
        "..."
243-
```
244-
245-
#### googlebatch_work_tasks_per_node
246-
247-
This will define the work tasks per node (Google batch calls these tasks) for a particular step, overriding the default from the command line.
248-
249-
```console
250-
rule hello_world:
251-
    output:
252-
        "...",
253-
    resources:
254-
        googlebatch_work_tasks_per_node=2
255-
    shell:
256-
        "..."
257-
```
258-
259-
#### googlebatch_memory
260-
261-
This will define the memory for a particular step as an integer in MiB, overriding the default from the command line.
262-
263-
```console
264-
rule hello_world:
265-
    output:
266-
        "...",
267-
    resources:
268-
        googlebatch_memory=2000
269-
    shell:
270-
        "..."
271-
```
272-
273-
274-
#### googlebatch_retry_count
275-
276-
This will define the retry count for a step, overriding the default from the command line.
277-
278-
```console
279-
rule hello_world:
280-
    output:
281-
        "...",
282-
    resources:
283-
        googlebatch_retry_count=2
284-
    shell:
285-
        "..."
286-
```
287-
288-
#### googlebatch_max_run_duration
289-
290-
This will define the maximum run duration for a step, overriding the default from the command line.
291-
292-
```console
293-
rule hello_world:
294-
    output:
295-
        "...",
296-
    resources:
297-
        googlebatch_max_run_duration="3600s"
298-
    shell:
299-
        "..."
300-
```
301-
302-
#### googlebatch_labels
303-
304-
This will define the extra labels to add to the Google Batch job.
305-
306-
```console
307-
rule hello_world:
308-
    output:
308-
        "...",
309-
    resources:
310-
        googlebatch_labels="model=c3,stage=test"
311-
    shell:
312-
        "..."
314-
```
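For readers wondering how a comma separated label string maps to key value pairs, here is a minimal Python sketch. The function name `parse_labels` is hypothetical and the error handling is an assumption for illustration; the plugin's own parsing may differ.

```python
def parse_labels(text):
    """Parse 'key=value,key=value' into a dict, skipping empty items.

    Hypothetical helper for illustration only.
    """
    labels = {}
    for item in (text or "").split(","):
        item = item.strip()
        if not item:
            continue
        if "=" not in item:
            raise ValueError(f"label {item!r} is not in key=value form")
        # Split on the first '=' only, so values may contain '='.
        key, value = item.split("=", 1)
        labels[key.strip()] = value.strip()
    return labels
```

With the resource above, `parse_labels("model=c3,stage=test")` yields a dictionary with the keys `model` and `stage`.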
315-
316-
317-
#### googlebatch_container
318-
319-
A container to use only with `image_family` set to batch-cos* (see [here](https://cloud.google.com/batch/docs/vm-os-environment-overview#supported_vm_os_images) for how to see VM choices)
320-
321-
```console
322-
rule hello_world:
323-
    output:
323-
        "...",
324-
    resources:
325-
        googlebatch_container="ghcr.io/rse-ops/atacseq:app-latest"
326-
    shell:
327-
        "..."
329-
```
330-
331-
#### googlebatch_snippets
332-
333-
One or more named (or file-derived) snippets to add to setup.
334-
335-
```console
336-
rule hello_world:
337-
    output:
337-
        "...",
338-
    resources:
339-
        googlebatch_snippets="mpi,myscript.sh"
340-
    shell:
341-
        "..."
343-
```
4+
For documentation, see the [Snakemake plugin catalog](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/googlebatch.html).
3445

3456
### TODO
3467
