Conversation
@@ -0,0 +1,3 @@
#!/bin/bash -xe

sudo aws s3 cp s3://dig-data-registry/hail.jar /usr/lib/spark/jars/
I built the hail.jar from the hail source (it needs to be compiled using Java 8, since that's what our EMR clusters use) and then uploaded it to S3. We probably need a better location in S3, but I used this one for now since it's not in production use. This solution also relies on EMR continuing to put /usr/lib/spark/jars on the classpath.
So for things like this we tend to use s3://dig-aggregator-data/bin/
psmadbec left a comment
Magical. For generating the hail.jar, we'll either want instructions, or probably something put into dig-analysis-data/scripts, which is where I put things that have a specific generation sequence, in case someone needs to generate or update the file itself.
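Such a generation script might look like the following — a minimal sketch, assuming Hail's Gradle shadowJar target and its default output path, and using the s3://dig-aggregator-data/bin/ location suggested in this thread. The exact build command and jar name should be confirmed against the Hail version actually in use.

```shell
#!/bin/bash -xe
# Sketch: build hail.jar with Java 8 and publish it to S3.
# Assumes JAVA_HOME points at a Java 8 JDK (EMR runs Java 8) and the
# AWS CLI is configured. Gradle target and output path may differ
# between Hail versions.

export JAVA_HOME=/usr/lib/jvm/java-1.8.0

git clone https://github.com/hail-is/hail.git
cd hail/hail

# Build the fat jar containing Hail and its Spark codecs
./gradlew shadowJar

# Publish to the conventional bin/ location
aws s3 cp build/libs/hail-all-spark.jar s3://dig-aggregator-data/bin/hail.jar
```

The bootstrap step can then fetch the jar from that location into /usr/lib/spark/jars/ on each cluster.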
Making this work comes down to specifying the codec when writing from pyspark and also having that codec on the Spark classpath.
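Concretely, that might look like the following pyspark sketch. The codec class name (Hail's BGzip codec, is.hail.io.compress.BGzipCodec) and the S3 paths are assumptions here for illustration; the write only succeeds if the jar containing the codec is on the Spark classpath, e.g. via the /usr/lib/spark/jars copy in the bootstrap script.

```python
# Sketch: write block-gzipped output from pyspark by naming the codec.
# Assumes hail.jar is already on the Spark classpath so the codec class
# can be resolved by name. Codec class name and paths are hypothetical
# and should be checked against the jar actually deployed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bgz-write").getOrCreate()

df = spark.read.json("s3://example-bucket/input/")  # hypothetical input

# The "compression" option accepts a fully qualified codec class name;
# Spark looks it up on the classpath at write time.
df.write \
    .option("compression", "is.hail.io.compress.BGzipCodec") \
    .csv("s3://example-bucket/output/")
```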