[SVCS-548] Chunked Uploads for CloudFiles #289
Johnetordoff wants to merge 18 commits into CenterForOpenScience:develop from Johnetordoff:cloudfiles
Conversation
I tried to keep the review mostly to things related to the chunked-upload commit. I think there were maybe one or two style things I commented on that may have been from your original CloudFiles ticket.
Also, maybe add QA notes to the Jira ticket, so whoever is testing knows how to trigger the chunked uploads.
| """ | ||
| assert src_path.is_dir, 'src_path must be a directory' | ||
| assert asyncio.iscoroutinefunction(func), 'func must be a coroutine' | ||
|
|
No need to delete this blank line
    assert aiohttpretty.has_call(method='PUT', uri=url)

    @pytest.mark.asyncio
    @pytest.mark.aiohttpretty
What is this test testing over the one above it?
    revision_url = connected_provider.build_url('', container='versions-container', **query)
    aiohttpretty.register_json_uri('GET', revision_url, body=revision_list)

    metadata_url = connected_provider.build_url(path.path)
        :param bool fetch_metadata: If true upload will return metadata
        :rtype (dict/None, bool):
        """
        if stream.size > self.SEGMENT_SIZE:
Question: Does cloudfiles need a call to handle_naming?
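For context, the condition this hunk ends on is the switch between the two upload paths. A minimal sketch, assembled from the docstring here and the 5 GB limit mentioned in the PR description (the hand-off arguments are assumptions, not the PR's actual code):

    async def upload(self, stream, path, check_created=True, fetch_metadata=True):
        """
        :param bool fetch_metadata: If true upload will return metadata
        :rtype (dict/None, bool):
        """
        # CloudFiles caps single objects at 5 GB, so anything larger than
        # one segment is handed off to the chunked path.
        if stream.size > self.SEGMENT_SIZE:
            return await self.chunked_upload(stream, path,
                                             check_created=check_created,
                                             fetch_metadata=fetch_metadata)
        # ... single-request upload continues here ...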
    async def delete(self, path, **kwargs):
    async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):

        created = not (await self.exists(path)) if check_created else None
In upload, you have it as

    if check_created:
        created = not (await self.exists(path))
    else:
        created = None

which is much more readable. Maybe go back to that version instead of the more confusing one-liner?
Readability is pretty subjective; if you recognize this as a ternary operator, it's a pretty simple statement. You can read more about ternary operators here: http://book.pythontips.com/en/latest/ternary_operators.html.
Hm, I might agree with Addison here. If it were a simpler ternary I'd be fine with it, but self.exists(path) is a little opaque, and as soon as the await is added in it gets a little unclear.
created is a value that isn't used until after the requests are made. If the requests fail, this result is thrown away. It also means the upload is delayed until this result is awaited. This should go after the requests.
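One way to act on this without losing the check (a sketch, not the PR's code): start the exists() call as a task before the requests, and only consume its result once they succeed. Note that running the check only after the manifest is written would always find the file, which is why it has to start early:

    import asyncio

    async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):
        # Kick off the existence check without blocking the upload on it.
        exists_future = asyncio.ensure_future(self.exists(path)) if check_created else None

        # ... segment uploads and manifest request go here ...

        # Consume the result only once the requests have succeeded, so a
        # failed upload never waits on (or throws away) this round trip.
        if check_created:
            created = not (await exists_future)
        else:
            created = None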
AddisonSchiller
left a comment
QA Notes
Other than that, ready for the next phase.
icereval
left a comment
I'd like to review this; I'll try later this week.
        created = not (await self.exists(path)) if check_created else None

        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
All of these happen sequentially; it would be better for them to happen in parallel.
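A rough sketch of what that might look like, assuming the per-segment PUT is factored into a hypothetical _put_segment helper. Reads from the stream are inherently sequential, so only the network writes are parallelized:

    import asyncio

    async def _upload_segments(self, stream, path):
        uploads = []
        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
            data = await stream.read(self.SEGMENT_SIZE)
            # _put_segment is a hypothetical coroutine wrapping the segment PUT.
            uploads.append(self._put_segment(path, i, data))
        # Run all segment PUTs concurrently and wait for them all to finish.
        await asyncio.gather(*uploads)

In practice you'd want to bound concurrency (e.g. with an asyncio.Semaphore), since collecting every segment before gathering buffers the whole file in memory.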
        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
            data = await stream.read(self.SEGMENT_SIZE)
            resp = await self.make_request(
This needs a comment explaining what the function of this request is in the upload.
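For example, something along these lines (segment_url and the exceptions name are stand-ins, not the PR's actual code; expects/throws follow WaterButler's make_request convention):

            # PUT one segment of the file. CloudFiles stores each chunk as a
            # separate object under a shared prefix; the manifest written at
            # the end stitches them back together in order.
            resp = await self.make_request(
                'PUT',
                segment_url,  # hypothetical: signed URL for '<path>/<i>'
                data=data,
                expects=(200, 201),
                throws=exceptions.UploadError,
            )
            await resp.release()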
            )
            await resp.release()

        resp = await self.make_request(
This needs a comment explaining what the function of this request is in the upload.
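Likewise here. This second request presumably writes the manifest: in Swift's dynamic-large-object scheme, that is a zero-byte PUT whose X-Object-Manifest header names the segment prefix (the names below are stand-ins):

        # PUT the manifest: a zero-byte object whose X-Object-Manifest header
        # points at the segment prefix. A GET on this object then streams the
        # concatenated segments back as one file.
        resp = await self.make_request(
            'PUT',
            manifest_url,  # hypothetical: signed URL for the final object path
            headers={'X-Object-Manifest': '{}/{}'.format(container, segment_prefix)},
            expects=(200, 201),
            throws=exceptions.UploadError,
        )
        await resp.release()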
        return CloudFilesFileMetadata(data)

    @ensure_connection
    async def create_folder(self, path, **kwargs):
Does this have anything to do with multipart upload, or is there more than one ticket here?
Note (Added by Longze)
This PR contains code from: #283.
Ticket
https://openscience.atlassian.net/browse/SVCS-548
Purpose
This PR allows uploading files larger than 5 GB.
Changes
Adds a new chunked_upload method to the CloudFiles provider, with tests, and raises the max body limit to an arbitrarily high 1 terabyte.
Side effects
None that I know of.
QA Notes
It will take a long time to upload a file larger than 5 GB; have something else to do while you wait.
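If you need a test file quickly, a sparse file just over the single-object limit will exercise the chunked path (a sketch; the 6 GB figure and filename are arbitrary, and the file is created instantly only on filesystems that support sparse files):

    # Create a ~6 GB sparse file to trigger the chunked upload path.
    with open('big_test_file.bin', 'wb') as f:
        f.seek(6 * 1024 ** 3 - 1)
        f.write(b'\0')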
Deployment Notes
None that I know of.