[SVCS-548] Chunked Uploads for CloudFiles #289
Johnetordoff wants to merge 18 commits into CenterForOpenScience:develop from Johnetordoff:cloudfiles
Conversation
I tried to keep the review mostly to things related to the chunked-upload commit. I think there were maybe one or two style things I commented on that may have been from your original CloudFiles ticket.
Also, maybe add QA notes to the Jira ticket, so whoever is testing knows how to trigger the chunked uploads.
| """ | ||
| assert src_path.is_dir, 'src_path must be a directory' | ||
| assert asyncio.iscoroutinefunction(func), 'func must be a coroutine' | ||
|
|
No need to delete this blank line
    assert aiohttpretty.has_call(method='PUT', uri=url)

    @pytest.mark.asyncio
    @pytest.mark.aiohttpretty
What is this test testing over the one above it?
    revision_url = connected_provider.build_url('', container='versions-container', **query)
    aiohttpretty.register_json_uri('GET', revision_url, body=revision_list)

    metadata_url = connected_provider.build_url(path.path)
        :param bool fetch_metadata: If true upload will return metadata
        :rtype (dict/None, bool):
        """
        if stream.size > self.SEGMENT_SIZE:
Question: Does cloudfiles need a call to handle_naming?
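For context, the condition this hunk ends on is the switch between the two upload paths. A minimal sketch, assembled from the docstring here and the 5 GB limit mentioned in the PR description (the hand-off arguments are assumptions, not the PR's actual code):

    async def upload(self, stream, path, check_created=True, fetch_metadata=True):
        """
        :param bool fetch_metadata: If true upload will return metadata
        :rtype (dict/None, bool):
        """
        # CloudFiles caps single objects at 5 GB, so anything larger than
        # one segment is handed off to the chunked path.
        if stream.size > self.SEGMENT_SIZE:
            return await self.chunked_upload(stream, path,
                                             check_created=check_created,
                                             fetch_metadata=fetch_metadata)
        # ... single-request upload continues here ...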
    async def delete(self, path, **kwargs):
    async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):

        created = not (await self.exists(path)) if check_created else None
In upload, you have it as

    if check_created:
        created = not (await self.exists(path))
    else:
        created = None

which is much more readable. Maybe go back to that version instead of the more confusing one-liner?
Readability is pretty subjective; if you recognize this as a ternary operator, it's a pretty simple statement. You can read more about ternary operators here: http://book.pythontips.com/en/latest/ternary_operators.html.
Hm, I might agree with Addison here. If it were a simpler ternary I'd be fine with it, but self.exists(path) is a little opaque, and as soon as the await is added in it gets a little unclear.
created is a value that isn't used until after the requests are made. If the requests fail, this result is thrown away. It also means the upload is delayed until this result is awaited. This should go after the requests.
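One way to act on this without losing the check (a sketch, not the PR's code): start the exists() call as a task before the requests, and only consume its result once they succeed. Note that running the check only after the manifest is written would always find the file, which is why it has to start early:

    import asyncio

    async def chunked_upload(self, stream, path, check_created=True, fetch_metadata=True):
        # Kick off the existence check without blocking the upload on it.
        exists_future = asyncio.ensure_future(self.exists(path)) if check_created else None

        # ... segment uploads and manifest request go here ...

        # Consume the result only once the requests have succeeded, so a
        # failed upload never waits on (or throws away) this round trip.
        if check_created:
            created = not (await exists_future)
        else:
            created = None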
AddisonSchiller
left a comment
QA Notes
Other than that, ready for the next phase.
icereval
left a comment
I'd like to review this; I'll try later this week.
        created = not (await self.exists(path)) if check_created else None

        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
All of these happen sequentially; it would be better for them to happen in parallel.
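A rough sketch of what that might look like, assuming the per-segment PUT is factored into a hypothetical _put_segment helper. Reads from the stream are inherently sequential, so only the network writes are parallelized:

    import asyncio

    async def _upload_segments(self, stream, path):
        uploads = []
        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
            data = await stream.read(self.SEGMENT_SIZE)
            # _put_segment is a hypothetical coroutine wrapping the segment PUT.
            uploads.append(self._put_segment(path, i, data))
        # Run all segment PUTs concurrently and wait for them all to finish.
        await asyncio.gather(*uploads)

In practice you'd want to bound concurrency (e.g. with an asyncio.Semaphore), since collecting every segment before gathering buffers the whole file in memory.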
        for i, _ in enumerate(range(0, stream.size, self.SEGMENT_SIZE)):
            data = await stream.read(self.SEGMENT_SIZE)
            resp = await self.make_request(
This needs a comment explaining what the function of this request is in the upload.
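For example, something along these lines (segment_url and the exceptions name are stand-ins, not the PR's actual code; expects/throws follow WaterButler's make_request convention):

            # PUT one segment of the file. CloudFiles stores each chunk as a
            # separate object under a shared prefix; the manifest written at
            # the end stitches them back together in order.
            resp = await self.make_request(
                'PUT',
                segment_url,  # hypothetical: signed URL for '<path>/<i>'
                data=data,
                expects=(200, 201),
                throws=exceptions.UploadError,
            )
            await resp.release()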
            )
            await resp.release()

        resp = await self.make_request(
This needs a comment explaining what the function of this request is in the upload.
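Likewise here. This second request presumably writes the manifest: in Swift's dynamic-large-object scheme, that is a zero-byte PUT whose X-Object-Manifest header names the segment prefix (the names below are stand-ins):

        # PUT the manifest: a zero-byte object whose X-Object-Manifest header
        # points at the segment prefix. A GET on this object then streams the
        # concatenated segments back as one file.
        resp = await self.make_request(
            'PUT',
            manifest_url,  # hypothetical: signed URL for the final object path
            headers={'X-Object-Manifest': '{}/{}'.format(container, segment_prefix)},
            expects=(200, 201),
            throws=exceptions.UploadError,
        )
        await resp.release()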
        return CloudFilesFileMetadata(data)

    @ensure_connection
    async def create_folder(self, path, **kwargs):
Does this have anything to do with multipart upload, or is there more than one ticket here?
Note (Added by Longze)
This PR contains code from: #283.
Ticket
https://openscience.atlassian.net/browse/SVCS-548
Purpose
This PR allows uploading files larger than 5 GB.
Changes
Adds a new chunked_upload method to the CloudFiles provider, with tests, and raises the max body limit to an arbitrarily high 1 terabyte.
Side effects
None that I know of.
QA Notes
It will take a long time to upload a file larger than 5 GB; have something else to do while you wait.
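If you need a test file quickly, a sparse file just over the single-object limit will exercise the chunked path (a sketch; the 6 GB figure and filename are arbitrary, and the file is created instantly only on filesystems that support sparse files):

    # Create a ~6 GB sparse file to trigger the chunked upload path.
    with open('big_test_file.bin', 'wb') as f:
        f.seek(6 * 1024 ** 3 - 1)
        f.write(b'\0')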
Deployment Notes
None that I know of.