Skip to content

Added error handling#18

Open
alireza1111 wants to merge 1 commit intoborsna:mainfrom
alireza1111:main
Open

Added error handling#18
alireza1111 wants to merge 1 commit intoborsna:mainfrom
alireza1111:main

Conversation

@alireza1111
Copy link

No description provided.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the url on a hardcoded list would not cover all cases, all repositories where schema.org contains distrubution information will work and the list of figshare repositories is quite long: https://knowledge.figshare.com/type-of-client/institutions

A more generic solution would be to check if get_file_list_from_repo() returns any values.

except urllib.error.HTTPError:
raise ResolveError(f"{url} not found")

try:
Copy link
Owner

@borsna borsna Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, giving more precise errors is good

# get desitnation directory and create directory
desitnation = os.path.realpath(args.destination)

if not os.path.exists(desitnation):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this was removed, checking for empty destination directory was a feature i added to make sure the downloaded dataset will be exactly as the remote source.

def __init__(self, message, url, supported_urls=None, http_response_code=None):
super().__init__(message)
self.url = url
self.supported_urls = supported_urls or ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoded list of repository url:s should be removed.
daget should try to get a file list via schema.org distribution (if it´s not figshare or zenodo) and if this fails it should throw the error instead. keeping a list of all suported url:s in the source coude is not a sustainable soultion

def get_redirect_url(url):
# if url provided is a shorthand doi (TODO: check with regex)
if not url.startswith(('http://', 'https://')):
if not re.match(r'^https?://', url):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments