daget/__main__.py
Checking the URL against a hardcoded list would not cover all cases. Every repository whose schema.org metadata contains distribution information will work, and the list of Figshare repositories is quite long: https://knowledge.figshare.com/type-of-client/institutions
A more generic solution would be to check if get_file_list_from_repo() returns any values.
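A rough sketch of that check, assuming `get_file_list_from_repo()` takes the resolved URL and returns a (possibly empty) list of files. Its real signature in daget may differ, so here it is passed in as a parameter. `resolve_files` and this one-argument `ResolveError` are hypothetical stand-ins:

```python
from typing import Callable, List


class ResolveError(Exception):
    """Hypothetical stand-in for daget's ResolveError."""


def resolve_files(url: str, get_file_list_from_repo: Callable[[str], List[str]]) -> List[str]:
    # Ask the repository handlers for files instead of matching the URL
    # against a hardcoded list of supported hosts.
    files = get_file_list_from_repo(url)
    if not files:
        # An empty result means no usable distribution information was found.
        raise ResolveError(f"{url}: no downloadable files found (unsupported repository?)")
    return files
```

Support then follows from what the repository actually exposes, so new Figshare instances or schema.org-enabled sites need no code change.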
except urllib.error.HTTPError:
    raise ResolveError(f"{url} not found")

try:
Looks good; giving more precise errors is an improvement.
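One way to make the errors even more specific is to use the HTTP status code. This is a sketch that mirrors the `ResolveError` constructor from the diff (without `supported_urls`). `describe_http_error` and `open_url` are hypothetical helpers, and the per-status messages are only suggestions:

```python
import urllib.error
import urllib.request


class ResolveError(Exception):
    # Mirrors the constructor in the diff, minus the supported_urls list.
    def __init__(self, message, url, http_response_code=None):
        super().__init__(message)
        self.url = url
        self.http_response_code = http_response_code


def describe_http_error(url, err):
    # Map the HTTP status code to a more specific message.
    if err.code == 404:
        message = f"{url} not found"
    elif err.code in (401, 403):
        message = f"access to {url} denied (HTTP {err.code})"
    else:
        message = f"request to {url} failed (HTTP {err.code})"
    return ResolveError(message, url, http_response_code=err.code)


def open_url(url):
    try:
        return urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        # Keep the original HTTPError as the cause for debugging.
        raise describe_http_error(url, err) from err
```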
# get destination directory and create directory
desitnation = os.path.realpath(args.destination)

if not os.path.exists(desitnation):
Not sure why this was removed. Checking for an empty destination directory was a feature I added to make sure the downloaded dataset exactly matches the remote source.
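For reference, a minimal sketch of the kind of check meant here. `prepare_destination` is a hypothetical name, and raising the built-in `FileExistsError`/`NotADirectoryError` is an assumption; daget may report these cases differently:

```python
import os


def prepare_destination(destination):
    # Resolve symlinks and relative parts so all checks use one canonical path.
    path = os.path.realpath(destination)
    if os.path.exists(path):
        if not os.path.isdir(path):
            raise NotADirectoryError(f"{path} exists and is not a directory")
        # Refuse to download into a non-empty directory so the result
        # mirrors the remote dataset exactly, with no leftover files.
        if os.listdir(path):
            raise FileExistsError(f"{path} is not empty")
    else:
        os.makedirs(path)
    return path
```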
def __init__(self, message, url, supported_urls=None, http_response_code=None):
    super().__init__(message)
    self.url = url
    self.supported_urls = supported_urls or ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"]
The hardcoded list of repository URLs should be removed.
daget should try to get a file list via the schema.org distribution (if it's not Figshare or Zenodo), and if this fails it should raise the error instead. Keeping a list of all supported URLs in the source code is not a sustainable solution.
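A rough sketch of the schema.org fallback. It assumes the landing page embeds JSON-LD (`<script type="application/ld+json">`) describing a `Dataset` whose `distribution` entries are `DataDownload` objects with `contentUrl`. `files_from_schema_org` is a hypothetical name, and `@graph`-style documents are not handled here:

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    # Collects the text of every <script type="application/ld+json"> element.
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def files_from_schema_org(html):
    """Return (name, contentUrl) pairs from schema.org distribution entries."""
    parser = _JsonLdCollector()
    parser.feed(html)
    parser.close()
    files = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD instead of failing outright
        distributions = data.get("distribution", []) if isinstance(data, dict) else []
        if isinstance(distributions, dict):
            distributions = [distributions]  # a single distribution may not be a list
        for dist in distributions:
            if isinstance(dist, dict) and dist.get("contentUrl"):
                files.append((dist.get("name"), dist["contentUrl"]))
    return files
```

If this returns an empty list (and the URL is not Figshare or Zenodo), that is the point where daget would raise its error.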
def get_redirect_url(url):
    # if url provided is a shorthand doi (TODO: check with regex)
    if not url.startswith(('http://', 'https://')):
    if not re.match(r'^https?://', url):
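For the regex TODO, a possible sketch. It uses a loose, assumed DOI pattern (`10.` + registrant digits + `/` + suffix, optionally prefixed with `doi:`) and resolves shorthand DOIs through `https://doi.org/`. `normalize_url` and the `ValueError` are hypothetical choices, not existing daget code:

```python
import re

# Loose DOI pattern: "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def normalize_url(url):
    # Full http(s) URLs are passed through unchanged.
    if re.match(r"^https?://", url):
        return url
    match = DOI_PATTERN.match(url.strip())
    if match:
        # Shorthand DOIs are resolved through the doi.org proxy.
        return f"https://doi.org/{match.group(1)}"
    raise ValueError(f"{url} is neither an http(s) URL nor a DOI")
```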
No description provided.