The is_dir check is fairly expensive. However, at least for S3 and Azure, when entries are created by the client's _list_dir method, the listing response already tells you whether each entry is a directory or a file, so the result can be set immediately on the created CloudPath instance.
For example, for S3Client._list_dir, you could write something like:
    paginator = self.client.get_paginator("list_objects_v2")
    for result in paginator.paginate(
        Bucket=cloud_path.bucket, Prefix=prefix, Delimiter="/", MaxKeys=1000
    ):
        # sub directory names
        for result_prefix in result.get("CommonPrefixes", []):
            path = S3Path(f"s3://{cloud_path.bucket}/{result_prefix.get('Prefix')}")
            path._is_dir = True
            yield path
        # files in the directory
        for result_key in result.get("Contents", []):
            path = S3Path(f"s3://{cloud_path.bucket}/{result_key.get('Key')}")
            path._is_dir = False
            yield path
and modify S3Path.is_dir:
    def is_dir(self) -> bool:
        if self._is_dir is None:
            self._is_dir = self.client._is_file_or_dir(self) == "dir"
        return self._is_dir
This makes a HUGE performance difference if you need to call is_dir on the entries returned from iterdir or glob (in my case, when implementing a file dialog that works for cloud paths).
Not sure if this particular implementation is the best way to do this, but something like this is needed.
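To make the effect concrete, here is a minimal, self-contained sketch of the same caching idea that runs without any cloud access. FakeClient, FakePath, the lookups counter, and the canned pages are all hypothetical stand-ins invented for illustration; they are not part of the library. The page dictionaries only mimic the "CommonPrefixes" and "Contents" keys of a list_objects_v2 response.

```python
from typing import Iterator, Optional


class FakeClient:
    """Hypothetical stand-in for a cloud client; counts expensive lookups."""

    def __init__(self, pages):
        self.pages = pages  # canned list_objects_v2-style response pages
        self.lookups = 0    # number of simulated network round trips

    def _is_file_or_dir(self, path: "FakePath") -> str:
        # Simulates the expensive per-path request that is_dir normally needs.
        self.lookups += 1
        return "dir" if path.key.endswith("/") else "file"

    def _list_dir(self, bucket: str) -> Iterator["FakePath"]:
        for page in self.pages:
            # Prefixes are sub directories: the answer is known from the listing.
            for entry in page.get("CommonPrefixes", []):
                yield FakePath(self, bucket, entry["Prefix"], is_dir=True)
            # Keys are files: the answer is known here as well.
            for entry in page.get("Contents", []):
                yield FakePath(self, bucket, entry["Key"], is_dir=False)


class FakePath:
    """Hypothetical path object that caches its is_dir result."""

    def __init__(self, client, bucket, key, is_dir: Optional[bool] = None):
        self.client = client
        self.bucket = bucket
        self.key = key
        self._is_dir = is_dir  # None means "not known yet"

    def is_dir(self) -> bool:
        # Only fall back to the expensive lookup when nothing is cached.
        if self._is_dir is None:
            self._is_dir = self.client._is_file_or_dir(self) == "dir"
        return self._is_dir


pages = [
    {
        "CommonPrefixes": [{"Prefix": "data/raw/"}],
        "Contents": [{"Key": "data/readme.txt"}],
    }
]
client = FakeClient(pages)

# Entries produced by the listing answer is_dir without any lookup.
listed = list(client._list_dir("my-bucket"))
flags = [p.is_dir() for p in listed]
lookups_after_listing = client.lookups

# A path built directly pays for one lookup, then reuses the cached value.
fresh = FakePath(client, "my-bucket", "data/raw/")
fresh.is_dir()
fresh.is_dir()
```

In this sketch the listed entries never trigger a lookup, while a directly constructed path triggers exactly one. One trade-off of caching this way is that the stored value can go stale if the object is deleted or replaced after it was listed.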