Some libraries, such as polars and pandas, offer an almost seamless way to work with cloud storage paths. For example:
import polars as pl
pl.scan_csv('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'}).collect()
This is nice because I don't need to import any other libraries or set up credentials, blob clients, etc.
It automatically finds any available credentials in my local environment, presumably with something like DefaultAzureCredential.
This means that when testing locally, I just need to be authenticated with Azure CLI, and everything just works.
I don't even need to manually specify environment variables.
It also means that I can deploy the same code to the server, where it will automatically pick up the appropriate environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.) and authenticate as a service principal.
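To make the local-versus-server behaviour concrete, here is a rough sketch of the first decision in that lookup: are the service-principal environment variables present or not? This is a simplification and not how azure-identity is implemented; the real DefaultAzureCredential chain tries several more sources (managed identity, Azure CLI, and others). The name `guess_credential_source` is made up for illustration.

```python
import os
from typing import Mapping, Optional

# Variables read by azure-identity's EnvironmentCredential for a
# service principal that authenticates with a client secret.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def guess_credential_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: rough guess at where credentials would come from."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in SERVICE_PRINCIPAL_VARS):
        return "environment (service principal)"
    # Nothing usable in the environment, so a later link in the chain
    # (for example an `az login` session) would have to supply the token.
    return "fallback (e.g. Azure CLI login)"
```

Calling `guess_credential_source()` on a laptop with only `az login` done would report the fallback, while the same call on a server with the three variables set would report the environment.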
I may have missed something, but it seems that cloudpathlib has not enabled this kind of automatic credential detection with DefaultAzureCredential. Instead, I need to do the following to get an authenticated working CloudPath:
from azure.identity import DefaultAzureCredential
from cloudpathlib import CloudPath, AzureBlobClient
credential = DefaultAzureCredential()
client = AzureBlobClient(account_url="https://mystorageaccount.blob.core.windows.net", credential=credential)
path = CloudPath('az://container/path/to/file.csv', client=client)
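In the meantime, the boilerplate above can be wrapped once and registered as the default client, so that later `CloudPath('az://...')` calls don't need `client=`. This is only a sketch of a workaround: the helper names are hypothetical, and the account URL assumes the public-cloud `blob.core.windows.net` endpoint. `set_as_default_client()` is an existing cloudpathlib client method.

```python
def account_url_for(account_name: str) -> str:
    """Build the blob endpoint URL (assumes the Azure public cloud)."""
    return f"https://{account_name}.blob.core.windows.net"


def default_credential_client(account_name: str):
    """Hypothetical helper: an AzureBlobClient backed by DefaultAzureCredential."""
    # Imported lazily so this module loads even without the Azure extras.
    from azure.identity import DefaultAzureCredential
    from cloudpathlib import AzureBlobClient

    return AzureBlobClient(
        account_url=account_url_for(account_name),
        credential=DefaultAzureCredential(),
    )


def use_default_credential(account_name: str) -> None:
    """Register the client so later CloudPath('az://...') calls pick it up."""
    default_credential_client(account_name).set_as_default_client()
```

After a single `use_default_credential("mystorageaccount")` at startup, `CloudPath('az://container/path/to/file.csv')` should resolve against that account. The catch is that a default client covers only one storage account at a time, so code that touches several accounts still has to pass `client=` explicitly.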
Ideally, this setup would happen automatically.
I'm imagining the following future state:
from cloudpathlib import CloudPath
path = CloudPath('az://container/path/to/file.csv', storage_options={'account_name': 'mystorageaccount'})
(There may be a nicer way to specify the account name. I'm just copying the API from polars and pandas here. I kind of wish that it was standard to include the account name in the path somehow, as passing the account name in separately feels clunky to me. It would be nice if we could use az://mystorageaccount/container/...)
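As a sketch of what the account-in-the-path idea could look like, the function below splits a hypothetical `az://<account>/<container>/<blob>` URL into an account URL and the `az://<container>/<blob>` form that cloudpathlib accepts today. Neither the URL layout nor the names `AzureLocation` and `split_account_url` exist in cloudpathlib; they are assumptions for illustration, and the endpoint assumes the Azure public cloud.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AzureLocation(NamedTuple):
    account_url: str  # e.g. https://<account>.blob.core.windows.net
    cloud_path: str   # e.g. az://<container>/<blob>


def split_account_url(url: str) -> AzureLocation:
    """Split a hypothetical az://<account>/<container>/<blob> URL."""
    parts = urlsplit(url)
    if parts.scheme != "az" or not parts.netloc:
        raise ValueError(f"expected az://<account>/<container>/..., got {url!r}")
    container_and_blob = parts.path.lstrip("/")
    if not container_and_blob:
        raise ValueError(f"no container in {url!r}")
    return AzureLocation(
        account_url=f"https://{parts.netloc}.blob.core.windows.net",
        cloud_path=f"az://{container_and_blob}",
    )
```

One design problem with this layout is ambiguity: an existing `az://container/path` URL is indistinguishable from the account-first form, so it would probably need a separate scheme or an explicit opt-in rather than a silent change in meaning.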
See the documentation for DefaultAzureCredential. (There's a reason it's called Default!):
Note: If you are using fsspec + adlfs, adlfs requires the storage option anon=False to be set to enable DefaultAzureCredential.
For example, when using pandas, you must specify storage_options={'anon': False}.
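A minimal sketch of the pandas case, assuming pandas, fsspec and adlfs are all installed. The helper names are hypothetical; the only point is that `anon=False` travels in `storage_options` alongside the account name.

```python
def adlfs_storage_options(account_name: str) -> dict:
    """Hypothetical helper: options that opt in to DefaultAzureCredential."""
    # anon=False tells adlfs not to attempt anonymous access.
    return {"account_name": account_name, "anon": False}


def read_blob_csv(url: str, account_name: str):
    """Read a CSV from Azure Blob Storage through pandas + fsspec + adlfs."""
    import pandas as pd  # imported lazily; fsspec and adlfs must be installed too

    return pd.read_csv(url, storage_options=adlfs_storage_options(account_name))
```

For example, `read_blob_csv('az://container/path/to/file.csv', 'mystorageaccount')` would pass `{'account_name': 'mystorageaccount', 'anon': False}` through to adlfs.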
When using fsspec directly, you need to pass it as follows:
import fsspec  # the 'az' protocol is provided by the adlfs package
fs = fsspec.filesystem('az', account_name='mystorageaccount', anon=False)
For more details, see:
https://github.com/fsspec/adlfs#setting-credentials