Skip to content

Conversation

@pvillard31
Copy link
Contributor

Summary

NIFI-15386 - Upgrade Box SDK to Version 10

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@pvillard31
Copy link
Contributor Author

@awelless @ncover21 - I would highly appreciate your eyes on this!

Copy link
Contributor

@awelless awelless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review.
The changes marked as "nitpicking" are mostly stylistic and subjective. Feel free to ignore them.
The rest of the comments are worth of attention, as they may affect the processor behavior.

I reviewed the controller services classes. Also I reviewed Events, Metadata Instance, and Extract Metadata processors and their tests.
I'll review the rest during the next review round.

ProxyServiceMigration.renameProxyConfigurationServiceProperty(config);
// Remove timeout properties that are no longer supported in Box SDK 10.x
config.removeProperty("Connect Timeout");
config.removeProperty("Read Timeout");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to still configure these fields.

        final OkHttpClient okHttpClient = BoxNetworkClient.getDefaultOkHttpClientBuilder()
                .connectTimeout(context.getProperty(CONNECT_TIMEOUT).asDuration())
                .readTimeout(context.getProperty(READ_TIMEOUT).asDuration())
                .build();
        final NetworkClient networkClient = new BoxNetworkClient(okHttpClient);
        final NetworkSession networkSession = new NetworkSession.Builder()
                .networkClient(networkClient)
                .build();
        return new BoxClient.Builder(new BoxJWTAuth(...))
                .networkSession(networkSession)
                .build();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option to consider, although a more radical, is to use NiFi web-client instead of okhttp.
NetworkClient provides a concise interface which we can use as an adapter to a web-client.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the use of the NiFi web client, I'd consider this as a follow up improvement. I think this does make sense but I don't think this should be part of this initial effort.

Copy link
Contributor

@awelless awelless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 of review.
I reviewed all source files, but not the tests.

Comment on lines 157 to 159
} catch (Exception e) {
throw new ProcessException("Failed to download file from Box", e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the SDK errors thrown by fetchFile would've been propagated to the caller and handled in onTrigger.
Since everything is wrapped into a ProcessException, so the handling of BoxAPIError in onTrigger won't take place. Let's catch only IOException here, passing the rest up the call chain

}

final GetFileByIdQueryParams queryParams = new GetFileByIdQueryParams.Builder()
.fields(List.of("id", "name", "size", "modified_at", "path_collection"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking. These fields can be moved to a constant.

"owned_by", "parent", "etag", "sha1", "item_status", "sequence_id", "path_collection",
"content_created_at", "content_modified_at", "trashed_at", "purged_at", "shared_link");
final GetFileByIdQueryParams queryParams = new GetFileByIdQueryParams.Builder()
.fields(List.of("name", "description", "size", "created_at", "modified_at",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking. A constant for the parameter list.

Comment on lines 177 to 178
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a method in the SDK to download the representation, either.
I have some concerns about using java's HttpClient here. Any configuration of BoxClientService is ignored.

I recommend considering the usage of NetworkSession and NetworkClient from boxClient and see how it's done in boxClient.getDownloads().downloadFile() method. This is going to be more verbose, but will use an already configured Box connection.
If we decide to proceed with another client, will a dedicated WebClient be a better option?
If we decide to keep using java HttpClient, shall we create it only once in @OnScheduled?

final Set<String> allowedRoles,
final Set<String> allowedStatuses,
final Map<String, List<String>> attributeValues) {
int count = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking. No need to keep count, as collaborations.getEntries() returns a list, so size() can be used.

final Items items = boxClient.getFolders().getFolderItems(parentFolderId, queryParams);

if (items.getEntries() != null) {
for (Object itemObj : items.getEntries()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items.getEntries() returns a list of Item. Item is OneOfThree<FileFull, FolderFull, WebLink>.
So:

  1. We should use OneOfThree api instead of instanceof checks.
  2. We can operate on FolderFull instead of folder ids. This will allow us to get FolderFull in onTrigger (line 232), so fetching a folder info on line 233 would be redundant.

Comment on lines 232 to 233
final String parentFolderId = getOrCreateDirectParentFolder(context, flowFile);
final FolderFull parentFolderInfo = getFolderInfo(parentFolderId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching a full folder representation seems redundant. See my comment in the getFolderIdByName method.

Comment on lines 309 to 310
if (existingFileId.isPresent()) {
final String fileId = existingFileId.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking. Optional.isPresent() - get() can be replaced with a more type-safe and idiomatic handling. Otherwise it doesn't differ from a plain null check.

        return getFileIdByName(filename, parentFolderId)
                .flatMap(fileId -> {
                    final UploadFileVersionRequestBodyAttributesField attributes = new UploadFileVersionRequestBodyAttributesField(filename);
                    final UploadFileVersionRequestBody requestBody = new UploadFileVersionRequestBody(attributes, inputStream);
                    final Files files = boxClient.getUploads().uploadFileVersion(fileId, requestBody);

                    return files.getEntries() != null && !files.getEntries().isEmpty()
                            ? Optional.of(files.getEntries().getFirst())
                            : Optional.empty();
                });

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plain null check is also fine, but then there is little value in wrapping the result of getFileIdByName in Optional.

Comment on lines 316 to 318
if (files.getEntries() != null && !files.getEntries().isEmpty()) {
return files.getEntries().get(0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we throw an exception if file upload succeeds, but no file metadata is returned?
How shall the caller handle absent FileFull?

Comment on lines 329 to 332
if (files.getEntries() != null && !files.getEntries().isEmpty()) {
return files.getEntries().get(0);
}
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we throw an exception if file upload succeeds, but no file metadata is returned?
How shall the caller handle absent FileFull?

@pvillard31
Copy link
Contributor Author

Thanks @awelless - I pushed another commit to address your comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants