chore: add timing logs for file index partition and file listing #18417

Open

suryaprasanna wants to merge 2 commits into apache:master from suryaprasanna:file-index-logging

Conversation

@suryaprasanna (Contributor)

Describe the issue this Pull Request addresses

File index operations currently do not provide enough visibility into how much time is spent listing partitions, fetching files from metadata, and filtering files into file slices. This makes it harder to debug slow queries and identify where file-index time is being spent.

Summary and Changelog

Adds timing and context logs in BaseHoodieTableFileIndex for key file-index stages.

  • Log time taken to list partition paths with and without partition predicates
  • Log cache miss counts before fetching uncached partition files
  • Log time taken by getAllFilesInPartitions
  • Log time taken by filterFiles while building file slices
  • Include table name in partition listing failure messages

Impact

No public API or user-facing behavior change. This improves observability for file-index execution and helps diagnose performance issues in partition and file listing paths.

Risk Level

low

This change only adds logging and slightly updates an internal exception message with table context. No functional behavior is intended to change.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Mar 29, 2026
@voonhous (Member) left a comment:

Main concern here is that the logs are too noisy.

  • Are these logs only used for debugging?
  • Should we use log.debug(...) for routine timing logs instead of log.info(...)?
  • We can consider logging at info level when something is slow, i.e. if (elapsed > threshold) log.info(...) or unexpected (e.g., large cache miss ratio)
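The threshold idea above can be sketched as follows. This is illustrative only: the threshold value, logger setup, and method names are assumptions, not code from the PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the reviewer's suggestion: routine timings go to debug,
// and only unexpectedly slow operations are promoted to info.
public class ThresholdTimingLog {
  private static final Logger LOG = Logger.getLogger(ThresholdTimingLog.class.getName());
  private static final long SLOW_THRESHOLD_MS = 1_000; // hypothetical cutoff

  // Decide which level a given elapsed time should be logged at.
  static String classify(long elapsedMs) {
    return elapsedMs > SLOW_THRESHOLD_MS ? "INFO" : "DEBUG";
  }

  public static void main(String[] args) {
    long elapsedMs = 1_500; // pretend the partition listing took 1.5s
    if (elapsedMs > SLOW_THRESHOLD_MS) {
      LOG.info("Slow partition listing took " + elapsedMs + "ms");
    } else if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Partition listing took " + elapsedMs + "ms");
    }
    System.out.println(classify(250));
    System.out.println(classify(1500));
  }
}
```

The same shape works for the cache-miss ratio case: compute the ratio first, then pick the level based on whether it exceeds an agreed threshold.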

.collect(Collectors.toList())
));
} finally {
log.info("On {} with query instant as {}, it took {}ms to filter {} files into file slices across {} partitions",
@voonhous (Member):

My main concern here is that the logs generated here will be too noisy.

This fires on every single query, even if it took 1ms. For a busy Spark application hitting multiple Hudi tables, this alone generates one info line per query.

@suryaprasanna (Contributor, author):

Moved to debug.

Comment on lines +504 to +505
log.info("On {}, out of {} partition paths, {} are missing from cache. Loading them.",
metaClient.getTableConfig().getTableName(), partitionPaths.size(), missingPartitionPaths.size());
@voonhous (Member):

Guard the "missing from cache" log with if (missingPartitionPaths.size() > 0); there is no point logging when everything is cached.
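A minimal sketch of the suggested guard. The helper and variable names here are hypothetical stand-ins for the fields used in BaseHoodieTableFileIndex:

```java
import java.util.Arrays;
import java.util.List;

// Only produce the "missing from cache" message when there is
// actually something missing; return null when fully cached.
public class CacheMissLogGuard {
  static String cacheMissMessage(String tableName, int total, List<String> missing) {
    if (missing.isEmpty()) {
      return null; // fully cached: skip the log line entirely
    }
    return String.format("On %s, out of %d partition paths, %d are missing from cache. Loading them.",
        tableName, total, missing.size());
  }

  public static void main(String[] args) {
    System.out.println(cacheMissMessage("trips", 10, Arrays.asList("2026/01/01")));
    System.out.println(cacheMissMessage("trips", 10, List.of()));
  }
}
```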

@suryaprasanna (Contributor, author):

Added a condition to handle this; please review it again.

@codecov-commenter:

Codecov Report

❌ Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.25%. Comparing base (1ff0506) to head (d846af1).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...java/org/apache/hudi/BaseHoodieTableFileIndex.java 78.26% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18417      +/-   ##
============================================
- Coverage     68.26%   68.25%   -0.01%     
- Complexity    27761    27769       +8     
============================================
  Files          2439     2439              
  Lines        134355   134374      +19     
  Branches      16210    16209       -1     
============================================
+ Hits          91715    91722       +7     
- Misses        35543    35554      +11     
- Partials       7097     7098       +1     
Flag Coverage Δ
common-and-other-modules 44.36% <65.21%> (+<0.01%) ⬆️
hadoop-mr-java-client 44.98% <65.21%> (-0.01%) ⬇️
spark-client-hadoop-common 48.41% <65.21%> (+<0.01%) ⬆️
spark-java-tests 48.78% <78.26%> (+<0.01%) ⬆️
spark-scala-tests 45.29% <78.26%> (+<0.01%) ⬆️
utilities 38.38% <65.21%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...java/org/apache/hudi/BaseHoodieTableFileIndex.java 83.12% <78.26%> (-0.37%) ⬇️

... and 10 files with indirect coverage changes


@hudi-bot (Collaborator):

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

return result;
} catch (IOException e) {
throw new HoodieIOException("Failed to list partition paths", e);
throw new HoodieIOException("On " + metaClient.getTableConfig().getTableName() + " Failed to list partition paths", e);
@voonhous (Member):

Nit, suggested change:

throw new HoodieIOException("On " + metaClient.getTableConfig().getTableName() + " Failed to list partition paths", e);

matchedPartitionPaths = tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
partitionFields, partitionColumnPredicates);
} catch (IOException e) {
throw new HoodieIOException("Error fetching partition paths", e);
@voonhous (Member):

Nit: Inconsistent error message enrichment; the debug log has the table name, so we should include it here too for better debuggability (even though the exception might contain some clues, such as the partition path).

Ideally, let's use the same exception message as in listPartitionPathFiles.
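One way to keep both messages consistent is to build them through a single small helper. The helper name and message wording below are assumptions for illustration, not Hudi APIs:

```java
// Prefix any failure message with the table name so every exception
// thrown from the file index carries the same context.
public class TableContextMessage {
  static String withTableContext(String tableName, String message) {
    return "On " + tableName + ", " + message;
  }

  public static void main(String[] args) {
    // Both catch blocks would then throw HoodieIOException with the same shape:
    System.out.println(withTableContext("trips", "failed to list partition paths"));
    System.out.println(withTableContext("trips", "error fetching partition paths"));
  }
}
```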

} catch (IOException e) {
throw new HoodieIOException("Error fetching partition paths", e);
} finally {
log.debug("On {}, it took {} ms to list partition paths with {} relativePartitionPaths and partition predicates",
@voonhous (Member):

Did you mean to log partitionColumnPredicates here too, after "and partition predicates"?

Comment on lines 393 to 396
if (queryType == HoodieTableQueryType.INCREMENTAL && incrementalQueryStartTime.isPresent() && !isBeforeTimelineStarts()) {
HoodieTimeline timelineToQuery = findInstantsInRange();
matchedPartitionPaths = TimelineUtils.getWrittenPartitions(timelineToQuery);
} else {
@voonhous (Member):

Is this intentional? Only the getPartitionPathWithPathPrefixes branch is timed.

The TimelineUtils.getWrittenPartitions(timelineToQuery) branch (incremental queries) gets no timing.

IIUC, incremental-query partition listing is not likely to be slow?

@yihua (Contributor):

🤖 The asymmetry is reasonable IMO — TimelineUtils.getWrittenPartitions reads from the already-loaded in-memory timeline, so it's typically O(commits) with no filesystem I/O, while getPartitionPathWithPathPrefixes does actual path listing against the filesystem or metadata table. That said, for very large timelines (many commits, long retention) it could be non-trivial, so adding timing there for completeness wouldn't hurt — but I wouldn't call it a blocker.

Comment on lines +494 to +499
if (missingPartitionPaths.isEmpty()) {
return cachedPartitionPaths.values().stream()
.flatMap(Collection::stream)
.collect(Collectors.toList());
}

@voonhous (Member):

Early return is a behavioral change; this should be called out.

Good optimization, but the PR description says "No functional behavior is intended to change".

Please update the description accordingly.

throw new HoodieIOException("On " + metaClient.getTableConfig().getTableName() + " Failed to list partition paths", e);
} finally {
log.debug("On {}, it took {} ms to fetch files for {} uncached partitions via getAllFilesInPartitions",
metaClient.getTableConfig().getTableName(), timer.endTimer(), missingPartitionPaths.size());
@voonhous (Member):

Nit (feel free to ignore): the timer includes close time here. I assume closing is cheap, so this doesn't matter.

@yihua (Contributor) left a comment:

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for contributing! The approach is solid and the observability improvements are useful. CI is passing and the change is low-risk. A couple of small observations in the inline comments — the main open threads from @voonhous around the getPartitionPathWithPathPrefixes branch not being timed and the missing table name in the listPartitionPaths exception message seem worth addressing before merge.

.collect(Collectors.toList())
));
} finally {
log.debug("On {} with query instant as {}, it took {}ms to filter {} files into file slices across {} partitions",
@yihua (Contributor):

🤖 The finally here has no corresponding catch, so this log fires with the same wording whether the filter succeeded or threw an exception. During debugging, seeing "it took Xms to filter Y files into file slices" right before a stack trace could be confusing. Would it be worth omitting the log (or using a different message) on the exception path, similar to how you guard the cache-miss log?
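A sketch of one way to make a finally-block timing log unambiguous: track whether the body completed, and word the message accordingly. All names here are illustrative, not from the PR:

```java
// Demonstrates a finally block whose log line distinguishes the
// success path from the exception path, so a timing message never
// implies a failed operation completed.
public class OutcomeAwareTiming {
  static String timingMessage(boolean succeeded, long elapsedMs, int files) {
    return succeeded
        ? String.format("Filtered %d files into file slices in %dms", files, elapsedMs)
        : String.format("Spent %dms filtering %d files before failing", elapsedMs, files);
  }

  static String runFiltering(boolean throwError) {
    boolean succeeded = false;
    long start = System.nanoTime();
    try {
      if (throwError) {
        throw new RuntimeException("boom"); // simulate a filtering failure
      }
      succeeded = true;
      return "ok";
    } catch (RuntimeException e) {
      return "failed";
    } finally {
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println(timingMessage(succeeded, elapsedMs, 42));
    }
  }

  public static void main(String[] args) {
    System.out.println(runFiltering(false));
    System.out.println(runFiltering(true));
  }
}
```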

@@ -502,7 +527,10 @@

@yihua (Contributor):

🤖 Same pattern here — on the IOException path the finally logs "it took Xms to fetch files for N uncached partitions via getAllFilesInPartitions", which implies the fetch completed. Since the exception message is also logged by the caller, the timing line could mislead someone into thinking the fetch succeeded. Could this message be written in a way that's unambiguous on both success and failure paths, e.g. "...time spent in getAllFilesInPartitions attempt..."?
