
[BUG] Failed to read a hfile log block if the key is very long #18450

@Joy-2000

Bug Description

What happened:

If the partition path is very long, an exception occurs while reading the HFile log block:

Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition /var/folders/m1/dqq83wwx42l29ckt3tvr8lgc0000gp/T/junit-17249189237798564258/dataset/2016qwertyuiop045619dd-11b3-4395-84d2-217f95d1f20b/03asdfghjkl93c6b2d0-4922-48cb-ab0e-785c807fbf2e/15zxcvbnmb91d2213-6581-44d1-aa5b-7cd842976d0d from metadata
	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:144)
	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFilesInPartition(AbstractTableFileSystemView.java:452)
	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$18(AbstractTableFileSystemView.java:480)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:469)
	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestFileSlicesBeforeOrOn(AbstractTableFileSystemView.java:1006)
	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.lambda$getLatestFileSlicesBeforeOrOn$65db6450$1(PriorityBasedFileSystemView.java:233)
	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:127)
	at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlicesBeforeOrOn(PriorityBasedFileSystemView.java:232)
	at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFileCandidates(SparkUpsertDeltaCommitPartitioner.java:95)
	at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFiles(SparkUpsertDeltaCommitPartitioner.java:67)
	at org.apache.hudi.table.action.commit.UpsertPartitioner.lambda$getSmallFilesForPartitions$6e0f90ba$1(UpsertPartitioner.java:287)
	at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$mapToPair$786cea6a$1(HoodieSparkEngineContext.java:177)
	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$pairFunToScalaFun$1(JavaPairRDD.scala:1073)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1022)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading log file 
	at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternal(BaseHoodieLogRecordReader.java:397)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.performScan(HoodieMergedLogRecordReader.java:100)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.<init>(HoodieMergedLogRecordReader.java:75)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.<init>(HoodieMergedLogRecordReader.java:55)
	at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader$Builder.build(HoodieMergedLogRecordReader.java:270)
	at org.apache.hudi.common.table.read.buffer.LogScanningRecordBufferLoader.scanLogFiles(LogScanningRecordBufferLoader.java:52)
	at org.apache.hudi.common.table.read.buffer.DefaultFileGroupRecordBufferLoader.getRecordBuffer(DefaultFileGroupRecordBufferLoader.java:81)
	at org.apache.hudi.common.table.read.HoodieFileGroupReader.initRecordIterators(HoodieFileGroupReader.java:134)
	at org.apache.hudi.common.table.read.HoodieFileGroupReader.getBufferedRecordIterator(HoodieFileGroupReader.java:292)
	at org.apache.hudi.common.table.read.HoodieFileGroupReader.getClosableIterator(HoodieFileGroupReader.java:301)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceWithFilter(HoodieBackedTableMetadata.java:620)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceAndFilterByKeysIntoList(HoodieBackedTableMetadata.java:690)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupRecordsItr(HoodieBackedTableMetadata.java:668)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceAndFilterByKeysIntoList(HoodieBackedTableMetadata.java:646)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupIndexRecords(HoodieBackedTableMetadata.java:297)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecords(HoodieBackedTableMetadata.java:540)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecordsWithKeys(HoodieBackedTableMetadata.java:501)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecordsWithKeys(HoodieBackedTableMetadata.java:495)
	at org.apache.hudi.metadata.HoodieBackedTableMetadata.readFilesIndexRecords(HoodieBackedTableMetadata.java:186)
	at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:296)
	at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:142)
	... 41 more
Caused by: java.lang.NegativeArraySizeException: -110
	at org.apache.hudi.io.util.IOUtils.copy(IOUtils.java:160)
	at org.apache.hudi.io.hfile.HFileRootIndexBlock.readBlockIndexEntry(HFileRootIndexBlock.java:95)
	at org.apache.hudi.io.hfile.HFileRootIndexBlock.readBlockIndex(HFileRootIndexBlock.java:67)
	at org.apache.hudi.io.hfile.HFileReaderImpl.readDataBlockIndex(HFileReaderImpl.java:351)
	at org.apache.hudi.io.hfile.HFileReaderImpl.initializeMetadata(HFileReaderImpl.java:84)
	at org.apache.hudi.io.hfile.HFileReaderImpl.seekTo(HFileReaderImpl.java:204)
	at org.apache.hudi.io.storage.HoodieNativeAvroHFileReader$RecordByKeyIterator.<init>(HoodieNativeAvroHFileReader.java:423)
	at org.apache.hudi.io.storage.HoodieNativeAvroHFileReader.getEngineRecordsByKeysIterator(HoodieNativeAvroHFileReader.java:225)
	at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupEngineRecords(HoodieHFileDataBlock.java:203)
	at org.apache.hudi.common.table.log.block.HoodieDataBlock.getEngineRecordIterator(HoodieDataBlock.java:266)
	at org.apache.hudi.common.table.read.buffer.FileGroupRecordBuffer.getRecordsIterator(FileGroupRecordBuffer.java:198)
	at org.apache.hudi.common.table.read.buffer.KeyBasedFileGroupRecordBuffer.processDataBlock(KeyBasedFileGroupRecordBuffer.java:75)
	at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:437)
	at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternal(BaseHoodieLogRecordReader.java:385)
	... 61 more
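The -110 in the NegativeArraySizeException above is consistent with a key-length field that overflows a signed read: a 146-byte key length stored in a single byte sign-extends back to -110 when read as a signed value. The sketch below is a hypothetical illustration of that failure mode, not Hudi's actual HFileRootIndexBlock decoding logic (the real reader may use variable-length ints); the class and method names are invented for the example.

```java
// Hypothetical sketch (NOT Hudi's actual code): demonstrates how reading a
// one-byte key length as a signed value breaks for keys longer than 127 bytes.
public class SignedLengthDemo {

    // Buggy pattern: the stored byte is sign-extended to int,
    // so a length of 146 comes back as -110.
    static int readLengthSigned(byte stored) {
        return stored;
    }

    // Correct pattern: mask with 0xFF to recover the unsigned value.
    static int readLengthUnsigned(byte stored) {
        return stored & 0xFF;
    }

    public static void main(String[] args) {
        byte stored = (byte) 146; // a 146-byte key length stored in one byte
        int bad = readLengthSigned(stored);
        System.out.println("signed read: " + bad); // prints -110

        try {
            byte[] key = new byte[bad]; // allocating with a negative size...
        } catch (NegativeArraySizeException e) {
            // ...fails the same way the stack trace does
            System.out.println("caught NegativeArraySizeException");
        }

        System.out.println("unsigned read: " + readLengthUnsigned(stored)); // prints 146
    }
}
```

The fix in such cases is to treat length fields as unsigned (or use a wider type) wherever keys can exceed 127 bytes, which long partition paths easily do.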

What you expected:

No exceptions happen.

Steps to reproduce:

  1. Modify the partition paths in HoodieTestDataGenerator:

     public static final String DEFAULT_FIRST_PARTITION_PATH = "2016qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/15zxcvbnm" + UUID.randomUUID();
     public static final String DEFAULT_SECOND_PARTITION_PATH = "2015qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/16zxcvbnm" + UUID.randomUUID();
     public static final String DEFAULT_THIRD_PARTITION_PATH = "2015qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/17zxcvbnm" + UUID.randomUUID();

  2. Run the test testIncrementalQueryMORWithCompactionAndClean in TestMORDataSource.

Environment

Hudi version: 1.1.1
Query engine: Spark
Relevant configs:

Logs and Stack Trace

No response

Metadata

Labels: type:bug (Bug reports and fixes)