[BUG] Failed to read an HFile log block if the key is very long #18450
Open
Labels
type:bug (Bug reports and fixes)
Bug Description
What happened:
If the partition path is very long, an exception occurs while reading the HFile log block:
Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition /var/folders/m1/dqq83wwx42l29ckt3tvr8lgc0000gp/T/junit-17249189237798564258/dataset/2016qwertyuiop045619dd-11b3-4395-84d2-217f95d1f20b/03asdfghjkl93c6b2d0-4922-48cb-ab0e-785c807fbf2e/15zxcvbnmb91d2213-6581-44d1-aa5b-7cd842976d0d from metadata
at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:144)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getAllFilesInPartition(AbstractTableFileSystemView.java:452)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$18(AbstractTableFileSystemView.java:480)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:469)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestFileSlicesBeforeOrOn(AbstractTableFileSystemView.java:1006)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.lambda$getLatestFileSlicesBeforeOrOn$65db6450$1(PriorityBasedFileSystemView.java:233)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:127)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getLatestFileSlicesBeforeOrOn(PriorityBasedFileSystemView.java:232)
at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFileCandidates(SparkUpsertDeltaCommitPartitioner.java:95)
at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitPartitioner.getSmallFiles(SparkUpsertDeltaCommitPartitioner.java:67)
at org.apache.hudi.table.action.commit.UpsertPartitioner.lambda$getSmallFilesForPartitions$6e0f90ba$1(UpsertPartitioner.java:287)
at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$mapToPair$786cea6a$1(HoodieSparkEngineContext.java:177)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$pairFunToScalaFun$1(JavaPairRDD.scala:1073)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1022)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading log file
at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternal(BaseHoodieLogRecordReader.java:397)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.performScan(HoodieMergedLogRecordReader.java:100)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.<init>(HoodieMergedLogRecordReader.java:75)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader.<init>(HoodieMergedLogRecordReader.java:55)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordReader$Builder.build(HoodieMergedLogRecordReader.java:270)
at org.apache.hudi.common.table.read.buffer.LogScanningRecordBufferLoader.scanLogFiles(LogScanningRecordBufferLoader.java:52)
at org.apache.hudi.common.table.read.buffer.DefaultFileGroupRecordBufferLoader.getRecordBuffer(DefaultFileGroupRecordBufferLoader.java:81)
at org.apache.hudi.common.table.read.HoodieFileGroupReader.initRecordIterators(HoodieFileGroupReader.java:134)
at org.apache.hudi.common.table.read.HoodieFileGroupReader.getBufferedRecordIterator(HoodieFileGroupReader.java:292)
at org.apache.hudi.common.table.read.HoodieFileGroupReader.getClosableIterator(HoodieFileGroupReader.java:301)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceWithFilter(HoodieBackedTableMetadata.java:620)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceAndFilterByKeysIntoList(HoodieBackedTableMetadata.java:690)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupRecordsItr(HoodieBackedTableMetadata.java:668)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readSliceAndFilterByKeysIntoList(HoodieBackedTableMetadata.java:646)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupIndexRecords(HoodieBackedTableMetadata.java:297)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecords(HoodieBackedTableMetadata.java:540)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecordsWithKeys(HoodieBackedTableMetadata.java:501)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readIndexRecordsWithKeys(HoodieBackedTableMetadata.java:495)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readFilesIndexRecords(HoodieBackedTableMetadata.java:186)
at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:296)
at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:142)
... 41 more
Caused by: java.lang.NegativeArraySizeException: -110
at org.apache.hudi.io.util.IOUtils.copy(IOUtils.java:160)
at org.apache.hudi.io.hfile.HFileRootIndexBlock.readBlockIndexEntry(HFileRootIndexBlock.java:95)
at org.apache.hudi.io.hfile.HFileRootIndexBlock.readBlockIndex(HFileRootIndexBlock.java:67)
at org.apache.hudi.io.hfile.HFileReaderImpl.readDataBlockIndex(HFileReaderImpl.java:351)
at org.apache.hudi.io.hfile.HFileReaderImpl.initializeMetadata(HFileReaderImpl.java:84)
at org.apache.hudi.io.hfile.HFileReaderImpl.seekTo(HFileReaderImpl.java:204)
at org.apache.hudi.io.storage.HoodieNativeAvroHFileReader$RecordByKeyIterator.<init>(HoodieNativeAvroHFileReader.java:423)
at org.apache.hudi.io.storage.HoodieNativeAvroHFileReader.getEngineRecordsByKeysIterator(HoodieNativeAvroHFileReader.java:225)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupEngineRecords(HoodieHFileDataBlock.java:203)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getEngineRecordIterator(HoodieDataBlock.java:266)
at org.apache.hudi.common.table.read.buffer.FileGroupRecordBuffer.getRecordsIterator(FileGroupRecordBuffer.java:198)
at org.apache.hudi.common.table.read.buffer.KeyBasedFileGroupRecordBuffer.processDataBlock(KeyBasedFileGroupRecordBuffer.java:75)
at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:437)
at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternal(BaseHoodieLogRecordReader.java:385)
... 61 more
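The innermost cause, NegativeArraySizeException: -110, is consistent with a key-length field being narrowed to a signed byte somewhere in the root index block decoding. This is only a hypothetical illustration of the failure mode, not the actual Hudi code path: a length of 146 stored in a byte sign-extends back to exactly -110, which then blows up when used to allocate a key buffer.

```java
// Hypothetical sketch: how a key length > 127 can surface as
// NegativeArraySizeException: -110 if narrowed to a signed byte.
public class SignedLengthDemo {
    public static void main(String[] args) {
        int keyLength = 146;                 // a long partition-path key
        byte stored = (byte) keyLength;      // only values <= 127 survive narrowing
        int readBack = stored;               // sign-extended on read: 146 -> -110
        System.out.println(readBack);        // prints -110
        try {
            byte[] key = new byte[readBack]; // allocation with a negative size
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```

If this is indeed the mechanism, the fix would be to read the key length with an unsigned or variable-length decoder in HFileRootIndexBlock.readBlockIndexEntry rather than a signed byte.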
What you expected:
No exception should be thrown.
Steps to reproduce:
- Modify partition paths in
HoodieTestDataGenerator
public static final String DEFAULT_FIRST_PARTITION_PATH = "2016qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/15zxcvbnm" + UUID.randomUUID();
public static final String DEFAULT_SECOND_PARTITION_PATH = "2015qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/16zxcvbnm" + UUID.randomUUID();
public static final String DEFAULT_THIRD_PARTITION_PATH = "2015qwertyuiop" + UUID.randomUUID() + "/03asdfghjkl" + UUID.randomUUID() + "/17zxcvbnm" + UUID.randomUUID();
- Run the test testIncrementalQueryMORWithCompactionAndClean in TestMORDataSource
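For reference, the modified partition paths above are already longer than 127 characters (the maximum a signed byte can represent), since each UUID renders as a 36-character string. A quick check, assuming only the constants shown above:

```java
import java.util.UUID;

// Verify the length of the modified DEFAULT_FIRST_PARTITION_PATH.
public class PartitionPathLength {
    public static void main(String[] args) {
        String path = "2016qwertyuiop" + UUID.randomUUID()
                + "/03asdfghjkl" + UUID.randomUUID()
                + "/15zxcvbnm" + UUID.randomUUID();
        // 14 + 36 + 12 + 36 + 10 + 36 = 144 characters, well over 127
        System.out.println(path.length()); // prints 144
    }
}
```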
Environment
Hudi version: 1.1.1
Query engine: Spark
Relevant configs:
Logs and Stack Trace
No response