
Conversation

@alpe alpe commented Jul 21, 2025

max 100

Overview

github-actions bot commented Jul 21, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build       Format       Lint        Breaking    Updated (UTC)
✅ passed   ⏩ skipped   ✅ passed   ✅ passed   Jul 22, 2025, 1:51 PM

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @alpe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization to the blob retrieval process by implementing batched queries. This change aims to enhance the efficiency and robustness of data access, particularly when handling numerous blob IDs, and also improves the clarity of error messages related to these operations.

Highlights

  • Blob Retrieval Optimization: The RetrieveWithHelpers function has been refactored to fetch blobs in batches of 100 IDs. This change replaces a single large query with multiple smaller, batched queries, which can improve performance and stability when dealing with a large number of blob identifiers.
  • Enhanced Error Reporting: Error messages generated during blob retrieval now include the specific batch range (e.g., batch 0-1) that failed. This provides more granular information for debugging issues related to partial blob retrieval failures.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature                Command               Description
Code Review            /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary   /gemini summary       Provides a summary of the current pull request in its current state.
Comment                @gemini-code-assist   Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                   /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces batching for blob retrieval to improve efficiency when handling a large number of blobs. I've identified a few areas for improvement, including addressing a potential compilation issue with the min function on older Go versions, using a named constant for the batch size, and correcting a typo in a test case.

}
}
// 2. Get Blobs using the retrieved IDs in batches
batchSize := 100

medium

The batch size is currently hardcoded. Defining it as a named constant improves readability and maintainability.

const batchSize = 100

types/da_test.go Outdated
getBlobsErr: errors.New("network error during blob retrieval"),
expectedCode: coreda.StatusError,
expectedErrMsg: "failed to get blobs: network error during blob retrieval",
expectedErrMsg: "ailed to get blobs for batch 0-1: network error during blob retrieval",

medium

There's a typo in the expected error message. The word 'failed' is missing the leading 'f'.

Suggested change
expectedErrMsg: "ailed to get blobs for batch 0-1: network error during blob retrieval",
expectedErrMsg: "failed to get blobs for batch 0-1: network error during blob retrieval",

Member

not sure how the test currently passes with this typo, worth looking into


codecov bot commented Jul 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.04%. Comparing base (2b3c72b) to head (a012bd6).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2480      +/-   ##
==========================================
- Coverage   73.05%   73.04%   -0.01%     
==========================================
  Files          68       68              
  Lines        6542     6548       +6     
==========================================
+ Hits         4779     4783       +4     
- Misses       1364     1366       +2     
  Partials      399      399              
Flag Coverage Δ
combined 73.04% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Manav-Aggarwal Manav-Aggarwal left a comment

LGTM mostly, just a small typo that needs fixing

claude bot commented Jul 22, 2025

Claude finished @alpe's task.


Code Review Complete

Reviewing PR: "Query blobs in batches"

Todo List:

  • Read and analyze types/da.go changes
  • Read and analyze types/da_test.go changes
  • Evaluate code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Analyze test coverage
  • Provide comprehensive review feedback

Overview

This PR implements blob retrieval batching in the RetrieveWithHelpers function, changing from a single large query to processing blob IDs in batches of 100. This is a solid optimization for handling large numbers of blob identifiers.
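
For orientation, here is a minimal sketch of the batching loop described above; the getBlobs signature, type names, and exact error wording are assumptions for illustration rather than the actual rollkit code:

    // Sketch of the batched retrieval loop; getBlobsFunc stands in for the DA layer.
    package dabatch

    import (
        "context"
        "fmt"
    )

    type ID []byte
    type Blob []byte

    type getBlobsFunc func(ctx context.Context, ids []ID) ([]Blob, error)

    const batchSize = 100 // per the PR, at most 100 IDs per query

    // retrieveInBatches fetches blobs for ids in chunks of batchSize and wraps
    // any failure with the index range of the batch that failed.
    func retrieveInBatches(ctx context.Context, ids []ID, getBlobs getBlobsFunc) ([]Blob, error) {
        blobs := make([]Blob, 0, len(ids))
        for i := 0; i < len(ids); i += batchSize {
            end := i + batchSize
            if end > len(ids) { // explicit min, avoiding the Go 1.21+ built-in flagged below
                end = len(ids)
            }
            batch, err := getBlobs(ctx, ids[i:end])
            if err != nil {
                return nil, fmt.Errorf("failed to get blobs for batch %d-%d: %w", i, end-1, err)
            }
            blobs = append(blobs, batch...)
        }
        return blobs, nil
    }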

Issues Identified

🚨 Critical Issues

  1. Go Compatibility Issue (types/da.go:159)

    • The built-in min() function is only available in Go 1.21+
    • Fix: Use explicit min logic for broader compatibility:
    end := i + batchSize
    if end > len(idsResult.IDs) {
        end = len(idsResult.IDs)
    }
  2. Test Typo (types/da_test.go:216)

    • Expected error message contains typo: "ailed" should be "failed"
    • Fix: Change to "failed to get blobs for batch 0-1: network error during blob retrieval"

⚠️ High Priority Issues

  1. Hardcoded Magic Number (types/da.go:156)
    • Batch size of 100 is hardcoded
    • Fix: Define as named constant:
    const defaultBatchSize = 100

Code Quality & Best Practices

Positive aspects:

  • ✅ Clean implementation of batching logic
  • ✅ Proper error handling with batch-specific error messages
  • ✅ Maintains existing function signature (good for backward compatibility)
  • ✅ Appropriate use of make() with capacity for performance
  • ✅ Good logging with relevant context

Areas for improvement:

  • Consider making batch size configurable rather than hardcoded
  • The error message format is good for debugging (batch %d-%d)

Performance Considerations

Benefits:

  • Reduced memory pressure: Processing in smaller chunks reduces peak memory usage
  • Better error isolation: A failure in one batch doesn't affect others
  • Improved responsiveness: Allows for better progress tracking and potential cancellation

Potential concerns:

  • Network overhead: Multiple smaller requests may have more overhead than one large request
  • Context switching: More iterations could add minor CPU overhead

Overall assessment: The performance trade-offs are reasonable. The benefits of reduced memory usage and better error handling likely outweigh the minor overhead.

Security Concerns

No security issues identified:

  • No sensitive data exposure
  • Proper context usage for cancellation
  • Input validation remains unchanged (handled by underlying DA layer)
  • Error messages don't leak sensitive information

Test Coverage

Good coverage:

  • ✅ Test cases cover the new batching error message format
  • ✅ Existing test structure remains comprehensive
  • ✅ All major error paths are tested

Suggestions:

  • Consider adding a test case with exactly 100 IDs to test batch boundary
  • Consider testing with >100 IDs to verify multiple batch behavior (a minimal sketch of the boundary arithmetic follows after this list)
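
A minimal sketch of the boundary arithmetic such a test could assert; the batch size of 100 is assumed to mirror the value hardcoded in this PR:

    package types_test

    import "testing"

    const testBatchSize = 100 // assumed to mirror the hardcoded value in this PR

    // expectedBatches is the number of GetBlobs calls a batched loop would issue.
    func expectedBatches(numIDs int) int {
        if numIDs == 0 {
            return 0
        }
        return (numIDs + testBatchSize - 1) / testBatchSize // ceiling division
    }

    func TestBatchBoundaries(t *testing.T) {
        cases := map[int]int{1: 1, 99: 1, 100: 1, 101: 2, 250: 3}
        for numIDs, want := range cases {
            if got := expectedBatches(numIDs); got != want {
                t.Errorf("expectedBatches(%d) = %d, want %d", numIDs, got, want)
            }
        }
    }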

Bug Risk Assessment

Low risk:

  • The batching logic is straightforward and well-contained
  • Error handling preserves existing behavior
  • No breaking changes to the interface

Potential edge cases:

  • Empty batch handling (already covered by existing null checks)
  • Single ID handling (works with current implementation)

Recommendations

  1. Must fix: Address Go compatibility issue with min() function
  2. Must fix: Fix typo in test case
  3. Should fix: Use named constant for batch size
  4. Consider: Making batch size configurable via parameter or config

Summary

This is a well-implemented optimization that addresses a real performance concern. The batching approach is sound and the error handling improvements are valuable. The main issues are relatively minor (compatibility and typo) and easily addressed.

Approval recommendation: Approve after fixing the critical Go compatibility issue and test typo.


alpe commented Jul 22, 2025

The test passed because it was an assert.Contains check. Sorry for the typo
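
For illustration, a minimal example (using testify, which these tests appear to use) of why the substring check hides the typo: assert.Contains only requires the expected text to appear somewhere in the actual message.

    package types_test

    import (
        "testing"

        "github.com/stretchr/testify/assert"
    )

    // The "ailed ..." expectation (missing its leading "f") is still a substring of
    // the real error text, so assert.Contains reports success.
    func TestContainsMasksTypo(t *testing.T) {
        actual := "failed to get blobs for batch 0-1: network error during blob retrieval"
        assert.Contains(t, actual, "ailed to get blobs for batch 0-1: network error during blob retrieval")
    }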

@tac0turtle tac0turtle left a comment

utACK

claude bot commented Jul 22, 2025

Claude finished @alpe's task.


Code Review Complete

Reviewing PR: "Query blobs in batches"

Todo List:

  • Read and analyze types/da.go changes
  • Read and analyze types/da_test.go changes
  • Evaluate code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Analyze test coverage
  • Provide comprehensive review feedback

Overview

This PR implements blob retrieval batching in the RetrieveWithHelpers function, changing from a single large query to processing blob IDs in batches of 100. This is a solid optimization for handling large numbers of blob identifiers.

Issues Identified

🚨 Critical Issues

  1. Go Compatibility Issue (types/da.go:159)
    • The built-in min() function is only available in Go 1.21+
    • Fix: Use explicit min logic for broader compatibility:
    end := i + batchSize
    if end > len(idsResult.IDs) {
        end = len(idsResult.IDs)
    }

⚠️ High Priority Issues

  1. Hardcoded Magic Number (types/da.go:156)
    • Batch size of 100 is hardcoded without justification
    • Fix: Define as named constant:
    const defaultBatchSize = 100

Code Quality & Best Practices

Positive aspects:

  • ✅ Clean implementation of batching logic
  • ✅ Proper error handling with batch-specific error messages
  • ✅ Maintains existing function signature (good for backward compatibility)
  • ✅ Appropriate use of make() with capacity for performance
  • ✅ Good logging with relevant context

Areas for improvement:

  • Consider making batch size configurable rather than hardcoded
  • The error message format is good for debugging (batch %d-%d)

Performance Considerations

Benefits:

  • Reduced memory pressure: Processing in smaller chunks reduces peak memory usage
  • Better error isolation: A failure in one batch doesn't affect others
  • Improved responsiveness: Allows for better progress tracking and potential cancellation

Potential concerns:

  • Network overhead: Multiple smaller requests may have more overhead than one large request
  • Context switching: More iterations could add minor CPU overhead

Overall assessment: The performance trade-offs are reasonable. The benefits of reduced memory usage and better error handling likely outweigh the minor overhead.

Security Concerns

No security issues identified:

  • No sensitive data exposure
  • Proper context usage for cancellation
  • Input validation remains unchanged (handled by underlying DA layer)
  • Error messages don't leak sensitive information

Test Coverage

Good coverage:

  • ✅ Test cases cover the new batching error message format
  • ✅ Existing test structure remains comprehensive
  • ✅ All major error paths are tested

Note on the test: The test at types/da_test.go:216 is actually correct - it uses assert.Contains to check for substrings, so the error message will match properly.

Suggestions:

  • Consider adding a test case with exactly 100 IDs to test batch boundary
  • Consider testing with >100 IDs to verify multiple batch behavior

Bug Risk Assessment

Low risk:

  • The batching logic is straightforward and well-contained
  • Error handling preserves existing behavior
  • No breaking changes to the interface

Potential edge cases:

  • Empty batch handling (already covered by existing null checks)
  • Single ID handling (works with current implementation)

Recommendations

  1. Must fix: Address Go compatibility issue with min() function
  2. Should fix: Use named constant for batch size
  3. Consider: Making batch size configurable via parameter or config

Summary

This is a well-implemented optimization that addresses a real performance concern. The batching approach is sound and the error handling improvements are valuable. The main issue is the Go compatibility problem which should be addressed before merging.

Approval recommendation: Approve after fixing the critical Go compatibility issue.


@alpe alpe changed the title from "Query blobs in batches" to "chore: Query blobs in batches" Jul 22, 2025
@tac0turtle tac0turtle added this pull request to the merge queue Jul 22, 2025
Merged via the queue into main with commit b2fec2d Jul 22, 2025
27 of 29 checks passed
@tac0turtle tac0turtle deleted the alex/batch_blobs branch July 22, 2025 14:29
@github-project-automation github-project-automation bot moved this to Done in Evolve Jul 22, 2025
@tac0turtle tac0turtle removed this from Evolve Aug 25, 2025