
Conversation

@alpe alpe commented Jul 21, 2025

max 100

Overview

github-actions bot commented Jul 21, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build       Format       Lint        Breaking    Updated (UTC)
✅ passed   ⏩ skipped   ✅ passed   ✅ passed   Jul 22, 2025, 1:51 PM

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @alpe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization to the blob retrieval process by implementing batched queries. This change aims to enhance the efficiency and robustness of data access, particularly when handling numerous blob IDs, and also improves the clarity of error messages related to these operations.

Highlights

  • Blob Retrieval Optimization: The RetrieveWithHelpers function has been refactored to fetch blobs in batches of 100 IDs. This change replaces a single large query with multiple smaller, batched queries, which can improve performance and stability when dealing with a large number of blob identifiers.
  • Enhanced Error Reporting: Error messages generated during blob retrieval now include the specific batch range (e.g., batch 0-1) that failed. This provides more granular information for debugging issues related to partial blob retrieval failures.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature                Command               Description
Code Review            /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary   /gemini summary       Provides a summary of the current pull request in its current state.
Comment                @gemini-code-assist   Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                   /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces batching for blob retrieval to improve efficiency when handling a large number of blobs. I've identified a few areas for improvement, including addressing a potential compilation issue with the min function on older Go versions, using a named constant for the batch size, and correcting a typo in a test case.

}
}
// 2. Get Blobs using the retrieved IDs in batches
batchSize := 100

medium

The batch size is currently hardcoded. Defining it as a named constant improves readability and maintainability.

const batchSize = 100

types/da_test.go Outdated
getBlobsErr: errors.New("network error during blob retrieval"),
expectedCode: coreda.StatusError,
expectedErrMsg: "failed to get blobs: network error during blob retrieval",
expectedErrMsg: "ailed to get blobs for batch 0-1: network error during blob retrieval",

medium

There's a typo in the expected error message. The word 'failed' is missing the leading 'f'.

Suggested change
expectedErrMsg: "ailed to get blobs for batch 0-1: network error during blob retrieval",
expectedErrMsg: "failed to get blobs for batch 0-1: network error during blob retrieval",

Member

not sure how the test currently passes with this typo, worth looking into


codecov bot commented Jul 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.04%. Comparing base (2b3c72b) to head (a012bd6).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2480      +/-   ##
==========================================
- Coverage   73.05%   73.04%   -0.01%     
==========================================
  Files          68       68              
  Lines        6542     6548       +6     
==========================================
+ Hits         4779     4783       +4     
- Misses       1364     1366       +2     
  Partials      399      399              
Flag Coverage Δ
combined 73.04% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Manav-Aggarwal Manav-Aggarwal left a comment

LGTM mostly, just a small typo that needs fixing

claude bot commented Jul 22, 2025

Claude finished @alpe's task.


Code Review Complete

Reviewing PR: "Query blobs in batches"

Todo List:

  • Read and analyze types/da.go changes
  • Read and analyze types/da_test.go changes
  • Evaluate code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Analyze test coverage
  • Provide comprehensive review feedback

Overview

This PR implements blob retrieval batching in the RetrieveWithHelpers function, changing from a single large query to processing blob IDs in batches of 100. This is a solid optimization for handling large numbers of blob identifiers.
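
For orientation, here is a minimal sketch of the batching loop described above; the getBlobs signature, type names, and exact error wording are assumptions for illustration rather than the actual rollkit code:

    // Sketch of the batched retrieval loop; getBlobsFunc stands in for the DA layer.
    package dabatch

    import (
        "context"
        "fmt"
    )

    type ID []byte
    type Blob []byte

    type getBlobsFunc func(ctx context.Context, ids []ID) ([]Blob, error)

    const batchSize = 100 // per the PR, at most 100 IDs per query

    // retrieveInBatches fetches blobs for ids in chunks of batchSize and wraps
    // any failure with the index range of the batch that failed.
    func retrieveInBatches(ctx context.Context, ids []ID, getBlobs getBlobsFunc) ([]Blob, error) {
        blobs := make([]Blob, 0, len(ids))
        for i := 0; i < len(ids); i += batchSize {
            end := i + batchSize
            if end > len(ids) { // explicit min, avoiding the Go 1.21+ built-in flagged below
                end = len(ids)
            }
            batch, err := getBlobs(ctx, ids[i:end])
            if err != nil {
                return nil, fmt.Errorf("failed to get blobs for batch %d-%d: %w", i, end-1, err)
            }
            blobs = append(blobs, batch...)
        }
        return blobs, nil
    }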

Issues Identified

🚨 Critical Issues

  1. Go Compatibility Issue (types/da.go:159)

    • The built-in min() function is only available in Go 1.21+
    • Fix: Use explicit min logic for broader compatibility:
    end := i + batchSize
    if end > len(idsResult.IDs) {
        end = len(idsResult.IDs)
    }
  2. Test Typo (types/da_test.go:216)

    • Expected error message contains typo: "ailed" should be "failed"
    • Fix: Change to "failed to get blobs for batch 0-1: network error during blob retrieval"

⚠️ High Priority Issues

  1. Hardcoded Magic Number (types/da.go:156)
    • Batch size of 100 is hardcoded
    • Fix: Define as named constant:
    const defaultBatchSize = 100

Code Quality & Best Practices

Positive aspects:

  • ✅ Clean implementation of batching logic
  • ✅ Proper error handling with batch-specific error messages
  • ✅ Maintains existing function signature (good for backward compatibility)
  • ✅ Appropriate use of make() with capacity for performance
  • ✅ Good logging with relevant context

Areas for improvement:

  • Consider making batch size configurable rather than hardcoded
  • The error message format is good for debugging (batch %d-%d)

Performance Considerations

Benefits:

  • Reduced memory pressure: Processing in smaller chunks reduces peak memory usage
  • Better error isolation: A failure in one batch doesn't affect others
  • Improved responsiveness: Allows for better progress tracking and potential cancellation

Potential concerns:

  • Network overhead: Multiple smaller requests may have more overhead than one large request
  • Context switching: More iterations could add minor CPU overhead

Overall assessment: The performance trade-offs are reasonable. The benefits of reduced memory usage and better error handling likely outweigh the minor overhead.

Security Concerns

No security issues identified:

  • No sensitive data exposure
  • Proper context usage for cancellation
  • Input validation remains unchanged (handled by underlying DA layer)
  • Error messages don't leak sensitive information

Test Coverage

Good coverage:

  • ✅ Test cases cover the new batching error message format
  • ✅ Existing test structure remains comprehensive
  • ✅ All major error paths are tested

Suggestions:

  • Consider adding a test case with exactly 100 IDs to test batch boundary
  • Consider testing with >100 IDs to verify multiple batch behavior (a minimal sketch of the boundary arithmetic follows after this list)
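
A minimal sketch of the boundary arithmetic such a test could assert; the batch size of 100 is assumed to mirror the value hardcoded in this PR:

    package types_test

    import "testing"

    const testBatchSize = 100 // assumed to mirror the hardcoded value in this PR

    // expectedBatches is the number of GetBlobs calls a batched loop would issue.
    func expectedBatches(numIDs int) int {
        if numIDs == 0 {
            return 0
        }
        return (numIDs + testBatchSize - 1) / testBatchSize // ceiling division
    }

    func TestBatchBoundaries(t *testing.T) {
        cases := map[int]int{1: 1, 99: 1, 100: 1, 101: 2, 250: 3}
        for numIDs, want := range cases {
            if got := expectedBatches(numIDs); got != want {
                t.Errorf("expectedBatches(%d) = %d, want %d", numIDs, got, want)
            }
        }
    }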

Bug Risk Assessment

Low risk:

  • The batching logic is straightforward and well-contained
  • Error handling preserves existing behavior
  • No breaking changes to the interface

Potential edge cases:

  • Empty batch handling (already covered by existing null checks)
  • Single ID handling (works with current implementation)

Recommendations

  1. Must fix: Address Go compatibility issue with min() function
  2. Must fix: Fix typo in test case
  3. Should fix: Use named constant for batch size
  4. Consider: Making batch size configurable via parameter or config

Summary

This is a well-implemented optimization that addresses a real performance concern. The batching approach is sound and the error handling improvements are valuable. The main issues are relatively minor (compatibility and typo) and easily addressed.

Approval recommendation: Approve after fixing the critical Go compatibility issue and test typo.


alpe commented Jul 22, 2025

The test passed because it was an assert.Contains check. Sorry for the typo
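
For illustration, a minimal example (using testify, which these tests appear to use) of why the substring check hides the typo: assert.Contains only requires the expected text to appear somewhere in the actual message.

    package types_test

    import (
        "testing"

        "github.com/stretchr/testify/assert"
    )

    // The "ailed ..." expectation (missing its leading "f") is still a substring of
    // the real error text, so assert.Contains reports success.
    func TestContainsMasksTypo(t *testing.T) {
        actual := "failed to get blobs for batch 0-1: network error during blob retrieval"
        assert.Contains(t, actual, "ailed to get blobs for batch 0-1: network error during blob retrieval")
    }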

@tac0turtle tac0turtle left a comment

utACK

claude bot commented Jul 22, 2025

Claude finished @alpe's task.


Code Review Complete

Reviewing PR: "Query blobs in batches"

Todo List:

  • Read and analyze types/da.go changes
  • Read and analyze types/da_test.go changes
  • Evaluate code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Analyze test coverage
  • Provide comprehensive review feedback

Overview

This PR implements blob retrieval batching in the RetrieveWithHelpers function, changing from a single large query to processing blob IDs in batches of 100. This is a solid optimization for handling large numbers of blob identifiers.

Issues Identified

🚨 Critical Issues

  1. Go Compatibility Issue (types/da.go:159)
    • The built-in min() function is only available in Go 1.21+
    • Fix: Use explicit min logic for broader compatibility:
    end := i + batchSize
    if end > len(idsResult.IDs) {
        end = len(idsResult.IDs)
    }

⚠️ High Priority Issues

  1. Hardcoded Magic Number (types/da.go:156)
    • Batch size of 100 is hardcoded without justification
    • Fix: Define as named constant:
    const defaultBatchSize = 100

Code Quality & Best Practices

Positive aspects:

  • ✅ Clean implementation of batching logic
  • ✅ Proper error handling with batch-specific error messages
  • ✅ Maintains existing function signature (good for backward compatibility)
  • ✅ Appropriate use of make() with capacity for performance
  • ✅ Good logging with relevant context

Areas for improvement:

  • Consider making batch size configurable rather than hardcoded
  • The error message format is good for debugging (batch %d-%d)

Performance Considerations

Benefits:

  • Reduced memory pressure: Processing in smaller chunks reduces peak memory usage
  • Better error isolation: A failure in one batch doesn't affect others
  • Improved responsiveness: Allows for better progress tracking and potential cancellation

Potential concerns:

  • Network overhead: Multiple smaller requests may have more overhead than one large request
  • Context switching: More iterations could add minor CPU overhead

Overall assessment: The performance trade-offs are reasonable. The benefits of reduced memory usage and better error handling likely outweigh the minor overhead.

Security Concerns

No security issues identified:

  • No sensitive data exposure
  • Proper context usage for cancellation
  • Input validation remains unchanged (handled by underlying DA layer)
  • Error messages don't leak sensitive information

Test Coverage

Good coverage:

  • ✅ Test cases cover the new batching error message format
  • ✅ Existing test structure remains comprehensive
  • ✅ All major error paths are tested

Note on the test: The test at types/da_test.go:216 is actually correct - it uses assert.Contains to check for substrings, so the error message will match properly.

Suggestions:

  • Consider adding a test case with exactly 100 IDs to test batch boundary
  • Consider testing with >100 IDs to verify multiple batch behavior

Bug Risk Assessment

Low risk:

  • The batching logic is straightforward and well-contained
  • Error handling preserves existing behavior
  • No breaking changes to the interface

Potential edge cases:

  • Empty batch handling (already covered by existing null checks)
  • Single ID handling (works with current implementation)

Recommendations

  1. Must fix: Address Go compatibility issue with min() function
  2. Should fix: Use named constant for batch size
  3. Consider: Making batch size configurable via parameter or config

Summary

This is a well-implemented optimization that addresses a real performance concern. The batching approach is sound and the error handling improvements are valuable. The main issue is the Go compatibility problem which should be addressed before merging.

Approval recommendation: Approve after fixing the critical Go compatibility issue.


@alpe alpe changed the title from "Query blobs in batches" to "chore: Query blobs in batches" Jul 22, 2025
@tac0turtle tac0turtle added this pull request to the merge queue Jul 22, 2025
Merged via the queue into main with commit b2fec2d Jul 22, 2025
27 of 29 checks passed
@tac0turtle tac0turtle deleted the alex/batch_blobs branch July 22, 2025 14:29
@github-project-automation github-project-automation bot moved this to Done in Evolve Jul 22, 2025
@tac0turtle tac0turtle removed this from Evolve Aug 25, 2025