Skip to content

Conversation

@phodal
Copy link
Owner

@phodal phodal commented Nov 29, 2025

Description

Fixes #32

This PR fixes a bug where class variables and global variables in Python files were being incorrectly captured in every function's LocalVariables instead of being stored in the Fields property of the CodeDataStruct.

Problem

Previously, when parsing Python code like this:

class UserService:
    # Class variables
    USER_NOT_FOUND = "User not found"
    
    def create_user(self, db: Session, user: UserCreate) -> User:
        hashed_password = get_password_hash(user.password)
        db_user = User(
            username=user.username,
            email=user.email,
            hashed_password=hashed_password
        )
        db.add(db_user)
        db.commit()
        db.refresh(db_user)
        return db_user
    
    def get_users(self, db: Session, skip: int = 0, limit: int = 100):
        return db.query(User).offset(skip).limit(limit).all()

The class variable USER_NOT_FOUND would incorrectly appear in the LocalVariables of both create_user and get_users functions. The same issue occurred with global variables in procedural files.

Root Cause

The issue was in the Python parser's variable tracking mechanism:

  1. All variable assignments were being added to a shared localVars map
  2. This map was never cleared between functions
  3. When exiting a function, all variables in localVars were added to that function's LocalVariables
  4. There was no distinction between class-level, module-level, and function-level variables

Solution

Modified the Python parser to properly track variable scope:

Changes to PythonAstBaseListener.kt:

  • Added isInsideFunction: Boolean flag to track function scope
  • Added classFields: MutableList<CodeField> to store class-level variables
  • Added moduleFields: MutableList<CodeField> to store module-level variables
  • Modified buildExprStmt() to determine variable scope and store accordingly:
    • Inside function → add to localVars (function's LocalVariables)
    • Inside class but not function → add to classFields (class's Fields)
    • At module level → add to moduleFields (default node's Fields)

Changes to PythonFullIdentListener.kt:

  • Set isInsideFunction = true and clear localVars when entering a function
  • Set isInsideFunction = false and clear localVars when exiting a function
  • Populate currentNode.Fields with classFields when exiting a class
  • Populate defaultNode.Fields with moduleFields for procedural files
  • Pass hasEnterClass parameter to buildExprStmt()

Test Coverage

Added two comprehensive tests in PythonFullIdentListenerTest.kt:

  1. shouldIdentifyClassVariablesAsFields() - Verifies class variables are in Fields, not in function LocalVariables
  2. shouldIdentifyGlobalVariablesInProceduralFiles() - Verifies global variables in procedural files are in the default node's Fields

Results

✅ All 27 existing tests continue to pass
✅ 2 new tests added and passing
✅ Class variables now correctly appear in CodeDataStruct.Fields
✅ Global variables in procedural files now correctly appear in the default node's Fields
✅ Function local variables remain in CodeFunction.LocalVariables
✅ No duplication of variables across functions

Files Changed

  • chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt
  • chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonFullIdentListener.kt
  • chapi-ast-python/src/test/kotlin/chapi/ast/pythonast/PythonFullIdentListenerTest.kt

Pull Request opened by Augment Code with guidance from the PR author

Summary by CodeRabbit

  • Bug Fixes

    • Improved accuracy of Python variable scope analysis by properly distinguishing and tracking class variables, module-level variables, and function-local variables.
  • Tests

    • Added tests verifying correct identification of variables across different scopes in Python code analysis.

✏️ Tip: You can customize this high-level summary in your review settings.

…s instead of duplicated in functions

Fixes #32

Previously, class variables and global variables in procedural files were
incorrectly added to every function's LocalVariables. This was because all
variable assignments were being tracked in a shared localVars map that was
never cleared between functions.

Changes:
- Added isInsideFunction flag to track function scope
- Added classFields and moduleFields lists to store class and module-level variables
- Modified buildExprStmt to determine variable scope and store accordingly:
  * Inside function -> localVars (function LocalVariables)
  * Inside class but not function -> classFields (class Fields)
  * At module level -> moduleFields (default node Fields)
- Clear localVars when entering/exiting functions
- Populate Fields property of CodeDataStruct for classes and default node

Test coverage:
- Added test for class variables in Fields
- Added test for global variables in procedural files
- All existing tests continue to pass
@coderabbitai
Copy link

coderabbitai bot commented Nov 29, 2025

Walkthrough

Refactored Python AST parsing to properly track variable scope context by introducing state-tracking fields (isInsideFunction, classFields, moduleFields) and updating variable assignment handling to route variables to appropriate collections based on whether they are declared at function, class, or module scope.

Changes

Cohort / File(s) Summary
State management and variable routing
chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt
Introduced state-tracking fields (isInsideFunction, classFields, moduleFields). Updated buildExprStmt to accept hasEnterClass parameter and route variable assignments to localVars (if in function), classFields (if in class), or moduleFields (if at module level).
Listener state coordination
chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonFullIdentListener.kt
Manages state transitions: records classFields into currentNode.Fields on class exit, marks isInsideFunction state during function parsing, clears localVars appropriately at function/class boundaries. Copies module-level fields into defaultNode.Fields before emission.
Test coverage
chapi-ast-python/src/test/kotlin/chapi/ast/pythonast/PythonFullIdentListenerTest.kt
Added two tests: shouldIdentifyClassVariablesAsFields (verifies class-level constants are in Fields, not function variables) and shouldIdentifyGlobalVariablesInProceduralFiles (verifies module-level variables are in Fields, not captured by functions).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • State management logic across multiple listener methods requires careful verification that scope context is correctly maintained at all boundary conditions
  • Variable assignment routing logic depends on proper state flag coordination between base and full listeners
  • New tests establish expected behavior but logic flow across class/function boundaries should be verified
  • Attention areas:
    • Timing of state flag updates relative to variable collection modifications
    • Correct clearing of localVars and classFields at scope exit
    • Module-level field assignment to defaultNode before emission

Poem

🐰 Variables scattered like carrots in the yard,
Now sorted by scope—oh, what a guard!
Class fields here, module fields there,
Functions keep theirs with utmost care.
No more duplication in the parse,
Each variable finds its proper place! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main fix: class and global variables are now stored in Fields instead of being duplicated in functions, matching the core issue resolution.
Linked Issues check ✅ Passed The PR fully addresses issue #32 by implementing scope-aware variable classification, storing class variables in class Fields and global variables in module/default node Fields rather than duplicating them in functions.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the scope classification of variables in Python parsing, with no unrelated modifications detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/python-class-global-variables-issue-32

Comment @coderabbitai help to get the list of available commands and usage tips.

@phodal
Copy link
Owner Author

phodal commented Nov 29, 2025

augment review

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestions posted.

Comment augment review to trigger a new review at any time.


override fun exitClassdef(ctx: PythonParser.ClassdefContext?) {
// Add class fields to the current node
currentNode.Fields = classFields.toList()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a single shared classFields list here will misattribute fields when classes are nested: fields collected in the outer class before entering an inner class will be assigned to the inner class and then cleared, causing them to be missing from the outer class upon its exit (also applies to other locations in the PR).

🤖 Was this useful? React with 👍 or 👎

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt (1)

111-126: Potential duplicate entries in classFields and moduleFields when a variable is reassigned.

Unlike localVars which is a map (and thus overwrites duplicates), classFields and moduleFields are lists. If a variable is assigned multiple times at class or module level (e.g., x = 1 followed by x = 2), both assignments will be added as separate CodeField entries.

This may or may not be the intended behavior depending on your use case. If you want unique fields by name, consider using a map or deduplicating before returning.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c5c048 and a1384aa.

📒 Files selected for processing (3)
  • chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt (3 hunks)
  • chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonFullIdentListener.kt (4 hunks)
  • chapi-ast-python/src/test/kotlin/chapi/ast/pythonast/PythonFullIdentListenerTest.kt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (macos-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (windows-latest)
🔇 Additional comments (8)
chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonAstBaseListener.kt (1)

12-14: LGTM!

The new state-tracking fields are appropriately declared for the scope classification logic. The design correctly separates concerns: the base class provides storage, while the subclass (PythonFullIdentListener) manages the lifecycle (setting/clearing flags and collecting fields).

chapi-ast-python/src/test/kotlin/chapi/ast/pythonast/PythonFullIdentListenerTest.kt (2)

376-422: LGTM! Comprehensive test for class variable scope.

This test effectively validates that class-level variables (USER_NOT_FOUND, MAX_RETRIES) are correctly stored in Fields and not duplicated into function LocalVariables. The assertions verify both positive cases (what should be present) and negative cases (what should not be present via size checks).


424-463: LGTM! Thorough test for module-level variable scope.

This test properly validates that global/module-level variables (DATABASE_URL, MAX_CONNECTIONS, DEBUG) are stored in the default node's Fields rather than being duplicated into each function's LocalVariables. The test correctly verifies that functions only capture their own local variables.

chapi-ast-python/src/main/kotlin/chapi/ast/pythonast/PythonFullIdentListener.kt (5)

76-78: Correctly assigns class-level variables to Fields.

The assignment of classFields to currentNode.Fields is correct, and the subsequent clearing ensures class variables don't leak into subsequent classes. The .toList() creates a defensive copy.


123-126: LGTM: Proper cleanup sequencing.

Clearing isInsideFunction and localVars after adding them to the function ensures proper state management and prevents variable leakage between functions.


133-133: LGTM: Passing scope context for variable classification.

The additional hasEnterClass parameter enables buildExprStmt to correctly route variable assignments to classFields, moduleFields, or localVars based on scope.


143-145: LGTM: Correctly assigns module-level variables to default node.

The check for moduleFields.isNotEmpty() ensures the default node is added when there are module-level variables, and the assignment correctly populates them. Note that moduleFields is not cleared after assignment, which is acceptable given the listener is typically instantiated per file.


109-112: Nested function support is a pre-existing architectural limitation, not a regression from this fix.

The review concern about nested functions losing access to outer function variables is technically accurate but conflates two separate issues:

  1. The fix (lines 109-112) is correct. Clearing localVars on function entry fixes the actual bug: preventing class/module-level variables from being incorrectly captured as function-local variables. The recent tests (shouldIdentifyClassVariablesAsFields and shouldIdentifyGlobalVariablesInProceduralFiles) explicitly validate this behavior.

  2. Nested functions are a pre-existing architectural limitation. The codebase uses a single currentFunction variable that is overwritten per function, and there are no test cases for nested functions anywhere in the test suite. This design limitation predates the current fix and is not introduced by it.

The localVars.clear() is necessary and correct. Nested function support would require broader architectural changes (e.g., a function stack instead of a single currentFunction variable), which is separate from this fix.

@codecov
Copy link

codecov bot commented Nov 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.43%. Comparing base (7c5c048) to head (a1384aa).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master      #37      +/-   ##
============================================
+ Coverage     77.33%   77.43%   +0.10%     
- Complexity     1036     1042       +6     
============================================
  Files            68       68              
  Lines          3935     3953      +18     
  Branches        672      674       +2     
============================================
+ Hits           3043     3061      +18     
  Misses          504      504              
  Partials        388      388              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@phodal phodal merged commit dd51381 into master Nov 29, 2025
6 checks passed
phodal added a commit to phodal/coca that referenced this pull request Nov 29, 2025
Improvements inspired by Augment Code Review comments from:
- phodal/chapi#37 (nested scope tracking)
- phodal/chapi#38 (complete syntax coverage)

Added new rules:
- scope_tracking_correctness: Prevent misattribution in nested scopes
- complete_syntax_coverage: Handle all syntax variants in parsers
- edge_case_coverage: Ensure comprehensive test coverage
- test_completeness: Verify complete behavior, not just happy path
- data_structure_choice: Use appropriate data structures for the problem
- state_management: Proper scoping of mutable state

Enhanced existing rules:
- ast_parsing_correctness: Added grammar alternatives consideration
- analysis_result_validation: Emphasized no data loss during parsing
phodal added a commit to phodal/coca that referenced this pull request Nov 29, 2025
Improvements inspired by Augment Code Review comments from:
- phodal/chapi#37 (nested scope tracking)
- phodal/chapi#38 (complete syntax coverage)

Added new rules:
- scope_tracking_correctness: Prevent misattribution in nested scopes
- complete_syntax_coverage: Handle all syntax variants in parsers
- edge_case_coverage: Ensure comprehensive test coverage
- test_completeness: Verify complete behavior, not just happy path
- data_structure_choice: Use appropriate data structures for the problem
- state_management: Proper scoping of mutable state

Enhanced existing rules:
- ast_parsing_correctness: Added grammar alternatives consideration
- analysis_result_validation: Emphasized no data loss during parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Python- Class variables and Global Variables in Procedural Files are captured in every function

2 participants