Add Case-Phenotype Grid visualization for disease pages by kevinschaper · Pull Request #1247 · monarch-initiative/monarch-app

kevinschaper · 2026-01-17T01:20:38Z

Summary

This PR adds a new visualization section to disease pages showing a grid of cases (columns) × phenotypes (rows), with phenotypes grouped by histopheno bins (body systems) that expand to show individual phenotypes. This enables researchers to quickly see which phenotypes are present or absent across cases associated with a disease.

Adds Case-Phenotype Grid section to disease pages (between HistoPheno and Associations)
Fetches cases via CaseToDiseaseAssociation and phenotypes via CaseToPhenotypicFeatureAssociation
Groups phenotypes by body system using object_closure matching against histopheno bin IDs
Supports expandable bin rows, cell click details modal, and horizontal scrolling for many cases

Architecture

Data Flow

Disease Page
    │
    ├─ 1. GET /association?category=CaseToDiseaseAssociation&object={diseaseId}
    │      → Returns all cases linked to this disease
    │
    └─ 2. GET /association?category=CaseToPhenotypicFeatureAssociation&subject={case1}&subject={case2}...
           → Returns all phenotypes for those cases with object_closure for bin grouping

Client-side processing:
    → Group phenotypes by histopheno bins using object_closure
    → Build case × phenotype matrix
    → Render expandable grid

Files Added

File	Purpose
`frontend/src/api/case-phenotype-types.ts`	TypeScript interfaces for matrix data
`frontend/src/api/case-phenotype.ts`	API functions to fetch case-phenotype data
`frontend/src/util/case-phenotype-matrix.ts`	Matrix building and bin grouping logic
`frontend/src/components/CasePhenotypeGrid.vue`	Main grid component with expand/collapse
`frontend/src/components/CasePhenotypeModal.vue`	Detail popup modal for cell clicks
`frontend/src/pages/node/SectionCasePhenotypeGrid.vue`	Page section wrapper

Files Modified

File	Change
`frontend/src/pages/node/PageNode.vue`	Added `SectionCasePhenotypeGrid` import and usage

Implementation Details

1. Type Definitions (`case-phenotype-types.ts`)

Defines the core data structures for the visualization:

export interface CasePhenotypeMatrix {
  diseaseId: string;
  diseaseName?: string;
  cases: CaseEntity[];           // columns
  bins: HistoPhenoBin[];         // collapsed rows (body systems)
  phenotypes: CasePhenotype[];   // expanded rows (HP terms)
  cells: Map<string, CasePhenotypeCellData>; // key: "caseId:phenotypeId"
  totalCases: number;
  totalPhenotypes: number;
}

export interface CasePhenotypeCellData {
  present: boolean;
  negated?: boolean;
  onset?: string;
  frequency?: string;
  publications?: string[];
  publicationLinks?: ExpandedCurie[];
}

Includes the HISTOPHENO_BINS constant mapping UPHENO/HP IDs to human-readable labels (Nervous System, Eye, Skeletal System, etc.).

2. API Layer (`case-phenotype.ts`)

Orchestrates data fetching with batched requests to handle URL length limits:

export const getCasePhenotypeMatrix = async (
  diseaseId: string,
  diseaseName?: string,
): Promise<CasePhenotypeMatrix | null> => {
  // Fetch cases associated with the disease
  const casesResponse = await getCasesForDisease(diseaseId);
  if (casesResponse.items.length === 0) return null;

  // Extract case IDs and fetch phenotypes (batched in groups of 5)
  const caseIds = casesResponse.items.map((assoc) => assoc.subject);
  const phenotypeAssociations = await getCasePhenotypes(caseIds);

  // Build and return the matrix
  return buildMatrix(diseaseId, diseaseName, casesResponse.items, phenotypeAssociations);
};

Case phenotype requests are batched in groups of 5 due to URL length limits with unicode characters in case IDs.

3. Matrix Building (`case-phenotype-matrix.ts`)

Transforms flat associations into a structured matrix:

export function findBinForPhenotype(objectClosure: string[] | undefined): string {
  if (!objectClosure) return "OTHER";
  for (const id of objectClosure) {
    if (HISTOPHENO_BIN_IDS.has(id)) return id;
  }
  return "OTHER";
}

export function buildMatrix(
  diseaseId: string,
  diseaseName: string | undefined,
  caseAssociations: Association[],
  phenotypeAssociations: Association[],
): CasePhenotypeMatrix {
  // Build case map, phenotype map, cells map, and bin groupings
  // Returns structured matrix for visualization
}

Uses object_closure from phenotype associations to match against histopheno bin IDs.

4. Grid Component (`CasePhenotypeGrid.vue`)

Renders the interactive grid with expand/collapse functionality:

+----------------------------------------------------+
|  [Phenotype / System]    | 1 | 2 | 3 | 4 | ... | n |  <- Case numbers (hover for ID)
+------------------------+---+---+---+---+-----+-----+
| ▶ Nervous System (11)  | ● | ● | ○ | ● |     |     |  <- Collapsed bin row
| ▶ Eye (4)              | ○ | ● | ● | ○ |     |     |
| ▼ Skeletal System (2)  | ● | ○ | ○ | ● |     |     |  <- Expanded bin
|   Cerebral palsy       | ● |   |   | ● |     |     |  <- Individual phenotypes
|   Ataxia               | ● | ● |   |   |     |     |
+----------------------------------------------------+

Legend: ● = present (green), ○ = excluded/negated (red), ▶/▼ = expand/collapse

Features:

Sticky first column (phenotype names stay visible during horizontal scroll)
Tooltips on case headers showing full case ID/label
Tooltips on cells showing presence status, onset, frequency
Click handlers emitting events for modal display

5. Section Wrapper (`SectionCasePhenotypeGrid.vue`)

Handles loading states and conditionally renders the section:

const hideSection = computed(() => isSuccess.value && !matrix.value);

Only renders for disease nodes (node.category === 'biolink:Disease')
Hides automatically if no cases are found (returns null from API)
Shows loading/error states appropriately

Test Plan

Navigate to a disease page with phenopacket cases (e.g., MONDO:0007298 - spinocerebellar ataxia type 29)
Verify the "Case Phenotypes" section appears between HistoPheno and Associations
Verify bin rows show with phenotype counts and expand/collapse on click
Verify cell indicators show presence (●) and negated (○) correctly
Hover over case number headers to verify tooltip shows case ID
Hover over cells to verify tooltip shows phenotype details
Click on a cell to verify modal opens with full details
Navigate to a disease without cases (e.g., MONDO:0000001) and verify section doesn't render
Test horizontal scroll with diseases that have many cases

🤖 Generated with Claude Code

Adds a new grid visualization on disease pages showing phenotypes observed in cases (from phenopackets) associated with that disease. Features: - Grid with cases as columns, phenotypes grouped by body system (bins) as rows - Expandable bin rows to show individual phenotypes - Tooltips showing case/phenotype details on hover (using vue-tippy) - Click cells to open modal with full details including onset, frequency, publications - Handles unicode characters in case IDs with batched API requests New files: - case-phenotype-types.ts: TypeScript interfaces and HISTOPHENO_BINS constant - case-phenotype.ts: API functions for fetching case/phenotype data - case-phenotype-matrix.ts: Utilities for transforming data into matrix format - CasePhenotypeGrid.vue: Main grid component with expandable rows - CasePhenotypeModal.vue: Detail modal for cell clicks - SectionCasePhenotypeGrid.vue: Page section wrapper Co-Authored-By: Claude Opus 4.5 <[email protected]>

netlify · 2026-01-17T01:21:21Z

✅ Deploy Preview for monarch-app canceled.

Name	Link
🔨 Latest commit	`f4daa44`
🔍 Latest deploy log	https://app.netlify.com/projects/monarch-app/deploys/69c69f24c9f3150007dffcc1

claude · 2026-01-17T01:22:41Z

Code Review: Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! This is a well-structured feature that adds valuable functionality to disease pages. Below is my detailed review covering code quality, potential issues, performance, security, and test coverage.

Overall Assessment

Summary: This is high-quality code with good architectural decisions. The implementation is clean, well-documented, and follows Vue 3 best practices. A few minor improvements are suggested below.

✅ Code Quality & Best Practices

Strengths

Excellent Code Organization
- Clean separation of concerns: types, API layer, utilities, and components
- Well-structured data flow from API → matrix builder → visualization
- Good use of TypeScript interfaces for type safety
Documentation
- Comprehensive PR description with architecture diagrams
- JSDoc comments on functions (following repo conventions)
- Clear inline comments explaining complex logic
Vue 3 Best Practices
- Proper use of Composition API with <script setup>
- Scoped SCSS styles
- Reactive state management with ref and computed
- Proper event emission patterns
Code Style Consistency
- Matches existing codebase patterns (compared with SectionVisualization.vue and associations.ts)
- Follows the repository's style guidelines from CONTRIBUTING.md

Minor Suggestions

Console.log Statements (frontend/src/components/CasePhenotypeGrid.vue:368, 430, 432)

According to the style guide: "Use console.log strictly for temporary debugging during local development that should be removed before merging PRs."
```
// Remove or convert to appropriate logging level:
console.log("toggleBin clicked:", binId);  // Line 368
console.log("handleCellClick:", caseId, phenotypeId);  // Line 430
console.log("cellData:", cellData);  // Line 432
```
Recommendation: Remove these or convert to console.debug if needed for production troubleshooting.

Typo in Documentation (frontend/src/api/case-phenotype.ts:122)

Comment says "Batches requests in groups of 50 case IDs" but the constant is set to 5.

// Line 122: Update comment to match CASE_BATCH_SIZE = 5
* Fetch CaseToPhenotypicFeatureAssociations for multiple cases Batches requests
* in groups of 50 case IDs to avoid URL length limits

Comment Formatting (frontend/src/api/case-phenotype-types.ts:7-9)

Missing period at end of sentence in JSDoc comment.
```
/**
 * Types for Case-Phenotype Grid visualization These are UI-specific types,
 * separate from the LinkML-generated model.ts
 */
```
Should be: ...visualization. These are UI-specific types...

🐛 Potential Bugs & Issues

1. Map Serialization Issue (frontend/src/api/case-phenotype-types.ts:51)

The CasePhenotypeMatrix interface uses a Map<string, CasePhenotypeCellData> for the cells property. While this works in memory, it could cause issues if you ever need to serialize this data (e.g., for caching, state management, or debugging).

export interface CasePhenotypeMatrix {
  // ...
  cells: Map<string, CasePhenotypeCellData>; // Maps don't serialize to JSON well
}

Impact: Low (no immediate issue, but could cause problems later)

Recommendation: Document this limitation or consider using Record<string, CasePhenotypeCellData> instead. Maps are fine for now since the data isn't serialized.

2. Missing Null Check (frontend/src/util/case-phenotype-matrix.ts:1021)

While unlikely, there's a non-null assertion that could theoretically fail:

binPhenotypes.get(binId)!.add(phenotypeId);  // Line 1021

Impact: Very Low (the check on line 1019 ensures this exists)

Recommendation: Code is actually safe due to the preceding if block, but for extra defensive programming, you could use optional chaining: binPhenotypes.get(binId)?.add(phenotypeId).

3. Hardcoded Limit (frontend/src/api/case-phenotype.ts:113)

The limit is hardcoded to 500 for case associations. If a disease has more than 500 cases, some would be silently excluded.

const params = {
  category: "biolink:CaseToDiseaseAssociation",
  object: diseaseId,
  limit: 500,  // Hardcoded limit
};

Impact: Low to Medium (depends on whether diseases with >500 cases exist in your data)

Recommendation:

Add pagination logic similar to getCasePhenotypes (lines 144-160)
Or add a comment explaining why 500 is sufficient
Or add logging when cases are truncated

⚡ Performance Considerations

Strengths

Good Batching Strategy
- Case phenotype requests are batched (5 per request) to avoid URL length issues
- Pagination implemented for phenotype associations to handle large datasets
Efficient Data Structures
- Uses Maps for O(1) cell lookups by key
- Uses Sets for unique phenotype tracking
Vue Performance Best Practices
- Sticky column using CSS position: sticky (no JS scroll listeners)
- Proper use of :key in v-for loops
- Conditional rendering with v-if to hide empty sections

Recommendations

Consider Virtual Scrolling for Large Datasets

If some diseases have hundreds of cases or phenotypes, the DOM could get quite large. The current implementation renders all rows.

Impact: Medium (only affects edge cases with many phenotypes/cases)

Suggestion: Monitor performance in production. If needed, consider:
- Virtual scrolling for the table body
- Pagination for cases (showing 10-20 at a time)
- Progressive disclosure (expand one bin at a time)

Batching API Calls Could Be Optimized (frontend/src/api/case-phenotype.ts:138-160)

Currently, batches are processed sequentially. For better performance, you could process batches in parallel:

// Current: sequential batching
for (let i = 0; i < caseIds.length; i += CASE_BATCH_SIZE) {
  // ... await request
}

// Alternative: parallel batching
const batchPromises = [];
for (let i = 0; i < caseIds.length; i += CASE_BATCH_SIZE) {
  batchPromises.push(fetchBatch(caseIds.slice(i, i + CASE_BATCH_SIZE)));
}
const results = await Promise.all(batchPromises);

Impact: Medium (could significantly speed up loading for diseases with many cases)

Trade-off: More concurrent requests to the backend. Assess backend capacity first.

Sorting Bins (frontend/src/util/case-phenotype-matrix.ts:1055)

Good decision to sort bins by count descending! This puts the most relevant systems at the top.

🔒 Security Considerations

Assessment: No Major Security Issues Found ✅

XSS Protection
- All user-generated content is properly escaped by Vue's template system
- Tooltips use v-tooltip directive (verify this uses safe HTML rendering)
- Modal uses AppNodeBadge component for IDs (good encapsulation)
API Security
- No direct user input passed to API calls without going through existing API layer
- Follows existing patterns from other API files (good consistency)
- URL construction uses existing apiUrl and request utilities
Data Validation
- Good null checks throughout
- Proper handling of optional fields

Minor Notes

HTML in Tooltips (frontend/src/components/CasePhenotypeGrid.vue:336-426)

The tooltips contain HTML with inline styles. Verify that your v-tooltip directive properly sanitizes HTML to prevent XSS if tooltip content could ever come from untrusted sources.
```
html += `<br><span style="color: ${statusColor}">${statusText}</span>`;
```
Current Risk: Very Low (all content is from backend API, not user input)

Recommendation: Ensure v-tooltip uses a safe HTML renderer (DOMPurify or similar).

🧪 Test Coverage

Concerns

No tests included in this PR. The repository uses Vitest for unit tests and Playwright for e2e tests (per CONTRIBUTING.md), but this PR adds:

7 new files
1,060 lines of code
0 test files

Recommendations

Unit Tests Needed

Critical functions that should have unit tests:

findBinForPhenotype() - Test with various object_closure arrays
buildMatrix() - Test matrix building logic with mock data
getBinCellSummary() - Test summary calculations

Example test structure:

// frontend/src/util/case-phenotype-matrix.test.ts
import { describe, it, expect } from 'vitest';
import { findBinForPhenotype, buildMatrix } from './case-phenotype-matrix';

describe('findBinForPhenotype', () => {
  it('should return correct bin ID when found in closure', () => {
    const closure = ['HP:0000001', 'UPHENO:0004523', 'HP:0000002'];
    expect(findBinForPhenotype(closure)).toBe('UPHENO:0004523');
  });
  
  it('should return OTHER when no bin found', () => {
    const closure = ['HP:0000001', 'HP:0000002'];
    expect(findBinForPhenotype(closure)).toBe('OTHER');
  });
  
  it('should return OTHER when closure is undefined', () => {
    expect(findBinForPhenotype(undefined)).toBe('OTHER');
  });
});

Component Tests

Consider testing:
- CasePhenotypeGrid.vue - Test expand/collapse, cell rendering, event emission
- CasePhenotypeModal.vue - Test data display, modal open/close
E2E Tests

The test plan in the PR description is good, but should be automated:
- Navigate to disease page with cases
- Verify grid section appears
- Test expand/collapse functionality
- Test modal opening on cell click
- Test that section is hidden for diseases without cases
API Tests

Mock API responses and test:
- getCasePhenotypeMatrix() with various response scenarios
- Batching logic in getCasePhenotypes()
- Error handling

Priority

High Priority: At minimum, add unit tests for the utility functions in case-phenotype-matrix.ts before merging. These are pure functions that are easy to test and critical to the feature.

Medium Priority: Component tests for the grid and modal.

Lower Priority: E2E tests (can be added in a follow-up PR if needed).

📋 Additional Observations

Positive Notes

Excellent PR Description - The architecture documentation with ASCII art diagrams is exceptional. This makes the feature easy to understand and maintain.
Progressive Enhancement - The feature degrades gracefully (hides section) when no data is available.
Accessibility Considerations - Tooltips, proper semantic HTML (table), and keyboard interaction support.
Responsive Design - Mobile breakpoint with adjusted padding (line 600).
Clean Integration - The feature integrates seamlessly into the existing page structure without modifying much existing code.

Questions

Histopheno Bin Maintenance - The HISTOPHENO_BINS constant is hardcoded. How will this be maintained if new bins are added upstream? Consider:
- Loading from API
- Documenting update process
- Adding a fallback for unknown bin IDs
Performance Testing - Has this been tested with a disease that has many cases (50+) and many phenotypes (100+)? What's the practical upper limit?
Empty Bins - What happens if a bin has no phenotypes for any case? Should it be hidden?

📝 Summary & Recommendations

Must Fix Before Merge

❗ Remove console.log statements (lines 368, 430, 432)
❗ Fix comment typo about batch size (line 122)

Should Fix Before Merge

⚠️ Add unit tests for utility functions (especially buildMatrix and findBinForPhenotype)
⚠️ Add pagination logic for cases or document the 500-case limit

Consider for Follow-up

💡 Implement parallel batching for API calls (performance improvement)
💡 Add component tests for the grid and modal
💡 Monitor performance with large datasets and consider virtual scrolling if needed
💡 Consider Record<> instead of Map<> for cells if serialization is ever needed

Verdict

Recommendation: Approve with minor changes ✅

This is excellent work! The code is well-structured, follows best practices, and adds valuable functionality. The main concerns are:

Remove console.log statements
Add basic unit tests

Once those are addressed, this is ready to merge. The performance and test coverage items can be addressed in follow-up PRs if needed.

Great job on the comprehensive PR description and clean implementation! 🎉

🤖 Review generated with Claude Code

codecov · 2026-01-17T01:23:01Z

Codecov Report

❌ Patch coverage is 89.77206% with 175 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.36%. Comparing base (5f7e191) to head (f4daa44).
⚠️ Report is 22 commits behind head on main.

Files with missing lines	Patch %	Lines
...ckend/tests/integration/test_case_phenotype_api.py	22.72%	68 Missing ⚠️
...rch_py/implementations/solr/solr_implementation.py	68.21%	41 Missing ⚠️
...kend/tests/integration/test_case_phenotype_solr.py	35.89%	25 Missing ⚠️
...sts/integration/test_case_phenotype_performance.py	48.00%	13 Missing ⚠️
backend/src/monarch_py/api/entity_grid.py	87.30%	8 Missing ⚠️
...ckend/src/monarch_py/utils/case_phenotype_utils.py	93.58%	5 Missing ⚠️
backend/src/monarch_py/api/case_phenotype.py	93.02%	3 Missing ⚠️
...ackend/src/monarch_py/datamodels/grid_groupings.py	85.71%	3 Missing ⚠️
...rc/monarch_py/implementations/solr/solr_parsers.py	90.00%	3 Missing ⚠️
...onarch_py/implementations/solr/solr_query_utils.py	96.22%	2 Missing ⚠️
... and 3 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
+ Coverage   73.80%   79.36%   +5.55%     
==========================================
  Files          95      113      +18     
  Lines        3478     5180    +1702     
==========================================
+ Hits         2567     4111    +1544     
- Misses        911     1069     +158

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

kevinschaper · 2026-01-17T01:23:12Z

Closes #1248

- Add check for Mondo subset markers (disease_grouping, ordo_group_of_disorders, rare_grouping) to identify organizational grouping classes vs specific diseases - Skip API calls and hide grid section for grouping classes like "Ehlers-Danlos syndrome" while still showing for specific subtypes - Update grid cell styling from text circles to styled squares Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-20T21:20:19Z

Code Review: Case-Phenotype Grid Visualization

This PR adds a well-architected feature for visualizing case-phenotype relationships on disease pages. The implementation is solid overall, with clear separation of concerns and good code organization.

Strengths

Architecture & Code Quality

Excellent separation of concerns: Type definitions, API layer, business logic, and UI components are cleanly separated
Well-documented with clear JSDoc comments and comprehensive PR description
Consistent with codebase patterns: Vue 3 Composition API, TypeScript, SCSS scoped styles
Smart optimization: Batching API requests (5 cases at a time) for URL length limits
Progressive disclosure: Expandable bin rows provide good UX

Type Safety

Strong TypeScript usage with well-defined interfaces
Clear type definitions in case-phenotype-types.ts

Issues Found

1. Debug Console Logging (HIGH PRIORITY)

Location: frontend/src/components/CasePhenotypeGrid.vue:364-365, 428-430

console.log("toggleBin clicked:", binId);
console.log("handleCellClick:", caseId, phenotypeId);
console.log("cellData:", cellData);

Issue: Per CONTRIBUTING.md lines 207-208, console.log should be removed before merging PRs.

Fix: Remove these statements or convert to console.debug.

2. Missing Pagination for Disease-to-Case Associations

Location: frontend/src/api/case-phenotype.ts:107-118

The getCasePhenotypes function implements pagination, but getCasesForDisease does not. If a disease has >500 cases, they'll be silently truncated.

Impact: Medium - rare but could cause incomplete data display.

Recommendation: Add pagination or document the 500-case limit.

3. Potential Race Condition in Data Fetching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:942-950

The useQuery composable has race condition protection, but rapid navigation could cause brief UI flashes.

Impact: Low - acceptable but could be improved with debouncing.

4. Edge Case: OTHER Bin Handling

Location: frontend/src/util/case-phenotype-matrix.ts:994-1008

The OTHER bin handling works but could be clearer by adding "OTHER": "Other" to HISTOPHENO_BINS constant.

Performance Considerations

Positive:

Batched requests effectively manage API constraints
Efficient Vue reactivity with computed properties
CSS position: sticky is performant

Considerations:

Large matrices (500 cases x 200 phenotypes = 100k cells) should be tested
Bin sorting is O(n log n) but negligible with typical data volumes

Verdict: Performance looks solid.

Security Concerns

No critical issues found:

XSS Protection: Using Vue template syntax correctly
No code injection risks
Tooltip content constructed safely
API calls use parameterized requests
Publication links use rel="noopener" correctly

Test Coverage

Current State: No test files found for new functionality

Recommendations:

Unit tests for matrix building logic (buildMatrix, findBinForPhenotype, getBinCellSummary)
Unit tests for API batching
Component tests for CasePhenotypeGrid and CasePhenotypeModal
E2E integration test for the full user flow

Minor Suggestions

Convert inline comments to JSDoc format per style guidelines
Extract magic number 500 to named constant MAX_RESULTS_PER_REQUEST
Create typed key function for cells Map to prevent typos
Add aria-expanded to bin rows for accessibility

Summary

Must Fix Before Merge:

Remove debug console.log statements

Should Consider:
2. Add pagination to getCasesForDisease or document limit
3. Add unit tests for critical business logic

Nice to Have:
4. JSDoc comment style consistency
5. Named constants for magic numbers
6. aria-expanded for accessibility

Overall Assessment

This is high-quality work with excellent architecture and thoughtful design. With debug logging removed and consideration for pagination, this will be a valuable addition.

Recommendation: Approve pending removal of debug logs. Testing and pagination can follow in subsequent PRs.

Great job on this implementation! 🎉

Review conducted using Claude Code CLI

When viewing a disease with descendants, cases from descendant diseases are now properly identified and displayed: - Added tabs to toggle between direct cases only vs all cases (including descendants) - Show source disease in tooltip for cases from descendant diseases - Auto-select "All" tab when no direct cases exist Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-22T00:56:06Z

Code Review - PR #1247: Add Case-Phenotype Grid visualization

Summary

This is a well-structured PR that adds a sophisticated case-phenotype grid visualization to disease pages. The code quality is generally very good with clear separation of concerns, comprehensive documentation, and thoughtful user experience design.

✅ Strengths

Architecture & Code Organization

Excellent separation of concerns: Clear layering between API, data transformation, types, and UI components
Well-documented: Comprehensive JSDoc comments explaining complex logic
Consistent with existing patterns: Follows Vue 3 Composition API conventions
Proper TypeScript usage: Strong type safety throughout

User Experience

Progressive disclosure: Expandable bins provide clean, organized views
Informative tooltips: Rich contextual information on hover
Accessibility considerations: Proper ARIA attributes
Responsive design: Sticky columns and horizontal scrolling

🔍 Code Quality & Best Practices

Issues to Address

1. Console.log statements in production code

Location: frontend/src/components/CasePhenotypeGrid.vue:381, 446, 448

Remove debug console.log statements before merging.

2. Hardcoded color values

Location: Multiple locations in tooltip generation and CSS

Extract color values to SCSS variables for consistency and easier theming.

3. Magic numbers

Location: frontend/src/api/case-phenotype.ts:105

The CASE_BATCH_SIZE of 5 could use more detailed documentation explaining why this specific value.

🐛 Potential Bugs & Edge Cases

1. Race condition with tab switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:1038-1041

Rapid tab switching could cause multiple API requests with responses arriving out of order. Consider adding request cancellation.

2. Missing null checks

Location: frontend/src/util/case-phenotype-matrix.ts:1158

Replace non-null assertion operator (!) with proper null checks.

⚡ Performance Considerations

Sequential API batching

Location: frontend/src/api/case-phenotype.ts:148-170

Consider using Promise.all() for parallel batching to significantly reduce load time when there are many cases.

O(n) lookups in rendering

Location: frontend/src/components/CasePhenotypeGrid.vue

The getPhenotypeLabel function does O(n) array.find() on every render. Memoize with a computed Map for better performance.

🧪 Test Coverage - HIGH PRIORITY

This PR adds 1,197 lines of code across 7 files with 0 test files.

Tests Needed:

Unit tests for case-phenotype-matrix.ts utility functions
API batching logic tests
Component tests for grid expansion, cell clicks, tooltips
E2E tests for disease page integration

Example:

// frontend/unit/case-phenotype-matrix.test.ts
test('findBinForPhenotype returns OTHER for unknown closure', () => {
  expect(findBinForPhenotype(['HP:0000001'])).toBe('OTHER');
  expect(findBinForPhenotype(undefined)).toBe('OTHER');
});

🎯 Priority Action Items

Must Fix Before Merge

❗ Remove console.log statements
❗ Add unit tests for case-phenotype-matrix.ts

Should Fix Before Merge

⚠️ Extract hardcoded colors to SCSS variables
⚠️ Add request cancellation for tab switching
⚠️ Replace non-null assertions with proper null checks
⚠️ Add basic E2E test for disease page integration

Consider for Future PR

Optimize batching with parallel requests
Memoize phenotype label lookups
Add keyboard navigation for accessibility

Overall Assessment

Rating: 8/10 - Very Good

Substantial, well-architected feature with strong TypeScript/Vue 3 implementation. Main concerns are test coverage and minor code quality issues. Once critical items are addressed, this will be an excellent addition.

Great work on this feature! 🎉

…optimization Implements the case-phenotype matrix backend per the TDD implementation plan: - Add /v3/api/case-phenotype-matrix/{disease_id} endpoint - Use Solr JOIN query to fetch all phenotype associations in single request - Add facet queries for HistoPheno bin counts - Support direct_only parameter for filtering to exact disease matches - Add limit enforcement with clear error messages - Include data models: CasePhenotypeMatrixResponse, CaseEntity, CasePhenotype, HistoPhenoBin, CasePhenotypeCellData - Add HISTOPHENO_BIN_LABELS mapping for human-readable bin names - Add comprehensive unit and integration tests This reduces frontend API calls from 40+ to 1 for building the case-phenotype grid. Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-23T23:37:03Z

Code Review: PR 1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! The implementation is well-structured with good separation of concerns.

OVERALL ASSESSMENT

Strengths:

Excellent test coverage with 60+ tests
Clean architecture with clear separation between API, business logic, and presentation
Well-documented code
Good use of modern Python and TypeScript patterns

Critical Issues to Address:

XSS vulnerabilities in frontend tooltip/link rendering
Unreachable code/potential runtime error in bin assignment logic
Missing HTTP request timeout (DoS risk)
Race conditions in frontend data fetching

CRITICAL ISSUES (MUST FIX)

Backend: Unreachable Code in _find_bin_for_phenotype
File: backend/src/monarch_py/utils/case_phenotype_utils.py:875-899
Line 899 will raise KeyError when intersection is empty. The return intersection.pop() contradicts the comment.
Fix: Return None instead of intersection.pop()
Backend: Missing HTTP Timeout
File: backend/src/monarch_py/implementations/solr/solr_implementation.py:620
Request could hang indefinitely on slow/unresponsive Solr.
Fix: Add timeout parameter: requests.get(url, timeout=30)
Frontend: XSS Vulnerability in Tooltip HTML
File: frontend/src/components/CasePhenotypeGrid.vue:128-162
Case labels/IDs are not HTML-escaped in tooltip strings.
Fix: Escape all user-provided data in getCaseTooltip(), getBinTooltip(), and getCellTooltip()
Frontend: XSS in External Links
File: frontend/src/components/CasePhenotypeModal.vue:54-55
Publication URLs are not validated and could use javascript: protocol.
Fix: Validate URLs to only allow http/https protocols and add rel=noopener noreferrer

HIGH PRIORITY ISSUES

Backend: Overly Restrictive MONDO ID Pattern
File: backend/src/monarch_py/api/case_phenotype.py:23
Requires exactly 7 digits, will reject valid IDs with different lengths.
Frontend: Performance - O(n) Lookups in Render Loop
File: frontend/src/components/CasePhenotypeGrid.vue:180-183
getPhenotypeLabel() performs find() on entire array for every cell. With 100 cases × 500 phenotypes = 50,000 iterations.
Fix: Use computed Map for O(1) lookups
Frontend: Race Condition in Data Fetching
File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:204-230
No cancellation of in-flight requests when component unmounts.
Fix: Use AbortController
Frontend: Inefficient Count Fetching
File: frontend/src/api/case-phenotype.ts:210-216
Makes two full API calls with limit=1000 just to get counts.

MEDIUM PRIORITY ISSUES

Backend: Silent exception swallowing without logging (solr_implementation.py:633-637)
Backend: Unused import urlencode (solr_implementation.py:604)
Frontend: Debug console.log statements (CasePhenotypeGrid.vue:166, 231, 233)
Frontend: Fragile error parsing with regex (case-phenotype.ts:105, 228, 240)
Frontend: Missing ARIA attributes for accessibility

TEST COVERAGE: 7/10

Good unit test coverage but integration tests need strengthening. Missing tests for: Solr connection failures/timeouts, concurrent requests, Unicode/special characters, security (injection, XSS), and boundary values.

SUMMARY CHECKLIST

Before merging:

Fix unreachable code in _find_bin_for_phenotype
Add HTTP timeout to Solr requests
Fix XSS vulnerabilities in tooltips and links
Fix race condition in data fetching
Optimize phenotype label lookups
Fix MONDO ID regex pattern
Remove debug console.log statements
Add ARIA attributes for accessibility

Great work on this feature! Once the critical security and bug fixes are applied, this will be ready to merge.

- Use node.association_counts to check case count before API calls - Skip expensive matrix API for diseases with 0 or >100 cases - Move grid to render after Cases section for better layout - Remove unused getCaseCounts() function - Reduce limit from 1000 to 100 cases for performance Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-24T00:09:50Z

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! I've completed a thorough review. Overall, the implementation is well-structured with good separation of concerns.

Summary

High Severity: 7 issues (Security, data consistency, validation)
Medium Severity: 16 issues (Error handling, performance, logic)
Low Severity: 6 issues (Code quality, documentation)

🔴 High Priority - Must Fix Before Merge

1. Solr Injection Risk

File: backend/src/monarch_py/implementations/solr/solr_query_utils.py:374, 425

The disease_id is not escaped before insertion into Solr query. Apply escape() function.

2. XSS via HTML Injection

File: frontend/src/components/CasePhenotypeGrid.vue:128-139

User labels concatenated directly into HTML. Use proper escaping or DOM construction.

3. Dead Code

File: frontend/src/util/case-phenotype-matrix.ts:52-148

The buildMatrix() function is never used - remove it and findBinForPhenotype().

4. Race Condition in Tab Switching

File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:210-215

Setting selectedTab triggers watcher which calls fetchMatrix() again. Add guard flag.

5. Missing Response Validation

File: frontend/src/api/case-phenotype.ts:118

Validate backend response structure before transforming to prevent crashes.

6. Console Logs in Production

File: frontend/src/components/CasePhenotypeGrid.vue:166, 231, 233

Remove debug console.log statements.

7. Silent Phenotype Filtering

File: backend/src/monarch_py/utils/case_phenotype_utils.py:103-104

Phenotypes without HistoPhenoBin matches are silently dropped. Log warnings or include in OTHER bin.

🟡 Medium Priority

Overly broad exception handling (catches all Exception types)
Python 3.8 type compatibility (use typing.Tuple not tuple)
Fragile error message regex parsing
Missing frequency field display
Publication links missing noreferrer
Test coverage gaps for edge cases
Integration test fragility

🟢 Low Priority

Unused variables
Documentation typos
Hardcoded field names
UI flicker during loading

✅ Strengths

Clean architecture with good separation of concerns
Excellent PR documentation with diagrams
Comprehensive testing (with noted gaps)
Backend query optimization is excellent
Good UX with expandable bins and tooltips
Strong type safety

Great work! Please address the 7 high-priority issues before merging. Happy to discuss any feedback.

claude · 2026-01-28T22:22:14Z

Pull Request Review: Case-Phenotype Grid Visualization

Overview

This PR adds a sophisticated visualization feature for disease pages that displays a matrix of cases (phenopacket patients) and their phenotypes, grouped by body systems. The implementation includes both backend and frontend components with comprehensive test coverage.

Code Quality & Best Practices

✅ Strengths

Well-Documented Code: Excellent inline documentation throughout, especially in query building functions (e.g., build_case_phenotype_query with detailed explanation of JOIN logic)
Strong Type Safety: Comprehensive TypeScript interfaces and Python datamodels with proper typing throughout
Separation of Concerns: Clear architectural layering:
- API endpoint validation (case_phenotype.py:20-48)
- Query building (solr_query_utils.py:331-429)
- Data transformation (case_phenotype_utils.py)
- UI components (Grid, Modal, Section)
Comprehensive Test Coverage: Multiple test levels:
- Unit tests for matrix building, query construction, bin assignment
- Integration tests for API and Solr queries
- Performance tests
Performance Optimization:
- Uses association_counts to gate expensive API calls (SectionCasePhenotypeGrid.vue:116-121)
- Single JOIN query replaces 40+ sequential requests
- Configurable limits with clear error messages
User Experience:
- Progressive enhancement (loading states, error handling)
- Automatic tab switching when no direct cases exist (SectionCasePhenotypeGrid.vue:211-223)
- Informative tooltips and modal details

Potential Issues & Concerns

🔴 Critical

1. Hardcoded Row Limit May Cause Data Loss

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:334

def build_case_phenotype_query(
    disease_id: str,
    direct_only: bool,
    rows: int = 50000,  # ⚠️ Hard limit
)

Issue: If a disease has many cases with many phenotypes each, the phenotype query could return >50,000 rows and silently truncate results. For example, 100 cases × 500 phenotypes each = 50,000 associations (already at the limit).

Impact: Grid would show incomplete data without warning to the user.

Recommendation:

Add pagination or streaming for large result sets
Validate that numFound matches returned rows and warn if truncated
Calculate reasonable limit based on case count (e.g., rows = min(50000, cases * 1000))

2. Missing Input Sanitization in ID Normalization

Location: backend/src/monarch_py/api/case_phenotype.py:38-40

normalized = disease_id.strip().upper()
if "_" in normalized and ":" not in normalized:
    normalized = normalized.replace("_", ":", 1)  # Only replaces first occurrence

Issue: The replacement logic replace("_", ":", 1) only replaces the first underscore. This could create invalid IDs like:

Input: "MONDO_0007_078" → Output: "MONDO:0007_078" (invalid)
Input: "mondo__0007078" → Output: "MONDO:_0007078" (invalid)

Recommendation: Either:

Remove the normalization entirely (rely on regex validation)
Or use: normalized.replace("_", ":", 1).replace("_", "") to strip remaining underscores

🟡 High Priority

3. Potential Race Condition in Tab Switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:211-223

if (isDirect && result && result.totalCases === 0) {
  if (caseCountFromAssociations.value > 0) {
    selectedTab.value = "all";
    matrix.value = await getCasePhenotypeMatrix(props.node.id || "", {
      direct: false,
      limit: MAX_CASES_LIMIT,
    });
    return;
  }
}

Issue: User could click tabs rapidly while auto-switching is happening, causing state inconsistency.

Recommendation: Add loading lock:

const isFetching = ref(false);

async function fetchMatrix() {
  if (isFetching.value) return;
  isFetching.value = true;
  try {
    // ... existing code
  } finally {
    isFetching.value = false;
  }
}

4. Unclear Error Messages for Bin Assignment Failure

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:101-104

bin_id = _find_bin_for_phenotype(object_closure, bin_ids)
if bin_id is None:
    continue  # Silently skips phenotype

Issue: Phenotypes without valid bins are silently dropped. Users won't know why expected phenotypes are missing from the grid.

Recommendation: Log a warning or track count of skipped phenotypes and include in response metadata.

5. Inconsistent Limit Defaults

The limit default varies across the codebase:

API endpoint: limit: int = 1000 (case_phenotype.py:99)
Solr implementation: limit: int = 200 (solr_implementation.py:579)
Frontend section: MAX_CASES_LIMIT = 100 (SectionCasePhenotypeGrid.vue:110)

Issue: This creates confusion about the actual enforced limit.

Recommendation: Define a single constant shared across backend/frontend or document why different layers have different defaults.

🟢 Low Priority / Nice-to-Have

6. Magic Numbers in Vue Component

Location: frontend/src/components/CasePhenotypeGrid.vue:56-68

The tooltip formatter hardcodes HTML structure and styling. Consider extracting to a separate formatter function for testability.

7. Potential Memory Issue with Large Maps

Location: frontend/src/api/case-phenotype.ts:168-177

const cells = new Map<string, CasePhenotypeCellData>();
for (const [key, cell] of Object.entries(data.cells)) {
  cells.set(key, {...});
}

For 100 cases × 100 phenotypes = 10,000 Map entries. This is fine for current limits, but document maximum expected size.

8. Missing Accessibility Features

Grid cells and bin rows need aria-label attributes for screen readers
Keyboard navigation not implemented for grid traversal
Modal doesn't trap focus when open

9. Test Coverage Gaps

While test coverage is generally good, I noticed:

No integration test for the auto-tab-switching behavior
Missing test for case limit exceeded error in frontend
No test for the MONDO ID normalization edge cases mentioned above

Performance Considerations

✅ Good:

JOIN query optimization reduces network overhead significantly
Facet queries run in parallel with main query
Frontend uses association_counts to gate expensive calls
Computed properties minimize re-renders

⚠️ Watch:

Large Maps (10k+ entries) could impact memory on older devices
Re-fetching entire matrix on tab switch (could cache both tabs)
No debouncing on rapid navigation

Security Concerns

✅ Low Risk: This feature is read-only and doesn't handle user-generated content.

Minor Concerns:

ID Injection: The MONDO ID regex (^MONDO:\d{7}$) prevents injection, but the normalization logic (issue Host from GCP #2 above) happens before validation
No Rate Limiting: Documented on API endpoint but consider adding client-side throttling for navigation events

Summary & Recommendations

This is a high-quality, well-architected feature with excellent documentation and test coverage. The main concerns are:

Must Fix Before Merge:

Row limit truncation (Critical) - Add validation/pagination
ID normalization logic (Critical) - Fix or remove

Should Fix Before Merge:

Tab switching race condition (High) - Add loading lock
Standardize limit defaults (High) - Document or unify

Consider for Follow-up PR:

Accessibility improvements
Error logging for skipped phenotypes
Client-side caching of both tabs

Overall assessment: Approve with requested changes 🟢

The architectural decisions are sound, performance optimizations are effective, and the feature provides clear value. Addressing the critical issues will make this production-ready.

Great work on the comprehensive test coverage and detailed documentation! 🎉

🤖 Generated via Claude Code PR review

The API returns null when totalCases === 0, but the auto-switch logic was checking for `result && result.totalCases === 0` which never matched. Changed to `!result` so diseases with only indirect cases now correctly auto-switch to the All tab on load. Co-Authored-By: Claude Opus 4.5 <[email protected]>

Solr creates multiValued fields by default when the schema isn't pre-defined. This causes validation errors when new scalar fields (like has_biological_sex) are added to the Pydantic model but Solr returns them as lists. Added normalize_solr_doc_for_model() utility that inspects model type hints and converts single-element lists to scalars for fields expecting scalar types. Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-29T03:31:47Z

PR Review: Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! I've conducted a thorough review covering code quality, security, performance, and test coverage.

🔴 CRITICAL ISSUES (Must Fix Before Merge)

1. Solr Injection Vulnerability

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:373-374, 425

The code directly injects disease_id into Solr queries without proper escaping. While the API validates the disease_id format with regex, defense-in-depth principles require sanitization at the query construction level.

Fix: Use the existing escape() utility from monarch_py.utils.utils

2. Missing Dependencies - Broken Imports

Location: Multiple frontend files

The following imports reference files that do not exist in this PR:

frontend/src/api/case-phenotype-types.ts imports from "./entity-grid/types"
frontend/src/components/CasePhenotypeGrid.vue imports EntityGrid components
frontend/src/pages/PageTestbed.vue imports EntityGrid components

Required files missing:

/frontend/src/api/entity-grid/types.ts
/frontend/src/components/EntityGrid/EntityGrid.vue
/frontend/src/components/EntityGrid/EntityGridModal.vue
/frontend/src/components/EntityGrid/entity-grid-utils.ts

TypeScript compilation will fail without these files.

🟡 HIGH PRIORITY ISSUES

3. Network Error Handling

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:782-784

Network failures propagate as generic exceptions. Wrap requests in try-except to provide meaningful error messages (503 for network errors, 502 for JSON decode errors).

4. TypeScript Type Safety

Location: frontend/src/components/CasePhenotypeGrid.vue:76

Using any type defeats TypeScript's purpose. Should use CellData | null

5. Uncaught Promise Rejection

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:218

The nested getCasePhenotypeMatrix call in auto-switch logic could throw but isn't properly wrapped in error handling.

🟢 MEDIUM PRIORITY ISSUES

6. Missing JSDoc Comments

Per frontend/CONTRIBUTING.md:193, use JSDoc comments instead of regular comments. Missing on computed properties and functions in multiple files.

7. Accessibility Issues

Location: frontend/src/components/CasePhenotypeModal.vue

Should use semantic HTML (dl, dt, dd) instead of divs
Publication links don't indicate they open in new tab
Status colors rely only on color (add icons or text indicators)

8. Vague Exception Messages

Location: backend/src/monarch_py/api/case_phenotype.py:136-142

Broad exception catching masks real errors (network, Solr) as "not found". Add logging and be more specific.

9. Unreachable Code

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:117-141

Fallback return statement is unreachable because the loop iterates all possible bin IDs.

📊 TEST COVERAGE ANALYSIS

Backend Tests: B+ Grade

✅ Strengths: Excellent unit test coverage (889 lines), good integration tests, parametrized tests
⚠️ Gaps:

No tests for 100 case limit mentioned in PR (tests use 200)
No tests for disease grouping/class filtering
No tests for phenotypes without object_closure (OTHER bin fallback)
Performance tests only cover small/empty diseases

Frontend Tests: F Grade

❌ CRITICAL GAP: No frontend tests exist for:

Utility functions (case-phenotype-matrix.ts)
API layer (case-phenotype.ts)
Vue components
No E2E tests for grid interactions

Recommendation: Add unit tests for core functions and E2E test for the complete user flow.

✅ POSITIVE ASPECTS

Excellent architecture - Clear separation between API, query builders, and utilities
Good type safety - TypeScript interfaces are well-defined
Smart gating - Using association_counts to avoid expensive calls
Performance optimization - Backend consolidates 40+ API calls into 1 Solr query
Comprehensive PR description - Clear architecture diagrams
Proper Vue 3 patterns - Using Composition API correctly

📝 RECOMMENDATIONS

Before Merge:

Fix Solr injection vulnerability (add escape() calls)
Add missing EntityGrid component files or refactor imports
Fix TypeScript any type usage
Improve error handling in async auto-switch logic
Add JSDoc comments per style guide
Improve accessibility (semantic HTML, ARIA labels)

Follow-up PRs:

Add frontend unit and E2E tests
Test edge cases (100+ cases, missing closure, disease grouping)
Consider virtual scrolling for large grids
Extract magic numbers to configuration
Add performance monitoring

🎯 SUMMARY

This is a valuable feature with solid backend implementation. Main concerns:

Security: Solr injection must be fixed
Missing files: EntityGrid components must be added
Frontend testing: No tests exist for frontend code
Error handling: Several improvements needed

Once the critical issues are addressed, this will be a great addition to the Monarch App!

Phenotypes can belong to multiple body system bins (e.g., an eye phenotype is also a head/neck phenotype). Previously, each phenotype was assigned to only ONE bin based on priority order, causing bins like Eye, Ear, and Connective Tissue to show counts but have no phenotypes to display. Changes: - Add phenotype_ids field to HistoPhenoBin model - Build phenotype_ids by checking each phenotype's closure against ALL bins, not just the first matching one - Frontend now uses phenotype_ids directly from backend response This allows the same phenotype to correctly appear in multiple bin expansions (e.g., "Supranuclear gaze palsy" appears in both Eye and Head/Neck bins). Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-29T04:27:23Z

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! The case-phenotype matrix visualization is well-architected with excellent separation of concerns. The PR includes 889 lines of new tests across 7 test files, demonstrating strong testing discipline.

Overview

Architecture: ✅ Excellent

Clean separation: API → Solr implementation → utility functions
Well-structured frontend with proper Vue 3 Composition API usage
Comprehensive data models with LinkML schema definitions

Test Coverage: ✅ Strong

7 test files (5 unit, 3 integration) covering core functionality
Performance benchmarks included
Total additions: 3,306 lines with 889 test lines (~27% test coverage)

Critical Issues 🔴

1. Security: Potential Solr Query Injection

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:371-375, 425

The disease_id parameter is directly interpolated into Solr queries without proper escaping. While the API layer validates with regex MONDO:\d{7}, the utility functions are reusable and could be called without validation.

Recommendation: Apply the escape() function consistently (as done in build_multi_entity_association_query() at line 156)

2. Frontend: Missing Component Dependencies

Location: frontend/src/components/CasePhenotypeGrid.vue:24-25

References components that don't exist in the codebase. This will prevent compilation.

Action Required: Verify these dependencies exist or provide implementations before merge.

3. XSS Risk: Unsanitized HTML in Tooltips

Location: frontend/src/components/CasePhenotypeGrid.vue:56-64

HTML is constructed from user data without sanitization. If backend returns malicious content, it could execute JavaScript when rendered.

Recommendation: Use text interpolation with structured DOM elements instead of HTML strings, or sanitize with DOMPurify.

High Priority Issues 🟠

4. Overly Broad Exception Handling

Location: backend/src/monarch_py/api/case_phenotype.py:138-142

Catches ALL exceptions and masks unexpected errors. Makes debugging difficult.

Fix: Replace with specific exception types and add logging.

5. Silent Data Loss: Phenotypes Without Bins

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:101-104

Phenotypes that don't match HistoPheno bins are silently excluded without logging.

Recommendation: Add logging and consider an "Uncategorized" bin.

6. Fragile Regex Error Parsing

Location: frontend/src/api/case-phenotype.ts:105-113

Regex assumes exact backend error format. Missing radix parameter on parseInt() and no NaN validation.

Fix: Add radix parameter, validate extracted numbers, and add logging for parse failures.

Medium Priority Issues 🟡

7. Performance: Redundant Entity Name Fetch

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:742-799

Makes an additional network call after already fetching case and phenotype data.

Solution: Extract disease name from case association docs.

8. Input Validation: Incomplete Normalization

Location: backend/src/monarch_py/api/case_phenotype.py:39-40

Only replaces FIRST underscore. Edge case: MONDO_0007_078 becomes MONDO:0007_078 (invalid).

9. Type Safety: Unsafe any Type

Location: frontend/src/components/CasePhenotypeGrid.vue:75-76

Using any type disables type checking. Should use proper type or unknown with narrowing.

10. Missing Logging Throughout

No logging for debugging or performance monitoring. Would be valuable for production troubleshooting.

Positive Highlights ✨

Excellent Test Coverage: 889 lines across 7 test files with unit, integration, and performance tests
Smart Query Optimization: Uses Solr JOIN queries to avoid N+1 pattern
Proper Vue 3 Patterns: Clean Composition API usage with TypeScript
Performance Guardrails: 1-100 case limit prevents expensive queries
Comprehensive Documentation: LinkML schema definitions and detailed docstrings
Custom Error Classes: CaseLimitExceededError provides clear UX feedback
Defense in Depth: Multiple validation layers (API, Solr, client)

Test Coverage Summary

Type	Files	Lines	Coverage
Unit Tests	5	602	Models, builders, queries, bins
Integration Tests	3	287	API endpoints, Solr, performance
Total	8	889	27% of PR additions

Strong areas: Core matrix building, model validation, API contract

Gap: Frontend components lack unit tests (Vue Test Utils recommended)

Recommendations for Merge

Must Fix Before Merge:

✅ Resolve missing EntityGrid component references (build blocker)
✅ Apply escape() to disease_id in Solr queries (security)
✅ Sanitize HTML in tooltip formatters (XSS prevention)
✅ Replace broad exception handlers with specific types

Should Fix Before Merge:

Add logging for silent data loss scenarios
Fix regex parsing fragility with proper validation
Remove redundant entity name fetch

Can Address in Follow-up:

Add frontend component unit tests
Improve type safety (remove any types)
Add performance monitoring instrumentation

Overall Assessment

This is a well-engineered feature with strong architectural foundations and comprehensive backend testing. The critical issues are fixable and primarily involve applying existing security patterns consistently and resolving missing dependencies.

Once the 4 critical issues are addressed, this PR will be in excellent shape for merge. Great work on the test coverage and documentation!

Reviewed by: Claude Sonnet 4.5

Files Reviewed: 26 files (15 backend, 9 frontend, 8 tests)

Review Date: 2026-01-29

Backend (Phase 1): - New /entity-grid/{context_id} API endpoint with configurable traversals - Grid configs for association categories and row grouping - Grid groupings for histopheno row organization - Entity grid utilities for column building and cell aggregation - Extend model with GridColumnEntity and GridResponse types Frontend (Phase 2): - EntityGrid.vue: Generic configurable grid component with diagonal headers - EntityGridModal.vue: Detail modal for cell inspection - Entity grid API client with type-safe queries - Grid utilities for column grouping and cell formatting Playground (Phase 3): - PageGridPlayground.vue for interactive grid testing - Entity search, association category selection, row grouping options Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-30T18:26:40Z

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive feature addition! This is a well-structured PR with excellent documentation and test coverage. However, I've identified several critical security and code quality issues that should be addressed before merging.

Critical Issues (Must Fix)

1. Solr Injection Vulnerability (Backend)

Files: backend/src/monarch_py/implementations/solr/solr_query_utils.py
Lines: 369-375, 422-428, 453-456

User-provided IDs are interpolated directly into Solr queries without escaping. Apply the escape() utility function to all user-provided IDs.

2. XSS Vulnerability via Tooltip HTML (Frontend)

Files: frontend/src/components/EntityGrid/EntityGrid.vue, CasePhenotypeGrid.vue
Lines: EntityGrid.vue: 184-312, CasePhenotypeGrid.vue: 58-66

Tooltip HTML is constructed from unsanitized API data. Use DOMPurify or replace with Vue template rendering.

3. Missing Pagination (Backend API)

Current API returns ALL results up to limit in single response. Add pagination with offset/limit parameters.

4. Broad Exception Catching (Backend)

File: backend/src/monarch_py/api/case_phenotype.py
Lines: 136-148

ALL exceptions converted to 404 errors. Catch specific exceptions instead.

5. TypeScript any Type (Frontend)

File: frontend/src/components/CasePhenotypeGrid.vue
Lines: 77-78

Replace any type with proper type definition.

High Priority Issues

6. No Accessibility Support

Missing keyboard navigation, ARIA attributes, and focus indicators throughout grid components.

7. Memory/Performance Issues

Backend default 50,000 rows, frontend no virtual scrolling for large grids.

8. No Rate Limiting

Public endpoints with no throttling for expensive JOIN queries.

Medium Priority

9. Code Duplication

Entity validation logic duplicated across files.

10. Inefficient Nested Loop

case_phenotype_utils.py lines 195-200: Use set intersection instead.

11. MONDO ID Validation

Regex only accepts 7 digits, should accept 1-7.

Strengths

Excellent test coverage (15 test files)
Comprehensive documentation
Well-structured code
Performance-conscious (Solr JOIN queries)
Good error handling

Risk Assessment

Security: HIGH (Injection, XSS)
Performance: MEDIUM (Large result sets)
Accessibility: MEDIUM (No keyboard nav)
Maintainability: LOW (Well-structured)

Action Plan

Fix critical security issues (1-5)
Add security tests
Follow-up PR for accessibility/performance (6-8)

Overall solid implementation. Address security vulnerabilities before merge.

Uses a reverse JOIN query to filter column entities that have no second-hop (row) associations. For ortholog-phenotype grids, this excludes orthologs without phenotype data from the grid columns. The JOIN query finds column associations where the column entity appears as the context in at least one row association, reducing network transfer and eliminating empty columns from the response. Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-30T18:53:32Z

PR Review: Case-Phenotype Grid Visualization

Comprehensive review completed for PR #1247. This is impressive, well-architected work with excellent test coverage.

Strengths

Architecture & Design

Clean separation of concerns across API, data models, business logic, and UI
Generic grid framework supporting multiple types (case-phenotype, disease-phenotype, ortholog-phenotype)
Configuration-driven approach makes adding new grid types easy
Reusable EntityGrid.vue component

Code Quality

Strong type safety with TypeScript interfaces and Python dataclasses
Excellent documentation with clear docstrings and comments
Well-structured, focused utility functions
Proper input validation with regex patterns

Test Coverage

Comprehensive unit, integration, and performance tests
Edge cases covered: empty inputs, limits exceeded, invalid IDs, wrong entity types

Issues Found

Medium Priority

Solr Injection Risk (backend/src/monarch_py/implementations/solr/solr_query_utils.py)
- escape() function only handles colons, not other Solr special characters
- Risk mitigated by regex validation, but consider comprehensive escaping
Large Result Sets (solr_query_utils.py:405)
- Default 50,000 max rows could cause memory issues
- Consider pagination or document safety
Error Handling (solr_implementation.py:637)
- Silently returns empty list on Solr errors
- Add validation for expected response structure
Missing Type Annotation (case_phenotype.py:51)
- _is_disease_entity() parameter lacks type hint

Low Priority

Consider batch loading for multiple grids
Extract magic numbers to constants
Document test data choices

Assessments

Security: Low Risk - Input validation, no user-generated content in queries, proper error handling

Performance: Good - Limit parameters, efficient JOIN queries, faceted queries, performance tests included

Recommendations

Should Fix Soon:

Enhance Solr query escaping
Add explicit Solr response validation
Add type hints to _is_disease_entity()

Nice to Have:

Response caching
Extract frontend constants
Query performance monitoring

Summary

Category	Rating
Code Quality	5/5
Security	4/5
Performance	4/5
Test Coverage	5/5
Maintainability	5/5

Overall: High-quality work demonstrating strong engineering practices. The generic grid framework is particularly well-designed. Issues identified are minor and preventative.

Great job!

- Left-align phenotype/system column content in EntityGrid - Default to showing column labels and category grid lines in playground - Replace checkbox/radio controls with styled toggle switches showing on/off state Co-Authored-By: Claude Opus 4.5 <[email protected]>

claude · 2026-01-30T22:45:59Z

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Thank you for this comprehensive PR! This is a well-architected feature with excellent separation of concerns and solid documentation.

Overall Assessment

Strengths:

Clear architecture with good separation between API layer, utilities, and data models
Strong typing throughout (Pydantic models + TypeScript)
Comprehensive test coverage including unit, integration, and performance tests
Excellent documentation in the PR description

Areas Requiring Attention:

Security: Input validation and escaping needs strengthening
Bug Risk: Edge cases in cell key generation and race conditions
Performance: Opportunities for caching and optimization

CRITICAL Issues (Please Address Before Merge)

1. Solr Injection Risk - Input Escaping Insufficient

Location: backend/src/monarch_py/implementations/solr/solr_query_utils.py:371-375

While case_phenotype.py has strict MONDO ID validation, these query-building functions are reusable and could be called from contexts with looser validation. Direct string interpolation creates injection risk. The escape function in utils.py only escapes colons, not all Solr special characters.

2. Generic Entity Grid Lacks Input Validation

Location: backend/src/monarch_py/implementations/solr/solr_implementation.py:742-820

The context_id parameter is not validated before being used in Solr queries.

3. Cell Key Collision Risk

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:220

If a case ID contains a colon (e.g., PMID:12345), parsing will fail. Use a safer delimiter.

HIGH Priority Issues

4. Race Condition in Tab Switching

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:186-232

Multiple rapid tab switches could result in stale data. Implement request cancellation using AbortController.

5. Automatic Tab Switching Without User Notification

Location: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:213-221

The code silently switches tabs which could confuse users. Add a notification.

6. Missing Security Test Coverage

No tests for injection attempts, malformed IDs, or special characters.

MEDIUM Priority Issues

7. Bin Assignment Fallback is Non-Deterministic

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:117-141

8. Silent Filtering of Unbinned Phenotypes

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:100-104

9. Error Message Parsing is Brittle

Location: frontend/src/api/case-phenotype.ts:100-116

10. Inefficient Bin Building Loop

Location: backend/src/monarch_py/utils/case_phenotype_utils.py:188-200

Recommendations

Before Merge:

Implement comprehensive Solr escaping
Add input validation to generic entity grid
Fix cell key delimiter collision
Add security test cases

Short-term:
5. Implement request cancellation
6. Add user notifications
7. Expand test coverage
8. Fix error message parsing

Long-term:
9. Add caching layer
10. Implement pagination
11. Add rate limiting

Conclusion

This is a well-designed feature with solid architecture. However, the security concerns around input validation and Solr injection should be addressed before merging to production.

Great work overall!

…lection Adds subject/object category metadata to association type mappings, a new API endpoint to query traversable associations by entity type, and frontend logic to dynamically populate column/row options based on entity category. Co-Authored-By: Claude Opus 4.6 <[email protected]>

claude · 2026-02-10T19:04:16Z

Code Review: PR #1247 - Case-Phenotype Grid Visualization

Executive Summary

This PR delivers excellent architectural improvements, reducing API calls from 40+ to 1 through efficient Solr JOIN queries. The implementation is well-designed, type-safe, and includes comprehensive test coverage. However, several security, performance, and code quality issues should be addressed before merging.

Recommendation: ✅ Approve with Changes (address high-priority items first)

🎯 Positive Findings

Architecture & Design Excellence

Brilliant Query Optimization: The Solr JOIN query approach replacing 40+ sequential API calls with a single 2-hop query is a significant performance win
Generic & Reusable: The entity grid infrastructure is well-designed for extensibility with configuration-based patterns
Clear Separation of Concerns: Clean layering in both backend (API → Implementation → Utils) and frontend (API Client → Components → Types)
Strong Type Safety: Comprehensive TypeScript typing with proper transformations between snake_case/camelCase
Comprehensive Testing: 7 backend test files covering unit, integration, and performance tests with realistic thresholds

🚨 Issues Found

CRITICAL - Must Fix Before Merge

1. Solr Injection Vulnerability (Security)

Files: backend/src/monarch_py/implementations/solr/solr_query_utils.py:374, 425

Disease IDs are inserted into Solr queries with inadequate escaping. The escape() function only handles colons, missing special Solr characters like ", *, (, ), OR, AND.

Risk: While the API validates MONDO IDs with regex, if these query builders are reused elsewhere, this creates injection vulnerabilities.

Fix: Enhance escape function to handle all Solr special characters

MAJOR - Should Fix Before Merge

2. Race Condition in Frontend Data Fetching (Bugs)

File: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:261-263

The watch on route changes can cause race conditions where stale data displays if route changes during a fetch.

Fix: Add AbortController to cancel in-flight requests when route changes.

3. Performance: Inefficient Bin Summary Calculation (Performance)

File: frontend/src/components/EntityGrid/entity-grid-utils.ts

getBinCellSummary recalculates on every render. With 20 bins × 100 columns = 2,000 calls per render.

Fix: Memoize summaries in a computed property with Map cache.

4. No Upper Limit Validation on Solr Results (Performance)

File: backend/src/monarch_py/implementations/solr/solr_query_utils.py:63

rows=50000 is hardcoded. A disease with 100 cases × 500 phenotypes = 50,000 associations could hit this limit silently.

Fix: Add explicit limit checking and return warnings when approaching limits.

5. Missing Null Checks in Cell Data Access (Bugs)

File: frontend/src/components/EntityGrid/EntityGrid.vue:255-263

The isCellPresent function doesn't validate cell data structure.

Fix: Add validation to warn when cell data is missing expected fields.

MINOR - Can Address in Follow-up

6. XSS Risk in Tooltip HTML (Security)

Tooltip content constructed with raw HTML from user data. Fix with DOMPurify or component slots.

7. Expanded Bin State Not Reset (Bugs)

expandedBin ref persists across matrix changes if component is reused.

8. Inconsistent Limit Values (Code Quality)

Frontend MAX_CASES_LIMIT = 100 but backend allows up to 1000. Document why they differ.

9. Missing Edge Case Tests (Testing)

No tests for Unicode characters in case IDs, empty bins, negated-only phenotypes, or partial Solr results.

10. Hardcoded Bin Definitions (Maintainability)

HistoPheno bins defined separately in backend and frontend. Consider single source of truth.

📊 Analysis by Category

Code Quality: ⭐⭐⭐⭐ (Very Good)

✅ Excellent type hints and dataclasses
✅ Strong TypeScript typing
✅ Proper Vue 3 Composition API patterns
✅ Good inline documentation

Performance: ⭐⭐⭐⭐ (Great)

✅ Single JOIN query replaces 40+ calls
✅ Facet queries for aggregations
✅ Smart filtering with association_counts
✅ Performance tests with 2-3s thresholds
⚠️ Missing memoization for bin summaries

Security: ⭐⭐⭐ (Good)

✅ Strict input validation with regex
✅ Proper HTTP status codes
⚠️ Solr escaping insufficient (critical)
⚠️ No HTML sanitization in tooltips

Testing: ⭐⭐⭐⭐ (Comprehensive)

✅ Unit tests for query builders
✅ Integration tests for API
✅ Performance benchmarks
⚠️ Missing edge case coverage

🎯 Recommendations

High Priority (Before Merge)

Fix Solr injection escaping
Add fetch abort controller
Memoize bin summaries
Add row limit validation
Add null checks for cell data

Medium Priority (Near-Term)

Add edge case tests
Sanitize tooltip HTML
Reset expanded state on matrix change
Document limit differences

Low Priority (Future)

Centralize bin definitions
Add retry logic for network failures
Add grid virtualization for large datasets
Add E2E tests

🏁 Overall Assessment

Summary Ratings:

Architecture: ⭐⭐⭐⭐⭐ (Excellent design, highly reusable)
Code Quality: ⭐⭐⭐⭐ (Very good, minor improvements needed)
Performance: ⭐⭐⭐⭐ (Great optimization, some edge cases)
Security: ⭐⭐⭐ (Good validation, injection risks to address)
Testing: ⭐⭐⭐⭐ (Comprehensive, needs edge cases)
Documentation: ⭐⭐⭐⭐⭐ (Excellent PR description)

This is high-quality work delivering significant value. The architecture is sound and sets a great foundation for future grid visualizations. The security issues and race condition should be addressed before production deployment.

Estimated time for high-priority fixes: 4-8 hours

Great work overall! 🎉

Replace the old /case-phenotype-matrix endpoint and wrapper components with the generic /entity-grid endpoint, eliminating the transformation layer between CasePhenotypeMatrix and EntityGridMatrix types. - Fix transformCell to parse backend qualifiers into CellQualifier[] and build publicationLinks fallback from raw CURIEs - Rewrite SectionCasePhenotypeGrid to use getEntityGrid() directly with EntityGrid + EntityGridModal components - Delete CasePhenotypeGrid.vue, CasePhenotypeModal.vue, case-phenotype API/types, unused util, and old test - Simplify EntityGrid/entity-grid-utils to use only hasData (remove old present-field fallback)

Resolve conflicts: - main.py: combine branch imports (case_phenotype, entity_grid) with main's meta import - solr_query_utils.py: keep both dataclass import from main and Any type from branch

- Add escape() to disease_id in Solr query builders for defense-in-depth - Add type hint to _is_disease_entity parameter - Improve exception handling: log errors and return 500 instead of misleading 404 - Update tests to expect escaped disease IDs

claude · 2026-03-27T02:11:51Z

Code Review: Case-Phenotype Grid Visualization

This is a substantial and well-architected PR. The 2-hop Solr JOIN approach is clever, and the generic EntityGrid abstraction will pay dividends for the planned gene/ortholog grids. A few issues need attention before merge.

🐛 Bugs

1. FastAPI route ordering — traversable-associations endpoint is unreachable

In entity_grid.py, the generic grid endpoint is registered before the traversable-associations endpoint:

@router.get("-grid/{context_id}", ...)          # registered first
async def get_generic_entity_grid(...): ...

@router.get("-grid/traversable-associations/{entity_category}", ...)  # never reached
async def get_traversable_associations(...): ...

FastAPI matches routes in definition order. Any request to /entity-grid/traversable-associations/biolink:Gene will match context_id="traversable-associations" and call get_generic_entity_grid. The fix is to define the more-specific route first.

2. Double Solr entity lookup in get_generic_entity_grid

context_entity = self.get_entity(context_id, extra=False) is called twice — once at the top (line ~660) to determine field mappings, and again after step 4 (line ~760) to get context_category. The second call is redundant; reuse the first result.

3. build_grid_column_query will AttributeError when only column_assoc_categories is set

fq = [f'category:"{config.column_assoc_category.value}"']

column_assoc_category is Optional[AssociationCategory]. If a config is created with only column_assoc_categories set (as the docstring allows), this raises AttributeError: 'NoneType' object has no attribute 'value'. The function should call config.get_column_assoc_categories() instead.

4. TypeScript model.ts — cells typed incorrectly

// model.ts (wrong — this is a dict on the backend)
cells?: CasePhenotypeCellData[],

The backend sends cells as dict[str, CasePhenotypeCellData] (a map keyed by case_id:phenotype_id), not an array. The frontend api/entity-grid/ code correctly handles the dict, but this generated model type is misleading and will cause issues if used directly.

⚠️ Code Quality

5. _get_disease_name / _get_entity_name are identical

Both methods do the same thing. Keep only _get_entity_name and update the call in get_case_phenotype_matrix to use it.

6. Fragile category heuristics in get_generic_entity_grid

# Determine column entity category from association type
if "Disease" in column_assoc_categories[0]:
    col_entity_cat = EntityCategory.DISEASE
elif "Homology" in column_assoc_categories[0]:
    col_entity_cat = EntityCategory.GENE
elif "CaseToDisease" in column_assoc_categories[0]:
    col_entity_cat = EntityCategory.CASE

"CaseToDisease" won't match because "Disease" matches first. Also string-matching biolink category names is fragile. The AssociationTypeMappings already has subject_category / object_category — use those to derive the entity category rather than parsing the string.

7. Silent phenotype loss when no bin matches

In entity_grid_utils._build_rows, phenotypes that don't match any HistoPheno bin are silently dropped:

if bin_id is None:
    continue  # phenotype disappears from the grid with no log

At minimum add a debug-level log. For the case-phenotype grid especially, phenotypes in the "OTHER" category are likely real data worth showing.

8. Label capitalisation inconsistency

Backend (solr.py) uses lowercase: "skeletal system", "nervous system".
Frontend types.ts defines: "Skeletal System", "Nervous System".

Since the frontend displays labels from the API response, the types.ts constants diverge from what will actually render. Either align them or remove the frontend HISTOPHENO_BINS constant entirely (it appears to be unused in the new code path).

🏗️ Architecture

9. Two parallel API paths for the same data

case_phenotype.py + case_phenotype_utils.py provide a dedicated endpoint for case-phenotype matrices, while entity_grid.py + entity_grid_utils.py implement the same thing generically. The dedicated endpoint's validate_disease_id regex validation (MONDO-only) is valuable, but the rest of case_phenotype_utils.py is effectively duplicating entity_grid_utils.py.

Consider whether case_phenotype.py endpoint can delegate to get_entity_grid internally once the generic path is solid, keeping the MONDO validation but eliminating the duplicate builder.

10. _raw_solr_query bypasses the existing Solr service abstraction

The new method builds raw query strings and calls requests.get() directly, bypassing SolrService. This means it doesn't get connection pooling, timeout configuration, or error handling from the existing service layer. Consider extending SolrService or using the existing self.solr session instead.

🧪 Tests

Integration tests for disease-phenotype-grid and ortholog-phenotype-grid endpoints are missing (only case-phenotype-matrix is tested).
test_over_limit_error in test_case_phenotype_api.py — the limit is 1, but MONDO:0007078 may not have enough indirect cases. Worth asserting against a disease known to have many cases, or mocking.
No unit tests for build_entity_grid in entity_grid_utils.py.

Minor

SectionCasePhenotypeGrid.vue imports AppStatus but the template already uses AppStatus correctly — looks fine but worth double-checking that the AppStatus import is present (it isn't in the shown diff).
The autocomplete="off" addition to AppSelectAutocomplete.vue is a nice UX fix but is unrelated to the rest of the PR — consider splitting it out.
The from urllib.parse import urlencode import inside _raw_solr_query is unused (urlencode is imported but the method uses manual string joining instead).

Overall the design is solid and the test coverage for new unit tests is good. The routing bug (#1) is the most critical fix before merge — it silently breaks the traversable-associations endpoint. The duplicate entity lookup (#2) and the AttributeError risk (#3) should also be addressed.

Column header labels are now router-links that navigate to the entity page (e.g., case page) on click. Hover tooltip retained for info display.

claude · 2026-03-27T04:04:15Z

Code Review

This is a substantial and well-architected PR. The Solr JOIN query approach for the 2-hop traversal is clean, and the test coverage is comprehensive. A few issues worth addressing:

Bug: phenotype_count in bins uses association count, not unique phenotype count

In case_phenotype_utils.py, _build_bins uses Solr facet counts for phenotype_count, but facet queries count one document per case-phenotype association—not per unique phenotype. If 10 cases each have a phenotype in the nervous system bin, the facet count is 10x higher than the number of unique phenotypes. Meanwhile, phenotype_ids is deduplicated correctly.

The UI shows this count as the number of phenotypes (e.g., Nervous System (11)), so this would be misleading. Fix: use len(bin_phenotypes[bin_id]) instead of the facet count for phenotype_count.

Potential AttributeError: build_grid_column_query assumes column_assoc_category is not None

build_grid_column_query in solr_query_utils.py accesses config.column_assoc_category.value unconditionally, but GridTypeConfig.column_assoc_category is Optional[AssociationCategory]. A config that uses only column_assoc_categories (multi-category) would raise AttributeError here. Consider calling config.get_column_assoc_categories() instead.

Design: Duplicate API surface for case-phenotype data

case_phenotype.py registers /v3/api/case-phenotype-matrix/{disease_id} while entity_grid.py registers /v3/api/entity/{context_id}/case-phenotype-grid. Both serve case-phenotype data with different response models (CasePhenotypeMatrixResponse vs EntityGridResponse). Is case_phenotype.py intended to be removed once the generic entity_grid API is stable? If not, the intent should be documented; if yes, consider removing it in this PR to avoid maintaining two endpoints.

Fragile string-matching heuristics in get_generic_entity_grid

The method infers entity categories from association category strings using substring matching (e.g., checking if "Disease" appears in the category name). This is brittle—any new association type will silently fall through to NAMED_THING. Since subject_category/object_category are now in the YAML mappings, AssociationTypeMappings.get_mapping() already provides this information. Prefer deriving entity categories from mapping metadata rather than substring-matching category names.

Redundant entity fetch in get_generic_entity_grid

self.get_entity(context_id, extra=False) is called twice in the same method—once at the top to determine field mappings, and once near the bottom to get the context entity category again. The second call is unnecessary.

Redundant method: _get_disease_name vs _get_entity_name

Both methods do exactly the same thing (look up an entity name by ID). _get_disease_name should be removed and its callers updated to use _get_entity_name.

_raw_solr_query bypasses the existing service abstraction

The _raw_solr_query method in SolrImplementation manually constructs a URL using string concatenation and requests.utils.quote. This bypasses the existing SolrService abstraction and reimplements URL encoding in an ad-hoc way. Consider exposing a raw_query capability on SolrService to keep the Solr communication layer coherent.

Error handling in _validate_entity_category swallows real errors

The bare except Exception re-raises everything as a 404, so a Solr connection failure will appear as "Entity not found". At minimum, log the exception before re-raising; ideally differentiate between "not found" and "service unavailable".

Missing response_model on get_traversable_associations

The /entity-grid/traversable-associations/{entity_category} endpoint returns List[dict] with no Pydantic model, losing type safety and OpenAPI schema generation. Consider defining a small model for the traversal metadata.

HistoPhenoBin.phenotype_ids not in model.yaml

HistoPhenoBin in model.py has a phenotype_ids: list[str] field, but this slot is not defined in model.yaml. If the YAML is authoritative for code generation, this field will be lost on the next regeneration.

Minor: Integration tests with hardcoded data assumptions

Several integration tests assert specific counts or data presence for hardcoded disease IDs (e.g., MONDO:0007078). These are fragile if the underlying data changes. Consider adding a comment noting the expected data state, or gating them with a marker so CI failures are diagnosable.

Overall, the architecture is sound—the generic grid approach is a good foundation for future grid types, and the test coverage is thorough. The phenotype_count bug and the potential AttributeError in build_grid_column_query are the most pressing fixes before merge.

Generated with Claude Code

Mock get_entity and _get_entity_name at class level to prevent Solr connection attempts in unit tests. Fix accessibility and prettier issues in PageGridPlayground.vue.

claude · 2026-03-27T04:08:33Z

Code Review: Case-Phenotype Grid Visualization

This is a well-structured PR with a clear architecture. The separation of concerns across API, data models, Solr query utilities, and builder utilities is solid, and the test coverage is extensive. Here are my observations:

Issues

🐛 `_get_disease_name` is a duplicate of `_get_entity_name`

In solr_implementation.py, both _get_disease_name (line ~1675) and _get_entity_name (line ~1631) do the same thing — fetch an entity and return its name. _get_disease_name is unused by the get_entity_grid path but is still called in get_case_phenotype_matrix. Since _get_entity_name already exists, _get_disease_name should be removed and replaced with the generic one.

🐛 `get_generic_entity_grid` fetches the context entity twice

In solr_implementation.py, get_entity is called at the start (line ~1459) to determine field mappings, and then called again after step 4 (line ~1568) to re-read context_category. The second call is redundant and wastes a Solr round trip. The context_entity from the first call should be reused.

⚠️ `case_phenotype.py` endpoint is largely superseded by `entity_grid.py`

The PR adds both /case-phenotype-matrix/{id} (via case_phenotype.py) and /{id}/case-phenotype-grid (via entity_grid.py). Both serve similar data with slightly different shapes (CasePhenotypeMatrixResponse vs EntityGridResponse). Having two endpoints for the same data is confusing — it may be intentional for backward compatibility, but it should be documented or one deprecated.

⚠️ String-matching heuristics in `get_generic_entity_grid`

Field direction is determined by checking if "Gene", "Homology", "Disease", or "CaseToDisease" appear in the category string (lines ~1483–1500, ~1575–1582). This is fragile and will silently produce wrong results for any association type whose name doesn't match these patterns. The YAML-based metadata path is the right approach; the heuristic fallback should at minimum log a warning.

⚠️ `build_grid_column_query` references `config.column_assoc_category` directly

At line ~1928:

fq = [f'category:"{config.column_assoc_category.value}"']

This will fail with AttributeError if only column_assoc_categories (the list form) is set and column_assoc_category is None. config.get_column_assoc_categories() should be used here instead to handle both cases.

⚠️ `build_grid_row_query` also uses `config.column_assoc_category.value` directly

Same issue at line ~2148:

f'(category:"{config.column_assoc_category.value}" '

If the config only has column_assoc_categories, this will raise AttributeError.

⚠️ `make_cell_key` defined in two places

make_cell_key exists in both case_phenotype_utils.py and entity_grid_utils.py with identical logic. One should import from the other, or both should import from a shared location.

⚠️ `DISEASE_ID_PATTERN` only accepts MONDO IDs

In case_phenotype.py, the regex r"^MONDO:\d{7}$" is the gatekeeper for the legacy /case-phenotype-matrix endpoint. The newer entity-grid endpoint (entity_grid.py) doesn't have this restriction. If the intent is to keep both endpoints, this is fine — but it should be commented to avoid future confusion.

Nits / Style

_build_bins in both case_phenotype_utils.py and entity_grid_utils.py returns bins with phenotype_count=0 / count=0 for all empty bins. Callers may want to filter these out (or the frontend must handle it). Worth noting in a comment.
_validate_entity_category in entity_grid.py swallows all exceptions (not just HTTP ones) and converts them to 404, which can hide real errors like network timeouts. A more targeted catch would be safer.
The phenotype_to_bin variable returned from _build_phenotypes in case_phenotype_utils.py is computed but never used by the caller (build_matrix). It can be dropped from the return tuple.
build_grid_column_query accepts a filter_empty_columns parameter but the only call site from get_entity_grid (the non-generic path) passes the default. The parameter could be simplified or at least the behavior when it's False should be tested.

Performance

Both Solr queries (build_case_disease_query / build_case_phenotype_query) default to rows=50000. For diseases with many cases this could return large payloads. The case-disease query is used only for counting, so it should set rows=limit+1 (which the caller does pass) and ideally omit the fl fields beyond what's needed for counting.
get_case_phenotype_matrix and get_entity_grid each make 2 Solr queries sequentially (count + fetch). For the empty-case fast path this is fine, but for the full path consider whether the second query could be made concurrent if the Solr client supports it.

Tests

Test coverage is thorough — unit tests for builders, query construction, model validation, and integration + performance tests are all present. A few gaps:

No unit test for normalize_solr_doc_for_model with nested Optional types or Union types beyond Optional[T].
test_over_limit_error uses MONDO:0007078?direct=false&limit=1 — this relies on production data having more than 1 indirect case, which may be fragile for CI.
The test_bins_have_labels test doesn't verify that bins returned have non-empty labels (only that the key exists), so a bin with label="" would pass.

Summary

Solid work overall — the abstraction layer (GridTypeConfig + RowGroupingConfig) makes this extensible for future grid types, and the Solr JOIN approach is efficient. The main things to address before merging are:

Remove the duplicate _get_disease_name / fix the double get_entity call in get_generic_entity_grid
Use config.get_column_assoc_categories() in query builders instead of direct .column_assoc_category.value access
Deduplicate make_cell_key
Add a warning log for the heuristic fallback in get_generic_entity_grid

🤖 Generated with Claude Code

Expand test coverage for entity grid, case-phenotype, and generic grid features. Tests use flat functions with parametrize to match repo conventions. Covers API validation, Solr query builders, parsers, implementation methods, and utility functions.

claude · 2026-03-27T04:44:47Z

Welcome to the Monarch Initiative

The Monarch Initiative is an extensive knowledge graph and ecosystem of tools made for the benefit of clinicians, researchers, and scientists. The knowledge graph consists of millions of entities – genes, diseases, phenotypes, and many more – imported from dozens of sources.

We welcome the contributions of the community to help us maintain and improve the knowledge graph and the tools that use it. To help get started on contributing to the Monarch Initiative, please see our CONTRIBUTING.md file.

Monarch App

⭐️⭐️ View the website ⭐️⭐️

The monarch-app repo contains the source code for the Monarch Initiative website (a Vue webapp),
as well as monarch-py, a Python library for interacting with the Monarch Initiative knowledge graph. The monarch-py library also includes a FastAPI module that serves as the website's backend. Together, the frontend and backend make up the stack used to deply and run the Monarch Initiative website.

If you wish to run Monarch-App as a local web application, please install the requirements below and then follow on to the usage section to start the application. Refer to Using Local Data to see how you can run the full Monarch website locally and use your own data store.

The app also offers a widget called Phenogrid that can be embedded in any website.
For more information on how to use the Phenogrid widget, please refer to the Phenogrid documentation.

For developers

The monarch-app repository contains the code to run the Monarch Initiative website in the frontend and backend directories as well as documentation, helper scripts, Docker files and services to help set up a local environment for development and for deployment.

To start development, please refer to the CONTRIBUTING.md document with this README. If you are planning to only develop the frontend or backend of the application you can refer directly to the README and CONTRIBUTING files in each of those sections.

claude · 2026-03-27T04:48:58Z

Code Review: Case-Phenotype Grid Visualization. This is a substantial and well-structured PR. The 2-hop traversal pattern, config/grouping registries, and generic endpoint design are all solid. Test coverage is thorough. Key issues found:

BUG 1 - parseCellKey is broken for colon-containing IDs (frontend): IDs like MONARCH:case1 and HP:0001234 produce the key MONARCH:case1:HP:0001234. The current code does const [columnId, rowId] = cellKey.split(':') which yields columnId='MONARCH', rowId='case1' -- both wrong. Use a separator that cannot appear in CURIEs (e.g. |||).

BUG 2 - Duplicate get_entity call in get_generic_entity_grid (solr_implementation.py): The context entity is fetched at the top of the method (~line 1459), then again after building the grid (~line 1568). This doubles Solr round-trips. Reuse the first fetch.

BUG 3 - build_grid_column_query accesses config.column_assoc_category.value which may be None: GridTypeConfig.column_assoc_category is Optional[AssociationCategory]. If only column_assoc_categories is set (multi-category config), this raises AttributeError. Use config.get_column_assoc_categories() instead.

DESIGN 4 - Two overlapping endpoints: case_phenotype.py provides /case-phenotype-matrix/{disease_id} while entity_grid.py provides /{context_id}/case-phenotype-grid. Both are registered in main.py under different response schemas. Consider deprecating the specialized endpoint or documenting the intended relationship.

DESIGN 5 - String-based heuristics in get_generic_entity_grid: Checks like 'if Gene in first_col_cat' are fragile. Since association_type_mappings.yaml now has subject_category/object_category, use AssociationTypeMappings.get_mapping() instead.

DESIGN 6 - _raw_solr_query bypasses existing SolrService: Calls requests.get(url) with a manually-constructed URL, bypassing connection pooling and error handling in SolrService.

DESIGN 7 - _get_disease_name and _get_entity_name are identical: Consolidate to one method.

NITS: (8) SolrImplementation() instantiated at module level for skipif conditions -- prefer a lazy callable. (9) Duplicate comment block in test_association_type_mapping.py. (10) get_bin_label raises KeyError inconsistently with the safe dict.get pattern used elsewhere. (11) test_not_a_disease tests ID-format rejection not category check -- comment is misleading. (12) limit semantics differ between the two API files.

SUMMARY: Item 1 (cell key parsing) is a correctness bug that should be fixed before merge. Items 2-3 are also correctness issues. Items 4-7 are design concerns worth addressing; the rest can be tackled as follow-ups.

…ttributeErrors - Remove duplicate _get_disease_name method, use _get_entity_name instead - Remove redundant get_entity call in get_generic_entity_grid - Fix build_grid_column_query/build_grid_row_query to use config.get_column_assoc_categories() instead of direct .column_assoc_category.value access, preventing AttributeError when only column_assoc_categories list is set - Deduplicate make_cell_key: case_phenotype_utils now imports from entity_grid_utils - Add warning logs for heuristic fallback in field direction mapping - Log real exception in _validate_entity_category before re-raising as 404 - Remove unused urlencode import

claude · 2026-03-27T04:57:31Z

Test comment from review

claude · 2026-03-27T04:59:44Z

PR Review: Case-Phenotype Grid (1/3 - Potential Bugs)

1. context_id not escaped - In build_grid_column_query and build_grid_row_query, context_id is interpolated without escaping. Use escape(context_id) for consistency with other query builders.

2. Duplicate entity lookup (N+1) - Both get_entity_grid and get_generic_entity_grid call get_entity for validation then call _get_entity_name which calls get_entity again. Thread the entity name through to save a Solr round-trip.

3. intersection.pop() is unreachable dead code - In _find_bin_for_phenotype and _find_bin_for_entity, the final return intersection.pop() can never execute because if intersection is non-empty, the preceding for loop is guaranteed to match first. Remove this dead code.

4. get_row_grouping silent failure - get_entity_grid calls get_row_grouping at the top, so any new grid type with a non-phenotype row category will raise ValueError before fetching any data. Consider making this a config-time check.

claude · 2026-03-27T04:59:53Z

PR Review: Case-Phenotype Grid (2/3 - Architecture and Code Quality)

Architecture:

5. Two overlapping API endpoints - The PR registers both /v3/api/case-phenotype-matrix/{disease_id} (case_phenotype.py) and /v3/api/entity/{context_id}/case-phenotype-grid (entity_grid.py). Both serve the same data but return different shapes (CasePhenotypeMatrixResponse vs EntityGridResponse). Is the older endpoint still needed, or is it transitional and can be removed?

6. String heuristics for field direction - The fallback in get_generic_entity_grid checks string substrings (Gene in first_col_cat, Disease in column_assoc_categories[0]) to infer direction. This is fragile. Since the PR already adds subject_category/object_category to YAML, the heuristic branch should error explicitly.

7. Unbounded memory fetch - rows=50000 default means all associations load in one Solr call. Fine for current phenopacket scale, but worth documenting.

Code Quality:

8. Duplicate comment block - In test_association_type_mapping.py, the Tests for AssociationTypeMappings singleton header appears twice.

9. normalize_solr_doc_for_model side-effects get_entity - The change unwraps single-element lists to scalars for all Entity construction. If category was legitimately a list, code checking isinstance(entity.category, list) could break. Add a regression test.

10. Inconsistent validation - case_phenotype.py uses a MONDO regex (returns 422), entity_grid.py uses _validate_entity_category (returns 400). Same mistake, different errors.

11. _SENTINEL_NO_ATTR defined after use - In test_case_phenotype_api.py, _SENTINEL_NO_ATTR = object() is referenced before it is defined. Move it before the function.

claude · 2026-03-27T04:59:59Z

PR Review: Case-Phenotype Grid (3/3 - Summary)

What Works Well:

Excellent test coverage: unit, integration, and performance benchmarks
Solr JOIN queries: elegant 2-hop traversal avoids N+1 query patterns
GridTypeConfig registry: dataclass-driven config makes it easy to add new grid types
Empty-result handling: graceful early-return for diseases with no cases
YAML-driven field mapping: adding subject_category/object_category to association_type_mappings.yaml is the right abstraction

Minor Nits:

test_over_limit_error uses direct=false with limit=1 and assumes the disease has more than 1 case, which could be flaky if data changes
The RowGrouping enum in entity_grid.py and the raw strings used in get_generic_entity_grid are loosely coupled

Also, please delete the stale test comment at #1247 (comment) (that was accidentally posted by the review tooling).

Generated with Claude Code

tests/unit/test_case_phenotype_api.py and tests/integration/test_case_phenotype_api.py had the same module name, causing a pytest collection error when CI runs `pytest tests` (both directories). Renamed the unit test file to test_case_phenotype_api_unit.py.

claude · 2026-03-27T15:22:49Z

PR Review: Case-Phenotype Grid Visualization

This is a substantial and well-structured PR. The architecture is solid - a clean 2-hop traversal pattern, thoughtful generic abstraction, excellent test coverage. Here are observations organized by priority.

Bugs / Correctness

1. case_phenotype.py router is redundant / dead code

case_phenotype.py mounts at /case-phenotype-matrix/{id} while entity_grid.py already provides /{context_id}/case-phenotype-grid. They use different response models (CasePhenotypeMatrixResponse vs EntityGridResponse) and different implementation paths. If case_phenotype.py is superseded it should be removed now. If intentionally kept, a comment explaining why would help future maintainers.

2. _raw_solr_query builds URLs manually instead of using SolrService

The new method bypasses the existing SolrService abstraction used everywhere else, making these endpoints harder to test and inconsistent. It also manually URL-encodes values with requests.utils.quote but not keys. Consider routing through SolrService or passing params as a dict to requests.get(url, params=...).

3. get_generic_entity_grid has fragile string-based heuristics

The column entity category is determined by substring matching on the association category name. VariantToDiseaseAssociation would match Disease and return DISEASE instead of SEQUENCE_VARIANT. The YAML mappings already added (object_category, subject_category) contain exactly the right information to do this properly. The heuristic should at minimum log a warning when triggered.

4. _validate_entity_category swallows exceptions as 404

A Solr timeout or connection error becomes a misleading not-found response. The companion code in case_phenotype.py correctly returns 500 for unexpected errors - this looks like an oversight.

Design / Code Quality

5. Duplicate section comment in test_association_type_mapping.py

The comment # Tests for AssociationTypeMappings singleton appears twice in a row - copy-paste artifact.

6. _SENTINEL_NO_ATTR used before definition in test file

In test_case_phenotype_api_unit.py, _make_entity_with_category references _SENTINEL_NO_ATTR before it is defined on the next line. Works at runtime but confusing to read - move the sentinel above the function.

7. CasePhenotypeMatrixResponse models may be redundant

The PR adds both case-specific models (CasePhenotypeMatrixResponse, CaseEntity, etc.) and generic grid models (EntityGridResponse, GridColumnEntity, etc.). If the frontend is migrating to the generic endpoint, a note about the migration plan or removing the old models would help future maintainers.

8. make_cell_key uses : as separator with CURIE IDs

CURIEs like MONARCH:case123 and HP:0001250 already contain :, making the composite key MONARCH:case123:HP:0001250 ambiguous if anyone tries to parse it back. The key is opaque today so this works, but worth a comment noting this or using a less ambiguous separator like |.

Performance

9. Two Solr round-trips when count is within limit

Both get_case_phenotype_matrix and get_entity_grid make a count query then a data query. When count is within limit, first-query results are fully discarded. Worth documenting in code why two queries are needed.

10. No pagination for row entities

The row query defaults to rows=50000. The PR limits columns but not rows. For diseases with many phenotypes across many cases this could be a very large response. Worth a note in docs or a follow-up issue.

Minor

build_grid_column_query has a filter_empty_columns parameter but the False branch is unreachable given the default and current call sites.
Route prefixes -grid/{context_id} with a leading hyphen are unusual. Mounted at /entity, this produces /entity-grid/... which works, but the leading hyphen in the route string is surprising to read.

Overall this is good work - clean architecture, excellent test coverage (unit + integration + performance), and the generic grid abstraction is well-designed. The main things to address before merge: the duplicate/dead case_phenotype.py code path, swallowed exceptions in _validate_entity_category, and the fragile string heuristics in get_generic_entity_grid.

add phenotype grid to testbed, + misc

a024e98

kevinschaper and others added 2 commits January 28, 2026 19:23

kevinschaper added 2 commits March 26, 2026 18:43

Merge main into case-x-phenotype-visualization

9a644d0

Resolve conflicts: - main.py: combine branch imports (case_phenotype, entity_grid) with main's meta import - solr_query_utils.py: keep both dataclass import from main and Any type from branch

Address automated review feedback

e2ce4e1

- Add escape() to disease_id in Solr query builders for defense-in-depth - Add type hint to _is_disease_entity parameter - Improve exception handling: log errors and return 500 instead of misleading 404 - Update tests to expect escaped disease IDs

Add clickable links to entity pages on grid column headers

ea66cea

Column header labels are now router-links that navigate to the entity page (e.g., case page) on click. Hover tooltip retained for info display.

Fix CI failures: backend test mocking and frontend lint errors

28e908c

Mock get_entity and _get_entity_name at class level to prevent Solr connection attempts in unit tests. Fix accessibility and prettier issues in PageGridPlayground.vue.

kevinschaper merged commit 6368178 into main Mar 27, 2026
13 checks passed

kevinschaper deleted the case-x-phenotype-visualization branch March 27, 2026 15:38

Conversation

kevinschaper commented Jan 17, 2026

Summary

Architecture

Data Flow

Files Added

Files Modified

Implementation Details

1. Type Definitions (case-phenotype-types.ts)

2. API Layer (case-phenotype.ts)

3. Matrix Building (case-phenotype-matrix.ts)

4. Grid Component (CasePhenotypeGrid.vue)

5. Section Wrapper (SectionCasePhenotypeGrid.vue)

Test Plan

Uh oh!

netlify bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for monarch-app canceled.

Uh oh!

claude bot commented Jan 17, 2026

Code Review: Case-Phenotype Grid Visualization

Overall Assessment

✅ Code Quality & Best Practices

Strengths

Minor Suggestions

🐛 Potential Bugs & Issues

1. Map Serialization Issue (frontend/src/api/case-phenotype-types.ts:51)

2. Missing Null Check (frontend/src/util/case-phenotype-matrix.ts:1021)

3. Hardcoded Limit (frontend/src/api/case-phenotype.ts:113)

⚡ Performance Considerations

Strengths

Recommendations

🔒 Security Considerations

Assessment: No Major Security Issues Found ✅

Minor Notes

🧪 Test Coverage

Concerns

Recommendations

Priority

📋 Additional Observations

Positive Notes

Questions

📝 Summary & Recommendations

Must Fix Before Merge

Should Fix Before Merge

Consider for Follow-up

Verdict

Uh oh!

codecov bot commented Jan 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kevinschaper commented Jan 17, 2026

Uh oh!

claude bot commented Jan 20, 2026

Code Review: Case-Phenotype Grid Visualization

Strengths

Architecture & Code Quality

Type Safety

Issues Found

1. Debug Console Logging (HIGH PRIORITY)

2. Missing Pagination for Disease-to-Case Associations

3. Potential Race Condition in Data Fetching

4. Edge Case: OTHER Bin Handling

Performance Considerations

Security Concerns

Test Coverage

Minor Suggestions

Summary

Overall Assessment

Uh oh!

claude bot commented Jan 22, 2026

Code Review - PR #1247: Add Case-Phenotype Grid visualization

Summary

✅ Strengths

Architecture & Code Organization

User Experience

🔍 Code Quality & Best Practices

Issues to Address

1. Console.log statements in production code

1. Type Definitions (`case-phenotype-types.ts`)

2. API Layer (`case-phenotype.ts`)

3. Matrix Building (`case-phenotype-matrix.ts`)

4. Grid Component (`CasePhenotypeGrid.vue`)

5. Section Wrapper (`SectionCasePhenotypeGrid.vue`)

netlify bot commented Jan 17, 2026 •

edited

Loading

codecov bot commented Jan 17, 2026 •

edited

Loading