Add Case-Phenotype Grid visualization for disease pages#1247
Add Case-Phenotype Grid visualization for disease pages#1247kevinschaper merged 21 commits intomainfrom
Conversation
Adds a new grid visualization on disease pages showing phenotypes observed in cases (from phenopackets) associated with that disease. Features: - Grid with cases as columns, phenotypes grouped by body system (bins) as rows - Expandable bin rows to show individual phenotypes - Tooltips showing case/phenotype details on hover (using vue-tippy) - Click cells to open modal with full details including onset, frequency, publications - Handles unicode characters in case IDs with batched API requests New files: - case-phenotype-types.ts: TypeScript interfaces and HISTOPHENO_BINS constant - case-phenotype.ts: API functions for fetching case/phenotype data - case-phenotype-matrix.ts: Utilities for transforming data into matrix format - CasePhenotypeGrid.vue: Main grid component with expandable rows - CasePhenotypeModal.vue: Detail modal for cell clicks - SectionCasePhenotypeGrid.vue: Page section wrapper Co-Authored-By: Claude Opus 4.5 <[email protected]>
✅ Deploy Preview for monarch-app canceled.
|
Code Review: Case-Phenotype Grid VisualizationThank you for this comprehensive PR! This is a well-structured feature that adds valuable functionality to disease pages. Below is my detailed review covering code quality, potential issues, performance, security, and test coverage. Overall AssessmentSummary: This is high-quality code with good architectural decisions. The implementation is clean, well-documented, and follows Vue 3 best practices. A few minor improvements are suggested below. ✅ Code Quality & Best PracticesStrengths
Minor Suggestions
🐛 Potential Bugs & Issues1. Map Serialization Issue (frontend/src/api/case-phenotype-types.ts:51)The export interface CasePhenotypeMatrix {
// ...
cells: Map<string, CasePhenotypeCellData>; // Maps don't serialize to JSON well
}Impact: Low (no immediate issue, but could cause problems later) Recommendation: Document this limitation or consider using 2. Missing Null Check (frontend/src/util/case-phenotype-matrix.ts:1021)While unlikely, there's a non-null assertion that could theoretically fail: binPhenotypes.get(binId)!.add(phenotypeId); // Line 1021Impact: Very Low (the check on line 1019 ensures this exists) Recommendation: Code is actually safe due to the preceding 3. Hardcoded Limit (frontend/src/api/case-phenotype.ts:113)The limit is hardcoded to 500 for case associations. If a disease has more than 500 cases, some would be silently excluded. const params = {
category: "biolink:CaseToDiseaseAssociation",
object: diseaseId,
limit: 500, // Hardcoded limit
};Impact: Low to Medium (depends on whether diseases with >500 cases exist in your data) Recommendation:
⚡ Performance ConsiderationsStrengths
Recommendations
🔒 Security ConsiderationsAssessment: No Major Security Issues Found ✅
Minor Notes
🧪 Test CoverageConcernsNo tests included in this PR. The repository uses Vitest for unit tests and Playwright for e2e tests (per
Recommendations
PriorityHigh Priority: At minimum, add unit tests for the utility functions in Medium Priority: Component tests for the grid and modal. Lower Priority: E2E tests (can be added in a follow-up PR if needed). 📋 Additional ObservationsPositive Notes
Questions
📝 Summary & RecommendationsMust Fix Before Merge
Should Fix Before Merge
Consider for Follow-up
VerdictRecommendation: Approve with minor changes ✅ This is excellent work! The code is well-structured, follows best practices, and adds valuable functionality. The main concerns are:
Once those are addressed, this is ready to merge. The performance and test coverage items can be addressed in follow-up PRs if needed. Great job on the comprehensive PR description and clean implementation! 🎉 🤖 Review generated with Claude Code |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1247 +/- ##
==========================================
+ Coverage 73.80% 79.36% +5.55%
==========================================
Files 95 113 +18
Lines 3478 5180 +1702
==========================================
+ Hits 2567 4111 +1544
- Misses 911 1069 +158 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Closes #1248 |
- Add check for Mondo subset markers (disease_grouping, ordo_group_of_disorders, rare_grouping) to identify organizational grouping classes vs specific diseases - Skip API calls and hide grid section for grouping classes like "Ehlers-Danlos syndrome" while still showing for specific subtypes - Update grid cell styling from text circles to styled squares Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: Case-Phenotype Grid VisualizationThis PR adds a well-architected feature for visualizing case-phenotype relationships on disease pages. The implementation is solid overall, with clear separation of concerns and good code organization. StrengthsArchitecture & Code Quality
Type Safety
Issues Found1. Debug Console Logging (HIGH PRIORITY)Location: console.log("toggleBin clicked:", binId);
console.log("handleCellClick:", caseId, phenotypeId);
console.log("cellData:", cellData);Issue: Per CONTRIBUTING.md lines 207-208, console.log should be removed before merging PRs. Fix: Remove these statements or convert to console.debug. 2. Missing Pagination for Disease-to-Case AssociationsLocation: The Impact: Medium - rare but could cause incomplete data display. Recommendation: Add pagination or document the 500-case limit. 3. Potential Race Condition in Data FetchingLocation: The Impact: Low - acceptable but could be improved with debouncing. 4. Edge Case: OTHER Bin HandlingLocation: The OTHER bin handling works but could be clearer by adding Performance ConsiderationsPositive:
Considerations:
Verdict: Performance looks solid. Security ConcernsNo critical issues found:
Test CoverageCurrent State: No test files found for new functionality Recommendations:
Minor Suggestions
SummaryMust Fix Before Merge:
Should Consider: Nice to Have: Overall AssessmentThis is high-quality work with excellent architecture and thoughtful design. With debug logging removed and consideration for pagination, this will be a valuable addition. Recommendation: Approve pending removal of debug logs. Testing and pagination can follow in subsequent PRs. Great job on this implementation! 🎉 Review conducted using Claude Code CLI |
When viewing a disease with descendants, cases from descendant diseases are now properly identified and displayed: - Added tabs to toggle between direct cases only vs all cases (including descendants) - Show source disease in tooltip for cases from descendant diseases - Auto-select "All" tab when no direct cases exist Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review - PR #1247: Add Case-Phenotype Grid visualizationSummaryThis is a well-structured PR that adds a sophisticated case-phenotype grid visualization to disease pages. The code quality is generally very good with clear separation of concerns, comprehensive documentation, and thoughtful user experience design. ✅ StrengthsArchitecture & Code Organization
User Experience
🔍 Code Quality & Best PracticesIssues to Address1. Console.log statements in production codeLocation: frontend/src/components/CasePhenotypeGrid.vue:381, 446, 448 Remove debug console.log statements before merging. 2. Hardcoded color valuesLocation: Multiple locations in tooltip generation and CSS Extract color values to SCSS variables for consistency and easier theming. 3. Magic numbersLocation: frontend/src/api/case-phenotype.ts:105 The CASE_BATCH_SIZE of 5 could use more detailed documentation explaining why this specific value. 🐛 Potential Bugs & Edge Cases1. Race condition with tab switchingLocation: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:1038-1041 Rapid tab switching could cause multiple API requests with responses arriving out of order. Consider adding request cancellation. 2. Missing null checksLocation: frontend/src/util/case-phenotype-matrix.ts:1158 Replace non-null assertion operator (!) with proper null checks. ⚡ Performance ConsiderationsSequential API batchingLocation: frontend/src/api/case-phenotype.ts:148-170 Consider using Promise.all() for parallel batching to significantly reduce load time when there are many cases. O(n) lookups in renderingLocation: frontend/src/components/CasePhenotypeGrid.vue The getPhenotypeLabel function does O(n) array.find() on every render. Memoize with a computed Map for better performance. 🧪 Test Coverage - HIGH PRIORITYThis PR adds 1,197 lines of code across 7 files with 0 test files. Tests Needed:
Example: // frontend/unit/case-phenotype-matrix.test.ts
test('findBinForPhenotype returns OTHER for unknown closure', () => {
expect(findBinForPhenotype(['HP:0000001'])).toBe('OTHER');
expect(findBinForPhenotype(undefined)).toBe('OTHER');
});🎯 Priority Action ItemsMust Fix Before Merge
Should Fix Before Merge
Consider for Future PR
Overall AssessmentRating: 8/10 - Very Good Substantial, well-architected feature with strong TypeScript/Vue 3 implementation. Main concerns are test coverage and minor code quality issues. Once critical items are addressed, this will be an excellent addition. Great work on this feature! 🎉 |
…optimization
Implements the case-phenotype matrix backend per the TDD implementation plan:
- Add /v3/api/case-phenotype-matrix/{disease_id} endpoint
- Use Solr JOIN query to fetch all phenotype associations in single request
- Add facet queries for HistoPheno bin counts
- Support direct_only parameter for filtering to exact disease matches
- Add limit enforcement with clear error messages
- Include data models: CasePhenotypeMatrixResponse, CaseEntity, CasePhenotype,
HistoPhenoBin, CasePhenotypeCellData
- Add HISTOPHENO_BIN_LABELS mapping for human-readable bin names
- Add comprehensive unit and integration tests
This reduces frontend API calls from 40+ to 1 for building the case-phenotype grid.
Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Code Review: PR 1247 - Case-Phenotype Grid Visualization Thank you for this comprehensive PR! The implementation is well-structured with good separation of concerns. OVERALL ASSESSMENT Strengths:
Critical Issues to Address:
CRITICAL ISSUES (MUST FIX)
HIGH PRIORITY ISSUES
MEDIUM PRIORITY ISSUES
TEST COVERAGE: 7/10 Good unit test coverage but integration tests need strengthening. Missing tests for: Solr connection failures/timeouts, concurrent requests, Unicode/special characters, security (injection, XSS), and boundary values. SUMMARY CHECKLIST Before merging:
Great work on this feature! Once the critical security and bug fixes are applied, this will be ready to merge. |
- Use node.association_counts to check case count before API calls - Skip expensive matrix API for diseases with 0 or >100 cases - Move grid to render after Cases section for better layout - Remove unused getCaseCounts() function - Reduce limit from 1000 to 100 cases for performance Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: PR #1247 - Case-Phenotype Grid VisualizationThank you for this comprehensive feature addition! I've completed a thorough review. Overall, the implementation is well-structured with good separation of concerns. Summary
🔴 High Priority - Must Fix Before Merge1. Solr Injection RiskFile: backend/src/monarch_py/implementations/solr/solr_query_utils.py:374, 425 The disease_id is not escaped before insertion into Solr query. Apply escape() function. 2. XSS via HTML InjectionFile: frontend/src/components/CasePhenotypeGrid.vue:128-139 User labels concatenated directly into HTML. Use proper escaping or DOM construction. 3. Dead CodeFile: frontend/src/util/case-phenotype-matrix.ts:52-148 The buildMatrix() function is never used - remove it and findBinForPhenotype(). 4. Race Condition in Tab SwitchingFile: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:210-215 Setting selectedTab triggers watcher which calls fetchMatrix() again. Add guard flag. 5. Missing Response ValidationFile: frontend/src/api/case-phenotype.ts:118 Validate backend response structure before transforming to prevent crashes. 6. Console Logs in ProductionFile: frontend/src/components/CasePhenotypeGrid.vue:166, 231, 233 Remove debug console.log statements. 7. Silent Phenotype FilteringFile: backend/src/monarch_py/utils/case_phenotype_utils.py:103-104 Phenotypes without HistoPhenoBin matches are silently dropped. Log warnings or include in OTHER bin. 🟡 Medium Priority
🟢 Low Priority
✅ Strengths
Great work! Please address the 7 high-priority issues before merging. Happy to discuss any feedback. |
Pull Request Review: Case-Phenotype Grid VisualizationOverviewThis PR adds a sophisticated visualization feature for disease pages that displays a matrix of cases (phenopacket patients) and their phenotypes, grouped by body systems. The implementation includes both backend and frontend components with comprehensive test coverage. Code Quality & Best Practices✅ Strengths
Potential Issues & Concerns🔴 Critical1. Hardcoded Row Limit May Cause Data Loss Location: def build_case_phenotype_query(
disease_id: str,
direct_only: bool,
rows: int = 50000, # ⚠️ Hard limit
)Issue: If a disease has many cases with many phenotypes each, the phenotype query could return >50,000 rows and silently truncate results. For example, 100 cases × 500 phenotypes each = 50,000 associations (already at the limit). Impact: Grid would show incomplete data without warning to the user. Recommendation:
2. Missing Input Sanitization in ID Normalization Location: normalized = disease_id.strip().upper()
if "_" in normalized and ":" not in normalized:
normalized = normalized.replace("_", ":", 1) # Only replaces first occurrenceIssue: The replacement logic
Recommendation: Either:
🟡 High Priority3. Potential Race Condition in Tab Switching Location: if (isDirect && result && result.totalCases === 0) {
if (caseCountFromAssociations.value > 0) {
selectedTab.value = "all";
matrix.value = await getCasePhenotypeMatrix(props.node.id || "", {
direct: false,
limit: MAX_CASES_LIMIT,
});
return;
}
}Issue: User could click tabs rapidly while auto-switching is happening, causing state inconsistency. Recommendation: Add loading lock: const isFetching = ref(false);
async function fetchMatrix() {
if (isFetching.value) return;
isFetching.value = true;
try {
// ... existing code
} finally {
isFetching.value = false;
}
}4. Unclear Error Messages for Bin Assignment Failure Location: bin_id = _find_bin_for_phenotype(object_closure, bin_ids)
if bin_id is None:
continue # Silently skips phenotypeIssue: Phenotypes without valid bins are silently dropped. Users won't know why expected phenotypes are missing from the grid. Recommendation: Log a warning or track count of skipped phenotypes and include in response metadata. 5. Inconsistent Limit Defaults The limit default varies across the codebase:
Issue: This creates confusion about the actual enforced limit. Recommendation: Define a single constant shared across backend/frontend or document why different layers have different defaults. 🟢 Low Priority / Nice-to-Have6. Magic Numbers in Vue Component Location: The tooltip formatter hardcodes HTML structure and styling. Consider extracting to a separate formatter function for testability. 7. Potential Memory Issue with Large Maps Location: const cells = new Map<string, CasePhenotypeCellData>();
for (const [key, cell] of Object.entries(data.cells)) {
cells.set(key, {...});
}For 100 cases × 100 phenotypes = 10,000 Map entries. This is fine for current limits, but document maximum expected size. 8. Missing Accessibility Features
9. Test Coverage Gaps While test coverage is generally good, I noticed:
Performance Considerations✅ Good:
Security Concerns✅ Low Risk: This feature is read-only and doesn't handle user-generated content. Minor Concerns:
Summary & RecommendationsThis is a high-quality, well-architected feature with excellent documentation and test coverage. The main concerns are: Must Fix Before Merge:
Should Fix Before Merge:
Consider for Follow-up PR:
Overall assessment: Approve with requested changes 🟢 The architectural decisions are sound, performance optimizations are effective, and the feature provides clear value. Addressing the critical issues will make this production-ready. Great work on the comprehensive test coverage and detailed documentation! 🎉 🤖 Generated via Claude Code PR review |
The API returns null when totalCases === 0, but the auto-switch logic was checking for `result && result.totalCases === 0` which never matched. Changed to `!result` so diseases with only indirect cases now correctly auto-switch to the All tab on load. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Solr creates multiValued fields by default when the schema isn't pre-defined. This causes validation errors when new scalar fields (like has_biological_sex) are added to the Pydantic model but Solr returns them as lists. Added normalize_solr_doc_for_model() utility that inspects model type hints and converts single-element lists to scalars for fields expecting scalar types. Co-Authored-By: Claude Opus 4.5 <[email protected]>
PR Review: Case-Phenotype Grid VisualizationThank you for this comprehensive feature addition! I've conducted a thorough review covering code quality, security, performance, and test coverage. 🔴 CRITICAL ISSUES (Must Fix Before Merge)1. Solr Injection VulnerabilityLocation: The code directly injects disease_id into Solr queries without proper escaping. While the API validates the disease_id format with regex, defense-in-depth principles require sanitization at the query construction level. Fix: Use the existing escape() utility from monarch_py.utils.utils 2. Missing Dependencies - Broken ImportsLocation: Multiple frontend files The following imports reference files that do not exist in this PR:
Required files missing:
TypeScript compilation will fail without these files. 🟡 HIGH PRIORITY ISSUES3. Network Error HandlingLocation: Network failures propagate as generic exceptions. Wrap requests in try-except to provide meaningful error messages (503 for network errors, 502 for JSON decode errors). 4. TypeScript Type SafetyLocation: Using 5. Uncaught Promise RejectionLocation: The nested getCasePhenotypeMatrix call in auto-switch logic could throw but isn't properly wrapped in error handling. 🟢 MEDIUM PRIORITY ISSUES6. Missing JSDoc CommentsPer 7. Accessibility IssuesLocation:
8. Vague Exception MessagesLocation: Broad exception catching masks real errors (network, Solr) as "not found". Add logging and be more specific. 9. Unreachable CodeLocation: Fallback return statement is unreachable because the loop iterates all possible bin IDs. 📊 TEST COVERAGE ANALYSISBackend Tests: B+ Grade✅ Strengths: Excellent unit test coverage (889 lines), good integration tests, parametrized tests
Frontend Tests: F Grade❌ CRITICAL GAP: No frontend tests exist for:
Recommendation: Add unit tests for core functions and E2E test for the complete user flow. ✅ POSITIVE ASPECTS
📝 RECOMMENDATIONSBefore Merge:
Follow-up PRs:
🎯 SUMMARYThis is a valuable feature with solid backend implementation. Main concerns:
Once the critical issues are addressed, this will be a great addition to the Monarch App! |
Phenotypes can belong to multiple body system bins (e.g., an eye phenotype is also a head/neck phenotype). Previously, each phenotype was assigned to only ONE bin based on priority order, causing bins like Eye, Ear, and Connective Tissue to show counts but have no phenotypes to display. Changes: - Add phenotype_ids field to HistoPhenoBin model - Build phenotype_ids by checking each phenotype's closure against ALL bins, not just the first matching one - Frontend now uses phenotype_ids directly from backend response This allows the same phenotype to correctly appear in multiple bin expansions (e.g., "Supranuclear gaze palsy" appears in both Eye and Head/Neck bins). Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: PR #1247 - Case-Phenotype Grid VisualizationThank you for this comprehensive feature addition! The case-phenotype matrix visualization is well-architected with excellent separation of concerns. The PR includes 889 lines of new tests across 7 test files, demonstrating strong testing discipline. OverviewArchitecture: ✅ Excellent
Test Coverage: ✅ Strong
Critical Issues 🔴1. Security: Potential Solr Query InjectionLocation: The Recommendation: Apply the 2. Frontend: Missing Component DependenciesLocation: References components that don't exist in the codebase. This will prevent compilation. Action Required: Verify these dependencies exist or provide implementations before merge. 3. XSS Risk: Unsanitized HTML in TooltipsLocation: HTML is constructed from user data without sanitization. If backend returns malicious content, it could execute JavaScript when rendered. Recommendation: Use text interpolation with structured DOM elements instead of HTML strings, or sanitize with DOMPurify. High Priority Issues 🟠4. Overly Broad Exception HandlingLocation: Catches ALL exceptions and masks unexpected errors. Makes debugging difficult. Fix: Replace with specific exception types and add logging. 5. Silent Data Loss: Phenotypes Without BinsLocation: Phenotypes that don't match HistoPheno bins are silently excluded without logging. Recommendation: Add logging and consider an "Uncategorized" bin. 6. Fragile Regex Error ParsingLocation: Regex assumes exact backend error format. Missing radix parameter on parseInt() and no NaN validation. Fix: Add radix parameter, validate extracted numbers, and add logging for parse failures. Medium Priority Issues 🟡7. Performance: Redundant Entity Name FetchLocation: Makes an additional network call after already fetching case and phenotype data. Solution: Extract disease name from case association docs. 8. Input Validation: Incomplete NormalizationLocation: Only replaces FIRST underscore. Edge case: 9. Type Safety: Unsafe any TypeLocation: Using 10. Missing Logging ThroughoutNo logging for debugging or performance monitoring. Would be valuable for production troubleshooting. Positive Highlights ✨
Test Coverage Summary
Strong areas: Core matrix building, model validation, API contract Gap: Frontend components lack unit tests (Vue Test Utils recommended) Recommendations for MergeMust Fix Before Merge:
Should Fix Before Merge:
Can Address in Follow-up:
Overall AssessmentThis is a well-engineered feature with strong architectural foundations and comprehensive backend testing. The critical issues are fixable and primarily involve applying existing security patterns consistently and resolving missing dependencies. Once the 4 critical issues are addressed, this PR will be in excellent shape for merge. Great work on the test coverage and documentation! Reviewed by: Claude Sonnet 4.5 Files Reviewed: 26 files (15 backend, 9 frontend, 8 tests) Review Date: 2026-01-29 |
Backend (Phase 1):
- New /entity-grid/{context_id} API endpoint with configurable traversals
- Grid configs for association categories and row grouping
- Grid groupings for histopheno row organization
- Entity grid utilities for column building and cell aggregation
- Extend model with GridColumnEntity and GridResponse types
Frontend (Phase 2):
- EntityGrid.vue: Generic configurable grid component with diagonal headers
- EntityGridModal.vue: Detail modal for cell inspection
- Entity grid API client with type-safe queries
- Grid utilities for column grouping and cell formatting
Playground (Phase 3):
- PageGridPlayground.vue for interactive grid testing
- Entity search, association category selection, row grouping options
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: PR #1247 - Case-Phenotype Grid VisualizationThank you for this comprehensive feature addition! This is a well-structured PR with excellent documentation and test coverage. However, I've identified several critical security and code quality issues that should be addressed before merging. Critical Issues (Must Fix)1. Solr Injection Vulnerability (Backend)Files: backend/src/monarch_py/implementations/solr/solr_query_utils.py User-provided IDs are interpolated directly into Solr queries without escaping. Apply the escape() utility function to all user-provided IDs. 2. XSS Vulnerability via Tooltip HTML (Frontend)Files: frontend/src/components/EntityGrid/EntityGrid.vue, CasePhenotypeGrid.vue Tooltip HTML is constructed from unsanitized API data. Use DOMPurify or replace with Vue template rendering. 3. Missing Pagination (Backend API)Current API returns ALL results up to limit in single response. Add pagination with offset/limit parameters. 4. Broad Exception Catching (Backend)File: backend/src/monarch_py/api/case_phenotype.py ALL exceptions converted to 404 errors. Catch specific exceptions instead. 5. TypeScript any Type (Frontend)File: frontend/src/components/CasePhenotypeGrid.vue Replace any type with proper type definition. High Priority Issues6. No Accessibility SupportMissing keyboard navigation, ARIA attributes, and focus indicators throughout grid components. 7. Memory/Performance IssuesBackend default 50,000 rows, frontend no virtual scrolling for large grids. 8. No Rate LimitingPublic endpoints with no throttling for expensive JOIN queries. Medium Priority9. Code DuplicationEntity validation logic duplicated across files. 10. Inefficient Nested Loopcase_phenotype_utils.py lines 195-200: Use set intersection instead. 11. MONDO ID ValidationRegex only accepts 7 digits, should accept 1-7. Strengths
Risk Assessment
Action Plan
Overall solid implementation. Address security vulnerabilities before merge. |
Uses a reverse JOIN query to filter column entities that have no second-hop (row) associations. For ortholog-phenotype grids, this excludes orthologs without phenotype data from the grid columns. The JOIN query finds column associations where the column entity appears as the context in at least one row association, reducing network transfer and eliminating empty columns from the response. Co-Authored-By: Claude Opus 4.5 <[email protected]>
PR Review: Case-Phenotype Grid VisualizationComprehensive review completed for PR #1247. This is impressive, well-architected work with excellent test coverage. StrengthsArchitecture & Design
Code Quality
Test Coverage
Issues FoundMedium Priority
Low Priority
AssessmentsSecurity: Low Risk - Input validation, no user-generated content in queries, proper error handling Performance: Good - Limit parameters, efficient JOIN queries, faceted queries, performance tests included RecommendationsShould Fix Soon:
Nice to Have:
Summary
Overall: High-quality work demonstrating strong engineering practices. The generic grid framework is particularly well-designed. Issues identified are minor and preventative. Great job! |
- Left-align phenotype/system column content in EntityGrid - Default to showing column labels and category grid lines in playground - Replace checkbox/radio controls with styled toggle switches showing on/off state Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: PR #1247 - Case-Phenotype Grid VisualizationThank you for this comprehensive PR! This is a well-architected feature with excellent separation of concerns and solid documentation. Overall AssessmentStrengths:
Areas Requiring Attention:
CRITICAL Issues (Please Address Before Merge)1. Solr Injection Risk - Input Escaping InsufficientLocation: backend/src/monarch_py/implementations/solr/solr_query_utils.py:371-375 While case_phenotype.py has strict MONDO ID validation, these query-building functions are reusable and could be called from contexts with looser validation. Direct string interpolation creates injection risk. The escape function in utils.py only escapes colons, not all Solr special characters. 2. Generic Entity Grid Lacks Input ValidationLocation: backend/src/monarch_py/implementations/solr/solr_implementation.py:742-820 The context_id parameter is not validated before being used in Solr queries. 3. Cell Key Collision RiskLocation: backend/src/monarch_py/utils/case_phenotype_utils.py:220 If a case ID contains a colon (e.g., PMID:12345), parsing will fail. Use a safer delimiter. HIGH Priority Issues4. Race Condition in Tab SwitchingLocation: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:186-232 Multiple rapid tab switches could result in stale data. Implement request cancellation using AbortController. 5. Automatic Tab Switching Without User NotificationLocation: frontend/src/pages/node/SectionCasePhenotypeGrid.vue:213-221 The code silently switches tabs which could confuse users. Add a notification. 6. Missing Security Test CoverageNo tests for injection attempts, malformed IDs, or special characters. MEDIUM Priority Issues7. Bin Assignment Fallback is Non-DeterministicLocation: backend/src/monarch_py/utils/case_phenotype_utils.py:117-141 8. Silent Filtering of Unbinned PhenotypesLocation: backend/src/monarch_py/utils/case_phenotype_utils.py:100-104 9. Error Message Parsing is BrittleLocation: frontend/src/api/case-phenotype.ts:100-116 10. Inefficient Bin Building LoopLocation: backend/src/monarch_py/utils/case_phenotype_utils.py:188-200 RecommendationsBefore Merge:
Short-term: Long-term: ConclusionThis is a well-designed feature with solid architecture. However, the security concerns around input validation and Solr injection should be addressed before merging to production. Great work overall! |
…lection Adds subject/object category metadata to association type mappings, a new API endpoint to query traversable associations by entity type, and frontend logic to dynamically populate column/row options based on entity category. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Code Review: PR #1247 - Case-Phenotype Grid VisualizationExecutive SummaryThis PR delivers excellent architectural improvements, reducing API calls from 40+ to 1 through efficient Solr JOIN queries. The implementation is well-designed, type-safe, and includes comprehensive test coverage. However, several security, performance, and code quality issues should be addressed before merging. Recommendation: ✅ Approve with Changes (address high-priority items first) 🎯 Positive FindingsArchitecture & Design Excellence
🚨 Issues FoundCRITICAL - Must Fix Before Merge1. Solr Injection Vulnerability (Security)Files: Disease IDs are inserted into Solr queries with inadequate escaping. The Risk: While the API validates MONDO IDs with regex, if these query builders are reused elsewhere, this creates injection vulnerabilities. Fix: Enhance escape function to handle all Solr special characters MAJOR - Should Fix Before Merge2. Race Condition in Frontend Data Fetching (Bugs)File: The Fix: Add AbortController to cancel in-flight requests when route changes. 3. Performance: Inefficient Bin Summary Calculation (Performance)File:
Fix: Memoize summaries in a computed property with Map cache. 4. No Upper Limit Validation on Solr Results (Performance)File:
Fix: Add explicit limit checking and return warnings when approaching limits. 5. Missing Null Checks in Cell Data Access (Bugs)File: The Fix: Add validation to warn when cell data is missing expected fields. MINOR - Can Address in Follow-up6. XSS Risk in Tooltip HTML (Security)Tooltip content constructed with raw HTML from user data. Fix with DOMPurify or component slots. 7. Expanded Bin State Not Reset (Bugs)
8. Inconsistent Limit Values (Code Quality)Frontend 9. Missing Edge Case Tests (Testing)No tests for Unicode characters in case IDs, empty bins, negated-only phenotypes, or partial Solr results. 10. Hardcoded Bin Definitions (Maintainability)HistoPheno bins defined separately in backend and frontend. Consider single source of truth. 📊 Analysis by CategoryCode Quality: ⭐⭐⭐⭐ (Very Good)
Performance: ⭐⭐⭐⭐ (Great)
Security: ⭐⭐⭐ (Good)
Testing: ⭐⭐⭐⭐ (Comprehensive)
🎯 RecommendationsHigh Priority (Before Merge)
Medium Priority (Near-Term)
Low Priority (Future)
🏁 Overall AssessmentSummary Ratings:
This is high-quality work delivering significant value. The architecture is sound and sets a great foundation for future grid visualizations. The security issues and race condition should be addressed before production deployment. Estimated time for high-priority fixes: 4-8 hours Great work overall! 🎉 |
Replace the old /case-phenotype-matrix endpoint and wrapper components with the generic /entity-grid endpoint, eliminating the transformation layer between CasePhenotypeMatrix and EntityGridMatrix types. - Fix transformCell to parse backend qualifiers into CellQualifier[] and build publicationLinks fallback from raw CURIEs - Rewrite SectionCasePhenotypeGrid to use getEntityGrid() directly with EntityGrid + EntityGridModal components - Delete CasePhenotypeGrid.vue, CasePhenotypeModal.vue, case-phenotype API/types, unused util, and old test - Simplify EntityGrid/entity-grid-utils to use only hasData (remove old present-field fallback)
Resolve conflicts: - main.py: combine branch imports (case_phenotype, entity_grid) with main's meta import - solr_query_utils.py: keep both dataclass import from main and Any type from branch
- Add escape() to disease_id in Solr query builders for defense-in-depth - Add type hint to _is_disease_entity parameter - Improve exception handling: log errors and return 500 instead of misleading 404 - Update tests to expect escaped disease IDs
Code Review: Case-Phenotype Grid VisualizationThis is a substantial and well-architected PR. The 2-hop Solr JOIN approach is clever, and the generic 🐛 Bugs1. FastAPI route ordering — traversable-associations endpoint is unreachable In @router.get("-grid/{context_id}", ...) # registered first
async def get_generic_entity_grid(...): ...
@router.get("-grid/traversable-associations/{entity_category}", ...) # never reached
async def get_traversable_associations(...): ...FastAPI matches routes in definition order. Any request to 2. Double Solr entity lookup in
3. fq = [f'category:"{config.column_assoc_category.value}"']
4. TypeScript // model.ts (wrong — this is a dict on the backend)
cells?: CasePhenotypeCellData[],The backend sends
|
Column header labels are now router-links that navigate to the entity page (e.g., case page) on click. Hover tooltip retained for info display.
Code ReviewThis is a substantial and well-architected PR. The Solr JOIN query approach for the 2-hop traversal is clean, and the test coverage is comprehensive. A few issues worth addressing: Bug: phenotype_count in bins uses association count, not unique phenotype count In case_phenotype_utils.py, _build_bins uses Solr facet counts for phenotype_count, but facet queries count one document per case-phenotype association—not per unique phenotype. If 10 cases each have a phenotype in the nervous system bin, the facet count is 10x higher than the number of unique phenotypes. Meanwhile, phenotype_ids is deduplicated correctly. The UI shows this count as the number of phenotypes (e.g., Nervous System (11)), so this would be misleading. Fix: use len(bin_phenotypes[bin_id]) instead of the facet count for phenotype_count. Potential AttributeError: build_grid_column_query assumes column_assoc_category is not None build_grid_column_query in solr_query_utils.py accesses config.column_assoc_category.value unconditionally, but GridTypeConfig.column_assoc_category is Optional[AssociationCategory]. A config that uses only column_assoc_categories (multi-category) would raise AttributeError here. Consider calling config.get_column_assoc_categories() instead. Design: Duplicate API surface for case-phenotype data case_phenotype.py registers /v3/api/case-phenotype-matrix/{disease_id} while entity_grid.py registers /v3/api/entity/{context_id}/case-phenotype-grid. Both serve case-phenotype data with different response models (CasePhenotypeMatrixResponse vs EntityGridResponse). Is case_phenotype.py intended to be removed once the generic entity_grid API is stable? If not, the intent should be documented; if yes, consider removing it in this PR to avoid maintaining two endpoints. Fragile string-matching heuristics in get_generic_entity_grid The method infers entity categories from association category strings using substring matching (e.g., checking if "Disease" appears in the category name). This is brittle—any new association type will silently fall through to NAMED_THING. Since subject_category/object_category are now in the YAML mappings, AssociationTypeMappings.get_mapping() already provides this information. Prefer deriving entity categories from mapping metadata rather than substring-matching category names. Redundant entity fetch in get_generic_entity_grid self.get_entity(context_id, extra=False) is called twice in the same method—once at the top to determine field mappings, and once near the bottom to get the context entity category again. The second call is unnecessary. Redundant method: _get_disease_name vs _get_entity_name Both methods do exactly the same thing (look up an entity name by ID). _get_disease_name should be removed and its callers updated to use _get_entity_name. _raw_solr_query bypasses the existing service abstraction The _raw_solr_query method in SolrImplementation manually constructs a URL using string concatenation and requests.utils.quote. This bypasses the existing SolrService abstraction and reimplements URL encoding in an ad-hoc way. Consider exposing a raw_query capability on SolrService to keep the Solr communication layer coherent. Error handling in _validate_entity_category swallows real errors The bare except Exception re-raises everything as a 404, so a Solr connection failure will appear as "Entity not found". At minimum, log the exception before re-raising; ideally differentiate between "not found" and "service unavailable". Missing response_model on get_traversable_associations The /entity-grid/traversable-associations/{entity_category} endpoint returns List[dict] with no Pydantic model, losing type safety and OpenAPI schema generation. Consider defining a small model for the traversal metadata. HistoPhenoBin.phenotype_ids not in model.yaml HistoPhenoBin in model.py has a phenotype_ids: list[str] field, but this slot is not defined in model.yaml. If the YAML is authoritative for code generation, this field will be lost on the next regeneration. Minor: Integration tests with hardcoded data assumptions Several integration tests assert specific counts or data presence for hardcoded disease IDs (e.g., MONDO:0007078). These are fragile if the underlying data changes. Consider adding a comment noting the expected data state, or gating them with a marker so CI failures are diagnosable. Overall, the architecture is sound—the generic grid approach is a good foundation for future grid types, and the test coverage is thorough. The phenotype_count bug and the potential AttributeError in build_grid_column_query are the most pressing fixes before merge. Generated with Claude Code |
Mock get_entity and _get_entity_name at class level to prevent Solr connection attempts in unit tests. Fix accessibility and prettier issues in PageGridPlayground.vue.
Code Review: Case-Phenotype Grid VisualizationThis is a well-structured PR with a clear architecture. The separation of concerns across API, data models, Solr query utilities, and builder utilities is solid, and the test coverage is extensive. Here are my observations: Issues🐛
|
Expand test coverage for entity grid, case-phenotype, and generic grid features. Tests use flat functions with parametrize to match repo conventions. Covers API validation, Solr query builders, parsers, implementation methods, and utility functions.
Welcome to the Monarch InitiativeThe Monarch Initiative is an extensive knowledge graph and ecosystem of tools made for the benefit of clinicians, researchers, and scientists. The knowledge graph consists of millions of entities – genes, diseases, phenotypes, and many more – imported from dozens of sources. We welcome the contributions of the community to help us maintain and improve the knowledge graph and the tools that use it. To help get started on contributing to the Monarch Initiative, please see our CONTRIBUTING.md file. Monarch AppThe monarch-app repo contains the source code for the Monarch Initiative website (a Vue webapp), If you wish to run Monarch-App as a local web application, please install the requirements below and then follow on to the usage section to start the application. Refer to Using Local Data to see how you can run the full Monarch website locally and use your own data store. The app also offers a widget called Phenogrid that can be embedded in any website. For developersThe monarch-app repository contains the code to run the Monarch Initiative website in the To start development, please refer to the CONTRIBUTING.md document with this README. If you are planning to only develop the frontend or backend of the application you can refer directly to the README and CONTRIBUTING files in each of those sections. |
|
Code Review: Case-Phenotype Grid Visualization. This is a substantial and well-structured PR. The 2-hop traversal pattern, config/grouping registries, and generic endpoint design are all solid. Test coverage is thorough. Key issues found: BUG 1 - parseCellKey is broken for colon-containing IDs (frontend): IDs like MONARCH:case1 and HP:0001234 produce the key MONARCH:case1:HP:0001234. The current code does const [columnId, rowId] = cellKey.split(':') which yields columnId='MONARCH', rowId='case1' -- both wrong. Use a separator that cannot appear in CURIEs (e.g. |||). BUG 2 - Duplicate get_entity call in get_generic_entity_grid (solr_implementation.py): The context entity is fetched at the top of the method (~line 1459), then again after building the grid (~line 1568). This doubles Solr round-trips. Reuse the first fetch. BUG 3 - build_grid_column_query accesses config.column_assoc_category.value which may be None: GridTypeConfig.column_assoc_category is Optional[AssociationCategory]. If only column_assoc_categories is set (multi-category config), this raises AttributeError. Use config.get_column_assoc_categories() instead. DESIGN 4 - Two overlapping endpoints: case_phenotype.py provides /case-phenotype-matrix/{disease_id} while entity_grid.py provides /{context_id}/case-phenotype-grid. Both are registered in main.py under different response schemas. Consider deprecating the specialized endpoint or documenting the intended relationship. DESIGN 5 - String-based heuristics in get_generic_entity_grid: Checks like 'if Gene in first_col_cat' are fragile. Since association_type_mappings.yaml now has subject_category/object_category, use AssociationTypeMappings.get_mapping() instead. DESIGN 6 - _raw_solr_query bypasses existing SolrService: Calls requests.get(url) with a manually-constructed URL, bypassing connection pooling and error handling in SolrService. DESIGN 7 - _get_disease_name and _get_entity_name are identical: Consolidate to one method. NITS: (8) SolrImplementation() instantiated at module level for skipif conditions -- prefer a lazy callable. (9) Duplicate comment block in test_association_type_mapping.py. (10) get_bin_label raises KeyError inconsistently with the safe dict.get pattern used elsewhere. (11) test_not_a_disease tests ID-format rejection not category check -- comment is misleading. (12) limit semantics differ between the two API files. SUMMARY: Item 1 (cell key parsing) is a correctness bug that should be fixed before merge. Items 2-3 are also correctness issues. Items 4-7 are design concerns worth addressing; the rest can be tackled as follow-ups. |
…ttributeErrors - Remove duplicate _get_disease_name method, use _get_entity_name instead - Remove redundant get_entity call in get_generic_entity_grid - Fix build_grid_column_query/build_grid_row_query to use config.get_column_assoc_categories() instead of direct .column_assoc_category.value access, preventing AttributeError when only column_assoc_categories list is set - Deduplicate make_cell_key: case_phenotype_utils now imports from entity_grid_utils - Add warning logs for heuristic fallback in field direction mapping - Log real exception in _validate_entity_category before re-raising as 404 - Remove unused urlencode import
|
Test comment from review |
PR Review: Case-Phenotype Grid (1/3 - Potential Bugs)1. context_id not escaped - In build_grid_column_query and build_grid_row_query, context_id is interpolated without escaping. Use escape(context_id) for consistency with other query builders. 2. Duplicate entity lookup (N+1) - Both get_entity_grid and get_generic_entity_grid call get_entity for validation then call _get_entity_name which calls get_entity again. Thread the entity name through to save a Solr round-trip. 3. intersection.pop() is unreachable dead code - In _find_bin_for_phenotype and _find_bin_for_entity, the final return intersection.pop() can never execute because if intersection is non-empty, the preceding for loop is guaranteed to match first. Remove this dead code. 4. get_row_grouping silent failure - get_entity_grid calls get_row_grouping at the top, so any new grid type with a non-phenotype row category will raise ValueError before fetching any data. Consider making this a config-time check. |
PR Review: Case-Phenotype Grid (2/3 - Architecture and Code Quality)Architecture: 5. Two overlapping API endpoints - The PR registers both /v3/api/case-phenotype-matrix/{disease_id} (case_phenotype.py) and /v3/api/entity/{context_id}/case-phenotype-grid (entity_grid.py). Both serve the same data but return different shapes (CasePhenotypeMatrixResponse vs EntityGridResponse). Is the older endpoint still needed, or is it transitional and can be removed? 6. String heuristics for field direction - The fallback in get_generic_entity_grid checks string substrings (Gene in first_col_cat, Disease in column_assoc_categories[0]) to infer direction. This is fragile. Since the PR already adds subject_category/object_category to YAML, the heuristic branch should error explicitly. 7. Unbounded memory fetch - rows=50000 default means all associations load in one Solr call. Fine for current phenopacket scale, but worth documenting. Code Quality: 8. Duplicate comment block - In test_association_type_mapping.py, the Tests for AssociationTypeMappings singleton header appears twice. 9. normalize_solr_doc_for_model side-effects get_entity - The change unwraps single-element lists to scalars for all Entity construction. If category was legitimately a list, code checking isinstance(entity.category, list) could break. Add a regression test. 10. Inconsistent validation - case_phenotype.py uses a MONDO regex (returns 422), entity_grid.py uses _validate_entity_category (returns 400). Same mistake, different errors. 11. _SENTINEL_NO_ATTR defined after use - In test_case_phenotype_api.py, _SENTINEL_NO_ATTR = object() is referenced before it is defined. Move it before the function. |
PR Review: Case-Phenotype Grid (3/3 - Summary)What Works Well:
Minor Nits:
Also, please delete the stale test comment at #1247 (comment) (that was accidentally posted by the review tooling). Generated with Claude Code |
tests/unit/test_case_phenotype_api.py and tests/integration/test_case_phenotype_api.py had the same module name, causing a pytest collection error when CI runs `pytest tests` (both directories). Renamed the unit test file to test_case_phenotype_api_unit.py.
PR Review: Case-Phenotype Grid VisualizationThis is a substantial and well-structured PR. The architecture is solid - a clean 2-hop traversal pattern, thoughtful generic abstraction, excellent test coverage. Here are observations organized by priority. Bugs / Correctness1. case_phenotype.py router is redundant / dead code
2. The new method bypasses the existing 3. The column entity category is determined by substring matching on the association category name. 4. A Solr timeout or connection error becomes a misleading not-found response. The companion code in Design / Code Quality5. Duplicate section comment in The comment 6. In 7. The PR adds both case-specific models ( 8. CURIEs like Performance9. Two Solr round-trips when count is within limit Both 10. No pagination for row entities The row query defaults to Minor
Overall this is good work - clean architecture, excellent test coverage (unit + integration + performance), and the generic grid abstraction is well-designed. The main things to address before merge: the duplicate/dead |
Summary
This PR adds a new visualization section to disease pages showing a grid of cases (columns) × phenotypes (rows), with phenotypes grouped by histopheno bins (body systems) that expand to show individual phenotypes. This enables researchers to quickly see which phenotypes are present or absent across cases associated with a disease.
CaseToDiseaseAssociationand phenotypes viaCaseToPhenotypicFeatureAssociationArchitecture
Data Flow
Files Added
frontend/src/api/case-phenotype-types.tsfrontend/src/api/case-phenotype.tsfrontend/src/util/case-phenotype-matrix.tsfrontend/src/components/CasePhenotypeGrid.vuefrontend/src/components/CasePhenotypeModal.vuefrontend/src/pages/node/SectionCasePhenotypeGrid.vueFiles Modified
frontend/src/pages/node/PageNode.vueSectionCasePhenotypeGridimport and usageImplementation Details
1. Type Definitions (
case-phenotype-types.ts)Defines the core data structures for the visualization:
Includes the
HISTOPHENO_BINSconstant mapping UPHENO/HP IDs to human-readable labels (Nervous System, Eye, Skeletal System, etc.).2. API Layer (
case-phenotype.ts)Orchestrates data fetching with batched requests to handle URL length limits:
Case phenotype requests are batched in groups of 5 due to URL length limits with unicode characters in case IDs.
3. Matrix Building (
case-phenotype-matrix.ts)Transforms flat associations into a structured matrix:
Uses
object_closurefrom phenotype associations to match against histopheno bin IDs.4. Grid Component (
CasePhenotypeGrid.vue)Renders the interactive grid with expand/collapse functionality:
Features:
5. Section Wrapper (
SectionCasePhenotypeGrid.vue)Handles loading states and conditionally renders the section:
node.category === 'biolink:Disease')Test Plan
🤖 Generated with Claude Code