This research template implements a clear two-layer architecture separating generic build infrastructure from project-specific scientific content. This document explains the architecture, design rationale, and how to work within this structure.
| Aspect | Layer 1: Infrastructure | Layer 2: Project |
|---|---|---|
| Location | infrastructure/ (root level) | projects/{name}/src/ (project-specific) |
| Purpose | Generic, reusable build tools | Domain-specific research code |
| Scope | Works with any project | Specific to this research |
| Test Coverage | 60% minimum for infrastructure/ | 90% minimum for projects/{name}/src/ |
| Scripts | scripts/ (root, generic orchestrators) | projects/{name}/scripts/ (project orchestrators) |
| Tests | tests/infra_tests/ (root level) | projects/{name}/tests/ (project-specific) |
| Imports | from infrastructure.module import | from project.src.module import |
| Dependencies | No project dependencies | Can import from infrastructure |
| Examples | PDF generation, validation, figure management | Algorithms, simulations, analysis |
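The two coverage minimums in the table can be checked per layer rather than with a single combined threshold. The sketch below only builds the pytest command lines. The helper name `coverage_commands` and the project name `my_project` are hypothetical, and it assumes the pytest-cov plugin is installed so that `--cov` and `--cov-fail-under` are available.

```python
from typing import List

# Minimum coverage per layer, taken from the quick reference table.
INFRA_MIN = 60
PROJECT_MIN = 90


def coverage_commands(project_name: str) -> List[List[str]]:
    """Build one pytest invocation per layer, each with its own threshold."""
    project_root = f"projects/{project_name}"
    return [
        # Layer 1: infrastructure tests measured against infrastructure/
        ["pytest", "tests/infra_tests/", "--cov=infrastructure",
         f"--cov-fail-under={INFRA_MIN}"],
        # Layer 2: project tests measured against projects/<name>/src/
        ["pytest", f"{project_root}/tests/", f"--cov={project_root}/src",
         f"--cov-fail-under={PROJECT_MIN}"],
    ]


if __name__ == "__main__":
    # "my_project" is a placeholder project name.
    for cmd in coverage_commands("my_project"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run` as is; keeping the thresholds separate means a well-tested project cannot mask under-tested infrastructure, or the reverse.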
Location: infrastructure/ (root level)
Purpose: Reusable tools and utilities that apply to any research project using this template. These handle:
- Build orchestration and PDF generation
- Document validation and quality checking
- Build artifact verification
- Environment reproducibility tracking
- Academic publishing assistance
- Figure and image management
- Markdown integration
Modules:
flowchart TB
INFRA[/infrastructure//]
INFRA --> CORE[/core/<br/>exceptions · logging · config_loader/]
INFRA --> VAL[/validation/<br/>pdf · markdown · integrity/]
INFRA --> DOC[/documentation/<br/>figure · image · markdown integration · glossary/]
INFRA --> PUB[/publishing/<br/>academic publishing tools/]
INFRA --> LLM[/llm/<br/>LLM integration · literature workflows/]
INFRA --> REND[/rendering/<br/>multi-format · PDF · slides · HTML/]
INFRA --> SCI[/scientific/<br/>scientific dev tools/]
INFRA --> SEARCH[/search/<br/>multi-source literature search/]
INFRA --> REF[/reference/<br/>BibTeX I/O/]
INFRA --> REP[/reporting/<br/>pipeline reports/]
INFRA --> STEG[/steganography/<br/>secure PDF post-processing/]
classDef root fill:#0f172a,stroke:#0f172a,color:#fff
classDef pkg fill:#1e3a8a,stroke:#0f172a,color:#fff
class INFRA root
class CORE,VAL,DOC,PUB,LLM,REND,SCI,SEARCH,REF,REP,STEG pkg
Key Characteristics:
- Generic and reusable across projects
- Handles template infrastructure concerns
- 60% minimum test coverage for infrastructure (see docs/_generated/canonical_facts.md for measured status)
- No domain-specific logic
- Interfaces with project files (manuscript/, output/)
Usage Pattern:
# Infrastructure usage from scripts
from infrastructure.documentation import FigureManager
from infrastructure.documentation import MarkdownIntegration
# These manage the document structure, not the science
fm = FigureManager()
fm.register_figure(
    filename="convergence_plot.png",
    caption="Algorithm convergence comparison",
    label="fig:convergence"
)
Location: projects/{name}/src/ (project-specific code), projects/{name}/scripts/ (project orchestrators)
Purpose: Domain-specific code implementing the research project's scientific algorithms, data processing, analysis, and visualization.
Modules:
flowchart LR
SRC[/projects/<name>/src//]
SRC --> EX[example.py<br/>basic operations]
SRC --> OTHER[*.py<br/>project-specific modules]
classDef d fill:#0f172a,stroke:#0f172a,color:#fff
classDef f fill:#0f766e,stroke:#0f172a,color:#fff
class SRC d
class EX,OTHER f
Scripts (thin orchestrators):
flowchart LR
SC[/projects/<name>/scripts//]
SC --> EF[example_figure.py<br/>basic figure generation]
SC --> RF[generate_research_figures.py<br/>complex figures]
SC --> AP[analysis_pipeline.py<br/>analysis workflow]
SC --> SS[scientific_simulation.py<br/>simulation execution]
SC --> SF[generate_scientific_figures.py<br/>automated figures]
classDef d fill:#0f172a,stroke:#0f172a,color:#fff
classDef f fill:#0f766e,stroke:#0f172a,color:#fff
class SC d
class EF,RF,AP,SS,SF f
Key Characteristics:
- Domain-specific and research-focused
- Implements algorithms and computations
- Calls infrastructure when needed
- 90% minimum test coverage for project src/ (measure locally or see docs/_generated/canonical_facts.md)
- Follows thin orchestrator pattern
Usage Pattern:
# Project-specific usage from scripts
from project.src.simulation import SimpleSimulation
from project.src.statistics import calculate_descriptive_stats
from infrastructure.documentation import FigureManager
# Science: Run simulation and analysis
sim = SimpleSimulation()
results = sim.run()
stats = calculate_descriptive_stats(results)
# Infrastructure: Manage figures
fm = FigureManager()
fm.register_figure(
    filename="results.png",
    caption="Simulation results",
    label="fig:results"
)
graph TB
subgraph L1["LAYER 1: INFRASTRUCTURE<br/>(Build orchestration, validation, document management)"]
subgraph SCRIPTS["Pipeline Orchestrators"]
RUN_ALL[execute_pipeline.py<br/>10-stage DAG pipeline]
SCRIPT_LIST[scripts/*.py<br/>- 00_setup_environment.py<br/>- 01_run_tests.py<br/>- 02_run_analysis.py<br/>- 03_render_pdf.py<br/>- 04_validate_output.py<br/>- 05_copy_outputs.py]
end
subgraph INFRA["infrastructure/"]
INFRA_MODS[core/, validation/,<br/>documentation/, publishing/,<br/>llm/, rendering/,<br/>scientific/, skills/, steganography/]
end
end
subgraph L2["LAYER 2: SCIENTIFIC<br/>(Algorithms, analysis, visualization, data)"]
subgraph SRC["projects/{name}/src/"]
SRC_MODS[simulation, statistics,<br/>data_processing, metrics,<br/>parameters, performance,<br/>plots, reporting, validation,<br/>visualization, data_generator,<br/>example]
end
subgraph PROJ_SCRIPTS["projects/{name}/scripts/<br/>(thin orchestrators)"]
PROJ_SCRIPT_LIST[example_figure.py<br/>generate_research_figures.py<br/>analysis_pipeline.py<br/>scientific_simulation.py<br/>generate_scientific_figures.py]
end
end
subgraph MANUSCRIPT["manuscript/<br/>(research content)"]
MANUSCRIPT_FILES[01_abstract.md through<br/>99_references.md]
end
L1 -->|"Manages structure and<br/>validates outputs"| L2
L1 -->|"Validates science"| L2
L2 -->|"Input: Manuscripts, configurations<br/>Output: Figures, data, reports"| MANUSCRIPT
classDef layer1 fill:#e1f5fe,stroke:#01579b,stroke-width:3px
classDef layer2 fill:#f1f8e9,stroke:#33691e,stroke-width:3px
classDef manuscript fill:#fff3e0,stroke:#e65100,stroke-width:2px
class L1,SCRIPTS,INFRA,RUN_ALL,SCRIPT_LIST,INFRA_MODS layer1
class L2,SRC,PROJ_SCRIPTS,SRC_MODS,PROJ_SCRIPT_LIST layer2
class MANUSCRIPT,MANUSCRIPT_FILES manuscript
✅ Layer 1 → Layer 1: Infrastructure modules can import from other infrastructure modules
from infrastructure.documentation import FigureManager
from infrastructure.documentation import ImageManager
✅ Layer 2 → Layer 1: Project code can import infrastructure
from project.src.visualization import plot_results
from infrastructure.documentation import FigureManager
# Use infrastructure for figure management
fig = plot_results(data)
fig.savefig("output/figures/results.png")
fm = FigureManager()
fm.register_figure(
    filename="results.png",
    caption="Results visualization",
    label="fig:results"
)
✅ Layer 2 → Layer 2: Project modules can import from other project modules
from project.src.simulation import SimpleSimulation
from project.src.statistics import calculate_descriptive_stats
❌ Layer 1 → Layer 2: Infrastructure should NOT import project code
# BAD: Build tools shouldn't depend on project-specific code
from infrastructure.validation.integrity.checks import verify_output_integrity
from project.src.simulation import SimpleSimulation # ❌ WRONG
# This breaks the abstraction and makes infrastructure project-specific
flowchart TB
INFRA[/infrastructure//<br/>Layer 1 · 15 subpackages]
INFRA --> META[__init__.py · AGENTS.md ·<br/>README.md · SKILL.md]
INFRA --> CFG[/config/<br/>Shared configuration/]
INFRA --> CORE[/core/<br/>logging · config · pipeline ·<br/>checkpoint · security · telemetry/]
INFRA --> DOCK[/docker/<br/>Container specs/]
INFRA --> DOC[/documentation/<br/>figure manager · glossary gen/]
INFRA --> LLM[/llm/<br/>Ollama integration · prompts/]
INFRA --> PROJ[/project/<br/>multi-project discovery/]
INFRA --> PUB[/publishing/<br/>Zenodo · arXiv · GitHub/]
INFRA --> REND[/rendering/<br/>PDF · HTML · slides/]
INFRA --> REP[/reporting/<br/>pipeline · executive reports/]
INFRA --> SCI[/scientific/<br/>numerical stability · benchmarking/]
INFRA --> SK[/skills/<br/>SKILL.md discovery/]
INFRA --> STEG[/steganography/<br/>PDF hardening/]
INFRA --> VAL[/validation/<br/>PDF · markdown · integrity · audit/]
INFRA --> SEARCH[/search/<br/>literature search/]
INFRA --> REF[/reference/<br/>BibTeX I/O/]
classDef root fill:#0f172a,stroke:#0f172a,color:#fff
classDef pkg fill:#1e3a8a,stroke:#0f172a,color:#fff
classDef meta fill:#0f766e,stroke:#0f172a,color:#fff
class INFRA root
class CFG,CORE,DOCK,DOC,LLM,PROJ,PUB,REND,REP,SCI,SK,STEG,VAL,SEARCH,REF pkg
class META meta
File-level layout inside each package: see infrastructure/AGENTS.md.
flowchart TB
PROJ[/project//<br/>Project-specific code]
PROJ --> SRC[/src/<br/>Project scientific code/]
PROJ --> SC[/scripts/<br/>Project orchestrators/]
PROJ --> T[/tests/<br/>Project tests/]
SRC --> SRC_FILES[__init__.py · AGENTS.md · README.md ·<br/>example.py · ...]
SC --> SC_FILES[example_figure.py · generate_research_figures.py ·<br/>analysis_pipeline.py · scientific_simulation.py ·<br/>generate_scientific_figures.py]
T --> T_FILES[__init__.py · test_example.py ·<br/>test_simulation.py · test_statistics.py · ...]
classDef d fill:#0f172a,stroke:#0f172a,color:#fff
classDef pkg fill:#1e3a8a,stroke:#0f172a,color:#fff
classDef f fill:#0f766e,stroke:#0f172a,color:#fff
class PROJ d
class SRC,SC,T pkg
class SRC_FILES,SC_FILES,T_FILES f
flowchart TB
ROOT_TESTS[/tests//<br/>Root level · infrastructure tests]
ROOT_TESTS --> INFRA_T[/infra_tests/<br/>Layer 1 tests/]
ROOT_TESTS --> INTEG[/integration/<br/>Cross-layer tests/]
ROOT_TESTS --> HELPERS[/helpers/<br/>Test utilities/]
INFRA_T --> INFRA_F[__init__.py · test_build/ ·<br/>test_validation/ · test_documentation/ · ...]
INTEG --> INTEG_F[__init__.py · test_integration_pipeline.py · ...]
PROJ_TESTS[/projects/<name>/tests//<br/>Layer 2 · project tests]
PROJ_TESTS --> PROJ_F[__init__.py · test_example.py ·<br/>test_simulation.py · test_statistics.py · ...]
classDef d fill:#0f172a,stroke:#0f172a,color:#fff
classDef pkg fill:#1e3a8a,stroke:#0f172a,color:#fff
classDef f fill:#0f766e,stroke:#0f172a,color:#fff
class ROOT_TESTS,PROJ_TESTS d
class INFRA_T,INTEG,HELPERS pkg
class INFRA_F,INTEG_F,PROJ_F f
flowchart TD
START([User runs:<br/>uv run python scripts/execute_pipeline.py --project {name} --core-only]) --> CLEAN[STAGE 0: Clean Output Directories<br/>- Remove old outputs<br/>- Prepare fresh build]
CLEAN --> STAGE00[STAGE 00: LAYER 1<br/>Setup Environment<br/>- Validate Python, dependencies<br/>- Check build tools]
STAGE00 --> PHASE1[PHASE 1: LAYER 1<br/>Test Validation<br/>- Run tests/infra_tests/<br/>- Run projects/{name}/tests/<br/>- Run tests/integration/<br/>- Validate coverage requirements<br/>Report: [LAYER-1-INFRASTRUCTURE] Running]
PHASE1 --> PHASE2[PHASE 2: LAYER 2<br/>Project Execution<br/>- Run projects/{name}/scripts/*.py<br/>- Generate figures<br/>- Process data<br/>- Create outputs<br/>Report: [LAYER-2-PROJECT] Running]
PHASE2 --> PHASE2_5[PHASE 2.5: LAYER 1<br/>Utilities<br/>- Generate API glossary<br/>- Validate markdown<br/>- Check cross-references<br/>Report: [LAYER-1-INFRASTRUCTURE] Running]
PHASE2_5 --> PHASE3_5[PHASE 3-5: LAYER 1<br/>Document Generation<br/>- Generate LaTeX preamble<br/>- Build individual PDFs<br/>- Build combined PDF<br/>- Create HTML version<br/>Report: [LAYER-1-INFRASTRUCTURE] Building]
PHASE3_5 --> PHASE6[PHASE 6: LAYER 1<br/>Validation<br/>- Validate PDF quality<br/>- Check for rendering issues<br/>Report: [LAYER-1-INFRASTRUCTURE] Done]
PHASE6 --> SUCCESS([Success:<br/>All PDFs generated,<br/>all layers working])
classDef layer1 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef layer2 fill:#f1f8e9,stroke:#33691e,stroke-width:2px
classDef success fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
classDef start fill:#fff3e0,stroke:#e65100,stroke-width:2px
class STAGE00,PHASE1,PHASE2_5,PHASE3_5,PHASE6 layer1
class PHASE2 layer2
class SUCCESS success
class START start
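The stage order in the flowchart can be read as an ordered table of stages, each tagged with the layer that owns it. The following is an illustrative sketch only: `Stage`, `STAGES`, and `layer_tag` are hypothetical names and are not taken from scripts/execute_pipeline.py. The stage labels and layer assignments are copied from the diagram, and the initial clean step is omitted because the diagram assigns it no layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    layer: int  # 1 = infrastructure, 2 = project


# Stage names and layer ownership as drawn in the flowchart.
STAGES: List[Stage] = [
    Stage("Setup Environment", 1),
    Stage("Test Validation", 1),
    Stage("Project Execution", 2),
    Stage("Utilities", 1),
    Stage("Document Generation", 1),
    Stage("Validation", 1),
]


def layer_tag(stage: Stage) -> str:
    """Return the log tag for the layer that owns a stage."""
    if stage.layer == 1:
        return "[LAYER-1-INFRASTRUCTURE]"
    return "[LAYER-2-PROJECT]"


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{layer_tag(stage)} {stage.name}")
```

Only one stage belongs to Layer 2: the pipeline runs project code once, and every step before and after it is generic infrastructure.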
━━━ LAYER 1: Infrastructure Validation ━━━
[YYYY-MM-DD HH:MM:SS] [INFO] Running tests (infrastructure + scientific)
...tests output...
[YYYY-MM-DD HH:MM:SS] [INFO] ✅ All tests passed with adequate coverage
━━━ LAYER 2: Project Computation ━━━
[YYYY-MM-DD HH:MM:SS] [INFO] Executing project scripts...
[YYYY-MM-DD HH:MM:SS] [INFO] [LAYER-2-PROJECT] Starting analysis pipeline...
...script output...
[YYYY-MM-DD HH:MM:SS] [INFO] ✅ ALL project scripts executed successfully
━━━ LAYER 1: Infrastructure Validation ━━━
[YYYY-MM-DD HH:MM:SS] [INFO] Running repository utilities (glossary + markdown validation)
...validation output...
[YYYY-MM-DD HH:MM:SS] [INFO] ✅ Repository utilities completed
━━━ LAYER 1: Document Generation ━━━
[YYYY-MM-DD HH:MM:SS] [INFO] Step 3: Generating LaTeX preamble from markdown...
[YYYY-MM-DD HH:MM:SS] [INFO] Step 4: Discovering and building ALL markdown modules...
...PDF generation output...
[YYYY-MM-DD HH:MM:SS] [INFO] ✅ Combined document built successfully
flowchart TB
Q1{Is this specific to<br/>our research project?}
Q1 -- yes --> L2[Layer 2<br/>projects/<name>/src/]
Q1 -- no --> Q2{Is it about<br/>building / validating?}
Q2 -- yes --> L1[Layer 1<br/>infrastructure/]
Q2 -- no --> RECONSIDER[Reconsider scope]
L2 -.examples.-> L2EX[Simulation algorithms ·<br/>Statistical analysis ·<br/>Custom visualization ·<br/>Parameter sweeps ·<br/>Domain-specific processing]
L1 -.examples.-> L1EX[PDF generation · Figure management ·<br/>Document validation · Build verification ·<br/>Generic utilities · Cross-project templates]
Q3{Reusable across<br/>projects?} -.tiebreaker.- Q1
Q3 -- yes --> L1
Q3 -- no --> L2
classDef q fill:#1e3a8a,stroke:#0f172a,color:#fff
classDef l1 fill:#0f766e,stroke:#0f172a,color:#fff
classDef l2 fill:#7c2d12,stroke:#0f172a,color:#fff
class Q1,Q2,Q3 q
class L1,L1EX l1
class L2,L2EX,RECONSIDER l2
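The questions in the flowchart can also be written as a small function, which may help when reviewing where a new module belongs. This is a minimal sketch under one assumption about the diagram: the reusability question is treated as a tiebreaker that applies only when the first question has no clear answer (modelled here as `None`). The function name `suggest_layer` and its parameters are hypothetical.

```python
from typing import Optional


def suggest_layer(project_specific: Optional[bool],
                  build_or_validation: bool = False,
                  reusable: bool = False) -> str:
    """Walk the placement questions and return a suggested location."""
    if project_specific is None:
        # Tiebreaker: when the first question has no clear answer,
        # reusability across projects decides.
        return "infrastructure/" if reusable else "projects/<name>/src/"
    if project_specific:
        return "projects/<name>/src/"
    if build_or_validation:
        return "infrastructure/"
    return "reconsider scope"


if __name__ == "__main__":
    print(suggest_layer(True))                              # a simulation algorithm
    print(suggest_layer(False, build_or_validation=True))   # a PDF validator
    print(suggest_layer(None, reusable=True))               # unclear, but reusable
```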
- Create the module:
  vim projects/{name}/src/new_algorithm.py
- Implement with type hints and docstrings:
  """New algorithm implementation."""
  from typing import List, Optional

  def analyze_data(data: List[float]) -> Optional[float]:
      """Analyze data.

      Args:
          data: Input data

      Returns:
          Analysis result
      """
      pass
- Write tests:
  vim projects/{name}/tests/test_new_algorithm.py
- Add to projects/{name}/src/__init__.py:
  from .new_algorithm import analyze_data
- Use in scripts:
  from project.src.new_algorithm import analyze_data
- Update documentation:
  - Add to projects/{name}/src/AGENTS.md
  - Add to projects/{name}/src/README.md
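As an illustration of the implement and test steps above, the `analyze_data` stub could be filled in and tested as sketched below. Computing the arithmetic mean and returning `None` for empty input are assumptions made only for this example; the template does not prescribe what `analyze_data` computes.

```python
from typing import List, Optional


def analyze_data(data: List[float]) -> Optional[float]:
    """Return the arithmetic mean of data, or None if data is empty."""
    if not data:
        return None
    return sum(data) / len(data)


# The tests below would live in projects/{name}/tests/test_new_algorithm.py
# and import analyze_data from the project package instead of defining it here.
def test_analyze_data_mean() -> None:
    assert analyze_data([1.0, 2.0, 3.0]) == 2.0


def test_analyze_data_empty() -> None:
    assert analyze_data([]) is None
```

The tests call the function directly with small literal inputs and need no build infrastructure, which keeps Layer 2 tests independent of Layer 1.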
- Create the module:
  vim infrastructure/validation/new_validator.py
- Implement generic, project-independent logic:
  """New validation tool."""

  def validate_output_structure(output_dir: str) -> bool:
      """Validate output directory structure."""
      pass
- Write tests:
  vim tests/infra_tests/validation/test_new_validator.py
- Document usage:
  - Add to infrastructure/validation/AGENTS.md
  - Include usage examples
- Integrate with build pipeline:
  - Update scripts/execute_pipeline.py if needed
  - Update infrastructure modules if applicable
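For the infrastructure module steps above, one way the `validate_output_structure` stub might be filled in is sketched here. The `required` parameter and its default subdirectory names (`figures`, `pdf`) are assumptions for illustration, not the template's actual output layout; the point is that the check takes everything it needs as arguments and knows nothing about a particular project.

```python
from pathlib import Path
from typing import Iterable


def validate_output_structure(output_dir: str,
                              required: Iterable[str] = ("figures", "pdf")) -> bool:
    """Return True if output_dir exists and contains every required subdirectory."""
    root = Path(output_dir)
    if not root.is_dir():
        return False
    # Every required name must be present as a directory directly under root.
    return all((root / name).is_dir() for name in required)


if __name__ == "__main__":
    # "output" is a placeholder path.
    print(validate_output_structure("output"))
```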
- Verify build orchestration works
- Test validation logic
- Check file integrity checking
- Validate PDF generation
- No dependency on scientific code
Command:
pytest tests/infra_tests/ --cov=infrastructure
- Test algorithm correctness
- Verify statistical computations
- Check data processing
- Validate visualization output
- No dependency on build infrastructure
Command:
pytest projects/{name}/tests/ --cov=projects/{name}/src
- End-to-end pipeline validation
- Script execution testing
- Layer interaction verification
- Output completeness checking
Command:
pytest tests/integration/ --cov=projects/{name}/src --cov=infrastructure
# All tests with coverage
pytest tests/ projects/{name}/tests/ --cov=infrastructure --cov=projects/{name}/src --cov-fail-under=70
# Generate coverage report
pytest tests/ projects/{name}/tests/ --cov=infrastructure --cov=projects/{name}/src --cov-report=html
open htmlcov/index.html
✅ Do:
- Write generic, reusable code
- Document with project-independent examples
- Test extensively with real scenarios
- Handle errors gracefully
- Provide clear logging
❌ Don't:
- Import scientific modules
- Assume specific research domain
- Skip tests to ship features
- Hardcode project-specific values
- Mix concerns (building vs. computation)
✅ Do:
- Use infrastructure tools for document management
- Follow thin orchestrator pattern in projects/{name}/scripts/
- Implement algorithms in projects/{name}/src/ modules
- Test with real data
- Document domain-specific concepts
❌ Don't:
- Duplicate build/validation logic
- Implement document generation in scripts
- Skip layer abstraction
- Mix orchestration with computation
- Depend on infrastructure internals
# In project scripts - mark layer transitions
import logging
logger = logging.getLogger(__name__)
logger.info("[LAYER-2-PROJECT] Starting simulation...")
logger.info("[LAYER-1-INFRASTRUCTURE] Using FigureManager for output...")
# In build scripts - mark phase transitions
log_info "━━━ LAYER 1: Infrastructure Validation ━━━"
log_info "━━━ LAYER 2: Scientific Computation ━━━"
If you have an old project with a flat src/, migrate to the two-layer structure as follows:
- Create packages:
  mkdir -p infrastructure projects/{name}/src
- Move modules:
  - Infrastructure modules → infrastructure/
  - Project modules → projects/{name}/src/
- Update imports:
  - from example import → from projects.{name}.src.example import
  - Build verification is handled by the validation module
- Update tests:
  - Infrastructure tests → tests/infra_tests/
  - Project tests → projects/{name}/tests/
  - Update conftest.py if needed
- Validate:
  pytest tests/ projects/{name}/tests/ --cov=infrastructure --cov=projects/{name}/src
  uv run python scripts/execute_pipeline.py --project {name} --core-only
Error: ModuleNotFoundError: No module named 'project.src'
Solution: Ensure tests/conftest.py includes projects/{name}/ on path:
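import os
import sys

# repo_root: absolute path to the repository root
# project_name: the {name} directory under projects/
sys.path.insert(0, os.path.join(repo_root, "projects", project_name))
Error: Infrastructure module imports from project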
Solution: Refactor to remove dependency or move code to appropriate layer
Check:
# Find infrastructure imports of project code
grep -r "from projects\." infrastructure/
grep -r "import projects\." infrastructure/
Error: Build logic in project module
Solution: Move to infrastructure layer or extract into separate module
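The grep check above for infrastructure code importing project code can also be done on parsed imports, which avoids matching comments or strings. The sketch below uses the standard library `ast` module. The function name `find_layer_violations` and the `PROJECT_PACKAGES` tuple are hypothetical, and the tuple assumes project code is imported under a top-level `project` or `projects` package, matching the import styles used in this document.

```python
import ast
from pathlib import Path
from typing import List, Tuple

# Top-level package names treated as Layer 2 (an assumption; adjust as needed).
PROJECT_PACKAGES = ("project", "projects")


def find_layer_violations(infra_dir: str) -> List[Tuple[str, int, str]]:
    """Return (file, line, module) for each project import found under infra_dir."""
    violations: List[Tuple[str, int, str]] = []
    for path in sorted(Path(infra_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # level == 0 means an absolute import; relative imports stay in-layer.
                modules = [node.module]
            else:
                continue
            for module in modules:
                if module.split(".")[0] in PROJECT_PACKAGES:
                    violations.append((str(path), node.lineno, module))
    return violations


if __name__ == "__main__":
    for file, line, module in find_layer_violations("infrastructure"):
        print(f"{file}:{line}: imports {module}")
```

An empty result means no Layer 1 → Layer 2 imports were found; a check like this could run as an infrastructure test so that violations fail the build early.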
- ../core/architecture.md - System architecture overview
- decision-tree.md - Code placement flowchart
- thin-orchestrator-summary.md - Thin orchestrator pattern details
- infrastructure/AGENTS.md - Infrastructure layer documentation
- infrastructure/README.md - Infrastructure quick reference
- template_code_project/src/AGENTS.md - Project layer documentation
- template_code_project/src/README.md - Project quick reference
- ../AGENTS.md - System documentation
- ../README.md - Project overview
- ../core/how-to-use.md - Usage guide
Layers separate concerns:
- [LAYER 1: INFRASTRUCTURE] handles how research is documented and built
- [LAYER 2: PROJECT] focuses on what research is conducted
This separation makes code more modular, reusable, and maintainable.
- Understanding the architecture: Start with the Quick Reference table above
- Adding code: See Decision Tree section
- Import patterns: See Import Guidelines section
- Testing: See Testing Strategy section