Report generation, YAML, etc #68

@tommyod

Some interesting notes from the original probabilit proposal by @dafeda. These are things we can consider in the future.


Report Generation

Report Generation: Export a summary of the experiment, including variable definitions, correlation matrices, plots, and statistics, in HTML, PDF, or Markdown.

A high-level method such as exp.report() can generate a default document with all the standard content:

  • Configuration Overview (sample size, sampling method, correlation strategy, etc.).
  • Variable List with definitions (names, distribution types, parameters).
  • Correlation Matrices used in the experiment.
  • Summary Statistics (mean, standard deviation, percentiles) for each variable or expression.
  • Visualizations (histograms, boxplots, correlation plots) for quick inspection.
  • Interpretations or Key Findings (optional narrative about which variables significantly influence results).
  • Output Formats: HTML, PDF, Markdown.
  • Customization: Possibly via YAML or similar config files to tailor the final report.
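As a rough sketch of what a default `exp.report()` could emit, the function below renders three of the listed sections (configuration overview, variable list, summary statistics) as Markdown. `report_markdown` and its input layout are hypothetical and not part of probabilit. Plots, correlation matrices, and HTML/PDF output are left out, although the Markdown text could be handed to an external converter.

```python
from statistics import mean, quantiles, stdev


def report_markdown(metadata, variables, results):
    """Render a minimal Markdown report (hypothetical helper, not probabilit's API).

    metadata:  dict shaped like a YAML `metadata` section
    variables: dict of name -> {"type": ..., "parameters": {...}}
    results:   dict of name -> list of sampled values
    """
    lines = [f"# {metadata['name']}", "", "## Configuration overview", ""]
    for key in ("sample_size", "sampling_method"):
        lines.append(f"- **{key}**: {metadata.get(key, 'n/a')}")

    lines += ["", "## Variables", "", "| Name | Distribution | Parameters |", "| --- | --- | --- |"]
    for name, spec in variables.items():
        params = ", ".join(f"{k}={v}" for k, v in spec.get("parameters", {}).items())
        lines.append(f"| {name} | {spec['type']} | {params} |")

    lines += ["", "## Summary statistics", "",
              "| Name | Mean | Std | P10 | P50 | P90 |", "| --- | --- | --- | --- | --- | --- |"]
    for name, values in results.items():
        # 9 cut points: deciles[0] is P10, deciles[4] is P50, deciles[8] is P90.
        deciles = quantiles(values, n=10, method="inclusive")
        lines.append(
            f"| {name} | {mean(values):.4g} | {stdev(values):.4g} | "
            f"{deciles[0]:.4g} | {deciles[4]:.4g} | {deciles[8]:.4g} |"
        )
    return "\n".join(lines) + "\n"


text = report_markdown(
    {"name": "Demo", "sample_size": 5, "sampling_method": "latin_hypercube"},
    {"X": {"type": "Uniform", "parameters": {"min_val": 0, "max_val": 1}}},
    {"X": [1, 2, 3, 4, 5]},
)
print(text)
```

Percentiles use `method="inclusive"` so the reported P10/P90 never fall outside the range of the observed samples.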

Example YAML configuration:

```yaml
metadata:
  name: "Comprehensive Project Risk Analysis"
  description: "Complete risk analysis including direct costs, quality impacts, schedule effects, market variations, and risk adjustments"
  created_at: "2024-03-20T10:00:00Z"
  version: "1.0"
  sample_size: 10000
  sampling_method: "latin_hypercube"
```
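The `sampling_method: "latin_hypercube"` setting could be implemented roughly like this. Split [0, 1) into `sample_size` equal strata, draw one point per stratum for each variable, shuffle each variable's points independently, and push the uniforms through each marginal's inverse CDF. This is a minimal stdlib sketch: `latin_hypercube` and `to_normal` are illustrative names, and the Steel_Cost parameters assume the Normal implied by its p10/p90.

```python
import random
from statistics import NormalDist


def latin_hypercube(sample_size, num_vars, seed=None):
    """Return num_vars lists of uniforms in [0, 1) with exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for _ in range(num_vars):
        # One uniform draw inside each of the sample_size equal-width strata...
        column = [(i + rng.random()) / sample_size for i in range(sample_size)]
        # ...then shuffle so strata are paired at random across variables.
        rng.shuffle(column)
        columns.append(column)
    return columns


def to_normal(uniforms, mu, sigma, eps=1e-12):
    """Map uniforms through the Normal inverse CDF (clamped away from 0 and 1)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(min(max(u, eps), 1 - eps)) for u in uniforms]


u_steel, u_other = latin_hypercube(sample_size=10_000, num_vars=2, seed=42)
# Assumed Steel_Cost marginal: median 1,000,000 with p10/p90 at 800,000/1,200,000.
steel = to_normal(u_steel, mu=1_000_000, sigma=200_000 / NormalDist().inv_cdf(0.9))
```

Compared with plain Monte Carlo, each marginal is covered evenly across its whole range, which is the main reason to prefer stratified sampling at a fixed sample size.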

```yaml
variables:
  # Direct Material and Equipment
  Steel_Cost:
    type: "Normal"
    specification: "percentiles"
    parameters:
      p10: 800000
      p90: 1200000

  Maintenance_Cost:
    type: "LogNormal"
    specification: "percentiles"
    parameters:
      p10: 50000
      p90: 150000

  Daily_Fuel_Cost:
    type: "Uniform"
    parameters:
      min_val: 1000
      max_val: 1500

  Equipment_Lifetime:
    type: "Weibull"
    parameters:
      shape: 2.5
      scale: 5000

  # Labor and Productivity
  Productivity_Factor:
    type: "Triangular"
    specification: "percentiles"
    parameters:
      p10: 85
      p50: 100
      p90: 110

  Daily_Worker_Absences:
    type: "DiscreteUniform"
    parameters:
      min_val: 0
      max_val: 5

  # Quality and Inspection
  Quality_Score:
    type: "Beta"
    parameters:
      alpha: 5
      beta: 2

  Weekly_Quality_Issues:
    type: "Poisson"
    parameters:
      lambda_param: 3.5

  Successful_Inspections:
    type: "Binomial"
    parameters:
      n: 10
      p: 0.8

  Certification_Attempts:
    type: "NegativeBinomial"
    parameters:
      r: 3
      p: 0.6

  # Schedule and Timing
  Schedule_Duration:
    type: "BetaPERT"
    parameters:
      min_val: 8
      most_likely: 10
      max_val: 14

  Repair_Time:
    type: "Gamma"
    parameters:
      shape: 2
      scale: 1.5

  Time_Between_Failures:
    type: "Exponential"
    parameters:
      rate: 0.1

  Price_Variation:
    type: "StudentT"
    parameters:
      df: 5

  Overhead_Rate:
    type: "KDE"
    data: [0.12, 0.15, 0.14, 0.13, 0.16, 0.15, 0.14]
```
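Several variables above are specified by percentiles rather than native parameters. For a Normal, p10 and p90 sit symmetrically around the median, z = Φ⁻¹(0.9) ≈ 1.2816 standard deviations away, so μ = (p10 + p90) / 2 and σ = (p90 - p10) / (2z). A LogNormal applies the same formulas to the log-values. A sketch under those assumptions (the helper names are hypothetical; probabilit may perform this conversion differently):

```python
import math
from statistics import NormalDist

# z-score of the 90th percentile (~1.2816); p10 sits the same distance below the median.
Z90 = NormalDist().inv_cdf(0.9)


def normal_from_p10_p90(p10, p90):
    """Return (mu, sigma) of the Normal whose 10th and 90th percentiles are p10 and p90."""
    if not p10 < p90:
        raise ValueError("p10 must be smaller than p90")
    return (p10 + p90) / 2, (p90 - p10) / (2 * Z90)


def lognormal_from_p10_p90(p10, p90):
    """Return (mu, sigma) of log(X) for a LogNormal X with the given p10 and p90 (both > 0)."""
    return normal_from_p10_p90(math.log(p10), math.log(p90))


steel_mu, steel_sigma = normal_from_p10_p90(800_000, 1_200_000)  # Steel_Cost
maint_mu, maint_sigma = lognormal_from_p10_p90(50_000, 150_000)  # Maintenance_Cost
print(steel_mu, round(steel_sigma), round(math.exp(maint_mu)), round(maint_sigma, 4))
```

The three-point Triangular specification for Productivity_Factor (p10/p50/p90) has no closed-form inversion like this and would likely need a numerical solver.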

```yaml
correlations:
  cost_variables:
    method: "iman-conover"
    variables:
      - Steel_Cost
      - Maintenance_Cost
    pairs:
      # Correlation coefficients can themselves be uncertain
      # Here we specify that the correlation between Steel_Cost and Maintenance_Cost
      # follows a truncated normal distribution with mean 0.6 and std 0.1
      # This allows for uncertainty in our correlation estimates to be included in the analysis
      - [Steel_Cost, Maintenance_Cost, {
          mean: 0.6,
          std: 0.1,  # or min/max, or p10/p90
          distribution: "truncated_normal"
        }]

  productivity_variables:
    method: "iman-conover"
    variables:
      - Productivity_Factor
      - Daily_Worker_Absences
    pairs:
      - [Productivity_Factor, Daily_Worker_Absences, -0.4]

  quality_schedule_variables:
    method: "iman-conover"
    variables:
      - Quality_Score
      - Schedule_Duration
      - Repair_Time
    pairs:
      - [Quality_Score, Schedule_Duration, -0.5]
      - [Quality_Score, Repair_Time, -0.3]
      - [Schedule_Duration, Repair_Time, 0.4]
```

```yaml
expressions:
  equipment_replacement_cost:
    formula: "100000 * (10000 / Equipment_Lifetime)"

  rework_cost:
    formula: "Weekly_Quality_Issues * 5000 * (1 - Quality_Score)"

  inspection_cost:
    formula: "(10 - Successful_Inspections) * 2000"

  certification_cost:
    formula: "Certification_Attempts * 10000"

  downtime_cost:
    formula: "(Repair_Time / Time_Between_Failures) * Schedule_Duration * 5000"

  schedule_delay_cost:
    formula: "Schedule_Duration * Daily_Fuel_Cost"

  market_adjusted_cost:
    formula: "(1 + Price_Variation * 0.1)"

  direct_cost:
    formula: "(Steel_Cost + Maintenance_Cost + equipment_replacement_cost + Productivity_Factor * 1000000 * (1 + Daily_Worker_Absences * 0.01)) * market_adjusted_cost"

  indirect_cost:
    formula: "rework_cost + inspection_cost + certification_cost + downtime_cost + schedule_delay_cost"

  total_cost:
    formula: "(direct_cost + indirect_cost) * (1 + Overhead_Rate)"
```
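Expressions refer both to variables and to other expressions (`direct_cost` uses `market_adjusted_cost`, and `total_cost` uses both subtotals), so they must be evaluated in dependency order. The sketch below assumes formulas are restricted to arithmetic. It parses each formula with `ast`, collects the expression names it references, orders them with `graphlib.TopologicalSorter` (which also rejects cycles), and evaluates through a small operator whitelist instead of `eval`. `evaluate_expressions` is a hypothetical name, and the example formulas are trimmed versions of the ones above.

```python
import ast
import operator
from graphlib import TopologicalSorter

# Whitelisted arithmetic operators; anything else in a formula is rejected.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node, env):
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand, env))
    raise ValueError(f"unsupported syntax in formula: {ast.dump(node)}")


def evaluate_expressions(expressions, sample):
    """Evaluate name -> formula strings against one sample of the input variables."""
    trees = {name: ast.parse(formula, mode="eval") for name, formula in expressions.items()}
    # Each expression depends on the other expressions it mentions by name.
    graph = {
        name: {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)} & set(expressions)
        for name, tree in trees.items()
    }
    env = dict(sample)
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        env[name] = _eval(trees[name], env)
    return {name: env[name] for name in expressions}


expressions = {
    "total_cost": "(direct_cost + indirect_cost) * (1 + Overhead_Rate)",
    "indirect_cost": "schedule_delay_cost",                # trimmed for illustration
    "schedule_delay_cost": "Schedule_Duration * Daily_Fuel_Cost",
    "direct_cost": "Steel_Cost * market_adjusted_cost",    # trimmed for illustration
    "market_adjusted_cost": "(1 + Price_Variation * 0.1)",
}
sample = {"Schedule_Duration": 10, "Daily_Fuel_Cost": 1200, "Steel_Cost": 1_000_000,
          "Price_Variation": 0.0, "Overhead_Rate": 0.15}
result = evaluate_expressions(expressions, sample)
```

The evaluator only applies arithmetic operators, so in principle the same code could run on whole arrays of samples at once rather than one sample at a time.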

```yaml
outputs:
  plots:
    - type: "histogram"
      variable: "total_cost"
      title: "Total Project Cost Distribution"
    - type: "tornado"
      variable: "total_cost"
      title: "Sensitivity Analysis"

  report:
    format: "html"
    title: "Comprehensive Project Risk Analysis"
    description: |
      Complete risk analysis with all variables contributing to final cost:
      - Direct costs (materials, equipment, labor)
      - Quality and inspection impacts
      - Schedule effects
      - Market variations
      - Risk adjustments
```
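The `tornado` plot requested under `outputs` could be driven by a one-at-a-time swing. Every input is held at a baseline while one input moves to its low value and then its high value, and the resulting output range is recorded. This is one common way to build a tornado chart; rank correlations between sampled inputs and the output are another. The helper and the ranges below are illustrative. They use Daily_Fuel_Cost's exact p10/p90 (1050 and 1450 for Uniform(1000, 1500)) and Schedule_Duration's BetaPERT min/max.

```python
def tornado(model, base, ranges):
    """One-at-a-time sensitivity bars, widest swing first.

    model:  callable taking a dict of inputs and returning a scalar output
    base:   baseline value for every input (e.g. medians or most-likely values)
    ranges: name -> (low, high) to swing that input over, e.g. its p10 and p90
    """
    bars = []
    for name, (low, high) in ranges.items():
        out_low = model({**base, name: low})    # only `name` moves; others stay at base
        out_high = model({**base, name: high})
        bars.append((name, out_low, out_high))
    return sorted(bars, key=lambda bar: abs(bar[2] - bar[1]), reverse=True)


def schedule_delay_cost(v):
    # Mirrors the schedule_delay_cost formula from the YAML configuration.
    return v["Schedule_Duration"] * v["Daily_Fuel_Cost"]


bars = tornado(
    schedule_delay_cost,
    base={"Schedule_Duration": 10, "Daily_Fuel_Cost": 1250},
    ranges={"Daily_Fuel_Cost": (1050, 1450), "Schedule_Duration": (8, 14)},
)
for name, lo, hi in bars:
    print(f"{name:>18}: {lo:>7} .. {hi:>7}")
```

Sorting by swing width puts the most influential input at the top, which gives the chart its tornado shape.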
