
A team project submitted at the LLM Hackathon for Applications in Materials Science & Chemistry

Steel Corrosion Agent for Risk Assessment (SCARA)

King Fahd University of Petroleum and Minerals (KFUPM)
College of Chemistry and Materials
Materials Science and Engineering Department (MSE)
LLM-Hackathon for Applications in Materials Science and Chemistry

Author(s): Hussein Al'Adwan, Mohammed ALI AlKubaish, Oswaldo Rodriguez, Chahd Rahyl Adjmi, Muhammed Ahmed, Motasem Ajlouni


This FastAPI application provides an intelligent API for assessing the corrosion risk of steel components. It evaluates risks based on application environments, material properties (steel grades and composition), and optional test data, leveraging Retrieval-Augmented Generation (RAG) from technical documents, rules-based enhancements, and LLM-driven analysis to deliver risk classifications, rationales, alternatives, and metrics.

1. Corrosion Risk Assessment: Motivation and Scope

Steel is the backbone of industrial infrastructure, from pipelines and refineries to marine and transportation systems. Despite its strength and cost-effectiveness, its durability is constantly threatened by corrosion, especially in aggressive environments such as CO₂/H₂S/HCl mixtures [1]. Corrosion weakens structural integrity and contributes to billions of dollars in global annual losses through maintenance, inspection, and premature failures [2], [3].

Conventional corrosion assessment methods based on ASTM/NACE standards, experimental testing, and scattered databases are slow, fragmented, and reactive, limiting timely decision-making in critical industries like oil & gas and infrastructure [3], [4], [5].

Recent advances in Large Language Models (LLMs) offer a transformative opportunity. By integrating standards, datasets, and literature, LLMs can act as digital corrosion experts, providing rapid predictions, explainable reasoning, and alloy recommendations [6], [7], [8].

This hackathon project introduces SCARA (Steel Corrosion Agent for Risk Assessment), an LLM-powered assistant designed to:

  • Predict corrosion risk under defined conditions.
  • Provide standardized risk scores (low, medium, high).
  • Recommend safer alloys supported by data and standards.

SCARA consolidates scattered corrosion knowledge into a unified, intelligent platform, aiming to enhance reliability, safety, and cost efficiency across industrial applications.

2. Knowledge Integration and LLM Agent-Framework

The SCARA workflow is an LLM-based framework designed to predict steel performance in industrial and urban environments. The process starts with the user input, which describes the environment (phase types, composition, and thermodynamic conditions) and gives a brief description of the application (geometry, methodology, and location). The steel standard used, such as AISI, API, or UNS, is also supplied, and the system automatically links the designation to the steel's composition through the LLM and the MatWeb materials database. This prompt information is then submitted to the LLM agent, which is grounded in scientific papers, review articles, books, and risk-assessment standards.

The agent analyzes the provided data, gathers supporting evidence, estimates the corrosion risk, and recommends alternative steels. A validator then checks the consistency of the predictions against a defined risk scale: if validation converges, the results are approved and the final risk assessment is reported with the validated evidence, while inconsistent outputs are recalculated. In this way, SCARA integrates documentation, material data, and LLM-agent reasoning to deliver evidence-based corrosion evaluations.
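As a rough sketch, the validate-and-retry loop described above might look like this in Python. The function and field names here are illustrative only, not the actual app/decision.py API:

```python
# Hypothetical sketch of SCARA's validate-and-retry loop.
# `agent.decide` stands in for the real LLM decision call.
RISK_BANDS = {"low", "medium", "high"}

def assess_with_validation(agent, request, evidence, max_retries=2):
    """Run the agent and re-run it until the output fits the defined risk scale."""
    for _ in range(max_retries + 1):
        result = agent.decide(request, evidence)      # LLM-driven analysis
        if result.get("risk_band") in RISK_BANDS:     # validator: consistent output?
            return result                             # approved -> final report
    raise ValueError("LLM output never converged to a valid risk band")
```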

Features

How it Works

Component Diagram

flowchart LR
  subgraph Client
    UI[Web UI /app/static] -- fetch /assess --> API
  end

  subgraph Backend
    API[FastAPI app/api.py]
    RULES[Rules & Helpers app/rules.py]
    DECIDE[Decision Engine app/decision.py]
    RAG[RAGIndex app/rag.py]
  end

  subgraph Data
    QDRANT[(Qdrant Vector DB)]
    META[(meta.json backup)]
    PDFs[(Technical PDFs)]
    CACHE[(cache/metrics.json)]
  end

  subgraph External
    MATWEB[(MatWeb)]
    LLM[(Groq-hosted LLM via LangChain)]
  end

  UI --> API
  API <--> RAG
  API <--> RULES
  API <--> DECIDE

  RAG <--> QDRANT
  RAG <--> META
  PDFs --> RAG

  RULES <--> MATWEB
  RULES <--> CACHE

  DECIDE <--> LLM

Request Flow: /assess

sequenceDiagram
  participant B as Browser UI
  participant A as FastAPI (/assess)
  participant R as Rules (aliases, metrics)
  participant G as RAGIndex (retrieve)
  participant D as Decision (LLM)
  participant V as Qdrant
  participant M as MatWeb

  B->>A: POST /assess (Application, Steel, Tests, top_k)
  A->>R: steel_aliases(code_system, code_value)
  R-->>A: aliases, env hints
  A->>G: staged/enhanced retrieve (aliases, environment, k)
  G->>V: similarity_search[_with_score]
  V-->>G: Documents
  G-->>A: Evidence Chunks
  A->>R: get_component_metrics (UNS?)
  R->>M: scrape_uns_metrics (Playwright)
  M-->>R: Composition metrics
  A->>D: decide_with_rag(req, evidence, metrics)
  D->>LLM: Prompt (context + chunks)
  LLM-->>D: JSON (risk_band, evidence_used, consequences)
  D-->>A: risk_band, evidence_used
  A-->>B: AssessmentOutput (risk, rationale, alternatives, evidence, metrics)
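The sequence above can be condensed into a hypothetical orchestration helper. Here `steel_aliases`, `retrieve`, `get_component_metrics`, and `decide_with_rag` stand in for the real functions in app/rules.py, app/rag.py, and app/decision.py, whose actual signatures may differ:

```python
# Illustrative-only condensation of the /assess request flow.
def assess(req, rules, rag, decision, top_k=8):
    """Aliases -> evidence retrieval -> composition metrics -> LLM decision."""
    aliases = rules.steel_aliases(req["code_system"], req["code_value"])
    evidence = rag.retrieve(aliases, req.get("environment"), k=top_k)
    metrics = rules.get_component_metrics(req["code_value"])   # e.g. MatWeb scrape
    return decision.decide_with_rag(req, evidence, metrics)
```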

RAG Ingestion Pipeline

flowchart TD
  U[Uploads / Local PDFs] --> P[PDF Extractor]
  P -->|PyMuPDF or pypdf| C[Chunker + Clean]
  C --> T[Tagging]
  T -->|steel_mentions, corrosion_types, section_type| E[Embedding all-mpnet-base-v2]
  E --> S[Qdrant Upsert]
  S --> B[meta.json backup]
  B --> W[Watcher: periodic flush]
  W --> S
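A minimal, dependency-free sketch of the chunk-and-tag stage in the pipeline above. The real pipeline embeds chunks with all-mpnet-base-v2 and upserts them into Qdrant; the regex and tag lists here are purely illustrative:

```python
# Hypothetical chunker + tagger for cleaned PDF text.
import re

STEEL_PATTERN = re.compile(r"\b(?:S\d{5}|AISI \d{3,4})\b")   # e.g. S30400, AISI 304
CORROSION_TYPES = ("pitting", "crevice", "uniform", "galvanic")

def chunk_and_tag(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks and attach metadata tags."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        body = text[start : start + chunk_size]
        chunks.append({
            "text": body,
            "steel_mentions": STEEL_PATTERN.findall(body),
            "corrosion_types": [t for t in CORROSION_TYPES if t in body.lower()],
        })
    return chunks
```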

Data Model (Pydantic)

classDiagram
  class Application {
    +List~str~ phase
    +Dict~str,str~ composition
    +Dict~str,str~ conditions
    +str? description
  }
  class Steel {
    +str code_system
    +str code_value
    +str? geometry
  }
  class AssessmentInput {
    +Application application
    +Steel steel
    +Dict~str,str~ tests
  }
  class AssessRequest {
    +int evidence_top_k = 8
  }
  AssessRequest --|> AssessmentInput

  class AssessmentOutput {
    +str corrosion_risk
    +str rationale
    +Dict~str,List~str~~ better_alternatives
    +List~Dict~str,str~~ evidence
    +Dict~str,str~? component_metrics
  }

  class AskRequest { +str query +int top_k=5 }
  class AskResponse { +List~str~ answers +List~Dict~str,str~~ citations }

  class IndexInfo { +str cwd +str store_dir +bool meta_exists +bool emb_exists }
  class IndexState { +int chunks +str emb_shape }
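The schema above, sketched here with stdlib dataclasses for illustration. The actual models in app/models.py are Pydantic and may differ in detail:

```python
# Dataclass mirror of the Pydantic data model diagram above.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Application:
    phase: List[str]
    composition: Dict[str, str]
    conditions: Dict[str, str]
    description: Optional[str] = None

@dataclass
class Steel:
    code_system: str
    code_value: str
    geometry: Optional[str] = None

@dataclass
class AssessRequest:
    application: Application
    steel: Steel
    tests: Dict[str, str] = field(default_factory=dict)
    evidence_top_k: int = 8   # default matches the diagram
```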

Technologies Used

Setup and Installation

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/mo-alkubaish/SCARA.git
    cd SCARA
  2. Create and activate virtual environment (uv handles this):

    uv sync

    This installs dependencies from pyproject.toml (FastAPI, Pydantic, LangChain, Qdrant-client, etc.).

  3. Install Playwright browsers for scraping:

    playwright install chromium
  4. Set environment variables:

    • GROQ_API_KEY: Required for LLM (obtain from groq.com).
    • QDRANT_URL: Optional (defaults to local ./index_store/qdrant).
    • QDRANT_API_KEY: For remote Qdrant if needed.
    • INDEX_STORE: Vector store path (default: ./index_store).
    • RAG_DEVICE: Embeddings device (cuda/cpu/mps).
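For illustration, the variables above could be read in Python as follows; the defaults follow the list, but the project's actual configuration code may differ:

```python
# Hypothetical configuration loading; names and defaults follow the README list.
import os

GROQ_API_KEY = os.getenv("GROQ_API_KEY")                       # required for LLM calls
QDRANT_URL = os.getenv("QDRANT_URL", "./index_store/qdrant")   # local default
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")                   # only for remote Qdrant
INDEX_STORE = os.getenv("INDEX_STORE", "./index_store")
RAG_DEVICE = os.getenv("RAG_DEVICE", "cpu")                    # cuda / cpu / mps
```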

Running the Server

uvicorn server:app --reload --port 8000

Indexing Documents

Usage Example

To assess corrosion risk, send a POST request to /assess with the following JSON payload:

curl -X POST "http://localhost:8000/assess" \
  -H "Content-Type: application/json" \
  -d '{
    "application": {
      "phase": ["marine"],
      "composition": {
        "C": "0.03",
        "Cr": "18-20",
        "Ni": "8-10"
      },
      "conditions": {
        "temperature": "ambient",
        "exposure": "chloride-rich seawater"
      },
      "description": "Offshore pipeline component in saline environment"
    },
    "steel": {
      "code_system": "UNS",
      "code_value": "S30400",
      "geometry": "pipe"
    },
    "tests": {
      "pitting_potential": "high",
      "corrosion_rate": "0.1 mm/year"
    },
    "evidence_top_k": 8
  }'

Expected successful response (HTTP 200 OK):

{
  "corrosion_risk": "Medium",
  "rationale": "S30400 austenitic stainless steel shows moderate resistance in chloride environments but risks pitting corrosion without proper alloying.",
  "better_alternatives": {
    "higher_molybdenum": ["S31600", "S31700"],
    "duplex": ["S31803"],
    "super_austenitic": ["S31254"]
  },
  "evidence": [
    {
      "source": "RAFAEL_1.PDF",
      "snippet": "Pitting in 304SS under marine conditions...",
      "relevance_score": 0.95
    }
  ]
}
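The same request can also be issued from Python using only the standard library; adjust the host and port if the server runs elsewhere:

```python
# Build the same POST /assess request as the curl example above.
import json
import urllib.request

payload = {
    "application": {
        "phase": ["marine"],
        "composition": {"C": "0.03", "Cr": "18-20", "Ni": "8-10"},
        "conditions": {"temperature": "ambient", "exposure": "chloride-rich seawater"},
        "description": "Offshore pipeline component in saline environment",
    },
    "steel": {"code_system": "UNS", "code_value": "S30400", "geometry": "pipe"},
    "tests": {"pitting_potential": "high", "corrosion_rate": "0.1 mm/year"},
    "evidence_top_k": 8,
}

req = urllib.request.Request(
    "http://localhost:8000/assess",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:   # requires the server to be running
#     result = json.load(resp)
```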

For full schema details, refer to the interactive docs or Pydantic models in app/models.py.
