King Fahd University of Petroleum and Minerals (KFUPM)
College of Chemistry and Materials
Materials Science and Engineering Department (MSE)
LLM-Hackathon for Applications in Materials Science and Chemistry
Author(s): Hussein Al'Adwan, Mohammed ALI AlKubaish, Oswaldo Rodriguez, Chahd Rahyl Adjmi, Muhammed Ahmed, Motasem Ajlouni
This FastAPI application provides an intelligent API for assessing the corrosion risk of steel components. It evaluates risks based on application environments, material properties (steel grades and composition), and optional test data, leveraging Retrieval-Augmented Generation (RAG) from technical documents, rules-based enhancements, and LLM-driven analysis to deliver risk classifications, rationales, alternatives, and metrics.
Steel is the backbone of industrial infrastructure, from pipelines and refineries to marine and transportation systems. Despite its strength and cost-effectiveness, its durability is constantly threatened by corrosion, especially in aggressive environments such as CO₂/H₂S/HCl mixtures [1]. Corrosion weakens structural integrity and contributes to billions of dollars in global annual losses through maintenance, inspection, and premature failures [2], [3].
Conventional corrosion assessment methods based on ASTM/NACE standards, experimental testing, and scattered databases are slow, fragmented, and reactive, limiting timely decision-making in critical industries like oil & gas and infrastructure [3], [4], [5].
Recent advances in Large Language Models (LLMs) offer a transformative opportunity. By integrating standards, datasets, and literature, LLMs can act as digital corrosion experts, providing rapid predictions, explainable reasoning, and alloy recommendations [6], [7], [8].
This hackathon project introduces SCARA (Steel Corrosion Agent for Risk Assessment), an LLM-powered assistant designed to:
- Predict corrosion risk under defined conditions.
- Provide standardized risk scores (low, medium, high).
- Recommend safer alloys supported by data and standards.
SCARA consolidates scattered corrosion knowledge into a unified, intelligent platform, aiming to enhance reliability, safety, and cost efficiency across industrial applications.
The SCARA workflow is an LLM-based framework designed to predict steel performance in industrial and urban environments. The process starts with user input describing the environment (phase types, composition, and thermodynamic conditions) together with a brief description of the application (geometry, methodology, and location). The steel standard used, such as AISI, API, or UNS, is also supplied, and the system automatically links the designation to its chemical composition through the LLM and the MatWeb materials database. The assembled prompt is then submitted to the LLM agent, which is grounded in scientific papers, review articles, books, and risk-assessment standards.
The agent analyzes the data, generates supporting evidence, estimates the corrosion risk, and recommends alternative steels. A validator checks the consistency of the predictions against a defined risk scale: if validation converges, the results are approved and the final risk assessment is reported with the validated evidence, while inconsistent outputs are recalculated. In this way, SCARA integrates documentation, material data, and LLM-agent reasoning to deliver evidence-based corrosion evaluations.
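The validate-and-retry loop described above can be sketched as follows (a minimal sketch with hypothetical function names; the actual logic lives in app/decision.py):

```python
# Minimal sketch of SCARA's validate-and-retry loop (hypothetical names;
# the real validation logic lives in app/decision.py).
VALID_RISK_BANDS = {"low", "medium", "high"}

def assess_with_validation(run_agent, request, evidence, max_retries=3):
    """Call the LLM agent, re-running it until the predicted risk band
    falls on the defined scale or the retry budget is exhausted."""
    for _ in range(max_retries):
        result = run_agent(request, evidence)
        # Validation converges when the risk band is on the defined scale.
        if result.get("risk_band", "").lower() in VALID_RISK_BANDS:
            return result
    raise ValueError("Agent output did not converge to a valid risk band")
```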
```mermaid
flowchart LR
  subgraph Client
    UI[Web UI /app/static] -- fetch /assess --> API
  end
  subgraph Backend
    API[FastAPI app/api.py]
    RULES[Rules & Helpers app/rules.py]
    DECIDE[Decision Engine app/decision.py]
    RAG[RAGIndex app/rag.py]
  end
  subgraph Data
    QDRANT[(Qdrant Vector DB)]
    META[(meta.json backup)]
    PDFs[(Technical PDFs)]
    CACHE[(cache/metrics.json)]
  end
  subgraph External
    MATWEB[(MatWeb)]
    LLM[(Groq-hosted LLM via LangChain)]
  end
  UI --> API
  API <--> RAG
  API <--> RULES
  API <--> DECIDE
  RAG <--> QDRANT
  RAG <--> META
  PDFs --> RAG
  RULES <--> MATWEB
  RULES <--> CACHE
  DECIDE <--> LLM
```
```mermaid
sequenceDiagram
  participant B as Browser UI
  participant A as FastAPI (/assess)
  participant R as Rules (aliases, metrics)
  participant G as RAGIndex (retrieve)
  participant D as Decision (LLM)
  participant V as Qdrant
  participant M as MatWeb
  participant LLM as Groq-hosted LLM
  B->>A: POST /assess (Application, Steel, Tests, top_k)
  A->>R: steel_aliases(code_system, code_value)
  R-->>A: aliases, env hints
  A->>G: staged/enhanced retrieve (aliases, environment, k)
  G->>V: similarity_search[_with_score]
  V-->>G: Documents
  G-->>A: Evidence Chunks
  A->>R: get_component_metrics (UNS?)
  R->>M: scrape_uns_metrics (Playwright)
  M-->>R: Composition metrics
  A->>D: decide_with_rag(req, evidence, metrics)
  D->>LLM: Prompt (context + chunks)
  LLM-->>D: JSON (risk_band, evidence_used, consequences)
  D-->>A: risk_band, evidence_used
  A-->>B: AssessmentOutput (risk, rationale, alternatives, evidence, metrics)
```
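The request flow above can be condensed into a plain-Python orchestration sketch (hypothetical signatures and dependency injection for illustration; the actual endpoint wiring lives in app/api.py):

```python
# Condensed sketch of the /assess orchestration from the sequence diagram.
# `rules`, `rag`, and `decision` stand in for app/rules.py, app/rag.py,
# and app/decision.py; the signatures here are illustrative only.
def assess(request, rules, rag, decision):
    aliases = rules.steel_aliases(request["steel"]["code_system"],
                                  request["steel"]["code_value"])
    evidence = rag.retrieve(aliases, request["application"],
                            k=request.get("evidence_top_k", 8))
    metrics = rules.get_component_metrics(request["steel"]["code_value"])
    result = decision.decide_with_rag(request, evidence, metrics)
    return {
        "corrosion_risk": result["risk_band"],
        "rationale": result.get("rationale", ""),
        "evidence": evidence,
        "component_metrics": metrics,
    }
```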
```mermaid
flowchart TD
  U[Uploads / Local PDFs] --> P[PDF Extractor]
  P -->|PyMuPDF or pypdf| C[Chunker + Clean]
  C --> T[Tagging]
  T -->|steel_mentions, corrosion_types, section_type| E[Embedding all-mpnet-base-v2]
  E --> S[Qdrant Upsert]
  S --> B[meta.json backup]
  B --> W[Watcher: periodic flush]
  W --> S
```
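The chunking and tagging stages can be sketched as follows (a simplified sketch with assumed chunk sizes and tagging patterns; the real pipeline then embeds each chunk with all-mpnet-base-v2 and upserts it to Qdrant):

```python
# Simplified sketch of the Chunker + Tagging stages. The regex and
# keyword lists are illustrative assumptions, not the project's actual rules.
import re

STEEL_PATTERN = re.compile(r"\b(?:UNS\s+)?S\d{5}\b", re.IGNORECASE)
CORROSION_TYPES = ("pitting", "crevice", "uniform", "galvanic", "scc")

def chunk_text(text, size=800, overlap=100):
    """Split cleaned PDF text into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def tag_chunk(chunk):
    """Attach steel-mention and corrosion-type metadata to a chunk."""
    return {
        "text": chunk,
        "steel_mentions": sorted(set(STEEL_PATTERN.findall(chunk))),
        "corrosion_types": [t for t in CORROSION_TYPES if t in chunk.lower()],
    }
```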
```mermaid
classDiagram
  class Application {
    +List~str~ phase
    +Dict~str,str~ composition
    +Dict~str,str~ conditions
    +str? description
  }
  class Steel {
    +str code_system
    +str code_value
    +str? geometry
  }
  class AssessmentInput {
    +Application application
    +Steel steel
    +Dict~str,str~ tests
  }
  class AssessRequest {
    +int evidence_top_k = 8
  }
  AssessRequest --|> AssessmentInput
  class AssessmentOutput {
    +str corrosion_risk
    +str rationale
    +Dict~str,List~str~~ better_alternatives
    +List~Dict~str,str~~ evidence
    +Dict~str,str~? component_metrics
  }
  class AskRequest { +str query +int top_k=5 }
  class AskResponse { +List~str~ answers +List~Dict~str,str~~ citations }
  class IndexInfo { +str cwd +str store_dir +bool meta_exists +bool emb_exists }
  class IndexState { +int chunks +str emb_shape }
```
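The request side of the schema above can be approximated with stdlib dataclasses (the project itself uses Pydantic models in app/models.py; field names follow the diagram):

```python
# Stdlib dataclass approximation of the request schema. The real models
# are Pydantic classes in app/models.py; this sketch only mirrors fields.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Application:
    phase: List[str]
    composition: Dict[str, str]
    conditions: Dict[str, str]
    description: Optional[str] = None

@dataclass
class Steel:
    code_system: str
    code_value: str
    geometry: Optional[str] = None

@dataclass
class AssessRequest:
    application: Application
    steel: Steel
    tests: Dict[str, str] = field(default_factory=dict)
    evidence_top_k: int = 8  # default mirrors the class diagram
```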
- Clone the repository:

  ```shell
  git clone https://github.com/mo-alkubaish/SCARA.git
  cd SCARA
  ```

- Create and activate the virtual environment (uv handles this):

  ```shell
  uv sync
  ```

  This installs the dependencies from `pyproject.toml` (FastAPI, Pydantic, LangChain, qdrant-client, etc.).

- Install Playwright browsers for scraping:

  ```shell
  playwright install chromium
  ```

- Set environment variables:
  - `GROQ_API_KEY`: Required for the LLM (obtain from groq.com).
  - `QDRANT_URL`: Optional (defaults to local `./index_store/qdrant`).
  - `QDRANT_API_KEY`: For remote Qdrant, if needed.
  - `INDEX_STORE`: Vector store path (default: `./index_store`).
  - `RAG_DEVICE`: Embeddings device (cuda/cpu/mps).

Run the server:

```shell
uvicorn server:app --reload --port 8000
```

To assess corrosion risk, send a POST request to `/assess` with the following JSON payload:
```shell
curl -X POST "http://localhost:8000/assess" \
  -H "Content-Type: application/json" \
  -d '{
    "application": {
      "phase": ["marine"],
      "composition": {
        "C": "0.03",
        "Cr": "18-20",
        "Ni": "8-10"
      },
      "conditions": {
        "temperature": "ambient",
        "exposure": "chloride-rich seawater"
      },
      "description": "Offshore pipeline component in saline environment"
    },
    "steel": {
      "code_system": "UNS",
      "code_value": "S30400",
      "geometry": "pipe"
    },
    "tests": {
      "pitting_potential": "high",
      "corrosion_rate": "0.1 mm/year"
    },
    "evidence_top_k": 8
  }'
```

Expected successful response (HTTP 200 OK):
```json
{
  "corrosion_risk": "Medium",
  "rationale": "S30400 austenitic stainless steel shows moderate resistance in chloride environments but risks pitting corrosion without proper alloying.",
  "better_alternatives": {
    "higher_molybdenum": ["S31600", "S31700"],
    "duplex": ["S31803"],
    "super_austenitic": ["S31254"]
  },
  "evidence": [
    {
      "source": "RAFAEL_1.PDF",
      "snippet": "Pitting in 304SS under marine conditions...",
      "relevance_score": 0.95
    }
  ]
}
```

For full schema details, refer to the interactive docs or the Pydantic models in app/models.py.
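The same request can be made from Python's standard library (assuming the server is running locally on port 8000; the helper name below is illustrative):

```python
# Build the POST /assess request with stdlib urllib; uncomment the
# urlopen call at the bottom to send it to a running server.
import json
import urllib.request

def build_assess_request(payload, url="http://localhost:8000/assess"):
    """Construct the POST /assess request object."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

payload = {
    "application": {
        "phase": ["marine"],
        "composition": {"C": "0.03", "Cr": "18-20", "Ni": "8-10"},
        "conditions": {"temperature": "ambient",
                       "exposure": "chloride-rich seawater"},
        "description": "Offshore pipeline component in saline environment",
    },
    "steel": {"code_system": "UNS", "code_value": "S30400", "geometry": "pipe"},
    "tests": {"pitting_potential": "high", "corrosion_rate": "0.1 mm/year"},
    "evidence_top_k": 8,
}
req = build_assess_request(payload)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # AssessmentOutput as a dict
```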