King Fahd University of Petroleum and Minerals (KFUPM)
College of Chemistry and Materials
Materials Science and Engineering Department (MSE)
LLM-Hackathon for Applications in Materials Science and Chemistry
Author(s): Hussein Al'Adwan, Mohammed ALI AlKubaish, Oswaldo Rodriguez, Chahd Rahyl Adjmi, Muhammed Ahmed, Motasem Ajlouni
This FastAPI application provides an intelligent API for assessing the corrosion risk of steel components. It evaluates risks based on application environments, material properties (steel grades and composition), and optional test data, leveraging Retrieval-Augmented Generation (RAG) from technical documents, rules-based enhancements, and LLM-driven analysis to deliver risk classifications, rationales, alternatives, and metrics.
Steel is the backbone of industrial infrastructure, from pipelines and refineries to marine and transportation systems. Despite its strength and cost-effectiveness, its durability is constantly threatened by corrosion, especially in aggressive environments such as CO₂/H₂S/HCl mixtures [1]. Corrosion weakens structural integrity and contributes to billions of dollars in global annual losses through maintenance, inspection, and premature failures [2], [3].
Conventional corrosion assessment methods based on ASTM/NACE standards, experimental testing, and scattered databases are slow, fragmented, and reactive, limiting timely decision-making in critical industries like oil & gas and infrastructure [3], [4], [5].
Recent advances in Large Language Models (LLMs) offer a transformative opportunity. By integrating standards, datasets, and literature, LLMs can act as digital corrosion experts, providing rapid predictions, explainable reasoning, and alloy recommendations [6], [7], [8].
This hackathon project introduces SCARA (Steel Corrosion Agent for Risk Assessment), an LLM-powered assistant designed to:
- Predict corrosion risk under defined conditions.
- Provide standardized risk scores (low, medium, high).
- Recommend safer alloys supported by data and standards.
SCARA consolidates scattered corrosion knowledge into a unified, intelligent platform, aiming to enhance reliability, safety, and cost efficiency across industrial applications.
The SCARA workflow is an LLM-based framework designed to predict steel performance in industrial and urban environments. The process starts with user input describing the environment (phase types, composition, and thermodynamic conditions) together with a brief description of the application (geometry, methodology, and location). The steel standard used, such as AISI, API, or UNS, is also supplied, and the system automatically links the designation to its chemical composition through the LLM and the MatWeb materials database. The assembled prompt is then submitted to the LLM agent, which is grounded in scientific papers, review articles, books, and risk-assessment standards.
The agent analyzes the data, generates supporting evidence, estimates the corrosion risk, and recommends alternative steels. A validator checks the consistency of the predictions against a defined risk scale: if validation converges, the results are approved and the final risk assessment is reported with the validated evidence, while inconsistent outputs are recalculated. In this way, SCARA integrates documentation, material data, and LLM-agent reasoning to deliver evidence-based corrosion evaluations.
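The validate-and-retry loop described above can be sketched as follows (a minimal sketch with hypothetical function names; the actual logic lives in app/decision.py):

```python
# Minimal sketch of SCARA's validate-and-retry loop (hypothetical names;
# the real validation logic lives in app/decision.py).
VALID_RISK_BANDS = {"low", "medium", "high"}

def assess_with_validation(run_agent, request, evidence, max_retries=3):
    """Call the LLM agent, re-running it until the predicted risk band
    falls on the defined scale or the retry budget is exhausted."""
    for _ in range(max_retries):
        result = run_agent(request, evidence)
        # Validation converges when the risk band is on the defined scale.
        if result.get("risk_band", "").lower() in VALID_RISK_BANDS:
            return result
    raise ValueError("Agent output did not converge to a valid risk band")
```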
```mermaid
flowchart LR
  subgraph Client
    UI[Web UI /app/static] -- fetch /assess --> API
  end
  subgraph Backend
    API[FastAPI app/api.py]
    RULES[Rules & Helpers app/rules.py]
    DECIDE[Decision Engine app/decision.py]
    RAG[RAGIndex app/rag.py]
  end
  subgraph Data
    QDRANT[(Qdrant Vector DB)]
    META[(meta.json backup)]
    PDFs[(Technical PDFs)]
    CACHE[(cache/metrics.json)]
  end
  subgraph External
    MATWEB[(MatWeb)]
    LLM[(Groq-hosted LLM via LangChain)]
  end
  UI --> API
  API <--> RAG
  API <--> RULES
  API <--> DECIDE
  RAG <--> QDRANT
  RAG <--> META
  PDFs --> RAG
  RULES <--> MATWEB
  RULES <--> CACHE
  DECIDE <--> LLM
```
```mermaid
sequenceDiagram
  participant B as Browser UI
  participant A as FastAPI (/assess)
  participant R as Rules (aliases, metrics)
  participant G as RAGIndex (retrieve)
  participant D as Decision (LLM)
  participant V as Qdrant
  participant M as MatWeb
  participant LLM as Groq-hosted LLM
  B->>A: POST /assess (Application, Steel, Tests, top_k)
  A->>R: steel_aliases(code_system, code_value)
  R-->>A: aliases, env hints
  A->>G: staged/enhanced retrieve (aliases, environment, k)
  G->>V: similarity_search[_with_score]
  V-->>G: Documents
  G-->>A: Evidence Chunks
  A->>R: get_component_metrics (UNS?)
  R->>M: scrape_uns_metrics (Playwright)
  M-->>R: Composition metrics
  A->>D: decide_with_rag(req, evidence, metrics)
  D->>LLM: Prompt (context + chunks)
  LLM-->>D: JSON (risk_band, evidence_used, consequences)
  D-->>A: risk_band, evidence_used
  A-->>B: AssessmentOutput (risk, rationale, alternatives, evidence, metrics)
```
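The request flow above can be condensed into a plain-Python orchestration sketch (hypothetical signatures and dependency injection for illustration; the actual endpoint wiring lives in app/api.py):

```python
# Condensed sketch of the /assess orchestration from the sequence diagram.
# `rules`, `rag`, and `decision` stand in for app/rules.py, app/rag.py,
# and app/decision.py; the signatures here are illustrative only.
def assess(request, rules, rag, decision):
    aliases = rules.steel_aliases(request["steel"]["code_system"],
                                  request["steel"]["code_value"])
    evidence = rag.retrieve(aliases, request["application"],
                            k=request.get("evidence_top_k", 8))
    metrics = rules.get_component_metrics(request["steel"]["code_value"])
    result = decision.decide_with_rag(request, evidence, metrics)
    return {
        "corrosion_risk": result["risk_band"],
        "rationale": result.get("rationale", ""),
        "evidence": evidence,
        "component_metrics": metrics,
    }
```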
```mermaid
flowchart TD
  U[Uploads / Local PDFs] --> P[PDF Extractor]
  P -->|PyMuPDF or pypdf| C[Chunker + Clean]
  C --> T[Tagging]
  T -->|steel_mentions, corrosion_types, section_type| E[Embedding all-mpnet-base-v2]
  E --> S[Qdrant Upsert]
  S --> B[meta.json backup]
  B --> W[Watcher: periodic flush]
  W --> S
```
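The chunking and tagging stages can be sketched as follows (a simplified sketch with assumed chunk sizes and tagging patterns; the real pipeline then embeds each chunk with all-mpnet-base-v2 and upserts it to Qdrant):

```python
# Simplified sketch of the Chunker + Tagging stages. The regex and
# keyword lists are illustrative assumptions, not the project's actual rules.
import re

STEEL_PATTERN = re.compile(r"\b(?:UNS\s+)?S\d{5}\b", re.IGNORECASE)
CORROSION_TYPES = ("pitting", "crevice", "uniform", "galvanic", "scc")

def chunk_text(text, size=800, overlap=100):
    """Split cleaned PDF text into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def tag_chunk(chunk):
    """Attach steel-mention and corrosion-type metadata to a chunk."""
    return {
        "text": chunk,
        "steel_mentions": sorted(set(STEEL_PATTERN.findall(chunk))),
        "corrosion_types": [t for t in CORROSION_TYPES if t in chunk.lower()],
    }
```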
```mermaid
classDiagram
  class Application {
    +List~str~ phase
    +Dict~str,str~ composition
    +Dict~str,str~ conditions
    +str? description
  }
  class Steel {
    +str code_system
    +str code_value
    +str? geometry
  }
  class AssessmentInput {
    +Application application
    +Steel steel
    +Dict~str,str~ tests
  }
  class AssessRequest {
    +int evidence_top_k = 8
  }
  AssessRequest --|> AssessmentInput
  class AssessmentOutput {
    +str corrosion_risk
    +str rationale
    +Dict~str,List~str~~ better_alternatives
    +List~Dict~str,str~~ evidence
    +Dict~str,str~? component_metrics
  }
  class AskRequest { +str query +int top_k=5 }
  class AskResponse { +List~str~ answers +List~Dict~str,str~~ citations }
  class IndexInfo { +str cwd +str store_dir +bool meta_exists +bool emb_exists }
  class IndexState { +int chunks +str emb_shape }
```
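The request side of the schema above can be approximated with stdlib dataclasses (the project itself uses Pydantic models in app/models.py; field names follow the diagram):

```python
# Stdlib dataclass approximation of the request schema. The real models
# are Pydantic classes in app/models.py; this sketch only mirrors fields.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Application:
    phase: List[str]
    composition: Dict[str, str]
    conditions: Dict[str, str]
    description: Optional[str] = None

@dataclass
class Steel:
    code_system: str
    code_value: str
    geometry: Optional[str] = None

@dataclass
class AssessRequest:
    application: Application
    steel: Steel
    tests: Dict[str, str] = field(default_factory=dict)
    evidence_top_k: int = 8  # default mirrors the class diagram
```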
- Clone the repository:

  ```shell
  git clone https://github.com/mo-alkubaish/SCARA.git
  cd SCARA
  ```

- Create and activate the virtual environment (uv handles this):

  ```shell
  uv sync
  ```

  This installs the dependencies from `pyproject.toml` (FastAPI, Pydantic, LangChain, qdrant-client, etc.).

- Install Playwright browsers for scraping:

  ```shell
  playwright install chromium
  ```

- Set environment variables:
  - `GROQ_API_KEY`: Required for the LLM (obtain from groq.com).
  - `QDRANT_URL`: Optional (defaults to local `./index_store/qdrant`).
  - `QDRANT_API_KEY`: For remote Qdrant, if needed.
  - `INDEX_STORE`: Vector store path (default: `./index_store`).
  - `RAG_DEVICE`: Embeddings device (cuda/cpu/mps).

Run the server:

```shell
uvicorn server:app --reload --port 8000
```

To assess corrosion risk, send a POST request to `/assess` with the following JSON payload:
```shell
curl -X POST "http://localhost:8000/assess" \
  -H "Content-Type: application/json" \
  -d '{
    "application": {
      "phase": ["marine"],
      "composition": {
        "C": "0.03",
        "Cr": "18-20",
        "Ni": "8-10"
      },
      "conditions": {
        "temperature": "ambient",
        "exposure": "chloride-rich seawater"
      },
      "description": "Offshore pipeline component in saline environment"
    },
    "steel": {
      "code_system": "UNS",
      "code_value": "S30400",
      "geometry": "pipe"
    },
    "tests": {
      "pitting_potential": "high",
      "corrosion_rate": "0.1 mm/year"
    },
    "evidence_top_k": 8
  }'
```

Expected successful response (HTTP 200 OK):
```json
{
  "corrosion_risk": "Medium",
  "rationale": "S30400 austenitic stainless steel shows moderate resistance in chloride environments but risks pitting corrosion without proper alloying.",
  "better_alternatives": {
    "higher_molybdenum": ["S31600", "S31700"],
    "duplex": ["S31803"],
    "super_austenitic": ["S31254"]
  },
  "evidence": [
    {
      "source": "RAFAEL_1.PDF",
      "snippet": "Pitting in 304SS under marine conditions...",
      "relevance_score": 0.95
    }
  ]
}
```

For full schema details, refer to the interactive docs or the Pydantic models in app/models.py.
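The same request can be made from Python's standard library (assuming the server is running locally on port 8000; the helper name below is illustrative):

```python
# Build the POST /assess request with stdlib urllib; uncomment the
# urlopen call at the bottom to send it to a running server.
import json
import urllib.request

def build_assess_request(payload, url="http://localhost:8000/assess"):
    """Construct the POST /assess request object."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

payload = {
    "application": {
        "phase": ["marine"],
        "composition": {"C": "0.03", "Cr": "18-20", "Ni": "8-10"},
        "conditions": {"temperature": "ambient",
                       "exposure": "chloride-rich seawater"},
        "description": "Offshore pipeline component in saline environment",
    },
    "steel": {"code_system": "UNS", "code_value": "S30400", "geometry": "pipe"},
    "tests": {"pitting_potential": "high", "corrosion_rate": "0.1 mm/year"},
    "evidence_top_k": 8,
}
req = build_assess_request(payload)
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # AssessmentOutput as a dict
```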