A lightweight pipeline that generates synthetic access logs, preprocesses and feature-engineers them, and applies a Random Forest anomaly detector, with a real-time Streamlit dashboard on top.
This project takes a desk-based research approach, synthesizing peer-reviewed studies, industry frameworks (MITRE ATT&CK, CERT Insider Threat), and case analyses under PRISMA guidelines to inform the ML design and the insider-threat scenarios.
Full report (PDF): https://github.com/AvishekDhakal/courseworks/blob/main/220064_Avishek_Dhakal_Anamoly_Detection_1739520765066.pdf
- Clone & install
git clone https://github.com/your-username/InsightMed.git
cd InsightMed
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
- Generate synthetic logs
python3 main.py --total_logs 1000 --anomaly_ratio 0.02
- Run pipeline
python3 auto_inference.py
- Launch the dashboard
streamlit run dashboard.py
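The log-generation command takes `--total_logs` and `--anomaly_ratio`. As a rough illustration of what those two knobs could control, here is a minimal standard-library sketch of a labelled synthetic access-log generator. It is not the project's `main.py`: the field names (`user_id`, `role`, `method`, `endpoint`, `is_anomalous`), the role and endpoint lists, and the injected anomaly pattern (DELETE requests between 00:00 and 04:59) are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta

# Hypothetical schema and values; InsightMed's real generator may use different fields.
ROLES = ["Doctor", "Nurse", "Staff", "Admin"]
NORMAL_METHODS = ["GET", "GET", "GET", "POST", "PUT"]  # reads weighted 3:1:1
ENDPOINTS = ["/patient/records", "/billing/invoices", "/inventory/items", "/admin/settings"]


def generate_logs(total_logs: int, anomaly_ratio: float, seed: int = 42) -> list[dict]:
    """Return total_logs events, exactly round(total_logs * anomaly_ratio) of them anomalous."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_anomalies = round(total_logs * anomaly_ratio)
    # Choose which row indices become anomalies, without replacement.
    anomaly_rows = set(rng.sample(range(total_logs), n_anomalies))
    base_day = datetime(2024, 1, 1)
    logs = []
    for i in range(total_logs):
        is_anomaly = i in anomaly_rows
        if is_anomaly:
            # Assumed insider pattern: a DELETE issued in the early morning.
            hour = rng.randint(0, 4)
            method = "DELETE"
        else:
            # Assumed normal traffic: business hours, mostly reads, never DELETE.
            hour = rng.randint(8, 17)
            method = rng.choice(NORMAL_METHODS)
        ts = base_day + timedelta(days=rng.randint(0, 6), hours=hour, minutes=rng.randint(0, 59))
        logs.append({
            "timestamp": ts.isoformat(),
            "user_id": f"U{rng.randint(1, 50):03d}",
            "role": rng.choice(ROLES),
            "method": method,
            "endpoint": rng.choice(ENDPOINTS),
            "is_anomalous": int(is_anomaly),
        })
    return logs


if __name__ == "__main__":
    logs = generate_logs(total_logs=1000, anomaly_ratio=0.02)
    print(len(logs), sum(log["is_anomalous"] for log in logs))  # 1000 20
```

Because exactly `round(total_logs * anomaly_ratio)` rows are labelled anomalous, the 2% ratio in the example command yields 20 anomalies out of 1,000 events, which is the kind of class imbalance the detector then has to cope with.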
- Anomaly-based ML outperformed rule-based RBAC, catching subtle insider misuse (after-hours access, excessive DELETE requests) that static rules miss.
- High accuracy and a low false-positive rate on the synthetic dataset, achieved through tailored feature engineering (rolling-window request rates, role-risk encoding).
- Scalable and real-time: runs end to end in under a minute per batch, with a live dashboard for security analysts.
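The findings on RBAC and feature engineering can be made concrete with a small standard-library sketch. It computes an after-hours flag, a per-user rolling-window request count, a rolling DELETE count, and a role-risk code for each event, and contrasts them with a static role-permission check. The 60-minute window, the 08:00 to 17:59 business hours, the scores in `ROLE_RISK`, and the toy `RBAC_POLICY` table are hypothetical values, not the project's actual preprocessing or its RBAC baseline. In the project, features of this kind feed the Random Forest; the sketch stops at the feature table.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical values chosen for illustration only.
ROLE_RISK = {"Admin": 3, "Doctor": 2, "Nurse": 1, "Staff": 1}
BUSINESS_HOURS = range(8, 18)  # assumed working day: 08:00 to 17:59
RBAC_POLICY = {  # toy static policy: role -> allowed HTTP methods
    "Admin": {"GET", "POST", "PUT", "DELETE"},
    "Doctor": {"GET", "POST", "PUT"},
    "Nurse": {"GET", "PUT"},
    "Staff": {"GET"},
}


def rbac_allows(role: str, method: str) -> bool:
    """Static rule: permit a request if the role may use the method, at any time of day."""
    return method in RBAC_POLICY.get(role, set())


def engineer_features(logs: list[dict], window: timedelta = timedelta(minutes=60)) -> list[dict]:
    """Return copies of the events, in time order, with per-event behavioural features."""
    ordered = sorted(logs, key=lambda log: datetime.fromisoformat(log["timestamp"]))
    recent = defaultdict(deque)  # user_id -> deque of (timestamp, method) inside the window
    rows = []
    for log in ordered:
        ts = datetime.fromisoformat(log["timestamp"])
        q = recent[log["user_id"]]
        q.append((ts, log["method"]))
        # Evict this user's events that have slid out of the rolling window.
        while ts - q[0][0] > window:
            q.popleft()
        rows.append({
            **log,
            "after_hours": int(ts.hour not in BUSINESS_HOURS),
            "req_rate_window": len(q),  # requests by this user within `window`
            "deletes_in_window": sum(1 for _, method in q if method == "DELETE"),
            "role_risk": ROLE_RISK.get(log["role"], 0),  # unknown roles default to 0
        })
    return rows


if __name__ == "__main__":
    burst = [
        {"timestamp": f"2024-01-01T02:{m:02d}:00", "user_id": "U001",
         "role": "Admin", "method": "DELETE", "endpoint": "/patient/records"}
        for m in (0, 10, 20)
    ]
    # Every request passes the static RBAC check...
    print(all(rbac_allows(e["role"], e["method"]) for e in burst))  # True
    # ...yet the engineered features expose an after-hours DELETE burst.
    last = engineer_features(burst)[-1]
    print(last["after_hours"], last["req_rate_window"],
          last["deletes_in_window"], last["role_risk"])  # 1 3 3 3
```

Keeping one deque per user means each event is appended and evicted at most once. The DELETE count is recomputed over the window here for clarity; with large windows it could be kept as a running counter instead.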
Synthetic Data Bias: current evaluation uses simulated logs; real-world patterns may differ.
Scenario Scope: focuses on HTTP-based insider activities; doesn’t yet cover database or network anomalies.
Model Update: lacks an online learning mechanism for continuous adaptation.
Resource Constraints: performance on edge or low-resource environments needs further testing.
Integrate real hospital logs to validate model robustness.
Extend to network-level and database query anomaly detection.
Implement federated learning for cross-institution collaboration without sharing raw data.
Add analyst feedback loops to refine the model and reduce false positives.
Snapshot of the real-time anomaly monitor
