Godwin Etim Akpan
Applied Data Scientist & GIS Specialist • MSc Big Data Technologies (In Progress)
📍 Raleigh, North Carolina • GitHub • LinkedIn • Email
Applied Data Scientist & GIS Specialist with 10+ years of experience building data-driven solutions across public health, infrastructure, and environmental systems.
This portfolio brings together the following areas to deliver scalable, real-world decision-support solutions:
- Machine Learning & GeoAI
- Big Data Engineering (Spark, Hive, PySpark)
- Cloud Computing & Security
- Cybersecurity Analytics
- Automation & Monitoring Systems
```mermaid
flowchart TB
    A[Data Sources] --> B[Data Processing & Automation]
    B --> C[Analytics & GeoAI Modeling]
    C --> D[Visualization & Reporting]
    D --> E[Stakeholder Decision-Making]

    subgraph SecurityLayer["Security & Monitoring"]
        F[Access Control]
        G[Monitoring & Logging]
        H[Audit & Compliance]
    end

    B --> SecurityLayer
    C --> SecurityLayer
    D --> SecurityLayer
```
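The architecture routes processing, analytics, and reporting through a shared security and monitoring layer. The Python sketch below, assuming simple in-memory records and hypothetical stage functions (`clean`, `model`, `report`), shows how each stage could write an audit entry for the Monitoring & Logging and Audit & Compliance nodes. Access control is not modeled here.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
audit_log = logging.getLogger("pipeline.audit")  # stands in for "Monitoring & Logging"

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def run_pipeline(records: list[Record], stages: list[tuple[str, Stage]]) -> tuple[list[Record], list[str]]:
    """Run each stage in order and keep an audit trail of row counts per stage."""
    trail: list[str] = []
    data = list(records)
    for name, stage in stages:
        rows_in = len(data)
        data = stage(data)
        entry = f"stage={name} rows_in={rows_in} rows_out={len(data)}"
        audit_log.info(entry)  # emitted to the logging layer
        trail.append(entry)    # retained for audit/compliance review
    return data, trail


# Hypothetical stages mirroring the diagram (processing -> analytics -> reporting)
def clean(rows: list[Record]) -> list[Record]:
    return [r for r in rows if r.get("cases") is not None]


def model(rows: list[Record]) -> list[Record]:
    return [{**r, "alert": r["cases"] > 10} for r in rows]


def report(rows: list[Record]) -> list[Record]:
    return [r for r in rows if r["alert"]]


records = [
    {"district": "A", "cases": 12},
    {"district": "B", "cases": None},
    {"district": "C", "cases": 3},
]
alerts, trail = run_pipeline(records, [("process", clean), ("analytics", model), ("reporting", report)])
print(alerts)  # [{'district': 'A', 'cases': 12, 'alert': True}]
```

In a deployed system the audit trail would more likely go to a central log store than to a Python list; the list simply keeps the sketch self-contained.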
Projects:
- Lassa Fever GeoAI Forecasting: spatio-temporal forecasting for outbreak prediction using ARIMA, Prophet, STL, and XGBoost.
- Intrusion Detection (UNSW-NB15): distributed cybersecurity analytics using Hive and PySpark.
- NCEM Flood Exposure Mapping: GIS-based hazard exposure modeling for emergency management.
- Cloud Infrastructure / CloudSim: cloud performance simulation and infrastructure modeling.
- AWS EC2 CLI Lab: infrastructure provisioning and management using the AWS CLI.
- System Health Monitoring Automation: Python-based system monitoring and alerting for reliability engineering.
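Hazard exposure mapping of the kind done for flood zones rests on a point-in-polygon overlay: which assets fall inside a hazard footprint. Shapely offers this directly through the `contains` predicate; the dependency-free sketch below makes the logic explicit with even-odd ray casting. It assumes planar (projected) coordinates, and the zone and asset names are hypothetical. Points lying exactly on a boundary may be classified either way.

```python
Point = tuple[float, float]


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: toggle on each polygon edge crossed by a ray cast to the right of pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def exposed_assets(assets: dict[str, Point], hazard_zone: list[Point]) -> list[str]:
    """Names of assets located inside the hazard polygon, sorted alphabetically."""
    return sorted(name for name, pt in assets.items() if point_in_polygon(pt, hazard_zone))


# Hypothetical flood zone (a rectangle) and asset locations
flood_zone = [(0, 0), (4, 0), (4, 3), (0, 3)]
assets = {"clinic": (1, 1), "school": (5, 1), "shelter": (3.5, 2.5)}
print(exposed_assets(assets, flood_zone))  # ['clinic', 'shelter']
```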
1️⃣ Big Data & Machine Learning
Spark • Hive • PySpark • ML Pipelines • Feature Engineering
2️⃣ Public Health GeoAI & Spatio-Temporal Modeling
Disease Forecasting • Hotspot Modeling • Surveillance Analytics
3️⃣ GIS, Spatial Data Science & Remote Sensing
ArcGIS • GeoPandas • Remote Sensing • Spatial Modeling
4️⃣ Cloud Computing & Security Engineering
AWS • Azure • GCP • IAM • SIEM • Threat Detection
5️⃣ Data Engineering & Analytics Engineering
ETL/ELT • Spark • Orchestration • Lakehouse Patterns • Microsoft Fabric
6️⃣ Automation, Reliability & Secure Systems
Python Automation • Monitoring • Reporting • APIs
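Forecasting with a tabular model such as XGBoost (the Feature Engineering and Disease Forecasting areas above) usually starts by reframing a case series as supervised rows of lagged values. A minimal sketch, assuming one district's weekly counts held in a plain list; the `make_lag_features` helper and the numbers are hypothetical.

```python
def make_lag_features(series: list[float], lags: int) -> tuple[list[list[float]], list[float]]:
    """Turn a time series into (X, y): each row of X holds the previous `lags` values, oldest first."""
    if lags < 1:
        raise ValueError("lags must be at least 1")
    X: list[list[float]] = []
    y: list[float] = []
    for t in range(lags, len(series)):
        X.append(list(series[t - lags:t]))  # window of past observations
        y.append(series[t])                 # value to predict at week t
    return X, y


weekly_cases = [4, 6, 5, 9, 12, 10]  # hypothetical counts
X, y = make_lag_features(weekly_cases, lags=3)
print(X)  # [[4, 6, 5], [6, 5, 9], [5, 9, 12]]
print(y)  # [9, 12, 10]
```

At scale, the same idea maps to the `lag` window function in `pyspark.sql.functions`, or to `Series.shift` in pandas.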
```mermaid
flowchart LR
    A[Local Code] --> B[GitHub Repo]
    B --> C[CI Tests]
    C --> D[Build]
    D --> E[Deploy]
    E --> F[Monitoring]
```
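The CI Tests stage in this flow typically runs a project's unit tests on every push. A minimal sketch using the standard-library `unittest` module; `validate_row` is a hypothetical data-quality check for illustration, not code taken from the repositories.

```python
import unittest


def validate_row(row: dict) -> bool:
    """Hypothetical data-quality rule: `cases` must be a non-negative integer."""
    cases = row.get("cases")
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(cases, int) and not isinstance(cases, bool) and cases >= 0


class TestValidateRow(unittest.TestCase):
    def test_accepts_non_negative_counts(self):
        self.assertTrue(validate_row({"cases": 0}))
        self.assertTrue(validate_row({"cases": 7}))

    def test_rejects_bad_values(self):
        self.assertFalse(validate_row({"cases": -1}))
        self.assertFalse(validate_row({"cases": "7"}))
        self.assertFalse(validate_row({}))
```

A CI job could run this with `python -m unittest discover`, so a failing test stops the pipeline before the Build stage.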
Experience in public health and geospatial analytics includes:
- Lassa Fever surveillance and forecasting
- COVID-19 monitoring systems and dashboards
- Malaria vector and environmental modeling
- Emergency preparedness and health systems analytics
Detailed outputs are available in the relevant project folders.
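Surveillance work like the projects listed above often begins with a simple aberration rule: flag a week whose count exceeds the mean plus k standard deviations of a recent baseline window. This sketch assumes weekly counts for a single area and uses hypothetical defaults (a 4-week baseline, k = 2). It illustrates the general technique and is not the forecasting method used in these projects.

```python
from statistics import mean, stdev


def flag_aberrations(counts: list[int], baseline_weeks: int = 4, k: float = 2.0) -> list[int]:
    """Indices of weeks whose count exceeds mean + k * SD of the preceding baseline window."""
    if baseline_weeks < 2:
        raise ValueError("baseline_weeks must be at least 2 to estimate a standard deviation")
    flagged: list[int] = []
    for t in range(baseline_weeks, len(counts)):
        window = counts[t - baseline_weeks:t]  # the weeks immediately before week t
        threshold = mean(window) + k * stdev(window)
        if counts[t] > threshold:
            flagged.append(t)
    return flagged


weekly_cases = [3, 5, 4, 4, 5, 14, 6]  # hypothetical counts
alerts = flag_aberrations(weekly_cases)
print(alerts)  # [5]
```

Because the spike at week 5 enters the baseline for week 6, the threshold rises afterwards; production systems often leave a gap between the baseline and the week being tested for that reason.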
Data & ML:
Python • Spark • PySpark • Hive • SQL • Pandas • GeoPandas • scikit-learn • XGBoost • Prophet
GIS & GeoAI:
ArcGIS Pro • ArcGIS Online • QGIS • Shapely • Rasterio • Remote Sensing
Public Health Analytics:
DHIS2 pipelines • Epidemiological modeling • Outbreak dashboards
Cloud & Security:
AWS • Azure • GCP • IAM • VPC • SIEM • Splunk • Nessus
Automation & Systems:
Python • Linux • Bash • APIs • Monitoring • Logging
Visualization:
ArcGIS Dashboards • Matplotlib • Seaborn • Power BI • Fabric Visuals
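The monitoring and logging skills listed above, and the System Health Monitoring Automation project, come down to collecting metrics and comparing them with limits. A minimal sketch using only the standard library (`shutil.disk_usage`); the 90% disk threshold, the metric name, and the alert format are hypothetical, and a real job would likely send alerts by email or chat instead of printing them.

```python
import shutil


def collect_metrics(path: str = "/") -> dict[str, float]:
    """Return disk usage for `path` as a percentage of total capacity."""
    usage = shutil.disk_usage(path)
    return {"disk_used_pct": 100.0 * usage.used / usage.total}


def check_thresholds(metrics: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return one alert line for each metric that exceeds its configured limit."""
    return [
        f"ALERT {name}={value:.1f} exceeds limit {limits[name]:.1f}"
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    ]


LIMITS = {"disk_used_pct": 90.0}  # hypothetical threshold

alerts = check_thresholds(collect_metrics(), LIMITS)
for line in alerts:
    print(line)
if not alerts:
    print("OK: all metrics within limits")
```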
Certifications:
CompTIA CySA+ • CompTIA Security+ • Splunk Core Certified User • Google IT Automation • IBM Data Science
Each project folder includes:
- README.md
- Jupyter notebooks
- Python / PySpark scripts
- figures/ (maps, charts, plots)
- data_template/ (synthetic or sample data)
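A small check can confirm that a project folder follows this layout. The sketch assumes the entry names exactly as listed (`README.md`, `figures/`, `data_template/`) and treats any `.ipynb` or `.py` file as meeting the notebooks/scripts requirement; `missing_items` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Entry names taken from the folder layout described in this README
REQUIRED_ENTRIES = ("README.md", "figures", "data_template")


def missing_items(project_dir: str) -> list[str]:
    """List the expected entries that a project folder is missing."""
    root = Path(project_dir)
    missing = [name for name in REQUIRED_ENTRIES if not (root / name).exists()]
    # At least one Jupyter notebook or Python/PySpark script is expected somewhere in the folder
    has_code = any(root.rglob("*.ipynb")) or any(root.rglob("*.py"))
    if not has_code:
        missing.append("notebook or script")
    return missing


# Demo on a throwaway folder that lacks data_template/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("# Demo project\n")
    (root / "figures").mkdir()
    (root / "analysis.py").write_text("print('hello')\n")
    report = missing_items(tmp)

print(report)  # ['data_template']
```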
Install dependencies:
```bash
pip install -r requirements.txt
```

Future directions:
- Scalable GeoAI pipelines using Spark
- Real-time public health early warning systems
- Cloud-native analytics and security integration
- Advanced geospatial risk and hazard modeling
Applied Data Science • GeoAI • Big Data • Cloud • Cybersecurity • Public Health Informatics