Everything else is noise.
GPU → CUDA • cuBLAS • cuDNN • TensorRT • Triton • NCCL
ML → PyTorch • JAX • TensorFlow
Infra → Kubernetes • OpenShift • Docker • AWS
Distributed → DDP • Model Parallelism • Run:AI • Slurm
Code → Python • C++ • Bash
- LLM inference latency, throughput & cost
- Multi-GPU training efficiency
- Kernel performance & memory behavior
- GPU utilization at scale
- Cloud-native AI deployments
- TensorRT-LLM pipelines
- Triton multi-model serving
- Advanced CUDA optimization
- Multi-node distributed workloads
- Mapping workloads → NVIDIA hardware
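The first focus area above, inference latency, throughput, and cost, often starts as back-of-envelope arithmetic before any profiling. A minimal sketch (all numbers are hypothetical, not measured results):

```python
# Back-of-envelope LLM serving math. All inputs are hypothetical examples.

def throughput_tokens_per_s(batch_size: int, tokens_per_request: int, latency_s: float) -> float:
    """Aggregate generated tokens per second for one batched request."""
    return batch_size * tokens_per_request / latency_s

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_s: float) -> float:
    """Amortize the GPU-hour price over the tokens it generates."""
    tokens_per_hour = tokens_per_s * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

tps = throughput_tokens_per_s(batch_size=32, tokens_per_request=256, latency_s=4.0)
print(round(tps))                                                     # 2048 tokens/s
print(round(cost_per_million_tokens(gpu_hourly_usd=2.5, tokens_per_s=tps), 2))  # 0.34 USD
```

The same two formulas make trade-offs concrete: doubling batch size doubles throughput and halves cost per token, as long as latency stays within budget.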
LinkedIn → linkedin.com/in/atharva21
Email → [email protected]
Speed. Parallelism. Precision.
Accelerating neural networks, one GPU cycle at a time.