This repository contains my data science and machine learning projects. Where possible, the data used in each project and the models I trained are also shared.
- Area: computer vision
- Summary: Malaria is traditionally diagnosed by microscopic examination of blood smear slides. Here, I design a pipeline that can automatically detect and identify infected red blood cells in microscopy images and output diagnostic and other clinically relevant information.
- Learnt: how to fine-tune a pretrained DEtection TRansformer (DETR) model and create a custom convolutional neural network to solve an object detection and classification task
- Tools: PyTorch, object-oriented programming (OOP), imgaug and cv2
- Area: cheminformatics
- Summary: Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis and is one of the top ten causes of death worldwide. In this study, I explore the relationship between chemical structure and mechanism of action.
- Learnt: how to extract meaningful information from noisy dose-response data and how to convert SMILES information into Morgan fingerprints
- Tools: RDKit, UMAP
- Area: natural language processing
- Summary: Creating a comment classifier that can effectively handle nuanced and contextual language will be a crucial step towards the development of more automated content moderation systems. In this project, I created a model that can detect toxic comments and accurately identify different types of toxicity, such as hate speech, threats and obscenity.
- Learnt: how to represent text as fixed-length bag-of-words vectors
- Tools: Keras, scikit-learn, NLTK
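
As a rough illustration of the bag-of-words idea mentioned in the last project, here is a minimal pure-Python sketch. It is not the project's actual code: the function names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the sample comments are hypothetical, and the tokenizer is a simple regex chosen for brevity. Each comment becomes a vector whose length equals the vocabulary size, with each entry holding the count of one token.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simplistic rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector with one entry per vocabulary token."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        idx = vocabulary.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vector[idx] = n
    return vector


# Hypothetical example comments, not taken from the project's dataset.
comments = ["you are great", "you are the worst, the absolute worst"]
vocab = build_vocabulary(comments)
vectors = [bag_of_words(c, vocab) for c in comments]

print(vocab)    # token -> column index
print(vectors)  # one fixed-length count vector per comment
```

Every vector has the same length regardless of how long the comment is, which is what makes this representation convenient as classifier input. Word order is discarded, so in practice a library implementation such as scikit-learn's `CountVectorizer` (optionally with n-grams) would likely be a better fit than this hand-rolled version.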