About
I am a data professional with experience in exploring data and developing data-driven end-to-end solutions. I have experience in Natural Language Processing, Computer Vision, Audio Processing, Knowledge Graphs, and creating Data Pipelines. My passion lies in leveraging these skills to solve complex problems and drive innovation in a variety of domains.
Skills
Resume
Professional Experience
Data Scientist
Feb 2024 - Present
paiqo GmbH, Paderborn, Germany
Data Scientist
Mar 2022 - Jan 2024
BASF Digital Solutions GmbH, Ludwigshafen, Germany
- Developed an app for summary extraction from unstructured documents with 85% accuracy
- Implemented a document-based search application using LLMs (OpenAI API) and LangChain to enhance search capabilities
- Developed a deep learning model for handwriting recognition on contractor timesheets to automate and accelerate data entry
- Implemented machine learning models and RegEx to extract company names from PDFs
- Refined and processed data for over 6000 products, preparing them for integration into knowledge graphs
- Collaborated on the development of a table extraction model using LayoutLMv2
- Developed structured information extraction pipelines for multiple data sources (e.g. PDFs and web pages) using OCR and web scraping
- Secured second place in an internal AI-based hackathon
- Led customer demos
Data Science Intern
Oct 2021 - Feb 2022
Chemovator GmbH, Mannheim, Germany
- Developed cognitive search for semi-structured and unstructured data utilizing transformers and knowledge graphs (Medium Article)
- Collaborated on the development of a chatbot for the chemical and pharmaceutical domain
- Implemented an entity-based cognitive search using knowledge graphs
- Conducted customer demos and rollouts
Freelancer
2017 - 2018
Pune, India
- Developed and implemented advanced computer vision algorithms for client-specific needs
- Developed and automated the ETL process
Project Engineer
2014 - 2016
Godrej and Boyce, Mumbai, India
- Led the automation of industrial machines
- Managed project scheduling and predictive maintenance tasks for timely project completion and reduced machine downtime
Education
Master of Science
Nov 2023
University of Paderborn, Paderborn, Germany
Thesis: Unsupervised Graph Neural Network-based Shape Retrieval using Invariant Contour Features
Bachelor of Engineering
2014
University of Pune, Pune, India
Projects
- Links to GitHub

APS Failure Prediction
A predictive system for heavy-duty vehicles that predicts failures in the Air Pressure System (APS), improving safety and reducing repair costs.

Detection and Extraction of Number Plates
An automated vehicle number plate recognition system using Inception-ResNet-v2 for plate detection and Paddle OCR for character extraction, designed for applications in traffic monitoring, parking management, toll plazas, and security surveillance.

Book Recommender System
A collaborative filtering-based book recommendation system implemented in Streamlit, designed to suggest similar books tailored to individual user interests.

Content Based Image Retrieval
A reverse image search system that covers data collection, model training, and prediction of related images using FastAPI, MongoDB, AWS S3, and various Python libraries, streamlining the retrieval of relevant images for an input query.

Prediction of Consumer Grievances
An NLP-driven solution that predicts whether a consumer will file a dispute after a company's response, using data from the Consumer Financial Protection Bureau (CFPB) and CatBoost for classification, so that companies can prioritize responses and manage potential disputes efficiently.

Authentication App
A two-factor authentication system that integrates password-based security with FaceNet-driven face recognition and MTCNN-based face detection, offering enhanced digital security and mitigating vulnerabilities associated with traditional password systems.

Real-Time Safety Detection
A computer vision-based Industry Safety Detection (ISD) system utilizing YOLOv7 to monitor and ensure employees wear essential safety gear, including helmets, gloves, jackets, goggles, and footwear, before entering the workplace.

Language Identification App
An application designed to identify spoken content in four Indian languages (Hindi, Kannada, Tamil, and Telugu) from YouTube audio extracts, utilizing a custom CNN model trained on Mel Spectrogram images.

News Summarization App
An NLP solution that fine-tunes the t5-small transformer model from Hugging Face on a dataset of news articles to generate concise summaries of their main points, offering a quick overview for readers with limited time.

Defect Detection in Steel Manufacturing
A defect detection system for steel sheets using Xception CNN and Vision Transformers for classification, paired with ResUNet models for precise localization of four distinct surface defects, outputting images with segmentation masks.

Quora Insincere Question Classification
A sentiment analysis system designed to identify and flag insincere questions on Quora using various models, including Random Forest, LSTM with Google embeddings, and BERT, with comprehensive exploratory data analysis and model evaluations presented in dedicated notebooks.

Stack Overflow Question Classification
A classification system that differentiates between Python and non-Python questions on Stack Overflow, using Data Version Control (DVC) to manage the workflow.

Estimation of Shipment Price
An application that leverages machine learning to predict shipment costs based on factors like package weight, dimensions, distance, transportation mode, and special handling requirements, enabling logistics companies to set dynamic pricing and optimize their services.

Face Swap
A computer vision application that swaps the faces of individuals in images, leveraging deep learning models for accurate face detection and replacement, enhancing entertainment and creative content generation.