Sriram Rampelli – AI/ML Engineer Portfolio

About Me

I am an AI/ML Engineer specializing in bridging cutting‑edge AI research with production‑grade enterprise systems. My expertise lies in architecting Generative AI, Retrieval‑Augmented Generation (RAG) pipelines and Vision‑Language models that are scalable, cost‑efficient and deterministically safe.

I focus on optimizing complex AI architectures using techniques like 4‑bit quantization, LoRA/QLoRA fine‑tuning and hybrid edge‑to‑cloud deployments, ensuring models run efficiently on constrained hardware. My work ranges from healthcare‑focused generative AI and quantum‑inspired vision‑language adapters to TinyLlama‑based multimodal systems.

Whether publishing healthcare NLP research at IEEE or deploying quantized LLMs on edge devices, I thrive at the intersection of research and implementation. I design rules‑driven RAG pipelines and compliant AI systems that turn theoretical advances into reliable, cost‑effective software. In industry, I have delivered ML‑based anomaly detection frameworks; in academia, I have built clinical NLP pipelines.

Core Skills & Technologies

Python, PyTorch, TensorFlow
GPT/LLaMA/BioBERT & Transformers
LoRA/QLoRA, 4‑bit & mixed precision, GGUF/AWQ
RAG pipelines, LlamaIndex, LangChain, vector DBs (Pinecone, FAISS, Weaviate)
spaCy, NLTK, Clinical NLP & OCR
CLIP & Vision-Language Models (TinyLlama‑VLM, VLM Adapters)
AWS & GCP (BigQuery, Airflow), Kubernetes, Docker, Terraform, CI/CD
FastAPI, Flask, MQTT, serverless & offline‑first edge systems (BLE, mmWave, UWB)
SQL & PostgreSQL, data pipelines, dataset governance & MLOps
Reinforcement Learning, transfer learning & prompt engineering

Certifications

NVIDIA-Certified Associate: Generative AI & LLMs
Oracle Cloud Infrastructure 2024 Generative AI Certified Professional

Professional Experience

AI Engineer – Providence Wave Group

Dec 2025 – Present | Detroit, MI

Engineered and deployed Edge AI backend microservices with FastAPI, integrating 4‑bit quantized LLMs on edge hardware to reduce latency and memory footprint on resource‑constrained devices.
Developed context‑aware ingestion pipelines for BLE sensor data, storing information in SQL databases and incorporating user context into LLM prompts; designed offline‑first architectures for robust edge operation.
Standardized API contracts using strict JSON schemas to enable parallel development and accelerate feature delivery.
Led dataset governance and retrieval enhancements, implementing best practices for dataset synthesis, validation and audit; enhanced retrieval mechanisms with lexical scoring and deterministic tie‑breakers.
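
To illustrate the lexical scoring with deterministic tie-breakers mentioned above, here is a minimal sketch. It assumes a simple term-overlap score and uses the document id as the tie-breaker; the function names and toy documents are hypothetical and do not reproduce the production code.

```python
from collections import Counter
from typing import Dict, List


def lexical_score(query: str, text: str) -> int:
    """Sum the occurrences in `text` of each distinct query term."""
    term_counts = Counter(text.lower().split())
    return sum(term_counts[term] for term in set(query.lower().split()))


def rank(query: str, docs: Dict[str, str]) -> List[str]:
    """Order doc ids by descending score; equal scores fall back to the
    doc id, so the ranking is identical on every run."""
    return sorted(docs, key=lambda doc_id: (-lexical_score(query, docs[doc_id]), doc_id))


# Toy documents (illustrative only).
docs = {
    "b": "battery level low on sensor",
    "a": "sensor battery replaced",
    "c": "door opened",
}
print(rank("sensor battery", docs))
```

Documents "a" and "b" score the same here, so the id ordering decides between them rather than dictionary insertion order.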

AI/ML Intern – BCG GenAI Job Simulation

Feb 2025 – Mar 2025 | Virtual Experience

Built an AI-powered financial assistant that ingested SEC 10-K filings and automated extraction of key KPIs and risk indicators.
Designed NLP pipelines using spaCy and pandas to structure unstructured filings into analysis-ready tables.
Implemented sentiment and topic analysis across sections (MD&A, Risk Factors) to summarize company outlook.
Experimented with retrieval-augmented generation (RAG) to ground LLM responses directly in source documents.
Focused on explainability and reproducible outputs to align with consulting-grade deliverables.

Research Assistant – Lawrence Technological University

May 2024 – Dec 2024 | Southfield, MI

Developed a healthcare chatbot integrating fine-tuned LLaMA and Enhanced BioBERT for clinical question answering.
Built OCR and NLP pipelines to parse prescriptions and clinical documents, extracting entities like diagnoses, medications, and dosages.
Optimized model deployment on a local NVIDIA GeForce RTX 3050 Ti using 4-bit quantization and LoRA adapters to fit GPU memory limits.
Co-authored an IEEE CCWC 2025 paper on generative AI for healthcare data systems and presented findings at the conference.
Implemented hybrid retrieval (keyword + semantic search) to ground generative outputs in medical corpora and reduce hallucinations.
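
The hybrid retrieval idea (keyword plus semantic search) can be sketched as a weighted blend of two scores. This is a minimal illustration under stated assumptions: the embeddings are hand-written toy vectors, and `alpha`, `hybrid_rank` and the corpus entries are hypothetical, not taken from the research code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


def hybrid_rank(
    query: str,
    query_vec: Sequence[float],
    corpus: Dict[str, Tuple[str, Sequence[float]]],
    alpha: float = 0.5,
) -> List[str]:
    """Rank doc ids by alpha * semantic + (1 - alpha) * keyword score."""
    scores = {
        doc_id: alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        for doc_id, (text, vec) in corpus.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy corpus: doc id -> (text, embedding). Vectors are made up for the example.
corpus = {
    "dose": ("metformin dosage for adults", [1.0, 0.0]),
    "side": ("common adverse effects of metformin", [0.6, 0.8]),
    "diet": ("dietary advice for diabetes", [0.0, 1.0]),
}
print(hybrid_rank("metformin dosage", [1.0, 0.0], corpus))
```

In practice the vectors would come from an embedding model and the keyword side from an index such as BM25; the blend weight is a tuning choice.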

Project Engineer – Wipro Technologies

Nov 2021 – Dec 2022 | India

Designed ML-based anomaly detection for infrastructure and application logs, reducing critical incident volume by ~35%.
Built monitoring dashboards and alerting workflows that improved mean time to detection (MTTD) and resolution (MTTR).
Collaborated with cross-functional teams across 100+ Agile sprints to deliver stable and performant releases.
Automated portions of manual validation using Python scripts and data-driven rule engines to reduce operational overhead.

R&D Projects (Quantum AI & Healthcare GenAI)

Quantum-VLM Adapter

Quantum-inspired adapter layer for compressing and accelerating vision-language models without sacrificing multimodal reasoning quality.

Combines LoRA with quantum-inspired linear projections to reduce parameter count and GPU memory consumption.
Integrates with TinyLlama-VLM as a base model for multimodal captioning and reasoning tasks.
Includes rank ablations, latency/throughput benchmarks, and quality metrics on captioning datasets.
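
For context on the parameter savings, the arithmetic below covers standard LoRA only: a d_out x d_in weight update is replaced by two low-rank factors holding r * (d_in + d_out) trainable parameters. The layer size is a hypothetical example, and the quantum-inspired projection itself is not modelled here.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical projection layer size, chosen only for illustration.
d_in = d_out = 2048
for rank in (4, 8, 16):
    ratio = lora_params(d_in, d_out, rank) / full_params(d_in, d_out)
    print(f"rank={rank}: {lora_params(d_in, d_out, rank)} params ({ratio:.2%} of full)")
```

A rank ablation then trades this parameter count against task quality for each rank.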

Tech: TinyLlama-VLM, LoRA, PyTorch, Hugging Face Transformers

Healthcare Generative AI – IEEE CCWC 2025

Research codebase backing my IEEE paper on generative AI for healthcare data systems.

Enhanced BioBERT + CRF pipeline for clinical NER over prescriptions and discharge summaries.
OCR-based extraction of prescription text and normalization of medical entities.
Integration with generative models to answer clinical questions and summarize patient-level information.
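
A tagger such as BioBERT + CRF typically emits one BIO tag per token, which then has to be grouped into entity spans. The sketch below shows that grouping step only, assuming BIO-style output; the label names (DRUG, DOSE, FREQ) and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, label) pairs.

    An I- tag that does not continue the current entity is treated like O.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["Take", "metformin", "500", "mg", "twice", "daily"]
tags = ["O", "B-DRUG", "B-DOSE", "I-DOSE", "B-FREQ", "I-FREQ"]
print(bio_to_entities(tokens, tags))
```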

Tech: BioBERT, CRF, PyTorch, OCR (Tesseract), Flask, Transformers

Competitions

RSNA 2024 – Lumbar Spine Degeneration Classification

Built a deep learning pipeline for classifying lumbar spine degeneration from MRI scans in the RSNA 2024 challenge.

Used CNN-based models with aggressive data augmentation to handle scanner and patient variability.
Implemented ensemble strategies and test-time augmentation to boost robustness and accuracy.
Achieved >92% validation accuracy while respecting inference latency and memory constraints.
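
Ensembling with test-time augmentation amounts to averaging class probabilities over every (model, augmentation) pair. The sketch below uses plain Python lists and stand-in models; the horizontal flip and the two constant "models" are assumptions for illustration, not the augmentations or networks used in the competition.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]
Model = Callable[[Image], Sequence[float]]
Augmentation = Callable[[Image], Image]


def identity(image: Image) -> Image:
    return image


def hflip(image: Image) -> Image:
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]


def predict_tta_ensemble(
    models: Sequence[Model], augmentations: Sequence[Augmentation], image: Image
) -> List[float]:
    """Average class probabilities over all model/augmentation combinations."""
    if not models or not augmentations:
        raise ValueError("need at least one model and one augmentation")
    total: List[float] = []
    count = 0
    for model in models:
        for augment in augmentations:
            probs = model(augment(image))
            total = list(probs) if not total else [t + p for t, p in zip(total, probs)]
            count += 1
    return [t / count for t in total]


# Stand-in models returning fixed probabilities (illustrative only).
def model_a(image: Image) -> List[float]:
    return [0.8, 0.2]


def model_b(image: Image) -> List[float]:
    return [0.4, 0.6]


image = [[0.0, 1.0], [1.0, 0.0]]
averaged = predict_tta_ensemble([model_a, model_b], [identity, hflip], image)
print(averaged)
```

Whether a given augmentation is valid at test time depends on the anatomy and labels, so the augmentation list should be chosen per task.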

Tech: PyTorch, OpenCV, NumPy, Docker, AWS S3

Highlighted AI/ML Projects

TinyLlama-VLM LoRA

Multimodal TinyLlama-based VLM that injects CLIP vision tokens into the language model context via LoRA adapters.

Preprocessed Flickr30k-style caption datasets and aligned image-text pairs for multimodal training.
Used a frozen CLIP ViT encoder to generate vision tokens fed into TinyLlama’s context window.
Evaluated with BLEU, ROUGE, and perplexity metrics to measure captioning quality and fluency.
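
Of the metrics above, perplexity has the simplest definition: the exponential of the mean negative log-likelihood per token. A minimal sketch, assuming natural-log token probabilities are already available from the model (the input values here are made up):

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the average negative log-probability over the tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# If every token is predicted with probability 0.25, the model is as
# uncertain as a uniform choice among four options.
log_probs = [math.log(0.25)] * 4
print(perplexity(log_probs))
```

Lower values mean the model assigns higher probability to the reference captions.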

Tech: TinyLlama, CLIP, LoRA, PyTorch, Transformers

AI Call Center – Whisper + TinyLlama

Real-time AI call center prototype combining streaming ASR with a lightweight LLM to handle customer interactions.

Used Faster-Whisper for streaming speech-to-text and passed transcripts to TinyLlama for intent classification and response generation.
Implemented conversational state tracking to handle multi-turn dialogues and context carry-over.
Built a simple web interface and WebSocket pipeline for near real-time interaction.
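
Conversational state tracking for a small LLM can be as simple as keeping a bounded window of recent turns and rebuilding the prompt from it. The class below is a hypothetical sketch of that idea; `ConversationState`, `max_turns` and the prompt layout are assumptions, not the prototype's actual design.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class ConversationState:
    """Keeps the most recent turns so the prompt stays within a small context."""

    max_turns: int = 6
    turns: Deque[Tuple[str, str]] = field(default_factory=deque)

    def add(self, speaker: str, text: str) -> None:
        """Record a turn and drop the oldest ones beyond max_turns."""
        self.turns.append((speaker, text))
        while len(self.turns) > self.max_turns:
            self.turns.popleft()

    def prompt(self, system: str) -> str:
        """Build the prompt: system instruction followed by the kept turns."""
        lines = [system] + [f"{speaker}: {text}" for speaker, text in self.turns]
        return "\n".join(lines)


state = ConversationState(max_turns=2)
state.add("caller", "My order has not arrived.")
state.add("agent", "Could you share the order number?")
state.add("caller", "It is 1234.")
print(state.prompt("You are a support agent."))
```

Dropping old turns loses context, so a longer-running system would usually add a summary of the discarded turns.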

Tech: Faster-Whisper, TinyLlama, Flask, WebSockets, PyTorch

TRASHPRED – Waste Classification

Smart waste classifier that distinguishes recyclable from non-recyclable items using transformer-based image models.

Curated and labeled a custom waste image dataset with varied lighting and backgrounds.
Applied data augmentation and fine-tuning to reach >90% accuracy on test sets.
Exposed a FastAPI-based inference endpoint ready for integration into smart-bin systems.

Tech: PyTorch, Transformers, FastAPI, Docker

Multi-Source Data Analytics Chatbot

A multimodal analytics assistant that ingests CSV, Excel, PDF, DOCX, JSON, image, and DICOM files to generate insights and visualizations.

Implemented automatic detection of file types and parsing into structured pandas DataFrames.
Generated histograms, bar charts, and pie charts for quick EDA from natural-language prompts.
Integrated a GPT-2-based text model for open-domain Q&A with configurable safety filters.
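
One simple way to approach automatic file-type detection is a suffix lookup that routes each upload to a parser family. The mapping and the `detect_kind` helper below are hypothetical; a fuller implementation might also inspect file contents rather than trusting the extension.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a parser family.
SUFFIX_TO_KIND = {
    ".csv": "table",
    ".xlsx": "table",
    ".xls": "table",
    ".json": "json",
    ".pdf": "document",
    ".docx": "document",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".dcm": "dicom",
}


def detect_kind(filename: str) -> str:
    """Return the parser family for a file name, or "unknown" if unmapped."""
    return SUFFIX_TO_KIND.get(Path(filename).suffix.lower(), "unknown")


for name in ("sales.csv", "Report.PDF", "scan.dcm", "notes.txt"):
    print(name, "->", detect_kind(name))
```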

Tech: Python, Flask, GPT-2, pandas, matplotlib, DICOM processing

Education & Publications

Master of Science in Computer Science

Lawrence Technological University – Southfield, MI

Graduated: Dec 2024 | GPA: 3.67 / 4.0

Relevant Coursework: Deep Learning, Natural Language Processing, Computer Vision, Advanced Algorithms, Data Mining.

Publication:
IEEE CCWC 2025 – “Empowering Healthcare Data Systems with an Innovative Chatbot Application Utilizing Python and Advanced Generative AI Models”

Get in Touch

Open to AI/ML engineering roles, VLM/LLM research collaborations, and quantum-inspired ML projects.