Sriram Rampelli

AI/ML Engineer | Generative AI & LLMs | Vision-Language Models | Healthcare AI | Quantum-Inspired ML

Master’s in Computer Science – Lawrence Technological University (GPA 3.67) | F-1 OPT (STEM Eligible)

About Me

I’m an AI/ML Engineer specializing in generative AI, large language models, and vision-language systems. I design and deploy scalable AI solutions that run efficiently even on constrained hardware, leveraging techniques like 4-bit quantization, LoRA/QLoRA, CLIP-based vision adapters, and optimized inference pipelines.

My work spans quantum-inspired adapters for VLMs, TinyLlama-based multimodal systems, and healthcare-focused generative AI, culminating in an IEEE CCWC 2025 publication. I work at the intersection of research and production: converting papers into practical, reliable systems that fit real-world constraints.

Previously at Wipro, I built ML-driven anomaly detection frameworks that reduced critical incident volume by roughly 35%. At Lawrence Tech, I worked as a Research Assistant on LLaMA + BioBERT pipelines for clinical NLP, OCR-driven prescription analysis, and multimodal healthcare chatbots.

Core Skills & Technologies

Certifications

Professional Experience

AI/ML Intern – BCG GenAI Job Simulation

Feb 2025 – Mar 2025 (Virtual Experience)

  • Built an AI-powered financial assistant that ingested SEC 10-K filings and automated extraction of key KPIs and risk indicators.
  • Designed NLP pipelines using spaCy and pandas to structure unstructured filings into analysis-ready tables.
  • Implemented sentiment and topic analysis across sections (MD&A, Risk Factors) to summarize company outlook.
  • Experimented with retrieval-augmented generation (RAG) to ground LLM responses directly in source documents.
  • Focused on explainability and reproducible outputs to align with consulting-grade deliverables.

Research Assistant – Lawrence Technological University

May 2024 – Dec 2024 | Southfield, MI

  • Developed a healthcare chatbot integrating fine-tuned LLaMA and Enhanced BioBERT for clinical question answering.
  • Built OCR and NLP pipelines to parse prescriptions and clinical documents, extracting entities like diagnoses, medications, and dosages.
  • Optimized model deployment on a local NVIDIA RTX 3050 Ti GPU using 4-bit quantization and LoRA adapters to fit within its memory limits.
  • Co-authored an IEEE CCWC 2025 paper on generative AI for healthcare data systems and presented findings at the conference.
  • Implemented hybrid retrieval (keyword + semantic search) to ground generative outputs in medical corpora and reduce hallucinations.
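
The hybrid retrieval idea in the last bullet can be sketched as a weighted blend of a keyword score and a vector-similarity score. This is a minimal illustration rather than the project's code: the function names, the `alpha` weight, and the character-trigram stand-in for a semantic encoder are all assumptions. A production system would more likely pair BM25 with a learned sentence encoder.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]

def keyword_score(query, doc):
    """Fraction of distinct query terms that also appear in the document."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0

def embed(text):
    """Hypothetical stand-in for a semantic encoder: character-trigram counts."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_rank(query, docs, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * vector similarity."""
    q_vec = embed(query)
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * cosine(q_vec, embed(d)), d)
        for d in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "Metformin 500 mg twice daily for type 2 diabetes.",
    "Patient reports seasonal allergies.",
    "Schedule follow-up MRI of the lumbar spine.",
]
ranked = hybrid_rank("metformin dosage for diabetes", docs)
print(ranked[0][1])
```

The top-ranked passages would then be placed in the generator's prompt so that answers are grounded in retrieved text.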

Project Engineer – Wipro Technologies

Nov 2021 – Dec 2022 | India

  • Designed ML-based anomaly detection for infrastructure and application logs, reducing critical incident volume by ~35%.
  • Built monitoring dashboards and alerting workflows that improved mean time to detection (MTTD) and resolution (MTTR).
  • Collaborated with cross-functional teams across 100+ Agile sprints to deliver stable and performant releases.
  • Automated portions of manual validation using Python scripts and data-driven rule engines to reduce operational overhead.
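
For illustration, one common baseline for log anomaly detection is a trailing-window z-score over per-interval error counts. The model actually used at Wipro is not described here, so the function name, window size, threshold, and sample counts below are all assumptions.

```python
from statistics import mean, pstdev

def zscore_anomalies(counts, window=5, threshold=3.0):
    """Return indices whose count deviates from the trailing-window mean
    by more than `threshold` population standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: flag any value that differs from it.
            if counts[i] != mu:
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical errors-per-minute series with one spike at index 7.
errors_per_minute = [4, 5, 4, 6, 5, 5, 4, 40, 5, 6]
print(zscore_anomalies(errors_per_minute))
```

Flagged indices would feed the alerting workflow. Note that the spike inflates the window statistics for the next few intervals, which is one reason robust variants (median and MAD) are often preferred.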

R&D Projects (Quantum AI & Healthcare GenAI)

Quantum-VLM Adapter

Quantum-inspired adapter layer for compressing and accelerating vision-language models without sacrificing multimodal reasoning quality.

  • Combines LoRA with quantum-inspired linear projections to reduce parameter count and GPU memory consumption.
  • Integrates with TinyLlama-VLM as a base model for multimodal captioning and reasoning tasks.
  • Includes rank ablations, latency/throughput benchmarks, and quality metrics on captioning datasets.

Tech: TinyLlama-VLM, LoRA, PyTorch, Hugging Face Transformers
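
The parameter saving from the LoRA part of the adapter follows from simple arithmetic: a full update to a d_out × d_in weight matrix has d_in · d_out parameters, while a rank-r LoRA update has r · (d_in + d_out). The sketch below covers standard LoRA only; the quantum-inspired projection is not specified here, and the 2048-wide layer is an illustrative assumption.

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare trainable parameters for a full weight update vs. a LoRA update.

    LoRA factors the update as B @ A with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora, lora / full

# Illustrative rank sweep over one square projection layer.
for rank in (4, 8, 16):
    full, lora, ratio = lora_param_counts(2048, 2048, rank)
    print(f"rank={rank}: full={full}, lora={lora}, ratio={ratio:.4%}")
```

A rank ablation like the one mentioned above varies `rank` and tracks how quality and memory move against this ratio.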

Healthcare Generative AI – IEEE CCWC 2025

Research codebase backing my IEEE paper on generative AI for healthcare data systems.

  • Enhanced BioBERT + CRF pipeline for clinical NER over prescriptions and discharge summaries.
  • OCR-based extraction of prescription text and normalization of medical entities.
  • Integration with generative models to answer clinical questions and summarize patient-level information.

Tech: BioBERT, CRF, PyTorch, OCR (Tesseract), Flask, Transformers
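
A token-level NER model such as the BioBERT + CRF pipeline above typically emits BIO tags, which then need to be merged into entity spans. The following is a minimal sketch of that decoding step; the function name, label set, and example sentence are assumptions, not taken from the research codebase.

```python
def bio_to_entities(tokens, tags):
    """Merge BIO-tagged tokens into (text, label) entity tuples.

    An I- tag that does not continue the current label is treated as
    outside any entity and closes the entity in progress.
    """
    entities = []
    current_tokens, current_label = [], None

    def flush():
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities

tokens = ["Take", "metformin", "500", "mg", "twice", "daily"]
tags = ["O", "B-MEDICATION", "B-DOSAGE", "I-DOSAGE", "B-FREQUENCY", "I-FREQUENCY"]
print(bio_to_entities(tokens, tags))
```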

Competitions

RSNA 2024 – Lumbar Spine Degeneration Classification

Built a deep learning pipeline for classifying lumbar spine degeneration from MRI scans in the RSNA 2024 challenge.

  • Used CNN-based models with aggressive data augmentation to handle scanner and patient variability.
  • Implemented ensemble strategies and test-time augmentation to boost robustness and accuracy.
  • Achieved >92% validation accuracy while respecting inference latency and memory constraints.

Tech: PyTorch, OpenCV, NumPy, Docker, AWS S3
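
Ensembling with test-time augmentation can be sketched as averaging class probabilities over every (model, augmented view) pair. To stay dependency-free, this illustration uses nested lists as images and plain callables as models; the function names and toy models are assumptions, and the competition pipeline itself used PyTorch.

```python
def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

def identity(image):
    return image

def predict_tta_ensemble(models, image, augmentations):
    """Average class probabilities over all models and all augmented views."""
    views = [augment(image) for augment in augmentations]
    predictions = [model(view) for model in models for view in views]
    n_classes = len(predictions[0])
    return [
        sum(p[c] for p in predictions) / len(predictions)
        for c in range(n_classes)
    ]

# Toy stand-ins for trained classifiers (hypothetical).
def model_a(image):
    return [0.8, 0.2] if image[0][0] > 0.5 else [0.4, 0.6]

def model_b(image):
    return [0.6, 0.4]

image = [[1.0, 0.0], [0.0, 0.0]]
print(predict_tta_ensemble([model_a, model_b], image, [identity, hflip]))
```

Averaging smooths out predictions that depend on orientation or on a single model's quirks, at the cost of one forward pass per model per view.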

Highlighted AI/ML Projects

TinyLlama-VLM LoRA

Multimodal TinyLlama-based VLM that injects CLIP vision tokens into the language model context via LoRA adapters.

  • Preprocessed Flickr30k-style caption datasets and aligned image-text pairs for multimodal training.
  • Used a frozen CLIP ViT encoder to generate vision tokens fed into TinyLlama’s context window.
  • Evaluated with BLEU, ROUGE, and perplexity metrics to measure captioning quality and fluency.

Tech: TinyLlama, CLIP, LoRA, PyTorch, Transformers
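
Of the metrics listed, perplexity is the simplest to state: it is the exponential of the mean negative log-likelihood per token. A minimal sketch, assuming natural-log token probabilities have already been obtained from the model (the function name and sample values are illustrative):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean(log p(token))) over a caption's tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# If every token is assigned probability 0.25, the model is as uncertain
# as a uniform choice among four options.
print(perplexity([math.log(0.25)] * 4))
```

Lower values indicate more fluent, more confidently predicted captions. BLEU and ROUGE instead compare generated text against reference captions by n-gram overlap.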

AI Call Center – Whisper + TinyLlama

Real-time AI call center prototype combining streaming ASR with a lightweight LLM to handle customer interactions.

  • Used Faster-Whisper for streaming speech-to-text; passed transcripts to TinyLlama for intent classification and response generation.
  • Implemented conversational state tracking to handle multi-turn dialogues and context carry-over.
  • Built a simple web interface and WebSocket pipeline for near real-time interaction.

Tech: Faster-Whisper, TinyLlama, Flask, WebSockets, PyTorch
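
Conversational state tracking for a small-context model usually means keeping a bounded window of recent turns plus a few persistent slots. The class below is a hypothetical sketch of that pattern, not the prototype's code; the class name, slot keys, and prompt layout are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Keeps the last `max_turns` turns and a dict of remembered slots."""
    max_turns: int = 4
    slots: dict = field(default_factory=dict)
    history: deque = field(default_factory=deque)

    def add_turn(self, speaker, text, slots=None):
        """Record a turn, drop the oldest beyond the window, merge new slots."""
        self.history.append((speaker, text))
        while len(self.history) > self.max_turns:
            self.history.popleft()
        if slots:
            self.slots.update(slots)

    def build_prompt(self, user_text):
        """Assemble the context passed to the language model for the next reply."""
        lines = [f"Known details: {self.slots}"]
        lines += [f"{speaker}: {text}" for speaker, text in self.history]
        lines.append(f"caller: {user_text}")
        lines.append("agent:")
        return "\n".join(lines)

state = DialogueState(max_turns=2)
state.add_turn("caller", "My order 123 is late", slots={"order_id": "123"})
state.add_turn("agent", "Let me check that for you.")
state.add_turn("caller", "Thanks")
print(state.build_prompt("Any update?"))
```

Slots survive after the turn that introduced them has scrolled out of the window, which is how context carries over across a long call.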

TRASHPRED – Waste Classification

Smart waste classifier that distinguishes recyclable from non-recyclable items using transformer-based image models.

  • Curated and labeled a custom waste image dataset with varied lighting and backgrounds.
  • Applied data augmentation and fine-tuning to reach >90% accuracy on test sets.
  • Exposed a FastAPI-based inference endpoint ready for integration into smart-bin systems.

Tech: PyTorch, Transformers, FastAPI, Docker

Multi-Source Data Analytics Chatbot

A multimodal analytics assistant that ingests CSV, Excel, PDF, DOCX, JSON, image, and DICOM files to generate insights and visualizations.

  • Implemented automatic detection of file types and parsing into structured pandas DataFrames.
  • Generated histograms, bar charts, and pie charts for quick EDA from natural-language prompts.
  • Integrated a GPT-2-based text model for open-domain Q&A with configurable safety filters.

Tech: Python, Flask, GPT-2, pandas, matplotlib, DICOM processing
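
Automatic file-type detection can combine content signatures with a file-extension fallback. The sketch below is illustrative only: the function name, the kind labels, and the extension table are assumptions. The signatures themselves are standard: DICOM files carry `DICM` at byte offset 128, PDFs begin with `%PDF`, PNGs with an 8-byte magic number, and JPEGs with `FF D8 FF`.

```python
from pathlib import Path

# Hypothetical mapping from extension to a parser "kind".
EXTENSION_KINDS = {
    ".csv": "csv", ".xlsx": "excel", ".xls": "excel", ".pdf": "pdf",
    ".docx": "docx", ".json": "json", ".png": "image", ".jpg": "image",
    ".jpeg": "image", ".dcm": "dicom",
}

def detect_kind(filename, header=b""):
    """Guess a file's kind from its leading bytes, falling back to its extension."""
    if len(header) >= 132 and header[128:132] == b"DICM":
        return "dicom"
    if header.startswith(b"%PDF"):
        return "pdf"
    if header.startswith(b"\x89PNG\r\n\x1a\n") or header.startswith(b"\xff\xd8\xff"):
        return "image"
    return EXTENSION_KINDS.get(Path(filename).suffix.lower(), "unknown")

print(detect_kind("scan.bin", b"\x00" * 128 + b"DICM"))
print(detect_kind("report.PDF"))
```

Each kind would then be routed to its own parser, for example a pandas reader for tabular files. Checking content before the extension keeps mislabeled uploads from reaching the wrong parser.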

Education & Publications

Master of Science in Computer Science

Lawrence Technological University – Southfield, MI

Graduated: Dec 2024 | GPA: 3.67 / 4.0

Relevant Coursework: Deep Learning, Natural Language Processing, Computer Vision, Advanced Algorithms, Data Mining.

Publication:
IEEE CCWC 2025 – “Empowering Healthcare Data Systems with an Innovative Chatbot Application Utilizing Python and Advanced Generative AI Models”

Get in Touch

Open to AI/ML engineering roles, VLM/LLM research collaborations, and quantum-inspired ML projects.