Sriram Rampelli – AI/ML Engineer Portfolio

About Me

I’m an AI/ML Engineer specializing in generative AI, large language models, and vision-language systems. I design and deploy scalable AI solutions that run efficiently on constrained hardware, using techniques such as 4-bit quantization, LoRA/QLoRA, CLIP-based vision adapters, and optimized inference pipelines.

My work spans quantum-inspired adapters for VLMs, TinyLlama-based multimodal systems, and healthcare-focused generative AI, the last of which led to an IEEE CCWC 2025 publication. I work at the intersection of research and production, turning papers into practical, reliable systems that fit real-world constraints.

Previously at Wipro, I built ML-driven anomaly detection frameworks that reduced critical incident volume by roughly 35%. At Lawrence Tech, I worked as a Research Assistant on LLaMA + BioBERT pipelines for clinical NLP, OCR-driven prescription analysis, and multimodal healthcare chatbots.

Core Skills & Technologies

Python, PyTorch, TensorFlow
Transformers, LLaMA, TinyLlama
LoRA, QLoRA, 4-bit Quantization
Vision-Language Models, CLIP, VLM Adapters
RAG Systems, LangChain, Vector DBs
BioBERT, Clinical NER, OCR Pipelines
Flask, FastAPI, REST APIs
AWS (EC2, S3, Lambda)
Docker, CI/CD (GitHub Actions)
SQL, PostgreSQL, Data Pipelines

Certifications

NVIDIA-Certified Associate: Generative AI & LLMs
Oracle Cloud Infrastructure 2024 Generative AI Certified Professional

Professional Experience

AI/ML Intern – BCG GenAI Job Simulation

Feb 2025 – Mar 2025 (Virtual Experience)

Built an AI-powered financial assistant that ingested SEC 10-K filings and automated extraction of key KPIs and risk indicators.
Designed NLP pipelines using spaCy and pandas to structure unstructured filings into analysis-ready tables.
Implemented sentiment and topic analysis across sections (MD&A, Risk Factors) to summarize company outlook.
Experimented with retrieval-augmented generation (RAG) to ground LLM responses directly in source documents.
Focused on explainability and reproducible outputs to align with consulting-grade deliverables.

Research Assistant – Lawrence Technological University

May 2024 – Dec 2024 | Southfield, MI

Developed a healthcare chatbot integrating fine-tuned LLaMA and Enhanced BioBERT for clinical question answering.
Built OCR and NLP pipelines to parse prescriptions and clinical documents, extracting entities like diagnoses, medications, and dosages.
Optimized model deployment on a local NVIDIA RTX 3050 Ti using 4-bit quantization and LoRA adapters to fit within the GPU's memory limits.
Co-authored an IEEE CCWC 2025 paper on generative AI for healthcare data systems and presented findings at the conference.
Implemented hybrid retrieval (keyword + semantic search) to ground generative outputs in medical corpora and reduce hallucinations.
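
The hybrid retrieval idea in the last bullet can be illustrated with a minimal sketch. This is not the project code: `trigram_vector` is a stand-in for a neural sentence encoder, the `alpha` weighting is a hypothetical parameter, and the sample documents are made up for the demo.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, doc):
    """Fraction of query terms that also appear in the document."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def trigram_vector(text):
    """Stand-in 'embedding': character trigram counts (assumption, not a real encoder)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=2):
    """Rank docs by alpha * keyword score + (1 - alpha) * semantic score."""
    q_vec = trigram_vector(query)
    scored = [
        (alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, trigram_vector(doc)), doc)
        for doc in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample corpus for illustration only.
docs = [
    "Metformin is a first-line medication for type 2 diabetes.",
    "Amoxicillin is an antibiotic used for bacterial infections.",
    "Regular exercise helps manage blood pressure.",
]
results = hybrid_search("medication for diabetes", docs)
```

The top-ranked passages would then be placed in the generator's prompt, so answers can be traced back to retrieved text rather than to the model's memory alone.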

Project Engineer – Wipro Technologies

Nov 2021 – Dec 2022 | India

Designed ML-based anomaly detection for infrastructure and application logs, reducing critical incident volume by ~35%.
Built monitoring dashboards and alerting workflows that improved mean time to detection (MTTD) and resolution (MTTR).
Collaborated with cross-functional teams across 100+ Agile sprints to deliver stable and performant releases.
Automated portions of manual validation using Python scripts and data-driven rule engines to reduce operational overhead.

R&D Projects (Quantum AI & Healthcare GenAI)

Quantum-VLM Adapter

Quantum-inspired adapter layer for compressing and accelerating vision-language models without sacrificing multimodal reasoning quality.

Combines LoRA with quantum-inspired linear projections to reduce parameter count and GPU memory consumption.
Integrates with TinyLlama-VLM as a base model for multimodal captioning and reasoning tasks.
Includes rank ablations, latency/throughput benchmarks, and quality metrics on captioning datasets.
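
To give a feel for why rank matters in the ablations, here is a small worked calculation of the standard LoRA parameter count. It covers only plain LoRA, not the quantum-inspired projections, and the 2048 x 2048 layer size is an illustrative assumption rather than a figure from the project.

```python
def full_params(d_in, d_out):
    """Weights in a dense d_out x d_in projection (bias ignored)."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable weights in a rank-r LoRA update: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


# Illustrative layer size (assumption, not taken from the project).
D_IN = D_OUT = 2048

ablation = {
    rank: lora_params(D_IN, D_OUT, rank) / full_params(D_IN, D_OUT)
    for rank in (4, 8, 16)
}

for rank, fraction in ablation.items():
    print(f"rank={rank:>2}: {fraction:.2%} of the dense layer's weights are trainable")
```

Doubling the rank doubles the trainable parameters, so a rank ablation trades adapter capacity against memory in a predictable, linear way.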

Tech: TinyLlama-VLM, LoRA, PyTorch, Hugging Face Transformers

Healthcare Generative AI – IEEE CCWC 2025

Research codebase backing my IEEE paper on generative AI for healthcare data systems.

Enhanced BioBERT + CRF pipeline for clinical NER over prescriptions and discharge summaries.
OCR-based extraction of prescription text and normalization of medical entities.
Integration with generative models to answer clinical questions and summarize patient-level information.

Tech: BioBERT, CRF, PyTorch, OCR (Tesseract), Flask, Transformers

Competitions

RSNA 2024 – Lumbar Spine Degeneration Classification

Built a deep learning pipeline for classifying lumbar spine degeneration from MRI scans in the RSNA 2024 challenge.

Used CNN-based models with aggressive data augmentation to handle scanner and patient variability.
Implemented ensemble strategies and test-time augmentation to boost robustness and accuracy.
Achieved >92% validation accuracy while respecting inference latency and memory constraints.
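
Ensembling with test-time augmentation can be sketched in a few lines. The version below is a simplified illustration, not the competition pipeline: the two "models" are hypothetical toy functions standing in for trained CNNs, and the image is a tiny nested list rather than an MRI slice.

```python
def identity(image):
    """No-op augmentation."""
    return image


def hflip(image):
    """Horizontal flip of a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]


def predict_with_tta(models, image, augmentations):
    """Average class probabilities over every (model, augmentation) pair."""
    total = None
    count = 0
    for model in models:
        for augment in augmentations:
            probs = model(augment(image))
            total = list(probs) if total is None else [t + p for t, p in zip(total, probs)]
            count += 1
    return [t / count for t in total]


def toy_model_a(image):
    """Hypothetical stand-in for a CNN: scores class 1 by mean intensity."""
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return [1 - mean, mean]


def toy_model_b(image):
    """Hypothetical stand-in that only looks at the first column, so flips change its output."""
    col = sum(row[0] for row in image) / len(image)
    return [1 - col, col]


image = [[0.0, 1.0], [0.5, 1.0]]
probs = predict_with_tta([toy_model_a, toy_model_b], image, [identity, hflip])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Averaging over augmented views smooths out predictions that depend on orientation, as `toy_model_b` does here, which is the robustness effect the bullet above refers to.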

Tech: PyTorch, OpenCV, NumPy, Docker, AWS S3

Highlighted AI/ML Projects

TinyLlama-VLM LoRA

Multimodal TinyLlama-based VLM that injects CLIP vision tokens into the language model context via LoRA adapters.

Preprocessed Flickr30k-style caption datasets and aligned image-text pairs for multimodal training.
Used a frozen CLIP ViT encoder to generate vision tokens fed into TinyLlama’s context window.
Evaluated with BLEU, ROUGE, and perplexity metrics to measure captioning quality and fluency.
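
Of the three metrics, perplexity is the simplest to show concretely: it is the exponential of the negative mean token log-probability. The sketch below assumes per-token natural-log probabilities are already available from the model; the sample values are invented for illustration.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean log p) over the tokens of a caption; lower means more fluent under the model."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Invented example: a model that assigns probability 0.25 to each of 4 tokens.
uniform_log_probs = [math.log(0.25)] * 4
ppl = perplexity(uniform_log_probs)
```

A model that is as uncertain as a uniform choice among four tokens scores a perplexity of about 4, which gives an intuitive reading of the number as an effective branching factor.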

Tech: TinyLlama, CLIP, LoRA, PyTorch, Transformers

AI Call Center – Whisper + TinyLlama

Real-time AI call center prototype combining streaming ASR with a lightweight LLM to handle customer interactions.

Used Faster-Whisper for streaming speech-to-text and passed transcripts to TinyLlama for intent classification and response generation.
Implemented conversational state tracking to handle multi-turn dialogues and context carry-over.
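
One simple way to carry context across turns is a bounded history plus a dictionary of extracted facts. The sketch below shows that pattern under stated assumptions: the `DialogueState` class, its `slots` field, and the prompt layout are hypothetical names for illustration, not the prototype's actual design.

```python
from collections import deque


class DialogueState:
    """Keeps the last few turns and any facts extracted so far (hypothetical design)."""

    def __init__(self, max_turns=6):
        # A bounded deque drops the oldest turn automatically once full.
        self.history = deque(maxlen=max_turns)
        self.slots = {}

    def add_turn(self, speaker, text):
        self.history.append((speaker, text))

    def build_prompt(self, user_text):
        """Record the new customer turn and return the text to send to the LLM."""
        self.add_turn("customer", user_text)
        lines = [f"{speaker}: {text}" for speaker, text in self.history]
        if self.slots:
            known = ", ".join(f"{key}={value}" for key, value in sorted(self.slots.items()))
            lines.insert(0, f"known facts: {known}")
        lines.append("agent:")
        return "\n".join(lines)


state = DialogueState(max_turns=4)
state.slots["order_id"] = "A123"
state.add_turn("customer", "My order has not arrived.")
state.add_turn("agent", "Sorry to hear that. Can you share the order ID?")
prompt = state.build_prompt("It is A123.")
```

Bounding the history keeps the prompt within a small model's context window, while the slot dictionary preserves key facts even after the turns that mentioned them have been dropped.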
Built a simple web interface and WebSocket pipeline for near real-time interaction.

Tech: Faster-Whisper, TinyLlama, Flask, WebSockets, PyTorch

TRASHPRED – Waste Classification

Smart waste classifier that distinguishes recyclable from non-recyclable items using transformer-based image models.

Curated and labeled a custom waste image dataset with varied lighting and backgrounds.
Applied data augmentation and fine-tuning to reach >90% accuracy on test sets.
Exposed a FastAPI-based inference endpoint ready for integration into smart-bin systems.

Tech: PyTorch, Transformers, FastAPI, Docker

Multi-Source Data Analytics Chatbot

A multimodal analytics assistant that ingests CSV, Excel, PDF, DOCX, JSON, image, and DICOM files to generate insights and visualizations.

Implemented automatic detection of file types and parsing into structured pandas DataFrames.
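
File-type detection of this kind is often done by checking well-known magic bytes first and falling back to the file extension. The sketch below illustrates that approach, not the project's implementation: the function name, the signature table, and the returned labels are assumptions.

```python
from pathlib import Path

# (byte offset, signature, label) for formats with a fixed header.
MAGIC_SIGNATURES = [
    (0, b"%PDF", "pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "png"),
    (0, b"\xff\xd8\xff", "jpeg"),
    (128, b"DICM", "dicom"),  # DICOM files carry a 128-byte preamble before "DICM"
]

# Fallback for text formats and ZIP-based Office files, which lack a distinctive header.
EXTENSION_TYPES = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".docx": "docx",
}


def detect_file_type(path):
    """Return a short label for the file, preferring content signatures over the extension."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(132)
    for offset, signature, label in MAGIC_SIGNATURES:
        if header[offset:offset + len(signature)] == signature:
            return label
    return EXTENSION_TYPES.get(path.suffix.lower(), "unknown")
```

The returned label can then select a parser, for example a pandas reader for `csv` and `excel` or a DICOM reader for `dicom`, so a mislabeled upload is still routed by its content.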
Generated histograms, bar charts, and pie charts for quick EDA from natural-language prompts.
Integrated a GPT-2-based text model for open-domain Q&A with configurable safety filters.

Tech: Python, Flask, GPT-2, pandas, matplotlib, DICOM processing

Education & Publications

Master of Science in Computer Science

Lawrence Technological University – Southfield, MI

Graduated: Dec 2024 | GPA: 3.67 / 4.0

Relevant Coursework: Deep Learning, Natural Language Processing, Computer Vision, Advanced Algorithms, Data Mining.

Publication:
IEEE CCWC 2025 – “Empowering Healthcare Data Systems with an Innovative Chatbot Application Utilizing Python and Advanced Generative AI Models”

Get in Touch

Open to AI/ML engineering roles, VLM/LLM research collaborations, and quantum-inspired ML projects.