ML / AI 2024-06

Closed-Loop VLM Content Automation at CVS Health

Production workflow that generated image captions with a Hugging Face vision-language model, evaluated candidates with an LLM judge, and wrote back only improved outputs, raising catalog coverage by 18%.

Coverage Lift 18%

Hugging Face · Computer Vision · VLM · LLM-as-Judge · Python · CI/CD

What this project proves

Applied ML workflow automation

Closed-loop image-caption pipeline using a Hugging Face VLM and LLM judge, raising catalog coverage by 18%.

Core challenge

Improve image metadata coverage without blindly replacing existing captions.

Evaluation lens

Vision-language generation, LLM-as-judge scoring, and controlled production update workflows.

A production ML workflow with measured catalog coverage lift.

Overview

At CVS Health, I built a closed-loop content automation pipeline for image metadata quality. The system generated candidate captions with a Hugging Face vision-language model, evaluated them with an LLM judge prompted for CVS brand framing, and wrote a caption back only when the judge rated it better than the existing alt text.
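The generate, judge, and conditional write-back loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: `CatalogImage`, `run_caption_loop`, and the two callables are hypothetical names, and the generator and judge are passed in so the control flow can be read and tested without loading any model. In practice the generator could wrap a Hugging Face `transformers.pipeline("image-to-text", ...)` call, which returns a list of dicts carrying a `generated_text` field.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CatalogImage:
    image_id: str
    image_path: str
    alt_text: Optional[str]  # existing caption; None when the image has none


# Hypothetical signatures: the generator maps an image path to a candidate
# caption; the judge returns True only when the candidate beats the existing text.
Generator = Callable[[str], str]
Judge = Callable[[str, Optional[str]], bool]


def run_caption_loop(
    images: Iterable[CatalogImage],
    generate: Generator,
    candidate_wins: Judge,
) -> dict[str, str]:
    """Return {image_id: new_caption} for images whose candidate won the comparison."""
    updates: dict[str, str] = {}
    for img in images:
        candidate = generate(img.image_path).strip()
        if not candidate:
            continue  # empty generation: keep the existing text untouched
        if candidate_wins(candidate, img.alt_text):
            updates[img.image_id] = candidate
    return updates


if __name__ == "__main__":
    catalog = [
        CatalogImage("sku-1", "img/1.jpg", None),
        CatalogImage("sku-2", "img/2.jpg", "Bottle of vitamin C tablets"),
    ]
    # Toy stand-ins: fill missing alt text, never overwrite existing text.
    fake_generate = lambda path: f"caption for {path}"
    fake_judge = lambda cand, existing: existing is None
    print(run_caption_loop(catalog, fake_generate, fake_judge))
    # {'sku-1': 'caption for img/1.jpg'}
```

Keeping generation and judging behind plain callables means the VLM or the judge model can be swapped without touching the update logic.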

The engineering value lay in the controlled update loop: model output was never accepted blindly. The workflow raised catalog coverage by 18% while keeping brand fit, caption quality, and production safety behind an explicit review gate.

What I Owned

Built the transformer-based caption generation pipeline.
Designed the LLM-judge step for comparing generated captions against existing alt text.
Framed evaluation around CVS brand fit and production content quality rather than generic image captioning.
Connected the workflow to production-grade content automation.
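One way to implement the comparison step is a pairwise prompt that shows the judge both captions and asks for a structured verdict. The sketch below is an assumption about how such a judge could work, not the prompt used at CVS Health: the rubric wording, the JSON reply format, and the helper names (`build_prompt`, `parse_winner`, `candidate_wins`) are hypothetical, and `complete` stands in for whatever LLM client sends a prompt and returns text. It runs the comparison twice with the captions swapped, a common mitigation for LLM position bias, and treats any malformed reply as a tie so nothing is written back.

```python
import json
from typing import Callable, Optional

# Illustrative rubric only; real brand guidelines would replace the generic wording.
JUDGE_TEMPLATE = """You are reviewing alt text for a retail catalog image.
Compare two captions for the same image on factual accuracy, usefulness as alt text,
and consistency with the brand's voice guidelines.

Caption A: {a}
Caption B: {b}

Reply with JSON only: {{"winner": "A" | "B" | "tie", "reason": "<short reason>"}}"""


def build_prompt(a: str, b: str) -> str:
    return JUDGE_TEMPLATE.format(a=a, b=b)


def parse_winner(raw: str) -> str:
    """Extract the verdict; anything malformed counts as a tie (no write-back)."""
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return "tie"
    winner = verdict.get("winner") if isinstance(verdict, dict) else None
    return winner if winner in ("A", "B", "tie") else "tie"


def candidate_wins(
    candidate: str,
    existing: Optional[str],
    complete: Callable[[str], str],
) -> bool:
    """Pairwise judge; the candidate must win in both orderings."""
    baseline = existing or "(no alt text)"
    first = parse_winner(complete(build_prompt(candidate, baseline)))   # candidate is A
    second = parse_winner(complete(build_prompt(baseline, candidate)))  # candidate is B
    return first == "A" and second == "B"


def toy_complete(prompt: str) -> str:
    # Stand-in judge for the demo: prefers whichever caption mentions "vitamin".
    caption_a = prompt.split("Caption A: ")[1].split("\n")[0]
    winner = "A" if "vitamin" in caption_a.lower() else "B"
    return json.dumps({"winner": winner, "reason": "toy rule"})


if __name__ == "__main__":
    print(candidate_wins("Vitamin C chewable tablets", "product image", toy_complete))  # True
```

Requiring agreement across both orderings trades some recall for safety: a caption that only wins in one position is left alone, which matches the goal of never replacing existing text on a weak signal.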

Hard Problems Solved

Avoid blind replacement: the system only wrote back captions that improved on the existing alt text.
Balance quality and brand voice: captions needed to be useful, controlled, and consistent with CVS framing.
Turn model output into workflow output: the value came from closing the loop between generation, evaluation, and controlled update behavior.
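Once the judge approves a caption, the write-back itself still needs guarding. The sketch below is a minimal illustration assuming a simple key-value view of the catalog's alt text: `AltTextStore`, `AuditEntry`, `write_if_unchanged`, and `rollback` are hypothetical names, and the in-memory dict stands in for whatever production store was actually used. The compare-and-set check skips an update if the stored text changed after it was judged, and the audit list keeps the previous value so a bad write can be reverted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    image_id: str
    old_text: Optional[str]
    new_text: str
    written_at: str  # UTC ISO-8601 timestamp


@dataclass
class AltTextStore:
    """In-memory stand-in for the catalog's metadata store (hypothetical)."""
    texts: dict[str, Optional[str]]
    audit: list[AuditEntry] = field(default_factory=list)

    def write_if_unchanged(self, image_id: str, expected: Optional[str], new_text: str) -> bool:
        """Compare-and-set: write only if the stored text still matches what was judged."""
        if image_id not in self.texts or self.texts[image_id] != expected:
            return False  # unknown image or text edited since evaluation: do not clobber
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append(AuditEntry(image_id, expected, new_text, stamp))
        self.texts[image_id] = new_text
        return True

    def rollback(self, image_id: str) -> bool:
        """Undo the most recent audited write for this image, if any."""
        for i in range(len(self.audit) - 1, -1, -1):
            if self.audit[i].image_id == image_id:
                entry = self.audit.pop(i)
                self.texts[image_id] = entry.old_text
                return True
        return False


if __name__ == "__main__":
    store = AltTextStore({"sku-1": None, "sku-2": "old caption"})
    print(store.write_if_unchanged("sku-1", None, "Vitamin C chewable tablets"))  # True
    print(store.write_if_unchanged("sku-2", "stale read", "new caption"))  # False
    print(store.rollback("sku-1"))  # True
    print(store.texts)  # {'sku-1': None, 'sku-2': 'old caption'}
```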

Impact

Raised catalog coverage by 18%.
Created a repeatable applied-ML workflow for content quality.
Demonstrated a practical pattern for using VLM generation with an LLM quality gate.

Why It Matters

This project shows applied ML in a real enterprise workflow: model generation, policy-aware evaluation, and production update behavior all had to work together.

Tech Stack

AI/ML: Hugging Face vision-language models, LLM-as-judge evaluation
Workflow: Controlled caption updates, quality gates, production integration
Backend: Python
Delivery: Enterprise production pipeline integration
