Closed-Loop VLM Content Automation at CVS Health
Production workflow that generated image captions with a Hugging Face vision-language model, evaluated candidates with an LLM judge, and wrote back only improved outputs, raising catalog coverage by 18%.
What this project proves
Applied ML workflow automation
Closed-loop image-caption pipeline using a Hugging Face VLM and LLM judge, raising catalog coverage by 18%.
Core challenge
Improve image metadata coverage without blindly replacing existing captions.
Evaluation lens
Vision-language generation, LLM-as-judge scoring, and controlled production update workflows.
A production ML workflow with measured catalog coverage lift.
Overview
At CVS Health, I built a closed-loop content automation pipeline for image metadata quality. The system generated candidate captions with a Hugging Face vision-language model, evaluated them with an LLM judge prompted for CVS brand framing, and wrote back only the captions the judge preferred over the existing alt text.
The engineering value was in the controlled update loop: no model output was written back without first passing the judge. The workflow improved catalog coverage by 18% while preserving a review boundary around brand fit, caption quality, and production safety.
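The generate, judge, and write-back loop described above can be sketched roughly as follows. This is a minimal illustration, not the production code: `ImageRecord`, `run_closed_loop`, and the three injected callables are hypothetical names, and the real pipeline's data model, model calls, and storage layer are not shown in this write-up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ImageRecord:
    # Hypothetical record shape; the real catalog schema is an assumption.
    image_id: str
    existing_caption: Optional[str]  # current alt text, None if missing


def run_closed_loop(
    records: Iterable[ImageRecord],
    generate: Callable[[ImageRecord], str],
    candidate_is_better: Callable[[ImageRecord, str], bool],
    write_back: Callable[[str, str], None],
) -> List[str]:
    """Generate a candidate per image, judge it, and write back only winners.

    Returns the ids of the images whose captions were updated.
    """
    updated: List[str] = []
    for record in records:
        candidate = generate(record).strip()
        if not candidate:
            continue  # nothing usable was generated; keep the existing caption
        if candidate == (record.existing_caption or "").strip():
            continue  # identical text; no update needed
        if candidate_is_better(record, candidate):
            write_back(record.image_id, candidate)
            updated.append(record.image_id)
    return updated
```

In this sketch, `generate` would wrap the vision-language model call and `candidate_is_better` would wrap the LLM judge. Passing them in as callables keeps the control flow testable with stubs, and the default path for every record is to leave the existing caption untouched.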
What I Owned
- Built the transformer-based caption generation pipeline.
- Designed the LLM-judge step for comparing generated captions against existing alt text.
- Framed evaluation around CVS brand fit and production content quality rather than generic image captioning.
- Integrated the workflow into the production content automation pipeline.
Hard Problems Solved
- Avoid blind replacement: the system only wrote back captions that improved on the existing alt text.
- Balance quality and brand voice: captions needed to be useful, controlled, and consistent with CVS framing.
- Turn model output into workflow output: the value came from closing the loop across generation, evaluation, and controlled updates.
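One way to make the "avoid blind replacement" rule concrete is a pairwise judge step that fails closed. The sketch below is an assumption-heavy illustration: the prompt wording, the `A`/`B`/`TIE` answer format, and the `call_llm` callable are all hypothetical, and the actual judge prompt and model client are not reproduced here.

```python
from typing import Callable

# Hypothetical prompt; the production prompt and its brand guidance are not shown.
JUDGE_TEMPLATE = (
    "You are reviewing alt text for a retail catalog image.\n"
    "Caption A (existing): {existing}\n"
    "Caption B (candidate): {candidate}\n"
    "Considering accuracy, usefulness, and brand voice, which is better?\n"
    "Answer with exactly one token: A, B, or TIE."
)


def build_judge_prompt(existing: str, candidate: str) -> str:
    """Fill the comparison prompt, marking a missing existing caption."""
    return JUDGE_TEMPLATE.format(existing=existing or "(none)", candidate=candidate)


def parse_verdict(raw: str) -> str:
    """Normalize the judge reply; anything unexpected is treated as a tie."""
    token = raw.strip().upper().rstrip(".")
    return token if token in {"A", "B", "TIE"} else "TIE"


def candidate_wins(
    existing: str, candidate: str, call_llm: Callable[[str], str]
) -> bool:
    """Return True only when the judge explicitly prefers the candidate."""
    verdict = parse_verdict(call_llm(build_judge_prompt(existing, candidate)))
    return verdict == "B"
```

The design choice worth noting is the conservative fallback: a tie, a malformed reply, or a verbose answer all leave the existing caption in place, so only an explicit preference for the candidate triggers an update.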
Impact
- Raised catalog coverage by 18%.
- Created a repeatable applied-ML workflow for content quality.
- Demonstrated a practical pattern for using VLM generation with an LLM quality gate.
Why It Matters
This project shows applied ML in a real enterprise workflow: model generation, policy-aware evaluation, and production update behavior all had to work together.
Tech Stack
- AI/ML: Hugging Face vision-language models, LLM-as-judge evaluation
- Workflow: Controlled caption updates, quality gates, production integration
- Backend: Python
- Delivery: Enterprise production pipeline integration