Cloud NLP Classification on GCP
Production-ready multi-model text classification service with zero-downtime model switching, deployed on GCP Compute Engine. DistilBERT reaches 96.57% accuracy on a 24,783-sample dataset, and the suite of 326+ automated tests passes at 100%.
What this project proves
Production ML service engineering
Multi-model GCP service with zero-downtime switching, 96.57% DistilBERT accuracy, and 326+ passing tests.
Core challenge
Serve text classification reliably while preserving model-switching flexibility and test confidence.
Evaluation lens
Service design, deployment behavior, and operational safety for ML inference.
A production-ready NLP service with strong accuracy, uptime posture, and test depth.
Overview
A production-grade text-classification service built to compare accuracy, latency, and operating cost across multiple model backends under one deployable API. I designed the service so teams could switch between DistilBERT, TF-IDF + Logistic Regression, and TF-IDF + SVM without downtime, then deployed the system on GCP Compute Engine with Dockerized packaging and a large automated test suite.
This project is best read as proof of production ML service engineering, where model quality was only one requirement. The harder part was operational flexibility: preserving one interface while making model swaps safe, testable, and cheap to run.
What I Owned
- Designed the multi-model API surface and switching behavior
- Built the inference service and containerized deployment path
- Trained and benchmarked the model variants against the same dataset
- Added automated validation to catch regressions before rollout
Hard Problems Solved
- Zero-downtime model switching: all model variants sit behind the same API contract, so the service can change inference backends without breaking consumers
- Serving for tradeoffs, not just accuracy: DistilBERT leads on quality (96.57%), but the TF-IDF models cut latency from 60–100ms to 2–5ms and reduce compute cost; the service was built so those tradeoffs are explicit and operable
- Validation discipline: model behavior, service behavior, and integration behavior are all covered by automated testing rather than relying on ad hoc manual checks
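
The switching behavior described above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: `TextClassifier`, `KeywordClassifier`, `ModelRegistry`, and the labels `flagged`/`ok` are all hypothetical names, and the toy keyword backends stand in for the real DistilBERT and TF-IDF models. The idea it shows is that a new backend is fully loaded before the active reference is swapped, so requests keep being served while the replacement loads.

```python
import threading
from typing import Callable, Dict, Protocol


class TextClassifier(Protocol):
    """Contract every backend must satisfy (hypothetical interface)."""

    def predict(self, text: str) -> str: ...


class KeywordClassifier:
    """Toy stand-in for a real backend such as DistilBERT or TF-IDF + SVM."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def predict(self, text: str) -> str:
        return "flagged" if self.keyword in text.lower() else "ok"


class ModelRegistry:
    """Holds the active backend and swaps it atomically."""

    def __init__(
        self, loaders: Dict[str, Callable[[], TextClassifier]], default: str
    ) -> None:
        self._loaders = loaders
        self._lock = threading.Lock()
        self._active_name = default
        self._active = loaders[default]()

    def switch(self, name: str) -> None:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        # Load the new backend completely before touching shared state,
        # so the old backend keeps serving requests in the meantime.
        candidate = self._loaders[name]()
        with self._lock:
            self._active_name, self._active = name, candidate

    def predict(self, text: str) -> dict:
        # Take a consistent (name, model) snapshot under the lock;
        # the inference call itself runs outside the lock.
        with self._lock:
            name, model = self._active_name, self._active
        return {"model": name, "label": model.predict(text)}


# Hypothetical wiring: both "models" are toy keyword matchers.
registry = ModelRegistry(
    loaders={
        "distilbert": lambda: KeywordClassifier("hate"),
        "svm": lambda: KeywordClassifier("awful"),
    },
    default="distilbert",
)
```

Because every backend returns the same response shape, a consumer calling `registry.predict(...)` before and after `registry.switch("svm")` sees only the `model` field change. The same property makes a contract test straightforward: one set of assertions can be run against each backend in turn.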
Model Benchmarks
| Model | Accuracy | Latency | Cost Factor |
|---|---|---|---|
| DistilBERT | 96.57% | 60–100ms | Baseline |
| LogReg (TF-IDF) | 85–88% | 5ms (21× faster) | — |
| SVM (TF-IDF) | 85–88% | 2ms (44× faster) | — |
Trained and benchmarked on a 24,783-sample dataset with comprehensive end-to-end validation.
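
For readers who want to run a comparable latency comparison, the following is a rough sketch of a timing harness. It is an assumption about how such numbers could be collected, not the benchmark procedure used for the table above; `median_latency_ms` and `speedup` are hypothetical helper names, and the callable passed in would be whatever wraps a given backend's inference call.

```python
import statistics
import time
from typing import Callable, List, Sequence


def median_latency_ms(
    predict: Callable[[str], object], texts: Sequence[str], runs: int = 5
) -> float:
    """Median per-request latency in ms over runs * len(texts) calls."""
    samples: List[float] = []
    for _ in range(runs):
        for text in texts:
            start = time.perf_counter()
            predict(text)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

In use, each backend would be timed on the same inputs, for example `speedup(median_latency_ms(slow_model, texts), median_latency_ms(fast_model, texts))`, where `slow_model` and `fast_model` are placeholders. The median is used rather than the mean so that a few slow outliers (cold caches, garbage collection) do not dominate the figure.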
Why It Matters
This project shows I can build ML systems that are usable in production, not just accurate in notebooks. It balances model performance, latency, cost, and service stability, which is the real engineering problem in deployed inference systems.
Key Features
- Zero-Downtime Switching: Hot-swap between models without service interruption
- 326+ Test Suite: Automated E2E validation with 100% pass rate
- Cloud Deployment: Live on GCP at $0.07/hr ($50/mo)
- Multi-Model Architecture: Pluggable model backends behind a unified API
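
The hourly and monthly figures above can be related with a quick calculation. This sketch assumes an always-on instance and an average month of 730 hours (8,760 hours per year divided by 12); `monthly_cost` is a hypothetical helper, and actual billing may differ with discounts or partial uptime.

```python
HOURS_PER_MONTH = 8760 / 12  # average month: 730 hours


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate monthly cost of an always-on instance."""
    return hourly_rate_usd * hours


# At $0.07/hr this comes to roughly $51, consistent with the ~$50/mo figure.
estimate = monthly_cost(0.07)
```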
Tech Stack
- ML: Hugging Face Transformers, PyTorch, scikit-learn, TF-IDF
- API: FastAPI, Uvicorn, Pydantic
- Infrastructure: Docker, GCP Compute Engine (e2-standard-2)
- Testing: pytest, 326+ automated tests