Research 2025-04

PICO-LLM Research Pipeline

Modular LLM research pipeline (NYU CSCI-GA 2565) for training and evaluating K-Gram MLP, LSTM, and KV-cache Transformer architectures with 22+ experiment configs and rigorous cross-run analysis.

73.21% Token Accuracy

GitHub

PyTorchLLMTransformersLSTMKV-CachewandbResearch

Overview

A modular research pipeline for training and evaluating small-scale language model architectures, built for NYU’s CSCI-GA 2565 Machine Learning course. The pipeline supports K-Gram MLP, LSTM, and KV-cache Transformer models with systematic cross-run analysis across 22+ experiment configurations.

Best Results (KV-Cache Transformer)

Metric	Score
Validation Loss	1.665
Perplexity	6.389
Token Accuracy	73.21%

See full result interpretation in the README →

Key Features

Multi-Architecture Support: K-Gram MLP, LSTM, and KV-cache Transformer training loops with shared evaluation harness.
22+ Experiment Configs: Systematic hyperparameter sweeps and architecture comparisons with deterministic config tracking and multi-model batch training.
Rich Logging: 20+ tracked fields per run (loss, val_loss, perplexity, token_accuracy, gradient norms, learning rate, hyperparameters).
Advanced Evaluation: Pareto frontier analysis, embedding similarity metrics, regression insights, and cross-run statistical analysis.
Configurable Sampling: Greedy, top-p, and repetition-penalty decoding; monosemantic token probing; configurable generation pipelines.
Reproducible: Deterministic seeding and config-driven experimentation throughout.

Tech Stack

ML: PyTorch, custom Transformer implementation with KV-cache
Experiment Tracking: Weights & Biases (wandb)
Analysis: NumPy, Pandas, Matplotlib

Related Projects

Research

Solaris — Multiplayer Video World Model in Minecraft

First multiplayer video world model generating consistent first-person observations for two players simultaneously, trained on 12.6M frames of coordinated Minecraft gameplay. Published on arXiv, NYU.