Primary Focus

AI Memorization

LLMs memorize through three mechanisms: in-parameter (knowledge baked into weights), in-context (information held in the KV cache during inference), and external retrieval (augmenting generation with retrieved documents or tools). My research studies how these mechanisms interact — and how to make models that remember what matters, forget what they should, and reason efficiently within real-world memory budgets.

Key Research Topics

In-Parameter Memory

Knowledge stored directly in model weights during pretraining and fine-tuning. Research on how facts, skills, and biases are encoded across layers, how weight updates create or destroy memories, and the interplay between parameter count and memorization capacity.

In-Context Memory (KV Cache)

How models leverage the key-value cache to hold and reason over information within a single context window. Research on KV-cache compression, eviction policies, attention sink heads, sparse retrieval from long contexts, and memory-efficient serving.
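Eviction policies of this kind can be sketched in a few lines. The following toy policy, assuming a StreamingLLM-style heuristic, keeps a handful of initial "attention sink" positions plus the most recent tokens whenever the cache exceeds its budget; the function name and sink count are illustrative, not from any serving library.

```python
def evict_kv(cache_positions, budget, num_sinks=4):
    """Toy KV-cache eviction: keep the first `num_sinks` positions
    (attention 'sinks') plus the most recent tokens, up to `budget`.
    A sketch of the sink-plus-sliding-window heuristic, not a real API."""
    if len(cache_positions) <= budget:
        return list(cache_positions)
    sinks = cache_positions[:num_sinks]
    recent = cache_positions[num_sinks:][-(budget - num_sinks):]
    return sinks + recent

# A 20-token cache squeezed into a budget of 8: 4 sinks + 4 most recent.
print(evict_kv(list(range(20)), budget=8))  # [0, 1, 2, 3, 16, 17, 18, 19]
```

The interesting design question is what the policy drops: middle-of-context tokens are evicted first, which is exactly where long-context retrieval quality tends to degrade.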

External Retrieval Mechanisms

Augmenting LLMs with retrieval-augmented generation (RAG), tool use, and episodic memory stores. Research on when to retrieve vs. recall from parameters, retrieval quality's impact on generation, and hybrid architectures that blend parametric and non-parametric memory.
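The "retrieve vs. recall from parameters" decision can be caricatured as a thresholded retrieval call: fetch a document only if something in the store is similar enough to the query, otherwise fall back to parametric knowledge. This sketch uses bag-of-words cosine similarity purely for illustration; the threshold value and function names are assumptions, not part of any RAG framework.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, threshold=0.2):
    """Return the best-scoring document, or None when nothing clears the
    threshold -- i.e. fall back to recall from parameters. Illustrative only."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in docs]
    best_score, best_doc = max(scored)
    return best_doc if best_score >= threshold else None

docs = ["the KV cache stores keys and values",
        "retrieval augmented generation fetches documents"]
print(retrieve("how does retrieval augmented generation work", docs))
```

A real system replaces the bag-of-words vectors with dense embeddings, but the same thresholding logic governs the hybrid parametric/non-parametric trade-off.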

Forgetting & Unlearning

Controlled removal of memorized information — from mitigating catastrophic forgetting in continual learning, to targeted unlearning of private or copyrighted data. Research on self-distillation, gradient-based erasure, and benchmarking what models truly forget.
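The gradient-based erasure idea reduces, in its simplest form, to descending the loss on data to keep while ascending it on data to forget. Below is a minimal sketch on one-dimensional logistic regression; the function names, learning rates, and data are all illustrative assumptions, not a method from any specific paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unlearn_step(w, retain, forget, lr=0.1, forget_lr=0.05):
    """One toy 'gradient-based erasure' step for 1-D logistic regression:
    gradient descent on the retain set, gradient ascent on the forget set.
    Purely illustrative -- rates and scale are arbitrary."""
    for x, y in retain:                     # keep this pattern
        w -= lr * (sigmoid(w * x) - y) * x
    for x, y in forget:                     # push the loss on this point UP
        w += forget_lr * (sigmoid(w * x) - y) * x
    return w

w = 0.0
retain = [(1.0, 1), (-1.0, 0)]   # pattern to keep: sign(x) predicts y
forget = [(2.0, 1)]              # memorized point to erase
for _ in range(100):
    w = unlearn_step(w, retain, forget)
print(w)  # positive: the retained pattern survives the ascent on `forget`
```

The hard part the blurb alludes to is benchmarking: a small shift in `w` can suppress the forget point's confidence while the information remains recoverable, which is why "what models truly forget" needs its own evaluation.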

Inference Optimization

Making LLM serving faster and cheaper through efficient attention, KV-cache compression, sparse and low-rank approximations, speculative decoding, and quantization — all grounded in understanding which memories the model actually needs at inference time.
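Speculative decoding, one of the techniques listed above, can be shown in miniature with greedy toy "models". A cheap draft model proposes a few tokens; the target model verifies them and keeps the longest matching prefix plus one corrected token. Everything here (the callables, the integer tokens) is a stand-in for real model calls.

```python
def speculative_step(draft, target, seq, k=4):
    """One greedy speculative-decoding step. `draft` and `target` are toy
    stand-ins mapping a sequence to its next token. Accept draft tokens
    until the first disagreement, then keep the target's correction."""
    proposal = list(seq)
    for _ in range(k):                       # cheap model drafts k tokens
        proposal.append(draft(proposal))
    accepted = list(seq)
    for t in proposal[len(seq):]:            # target verifies each draft token
        expected = target(accepted)
        accepted.append(expected)
        if expected != t:                    # mismatch: stop, keep correction
            break
    return accepted

# Toy models over integer tokens: the target counts up by 1; the draft
# agrees except after token 2, where it guesses wrong.
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 1 if s[-1] != 2 else 99
print(speculative_step(draft, target, [0], k=4))  # [0, 1, 2, 3]
```

The output is always identical to what the target alone would produce; the speedup comes from verifying the k drafted positions in a single target pass instead of k sequential ones.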

Reasoning Under Memory Constraints

How memory limitations shape reasoning quality. Research on chain-of-thought as working memory, the relationship between context length and reasoning depth, and how models degrade gracefully (or don't) when memory is constrained.

Continual Learning

Enabling deployed models to absorb new knowledge over time without catastrophic forgetting. Research on replay-based, regularization-based, and architecture-based strategies for lifelong learning in LLMs — connecting memorization theory to practical model updates.
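The replay-based strategy mentioned above rests on a simple primitive: a fixed-size buffer that stays a uniform sample of everything seen so far. A reservoir-sampled sketch, with a class and API that are illustrative rather than taken from any framework:

```python
import random

class ReplayBuffer:
    """Toy reservoir-sampled replay buffer for rehearsal-based continual
    learning: each new example replaces a stored one with probability
    capacity/seen, so the buffer remains a uniform sample of the stream."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

buf = ReplayBuffer(capacity=10)
for task in range(3):                 # three sequential "tasks"
    for i in range(100):
        buf.add((task, i))
print(len(buf.items))                 # 10 -- memory stays fixed across tasks
```

During an update, replayed examples from the buffer are mixed into each new-task batch, which is what counteracts catastrophic forgetting at constant memory cost.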

Related Publications

Featured

Multimodal Synthetic Data Finetuning and Model Collapse

Zizhao Hu et al.

2025, ACM International Conference on Multimodal Interaction (ICMI)