research directions
memory

Agentic Memory

Continual learning for AI agents: in-context learning, continual fine-tuning, and unlearning.

world model

World Model

In-context world models, adaptation to post-training task worlds, and agent adaptation in evolving environments.

latency

Low-Latency AI

Efficient attention architectures, KV-cache compression, latent segmentation, and recurrent transformers.

safety

AI Safety

Training on synthetic data, risks of multi-agent interaction, post-training guardrails, and behavioral studies of AI systems.

recent papers
  1. 2026 · memory · in submission · NeurIPS 2026

    SHRED: Document Unlearning via Self-Distillation and Entropy Demotion

    A document-level unlearning method that combines self-distillation on retain data with entropy demotion on the forget set. Removes targeted knowledge from LLMs without catastrophic damage to unrelated capabilities.

  2. 2026 · safety · in submission · EMNLP 2026

    Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM

    Persona effectiveness is task-type dependent: expert prompts consistently improve alignment-dependent tasks (safety, preference) but reliably damage pretraining-dependent knowledge retrieval. PRISM teaches models when to invoke a persona via intent-based self-modeling, preserving accuracy while keeping alignment gains.

  3. 2026 · latency · in progress

    AttendTwice: Long-Context Inference via Dynamic Token-Level KV-Cache Selection

    A two-pass attention scheme that dynamically selects which KV-cache tokens to attend to per query, enabling long-context inference at a fraction of the standard memory footprint.

  4. 2025 · safety · ACM ICMI 2025

    Multimodal Synthetic Data Finetuning and Model Collapse

    Studies how vision-language models degrade when fine-tuned on AI-generated multimodal data. Characterizes the collapse dynamics specific to the multimodal regime and proposes mitigation strategies that preserve diversity across modalities.

  5. 2024 · latency · preprint

    Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion

    A brain-inspired MLP architecture with hemispheric lateralization applied to diffusion models. Shows competitive sample quality at reduced parameter count, suggesting structured asymmetry as an inductive bias for generative modeling.

  6. 2024 · latency · preprint

    Static Key Attention in Vision

    A more efficient attention variant for vision transformers that pre-computes a static key projection, reducing per-token compute while maintaining downstream task performance.

full list on Google Scholar
academic service
  • Reviewer · NeurIPS 2024–2026
  • Reviewer · ICLR 2024–2025
  • Reviewer · ICML 2024–2025
  • TA · DSCI 552 (USC)