Research Direction

Architecture

Building the architectural foundations for scalable, memory-efficient AI systems. My work focuses on transformer memory mechanisms for long-range reasoning, efficient architectures that maximize capability per FLOP, multimodal designs for unified perception-language-action, and scalable architectures that grow from research to production.

Key Research Topics

Transformer Memory Mechanisms

How transformers store, retrieve, and reason over information. Research on KV-cache architectures, recurrent memory layers, state-space models (Mamba/S4), memory-augmented attention, and hybrid designs that give transformers explicit long-term memory without quadratic cost.
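The KV-cache mentioned above is the simplest of these memory mechanisms: during autoregressive decoding, each step appends one key/value pair so past tokens are never re-projected. A minimal sketch in plain Python (toy single-head attention; the class and function names are mine, not from any paper):

```python
import math

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in K]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Weighted sum of cached value vectors
    return [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]

class KVCache:
    """Append-only cache: each decode step adds one key/value pair, so a
    step costs O(t) attention but zero recomputation of past projections."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q, k, v):
        self.K.append(k)
        self.V.append(v)
        return attend(q, self.K, self.V)
```

The cache's linear growth with sequence length is exactly the cost that recurrent memory layers and state-space models aim to replace with a fixed-size state.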

Efficient Architecture

Reducing compute and memory costs without sacrificing capability. Static key attention, sparse attention patterns, linear attention variants, weight sharing, knowledge distillation, and quantization-aware architecture design for deployment on constrained hardware.
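Linear attention variants achieve their savings by exploiting associativity: with a positive feature map φ, the running sums S = Σ φ(k)vᵀ and z = Σ φ(k) make each causal step O(d²) instead of O(t·d). A toy sketch under those assumptions (plain Python, ReLU feature map chosen for illustration):

```python
def linear_attention(Q, K, V, phi=lambda x: [max(xi, 0.0) for xi in x]):
    """Causal linear attention: fixed-size accumulators replace the
    growing key/value cache, so cost per step is independent of t."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # feature-dim x value-dim accumulator
    z = [0.0] * d                       # normalizer accumulator
    outs = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d)) + 1e-9
        outs.append([sum(fq[i] * S[i][j] for i in range(d)) / denom
                     for j in range(dv)])
    return outs
```

The trade-off is that the softmax's sharp, data-dependent weighting is replaced by a kernel approximation, which is why hybrid designs often keep a few full-attention layers.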

Multimodal Architecture

Unified backbones that natively process vision, language, audio, and action in a single model. Research on early vs. late fusion strategies, cross-modal attention, modality-specific tokenization, and architectures that scale gracefully across input types.
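In an early-fusion design, modality-specific tokenizers project each input into a shared token space, a learned modality embedding tags the origin, and the concatenated sequence feeds one shared backbone. A minimal sketch of that data path (illustrative names and shapes, not drawn from any specific model):

```python
def fuse_early(text_tokens, image_patches, modality_emb):
    """Early fusion: tag each projected token with its modality embedding,
    then concatenate everything into a single sequence for one backbone.
    Assumes tokens and patches are already projected to a shared width."""
    seq = []
    for t in text_tokens:
        seq.append([a + b for a, b in zip(t, modality_emb["text"])])
    for p in image_patches:
        seq.append([a + b for a, b in zip(p, modality_emb["image"])])
    return seq
```

Late fusion, by contrast, would run separate encoders and merge only pooled features, trading cross-modal interaction depth for per-modality specialization.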

Scalable Architecture

Designs that scale from small research models to production systems. Mixture of Experts (MoE) for conditional computation, expert routing and load balancing, pipeline and tensor parallelism-friendly architectures, and brain-inspired lateralization for asymmetric processing.
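Conditional computation in an MoE layer comes from its router: a gating layer scores every expert, only the top-k run, and their outputs are combined with renormalized gate weights. A toy sketch of that routing step (plain Python; load-balancing losses and capacity limits are omitted):

```python
import math

def moe_forward(x, gate_w, experts, k=2):
    """Top-k MoE routing: score experts with a linear gate, run only the
    top-k, and mix their outputs with softmax-renormalized gate weights."""
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_w]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # stabilize the softmax over selected experts
    ws = [math.exp(logits[i] - m) for i in top]
    z = sum(ws)
    out = [0.0] * len(x)
    for w, i in zip(ws, top):
        y = experts[i](x)  # only selected experts execute
        out = [o + (w / z) * yi for o, yi in zip(out, y)]
    return out, top
```

Because only k of the experts execute per token, parameter count scales with the number of experts while per-token FLOPs stay roughly constant; the load-balancing auxiliaries omitted here exist to keep the router from collapsing onto a few experts.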

Related Publications

Featured

Static Key Attention in Vision

Zizhao Hu et al.

2024 · Preprint

Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion

Zizhao Hu et al.

2024 · Preprint