About Me
COMP 600 Assignment: Self-Introduction
Hi! I’m Kaijian Wang, a Ph.D. student in Computer Science at Rice University, where I’m fortunate to be advised by Prof. Yuke Wang. My research lies at the intersection of efficient large-scale machine learning systems, distributed systems, and heterogeneous hardware. Broadly, I’m interested in understanding how modern AI workloads interact with the underlying system stack—and in designing principled, high-performance solutions that push the boundaries of what these models can do.
Before coming to Rice, I completed my B.E. in Information Security at the University of Science and Technology of China (USTC), where I also spent several years competing with the USTC-NEBULA CTF team. That background in cybersecurity shaped how I approach systems: with attention to detail, an eye for performance, and an instinct for uncovering hidden bottlenecks.
My recent work focuses on building efficient, scalable inference and training systems for large models:
At UC San Diego, I worked on a tool-augmented LLM serving system that combines speculative decoding with external tool engines. I developed the core system framework on top of vLLM, optimized memory scheduling, and improved kernel efficiency, achieving 30–40% speedups over conventional large-model-only pipelines (a minimal sketch of the speculative-decoding idea appears after this list).
I also helped design an efficient DLRM training system on CXL-based disaggregated memory, proposing access-aware embedding placement strategies that significantly reduce per-step latency for large-scale recommendation models (a toy placement policy is sketched below).
More recently at Rice, I have been working on serving Diffusion Transformers (DiT) on TPUs, focusing on parallelism strategies, workload partitioning, and hardware-aware scheduling for distributed inference (see the layout-planner sketch below).
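To make the serving-side work above more concrete, here is a minimal, self-contained sketch of the speculative-decoding loop (a greedy-verification variant). The `draft_step`/`target_step` callables and the toy token rules are hypothetical stand-ins for illustration, not the actual vLLM integration:

```python
def speculative_decode(target_step, draft_step, prompt, k=4, max_new=16):
    """Generate up to max_new tokens: draft k cheaply, verify with the target.

    target_step(seq) -> next-token id under the (expensive) target model
    draft_step(seq)  -> next-token id under the (cheap) draft model
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft k candidate tokens autoregressively with the cheap model.
        draft, ctx = [], list(seq)
        for _ in range(k):
            ctx.append(draft_step(ctx))
            draft.append(ctx[-1])

        # 2) Verify the drafts against the target model. A real system scores
        #    all k positions in one target forward pass; we emulate it here.
        accepted = 0
        for i in range(k):
            expected = target_step(seq + draft[:i])
            if draft[i] != expected:
                # 3) First mismatch: keep the accepted prefix plus the
                #    target's own token, then go back to drafting.
                seq.extend(draft[:accepted] + [expected])
                break
            accepted += 1
        else:
            seq.extend(draft)  # all k drafts accepted
    return seq[: len(prompt) + max_new]

# Toy models: the target emits x+1; the draft agrees three times out of four.
target = lambda s: (s[-1] + 1) % 100
draft = lambda s: (s[-1] + 1) % 100 if len(s) % 4 else (s[-1] + 2) % 100
print(speculative_decode(target, draft, [0], k=4, max_new=8))
```

The payoff comes from step 2: a real target model can score all k drafted positions in a single forward pass, so every accepted draft token amortizes the expensive model's cost.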
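In the same spirit, a toy version of access-aware embedding placement on tiered memory fits in a few lines. The tier latencies, capacity, and access counts below are invented for illustration and are not measurements from the project:

```python
def place_embeddings(access_counts, fast_capacity_rows):
    """Greedily pin the most frequently accessed rows in the fast tier."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    fast = set(ranked[:fast_capacity_rows])   # hot rows -> local DRAM
    slow = set(ranked[fast_capacity_rows:])   # cold rows -> CXL tier
    return fast, slow

def expected_latency_ns(access_counts, fast, dram_ns=100, cxl_ns=400):
    """Average lookup latency under the given placement."""
    total = sum(access_counts.values())
    cost = sum(n * (dram_ns if row in fast else cxl_ns)
               for row, n in access_counts.items())
    return cost / total

# Recommendation workloads are highly skewed: a few embedding rows are hot.
counts = {0: 900, 1: 500, 2: 50, 3: 30, 4: 10, 5: 10}
fast, slow = place_embeddings(counts, fast_capacity_rows=2)
print(f"DRAM rows: {sorted(fast)}, CXL rows: {sorted(slow)}")
print(f"avg lookup: {expected_latency_ns(counts, fast):.0f} ns (all-CXL: 400 ns)")
```

Because the access distribution is so skewed, pinning just the two hottest rows in DRAM already cuts the average lookup from 400 ns to 120 ns in this toy setup.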
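Finally, a toy layout planner hints at the kind of trade-off the TPU partitioning work navigates: per-device memory forces a minimum degree of model parallelism, while every extra degree adds collective-communication cost. The cost model and memory numbers here are made up for illustration:

```python
def step_time(dp, mp, batch=64, comm_per_mp=0.05):
    compute = batch / (dp * mp)     # idealized, perfectly parallel compute
    comm = comm_per_mp * (mp - 1)   # collectives grow with model parallelism
    return compute + comm

def fits(mp, model_gb=48.0, device_gb=16.0):
    return model_gb / mp <= device_gb  # sharded weights must fit per device

def best_layout(num_devices=8):
    """Enumerate (data-parallel, model-parallel) splits of the device count."""
    layouts = [(dp, num_devices // dp) for dp in range(1, num_devices + 1)
               if num_devices % dp == 0 and fits(num_devices // dp)]
    return min(layouts, key=lambda l: step_time(*l))

dp, mp = best_layout(8)
print(f"layout for 8 devices: dp={dp}, mp={mp}, "
      f"estimated step time={step_time(dp, mp):.2f}")
```

With these numbers the planner picks dp=2, mp=4: the smallest model-parallel degree that still fits the weights, with the remaining devices spent on data parallelism.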
Across these projects, a common theme runs through my research: leveraging system and hardware insights to make frontier AI workloads faster, cheaper, and more scalable. I particularly enjoy problems that require reasoning jointly about algorithmic properties, GPU/TPU memory hierarchies, distributed execution, and end-to-end performance profiling.
Outside research, I enjoy exploring competitive programming and security puzzles, learning about computer architecture, and building small tools to better understand complex systems. Recently, I’ve been especially excited about emerging model-system co-design directions in LLMs, diffusion models, and ML training architectures.