sAI Zhang

Assistant Professor

Tandon School of Engineering and Courant Institute

New York University

About Me

I am an Assistant Professor of Electrical Engineering and Computer Science at New York University. Previously, I worked as a senior research scientist at Meta Reality Labs. I received my PhD in Computer Science from Harvard University, and my bachelor's and master's degrees in Electrical Engineering and Statistics from the University of Toronto.

My Chinese name is 张赛骞.

Research Overview

I am a researcher whose work lies at the intersection of deep learning, AR/VR, and hardware system design. I am passionate about designing optimized AI algorithms and efficient hardware implementations for AR/VR computing.

Hardware Architecture:

Domain-specific accelerators for compute-intensive AI applications, new computation paradigms for DNNs
Recent interest: AI accelerator for efficient AR/VR

Application & Algorithm:

Efficient DNN computing, pruning, quantization, NAS
Recent interest: Efficient LM KV caching, LM quantization, AI Privacy for AR/VR

Others:

Multi-agent reinforcement learning and its application
AI compiler

News

[6/2025] One paper got accepted at TMLR'25!

[6/2025] I presented our POLO paper at ISCA'25!

[5/2025] One paper with my collaborator got accepted at ICIP'25!

[5/2025] One paper with my collaborator and my students got accepted at ACL'25!

[4/2025] My PhD student, Tianhua Xia, has been named a 2025 DAC Young Fellow!

[4/2025] My master's student, Wenxuan Liu, has been awarded the 2025 ECE Theodor Tamir Award for the Best MS Research!

[4/2025] My master's student, Zhenyuan Dong, has been awarded the 2025 ECE Myron M. Rosenthal Award for Best MS Academic Achievement!

[4/2025] Our paper with my intern and PhD students on incremental gaze-tracked foveated rendering got accepted at ICS'25!

[3/2025] Our paper on efficient gaze-tracked foveated rendering for virtual reality got accepted at ISCA'25!

[3/2025] I delivered a seminar talk on AR/VR computing at Stevens Institute of Technology!

[3/2025] I presented our paper at IEEE VR'25!

[3/2025] Check out our latest survey on speculative decoding! This survey covers almost all of the current research literature on speculative decoding strategies, ranging from advanced algorithms and system implementations to applications in other domains.

[2/2025] Our paper on efficient segmentation in augmented reality got accepted at CVPR'25!

[2/2025] One paper with my intern student and PhD student got accepted at ASPLOS'25!

[2/2025] I gave a talk at EI 2025 on hardware and software codesign for AR/VR computing.

[1/2025] One paper with my master's student got accepted at IEEE VR'25 as a journal paper!

[12/2024] I gave a talk to Boston Fusion Corp.

[12/2024] Serving as a TPC member for ISLPED'25. Please consider submitting your work there.

[11/2024] One paper with my intern students got accepted at DATE'25!

[11/2024] I will co-organize a workshop on AR/VR Computing at ASPLOS'25!

[10/2024] One paper got accepted at WACV'25!

[10/2024] Our survey on parameter-efficient fine-tuning was accepted by Transactions on Machine Learning Research (TMLR)!

[10/2024] I am serving as a PC member for DAC'25 and ISCA'25. Please consider submitting your great work there!

[10/2024] One paper got accepted at IEDM'24!

[9/2024] One paper got accepted at EMNLP'24!

[9/2024] I received a gift fund from Meta. Thank you Meta!

[9/2024] One paper got accepted at ASPDAC'25!

[8/2024] Serving as the session chair of ISLPED'24.

[7/2024] One paper got accepted at MLCAD'24!

[7/2024] I am serving as a PC member for HPCA'25.

[6/2024] I delivered a talk on Efficient LLM and Accelerator Design to Andes Technology.

[6/2024] One paper got accepted at ICPP'24!

[5/2024] Our paper on AR/VR system simulation got accepted at ACM TODAES!

[5/2024] Two papers got accepted at ISLPED'24!

[4/2024] Check out our latest survey on Parameter-Efficient Fine-Tuning (PEFT) for Large Models. This work was done with my talented intern students. From algorithm design to hardware efficiency and system implementation, this comprehensive survey covers multiple aspects of current PEFT research.

[3/2024] One paper with my intern student, Wenshuo, got accepted at NAACL'24!

[2/2024] Two papers got accepted at ISQED'24.

[11/2023] Serving as a TPC member for DAC'24 and ISQED'24.

[10/2023] Our paper "Co-Designing AI Models and DRAMs for On-Device Training" is accepted by HPCA 2024!

[9/2023] Our paper on efficient reinforcement learning, which I co-authored with my high school mentee, Gavin An, has been accepted for publication in JEI.

[6/2023] I gave two talks on DNN hardware and algorithm codesign at Tsinghua University and Peking University.

[5/2023] Our paper "Co-Designing AI Models and DRAMs for On-Device Training" has been submitted to arXiv. This paper proposes an algorithm/hardware codesign solution for efficient on-chip transfer learning that completely eliminates off-chip DRAM traffic during the training process.

[3/2023] My high school mentee, Gavin An, has successfully finished his AI project on efficient reinforcement learning. The resulting paper got accepted in JEI!

[9/2022] Started working at Meta!

[7/2022] Our paper "Hyperspherical Federated Learning" is accepted by ECCV 2022!

[7/2022] I gave an invited talk at AI times on multi-agent reinforcement learning and its applications.

[6/2022] I gave an invited talk at the IEEE Dallas Circuits and Systems Conference (DCAS), 2022.

[4/2022] I started my AI mentorship at Veritas AI!

[2/2022] I started my postdoc study at Harvard!

[12/2021] I successfully defended my PhD!

[11/2021] Our paper "Learning Advanced Client Selection Strategy for Federated Learning" is accepted by AAAI 2022!

[10/2021] Our paper "FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding" is accepted by IEEE HPCA 2022!

[06/2021] Finished my internship at Microsoft. Such a great place for research!

[03/2021] I started my virtual internship at Microsoft Research, Redmond.

[03/2021] I gave a guest lecture on DNN accelerator design in Harvard course ES201, hosted by Prof. Demba Ba.

[01/2021] One paper got accepted by the IEEE International Symposium on Circuits & Systems (ISCAS), 2021.

[12/2020] I presented (virtually) our work "Succinct and Robust Multi-Agent Communication With Temporal Message Control" at NeurIPS 2020.

[11/2020] I presented (virtually) our work "Term Revealing: Furthering Quantization at Run Time on Quantized DNNs" at SC 2020.

[11/2020] Our paper "Training for Multi-resolution Inference Using Reusable Quantization Terms" is accepted by ACM ASPLOS 2021!

[09/2020] Our paper "Succinct and Robust Multi-Agent Communication With Temporal Message Control" is accepted by NeurIPS 2020!

[08/2020] I presented our work "Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN" at ICPP 2020.

[06/2020] Our paper "Term Revealing: Furthering Quantization at Run Time on Quantized DNNs" is accepted by ACM/IEEE SC 2020!

[05/2020] Our paper "Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN" is accepted by ACM ICPP 2020!

[02/2020] One paper is accepted by the IEEE Symposium on Security and Privacy (S&P) Deep Learning and Security Workshop, 2020.