I am a research scientist/director in the Data Analytics and Intelligence Lab (DAIL) at Alibaba Group. Prior to joining Alibaba, I was a researcher in the DMX group at Microsoft Research. I completed my Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign under the supervision of Prof. Jiawei Han, my M.Phil. at The Chinese University of Hong Kong, advised by Jeffrey Xu Yu, and my B.S. at Renmin University of China, advised by Shan Wang and Qing Zhu.
We are hiring research scientists, engineers, and research interns for our lab! Please drop me a line if you are interested.
Research and Projects
My research focuses on data privacy (including definitions, algorithms, and systems), privacy-preserving data management/analytics/learning (e.g., federated learning), and making systems intelligent and efficient with machine learning and optimization techniques.
More recently, I have enjoyed developing algorithms and building systems in the following areas and projects:
DPaaS (Data Privacy as a Service): We are developing a series of privacy-preserving data collection, analysis, and learning techniques, for example, multi-dimensional/multi-source data sharing and OLAP under local differential privacy and secure multi-party computation (MPC), and federated learning with formal privacy guarantees for vertical collaborative learning and device-server collaborative learning. We are also building a system in which these techniques can be easily deployed and extended for different scenarios, with high usability for developers and end users and high flexibility for different data pipelines.
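To make "data collection under local differential privacy" concrete, here is a minimal sketch of the classic k-ary randomized response mechanism: each user perturbs their own value before sending it, and the collector debiases the noisy counts. This is a textbook mechanism chosen for illustration, not the lab's multi-dimensional or multi-source techniques; the function names, the three-value domain, and epsilon = 2.0 are hypothetical.

```python
import math
import random
from collections import Counter


def perturb(value, domain, epsilon, rng):
    """k-ary randomized response: keep the true value with probability
    p = e^eps / (e^eps + k - 1); otherwise report one of the other k - 1
    values uniformly at random. Each report satisfies epsilon-LDP."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def estimate_counts(reports, domain, epsilon):
    """Unbiased count estimates, using E[observed_v] = c_v * (p - q) + n * q."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)  # prob. of reporting one specific wrong value
    observed = Counter(reports)
    return {v: (observed[v] - n * q) / (p - q) for v in domain}


# Hypothetical example: 10,000 users, each holding one of three categories.
rng = random.Random(0)
domain = ["A", "B", "C"]
true_values = ["A"] * 6000 + ["B"] * 3000 + ["C"] * 1000
reports = [perturb(v, domain, 2.0, rng) for v in true_values]
estimates = estimate_counts(reports, domain, 2.0)
print({v: round(c) for v, c in estimates.items()})
```

The collector never sees a raw value, only the perturbed reports; because p / q = e^epsilon, any single report reveals little about its sender. Smaller epsilon gives stronger privacy but noisier estimates.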
[News] We have open-sourced an easy-to-use federated learning package, FederatedScope, which provides comprehensive functionality, including privacy protection, personalization, and auto-tuning of federated machine learning models, as well as a programming framework with which developers can conveniently develop and deploy their own federated models in various settings (e.g., vertical, horizontal, and device-server collaborative learning).
Sys4AI: To enable developers and data scientists with limited machine learning expertise and resources to train high-quality models for their specific business needs, we are developing a series of automated machine learning (AutoML) techniques that automate hyperparameter tuning, feature selection, and network structure search for machine learning models. Some of these techniques have been deployed in Alibaba's cloud AutoML products.
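As a rough illustration of what "automating hyperparameter tuning" means, the sketch below uses random search, the simplest AutoML baseline: sample configurations from a search space, evaluate each, and keep the best. It is not the technique used in Alibaba's products. The objective `toy_validation_loss` is a hypothetical stand-in for "train a model and return its validation loss", and the search space is made up.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimize `objective` by sampling configurations uniformly from `space`.

    `space` maps each hyperparameter name to a (low, high) float range.
    Returns the best configuration found and its score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical stand-in for training a model and measuring validation loss;
# its minimum is at log10_lr = -2.0, dropout = 0.3.
def toy_validation_loss(cfg):
    return (cfg["log10_lr"] + 2.0) ** 2 + (cfg["dropout"] - 0.3) ** 2


space = {"log10_lr": (-5.0, -1.0), "dropout": (0.0, 0.5)}
best_cfg, best_score = random_search(toy_validation_loss, space, n_trials=200)
print(best_cfg, best_score)
```

Real AutoML systems typically replace uniform sampling with model-based or early-stopping strategies so that fewer expensive training runs are needed.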
AI4Sys: Is it possible for models to learn to be a statistician, a database administrator, an index, a query processor, or an optimizer? We are developing a series of "learning-to-be" techniques for different database components, as well as a system framework to deploy these learned components in real databases.
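One way to picture a model "learning to be an index" is the toy sketch below: a least-squares line predicts a key's position in a sorted array, and the largest prediction error seen during fitting bounds a small binary search. This illustrates the general learned-index idea only; it is not the lab's technique, and the class `LinearLearnedIndex` is hypothetical. It assumes a non-empty list of numeric keys.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line maps key -> approximate position in a
    sorted array; the maximum training error bounds the final search window."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumes at least one numeric key
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest gap between predicted and true position over all stored keys.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the index of `key` in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


idx = LinearLearnedIndex([3 * x + 7 for x in range(1000)])
print(idx.lookup(307), idx.lookup(308))  # prints "100 None"
```

When the key distribution is close to linear, the error bound is tiny and a lookup touches only a few array slots; skewed distributions widen the window, which is why practical learned indexes use more expressive or hierarchical models.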