The Networked and Distributed Computing Systems Laboratory (NDSL) is working on a number of research projects that improve the design and implementation of networked computing systems.
Technological breakthroughs are becoming more important with the recent advent of low-latency services and compute-intensive applications such as AR/VR and distributed deep learning. We address the problems that arise when these applications run in data centers, cloud environments, or mobile networks, and we propose unconventional new systems that take advantage of new hardware such as GPUs and SmartNICs.
Accelerating Network Applications via SmartNICs
In the face of exploding network traffic, such as in data centers, existing network stacks are already showing their performance limitations. To address this problem, we use programmable network cards (SmartNICs), which can flexibly provide new features, and focus on offloading to them the repetitive operations in network applications, such as Layer-7 proxies, and in disk I/O.
High-performance GPU-based Systems for Accelerating AI Applications
Deep learning (DL), the key to modern artificial intelligence applications, requires costly systems that can process large amounts of computation in a short time. In our lab, we conduct research that efficiently accelerates either DL training or inference tasks by using many GPUs at the same time. For example, we are developing a GPU resource management system that efficiently schedules GPU resources in a GPU cluster where multiple training tasks run simultaneously. We are also developing technologies that accelerate communication between GPUs by allowing them to handle communication events autonomously instead of relying on the CPU.