The Networked and Distributed Computing Systems Laboratory (NDSL) works on a number of research projects that improve the design and implementation of networked computing systems.
The need for technological breakthroughs in such systems is growing with the recent advent of low-latency services (e.g., AR/VR) and compute-intensive applications (e.g., distributed deep learning). We tackle the problems that arise when these applications run in data centers, cloud environments, or mobile networks, and aim to propose novel, unconventional systems that take advantage of new hardware devices.
AccelNIC: Accelerating Network Applications via SmartNIC
In the face of exploding network traffic, especially in data centers, existing network stacks are reaching their performance limits. To address this problem, this research project offloads repetitive operations in the transport layer (TCP/IP) and in data encryption (SSL/TLS) to a programmable network card (SmartNIC) that can flexibly provide new features.
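The offloading idea can be sketched as partitioning a packet-processing pipeline between the host CPU and the SmartNIC. The sketch below is purely illustrative and assumes hypothetical operation names (`OFFLOADABLE`, `split_pipeline`); it is not AccelNIC's actual interface.

```python
# Hypothetical: repetitive, stateless-per-packet operations that a
# programmable NIC could handle (illustrative names, not AccelNIC's API).
OFFLOADABLE = {"tcp_checksum", "segmentation", "ssl_encrypt"}

def split_pipeline(operations):
    """Partition a packet-processing pipeline into NIC-offloaded and
    host-executed stages, preserving the order within each partition."""
    nic, host = [], []
    for op in operations:
        (nic if op in OFFLOADABLE else host).append(op)
    return nic, host

nic_ops, host_ops = split_pipeline(
    ["parse_headers", "tcp_checksum", "congestion_control", "ssl_encrypt"])
```

In practice the split is far more subtle, since stateful logic such as congestion control must stay coherent across the host/NIC boundary; the sketch only captures the basic partitioning decision.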
CoDDL: GPU Cluster Management System for Distributed DL
Distributed deep learning (DL) across multiple GPUs is widely used to reduce training time, which can take several months on a single GPU. Efficient GPU resource allocation and management across multiple tenants is therefore a critical issue in deep learning clusters with many GPUs. We are working on an efficient resource allocator and scheduler that take the characteristics of deep learning models into account.
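One way model characteristics can inform allocation is through each job's throughput scaling curve: an extra GPU helps some models far more than others. The following is a minimal greedy sketch under that assumption (illustrative only, not CoDDL's actual algorithm), assigning each GPU to the job with the largest marginal throughput gain.

```python
import heapq

def allocate(jobs, total_gpus):
    """Greedy GPU allocation by marginal throughput gain.

    jobs: {name: scaling curve}, where curve[k] is the measured
    throughput of the job when running on k+1 GPUs.
    """
    alloc = {name: 0 for name in jobs}
    # Max-heap (via negated keys) ordered by the gain of the next GPU.
    heap = [(-curve[0], name) for name, curve in jobs.items()]
    heapq.heapify(heap)
    for _ in range(total_gpus):
        if not heap:
            break  # no job benefits from more GPUs
        _, name = heapq.heappop(heap)
        alloc[name] += 1
        curve, n = jobs[name], alloc[name]
        if n < len(curve):
            marginal = curve[n] - curve[n - 1]
            if marginal > 0:
                heapq.heappush(heap, (-marginal, name))
    return alloc

plan = allocate({"job_a": [100, 180, 220], "job_b": [90, 120, 130]}, 4)
```

Here `job_a` scales well and receives three of the four GPUs, while `job_b`'s flat curve earns it only one; a real scheduler would additionally handle elasticity, placement, and fairness.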
Knowledge Caching: Low Latency Deep Learning Services with Deep Model Cache
Deep learning applications now offer services closely tied to our daily lives, such as smart speakers. These applications are typically deployed on cloud servers, but this approach has potential problems such as limited server computing capacity and user privacy. We propose and implement a system based on a new approach, called a deep model cache, which serves user requests that often involve repetitive or sensitive personal information in the local environment.
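The core behavior can be illustrated with a small local cache that answers repeated requests without contacting the cloud. This is a minimal sketch under assumed semantics (LRU eviction, cloud fallback on miss), not the paper's implementation; `ModelCache` and `cloud_infer` are hypothetical names.

```python
from collections import OrderedDict

class ModelCache:
    """Local cache for inference results: hits stay on-device,
    misses fall back to a remote (cloud) model."""

    def __init__(self, capacity, cloud_infer):
        self.capacity = capacity
        self.cloud_infer = cloud_infer   # fallback, e.g. a remote model call
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def query(self, request):
        if request in self.cache:
            self.cache.move_to_end(request)  # mark as most recently used
            self.hits += 1
            return self.cache[request]
        self.misses += 1
        result = self.cloud_infer(request)   # sensitive data leaves only on a miss
        self.cache[request] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return result

cache = ModelCache(capacity=2, cloud_infer=lambda req: f"answer:{req}")
```

In this sketch a repeated request like `cache.query("weather")` is served locally the second time, so frequently repeated or privacy-sensitive queries never re-enter the cloud while cached.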