KAIST EE Computing Lunch

Computing Lunch Schedule (2019 Spring)

Meeting time: 12:00-12:50 pm

Subscription to computing-lunch mailing list


04/26 (Fri)

AccelTCP: Accelerating Network Applications with Stateful TCP Offloading

YoungGyoun Moon

N1-112

Abstract

The performance of modern key-value servers or layer-7 load balancers often depends heavily on the efficiency of the underlying TCP stack. Despite numerous optimizations such as kernel-bypassing and zero-copying, performance improvement with a TCP stack is fundamentally limited by the protocol conformance overhead for compatible TCP operations. Unfortunately, the protocol conformance overhead amounts to as much as 60% of the entire CPU cycles for short-lived connections, or degrades the performance of L7 proxying by 3.2x to 6.3x. This work presents AccelTCP, a hardware-assisted TCP stack architecture that harnesses programmable network interface cards (NICs) as a TCP protocol accelerator. AccelTCP can offload complex TCP operations such as connection setup and teardown completely to the NIC, which simplifies the host stack operations and frees a significant amount of CPU cycles for application processing. In addition, it supports running connection splicing on the NIC so that the NIC relays all packets of the spliced connections with zero DMA overhead. Our evaluation shows that AccelTCP enables short-lived connections to perform comparably to persistent connections. It also improves the performance of Redis, a popular in-memory key-value store, and HAProxy, a widely used layer-7 load balancer, by 2.3x and 11.9x, respectively.

05/03 (Fri)

Learning to Schedule Communication in Multi-agent Reinforcement Learning

Daewoo Kim

N1-112

Abstract

Many real-world reinforcement learning tasks require multiple agents to make sequential decisions while interacting with one another, and well-coordinated actions among the agents are crucial to achieving the target goal. One way to accelerate the coordination effect is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this research, we study a practical scenario in which (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents can use the medium simultaneously, as in state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode the messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcast their (encoded) messages, by learning the importance of each agent's partially observed information.

05/24 (Fri)

Data Synthesis for Privacy and Augmentation with Generative Adversarial Networks

Noseong Park

N1-112

Abstract

Data synthesis can be used for various purposes. In this talk, I will present data synthesis techniques, in particular generative adversarial networks (GANs), and their applications. First, I will talk about generative adversarial networks with multiple generators for synthesizing multi-modal data, which was published at IJCAI 2018. Second, data synthesis can be used to relieve privacy concerns when releasing and sharing data. I will talk about table synthesis, published at VLDB 2018. Third, in cybersecurity, a lack of training samples is quite common. I will introduce a text generative model for oversampling phishing URL samples, published at IEEE BigData 2018. Fourth, no one has discussed merging database theory and machine learning theory for tabular data augmentation. I will present my recent work on synthesizing fake tables while considering functional dependencies. This work was recently accepted to IJCAI 2019.

05/31 (Fri)

LpGL: Low-power Graphics Library for Mobile AR Headsets

Jaewon Choi

N1-112

Abstract

We present LpGL, an OpenGL API-compatible Low-power Graphics Library for energy-efficient AR headset applications. We first characterize the power consumption patterns of a state-of-the-art AR headset, Magic Leap One, and empirically show that its internal GPU is the most impactful and controllable energy consumer. Based on these preliminary studies, we design LpGL so that it uses the device's gaze/head orientation information and geometry data to infer user perception information, intercepts application-level graphics API calls, and employs frame rate control, mesh simplification, and culling techniques to enhance the energy efficiency of AR headsets without detriment to user experience. Results from a comprehensive set of controlled in-lab experiments and an IRB-approved user study with 25 participants show that LpGL reduces total energy usage by up to ~22% while adding only 46 µs of latency per object, with close to no loss in subjective user experience.

09/27 (Fri)

Neural Network Acceleration on Mobile Devices

Dongyoung Kim

N1-113

Abstract

Deep neural networks (DNNs) are ubiquitous in various applications such as computer vision, natural language processing, and speech recognition. In particular, DNNs show reliable results in real-time mobile applications. However, it is challenging to implement neural networks that meet real-time constraints on mobile devices with restricted hardware resources. To tackle this problem, software/hardware co-optimization for neural networks has been widely studied recently. This seminar introduces research on accelerating neural networks on mobile devices. In the first half, we will introduce the Neural Processing Unit (NPU), which accelerates neural networks by exploiting sparsity and/or reduced precision. Then, in the second half, we will introduce on-device machine learning technologies such as accelerating neural networks on mobile CPUs and designing optimized neural networks for mobile devices.

Material: pdf

Bio

Dongyoung Kim received his Ph.D. degree from the Department of Computer Science and Engineering at Seoul National University in 2019. He has conducted research on developing neural network optimization algorithms and implementing them by designing a novel hardware architecture, the so-called Neural Processing Unit (NPU). He is currently a researcher at Hyperconnect, working on software/hardware co-optimization for neural networks.

10/04 (Fri)

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Youngeun Kwon

N1-113

Abstract

Recent studies from several hyperscalers pinpoint embedding layers as the most memory-intensive deep learning (DL) algorithm being deployed in today's datacenters. This paper addresses the memory capacity and bandwidth challenges of embedding layers and the associated tensor operations. We present our vertically integrated hardware/software co-design, which includes a custom DIMM module enhanced with near-memory processing cores tailored for DL tensor operations. These custom DIMMs are populated inside a GPU-centric system interconnect as a remote memory pool, allowing GPUs to utilize them for scalable memory bandwidth and capacity expansion. A prototype implementation of our proposal on real DL systems shows an average 6.2-17.6x performance improvement on state-of-the-art DNN-based recommender systems.

Material: pdf

Bio

Youngeun Kwon is pursuing a Ph.D. degree in electrical engineering at the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea. He is a member of the VIA research group at KAIST, where the principal investigator is Minsoo Rhu.

10/11 (Fri)

Fire in Your Hands: Understanding Thermal Behavior of Smartphones

Sooyoung Park

N1-113

Abstract

Overheating smartphones could hamper user experiences. While there have been numerous reports on smartphone overheating, a systematic measurement and user experience study on the thermal aspect of smartphones is missing. Using thermal imaging cameras, we measure and analyze the temperatures of various smartphones running diverse application workloads such as voice calling, video recording, video chatting, and 3D online gaming. Our experiments show that running popular applications such as video chat could raise the smartphone's surface temperature to over 50 °C in only 10 minutes, which could easily cause thermal pain to users. Recent ubiquitous scenarios such as augmented reality and mobile deep learning also have considerable thermal issues. We then perform a user study to examine when users perceive heat discomfort from their smartphones and how they react to overheating. Most of our user study participants reported considerable thermal discomfort while playing a mobile game, and said that overheating disrupted interaction flows. With this in mind, we devise a smartphone surface temperature prediction model that uses only system statistics and internal sensor values. Our evaluation showed high prediction accuracy, with root-mean-square errors of less than 2 °C. We discuss several insights from our findings and recommendations for user experience, OS design, and developer support for better user-thermal interactions.

Material: pdf

Bio

Sooyoung Park is a Ph.D. student in the Department of Computer Science at KAIST. His research interests include novel applications using mobile wireless sensor networks and optimization through experimental analysis of mobile systems. He is currently working on developing novel mobile applications that combine audio signal processing with optimized machine learning algorithms.

10/25 (Fri)

Toward Scaling Hardware Security Module for Emerging Cloud Services

Juhyeng Han

N1-113

Abstract

The hardware security module (HSM) has been used as a root of trust for various key management services. At the same time, rapid innovation in emerging industries, such as container-based microservices, accelerates demands for scaling security services. However, current on-premises HSMs struggle to meet such demands due to their restricted scalability and high deployment cost. This paper presents ScaleTrust, a framework for scaling security services by utilizing HSMs with an SGX-based key management service (KMS) in a collaborative, yet secure manner. Based on a hierarchical model, we design a cryptographic workload distribution between HSMs and KMS enclaves to achieve both the elasticity of cloud software and the hardware-based security of HSM appliances. We demonstrate practical implications of ScaleTrust using two case studies that require secure cryptographic operations with low latency and high scalability.

Material: pdf

Bio

Juhyeng Han is a Ph.D. candidate in the School of Electrical Engineering at KAIST. He received his B.S. in the School of Computing from KAIST in 2016, and M.S. in the School of Electrical Engineering from KAIST in 2018. His research interests are network systems and network security.

11/08 (Fri)

MetaSense: Few-Shot Adaptation to Untrained Conditions in Deep Mobile Sensing

Taesik Gong

N1-113

Abstract

Recent improvements in deep learning and hardware support offer a new breakthrough in mobile sensing; we could enjoy context-aware services and mobile healthcare on a mobile device powered by artificial intelligence. However, most related studies perform well only with a certain level of similarity between the training and target data distributions, while in practice, a specific user's behaviors and device make sensor inputs different. Consequently, the performance of such applications might suffer in diverse user and device conditions, as training deep models for such countless conditions is infeasible. To mitigate the issue, we propose MetaSense, an adaptive deep mobile sensing system that uses only a few (e.g., one or two) data instances from the target user. MetaSense employs meta learning, which learns how to adapt to the target user's condition by rehearsing multiple similar tasks generated from our unique task generation strategies in offline training. The trained model can rapidly adapt to the target user's condition when only a few data instances are available. Our evaluation with real-world traces of motion and audio sensors shows that MetaSense not only outperforms state-of-the-art transfer learning by 18% and meta-learning-based approaches by 15% in terms of accuracy, but also requires significantly less adaptation time for the target user.

Bio

Taesik Gong is a Ph.D. candidate in NMSL, School of Computing, KAIST. His research interests are generally in mobile sensing powered by machine learning. He is currently working on the adaptation of deep models to personal conditions in mobile sensing.

Material: pdf

12/13 (Fri)

Economical Analysis of Blockchain Technologies

Yujin Kwon

N1-113

Abstract

Recently, many researchers and developers have been interested in blockchain technologies, and quite a few blockchain applications, such as the cryptocurrencies Bitcoin and Ethereum, have been launched. While there exist various studies on the analysis of blockchain technologies, several open problems remain, which must be resolved to determine whether blockchain technologies can replace current technologies in the future. In this talk, we focus on economic aspects of blockchain. First, we analyze, from an economic perspective, the security of a PoW-based cryptocurrency and the relationship among multiple PoW-based cryptocurrencies. Currently, most blockchain systems adopt a PoW mechanism where each node earns a reward proportional to its computing power. 1) We propose a new attack that overcomes shortcomings of the existing attacks against PoW systems, and find that the attack would make the blockchain system severely centralized. 2) Moreover, we extend the analysis to a situation in which multiple coins compete. We show that rational nodes, which pursue a higher profit by dynamically switching from one coin to another, can lead to the downfall of minor coins. 3) Next, we also try to answer a generic open question that can determine the future of blockchain and its applications. An important question is whether a high level of decentralization is reachable in a blockchain system. In fact, while blockchain systems are designed to be "good" decentralized systems, we often observe poor decentralization in deployed blockchain systems (e.g., Bitcoin and Ethereum). This brings up the problem of how we can achieve good decentralization. We prove that this is an impossible goal to achieve. (These works were published at ACM CCS 2017, IEEE S&P 2019, and ACM AFT 2019, respectively.) 4) Lastly, we introduce another open question, concerning stablecoins, that we are currently working on. A stablecoin has a mechanism that keeps its coin price stable, unlike traditional cryptocurrencies. Even though many developers have launched various stablecoins, no global stable currency exists yet. In addition, we don't know yet which type of stablecoin can become global. In this talk, I'll introduce this briefly.

Bio

Yujin is currently a Ph.D. student in the Department of Electrical Engineering at KAIST. She received her B.S. in Electrical Engineering from KAIST in February 2016. She is interested in blockchain systems, game theory, and economics. She also visited UC Berkeley as a summer intern in 2018 (advised by Dawn Song).