2. mOS Socket Abstraction

The most important abstraction in mOS is the socket interface. We reuse the mTCP socket (a socket with BSD-like semantics) for this purpose. Although functionally similar, the internal implementation of an mTCP socket differs markedly from BSD’s. For this reason, we prefix all socket function names with mtcp_ (e.g. mtcp_socket()). The mOS socket library provides the important routines that you would typically need when working with BSD sockets (e.g. mtcp_getsockopt() and mtcp_setsockopt()).

The socket descriptor space is local to each mOS thread: each registered mTCP socket is associated with a thread context. This allows parallel socket creation from multiple threads by removing lock contention on the socket descriptor space. We also relax the semantics of socket() so that it returns any available socket descriptor instead of the minimum available one. This avoids the overhead of finding the minimum available descriptor.
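To make the relaxed descriptor semantics concrete, here is a minimal sketch of a descriptor table that hands out any free descriptor in constant time. It is an illustration only, not the mOS implementation: the names sock_table, sock_alloc(), sock_free() and the capacity MAX_SOCKETS are hypothetical. The assumption is that each thread owns exactly one such table, so no lock is needed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKETS 8 /* hypothetical per-thread capacity */

/* Hypothetical per-thread descriptor table. One instance per thread,
 * so allocation never contends with other threads. */
struct sock_table {
    int free_ids[MAX_SOCKETS]; /* stack of currently unused descriptors */
    int free_count;            /* number of entries on the stack */
    int in_use[MAX_SOCKETS];   /* 1 if the descriptor is allocated */
};

void sock_table_init(struct sock_table *t)
{
    for (int i = 0; i < MAX_SOCKETS; i++) {
        t->free_ids[i] = MAX_SOCKETS - 1 - i; /* descriptor 0 ends up on top */
        t->in_use[i] = 0;
    }
    t->free_count = MAX_SOCKETS;
}

/* Pop any free descriptor in O(1). No search for the minimum is made.
 * Returns -1 when the table is exhausted. */
int sock_alloc(struct sock_table *t)
{
    if (t->free_count == 0)
        return -1;
    int id = t->free_ids[--t->free_count];
    t->in_use[id] = 1;
    return id;
}

/* Push a descriptor back in O(1). Returns -1 for an invalid or unused id. */
int sock_free(struct sock_table *t, int id)
{
    if (id < 0 || id >= MAX_SOCKETS || !t->in_use[id])
        return -1;
    t->in_use[id] = 0;
    t->free_ids[t->free_count++] = id;
    return 0;
}
```

With this scheme, a descriptor freed most recently is reused first, so after freeing 0 and then 1 the next allocation yields 1 rather than the minimum 0. A minimum-fd policy would instead need a search or an ordered structure on every allocation.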

mOS provides two types of sockets: (i) mTCP sockets (for endpoints), and (ii) monitoring sockets (for middleboxes). We briefly explain both types below.

2.1. mTCP sockets

mTCP sockets (socket type: MOS_SOCK_STREAM) in mOS provide reliable end-to-end communication between the client and the server nodes. Users can build mOS server applications by calling a sequence of BSD-like socket functions (mtcp_socket(), mtcp_bind(), mtcp_listen(), mtcp_accept()). We provide a similar set of functions for building client applications (mtcp_socket(), mtcp_connect()) as well.
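Since the mtcp_ functions are described as BSD-like, the two call sequences can be illustrated with plain POSIX sockets over the loopback interface. The sketch below uses socket(), bind(), listen(), accept() and connect() rather than their mtcp_ counterparts, so it does not show any extra arguments the mTCP versions may take (such as a thread context). The helper names start_listener(), connect_to() and loopback_roundtrip() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Server side: socket() -> bind() -> listen().
 * Returns the listening fd (or -1) and stores the kernel-chosen port. */
int start_listener(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: socket() -> connect(). Returns the connected fd or -1. */
int connect_to(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Runs both sequences in one thread: the client connects, the server
 * accepts, and msg travels client -> server. Returns bytes received or -1. */
int loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    unsigned short port = 0;
    size_t want = strlen(msg), got = 0;
    int lfd, cfd = -1, sfd = -1, ok = 0;

    if (want > outlen)
        return -1;
    lfd = start_listener(&port);
    if (lfd < 0)
        return -1;

    cfd = connect_to(port);
    if (cfd >= 0)
        sfd = accept(lfd, NULL, NULL);

    if (sfd >= 0 && write(cfd, msg, want) == (ssize_t)want) {
        while (got < want) { /* read until the whole message has arrived */
            ssize_t n = read(sfd, out + got, want - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        ok = (got == want);
    }
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return ok ? (int)got : -1;
}
```

Calling loopback_roundtrip("hello", buf, sizeof(buf)) exercises the server sequence and the client sequence in order and copies the message into buf on success.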

As with Berkeley sockets, connections built on mOS sockets have built-in flow control and congestion control.

2.2. Monitoring sockets

For monitoring, we extend our networking API to introduce a new socket type called MOS_SOCK_MONITOR_STREAM. We refer to MOS_SOCK_MONITOR_STREAM sockets simply as stream monitoring sockets. Conceptually, a stream monitoring socket abstracts a middlebox’s tap point on passing flows or packets. A monitoring socket is similar to a Berkeley socket, but it differs in its operating semantics. First, a stream monitoring socket represents a non-terminating midpoint of an ongoing TCP connection. With a stream monitoring socket, one can closely follow the TCP state changes of both client and server without explicitly terminating the TCP connection. Second, a monitoring socket can monitor fine-grained TCP-layer operations, whereas a Berkeley stream socket carries out coarse-grained, application-layer operations. For example, a monitoring socket can detect TCP-level or packet-level events such as abnormal packet retransmission, packet arrival order, abrupt connection termination, and the use of unusual TCP/IP options, while it simultaneously supports reading flow-reassembled data from the server or the client.

Using the monitoring socket and its API functions, one can write a complex monitoring middlebox in a modular manner. First, a developer creates a ‘passive’ monitoring socket (similar to a listening socket) and binds it to a traffic scope, specified in Berkeley Packet Filter (BPF) syntax. Only those flows or packets that fall within the scope are monitored.

Note that there is no notion of “accepting” a connection, since a middlebox does not engage in a connection as an explicit endpoint. Instead, one can specify when a custom operation should be triggered by registering for flow events, as described in the section mOS Event System. All one needs to provide is the event handlers that implement the custom middlebox logic, since the underlying mOS networking stack (or mOS stack) automatically detects and raises the events by managing the flow contexts. Each event handler is passed an ‘active’ monitoring socket that represents the flow that triggered the event. Through this socket, one can probe further into the flow state, or retrieve and even modify the last packet that raised the event.
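The programming model described above (a passive socket bound to a scope, handlers registered per event, and an active socket handed to each handler) can be mimicked with a small self-contained mock. This is a toy model under stated assumptions, not the mOS API: every type and function name below (passive_sock, active_sock, register_handler(), raise_event(), the event names) is hypothetical, and a single destination port stands in for a real BPF scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flow events; the real events belong to the mOS Event System. */
enum flow_event { EV_CONN_START, EV_NEW_DATA, EV_COUNT };

/* Toy packet: just enough fields for the scope check and a handler. */
struct packet {
    unsigned short dst_port;
    int payload_len;
};

/* 'Active' monitoring socket: identifies the flow that triggered an event
 * and gives access to the last packet, which the handler may also modify. */
struct active_sock {
    int flow_id;
    struct packet *last_pkt;
};

typedef void (*event_handler)(struct active_sock *as, enum flow_event ev);

/* 'Passive' monitoring socket: a traffic scope plus the registered handlers. */
struct passive_sock {
    unsigned short scope_port;        /* stand-in for a BPF expression */
    event_handler handlers[EV_COUNT]; /* NULL means "not registered" */
};

void passive_init(struct passive_sock *ps, unsigned short scope_port)
{
    ps->scope_port = scope_port;
    for (int i = 0; i < EV_COUNT; i++)
        ps->handlers[i] = NULL;
}

void register_handler(struct passive_sock *ps, enum flow_event ev,
                      event_handler h)
{
    ps->handlers[ev] = h;
}

/* Plays the role of the stack: checks the scope, then invokes the handler
 * with an active socket for the flow. Returns 1 if a handler ran, else 0. */
int raise_event(struct passive_sock *ps, int flow_id, struct packet *pkt,
                enum flow_event ev)
{
    if (pkt->dst_port != ps->scope_port)
        return 0; /* outside the monitored scope */
    if (ps->handlers[ev] == NULL)
        return 0; /* nobody registered for this event */
    struct active_sock as = { flow_id, pkt };
    ps->handlers[ev](&as, ev);
    return 1;
}

/* Example middlebox logic: count payload bytes seen on monitored flows. */
int bytes_seen = 0;

void count_bytes(struct active_sock *as, enum flow_event ev)
{
    (void)ev;
    bytes_seen += as->last_pkt->payload_len;
}
```

In this mock, a packet for port 80 with a registered EV_NEW_DATA handler reaches count_bytes(), while a packet outside the scope, or an event without a handler, is silently skipped. There is no accept step anywhere: the handler is the only place where the application touches a flow.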

A part of this text first appeared in a technical report.