Web Client Application (epwget)

Introduction

The epwget program is a sample event-driven HTTP client that sends HTTP requests and receives web pages in the HTTP responses. epwget uses the epoll (event poll) interface to detect when an mTCP socket is ready for read or write operations.

Code Walkthrough

The following sections provide an explanation of the main components of the epwget code. All mOS library functions used in the sample code are prefixed with mtcp_ and are explained in detail in the Programmer’s Guide - mOS Programming API. Note that we omit the error handling logic from the example code snippets for brevity.

(1) The main() Function

The main() function performs the initialization and launches an execution thread for each CPU core.

The first task is to initialize mOS based on the mOS configuration file. fname holds the path to the mos.conf file, which is passed to the mtcp_init() function. The mtcp_getconf() function then retrieves the current configuration settings from the mOS core.

/* parse mos configuration file */
ret = mtcp_init(fname);

mtcp_getconf(&g_mcfg);
core_limit = g_mcfg.num_cores;

The next step is global parameter initialization using the GlbInitWget() function. We will describe the details of this function in the next section.

The last step is to create and run the per-core mTCP threads. For each CPU core, main() creates a new mTCP thread that starts in the RunMTCP() function.

for (i = 0; i < core_limit; i++)
        pthread_create(&mtcp_thread[i], NULL, RunMTCP, (void *)&cores[i]);
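
As a rough illustration of the per-core launch pattern, the sketch below uses plain pthreads with a stand-in worker in place of RunMTCP(). The names run_worker, spawn_per_core_workers, started, and MAX_CPUS are hypothetical and not part of epwget. The point is only that each thread receives a pointer to its own slot in cores[], which has to stay valid for the lifetime of the thread.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_CPUS 16     /* hypothetical upper bound for this sketch */

static int cores[MAX_CPUS];     /* per-thread argument slots */
static int started[MAX_CPUS];   /* set to 1 by each worker that ran */

/* Stand-in for RunMTCP(): records that the worker for this core ran. */
void *run_worker(void *arg)
{
        int core = *(int *)arg;

        assert(core >= 0 && core < MAX_CPUS);
        started[core] = 1;
        return NULL;
}

/* Spawn one worker per core, then wait for all of them to finish.
 * Returns the number of workers that ran, or -1 on failure. */
int spawn_per_core_workers(int core_limit)
{
        pthread_t threads[MAX_CPUS];
        int i, count = 0;

        if (core_limit < 1 || core_limit > MAX_CPUS)
                return -1;

        for (i = 0; i < core_limit; i++) {
                cores[i] = i;   /* each thread gets its own stable slot */
                started[i] = 0;
                if (pthread_create(&threads[i], NULL, run_worker, &cores[i]) != 0)
                        return -1;
        }
        for (i = 0; i < core_limit; i++) {
                /* joining also makes the worker's write to started[] visible */
                pthread_join(threads[i], NULL);
                count += started[i];
        }
        return count;
}
```

Passing &cores[i] rather than &i avoids a race in which the loop variable changes before the new thread reads it.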

(2) The Global Parameter Initialization Function

The GlbInitWget() function loads the epwget-specific configuration from the epwget.conf file. The following block shows an example configuration. The url parameter sets the URL of the file to download. dest_port specifies the port number of the web server to connect to. total_flows is the total number of flows (in other words, the total number of downloads), and total_concurrency is the number of flows allowed to run at the same time. Setting the core_limit parameter overrides the number of CPU cores to use.

url = 10.0.0.3/64K
dest_port = 80
total_flows = 100000
total_concurrency = 4000
core_limit = 8

GlbInitWget() reads the configuration file and saves the parameters in global variables. Note that the epwget implementation assumes that the maximum number of file descriptors an mTCP thread can create is three times the user-defined number of concurrent flows.
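
To make the key = value format concrete, here is a minimal parsing sketch. It is not the parser epwget uses. The struct wget_conf type and the functions parse_conf_line() and parse_conf_text() are hypothetical names introduced for illustration, and the sketch assumes one parameter per line with no comments or quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define URL_LEN 128

/* Hypothetical container for the epwget.conf parameters. */
struct wget_conf {
        char url[URL_LEN];
        int dest_port;
        int total_flows;
        int total_concurrency;
        int core_limit;
};

/* Parse one "key = value" line. Returns 1 if a known key was stored,
 * 0 otherwise (blank line, unknown key, or malformed line). */
int parse_conf_line(const char *line, struct wget_conf *conf)
{
        char key[32], value[URL_LEN];

        assert(line != NULL && conf != NULL);
        /* field widths leave room for the terminating NUL byte */
        if (sscanf(line, " %31[^= \t] = %127s", key, value) != 2)
                return 0;

        if (strcmp(key, "url") == 0)
                snprintf(conf->url, sizeof(conf->url), "%s", value);
        else if (strcmp(key, "dest_port") == 0)
                conf->dest_port = atoi(value);
        else if (strcmp(key, "total_flows") == 0)
                conf->total_flows = atoi(value);
        else if (strcmp(key, "total_concurrency") == 0)
                conf->total_concurrency = atoi(value);
        else if (strcmp(key, "core_limit") == 0)
                conf->core_limit = atoi(value);
        else
                return 0;
        return 1;
}

/* Parse a whole configuration text line by line.
 * Returns the number of parameters stored. */
int parse_conf_text(const char *text, struct wget_conf *conf)
{
        char line[256];
        int stored = 0;

        while (*text != '\0') {
                size_t n = strcspn(text, "\n");
                size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

                memcpy(line, text, copy);
                line[copy] = '\0';
                stored += parse_conf_line(line, conf);
                text += n;
                if (*text == '\n')
                        text++;
        }
        return stored;
}
```

A real parser would also want to validate numeric ranges (for example, a port between 1 and 65535), which atoi() does not do.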

epwget overrides the max_concurrency and max_num_buffers parameters of the mOS configuration using the mtcp_getconf() and mtcp_setconf() functions:

/* set the max number of fds to 3x the concurrency */
max_fds = concurrency * 3;

mtcp_getconf(&mcfg);
mcfg.max_concurrency = max_fds;
mcfg.max_num_buffers = max_fds;
mtcp_setconf(&mcfg);

(3) The RunMTCP() Function

RunMTCP() runs once in each thread. It first pins the thread to a CPU core and creates an mTCP context. It then calls RunApplication(), which uses sockets to create connections, send HTTP requests, and receive HTTP responses.

/* affinitize the mTCP thread to a core */
mtcp_core_affinitize(core);

/* mTCP initialization */
mctx = mtcp_create_context(core);

RunApplication(mctx);

RunApplication() consists of two functions, InitWget() and RunWget(). InitWget() creates a thread context that holds thread-specific metadata, including epoll-related variables and per-status flow statistics (e.g., started, pending, done, errors, and incompletes).

One important role of InitWget() is to initialize the RSS (receive-side scaling) setup, which involves deriving the source port number from the other three fields of the TCP connection's 4-tuple (source network address, destination network address, and destination port number).

mtcp_init_rss(mctx, saddr, IP_RANGE, daddr, dport);
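
With RSS, the NIC hashes the 4-tuple of each incoming packet to select a receive queue, so a client that wants the server's replies delivered to its own core has to pick its source port accordingly. The sketch below illustrates that idea with a deliberately simple stand-in hash. It is not the hash a NIC computes (typically a Toeplitz hash with a device key) and it does not reproduce the internals of mtcp_init_rss(). The names toy_rss_hash(), reply_queue(), and pick_source_port() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the NIC's RSS hash. NOT the Toeplitz hash real NICs use. */
uint32_t toy_rss_hash(uint32_t saddr, uint32_t daddr,
                      uint16_t sport, uint16_t dport)
{
        return (saddr ^ daddr) * 2654435761u + sport * 31u + dport * 17u;
}

/* Queue that would receive the server's reply. The reply carries the
 * 4-tuple with source and destination swapped relative to the request. */
int reply_queue(uint32_t saddr, uint32_t daddr,
                uint16_t sport, uint16_t dport, int num_queues)
{
        uint32_t h = toy_rss_hash(daddr, saddr, dport, sport);

        return (int)(h % (uint32_t)num_queues);
}

/* Find the first source port >= start whose reply lands on `core`.
 * Returns 0 if no port in [start, 65535] matches. */
uint16_t pick_source_port(uint32_t saddr, uint32_t daddr, uint16_t dport,
                          int core, int num_queues, uint16_t start)
{
        uint32_t port;  /* wider than 16 bits so the loop can terminate */

        assert(num_queues > 0 && core >= 0 && core < num_queues);
        for (port = start; port <= 65535; port++) {
                if (reply_queue(saddr, daddr, (uint16_t)port, dport,
                                num_queues) == core)
                        return (uint16_t)port;
        }
        return 0;
}
```

Because the other three fields are fixed per connection target, the source port is the only free variable, which is why the search runs over ports.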

Afterwards, epwget creates the epoll descriptor used to receive read and write availability events, as follows (the code is simplified for readability):

ep = mtcp_epoll_create(mctx, ctx->maxevents);

RunWget() is the core of the program. Using the epoll event API, it creates new connections and sends and receives data.

while (!done) {

        /* until it meets the maximum number of concurrent connections, */
        while (mtcp_get_connection_cnt(ctx->mctx) < concurrency) {
                /* create a new connection */
                CreateConnection(ctx);
        }

        /* wait inside the epoll_wait call until there's any event */
        nevents = mtcp_epoll_wait(mctx, ctx->ep, ctx->events, ...);

        for (i = 0; i < nevents; i++) {
                if (ctx->events[i].events & MOS_EPOLLERR) {
                        /* print an error message and close the connection */
                        ...
                } else if (ctx->events[i].events & MOS_EPOLLIN) {
                        /* read the data arrived at the socket buffer */
                        HandleReadEvent(ctx, ctx->events[i].data.sock, ...);
                } else if (ctx->events[i].events == MOS_EPOLLOUT) {
                        /* write HTTP request to the socket send buffer */
                        SendHTTPRequest(ctx, ctx->events[i].data.sock, wv);
                }
        }
        ...
}

Here are some detailed explanations for each sub-function in the code above:

  • CreateConnection() creates a new mTCP socket, sets the socket to non-blocking mode, connects to the target web server, and adds the socket to the epoll event queue.

    sockid = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
    ...
    mtcp_setsock_nonblock(mctx, sockid);
    ...
    mtcp_connect(mctx, sockid, &addr, sizeof(struct sockaddr_in));
    ...
    mtcp_epoll_ctl(mctx, ctx->ep, MOS_EPOLL_CTL_ADD, sockid, &ev);
    
  • SendHTTPRequest() builds the outgoing HTTP request header, writes it to the socket, and opens a file to store the response data.

    snprintf(request, HTTP_HEADER_LEN, "GET %s HTTP/1.0\r\n", ...);
    len = strlen(request);
    wr = mtcp_write(ctx->mctx, sockid, request, len);
    ...
    wv->fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    
  • HandleReadEvent() reads the payload from the socket and stores the data in the file.

    rd = mtcp_read(mctx, sockid, buf, BUF_SIZE);
    /* parse the http header */
    ...
    if (writable) {
        /* store the data to the file */
        write(wv->fd, pbuf + wr, rd - wr);
    }
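
The event loop and its three sub-functions follow the usual epoll pattern: register a socket, send the request when it becomes writable, then read the response when it becomes readable. As a point of comparison, here is a minimal sketch of that flow using the Linux kernel epoll API and a socketpair instead of mTCP. Everything in it (epoll_fetch(), http_body_offset(), the canned reply, the in-process fake server) is hypothetical, and it assumes a text response whose header arrives in a single read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Offset of the body (just past "\r\n\r\n"), or -1 if no full header. */
int http_body_offset(const char *buf)
{
        const char *end = strstr(buf, "\r\n\r\n");

        return end ? (int)(end - buf) + 4 : -1;
}

/* One request/response exchange driven by epoll. sv[0] is the client,
 * sv[1] a fake in-process server. Copies the response body into `body`
 * and returns its length, or -1 on error. */
int epoll_fetch(char *body, size_t body_len)
{
        static const char request[] = "GET /64K HTTP/1.0\r\n\r\n";
        static const char reply[] = "HTTP/1.0 200 OK\r\n\r\nhello";
        struct epoll_event ev, events[4];
        char buf[128];
        int sv[2], ep, i, nevents, off, result = -1, done = 0;
        ssize_t rd;

        assert(body != NULL && body_len > 0);
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                return -1;
        ep = epoll_create1(0);
        if (ep < 0) {
                close(sv[0]);
                close(sv[1]);
                return -1;
        }

        /* register the client end; a fresh socket is writable at once */
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLOUT;
        ev.data.fd = sv[0];
        if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) < 0)
                done = 1;

        while (!done) {
                nevents = epoll_wait(ep, events, 4, 1000);
                if (nevents <= 0)
                        break;  /* timeout or error */
                for (i = 0; i < nevents; i++) {
                        if (events[i].events & EPOLLERR) {
                                done = 1;
                        } else if (events[i].events & EPOLLIN) {
                                /* read the response, keep only the body */
                                rd = read(sv[0], buf, sizeof(buf) - 1);
                                if (rd > 0) {
                                        buf[rd] = '\0';
                                        off = http_body_offset(buf);
                                        if (off >= 0) {
                                                snprintf(body, body_len, "%s", buf + off);
                                                result = (int)strlen(body);
                                        }
                                }
                                done = 1;
                        } else if (events[i].events & EPOLLOUT) {
                                /* send the request; the fake server then
                                 * consumes it and writes the reply */
                                if (write(sv[0], request, sizeof(request) - 1) < 0 ||
                                    read(sv[1], buf, sizeof(buf)) < 0 ||
                                    write(sv[1], reply, sizeof(reply) - 1) < 0) {
                                        done = 1;
                                        continue;
                                }
                                /* request sent: wait for input only */
                                ev.events = EPOLLIN;
                                epoll_ctl(ep, EPOLL_CTL_MOD, sv[0], &ev);
                        }
                }
        }
        close(ep);
        close(sv[0]);
        close(sv[1]);
        return result;
}
```

A production client would keep per-socket state so that a header split across several reads, or a binary body, is handled correctly.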
    

(4) Multi-process Version (DPDK-only)

You can also run epwget in multi-process (single-threaded) mode. This mode works only with the Intel DPDK driver. The epwget-mp program is located in the same directory as epwget. Its overall design is similar to that of epwget, except that it does not use pthreads. You can run epwget-mp on a 4-core machine using the following script:

#!/bin/bash
./epwget-mp -f config/mos-master.conf -c 0 &
sleep 5
for i in {1..3}
do
        ./epwget-mp -f config/mos-slave.conf -c $i &
done

The -c switch binds the process to a specific CPU core. Under DPDK, the master process (core 0 in the example above) initializes the underlying DPDK-specific NIC resources once, and the slave processes (cores 1-3) share those resources with it. The master process takes its configuration from the mos-master.conf file, which has only one new keyword: multiprocess = 0 master; where 0 is the CPU core id. The mos-slave.conf configuration file has an additional line: multiprocess = slave; which sets the process as a DPDK secondary (slave) instance.

The wait between launching the master and the slave processes (the sleep in the script above) is mandatory. It avoids potential race conditions on the shared resources that both sides update.