PowerTCP: Pushing the Performance Limits of Datacenter Networks
Increasingly stringent throughput and latency requirements in datacenter networks demand fast and accurate congestion control. We observe that the reaction time and accuracy of existing datacenter congestion control schemes are inherently limited. They either rely only on explicit feedback about the network state (e.g., queue lengths in DCTCP) or only on variations of state (e.g., RTT gradient in TIMELY). To overcome these limitations, we propose a novel congestion control algorithm, PowerTCP, which achieves much more fine-grained congestion control by adapting to the bandwidth-window product (henceforth called power). PowerTCP leverages in-band network telemetry to react to changes in the network instantaneously without loss of throughput and while keeping queues short. Due to its fast reaction time, our algorithm is particularly well-suited for dynamic network environments and bursty traffic patterns. We show analytically and empirically that PowerTCP can significantly outperform the state-of-the-art in both traditional datacenter topologies and emerging reconfigurable datacenters where frequent bandwidth changes make congestion control challenging. In traditional datacenter networks, PowerTCP reduces tail flow completion times of short flows by 80% compared to DCQCN and TIMELY, and by 33% compared to HPCC even at 60% network load. In reconfigurable datacenters, PowerTCP achieves 85% circuit utilization without incurring additional latency and cuts tail latency by at least 2x compared to existing approaches.
Optimal Weighted Load Balancing in TCAMs
Traffic splitting is a required functionality in networks, for example for load balancing over multiple paths or among different servers. The capacities of the servers determine the partition by which traffic should be split. A recent approach implements traffic splitting within the ternary content addressable memory (TCAM), which is often available in switches. It is important to reduce the amount of memory allocated for this task since TCAMs are power consuming and are often also required for other tasks such as classification and routing. Previous work showed how to compute the smallest prefix-matching TCAM necessary to implement a given partition exactly. In this paper we solve the more practical case, where at most $n$ prefix-matching TCAM rules are available, restricting the ability to implement exactly the desired partition. We give simple and efficient algorithms to find $n$ rules that generate a partition closest in $L_\infty$ to the desired one. We do the same for a one-sided version of $L_\infty$ which equals to the maximum overload on a server and for a relative version of it. We use our algorithms to evaluate how the expected error changes as a function of the number of rules, the number of servers, and the width of the TCAM.