There are many different projects for controlling different aspects of the network in a datacenter. Many of them seem to be oriented toward the hypervisor side of things: how to get containers to talk to each other across a network. These projects seem to assume that a 'core network' already exists, which implies that the network is partitioned into management by (at least) two groups of people.
What sort of solutions exist for network management end to end -- i.e., being able to provision container networking, distributed networking within the hypervisor, routing/switching in the core, and connectivity to the edge/exterior world? This would also include the ability to distribute security: rather than relying on central firewalls, security is provided at each entry point (each container/guest interface), along with some distributed service function chaining to provide additional layers of security on particular flows.
To understand the challenges more intimately, I have been experimenting with these concepts. For instance, openvswitch is a popular platform for switching in hypervisor environments. Under the hood, it uses OpenFlow tables to control packet flows, and it offers a rich set of matches and actions.
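To make that concrete, here is a minimal sketch of a single match/action rule, expressed with the Ryu library discussed below (OpenFlow rules are easiest to show in code). The datapath handle, port numbers, and addresses are all invented for illustration.

```python
# Minimal sketch of an OpenFlow 1.3 match/action rule via Ryu.
# Assumes `dp` is an already-connected datapath (switch) handle;
# the ports and address below are made up for the example.
def install_forwarding_rule(dp):
    ofproto = dp.ofproto
    parser = dp.ofproto_parser

    # Match: IPv4 traffic arriving on port 1, destined for 10.0.0.2
    match = parser.OFPMatch(in_port=1, eth_type=0x0800,
                            ipv4_dst='10.0.0.2')

    # Action: forward matching packets out port 2
    actions = [parser.OFPActionOutput(2)]
    inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                         actions)]

    # Push the rule into the switch's flow table
    dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                  match=match, instructions=inst))
```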
The same development group has recently released OVN (Open Virtual Network), which provides a distributed abstraction for routing/switching and container provisioning. It still has a ways to go before routing becomes a solved problem from a resiliency standpoint. To me, OVN only solves part of the problem of networking throughout a datacenter infrastructure -- mostly the hypervisor side of things.
At another level of abstraction, Ryu appears to be a popular controller for OpenFlow-style networking, and it has an active user base. Not only does it speak OpenFlow, it has been extended to handle openvswitch extensions, and it even 'knows' how to integrate with openvswitch's OVSDB. Some users have integrated it with NetworkX to handle path calculations through a network.
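The kind of path calculation people pair NetworkX with Ryu for looks roughly like this; the topology and link weights here are invented for illustration.

```python
import networkx as nx

# Hypothetical topology: three switches and two hosts, with link
# costs a controller might learn from LLDP or a topology store.
G = nx.Graph()
G.add_edge('h1', 's1')
G.add_edge('s1', 's2', weight=1)
G.add_edge('s1', 's3', weight=1)
G.add_edge('s2', 's3', weight=1)
G.add_edge('s3', 'h2')

# Dijkstra over the link weights; each hop along the result would be
# translated into an OpenFlow rule on the corresponding switch.
path = nx.shortest_path(G, 'h1', 'h2', weight='weight')
print(path)  # ['h1', 's1', 's3', 'h2']
```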
I was initially tempted to run a single instance of Ryu to control various proof-of-concept routing/switching/hypervisor instances. The single instance of Ryu would use some graph theory to figure out which network element required which subset of functionality. That doesn't necessarily scale well, though, and a lone controller is a single point of failure.
Taking a page from the OVN playbook, where OVN has a central controller with agents on the hypervisors, something similar could be done with Ryu. Rather than a central Ryu controller, there would be a Ryu controller on each network element. Their effective southbound interface would be a distributed key/value store, such as the one provided by Consul. Each Ryu instance would calculate its own participation in the network graph represented in the key/value store and adjust its local OpenFlow rules to suit. This provides resilient, distributed control, with a distributed northbound interface for intent-based network provisioning.
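A minimal sketch of that per-element loop follows, using the python-consul client (one plausible choice). The key layout ('net/topology/links/') and the JSON link schema are assumptions I made up for the example; Consul's blocking queries do the change notification.

```python
import json
import consul          # python-consul client
import networkx as nx

c = consul.Consul()    # talks to the local Consul agent
MY_NODE = 'leaf-sw-01' # this element's name in the shared graph

index = None
while True:
    # Blocking query: returns when the topology subtree changes.
    index, entries = c.kv.get('net/topology/links/',
                              index=index, recurse=True)

    # Rebuild the network graph from the key/value store.
    g = nx.Graph()
    for entry in entries or []:
        link = json.loads(entry['Value'])  # assumed {'a', 'b', 'cost'}
        g.add_edge(link['a'], link['b'], weight=link.get('cost', 1))

    # Only this element's participation matters locally: its edges
    # would be handed to the co-resident Ryu app as OpenFlow rules.
    if MY_NODE in g:
        for neighbor in g.neighbors(MY_NODE):
            pass  # e.g. install/adjust flows for (MY_NODE, neighbor)
```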
As part of its feature set, Consul can also perform health monitoring and some statistics gathering. These statistics and this state management could be fed into the local path calculations to perform load balancing and multi-path planning. Theoretically speaking.
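For instance, each element could register itself with a TTL health check, and the path calculation could prune elements that are not passing. A sketch, again with python-consul and made-up service names:

```python
import consul

c = consul.Consul()

# Each element advertises itself with a TTL check; the agent marks it
# critical if heartbeats stop (a periodic c.agent.check.ttl_pass()
# call, not shown, keeps it alive).
c.agent.service.register(
    name='fabric-element',
    service_id='leaf-sw-01',
    check=consul.Check.ttl('10s'),
)

# Before running shortest-path, keep only elements whose checks pass.
_, healthy = c.health.service('fabric-element', passing=True)
alive = {entry['Service']['ID'] for entry in healthy}
# e.g. g.remove_nodes_from(n for n in list(g) if n not in alive)
```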
I am wondering if this concept can handle distributed MLAG/LACP and VRRP-style resiliency functionality, with the distributed key/value store serving as the signalling mechanism. Would this work?
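For the VRRP-style case, at least, Consul's session/lock primitive looks like a plausible fit: whichever element holds the key owns the virtual gateway IP and installs the corresponding OpenFlow rules. A sketch, with an assumed key path (the MLAG/LACP case would need more thought, since it involves per-link state, not just mastership):

```python
import consul

c = consul.Consul()
MY_NODE = 'leaf-sw-01'
VIP_KEY = 'net/vrrp/10.0.0.1/master'

# The session expires (releasing the lock) if this node stops renewing
# it via c.session.renew(); that expiry is the failover signal.
session = c.session.create(ttl=15, lock_delay=1)

if c.kv.put(VIP_KEY, MY_NODE, acquire=session):
    pass  # master: answer ARP for the VIP, install gateway flows
else:
    pass  # backup: watch VIP_KEY and try to acquire when it releases
```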
Who is working on similar functionality?