From the cisco-nsp mailing list.
We are troubleshooting routing issues on paths external to our network that lead to blackholing of specific 5-tuple combinations, very likely due to ECMP/bundling issues (a link is up/up and used for load-balancing, but cannot actually transmit or receive traffic, therefore dropping those packets on the floor).
Now, this happened a few times over the years, and I am wondering if you guys have any suggestions or tools that you use in those cases, other than tcpdump'ing at both ends, generating thousands of 5-tuple combinations and then analyzing them in wireshark.
Many people on this list are probably doing this already but with hardware devices like Ixia testers. You can use the control software for them to generate different flows in a programmatic fashion. We haven’t the budget for them I’m afraid but I think we can test all we want with these pieces of software below.
You can check out Cisco’s TRex (https://trex-tgn.cisco.com/) or MoonGen (https://github.com/emmericp/MoonGen). Both are built on DPDK so you need to install that as a prerequisite. These will let you generate large numbers of flows in a scriptable fashion and record the results.
I haven’t had time (story of my life!) but ideally I want to set up a new test server in our lab and get either/both of these installed, so that each time we test a new device we can generate a range of traffic/flows to check that the device forwards as desired: load balancing and hashing, ACLs, QoS etc.
At the minute we have a couple of low end devices and make do with the following open source tools to generate single packets or single flows for just basic speed testing or testing that traffic drops into a specific queue, matches an ACL etc:
- Generating single custom packets: http://ostinato.org/ and http://packeth.sourceforge.net/packeth/Home.html
- Layer 2 Ethernet/MPLS: https://github.com/jwbensley/Etherate
- Layer3/4 IP/TCP/UDP: https://github.com/esnet/iperf
- Layer 2/3/4: http://pktgen.readthedocs.io/en/latest/
- Specifically for testing ECMP have a look at this (I haven’t had a chance to play with it personally yet): https://github.com/facebook/UdpPinger
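If none of those tools are to hand, a few lines of Python can do a crude version of the same 5-tuple sweep. This is only a minimal sketch: it assumes you control a UDP echo responder at the far end, and the function name `probe_source_ports` and its parameters are made up for illustration. It varies only the UDP source port and records which source ports get an echo back, so source ports that never get a reply are blackhole candidates (or just ordinary loss, so repeat the run before drawing conclusions).

```python
import socket


def probe_source_ports(dst_ip, dst_port, src_ports, timeout=0.5,
                       payload=b"ecmp-probe"):
    """Send one UDP probe per source port to an echo responder.

    Returns {src_port: True/False}, where True means the payload came back
    before the timeout. Hypothetical helper, not part of any tool above.
    """
    results = {}
    for sport in src_ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            try:
                # Fixing the local port pins the source-port field of the 5-tuple.
                s.bind(("", sport))
                s.sendto(payload, (dst_ip, dst_port))
                data, _ = s.recvfrom(2048)
                results[sport] = (data == payload)
            except OSError:
                # Timeout, port already in use, or an ICMP error: no reply seen.
                results[sport] = False
    return results
```

Something like `probe_source_ports("192.0.2.10", 7, range(40000, 41000))` would sweep a thousand source ports; the address and port here are placeholders.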
Also, after obtaining a list of affected and unaffected 5-tuples, is there any particularly easy way to find out how this is getting hashed, so that we could find the likely number of bundle members? (This could be very useful when multiple interconnections and parties are involved.)
You are probably going to need to dig into vendor specifics; what vendors are in play and the configs deployed (what load balancing / hashing options/knobs have been configured), then look at the hardware documentation with regards to what is supported by the hardware, does that match what is configured? If you test it empirically does it add up? The vendor documentation should say how the load-balancing is done.
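Before digging into the vendor docs, you can get a very rough first guess at the member count from the numbers alone. The sketch below rests on two assumptions that may not hold on your path: the hash spreads flows evenly across members, and exactly one member of a single bundle/ECMP group is dead. Under those assumptions the affected fraction of 5-tuples is about 1/N. The function name `estimate_members` is hypothetical.

```python
def estimate_members(affected, total, max_members=64):
    """Guess the member count N from the share of blackholed 5-tuples.

    Assumes uniform hashing and exactly one dead member, so that
    affected / total is roughly 1 / N. Returns (N, affected_fraction).
    """
    if total <= 0 or not 0 < affected < total:
        raise ValueError("need 0 < affected < total")
    fraction = affected / total
    # Pick the N whose ideal share 1/N is closest to the observed fraction.
    best = min(range(2, max_members + 1), key=lambda n: abs(fraction - 1 / n))
    return best, fraction
```

For example, 250 affected out of 1000 tested 5-tuples points at 4 members. If several load-balancing stages are stacked along the path, the fractions multiply and this guess will be off.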
You can roughly work out the hashing mechanism by, say, sending a fake flow from 10.0.0.1 to 10.0.0.2, proto TCP, src port 1, dst port 1. Then just increment one field by one: dst port == 2, then dst port == 3 etc. Look at how the traffic moves between links. If you keep incrementing you can brute force your way through, and eventually you might see the same pattern of hashing results emerge.
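The bookkeeping for that sweep can be sketched as follows. This is an illustration only: `FiveTuple`, `sweep_dst_port` and `smallest_period` are made-up names, and the list of egress links per flow has to come from your own observations (interface counters, captures, or a hash-test command). Real hashes (CRC, XOR folding) will not necessarily produce a clean repeating pattern, so a result of `None` does not mean much on its own.

```python
from typing import Iterator, NamedTuple, Optional, Sequence


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int


def sweep_dst_port(base: FiveTuple, count: int) -> Iterator[FiveTuple]:
    """Yield `count` flows that differ from `base` only in destination port."""
    for offset in range(count):
        yield base._replace(dst_port=base.dst_port + offset)


def smallest_period(links: Sequence[str]) -> Optional[int]:
    """Return the smallest p with links[i] == links[i + p] for every i.

    Only periods up to half the sequence length are considered, so the
    pattern must be seen at least twice. Returns None if there is none.
    """
    n = len(links)
    for p in range(1, n // 2 + 1):
        if all(links[i] == links[i + p] for i in range(n - p)):
            return p
    return None
```

Usage would be along the lines of generating flows with `sweep_dst_port(FiveTuple("10.0.0.1", "10.0.0.2", "tcp", 1, 1), 64)`, noting which link each one lands on, and passing that list of link names to `smallest_period`.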
Also some boxes have a command to test the hashing, example from a Cisco 4500X:
#show platform software etherchannel port-channel 1 map l4-port 1.1.1.1 100 2.2.2.2 200 | i is Te
Map port for l4-port 1.1.1.1:100, 2.2.2.2:200 is Te1/1/16(Po1)
Another entry from the mailing list:
I'm looking at the "paris traceroute" toolset right now, which looks like it could be the right tool for the job. libparistraceroute is aware of the multiple paths and can report on any single one of them accurately, as well as on all of them.