Performances

ART is a free multibit trie based routing table. To keep it simple, it can be seen as using more memory for fewer CPU cycles. In other words, we get a faster lookup by wasting memory. The original paper presents some performance comparisons between two ART configurations and the BSD Radix. But how does this apply to OpenBSD?

I asked Hrvoje Popovski to run his packet forwarding test on his Xeon box (E5-2620 v2 @ 2.10GHz, 2400.34 MHz) with ix(4) (82599) interfaces. The test setup consist of three machines with the OpenBSD box in the middle:

       Setup

The machine is configured with:

        kern.pool_debug=0
        kern.maxclusters=32768
        net.inet.ip.forwarding=1
        net.inet.ip.ifq.maxlen=8192

The simulations have been performed with an OpenBSD -current from June 9th. The machine is configured with pf(4) disabled in order to force a single route lookup for every IPv4 packet. Based on the result of the lookup the kernel decide if it should forward, deliver or drop the packet:

      Route Lookup

In order to measure the impact of a different routing table configurations we need to send enough packets to make sure the machine is CPU bound but not live locked. In this case it happens when sending 800 Kpps. During a test the sender generates packets for all addresses of a given subnet to invalidate any possible cache. Plus to see the difference of various subnets, the test has been reproduced on a /24, a /20 and a /16:

       Forwarding performances

ART 8/4/4/4/4/4/4 is the configuration currently enabled in -current. It has been choosen because it has a memory footprint similar to the BSD Radix for a fullfeed BGP server. The other configurations waste more memory by using bigger stride lengths but, as shown on the graphic, perform faster.

Memory consumption

Sadly this test is not enough to see how much memory can be wasted by different ART configurations because it does not create fragmentation:

       Memory consumption when forwarding to 11/16

So I asked benno@ and phessler@ if I could measure once again the memory consumption of a BGP fullfeed on their test setup:

       Memory consumption of a BGP server

We can see that using bigger stride lengths result in more memory being wasted.

Conclusion

As expected ART performs faster than the BSD Radix on OpenBSD because it uses more memory. That's hopefully not new to me, so what did I learn during these experiments?

First of all I validated the choice I made two years ago to integrate an implementation supporting variable stride lengths. It is nice to be able to choose at compile time, depending on your usage, how much memory you want to dedicate to improve your route lookup performances. This is important for experimenting but it also matches the fact OpenBSD is a generic purpose operating system and its users have different requirements.

Secondly I've seen that the 8/8/8/8 configuration doesn't offer any advantage compared to the 16/8/8 one. The only problem is that it doesn't seem to be currently possible to allocate the 16-heap in a MP-safe way on OpenBSD, something to look at in the future?

Finally the in-kernel memory consumed by the routing table of a BGP server is around 130M in -current and would grow to 180M with a 16/8/8 configuration. Even if route lookup performances are not critical for such setups, I'd argue that consuming 180M is already acceptable today, in case we decide to crank the defaults in the upcoming years.