We are proud to announce the immediate availability of Gravwell version 3.2.3. This release is all about performance and bug fixes, but we did manage to slip in a new Kafka ingester.
The full changelog for version 3.2.3 is available on our docs page. The new Kafka ingester is available in the Debian repository and on our downloads page. Docs on installing and configuring the new Kafka ingester are also ready. Now let's take a quick look at those fulltext indexer performance improvements. This blog post is a little deeper on the technical side and dives into some internals of the Go language and its tooling. If you don't care, skip to the results section to see what we wrung out.
Profiling, Performance, and Fulltext Indexing
Every once in a while, we at Gravwell will pick a component and spend a bit of time trying to improve its performance. It usually becomes a game where developers go back and forth, wringing out additional performance as they go. This week was all about the fulltext indexing system.
Fulltext indexing involves a few stages: first we crack an entry into the components we wish to index, then we remove duplicates and clean the components, and finally we hand the components over to the indexing engine. We decided the cracking stage was ripe for improvement and set out to improve its performance. The first step was to do some profiling and build out an enhanced test suite so that we had a consistent set of benchmarks to compare against. Gravwell is a Go shop, and the Go toolchain has excellent tools for building benchmarks and profiling code.
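As a rough illustration of the pipeline described above, here is a minimal sketch in Go. The function names and cleaning rules are hypothetical stand-ins, not Gravwell's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// crack splits an entry into candidate index components.
// (Hypothetical stand-in for the real cracker.)
func crack(entry []byte) []string {
	return strings.Fields(string(entry))
}

// dedupeAndClean normalizes components and removes duplicates
// before they are handed to the indexing engine.
func dedupeAndClean(toks []string) []string {
	seen := make(map[string]bool, len(toks))
	out := toks[:0]
	for _, t := range toks {
		t = strings.ToLower(strings.Trim(t, `"[]`))
		if t != "" && !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

func main() {
	entry := []byte(`GET /index.html GET 200`)
	fmt.Println(dedupeAndClean(crack(entry))) // [get /index.html 200]
}
```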
Once we had the benchmarks built, we set about getting a baseline with the current code and comparing it to what we typically see in production. The results were consistent, but I have to admit we were not expecting what the benchmarks and profiling results were telling us: the fulltext cracker was the dominant component. My assumption was that the fulltext cracker would be the least expensive stage in the chain, but our testing showed it was the most expensive. Even worse, it was allocating. A lot.
You Know What They Say About Assumptions
The fulltext cracker should not be allocating at all; it's just finding bounds in an existing slice of data. However, our profiling showed that the cracker was copying components, which meant we were absolutely hammering the memory subsystem. Those allocations not only slowed down the fulltext cracker, they also made the garbage collector work a lot harder. The first task was to identify which specific piece of code was performing the allocations. The Go pprof and test systems make that pretty easy: just slap a -memprofile flag on the benchmark run.
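To give a flavor of that workflow, here is a sketch of how allocations show up under the Go benchmark tooling. The crackCopying function is a hypothetical allocating cracker standing in for the old behavior, not our real code; testing.Benchmark just lets the example run outside a _test.go file:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var sample = []byte(`127.0.0.1 - - "GET /index.html HTTP/1.1" 200 2326`)

// crackCopying allocates on every call: the string conversion copies
// the buffer, and Fields allocates the result slice.
func crackCopying(data []byte) []string {
	return strings.Fields(string(data))
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // surface allocs/op alongside ns/op
		for i := 0; i < b.N; i++ {
			crackCopying(sample)
		}
	})
	fmt.Println("allocs/op:", res.AllocsPerOp())
}
```

In a real project you would put a BenchmarkCrack function in a _test.go file, run `go test -bench=Crack -benchmem -memprofile=mem.prof`, and then open the profile with `go tool pprof mem.prof` to see exactly which line is allocating.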
The cracker was indeed at fault, but the allocations were not coming from our code -- they were coming from the bufio package in the Go standard library, specifically from the Scan function. If you are a Go developer, check the source of that function; you will see exactly where we screwed up. We built the fulltext cracking engine as a loose wrapper around the bufio.Scanner code; unfortunately, that code has a nasty little copy() in it. The core of the problem is that a bufio.Scanner holds an io.Reader as its data source, because it might be reading from a file, a network socket, etc. We assumed that passing it a byte buffer would mean there was no need for any data copying; that was a bad assumption.
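You can demonstrate the copy without reading the standard library source. In this small sketch, the token returned by the scanner survives a mutation of the original buffer, which proves Scan handed back a copy rather than a slice of the original data:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// firstWord scans the first token out of data the way a
// bufio.Scanner-based cracker would: wrap the buffer in a reader
// and let the scanner split it.
func firstWord(data []byte) []byte {
	sc := bufio.NewScanner(bytes.NewReader(data))
	sc.Split(bufio.ScanWords)
	sc.Scan()
	return sc.Bytes()
}

func main() {
	data := []byte("hello world")
	tok := firstWord(data)
	data[0] = 'X' // mutate the original backing array
	// If tok aliased data, it would now read "Xello". It doesn't,
	// because Scan copied the bytes into the Scanner's internal buffer.
	fmt.Println(string(tok)) // prints "hello"
}
```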
So we set out to build a new fulltext processing framework that didn't use the Go standard library and didn't allocate. Long story short, we rewrote it from the ground up using static allocations and some spiffy processing to reduce loops. Allocations were reduced dramatically, and we retained the ability to process UTF-8 text, because you never know when your fulltext processing engine is going to need to deal with a poop emoji (💩).
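To give a sense of the zero-copy approach (this is a sketch, not Gravwell's actual implementation), a cracker can return byte offsets into the original slice instead of copying token bytes, while still decoding UTF-8 rune by rune:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// crackBounds records the [start, end) byte offsets of each token in
// data without copying token bytes. The out slice is reused by the
// caller, so steady-state operation does not allocate.
// (Hypothetical sketch; the token rules here are simplified.)
func crackBounds(data []byte, out [][2]int) [][2]int {
	start := -1
	for i := 0; i < len(data); {
		r, sz := utf8.DecodeRune(data[i:])
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if start < 0 {
				start = i // token begins
			}
		} else if start >= 0 {
			out = append(out, [2]int{start, i}) // token ends
			start = -1
		}
		i += sz
	}
	if start >= 0 {
		out = append(out, [2]int{start, len(data)})
	}
	return out
}

func main() {
	data := []byte("GET /index 💩 ok")
	// Reusing a preallocated bounds slice across entries keeps the
	// hot path allocation-free.
	bounds := crackBounds(data, make([][2]int, 0, 16))
	for _, b := range bounds {
		fmt.Printf("%s ", data[b[0]:b[1]]) // slices, not copies
	}
	fmt.Println()
}
```

Because the emoji is not a letter or digit, it simply acts as a delimiter here, but the decoder walks it as a single multi-byte rune rather than four garbage bytes.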
If you refer back to our benchmark blog post, we were getting about 95KE/s, or 19MB/s, on 10 million Apache access logs when using the index engine. The new system improves that to about 146KE/s, or 30MB/s. The bloom engine makes the effects of the new fulltext cracker even more dramatic, going from about 164KE/s to over 330KE/s. We more than doubled ingest performance with the fulltext accelerator. Improving the fulltext cracker is important for ease of use: when you don't know exactly what you are getting (which is what we are here for), fulltext indexing is the logical choice.
The increased ingest rate is great, but the REAL kicker is the dramatically reduced pressure on the memory subsystem. Version 3.2.3 sees a pretty handy reduction in resident memory when performing high volume fulltext ingest. Less memory means less impact on the system and better performance at search time. Faster ingest means that Gravwell indexers can handle and ride out massive deluges of data. Long story short, faster is better and we just got faster.
Gravwell version 3.2.3 got a lot better on a few fronts: we fixed some bugs, sped up the fulltext cracker, released a new ingester, and improved our handling of geo maps. If you are ready to see how a high speed unstructured data analytics platform can reduce your costs and allow security and devops teams to dig deep into your data, hit us up.