Gravwell Blog

Fighting Unpredictable Analytics Costs With All-You-Can-Ingest Pricing

Oct 10, 2018 4:07:31 PM / by Corey Thuen

One of the biggest complaints heard across the industry is cost. “Too expensive” and “untenable pricing scale” are things we have been hearing from colleagues at conferences and on forums for years. Years! Yet we’re still stuck with this extremely frustrating pricing model that disincentivizes people from using the very tool they purchased. What do I mean? Let’s dive in.

 

Cost Analysis Scenario: Threat hunting within Bro

To give some practicality to the discussion, let’s frame the context around threat hunting within Bro data, but the data source could be anything. We’ll use a fictitious company, ACME Pharm, to avoid naming actual entities. The technical aspects of how threat hunting is performed in Bro are outside the scope of this piece (but likely to be covered elsewhere on our blog, stay tuned!). What we do care about, however, are the different types of data generated by Bro and why we would want to have each type.


Let’s say ACME Pharm currently has an analytics tool that costs $600 per year for each GB of data per day, or $50 per GB per month if they pay for the whole year at once. They’ve got a budget for a threat hunting analytics platform of $120k per year ($10k per month for easy math).


Bro dumps data out in different log files based on the module that generated the data. For threat hunting of DNS, the DNS logs are an obvious must-have. Let’s say that in our enterprise these logs are about 1 MB/s, or roughly 86 GB per day. ACME Pharm also wants to collect and analyze network flows, file transfer activity, Windows authentications, and maybe some other ancillary logs. Exactly what’s in each of these logs isn’t important for this discussion. Their total data generation might look like:


| Source | Data Rate |
|---|---|
| dns.log | 1 MB/s |
| conn.log | 2 MB/s |
| ntlm.log | 0.3 MB/s |
| files.log | 0.4 MB/s |
| smb_files.log | 0.8 MB/s |
| Total | 4.5 MB/s (~389 GB/day) |
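The daily total above follows from multiplying each sustained rate by the number of seconds in a day. Here is a minimal sketch of that conversion, assuming decimal units (1 GB = 1000 MB) and perfectly steady rates; the variable and function names are made up for illustration.

```python
# Sustained Bro log rates in MB/s, taken from the scenario's table.
RATES_MB_PER_SEC = {
    "dns.log": 1.0,
    "conn.log": 2.0,
    "ntlm.log": 0.3,
    "files.log": 0.4,
    "smb_files.log": 0.8,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MB_PER_GB = 1000  # assumption: decimal units, not GiB


def gb_per_day(mb_per_sec: float) -> float:
    """Convert a steady MB/s rate into GB per day."""
    return mb_per_sec * SECONDS_PER_DAY / MB_PER_GB


daily_gb = {name: gb_per_day(rate) for name, rate in RATES_MB_PER_SEC.items()}
total_gb_per_day = sum(daily_gb.values())

for name, gb in daily_gb.items():
    print(f"{name:<14} {gb:6.1f} GB/day")
print(f"{'Total':<14} {total_gb_per_day:6.1f} GB/day")
```

Real traffic is bursty, so treat these as averages rather than a guarantee of what any single day will produce.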


ACME Pharm has got far too much data for what they can afford! Doing some fast math (389 GB/day × $50 per GB per month), we see their costs are around $19.5k/mo in license fees. BUT WAIT WE’RE NOT DONE IT GETS WORSE! That’s the size of the raw output from Bro, but the license fees are based on indexed data (i.e. size on disk). So we need to take this bad boy and multiply it by 1.2 to account for a 20% blowup for indexes.


| Source | Base Data per Day (GB) | Billed Data per Day (GB) |
|---|---|---|
| dns.log | 86.5 | 104 |
| conn.log | 173 | 208 |
| ntlm.log | 26 | 31 |
| files.log | 34.5 | 41 |
| smb_files.log | 69 | 83 |
| Total | 389 | 467 |


The slightly more accurate measure for cost per month is $23,350 (467 GB/day × $50 per GB per month). That’s over twice what they’ve budgeted. Cue the sad trombone.
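For anyone who wants to plug in their own numbers, the license math can be sketched in a few lines. The prices, data volume, and 20% index overhead come from the scenario; the constant and function names are invented for this example.

```python
# Scenario figures from the post.
PRICE_PER_GB_PER_MONTH = 50      # dollars per GB/day of licensed ingest, per month
RAW_GB_PER_DAY = 389             # raw Bro output
INDEX_OVERHEAD = 0.20            # 20% blowup for indexes
BUDGET_PER_MONTH = 10_000        # dollars


def monthly_license_cost(gb_per_day: float) -> float:
    """Monthly fee when licensing is priced per GB of daily ingest."""
    return gb_per_day * PRICE_PER_GB_PER_MONTH


raw_cost = monthly_license_cost(RAW_GB_PER_DAY)

# Billing is on indexed size, so apply the overhead before pricing.
billed_gb_per_day = round(RAW_GB_PER_DAY * (1 + INDEX_OVERHEAD))
billed_cost = monthly_license_cost(billed_gb_per_day)

print(f"Raw:    {RAW_GB_PER_DAY} GB/day -> ${raw_cost:,.0f}/month")
print(f"Billed: {billed_gb_per_day} GB/day -> ${billed_cost:,.0f}/month")
print(f"Budget multiple: {billed_cost / BUDGET_PER_MONTH:.2f}x")
```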


| Item | ACME Pharm Cost |
|---|---|
| Data analytics with crappy pricing model at 467 GB per day | $23,350 per month |
| Cost of not having the data necessary for proper response | $0, unless there’s an incident, but that won’t happen to us… |
| Total | $23.35k per month |



Begun, the Data Priority Wars have

ACME Pharm, like almost any organization out there currently using a vendor stuck on this garbage pricing model, is having to fight internally about which data they get to ingest for analysis and which they have to drop on the floor.


They make some hard decisions and opt to collect all of DNS...and that’s about it. They’d like to include more but they know that they’ll need some room in case they have an incident or investigation that warrants turning on any or all of the additional data sources for a period of time. Bumping against the license cap can vary in pain level depending on the vendor, but it ain’t pretty for any of them. Especially if it’s a frequent occurrence.
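To see why DNS is about all they can afford to leave on, here is a rough sketch of the headroom math. It assumes ACME Pharm buys the largest license its $10k monthly budget allows at $50 per GB per month, which the scenario does not state outright, and it uses the billed (indexed) sizes from the table above. All names are hypothetical.

```python
BUDGET_PER_MONTH = 10_000        # dollars
PRICE_PER_GB_PER_MONTH = 50      # dollars per GB/day of licensed ingest

# Billed (indexed) GB per day for each Bro log, from the scenario's table.
BILLED_GB_PER_DAY = {
    "dns.log": 104,
    "conn.log": 208,
    "ntlm.log": 31,
    "files.log": 41,
    "smb_files.log": 83,
}

# Assumption: the license is sized to exactly exhaust the budget.
license_gb_per_day = BUDGET_PER_MONTH // PRICE_PER_GB_PER_MONTH

# Room left over once DNS is always on.
headroom = license_gb_per_day - BILLED_GB_PER_DAY["dns.log"]

# Sources that could be switched on one at a time without blowing the cap.
fits_alongside_dns = [
    name
    for name, gb in BILLED_GB_PER_DAY.items()
    if name != "dns.log" and gb <= headroom
]

print(f"License: {license_gb_per_day} GB/day, headroom after DNS: {headroom} GB/day")
print("Fits alongside DNS:", ", ".join(fits_alongside_dns))
```

Under these assumptions the connection log alone is bigger than the entire license, which is exactly the kind of source an investigation tends to need.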

 

An incident occurs

Eventually the unfortunate inevitability happens: ACME Pharm has got an incident on their hands. A tip from the DNS feed has led their hunt team to discover a compromised host exfiltrating data via DNS. It’s time for a complete investigation.


The hunt team turns on the other data sources and hopes they can quickly gather all the actionable intelligence they might need before hitting the license cap. They flip on the connection logs to track which other systems within their network are communicating with the compromised system and get some idea of lateral movement, all while crossing their fingers that the movement didn’t already happen before the logs were turned on.


The investigation is just starting to hit a nice stride when they unexpectedly smack into the ceiling on the data cap. Everyone forgot about the beginning of the month, when those shiny little IoT devices that marketing was testing doubled the DNS rate for 4 days. Predicting data rates isn’t a science or an art, it’s voodoo. The team lead practices her counting-to-ten breathing exercises but she skips half of them because she. can’t. even.


Unpredictable and runaway costs are serious consequences of a pricing model that isn’t built for the people doing the job. The old model is from the days before data generation exploded, and it’s underpinned by legacy technology that can’t scale to make proper use of resources. There’s got to be a better way.


How Gravwell is changing the game

We’ve done a lot of talking on this blog about the awesome technology behind Gravwell and the kinds of use cases that it enables. What that technology stack also enables, though, is scalability that’s sorely needed in this space.


We wrote Gravwell from scratch using Go because we knew that “going wide” was paramount. We needed to give our customers the freedom to make their own data decisions and actually utilize the hardware or cloud resources that they already purchased. We needed a pricing model that didn’t turn the screw when customers needed us the most.


Revisiting the Scenario: This time with Gravwell

Let’s come back to the scenario and walk through what happens when ACME Pharm replaces their legacy solution with Gravwell. Recall that ACME Pharm is generating about 389 GB per day of raw Bro logs with a budget of $10k per month.


Regarding indexer bloat, Gravwell indexes add about 5% overhead as a base. For customers using Accelerators, that can add another 2-4%. So in the pessimistic scenario we’re talking about a final Gravwell indexer ingest of about 420 GB per day getting put in Gravwell shards on disk. But do you know what? It doesn’t matter!
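As a back-of-the-envelope check on that figure, here is a small sketch. It assumes the base and Accelerator overheads simply add together and apply to the raw daily volume, which is a simplification made for this example; the names are invented.

```python
RAW_GB_PER_DAY = 389

BASE_INDEX_OVERHEAD = 0.05               # about 5% for base indexes
ACCELERATOR_OVERHEAD = (0.02, 0.04)      # another 2-4% when using Accelerators
LEGACY_INDEX_OVERHEAD = 0.20             # the 20% used in the legacy example


def on_disk_gb(raw_gb: float, overhead: float) -> float:
    """Raw daily volume plus a fractional indexing overhead."""
    return raw_gb * (1 + overhead)


# Assumption: overheads are additive.
low = round(on_disk_gb(RAW_GB_PER_DAY, BASE_INDEX_OVERHEAD + ACCELERATOR_OVERHEAD[0]))
high = round(on_disk_gb(RAW_GB_PER_DAY, BASE_INDEX_OVERHEAD + ACCELERATOR_OVERHEAD[1]))
legacy = round(on_disk_gb(RAW_GB_PER_DAY, LEGACY_INDEX_OVERHEAD))

print(f"With Accelerators: {low}-{high} GB/day on disk")
print(f"Legacy example:    {legacy} GB/day billed")
```

Under those assumptions the range brackets the roughly 420 GB per day quoted above.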


Gravwell does not price based on data rates. Every Gravwell deployment includes unlimited data. It’s your storage -- use it. In case you haven’t picked up on it by now, we’re pretty passionate about this topic (though of course you have, as we already established how intelligent you are for reading this). At this point you probably have some questions.


What does that mean for ACME Pharm? What’s the price comparison? Can they just buy one Gravwell node no matter how much data they have?


Ok, I need to level with you here. There’s a downside to this: it’s tough to do an apples-to-apples price comparison because the models are different. But check this out.

 

Gravwell Pricing enables predictability

Let’s say that ACME Pharm is running this on-prem with some relatively modest hardware. They have 3 systems in a rack with some nice storage tiers (SSD and magnetic spinners) that can handle a given amount of data per day and still be responsive when it comes to searching. With Gravwell, they can put 3 TB per day into a single node if the storage will take it, but that doesn’t mean it’s a good idea. They should match appropriate compute power to their needs (a topic for another post). Let’s say their nodes are capable of handling 200 GB per day each while maintaining satisfactory search performance. Based purely on this quick-and-dirty data rate estimation, they could probably get by with 2 nodes, but let’s see what 3 nodes look like when it comes to cost:


| Item | ACME Pharm Cost |
|---|---|
| Base Gravwell node | $2k per node per month ($6k) |
| Premium Enterprise features (ACME Pharm wants some bells and whistles like high availability and single sign-on) | $1.5k per node per month ($4.5k) |
| Annual payment discount | -10% |
| Total | $9.45k per month |


We’re already coming in under budget and at under half of what the competitor is charging ($9.45k versus $23.35k per month). ACME Pharm gains predictability in cost and deploys Gravwell in a forward-thinking manner that allows for expansion as data rates naturally increase. If a data spike happens, it’s no longer a huge pain. Teams don’t have to fight it out to see which data gets to be valuable on a given day. Instead, they can all live in data nirvana.
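The per-node arithmetic is simple enough to sketch. The prices below are the ones used in this scenario only, not a price list, and the names are invented for the example. Integer math keeps the totals exact.

```python
NODES = 3
BASE_PER_NODE = 2_000            # dollars per node per month (scenario figure)
PREMIUM_PER_NODE = 1_500         # dollars per node per month (scenario figure)
ANNUAL_DISCOUNT_PCT = 10         # percent off for paying annually

BUDGET_PER_MONTH = 10_000
LEGACY_COST_PER_MONTH = 23_350   # per-GB model at 467 GB/day billed

INGEST_GB_PER_DAY = 420          # pessimistic on-disk estimate
NODE_CAPACITY_GB_PER_DAY = 200   # assumed comfortable load per node

subtotal = NODES * (BASE_PER_NODE + PREMIUM_PER_NODE)
gravwell_cost = subtotal * (100 - ANNUAL_DISCOUNT_PCT) // 100

# Note that data volume never enters the price; it only affects sizing.
load_per_node = INGEST_GB_PER_DAY / NODES

print(f"Gravwell: ${gravwell_cost:,}/month for {NODES} nodes")
print(f"Share of legacy cost: {gravwell_cost / LEGACY_COST_PER_MONTH:.0%}")
print(f"Load per node: {load_per_node:.0f} of {NODE_CAPACITY_GB_PER_DAY} GB/day")
```

Doubling `INGEST_GB_PER_DAY` changes the load per node but leaves `gravwell_cost` untouched, which is the whole point of node-based pricing.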

 

Wrapping it up

Hopefully this post has provided some insights into the kinds of cost predictability that can be had when an analytics solution prices like it’s 2018.


I also hope I’ve riled you up a bit to fight the injustice of the current abusive pricing system. If this resonates with you at all, we would love 5 minutes to hear tales from the trenches about your cost woes.


Topics: Case study, Gravwell Story, Analytics Economics


Written by Corey Thuen

Co-Founder of Gravwell