Years ago, if you wanted to launch an Internet company or Web application, you had to own the necessary hardware. Now, the economics have changed drastically with the ease of Cloud computing. It’s still a new industry that people are trying to figure out, especially when it comes to cost and optimization.
Today, we’re talking to Dann Berg, a Cloud ops analyst at Datadog. He helps others understand and lower the cost of Cloud operations. Dann is a detective who is dedicated to figuring out why a company’s Cloud bill is so high.
Some of the highlights of the show include:
Companies struggle with field of Cloud economics; can be overwhelming because there’s so much to learn about products and implementation
Companies use the Cloud to grow quickly, which makes their Cloud costs grow quickly and more than expected
The Cloud bill is the only place to get a full list of every resource being used; there’s no comprehensive inventory service available
Companies need to offer visibility to Cloud bill; not everyone has access to understand how their actions impact the bill
Cost of Cloud bill is dependent on different factors, including new features, new users, and cost of goods sold (COGS)
Scale and manage bill by using a platform app or hiring a consultant/team
Understand pricing of AWS and learn best practices for cost controls early on
Don’t leave money on the table by focusing on engineering time - not best use of resources; focus on the smallest things that have the biggest impact
Cost is important, but don’t slow down those developing in the Cloud; open lines of communication to create culture to understand cost, value what’s measured
Full Episode Transcript:
Corey: This episode of Screaming In The Cloud has been sponsored by CHAOSSEARCH. CHAOSSEARCH is a cloud-native SaaS offering that extends the power of Elasticsearch’s API on top of your data that already lives in Amazon’s S3. CHAOSSEARCH essentially turns your data in S3 into a Warm Elasticsearch Cluster, which finally gives you the ability to search, query, and visualize months’ or years’ worth of log and event data without the onerous cost of running a Hot ELK Cluster for legacy data retention. Don’t move your data out of S3. Just connect the CHAOSSEARCH platform to your S3 Buckets and in minutes the data is indexed into a highly compressed data format and written back into your S3 Buckets, so it keeps the data under your control. You can then use tools like Kibana on top of that to search and visualize your data on S3, querying across terabytes of data within seconds. Reduce the size of your Hot ELK Clusters and waterfall your data to CHAOSSEARCH to get access to an unlimited amount of log and event data. Access more data, run fewer servers, spend less money. CHAOSSEARCH. To learn more, visit chaossearch.io and sign up for a trial. Thanks to CHAOSSEARCH for their support of this episode.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined today by Dann Berg, who's a CloudOps Analyst at Datadog. His job is generally to liaise between engineering and finance to help understand and lower cloud operations costs, which is a subject that's, of course, near and dear to my heart. More importantly than that, he has one of the most impressive mustaches I've ever seen. Even now recording this a week after re:Invent, I'm still in awe over it. Dann, welcome to the show.
Dann: Thank you so much. I really think it's the mustache that helps me save costs. It's done great things for my career.
Corey: I'm sure it has to because, even if you're telling me something that's objectively wrong, no one is going to argue with that mustache.
Dann: Mustaches are coming back. I shaved it last year for November and then, after November was over, I went back to my usual beard and everybody missed it. My wife missed it. I kind of missed it, and it's been here since then, so a little over a year now.
Corey: I wish I could grow one, but that's a whole separate argument for another time and a sad over-drinks-at-a-bar conversation. Let's talk a little bit about cloud economics, starting with, "What is it?"
Dann: Fantastic. It's interesting because previously when somebody would want to launch an internet company, or a web application, or something, you'd have to actually own the physical hardware, which is all stuff that your audience knows, but the economics have just changed so drastically with the ease of cloud computing, and AWS, and everybody else. Since it is still such a new industry, we're still trying to figure things out in terms of cost, and optimization, and whether something is more cost-effective to have it on-prem or have it in the cloud. Really, being able to run all those numbers, being able to work with finance to understand the bill, first of all, be able to do cost projections and really understand your application, both how it works technically and how adding a certain number of users will impact your bill, I would put all of that into cloud economics.
Corey: It winds up being a radical shift in how companies do these things. You used to have to do capital expense planning that would span years. It was hard to accidentally order $6 million worth of hardware and not get fired for that or be accused of embezzlement, but it's easy to wind up causing tremendous waste when everything is on-demand and you're five organizational layers of separation between the person who can cause changes to the bill and the person who gets the bill. I wound up calling myself a cloud economist because those are two words that almost no one understands so no one was going to argue with me about it, but it turns out this is a field a lot of companies are continuing to struggle with.
Dann: Yeah, it's interesting because I came from a background of the actual physical hardware and dealing with capex, projections and things like that. At my previous company, I kind of started getting more and more into the cloud, which is how I eventually ended up where I am. It was just interesting because it presented an entirely new set of challenges not just from the money perspective. I think anybody approaching it from the finance side of things–there's just so much to learn that it can be so overwhelming because it's not just the sheer number of products that Amazon has; it's the millions of different ways that they can bill each individual project or product. Wrapping your head around it and really understanding it is a continuing journey. I don't know if it'll ever end.
Corey: Just before we dive into this any further, I want to give a quick conflict-of-interest statement here. I am not partnered with any vendor in this space, Datadog is not sponsoring this episode of the podcast and, as of the time of this recording, Datadog is not one of my customers because the hard sell is next week. With that in mind, I also want to call out as well that you are speaking in general terms and not specific aspects of the bills that you see in any particular company, as am I. Is that a fair way of calling this out?
Dann: Yeah, that’s exactly true. I think that the things that I do, we could talk about in terms that are actually going to be useful to anybody listening, and I don't think that the actual numbers are relevant to that because I think it's really about methods, and I think it's the ways of thinking about things that are really important here. Yeah, that is correct.
Corey: Perfect. For those who only know me as the stupider voice on this podcast, what I do day to day is I go into companies that have large infrastructures and I help optimize the AWS bill, which is why I care so much about having this conversation today. Let's start at the very beginning. Dann, in your experience, what do companies care about with respect to cloud bills? Because everyone starts with the conversation of, "Oh, the number's too high. We need to make it lower." How do you see that manifesting?
Dann: That's exactly how it starts, and I view my job a lot as detective work, especially starting in a new company or, for you, I'm sure, starting at a company that you don't necessarily know well. The way that you get to know a company or what's going on is starting with the bill and working your way backwards. Hopefully, you're working with a company that has a good tagging strategy in place. Hopefully, you have access to the people that can provide you answers.
It starts with companies that are going to grow fast. To grow fast, they just throw something up in the cloud and then the company does well and they're just growing bigger, and bigger, and bigger. Pretty soon, their cloud costs are growing faster than expected, and somebody at the company, whether they have a finance department or somebody just looking at the bills, takes a moment to be like, "Wait, what are we doing?" That's really where somebody who's dedicated to working on the bill starts, and that's really where the detective work starts from my experience.
Corey: Organizational politics are always fascinating. Now, you and I are also both biased for companies that are "born in the cloud" or have a primary presence here, and even for startups. There's a whole other world out there of companies that actually have a business model and a history that isn't measured with a stopwatch so much as it is calendars. Their approach to this is often very different. They'll wind up wanting enterprise agreements in place with a cloud vendor before they ever put anything in place.
That's a bit of a different origin story, but I still find that, in the work that I've done, it arrives at the same place. Regardless of how people get to the cloud, eventually, all problems start tending to normalize around certain particular things. In my experience when I have conversations with clients and prospects, very often, the person who's noticing the problem and is brought in to solve the problem and generally reaches out to me, sometimes they don't even have access to the entire bill for their entire company.
They're only limited to a particular division or they're getting an extract through some other tool. Shining the light on what actually is happening in a cloud environment is often sort of Order #1. It's somewhat embarrassing that, in 2018, the only way to get a full list of every resource you're using in AWS across regions and throughout your account is the bill. There's no inventory service that is comprehensive today.
Dann: Yeah, and it's interesting to me, too, when you have developers that don't have access to the bill just because having that view of your underlying costs and if you make a change, like you switch from i3.xlarge to i3.2xlarge, how that impacts the bill even though you're just doing three-quarters of the original number of nodes. Really, being able to see those changes as they're happening, I think, is super important for companies to give visibility to people, whether that's providing access to Cost Explorer, which can be kind of daunting because there's no way to just grant access to the UI of Cost Explorer without basically revealing or giving access to your entire billing center, or finding some external tools and then granting access that way. Visibility, as you said, is the very first super important key.
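The instance-size example above can be made concrete with a rough sketch of the arithmetic. The hourly rate and fleet size are made-up placeholders, and the assumption that the larger size costs twice as much per hour as the smaller one is ours for illustration, not something stated in the conversation; check current pricing before relying on it.

```python
# Hypothetical numbers for illustration only; these are not real AWS prices.
SMALL_HOURLY = 0.30                 # assumed hourly rate for the smaller instance size
LARGE_HOURLY = 2 * SMALL_HOURLY     # assumption: double the size, double the hourly rate
HOURS_PER_MONTH = 730               # average hours in a month

def monthly_cost(node_count: int, hourly_rate: float) -> float:
    """Monthly on-demand cost for a fleet of identical nodes."""
    return node_count * hourly_rate * HOURS_PER_MONTH

before = monthly_cost(100, SMALL_HOURLY)   # original fleet (hypothetical size)
after = monthly_cost(75, LARGE_HOURLY)     # three-quarters of the nodes, larger size
ratio = after / before

print(f"before: ${before:,.0f}/month, after: ${after:,.0f}/month, ratio: {ratio:.2f}x")
```

Under these assumptions the fleet shrinks by a quarter but the monthly cost rises by roughly half, which is the kind of change that is hard to notice without access to the bill.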
Corey: One thing I find as I talk to customers as well is, despite their initial approach of, "The bill's too high. Make it lower," there's also an unspoken desire of–people want to be able to accurately predict it. If you spent $5 million last month and, this month, you spent $7 million, the CFO is going to have words and then those words are interpreted as, "You're spending too much money." If you'd instead spent $3 million this month when $5 million had been predicted, you'd be having many of those same words just because these things start to matter as far as trend line goes. What does this mean for cost of goods sold? What does this mean for a unit economic model? How do we wind up predicting accurately 18 to 36 months out when we can't even predict 1 month ahead what the bill is likely to look like? That's often misunderstood as, "You're spending too much money."
Dann: Yeah, that's exactly what I've seen, too. Depending on the size of the organization, the cost of your bill can be dependent on so many different factors because, one, it's obviously whatever it takes to run your application for paying customers. You have the costs and then there's also people that are working on new features, some people that are testing out new features, maybe you have trial customers or you're doing all of these different things that can have an impact on the bill that will pop up.
Really, being able to understand what's going on, being able to attribute that back to actual usage and cost data, being able to communicate that to finance, to understand, "Okay, I'm aware of this initiative. Here is what it looks like on the bill," and really having all of those pieces of the puzzle, so not only understanding how adding users to your platform impacts bills but all the different initiatives that might be going on inside a medium to large-sized company that will impact the bill that you might not generally think of immediately.
Corey: I have a client who, tongue in cheek, once pointed out that the size of the bill is less a function of how many customers you have and more a function of how many engineers you hired. You're right. When you talk to companies and when I give talks to companies at the enterprise scale, I'll often ask the room, "Okay, raise your hand if you can spin up resources in AWS." Most people raise their hands. Cool. "Keep them up if you're not allowed to see the bill," and a surprising number of hands remain in the air. That is default behavior, but it's broken in some ways. It's hard to hold people to account for the resources they're spinning up if they don't know what makes sense.
Dann: As an example, just go open up one of Amazon's free-tier accounts yourself, just personally, and start playing around with things. You forget to turn something off and, suddenly, you have a bill that's hundreds of dollars. Imagine that at a company with tons of engineers who all have the ability to spin up servers. Nobody is coming directly to them. If there's a spike in the bill, especially for medium or large companies, the chances that myself, or a financer, or somebody is going to come specifically to that single engineer and say, "Hey, we noticed you did this and it impacted our costs in this huge way," it's practically nothing. There needs to be visibility so that people have access to the bills and regular cadence so they can see how their actions are impacting the bill or some other way to really just have that awareness and that understanding.
Corey: The challenge that you'll also see is–you're right. There is no way of directly attributing things back without some rudimentary tooling. Things like cost allocation tags are great, but they're not retroactive so you have to start approaching this after a few "whoopsie" mistakes. My bill is nothing to speak of. I just got November's bill at the time of this recording and it was $16 and change. Last month, I was doing some work and accidentally left a few VPC endpoints running and, surprise, it was a bit over $50.
That is an over 2X surprise that I got on my bill, and the dollar figure doesn't matter. Add a few zeros to the end of this and you start to see how things start to be very confusing. The fact that it was bounded just to me means that it was pretty easy to figure out what had happened, but even in a 20-person development team, that becomes a big question mark. If you have 2000 engineers and not a lot of instrumentation or visibility into it, it becomes almost impossible. That more or less becomes the cost of doing business.
To that end, let's talk a little bit about scale. I have my own opinions on this but I'm curious at what point you wind up seeing in a company that it's time to start using a platform app to tell you where your bill's going, hire a consultant–hello–or hire a dedicated team of people like you to wind up managing this for a company.
Dann: That's such an interesting question. I think that, in terms of saving money in the cloud, a lot of people–you just mentioned three options, and I think those options themselves might be something that people don't quite realize exist because when you're starting on Amazon and you're starting on there, you get your bill and you're like, "Shoot, I need to figure out how to lower this." You're going there and you're trying to learn best practices, you're trying to understand what this API call in Cost Explorer actually means and attribute it to something specific, and then you grow to a certain scale and there are all these services.
Now, I was going to bring this up, too. It's crazy to me how you need external services like CloudHealth or CloudCheckr in order to really understand your bill and see things like, "Oops, you left something on," and it's not built into something like Cost Explorer. Cost Explorer is getting better, but Amazon is really lacking there. The options you have are figure out the money stuff yourself, which is a whole job in and of itself, use one of these third-party tools that ingest your bills, provide security recommendations and let you parse your usage a lot better than Cost Explorer and in more detail, and then you have the option of hiring a consultant such as yourself when you're of a certain scale who can just come in and fix things, hopefully.
Then, when you have a certain scale, you might want to just have a dedicated person depending on how fast you're moving, what sort of pieces or conversations you're having with Amazon on an ongoing basis to really manage and do that. It's hard to give exact numbers for when that is the case because it's definitely a company-by-company basis, but when it comes down to it as somebody who operates in the cloud, those are your options. If you're not doing one of those, you're going to have an outrageous bill at some point very soon.
Corey: Just for those who may not spend their lives diving into the intricacies of AWS billing, Cost Explorer is a native tool that is either free or costs one penny per API call depending on how you're interacting with it that gives you a decent degree of visibility into your bill. There are companies such as CloudHealth, CloudCheckr, Cloudability, Cloud Bandsaw which I just made up, and a bunch of other companies that have similar-sounding names to the tune of a dozen of them now where they all wind up doing this as a service. Their model traditionally falls into the "pay a percentage of your bill" category for those platform offerings.
At the risk of alienating people who work for those companies, the honest assessment I can say from what I've seen after a few years of doing nothing but this, there is no single-platform tool for sale out there that is so far better than the other folks in the space that if you're on one, I would suggest moving to another. They're all very decent at solving this problem. They take different approaches and come at it from different angles, but they're all more or less equivalent.
Now, I'm sure my email is going to blow up with angry notes, but that's my position at this time. If you believe that your product has a key differentiator, please let me know. I am thrilled to modify that statement in a future episode if you can convince me of it. Again, I partner with no one in the space.
Dann: I haven't had extensive experience in any of those tools. There's a little bit, but I would have to say that I agree with that for the most part. The crazy thing for me is, as of right now, all of these tools that I've seen really operate a pricing model on percentage of your bill. If your bill is still at a manageable level, using a tool like that that's a percentage of your bill compared to the cost savings that you're getting is definitely reasonable.
As you're growing, like if you're going out of the startup category and into the small business category, depending on your usage, your cloud bill might get to the point where it just doesn't make sense. That's really the point where you start exploring other options, whether that's a consultant or that is getting a dedicated person on board.
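One back-of-the-envelope way to see where a percentage-of-bill tool stops making sense is to compare its fee with a fixed-cost alternative such as a consultant or a dedicated hire. The sketch below assumes a 2% fee and a $150,000 yearly fixed cost; both figures are hypothetical placeholders, not any vendor's actual pricing.

```python
# Hypothetical figures for illustration; not any vendor's actual pricing.
FEE_RATE = 0.02                 # assumed tool fee: 2% of the annual cloud bill
FIXED_ALTERNATIVE = 150_000.0   # assumed yearly cost of a consultant or dedicated hire

def annual_tool_fee(annual_bill: float, fee_rate: float = FEE_RATE) -> float:
    """Yearly fee for a tool priced as a percentage of the bill."""
    return annual_bill * fee_rate

def break_even_bill(fixed_cost: float = FIXED_ALTERNATIVE,
                    fee_rate: float = FEE_RATE) -> float:
    """Annual bill at which the percentage fee equals the fixed-cost alternative."""
    return fixed_cost / fee_rate

for bill in (500_000, 5_000_000, 20_000_000):
    fee = annual_tool_fee(bill)
    cheaper = "tool" if fee < FIXED_ALTERNATIVE else "fixed-cost option"
    print(f"bill ${bill:,}: tool fee ${fee:,.0f} -> cheaper: {cheaper}")

print(f"break-even bill: ${break_even_bill():,.0f}")
```

The fee grows linearly with the bill while the fixed-cost option does not, so past the break-even point the percentage model becomes the more expensive choice under these assumptions.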
Corey: Absolutely. I've never yet spoken to a customer who heard any form of pricing model that involved a percentage of their bill or percentage of their savings and was happy to hear it. It seems that it works mathematically but there's something broken psychologically about charging percentages. I understand why people do it. To that end, I just did an end run around the entire thing. My pricing model is I charge a fixed fee. If I don't find at least 10 times that fixed fee in first-year savings, I give people their money back, which I've never had to do because, surprise, I know what I'm doing.
That said, there is a floor below which I can't do a whole lot for a company. In most cases–and there are rare exceptions to this–it starts at about a million dollars a year of bill spend. If you're spending 40 grand a month or so, I'm thrilled to have a conversation, but there really isn't likely to be an engagement that makes sense from a pure cost reduction story. That's unfortunate because, frankly, you shouldn't need to be wasting X dollars before bringing someone in to help you that makes fiscal sense, but that's the world we live in.
Dann: Yeah, exactly. If your bill is 40K a year, you have a great opportunity to really, one, use these tools that charge a percentage of your bill because it's probably worth it in that particular instance and, two, if you're listening to this podcast, you're already thinking about this stuff, really start diving into the pricing of AWS, learn the best practices when it comes to cost controls and get those things in place early while you're still at that scale because, as you grow and you get to that $1 million-a-year level, you're going to be glad that you have some of that experience under your belt. You're going to be in a much better situation than 90% of companies. Maybe that's exaggerating but, yeah, the cost stuff is something that everybody needs dedicated attention paid to.
Corey: What you're describing is sort of the edge case of where I can add value, historically. You're small now. You know you're growing. What do you need to instrument today so that a year from now, you have data that's actionable, that points to business metrics that makes sense? I guess the third stage of this beyond consultants and beyond using platform as a service offerings, when is it time to hire a you or a team of people like you to build out a cloud-costing organization?
Dann: That totally depends, one, on what your relationship is like with different cloud providers, whether you're working with them on a regular basis on different parts of your bill, whether it makes sense to have somebody on board in a full-time position to be able to work with your different engineers to try to identify big cost spikes and get them down, whether you need a dedicated person to be able to jump into those finance meetings to help people understand the bill to take control of that.
In terms of my full-time job, there's a lot of different people and teams that I interact with that takes a full-time job for sure. Right now at Datadog, it is just me working on this but that might not always be the case. When you have a consultant such as yourself, sometimes they have other clients, there might be a limited period of time where they're focusing on your company and maybe have a retainer. There's a bunch of different deals you can have, and I'm sure you can speak to this a little bit better than myself.
When you get to the scale where you need to be having these regular meetings, you need somebody that intimately understands your application, and how it runs, and how it interacts with different cloud providers, then it might be time to start considering bringing on somebody full-time.
Corey: What's fascinating to me is that I did this internally at companies in years past, and that gave rise to my current consultancy. I was convinced when I started my company over two years ago that I was pretty up-to-speed on everything that I needed to know for this, and what I've learned is that I left so much money on the table back then just because it was never the only thing that I got to focus on. That's something that caused a bit of a revelation and awakening for me. There's always another level.
I see that, to some extent, with customers I've had in the past. A majority of customers implement some or most of what I recommend, all of which in my first pass is low- or no-engineering effort. A couple of them implement everything and then go significantly beyond what I've identified to the tune of re-architecting applications, to the tune of devoting a team of engineers for six months to build things.
I'll talk to them and they'll be incredibly excited about that when I do my follow-ups. "Great. Okay, you saved $200,000 in your annual bill. How'd you do it?" "Oh, we just had our team of six engineers working on this for the last six months." Unless there's a growth story or something else tied to that, you spent more in engineering time and lost the focus than you're ever going to recoup in the near future. At some point cost no longer becomes the driving concern. In other words, you're never going to optimize your way to your next business milestone. "Well, we were about to go out of business but then we cut our cloud bill and then we raised a Series C," is usually not a story that you hear in the real world.
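The trade-off described here can be sketched as a simple payback calculation. The team size, duration, and $200,000 annual saving come from the anecdote above; the fully loaded cost per engineer is a hypothetical placeholder that should be replaced with a real figure.

```python
# Assumption for illustration: fully loaded yearly cost of one engineer.
LOADED_ANNUAL_COST = 200_000.0

def project_cost(engineers: int, months: float,
                 loaded_annual_cost: float = LOADED_ANNUAL_COST) -> float:
    """Engineering cost of a project, counting people and calendar time only."""
    return engineers * (months / 12) * loaded_annual_cost

def payback_years(cost: float, annual_savings: float) -> float:
    """Years of savings needed to recoup the engineering cost."""
    return cost / annual_savings

cost = project_cost(engineers=6, months=6)           # six engineers for six months
years = payback_years(cost, annual_savings=200_000)  # $200,000 saved per year

print(f"engineering cost: ${cost:,.0f}, payback: {years:.1f} years")
```

This counts salary cost only. It leaves out the lost focus on other work, which the conversation treats as part of the price as well.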
Dann: Yeah. It's interesting because, as you said, the first thing that you present is kind of this low-hanging fruit. If you're a large organization and you haven't spent a lot of focus on this, there are quite a few that I'm sure that I can identify, I'm sure you can identify, then it's like, "Okay, well, do this, this and this, and we'll be at a much better place." Having those engineers dedicating that time and full-salaried engineers working on saving the 200K a year isn't really the best use of resources.
Corey: Right, with the caveat that there are, of course, exceptions, strategic objectives and constraints that I'm not necessarily privy to. This is a third party speaking in the general case perspective. This is not, "Oh, if you're doing this, you're easily doing things wrong." Context matters, and it's never immediately clear from the outside what that necessarily looks like internally to a company.
Dann: Yeah, and I don't want to discourage people from re-architecting their app in order to run better and more efficiently because, obviously, that's important, but I think the most important thing is to focus on the smallest things that have the biggest impact, like the 70-30 Rule or whatever it is where the 30% will give 70% of the savings or whatever, and really being able to identify those things by looking at your bill and seeing where your biggest opportunities are and really nailing those is going to give you much, much higher returns on your bill.
Corey: Absolutely. A common story I'll hear is when I'm presenting my findings where an engineer will chime in and say, "Hey, you didn't mention those unattached elastic IPs," at which point I can often hand them a quarter and say, "Here you go, you've now turned a profit on this hour of meeting. Now, the next bullet point says $800,000 a month. Let's go back to that." It's almost an urge to go alphabetically rather than starting with the big numbers and working your way down.
Globally, we've seen–and there have been reports published on this by vendors. EC2, for example, is ballpark of 60% of global AWS spend. You add in S3, RDS, data transfer, Elastic Block Store, you're up to 85% and then there's a very long tail. No one has ever hired me to optimize their Amazon Chime bill. That doesn't tend to happen. Something else that I think people are still surprised by is I've never seen a significant Lambda bill.
Anytime a company is spending thousands on Lambda, they're spending hundreds of thousands or millions on EC2. Sure, you wind up with some spend in other places, but focusing on things that are easy, things that make more sense in the short term and getting the quick wins in before focusing on the bigger stuff is something that people tend to gloss over. They think everything has to be a hard engineering problem and it's really not.
Turn off stuff you're not using. Delete data you don't need anymore. Make sure that your applications are built in such a way that they're not speaking through a managed NAT gateway all the time. It's basic blocking-and-tackling stuff that you can look into before you wind up going down the road of building custom bots to spin up and down your developer environments as people leave the office because you've hooked them into a geo-tracking system.
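Starting with the big numbers rather than going alphabetically amounts to sorting the bill by spend and watching the cumulative share. Here is a minimal sketch of that; the service costs are invented for illustration and are only loosely shaped like the proportions mentioned above.

```python
# Hypothetical monthly spend per service, in dollars; not real billing data.
bill = {
    "Elastic IPs": 25,
    "Lambda": 2_000,
    "EBS": 30_000,
    "Data transfer": 60_000,
    "RDS": 70_000,
    "S3": 90_000,
    "EC2": 600_000,
    "Long tail of other services": 147_975,
}

def rank_by_spend(line_items: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (service, cost, cumulative share of total), largest cost first."""
    total = sum(line_items.values())
    ranked = sorted(line_items.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0.0
    for service, cost in ranked:
        running += cost
        rows.append((service, cost, running / total))
    return rows

rows = rank_by_spend(bill)
for service, cost, cumulative in rows:
    print(f"{service:<28} ${cost:>10,.0f}  cumulative {cumulative:6.1%}")
```

Reading the output from the top down puts the largest opportunities first and pushes items like unattached Elastic IPs to the bottom, where they belong in a prioritized review.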
Dann: One of the things that I've noticed from my experience that often surprises people who work in the cloud is data transfer. That's one that might not be the biggest opportunity for savings but it is often the biggest surprise. Just because you think of data transfer in terms of, "It's free coming in; you pay going out," there are so many different ways that they get you with data transfer, whether it's across AZs, across regions, doing different load balancers or different everything. They're all different pricing models, and I found almost always a surprise.
Corey: Absolutely. Moving one gigabyte of data in AWS from one place to another is anywhere from free to 24¢ per gigabyte or more depending upon exactly what you're doing. There really isn't a rhyme or reason to a lot of this, but understanding exactly how your application is built or things like that is important. One thing I'm careful to do is to highlight this, but telling people, "Okay, time to stop and redo your entire software architecture so you can save data transfer money," is usually not realistic either.
There's things to consider, the next generation of your app, but virtually no one rebuilds their application from the ground up solely to save money. There are cost optimizations to consider when you're doing that, but it's never going to be the re-factor decision point.
Dann: Hopefully, you're not in the place where that is your best option because, then, you might not have built your application for the cloud, which is possible if you're a large organization that comes from a background of not being cloud native, but it's rare, I would say.
Corey: My rule of thumb to answer our earlier question, then–and your numbers may vary–is: use one of the applications when you're below a million bucks in annual spend. When you're between $1 million and, let's say, $30 million a year, it's, "Hi, let's talk," and, too far beyond that, you start to rapidly hit a point where your needs become specialized enough that having at least one or two people internally either on a fractional or full-time basis focusing on cost optimization is valuable. Those are my tiers. Anything horribly objectionable about what I just said to you?
Dann: No, and I think that's about right. The only thing that I would say is it's hard to add exact numbers to that. The range is good, but it really has to do with your business' needs, and how you're operating in the cloud, and what your relationship with those cloud providers is.
Corey: Put a big "-ish" on that. Yeah, I think that probably makes everyone more comfortable.
Dann: Exactly. Whether you're using Amazon exclusively or whether you're also using GCP, Azure, the other providers, there's so many different factors to take into account with whether you need somebody full-time.
Corey: The question then becomes–and I want to make sure that we address it without begging the question by assuming aspects of it–how do you convince developers that costs are important without slowing them down? But to back up first, is that something that makes sense to do for everyone?
Dann: Yeah. In this role, it's really tricky because the last thing you want is to slow down people who are developing in the cloud. Definitely, I see my role specifically not as a gatekeeper, and that's always how I communicate it. I'm like, "You do not need to come to me for permission to do anything. Operate as usual. Grow as the business needs dictate." Really, my goal is to open up those lines of communication where, if there's temporary capacity coming up versus just general auto-scaling growth or whether you're going to be adding a certain amount of capacity that, "Oh, we should actually buy Reserved Instances for this," being a part of that conversation and adding that additional voice to it is really valuable.
When working with developers, my goal is to be able to express the value of just having additional eyes on what's going on and being able to share what you're working on with additional people to just build the amount of knowledge that's going towards a particular project.
Corey: The role can either become that of a gatekeeper where you keep people from provisioning things until certain criteria are met or it can be a trailing function that cleans things up. I find the latter tends to be the much better approach. Back in datacenter days, we dealt with six-week provisioning cycles, if we were fast, to get a new set of servers spun up. If you cut that in half to a three-week provisioning process to get EC2 resources spun up, you know what most line managers have? A company credit card.
Suddenly, you have shadow IT on the rise. The advent of cloud was in no small part due to the fact that, sure, it was more expensive, there were security and compliance concerns, data residency issues, but, on balance, you didn't have to deal with those smug jerks in central IT. Now, if you wind up recreating those patterns, you'll see the same type of thing start to emerge again. It's about working collaboratively with people and not yelling at them when they get it wrong.
If you have a developer who spins up a testing environment that happens to comprise a $20,000-a-week cluster, great, it's time for a conversation, but that conversation doesn't need to start with screaming as you crash through the door to that building that they're in. It tends to wind up being something that has much more nuance to it. Sometimes, it's intentional. Sometimes, it's people don't know, and let's not kid ourselves. This stuff is not simple.
Looking at the names of various instance sizes, there's nothing intuitive about the fact that, between a T3 and a P3, one of them will cost you half a cent an hour. The other one will cost you, in some cases, upwards of $40 an hour. There's no way to tell that by glancing at it. Building controls, building things that report on strange usage patterns, that makes sense. Yelling at humans is usually the most counter-productive thing you can do in this space.
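As a rough illustration of "things that report on strange usage patterns," here is a minimal Python sketch. It assumes you already have a list of daily cost totals from your bill; the function name, the seven-day window, and the 2x threshold are all hypothetical choices for illustration, not anything described in the conversation.

```python
from statistics import median


def flag_unusual_days(daily_costs, window=7, threshold=2.0):
    """Return indices of days whose cost exceeds `threshold` times
    the median of the preceding `window` days.

    `window` and `threshold` are illustrative defaults (assumptions),
    not recommended values.
    """
    flagged = []
    for i in range(window, len(daily_costs)):
        # Median is used so one earlier spike does not distort the baseline.
        baseline = median(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            flagged.append(i)
    return flagged


# Made-up daily spend in dollars; day 8 mimics a forgotten test cluster.
costs = [100, 105, 98, 102, 110, 99, 101, 103, 2950, 104]
print(flag_unusual_days(costs))
```

A report like this is meant to start a conversation with whoever launched the resources, not to block them from launching anything.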
Dann: You mentioned the two approaches. One is becoming a gatekeeper and two is kind of trailing afterwards. I had mentioned that a lot of my work is detective work, so looking at the bill, trying to attribute the cost increases or anything else to different initiatives, and I think that's really where you can add the most value when you're trying to cut costs. Opening up those lines of communication so that you're minimizing your detective work so that, as people are doing new things, the company (and when I say "company," I mean people like finance who are not actually in engineering) is aware of that and understands how that's going to affect cost. Being able to model that if possible is really where the value comes in.
Corey: Absolutely, and it all comes down, in my experience, anyway, to getting the people in finance and the people in engineering sitting down and talking to one another. That's something that doesn't necessarily come naturally to many of these groups because, historically, they don't talk. They don't need to talk. The world of cloud is changing that, and being able to tie engineering more closely into business decision-making and strategic prioritization is important.
I still talk to clients, and the things I discover are fascinating. I come back with an assessment and I point out that the company does not value cost-cutting measures. Invariably, I'm told that I'm wrong and that's not true at all. Then, I point out that an engineer did a project that saved $8 million a year. When they brought that up during their performance review, the response was, "Well, that wasn't tied to any of our KPIs so we're going to ding you for not getting another feature shipped during that time period instead." It comes down to a story of valuing what you measure.
I think that there tends to be a sometimes fundamental misunderstanding. When you start seeing engineers get bonuses for finding creative ways to save money, bounded, of course, then you start to drive a culture of cost optimization. I'm not saying every company should or even most companies should, but if you are passionate as a company about saving money, you need to incentivize the behaviors that you want to see.
Dann: Yeah, I think it's interesting that you said it's a culture of cost optimization because that kind of stuff is totally a culture thing. That's really what makes it so tricky. When you have a lot of these larger, older companies that have a culture that might not fully understand and appreciate cloud usage at a higher executive-type level, it can be really tricky just because the understanding isn't there. There also has to be a willingness to learn at those higher-up levels in order to appreciate it, and a lot of these changes are difficult. Yeah, there's no easy answer.
Corey: Absolutely. I know how to fix the billing issues. I don't know how to make people care about it. I've spoken to companies spending nine figures a year, and they're just fine with that because it isn't in the strategic roadmap to worry about optimizing those things. Good for them. I'm not saying they're wrong. I am saying that, when I see something like that and no one is empowered to care or do anything about the bill, from a pure business perspective, I have no market opportunity in those environments, and that's fine.
I'm not going to be able to compel people to re-prioritize things and, in many cases, I strongly suspect they're right. There's an upside potential to a lot of these businesses that goes far beyond what you can do by optimizing costs. I can save you a theoretical maximum of 100% on your cloud bill, but you can triple that by launching the right feature to the right market at the right time. I can't tell you that it's time to optimize your bill; that has to come from you.
Dann: Yeah, and that's so interesting, too, because it's all business decisions and it's so much larger than just people focusing just on AWS and cloud spending because you launch a new feature, you gain X number of new clients. I think a lot of engineers might not necessarily appreciate sales as much as–this is an overgeneralization, but sales and engineering are often two very separate companies. A lot of times, the culture of a company doesn't necessarily mix them.
If you have a salesperson that brings in some large company that's bringing in millions of dollars a year, let's say, then saving the 200K on X thing doesn't really compare. If you're just in the engineering world and not paying attention to any of the sales side of things, you might say, "Okay, well, this optimization is super top priority." You definitely need, one, people being able to view the business from the high level and make those calls and, two, just the culture of communication to be able to share the business as a whole with everybody on the team so that everybody is facing the same direction.
Corey: Absolutely. Most of my clients are not engineering side; they're finance side. Talking to engineers about this when I occasionally get outreach from someone–the good citizen effort, as I tend to think about it–I will often find myself in conversations with engineers where they're incensed because their monthly bill is $80,000 and they think it should be $40,000. They may be right, but I start asking questions.
"Okay, what does your boss say?" "Oh, I can't get her to pay attention to me." "Okay, how many engineers are working on this?" "50." "Okay, what's the purpose of your group?" "Oh, we're chasing a market opportunity that might be worth $4 billion a year and, in six months, we'll know or not." At that point, the answer largely becomes a, "I've got news for you. Your team is embezzling more in office supplies than you're wasting in cloud costs, and it's time-bounded and there's a bigger picture here. I appreciate what you're saying, but that's not valuable to the company strategically at this time, and that's why your boss doesn't care. That's why she's focusing on other things. I applaud the good citizen effort to save money, but that doesn't add value to what your company is working on right now."
Understanding that distinction is, in many ways, part of the educational process I wind up having to put some of my clients through, and it's fun. I enjoy having these conversations. I enjoy seeing how different organizations view the world. For better or worse, you'd think that working on cloud costing would be an incredibly boring job, but I'm learning about this stuff constantly. Every day, I see something that I didn't know existed. It's really a privileged position.
Dann: Yeah. If you look to anybody who actually focuses on costs and think that they know everything, that, I think, is a misstatement just because there are so many new things that come about every day and so many little intricacies. Especially in your role, working at different companies and getting all those different experiences, you can see how different people approach it, see one company might be using one service that another company isn't, and really get that wider perspective. There's just so much there, so much there.
Corey: There really is. If people want to talk to you more, or see what you have to say, or simply marvel at your majestic mustache, where can they find you?
Dann: You can find me on Twitter. It is @dannberg.
Corey: Perfect. I'll throw a link to that in the show notes. Dann, thank you so much for taking the time to speak with me today.
Dann: Yeah, thanks so much for having me. It's a pleasure.
Corey: Dann Berg, cloud ops analyst at Datadog. I'm Corey Quinn, and this is Screaming in the Cloud.