Did you know that you can now run Lambda functions for 15 minutes, instead of dealing with 5-minute timeouts? Although customers will probably never need that much time, it helps dispel the belief that serverless isn’t useful for some use cases because of such short time limits.
Today, we’re talking to Adam Johnson, co-founder and CEO of IOpipe. He understands that some people may misuse the increased timeframe to implement things terribly. But he believes the responsibility of a framework, platform, or technology should not be to hinder certain use cases to make sure developers are working within narrow constraints. Substantial guardrails can make developers shy away. With Lambda, they can do what they want, which is good and bad.
Some of the highlights of the show include:
Companies are using serverless as a foundation and for critical functions
Serverless can be painful in some areas, but gaps are going away
Investing in the Future: Companies doing lift-and-shift to AWS are looking at technology they should choose today that’s going to be prominent in 3 years
Serverless empowers new billing models and traces the flow of capital; companies can choose to make pricing more complicated or simplified
What value are you providing? Serverless can offer flexible pricing foundation
When something breaks, you need to be made aware of such problems; your Amazon bill doesn’t change based on what IOpipe does, which is not true of other monitoring tools
Developers are the ones woken up and on call, so IOpipe focuses on providing them value and help; they are not left alone to figure out and fix problems
Serverless and event-driven applications offer a new type of instrumentation and observability to collect telemetry on every event
For serverless to go mainstream, AWS needs to up its observability level to gather data to answer questions
AWS, in the serverless space, needs to make significant progress on cold starts in other languages, and offer more visibility and easier deployment out of the box
Full Episode Transcript:
Corey: This week’s episode of Screaming In The Cloud is generously sponsored by DigitalOcean. I’m going to argue that every cloud platform out there biases for different things. Some bias for having every feature you could possibly want offered as an added service at varying degrees of maturity. Others bias for, “Hey, we heard there’s some money to be made in the cloud space. Can you give us some of it?”
DigitalOcean biases for neither. To me, they optimize for simplicity. I polled some friends of mine who are avid DigitalOcean supporters about why they're using it for various things, and they all said more or less the same thing: other offerings have a bunch of shenanigans around root access and IP addresses, while DigitalOcean makes it all simple. "In 60 seconds, you have root access to a Linux box with an IP." That's a direct quote, albeit with profanity about other providers taken out.
DigitalOcean also offers fixed-price offerings. You always know what you're going to wind up paying this month, so you don't wind up having a minor heart issue when the bill comes in. Their services are also understandable without spending three months going to cloud school. You don't have to worry about going very deep to understand what you're doing. It's click a button or make an API call, and you receive a cloud resource. They also include very understandable monitoring and alerting.
Lastly, they’re not exactly what I would call small-time. Over 150,000 businesses are using them today. Go ahead and give them a try. Visit do.co/screaming and they’ll give you a free $100 credit to try that. That’s do.co/screaming. Thanks again to DigitalOcean for their support to Screaming In The Cloud.
Corey: Hello and welcome to Screaming In The Cloud. I'm Corey Quinn. I'm joined this week by Adam Johnson who's the cofounder and CEO of IOpipe. Welcome to the show Adam.
Adam: Hey, thanks and good morning.
Corey: It is morning. One thing I want to start with here is a disclaimer that I am an IOpipe customer. You're not paying me to say that, and you're not sponsoring this episode. You are effectively here because you're doing interesting things in the world of Serverless observability, or "observerless" as I insist on calling it. This is not a paid placement; I just love the service, and getting you folks involved in what I'm doing is always something I try to do. I think you're the second person I've had from IOpipe, with Erica being the first.
Adam: Yeah, definitely. Thanks for the kind words, for sure. I'm happy to talk today about some of the things we've been seeing.
Corey: Perfect. Before we dive too far into customer stories and specific things that you've seen, I want to start with a question just for my own curiosity. About a week or two ago at the time of this recording, there was a sort of stealth announcement that wasn't widely publicized: you can now wind up running Lambda functions for 15 minutes, instead of the 5-minute Lambda function timeouts that we saw before. Where do you land on that?
Adam: That's pretty interesting. I think that feature, and previous features that were rolled out quietly, were probably done for really specific customers of theirs that had very specific use cases. I generally am happy with them increasing that time. I don't think most people are ever going to use it, but it's nice just so that people aren't saying that Serverless is not useful because it has this five-minute limit. Fifteen minutes is a very long time, you can do a lot of stuff with that, and I think it does open up some new use cases, especially in machine learning and doing a lot of distributed stuff.
There was recently a paper that came out where they were essentially doing a lot of machine learning type work in a very distributed way. I forget who put that out, but it was an interesting use case, and those definitely weren't as possible with the five-minute limitation. While it may not be common, I think it's nice to open it up to more use cases.
Corey: I've heard whispers that I was never able to substantiate that they were doing things like this on a case-by-case basis, extending the Lambda runtime for very specific customers. Obviously I can't confirm that, but it's something I wound up hearing about. On one hand, it's neat that they're making this available to everyone now. My concern is that this feels like the sort of thing that's going to empower three or four really helpful use cases and several tens of thousands of absolutely terrible architectures where, "Yay, we're one step closer to shoving our entire monolith into a Lambda function." Now if only they'd give us more disk, more RAM, more connectivity, etcetera. I've got to say, I'm a bit of a skeptic on this.
I come from a world where I went through the whole process of naivety as a developer: I would build an awesome system that I was sure would fix all the problems of the systems that came before it, and there was no way people would misuse it. Then I saw what customers did once they got it into their hands, and that scared me. I don't know, there are some things I will never be able to unsee based upon people implementing things terribly. "This is a finely crafted torque wrench; we're going to use it as a hammer."
Adam: Yeah, it's true, but I think at the same time if you look at the history of Serverless, I would consider something like platform as a service an early iteration of Serverless, and things like Google App Engine, which is a great platform, didn't really take off the way they had anticipated. I think one of the main reasons for that is because it was very opinionated and had too many guardrails for developers, and with guardrails that substantial, developers will kind of shy away from using that framework or technology or platform. I think that's why Lambda has been very popular: it hasn't really been as prescriptive as earlier incarnations.
It's much more welcomed by developers because they can do what they want in general, and that's good and bad for sure. People are always going to write terrible things that shouldn't exist, but I think it should be up to them. It should be up to education; it shouldn't be the platform's responsibility to hinder certain use cases just to make sure that developers are working within the narrow constraints that the provider decides for them.
Corey: Very diplomatically put. I'm certainly not going to argue that point with you. One thing I'm curious about, since you're in a better position to see what the industry is doing with Serverless than I am: how are you seeing people use this? I keep viewing the idea of Lambda, Serverless, all of this as something that in its current state is something of a toy. You replace cron jobs with it, you can wind up implementing trivial things, but it's not the sort of thing that you would build an entire business application or SaaS platform on top of, and yes, there are notable exceptions there, to be clear. I don't believe that's going to be the case forever; I think we're probably about 18 months away from seeing some transformative shifts in that space. But I'm curious as to what you're seeing today. I can make naive assumptions all morning long; I'd rather hear what you're seeing here in the real world.
Adam: I hear that all the time. I talk to VPs of engineering and people like that, and a common comment I hear about Serverless is that Lambda is mainly just for cron jobs or toy applications. They can't really fathom that it's used for anything critical. But we do see a wide variety of stuff with our users. We see companies that are startups who are kind of born Serverless, who are building their startup with Serverless as the foundation, which gives them some advantages over their incumbents.
If you build something from the ground up with Serverless in mind, that suddenly opens up a lot of different opportunities that your competitors don't have; your competitors may have very limited ways in which they can charge their users based on how they consume computing resources, limits the startup doesn't share. I think it's opening up those use cases. We typically see that those startups are doing the most interesting things, primarily because they're starting with all greenfield and they're trying to go all in on Serverless.
It may be somewhat painful in some areas, but I think, as you said, over the course of 18 months many of those gaps are going away, just as we've seen the duration increase and cold starts become less and less of an issue over time. I think it's still an issue for some languages, but for languages like Node and Python, the impact is very minimal these days.
This is something that you don't find AWS talking about, but they are quietly improving these things to the point where it's extremely minimal. I think on the other side, beyond startups, we do see larger companies, large enterprises who are traditionally laggards, starting to embrace Serverless before the companies we would traditionally consider early adopters. That's super interesting to me, and it was very unexpected to see. I think what's going on is that the early adopters jumped on the Kubernetes bandwagon very early on and they're deep down that path, and they don't want to make a change right now because they've already invested so much in that direction.
Meanwhile, there are all these laggards who are just now going to the public cloud. The majority of the market is still not in the public cloud, so there's a lot of change still to come, but those companies who are deciding to do the lift and shift to AWS or other public clouds are looking at the technologies they should choose today that are going to be prominent in three years. A lot of them are looking at this and making a decision like, "Should I invest in containers or should I invest in Serverless?" I think most of them will end up doing a mix of both.
But I think they have to place their bets on where they think things are going to head in the future, and a lot of them see Serverless as an interesting way to leapfrog the early-adopter competitors in their space: instead of their developers worrying about setting up clusters and coding infrastructure, they can just spend their time building and shipping business logic. If they're doing that, in my mind they're certainly going to have an advantage in the coming years over the competitors who were the early adopters.
Corey: There's a lot in what you just said that we can unpack, but one thing I want to focus on is the idea that this empowers new billing models. I don't mean for you to throw anyone under the bus in particular, but the idea of being able to trace the flow of capital through your organization, as Simon Wardley says, is something that's compelling. As this accounts for more and more of the workloads a company runs, it enables you to do that, but it also sort of unlocks a Rube Goldberg pricing chart that is going to scare the crap out of an awful lot of people.
"Well, every time you wind up listing the users, we're going to charge you a quarter of a penny; every time you query that, we're going to charge you a tenth of a penny." It turns very quickly into this thing where the pricing model does not make sense to a human being. Are you seeing startups going in that particular direction, or are you seeing it in a more, how do I put this, human sense?
Adam: Definitely the latter. I mean, I think it's possible to do that, but like you said, it's pretty obvious that if you have such a complicated pricing structure, it's going to be very hard to convince people to buy into it. What we do see is somewhere in the middle, where they have a lot more flexibility on their margins to either lower their price in general with simple pricing structures, or change it to a different model that's quite flexible but not as complicated.
For example, if you're using the service, you're consuming compute, so you should pay for it at that point; but if you're not using the service, you may not have to pay for it. I haven't seen that happen prominently yet, but I think it's possible, and I'm interested in seeing what comes out of it. I don't know what the winning pricing models are going to look like, but it is opening up the use case, and I suspect that some startups are going to realize this and start taking advantage of it to differentiate.
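To make the pay-for-what-you-use point concrete, here's a rough sketch of how a serverless compute bill scales with usage. It uses Lambda's two pay-per-use dimensions (requests and GB-seconds); the rates below are illustrative placeholders, so check current AWS pricing before relying on them.

```python
def lambda_cost(invocations, avg_duration_ms, memory_mb,
                per_million_requests=0.20, per_gb_second=0.0000166667):
    """Rough per-period compute cost for one function, ignoring the free tier."""
    request_cost = invocations / 1_000_000 * per_million_requests
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * per_gb_second

# Zero usage means zero compute cost, which is the property that lets a
# vendor built on serverless pass a usage-based price through to customers.
idle = lambda_cost(0, avg_duration_ms=200, memory_mb=256)
busy = lambda_cost(2_000_000, avg_duration_ms=200, memory_mb=256)
```

A vendor with this cost structure can price per API call or per event processed and keep a predictable margin, without the Rube Goldberg pricing chart that scares customers away.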
Corey: It makes sense that you wind up having a pricing model that's at least loosely coupled to what it actually costs you, and I think being able to get that level of granularity into what it costs to provide a service internally is incredibly valuable just from a business metrics perspective. With that said, on the other side of the coin, I've always been a big believer in charging based upon value as opposed to charging based upon cost.
It feels like the former winds up in a sort of an escalating chain the longer you do something and the latter tends to generally lead you into a race to the bottom. I'm worried that there are going to be some stories around Lambda that end that way.
Adam: Yeah, I agree with that as well. I'm in the camp of charging based on value as well. Even for our service, I've had folks at AWS that want us to do more metered pricing based on very specific things around Lambda.
Corey: Everyone hates that.
Adam: Exactly. It's missing the point of what value you're actually providing to teams, and I think that should really be where the focus is. But I am in favor of having a foundation: if Serverless opens up a foundation that gives you more flexibility in the choices you're making on your pricing and opens up higher margins, that's always a good thing, because I do think that in the startup world, startups are pricing their products too low.
Everybody starts by pricing too low; they don't really understand the value they're providing to their end users in the early days, and that value increases over time, so I think it's just a trend that happens. But yeah, I'm with you in being a little bit scared by that trend continuing and going down to zero, because that's just not helping the end users in the long term. They may be saving money in the short term, but if that startup, or company in general, is not getting the margins, they can't reinvest in creating new technologies and new innovations. So to me it becomes a world where customers just jump from vendor to vendor looking for the next cool thing. That's a lot of time spent switching as well.
Corey: Exactly. Again, as one of your customers, there's a keen appreciation I've developed for the way you wind up pricing things. I think every month I've had you folks in place, you have cost significantly more than the Lambda functions you monitor, because my Lambda bill hovers somewhere around 60 cents. And that's fine, because the value of understanding what that application does is worth far more to me than 60 cents a month.
I care about understanding and seeing what happens. To be clear, I have several different applications running on Lambda, including the entire production pipeline for my ridiculous newsletter, which is lastweekinaws.com for those who aren't familiar with it. The sign-up link for that leverages two Lambda functions and an API Gateway. That's the thing I've set IOpipe to wake me up in the night over if it winds up breaking.
If someone can't subscribe to the newsletter, that's a problem I need to be aware of. I think I've seen that alert go off all of three times in the past four or five months, and every time it was someone doing something bizarre: not formatting an input correctly, not trying to operate in good faith, or a penetration test I wound up commissioning that triggered some of it. That's exactly what I want. It's not excessively noisy, and it's not something I want to roll into a larger platform that winds up managing 15 different things that are vaguely correlated. It does one thing, it does it extremely well, and I can integrate it into the rest of my view of my business. That's something I find incredibly valuable.
Adam: Yeah, absolutely.
Corey: The idea of trying to tie this into something that varies is nuts. More to the point, my Amazon bill doesn't change based upon what you folks do. With a lot of monitoring platforms, that's not true. I've done trials where the monitoring system cost me nothing but doubled my CloudWatch bill out of the blue. That tends to be an intensely frustrating conversation.
Adam: Yeah, I agree. Especially with the trend of using more and more third-party services, which I'm a fan of. It is complicated, but I think in general it helps anybody build things that just weren't possible before. It does add that complexity of, when you make a change to this dependency, how is it going to affect the pricing of everything else? That's super complicated, and I don't think there's been a great solution around that yet, for sure.
I do agree that the value companies provide really should be where things are priced, and for us, it's about providing more confidence to developers who ship their code. They don't want to get woken up in the middle of the night; you want to make sure that what you're shipping is working, and when there is a problem, you want to quickly know whether it's your problem or one of those third-party services you're using.
Is it a database I'm relying on, or some authentication service, or what have you? You want to get to those answers as quickly as possible, and one of the trends we're seeing in Serverless is that developers are almost always the ones who are woken up and on call for the functions they ship.
Even if there is a dedicated DevOps team, that is how it works. We found that to be the case pretty early on when building IOpipe, so we've been focusing on providing value to the developers themselves: providing them with a service that acts as another extension of their team, that has their back, so that they don't have to dig through mountains of logs to figure out, "Was this my code acting up, some code path that I just didn't expect to happen, or is it just because there was a network blip between the Lambda container and Dynamo?"
That actually happened to us just the other day. We got an alert on one of our data pipelines. We basically have an alert set up for when the number of invocations drops below a threshold. It's reading off of Kinesis, so it's pretty flat; it may go up, but it doesn't go below a certain threshold. So we got an alert saying, "Hey, this dropped below the threshold." We immediately started digging into IOpipe and looking at what was going on. There were unexplained things happening on the Lambda side that pointed us very quickly to Lambda possibly having a networking issue on the container itself.
It ended up fixing itself, fortunately. That's the nice part about Serverless: when things like that do happen, they're typically very quickly resolved. But it is important to have the tools and visibility so that you can understand, was that their problem or was it my problem? Is there something I can do to avoid that happening in the future? A lot of times in my past, I've seen ops teams who just don't have an explanation. The thing fixed itself, and they're like, well, I don't know what caused it, but it fixed itself, so hopefully it doesn't happen again.
That's not great, so it's really important to have that level of visibility to understand, "Yes, I can see these exact events that came through, and I can go back and use it like an audit trail to understand how many of these requests were slow into which service." I think that starts answering the questions and pointing the finger at the right provider.
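The alarm Adam describes boils down to a simple check: a steady pipeline's invocation count per window should never dip below a floor, so any dip is a strong signal that something upstream broke. Here's a minimal sketch with invented names; in practice this would typically be a CloudWatch alarm on the function's invocation metric rather than hand-rolled code.

```python
def invocation_floor_alerts(window_counts, floor):
    """Return the labels of windows whose invocation count fell below the floor.

    window_counts: list of (window_label, invocation_count) pairs, e.g. one
    entry per five-minute window of a Kinesis-fed function that normally
    runs at a roughly flat rate.
    """
    return [label for label, count in window_counts if count < floor]

windows = [("09:00", 1200), ("09:05", 1180), ("09:10", 85), ("09:15", 1210)]
alerts = invocation_floor_alerts(windows, floor=500)  # flags the 09:10 dip
```

The key design point is alerting on *absence* of traffic, not just on errors: a networking blip that silently stalls a consumer produces no error logs at all, only a drop in throughput.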
Corey: I will absolutely say that there's an incredible level of frustration with the way the native tools are positioned around visibility into a Lambda function. "Oh, just set up this complicated thing, tie 15 things together, and look in three different places to make sure that the esoteric, arcane log message has the data you need." Being able to look in one place, rather than chasing down that giant laundry list of items, was incredibly helpful when I was doing early debugging.
Once the application was up, stable, and "done" (yes, I know things are never done, don't email me), you wind up in a scenario where at that point you just want to see anything that happens that's out of the norm. For me, that's either inputs that aren't valid email addresses, or the third-party API I'm bouncing off of acting up, which I'm still annoyed by; I'm in the process of replacing the component in question. It winds up getting to a point where I just don't hear from the monitoring system.
I don't think I can point at any other application I've ever worked on and say, "Yeah, it was quiet, except when something was broken." Dialing that in was always a work in progress and never done. I don't know if that's something I can thank you folks for, if it's an artifact of the entire Serverless model, or if I just write such perfect code that, unlike all of the idiots I used to work with previously, I know what's up. I had to teach myself Python for this project, so I promise it is not that one.
Adam: Right. I think Serverless itself, and event-driven applications in general, are opening up a new type of instrumentation and observability where you can, and may even be forced to, collect telemetry on every event going through the system. If you look at previous incarnations of monitoring and observability, it's really about aggregations. If you look at some of the very popular tools out there and the resolution they provide, they give you one-second resolution.
In one second, at a very high-volume service, you may have hundreds of thousands of events or more flowing through a Lambda function, and if you only have six metrics that tell you what happened during that one second, you have no idea what really happened. You may know that something was slow for five minutes, but you don't know who was affected. If you're processing email sign-ups or orders, you don't know which users were affected.
So I think that, because of the way event-driven systems, and especially Lambda and functions as a service, operate, tools like IOpipe and others are starting to collect more and more of that data, more of that telemetry, and you can go back and use it almost as an audit trail: see which emails were skipped during an outage, or which orders failed to execute due to a network blip in the container.
These are things that just weren't possible before, and I think there are many reasons for that. In general, advances in technology and the reduced cost of storage over time have allowed us to start capturing all of that data. That, to me, is the foundation for the next generation of observability tools. I think all of the existing tools that just show aggregate data are insufficient in this world.
Once you start using a tool that provides that level of resolution, where you can see every single event and the telemetry around it, especially at the business logic level, you can't go back once you've seen that.
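The per-event telemetry Adam contrasts with one-second aggregates can be sketched as a wrapper that records one record per invocation. This is an illustrative sketch, not IOpipe's actual agent; the handler and field names are invented, and a real agent would ship records to a collector rather than hold them in a list.

```python
import time

TELEMETRY = []  # stand-in for a real telemetry collector

def instrument(handler):
    """Record duration and outcome for every invocation, not an aggregate."""
    def wrapped(event, context=None):
        record = {"event_id": event.get("id"), "error": None}
        start = time.perf_counter()
        try:
            return handler(event, context)
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            TELEMETRY.append(record)
    return wrapped

@instrument
def handle_signup(event, context=None):
    # Toy business logic: reject anything that doesn't look like an email.
    if "@" not in event.get("email", ""):
        raise ValueError("invalid email address")
    return {"status": "subscribed"}
```

With one record per event, you can answer "which sign-ups failed during the blip?" by filtering `TELEMETRY`, instead of only knowing that an aggregate error rate rose for five minutes.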
Corey: No, and I don't think there's ever going to be any putting that genie back in the bottle. I think that ship has entirely sailed. There's no real path forward for going back to the opaque things we used to accept as normal once you become accustomed to what this unlocks and empowers. I suspect it's going to be a bit of a long road to get this into the mainstream, but we're seeing it around the periphery an awful lot. Are you seeing something different, in the sense that this may come sooner than people expect?
Adam: I think it will take time. In general there are only a couple of startups providing this right now. For Serverless to go mainstream, I think the cloud providers need to provide this themselves, out of the box. The level of tooling that AWS provides around Lambda is not even close to sufficient right now, and I think that's a big hurdle. This, of course, is not really helpful to my startup IOpipe, but I think for the sake of Serverless in general, AWS needs to really up their level when it comes to observability.
They need to start collecting all of these things, they need to make it so you're not having to jump through hoops to answer questions, and they have to give you the appropriate telemetry to actually answer the important questions, which they're not doing right now. That's going to be a big blocker for Serverless until the service providers step up their game.
Corey: Let's move on to the dangerous portion of this episode. Specifically, you and I have rough ideas, from various directions, of what might be coming down the pipe in future releases for re:Invent at the end of November. Without violating trust, confidences, etcetera, what are we hoping to see come out of AWS in the Serverless space? Are there things you're excited or hoping to see, or things that annoy you that AWS isn't doing yet? Or is this such a landmine that we shouldn't even mention there's a conference coming up next month?
Adam: There are certain things I can't say, but I can tell you what I hope will exist, and I have no idea whether any of these will be announced or not. These are just straight gaps in the current ecosystem. One of those is seeing them make significant progress on cold starts in other languages. They've already done it for some languages, but if you're using something like Java, which a lot of people are, the cold start situation in that world is very painful.
Corey: And a sarcastic answer of, "Oh, just don't use those languages," is the sort of language bigotry that I think serves no useful purpose anymore. It's all in fun, we all have our favorite teams we like to bet on, but let's not urge people away from their platform of choice just to prop up something else. That tends to be a terrible model, so I'm with you on that.
Adam: They probably won't announce anything, because they generally don't talk about cold starts or even the improvements they make to them. We've noticed the cold start impact getting lower and lower, and they just don't talk about it. It's something they quietly fix, which is fine. In general, at every re:Invent they've added more language support to Lambda, so I'd expect more of the languages people want. Some people want PHP or Perl or whatever; let them use those languages. That's an interesting one that may come out, and we can take bets on which languages get supported. I would never have won the bet on the last one: PowerShell was definitely not on my radar, but that's kind of interesting.
Corey: If you'd told me there were new languages that would be supported in Lambda and asked me to build a list in order of likelihood, I'm not sure I would have thought to include PowerShell at all. That's one of those things that comes completely out of the blue. They did it fairly quietly too, which makes it even more interesting to me.
Adam: Yeah, that's definitely another one that was probably done for a specific customer, is my guess. There are other really popular languages out there that I think a lot of people want to see on Lambda, so hopefully they're making some progress there. They're already way ahead of everyone else; I know the other cloud providers are pretty far behind in offering lots of language choices.
That's one area. The other area I'd be interested in is that I would love to see more visibility out of the box into what's going on. I think that needs a lot more effort, and I'm hoping they'll have something to offer there in terms of debugging tools. I think that's still kind of a weak story. I also think the deployment side of things is still quite weak.
One of the biggest complaints we run into in talking to users is that just deploying is still a pain, and I think they have a lot of the pieces in their arsenal to put something interesting together. So hopefully that's something they're working on as well.
Corey: I will give Amazon credit: they don't tend to sit and watch customers suffer. They may seem to at times from the outside, but internally, I've never yet had a conversation with an Amazon employee who was made aware of a customer issue and didn't care about it. Very often, I find that when I come to them with an engineering problem that annoys the heck out of me, they won't respond with, "Wow, no one's ever said that before." What they'll say, very honestly, is, "Yeah, we know, and because of X, Y, and Z, we're not able to do anything about that right now. We're working toward it, but it's more complicated than it looks from the outside." And I do believe them. There are no simple problems when you're dealing at their scale and at this level of complexity.
Adam: Right, yeah, I totally agree with that. I think the other big component, beyond future releases, is that there needs to be a lot more education happening from them to get more adoption. I know they're doing a lot, but I think they need to spend more time at the various levels of organizations to help those companies decide whether Serverless is right for them or not. That involves everyone from the developer level all the way to the top of the organization.
Corey: I would agree and I think that's probably a decent place to wind up calling it an episode. Will you be at re:Invent on a booth? Will you be wandering around sadly looking for scraps of food? Where do people catch up with you?
Adam: Yeah, so we actually are going to have a booth for the first time. It's going to be in the expo hall, we're actually going to be in the Aria. I think it's near the registration, there's going to be a little startup area and we're going to have a little tiny booth there.
Corey: To be clear, this is something that's sanctioned by Amazon. This is not effectively you deciding, "Yeah, they won't give us a booth, so we're going to make our own," and just setting up a table or something.
Adam: We tried that in the past, it didn't work out well. But this time it's official.
Corey: Yeah, the security is on point for this.
Adam: It is, they're really good about it. But yeah, we're going to be in the Aria, which I believe is the hotel where all of the containers and Serverless talks are going to be. If you're there and you're going to those talks, stop by and see our booth. We're going to have some interesting giveaways, and we have a really interesting demo we're putting together with DeepLens and Lambda to show some interesting things you can do with observability and video.
Corey: Perfect. I look forward to venturing out of the Venetian maybe and catching up with you over in the Aria.
Adam: Sounds good.
Corey: Thank you so much for your time today. I'm Corey Quinn, this is Adam Johnson of IOpipe, and this is Screaming In The Cloud.
This has been this week's episode of Screaming In The Cloud. You can also find more of Corey at screaminginthecloud.com.