Cloud Security Adoption Curves

This Anti-Cast was originally aired on January 22, 2024.

In this video, Andrew Krug discusses the latest advancements in cloud security, focusing on new features introduced at AWS re:Invent and how they can help organizations enhance their security posture. Throughout the session, Andrew highlights the challenges and fears associated with adopting new security controls, offering insights and strategies to overcome these barriers. The video provides a comprehensive overview of key AWS security updates, including Assume Root, Resource Control Policies, and Security Lake, and emphasizes the importance of intent-based security in today’s cloud environments.

  • The webinar addresses the challenges and barriers to adopting new cloud security controls, highlighting the fear and difficulty in implementing these changes.
  • AWS’s evolution over the years is discussed, emphasizing the importance of understanding the security implications of their services and the potential for misconfigurations.
  • Intent-based security is introduced as a concept to enforce security measures by defining what actions should never occur, aiming for stronger security practices.

Transcript

Andrew Krug

I’m actually really excited to give this talk today because this is kind of a preview of some of the updates that are coming to Securing the Cloud Foundations. I hope it’s going to be an opportunity for everybody in the chats, and I’ll try to keep an eye on all of them, to ask questions about what came out at AWS re:Invent, because there was a pile of very quiet announcements, what we call pre:Invent announcements.

Like maybe they didn’t end up in a keynote or something on stage, but they were really major changes that are, they’re really going to make an impact if people actually adopt them.

So that’s what we’re going to talk about: why are people not adopting these things? What are the barriers to adopting new security controls? And what myths can we dispel for some of the most impactful of these new things?

Daniel Lowrie

I’m going to make a prediction. They don’t do these things because it’s scary and hard to do at first, because it’s all brand new, and I don’t want to break anything.

Right. Would you think that that’s probably the number one reason why people don’t implement new security controls or just new features even?

Andrew Krug

Yeah, there’s a little bit of that. Testability is a big one. But also just firmly understanding what the impact of a new feature is, and how to roll that back if things go wrong, is something that we talk about a lot in the class. But let’s get right into it here. So if you’re not familiar with me, this is me and where to find me.

I respond to all my emails and LinkedIn DMs almost on a daily basis. So if you ever thought you can’t reach an instructor from Antisyphon or you can’t talk to us, you can talk to me and I will respond to you.

So always feel free to ask questions on Discord or via email or whatever. My day job is at Datadog. I lead the security advocacy and research team there, which gives me insight into a ton of what’s going on in some of the world’s largest cloud accounts.

Today we’re going to use a tool called Slido in the presentation. So if you want to join the Slido here, you can ask questions via that.

And then we’re going to also have some like, interactive polling during the, during the session. So I’ll just leave this up for like a hot second. Have you ever used Slido, Daniel?

Daniel Lowrie

I can’t say that I have. Explain this Slido magic–

Andrew Krug

It’s a pretty cool tool. And because we’re live on a bunch of different platforms right now, sometimes we use the Zoom Q and A, sometimes we use Discord. This kind of like just bridges all those gaps.

So, hoping that it works for this. This is my first time trying it on a live stream. You can of course ask questions the normal way.

Daniel Lowrie

And yeah, I’ll totally be watching Discord and Zoom for the chats just to make sure that we hopefully don’t miss any of your wonderful and useful questions for this talk.

So if you do have questions and you just want to do it the old-fashioned way, you can. But Slido is bridging gaps; it’s bringing people together.

Andrew Krug

Yeah. And I would love as many questions as the audience wants to ask, because this is not a demo-heavy or content-heavy session. It’s more of a conceptual, informational kind of session, which is not what we normally do, but some of these things don’t have a good hands-on demo.

It’s more about talking about the reason why these new security features exist and then what the impact is of implementing them.

Daniel Lowrie

Well, where do we begin?

Andrew Krug

So where we begin is a story about the AWS cloud, which is that this is the year that the AWS cloud is almost old enough to drink. I never thought I would say that AWS as a business is almost 21 years old.

That’s nuts to think about.

Daniel Lowrie

How long have you been working in AWS? Is AWS your bread and butter?

Andrew Krug

Yeah. I decided to specialize in the AWS cloud in 2012, so it was pretty early. It wasn’t the beginning of AWS or anything, but it was pretty early days for the AWS cloud.

I was working in operations, and really I was interested in it because it allowed me to scale out during one of the most stressful sales cycles of the year for the business I was in.

And so that was what drew me, the lure that brought me to the AWS cloud in the first place.

Daniel Lowrie

That was a good one. It’s so funny to think of how it started and, like you said, that’s been 21 years, and in that really short span of time they have become the powerhouse.

They run so many things that we just take for granted every day. I use it all the time. I like AWS. I know a lot of people use DigitalOcean or GCP or even Azure for a lot of stuff, but for just run-of-the-mill, everyday things I need in a cloud-based environment, I typically run to AWS.

Andrew Krug

Yeah, I mean it’s still my cloud of preference. I do quite a bit of work in GCP. I dabble in Azure. But this is still the cloud that is the fastest and most familiar for me.

And I think in terms of security innovation, it’s definitely in the best position in the market, just in terms of the talent and responsiveness to customer requests. But that’s my opinion, not based in fact.

So I’m wondering, from the audience, hopefully folks were willing to join the Slido. If you’re not, that’s okay. You can just put your answers in the Zoom chat. What was the first service that AWS ever launched?

Let’s get some votes in here.

Daniel Lowrie

It feels like S3 might be it. Well, hey, okay, now we’re starting to see some movement here.

Andrew Krug

EC2, I got 24, 26 responses. People are going hard on S3.

Daniel Lowrie

Maybe that’s because it’s the most well known of all the AWS services, right? I feel like it is, anyway. If I mention AWS to somebody, I’m like, S3 buckets.

They go, oh yeah, I’ve heard of that. So I feel like it gets. And maybe that’s for. It’s more notorious than it is.

Andrew Krug

Yeah. Keep in mind that back at this time in the history of Amazon, Amazon was an online bookstore as well. So that drove a lot of the technology that they were building internally.

And then they started to say, wow, we have all this extra capacity all the time, what do we do with it? And they were trying to figure out how to monetize that as a business. So I’ll go ahead and stop the poll here. A lot of people answered S3: 52%. And 27% said EC2.

And we got a little bit of SQS and networking. But the answer is actually SQS. It was the very first service that Amazon took to market, trying to just monetize their message queue service, because it was very highly durable and very inexpensive.

SQS remains unchanged today since its launch in 2004.

Daniel Lowrie

That is amazing. When you say unchanged, define unchanged for us.

Andrew Krug

So, the customer experience of interacting with this basic queuing service. I’m sure that behind the scenes the way that they scale it and ensure that it’s highly durable, multi-region, et cetera has changed, but the customer experience has not changed in the entire time SQS has been a thing.

Daniel Lowrie

I’m going to give Amazon big kudos to that because there is nothing like, most of us have to deal with Microsoft in some way, shape or form and they love, love, love, love to go, hey, that thing you’re so used to doing and it’s right there, it’s all muscle memory.

Yeah, we changed all that. That’s gone. We like to put it over here now underneath the dropdown, which is a completely unintuitive dropdown by the way, because that’s fun. And it’s like Microsoft has got the middle finger up to us in a lot of ways when it comes to why and where they move things.

I feel like they’re just pulling stuff out of a hat. But the fact that Amazon has a service that for the end user has not deviated, it is a warm sweater that they put on in the night and feel good about their lives.

That is a special thing.

Andrew Krug

Yep. So if we look at the timeline here for how the cloud evolved, really we had that first service that came out in 2004, which was great.

It democratized message queuing, which was kind of a hard thing at the time. And then 2006 is when S3 came out, and they started to have the AWS brand.

So S3 was designed to just provide really simple file access, and it also allowed you to charge people back for files that were accessed, which was kind of a game changer at the time because you could build a business on top of this. So, brand new.

And then in summer of 2006, shortly after the launch of S3, they started offering compute, and that compute was offered in what was more or less a flat network.

The network was called EC2-Classic at the time. So if you were to run tcpdump, for example, on one of the network interfaces on one of those systems, you’d actually see other customers’ traffic if it was unencrypted.

In fact, EC2-Classic, that style of networking, was finally retired, I believe, two years ago. But shortly thereafter they came out with what they call VPCs, or virtual private clouds, which was the software-defined network.

And this was the first example of something like this: they realized there was a security issue, they rolled out a product almost immediately, and then it took 15 years for them to deprecate the thing that came out with EC2.

Daniel Lowrie

Yeah, that seemed like a pretty good step in the right direction. You being able to just dump all the data that’s just coming across like it’s one big mirrored port going.

Hey, what’s this? Oh, that looks like a... Yeah. Correct me if I’m wrong. Is that an API key? I’m gonna grab that just for looks. I’m not gonna do anything with it. I’m sure that never happened.

Would you say that it was probably things like that? It always kind of baffles me, things of that extensive nature. I mean, you’re putting everything in the clear.

It’s one big flat network. Anybody that’s on that network can see everybody else’s traffic out of the gate. That seems like a problem like that should not have made it past the diagram board on what we should be doing with this.

So I’m glad that they saw that that was an issue and quickly corrected. But how the heck did, how was that a thing to begin with?

Andrew Krug

I think that there was just a fundamental assumption that folks understood more about how the underlying technology worked than they did. That continued to be a theme that was carried forward as services evolved and defaults evolved.

And we don’t just see this in AWS cloud, we see this on the Microsoft side as well in rolling out things that are easy to use, easy to onboard to, and the pattern that is easy to onboard to is not an inherently secure pattern.

Daniel Lowrie

Yeah, I’ve always heard done is better than perfect. Just get it to work and ship it and we’ll fix it later. And that does seem to be the standard operating procedure for most vendors out there.

So not, I’m not surprised that they did it. I’m just surprised that it is a thing because, and I guess I get it at the end of the day, we probably would still be waiting for many a product we take for granted had we waited for them to get it right the first time.

So it’s a, it’s a double edged sword.

Andrew Krug

Yep. So I run into people all the time at conferences and I try to convince them to take my class, and they say, I don’t have anything in the AWS cloud.

Which I think for the majority of folks is almost never true. Because if you don’t think that you have a service that runs in the AWS cloud, you’re probably wrong.

Because even if you don’t directly run a workload in the cloud, you’re probably here in this middle ground, which is that you have some stuff, you just don’t have visibility into it or know that it’s running in the AWS cloud.

And so there are all these different reasons why that might be the case for you. Like shadow IT: people with a credit card can just get an account in AWS, and then you have to go back through your bank statements and figure out that you have this thing that performs a very critical function for your business.

Third-party platforms are often built on the AWS cloud. So if you use a SaaS provider, chances are there’s some point of presence in AWS.

Managed services are another way. If you use an integrated services vendor, they’re probably running workloads on AWS in some capacity. Or you have hybrid cloud and you just don’t know that, because it’s in some part of your business that’s just buried.

And all these cases are opportunities for that implementation to have really sharp corners that aren’t easily visible to you.

These are the things that keep me up at night more so than the environments that we know about, that we audit, that we observe. It’s all these hidden clouds.

Daniel Lowrie

Andrew, is there a way that’s easier for us? Because I’m thinking of traditional asset management where we do audits and we try to find all these things and then go, oh, wow, look at all this stuff. We didn’t really know that we had.

Is there a way to kind of, for lack of a better term, pregame, so that we don’t get caught with, with a bunch of, oh, I didn’t even know we had all this.

How do we keep that from being something in the future where we have a bunch of shadow IT that’s happening, or we’ve got a bunch of third parties that we didn’t realize we were using, or managed services?

How do we keep an eye on that?

Andrew Krug

It’s harder for some of these than others. The easiest one is the shadow IT case because, obviously, if you’re large enough that you have centralized purchasing, you follow the money. Those charges stand out on a credit card statement.

And actually, with some of the larger payment processing vendors for corp stuff (Brex is one, Teampay is another), you can disallow charges to AWS on those platforms so that somebody can’t take one of your corp cards and just create an account outside of review by the team that manages that third-party risk.
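The "follow the money" check can be roughed out in a few lines. This is only a sketch: the CSV column names (`merchant`, `amount`, `cardholder`) and the descriptor hints are assumptions made up for illustration, and real card exports and merchant descriptors vary by issuer.

```python
import csv
import io
import re


def find_aws_charges(statement_csv: str) -> list[dict]:
    """Return statement rows whose merchant descriptor looks like an AWS charge.

    Assumes a hypothetical export with a 'merchant' column.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(statement_csv)):
        merchant = row["merchant"].lower()
        # Tokenize so that "aws" matches as a whole word, not inside "paws".
        words = re.findall(r"[a-z0-9]+", merchant)
        if "amazon web services" in merchant or "aws" in words:
            hits.append(row)
    return hits
```

Anything this flags that the cloud team does not recognize is a candidate shadow IT account to chase down with the cardholder.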

Daniel Lowrie

Somebody said, just lock down the firewall so you can’t access AWS and you’ll find out real quick what ain’t working.

Andrew Krug

Yeah, third-party risk. Obviously you should be asking your SaaS providers what they’re built on, and understanding that as part of your third-party risk process.

And then managed services, same thing. Hybrid cloud gets really interesting because oftentimes one team will break out a section of infrastructure and they’ll do a peered connection or something with on-prem, or they just have, like, one...

Let’s say they’re using S3 buckets as a backup, or AWS Backup, and that’s not highly visible. Those are things that you can only figure out through all the traditional means, like having visibility into what your teams are building.

Daniel Lowrie

Makes sense. Sounds like what your answer is, is you’ve got to do some work. You’ve actually gotta keep an eye on it.

Andrew Krug

You gotta do some sleuthing.

Daniel Lowrie

Well, and that’s how things slip through the cracks, right? Because you could easily see these things if you were looking at them. But because we’re probably not, that stuff slips through the cracks, right?

Andrew Krug

Yeah. So to kind of set the stage here: as part of Securing the Cloud Foundations, on the first day of the class, we always talk about famous data breaches and what some of the common patterns are in these data breaches.

But I haven’t updated that in a while. So aside from what we do in the course, I wanted to look at what were the coolest, most interesting breaches of last year, because there are some new themes here.

A lot of it’s still the same as it has been, but this one was kind of fun. I don’t know if you’re familiar with the ShinyHunters data breach.

Daniel Lowrie

Was this the one where they got into all the cell phone providers and were stealing, was that. That one’s.

Andrew Krug

So ShinyHunters was just a hacker group that was profiling and going after businesses, effectively looting and storing the data, then combing through the data in an automated fashion and using that to pivot even further.

They were doing it using AWS. That’s kind of the coolest part of this story.

Daniel Lowrie

Their infrastructure was all built in AWS?

Andrew Krug

Yeah, they had been around since 2020. They had a theft of over 200 million records across 13 or more companies, according to the Bugcrowd article referenced there.

They were really good at doing what they did, in the sense that anytime they found a credential of any kind, like an OAuth token, they would have automations that would try to figure out: was that a GitHub OAuth token? Can I use that then to pivot into somebody’s AWS account?

They had this very sophisticated tool chain that was really helping them compromise more cloud environments. And this is a theme I think that we’re going to see because five years ago hackers were not that good at leveraging all of these credentials.

They were fine at like, oh yeah, we got an access key, we’re going to spin up some GPUs and do some bitcoin mining. But the, the level of understanding and sophistication of these very complex authentication chains is increasing.

And what’s really cool about this one is that these guys were actually the victim of their own misconfiguration, which is that they misconfigured an S3 bucket, which is how they were discovered.

Daniel Lowrie

Oh, it’s funny how you become like the same things that work that you use as an attacker against the people you are attacking. I’ve always heard that hackers, like threat actors have the worst opsec.

They, they just, they’re just spinning stuff up and throwing it against the wall. If it works, cool, we made some money. But they’re probably not implementing the things that they should be doing as well to keep themselves safe as much as they should.

Andrew Krug

Yep. So this is great. The bad guys get it wrong sometimes too.

Daniel Lowrie

That’s right.

Andrew Krug

There was also this Sisense breach, which CISA actually issued an advisory for. This followed a fairly common pattern in that it was a hard-coded credential in a private repository.

The data exfil was several terabytes of data. Unfortunately, due to the nature of this business, it was relatively sensitive data. And so Sisense and CISA had to do an advisory to tell people to rotate all these credentials that they were not prepared to rotate.

The entire session tool chain, Azure sessions and things, was in that list. So that was an interesting one. Not super notable, though.

It follows a pretty common pattern that we’ve seen for the last 10 years of people just putting credentials in places they don’t belong, like private Git repositories.
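To make the hard-coded credential pattern concrete, here is a minimal sketch of the kind of check a secret scanner runs over repository contents. It only looks for strings shaped like AWS access key IDs (the `AKIA` and `ASIA` prefixes followed by 16 more characters); the function name is hypothetical, and real scanners cover far more credential types and also look for the secret key itself.

```python
import re

# AWS access key IDs are 20 characters: a 4-letter prefix such as AKIA
# (long-lived IAM user key) or ASIA (temporary STS key) plus 16 more characters.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, candidate_key_id) pairs found in the given text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings
```

A private repository does not change the result: the key is readable by anyone who can clone or exfiltrate the repo, which is why a hit should be treated as a rotation event rather than a cleanup task.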

Daniel Lowrie

We just do that for a minute. That’s just for testing. We’re going to take it out. It’s not a big deal. I don’t see the problem.

Andrew Krug

Yep. Or they think a credential that can only do something as innocuous as read data isn’t sensitive enough to encrypt, which is a pretty common trap that folks fall into.

The other thing that we’re seeing a lot of this year is targeted supply chain malware. And this is something new. Actually, I gave a talk on this with Zach Allen at last year’s re:Invent, re:Invent 2024, on how we discovered all this different supply chain malware and profiled this North Korean hacker group that was really trying to get malware into the software supply chain for PyPI and npm.

The most common one that we see is called BeaverTail. BeaverTail malware is just a command and control back channel. But the threat actors that are getting BeaverTail onto these systems really know how to target cloud systems.

And so what we see when this lands on a system in AWS, via the bash history and things, is that the pivoting and understanding of how to do recon in AWS has really leveled up.

Daniel Lowrie

And why would you say, you say that’s coming from a realization that a lot of things are basically exposed in AWS in some way, shape or form. And if you understand that ecosystem a whole lot better, then you can build malware and tool chains and things that allow you to easily exploit that.

Andrew Krug

Yep. And we’re seeing that as well. Just in general, there’s an assumption that if one of these things detonates, it’s on a cloud system, so it’s relatively easy to fingerprint and identify.

In some cases, when it’s detonated on a developer laptop, not only will it look for cloud provider access keys, but it will also go after things that it can assume about a software engineer.

In one case it was getting cloud provider keys first and a Solana wallet second, because software engineers are interested in crypto. So we can assume that maybe there’s a crypto wallet present.

And so like, they’re, they’re still making the malware very small, very light, but the things that they’re going after are highly targeted.

Daniel Lowrie

Makes sense, right? Go after the good stuff. Nothing like getting, some, some keys to their kingdom. And then, hey, if you’ve got any crypto lying around, I’ll take that too.

Andrew Krug

Yep. And then we’re also seeing all the same stuff that we’ve been seeing for the last 10 years. So there are still general hygiene problems with long-lived credentials. Multi-factor authentication is not strongly or consistently enforced.

We’re still seeing a lot of credential leakage via things like SSRF, even though we have mitigations for those things today. And then we also see the ShinyHunters kind of use case, where people are still accidentally making things public.

And Chris Farris has a really good post about this on his blog, where he goes on this rant about how there are like a thousand ways to make a file public in S3 and there’s like one way to make it secure, like, not public.
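One of those many ways is a bucket policy that allows every principal. The sketch below flags that single case by inspecting the policy JSON; the function name is hypothetical, and it deliberately ignores the other routes to public access (ACLs, access points, overly broad conditions), so a clean result here does not mean a bucket is private.

```python
import json


def public_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that name every principal and carry no Condition."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    risky = []
    for stmt in statements:
        principal = stmt.get("Principal")
        wildcard = principal == "*" or (
            isinstance(principal, dict) and principal.get("AWS") in ("*", ["*"])
        )
        if stmt.get("Effect") == "Allow" and wildcard and "Condition" not in stmt:
            risky.append(stmt)
    return risky
```

This is the "I'll just open it up" fix from the conversation expressed as data: one wildcard principal is enough to turn a working bucket into a public one.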

Daniel Lowrie

That, that is an interesting thing. I love that you put, quotes around accidentally public because they’re like, it’s not working. I can’t access this. to me, this is always how misconfiguration occurs is you build the thing and you go, hey, users, I built the thing, you can use it.

And they go, I can’t access it. You go, oh, that’d be really hard to work out exactly the perfect credentials. I’ll just open it up to the public. There you go. Can you access it now? Great, it’s working.

Oh, crisis averted. Yeah.

Andrew Krug

And we’re kind of living in the age right now where this is becoming more important than ever, just due to the fact that AI is made or broken on the data set that we’re fine tuning and training models on.

So the rise of the importance of understanding when not to mix public content and private content or creating security boundaries around data sets is so important.

So that’s what we’re seeing. Some of it’s not surprising and some of it is new. So that’s what’s going on in 2025. So, this is my list of the things I would like you to please, please, please do if you are in the AWS cloud today.

That’s my Sabrina Carpenter.

Daniel Lowrie

Is the next one for today about an espresso?

Andrew Krug

No, but I could have put a theme through this.

Daniel Lowrie

I want your entire slide deck to be just Sabrina Carpenter references and memes. That’ll be fun.

Andrew Krug

I thought about it, but unfortunately a lot of the pictures of Sabrina Carpenter on Google Images are not appropriate for a broad, safe-for-work presentation.

Daniel Lowrie

I mean, we’ve got people that can put the right things in the right places. You’ve just got to reach out; they can help. Anyway, moving back to cloud stuff.

Andrew Krug

So, the answer here is that we really need to increase the rate at which we’re adopting the new security controls, because the problem isn’t that the mitigations don’t exist. The security boundaries, the guardrails, the architecture patterns: they’re there.

It’s really just how quickly folks can adopt those. And some of it, I think, is perception, and some of it is what I’m now calling cloud security tech debt. Maybe you moved into the AWS cloud 10 years ago and the person that moved you in there doesn’t even work for you anymore.

That workload that’s just been kicking around, that you lifted and shifted: you don’t even know how to recreate it if you accidentally deleted it, kind of a thing.

So, there’s different levels of comfort that different folks have with adopting these things that we’re going to talk about.

Daniel Lowrie

Yeah, we, we were kind of talking about that earlier. Right. It’s like, it can be scary. It can be like, well, everything’s working and I don’t want to throw a rock in the pond and cause a bunch of ripples. And this can be really difficult.

Even if there is a tool, it might be difficult to implement and understand and get it just right. And then you could fall into a false sense of security. There are a lot of crazy, weird reasons why we don’t implement some sort of security functionality that is new.

Andrew Krug

Yep. And the best example I can highlight, which I highlight in the class and people talk about all the time, is the Capital One data breach, a really famous data breach that impacted a ton of folks.

It was a misconfigured WAF that allowed an SSRF to grab a credential. The credential was overly permissive, et cetera, et cetera. Well, at the time in 2019, when Capital One was breached, there was no real mitigation for SSRFs when it came to the metadata service.

However, the first blog post about the danger of SSRFs impacting the metadata service was three years prior.

So this is kind of interesting, because the problem had been identified, folks were talking about it and raising awareness for it, and for three years nobody really did anything until one very large customer was heavily impacted.

And then AWS rolled out the second version of the metadata service in 2019. So right after the Capital One data breach, the response was amazing.

The feature rolled out. There was an immediate mitigation for SSRFs being able to grab these credentials, regardless of how permissive they were. But the adoption curve on this was awful.

And we’ll look at how awful it was over time. It wasn’t until last year, 2024, that they actually rolled out an API call that allowed you to force new instances being launched to use the new version of the metadata service.
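For readers who have not looked at the second version of the metadata service, the difference is a session token. The sketch below builds the two requests with the standard library: a `PUT` to `/latest/api/token` with a TTL header, then a `GET` that presents the token. The helper names are hypothetical, and `fetch` only works when run on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT that asks IMDSv2 for a session token (TTL up to 6 hours)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET for a metadata path that presents the token in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch(path: str) -> str:
    """Run both steps. Only meaningful on an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=2) as resp:
        return resp.read().decode()
```

The reason this helps against the Capital One style of attack is that many SSRF bugs only let the attacker choose the URL of a `GET`. They usually cannot switch the method to `PUT` or add a custom header, so they cannot obtain or present a token.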

Daniel Lowrie

So before this, it was luck of the draw. You had to know to implement the new metadata service.

Andrew Krug

You had to know to implement the control. So between 2019 and 2024, you could have adopted this control at any time.

For some CSPMs, IMDSv1 was considered a misconfiguration, but we didn’t get enforcement until 2024.
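The check a posture tool runs for this is small. As a sketch, assuming input shaped like the EC2 `DescribeInstances` response (the `MetadataOptions.HttpTokens` and `HttpEndpoint` fields), the function below lists instances that still accept token-less IMDSv1 requests. The function name is hypothetical.

```python
def instances_allowing_imdsv1(describe_instances_response: dict) -> list[str]:
    """Return IDs of instances that still accept token-less (IMDSv1) requests."""
    flagged = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_on = options.get("HttpEndpoint", "enabled") == "enabled"
            tokens_required = options.get("HttpTokens") == "required"
            # An instance with the endpoint disabled has nothing to protect.
            if endpoint_on and not tokens_required:
                flagged.append(instance["InstanceId"])
    return flagged
```

A finding from a check like this is advisory only. Nothing stops the next instance from launching with `HttpTokens` set to `optional` unless enforcement is configured separately.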

Daniel Lowrie

And I’m assuming they kept the old version around for backwards compatibility with things that were built on that, and they didn’t want to just pull the carpet out from underneath people. But I mean, we’ll talk...

Andrew Krug

We’ll talk about that. We’ll get there.

Daniel Lowrie

Okay. Okay.

Andrew Krug

But if you’re wondering what this adoption curve looks like, it didn’t really start to get better until they added that control to enforce it.

So this is a graph from the Datadog State of Cloud Security report that I have the privilege of being an author on every year. We look at Datadog customers, which are probably heavily skewed towards the positive in terms of their overall security maturity and DevOps-iness.

But you can still see, as of September 2024, and we’ve been measuring this now since 2022, we’re not even at half.

Daniel Lowrie

Well, we’re getting there. What’s your hurry? I don’t understand. Where’s the fire, Andrew? It’s, So what, what do you think that is? Do you think that’s me?

Yeah, I’m gonna let you answer. I’m not, I’m not going to throw anything out there. Give us, give us your estimation of why we’re only at half when it comes to this.

Andrew Krug

So we analyzed this a little bit, and we really wanted to answer that question. Was there anything that couldn’t be moved to IMDSv2?

What percentage could just go over there without any functional impact? We did this by analyzing CloudTrail data, and what we found is that over 60% of instances could move without any impact.

So the reason why people don’t move to this, I think, is part an awareness problem and part just a lack of understanding of how their code base actually interacts with the credential provider tool chain.

And so this is a case where attackers understand how this stuff works sometimes better than the folks that are operating it. And that is kind of unfortunate.
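The talk does not spell out how the CloudTrail analysis was done, so the following is an assumption about one way to approach it. CloudTrail events made with instance-role credentials carry a `userIdentity.sessionContext.ec2RoleDelivery` value, `"1.0"` or `"2.0"`, indicating which metadata service version delivered the credentials. Grouping those values per instance shows which instances have only ever used version 2. The function names are hypothetical.

```python
from collections import defaultdict


def imds_versions_by_instance(events: list[dict]) -> dict[str, set[str]]:
    """Map instance ID -> set of ec2RoleDelivery values seen in its API calls."""
    seen = defaultdict(set)
    for event in events:
        identity = event.get("userIdentity", {})
        delivery = identity.get("sessionContext", {}).get("ec2RoleDelivery")
        # For instance-role sessions the principal ID ends in ":<instance-id>".
        session_name = identity.get("principalId", "").rpartition(":")[2]
        if delivery and session_name.startswith("i-"):
            seen[session_name].add(delivery)
    return dict(seen)


def ready_for_enforcement(events: list[dict]) -> list[str]:
    """Instances whose observed credentials all came from IMDSv2."""
    versions = imds_versions_by_instance(events)
    return sorted(i for i, v in versions.items() if v == {"2.0"})
```

This only sees credential use that reaches an AWS API. Software that reads other metadata paths over IMDSv1 without calling AWS would not show up, so it is evidence for a safe move, not proof.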

Daniel Lowrie

Well, like you say, it’s an unfortunate thing. It’s just how it goes, I guess, from time to time. And there’s nothing really to do about it other than to try to help promote and push the knowledge that you can do this.

And here’s how.

Andrew Krug

The folks that would have issues adopting a control like this would have been people that went way out of their way to write custom integrations with the credentials, service, the metadata service on the box.

But if they were deploying, let’s say, using the SDKs that AWS provides for every language under the sun, all they would have had to do is just move to the next dot release of the SDK and everything would have been fine.

Companies that had an easy time moving to this also used infrastructure as code, and then they were able to measure the adoption through observability. That’s one of the things when adopting a control like this: when you turn it on, you want to know as quickly as possible whether it is actually breaking the environment.

And without an observability tool that’s doing basic things like ingesting logs and distilling metrics from those, like denials, or what the specific version number is in the API call itself, you simply can’t know.
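As a rough sketch of the kind of metric described here: assume the logs have already been distilled into (instance ID, IMDS version) observations. The function name `imds_readiness` and the sample data are hypothetical, and how those pairs get extracted from real logs is left open.

```python
from collections import defaultdict


def imds_readiness(observations):
    """Summarize which instances still rely on IMDSv1.

    observations: iterable of (instance_id, version) pairs, where version
    is "1.0" or "2.0". Producing these pairs from your logs is up to you.
    """
    versions_seen = defaultdict(set)
    for instance_id, version in observations:
        versions_seen[instance_id].add(version)

    # An instance counts as "ready" only if it was never seen using 1.0.
    ready = sorted(i for i, seen in versions_seen.items() if "1.0" not in seen)
    blocked = sorted(i for i, seen in versions_seen.items() if "1.0" in seen)
    share = len(ready) / len(versions_seen) if versions_seen else 0.0
    return {"ready": ready, "blocked": blocked, "ready_share": share}


# Hypothetical observations for three instances.
sample = [
    ("i-aaa", "2.0"),
    ("i-aaa", "2.0"),
    ("i-bbb", "1.0"),
    ("i-bbb", "2.0"),
    ("i-ccc", "2.0"),
]
report = imds_readiness(sample)
print(report)
```

A number like `ready_share` is what turns "I am afraid this will break production" into something that can be reasoned about before enforcement is switched on.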

Daniel Lowrie

It’s all abstracted away and hidden.

Andrew Krug

Yeah. And so, given that explanation, I am curious from the audience, the folks in Slido: how confident are you that you could implement IMDSv2 without breaking your production environment?

Daniel Lowrie

I don’t know why, but my mind went there: it would be a more fun question if it said, how confident are you that you can implement v2 and it will break your production environment? It’s just 100% positive.

Yes, I will break something. And that is what it is, myself included. You start to look at something new: okay, I need to make this work.

I want to do it right the first time. And I know that even though it doesn’t matter how much research I do and blog posts I read and documentation I pour through, I’m gonna get something wrong because I’m gonna misinterpret how they mean X, Y or Z and I’m gonna click that wrong switch and then it is coming down like a house of cards on me.

I’m gonna get in trouble, I’m gonna lose my job, or I’m gonna get sanctioned some way, shape or form because I did the wrong thing. It’s just that that nugget of fear is always in the back of your head that that will happen.

And man, I don’t think there’s any good way around that. I wish there was. Documentation is probably the best. Good documentation’s worth its weight in gold, and that can keep you out of a lot of the weeds.

So I would just like to see many more organizations focus on making phenomenal documentation.

Andrew Krug

Yep. Good documentation, and sharing some of the learnings from the larger companies that have done this. We just published a blog on Datadog Security Labs about the move to IMDSv2 and what that was like for us, because we have very large environments.

And obviously reliability is incredibly important to us. So it covers all the ways that we did that, gradually over time, and then finally enforced this very strong control.

Daniel Lowrie

Yeah, yeah. I always really like when people tell the story of what they did, and don’t just have it be a technical document of what they did.

A good blog post has a good narrative behind it: hey, today we decided that we were going to do this and here’s how it went, and you just kind of walk us through. Well, it was like any other day. Have someone that’s a really good storyteller write those blogs and work with someone who knows all the technical ins and outs.

Put those two people together, write that as a resource, and that becomes just super valuable to the community because it’s easy to read, it’s easy to digest, and it’s easy to assimilate the knowledge.

It’s not just a brute fact telling you, you click this and this happens. Well, how about when I click that and it doesn’t happen, or that box is grayed out for me? Why is that?

There’s always these little weird edge things that happen, and it can be very frustrating. Storytelling around it just makes it a little more understandable what’s going on.

Andrew Krug

Yep, yep. So, I think most of the folks in the poll answered that they were not confident that if they turned on IMDS v2 enforcement that it wouldn’t break their production.

And that’s kind of the crux of the problem: they don’t know, because they’re not able to observe that thing, or they don’t understand what they need to observe in order to measure what the potential impact of one of these changes is going to be.

And we can make them globally. You can make that change at the instance level or the auto scaler level, or you can turn it off for an entire account or for all the accounts that are part of your billing organization.

And depending on what level you decide to enforce that at, it could be very impactful. Chances are it’s probably not going to be. But largely, it’s not awareness that’s preventing us from adopting these things, because we know the controls exist.

In some cases it could be agility: folks don’t have the right infrastructure as code, or they can’t make a one-line change and update all their stuff.

But in most cases it’s actually fear that prevents them from doing it. It’s not fear that’s substantiated with data. And so if we don’t have data, we can’t reason about impact.

Daniel Lowrie

Well, that’s, that sounds like something we’ve talked about. You’re right. It’s that darn, there it is again coming back. Everybody gets scared and it’s completely understandable. Like if you’re sitting there going, I don’t want to tip my hand that I’m afraid to do this because I’m afraid something will go wrong.

Listen, that is every single one of us. There’s always some modicum of it. Even if I’m confident in the technology and I know exactly what I’m doing, I could easily become overconfident.

Right. And that overconfidence leads me to make a stupid mistake. I’ve done it, I’ve been there, I’ve told the story here where I was in charge of doing updates, and I accidentally stuck production servers in the test environment, and they updated in the middle of the day and rebooted.

That did not thrill many people. So I get it. I get why it’s fear.

Andrew Krug

Yep. So I want to introduce this concept now, which is like a maturity index. We’ve talked in the past about DevSecOps maturity models and scorecards, and I think that’s kind of boring as an analogy.

And this is a big ideas webinar, so, I don’t know, are you familiar with the Kardashev scale, Daniel?

Daniel Lowrie

This is a new one to me. Kardashev. I like the word. It makes me have fun saying things I’ve never heard before. It sounds made up, but technically, I guess all words are made up.

Andrew Krug

But, like, Kardashev was a mathematician slash astronomer, and Carl Sagan also did some work on this. It was basically: as a civilization, how good are you at extracting energy from the things around you?

So a type 1 civilization (and Earth, all of humanity, by the way, is not a type 1 civilization yet; we’re a long ways off) would just be like a planet. You can get all the energy out of the resources on your planet. Type 2 is you can build, like, a Dyson sphere around a star, and you can harness the power of a star.

And type 3 is you can extract energy from, like, an entire galaxy, and you have this massive mastery of time, space, et cetera.

Daniel Lowrie

You are a god at this point.

Andrew Krug

Yeah, yeah. So if we apply this to cloud, these things are all a massive distance from one another. They talk about a type 1 civilization going to a type 2 in, like, hundreds of thousands of years, and type 2 going to type 3 in potentially millions of years, which is a blink of an eye for a civilization.

But it’s a long time from our perspective. And so, yeah, these are great distances in between these phases of maturity.

The difference between, like, the Kardashev scale and cloud stuff is Kardashev assumed that the longer a civilization exists, the better it gets at existing, which is actually not true at all.

For resources that we form in our cloud environments, the longer that they live, it’s actually quite the opposite. They get worse at existing and we get worse at maintaining them.

Daniel Lowrie

It’s the inverse Kardashev scale.

Andrew Krug

It’s kind of interesting: the further you get away from the Big Bang, the more assumptions we can make that maybe there’s a civilization out there that is that advanced and could be observed.

Daniel Lowrie

So I’ve seen pictures of them. They’re pretty cool people.

Andrew Krug

In the case of cloud stuff as well, longevity does not necessarily equal maturity. So if you’ve been operating for 10 years...

Daniel Lowrie

Like, a big complex system has a lot of complexity, but maturity is a different idea, right? Maturity is the fact that we understand this and we’re implementing it at a higher level.

So back to your Kardashev scale, right? That’s really what we’re looking for: maturity out of what we’re building, and not necessarily just a large complex system.

Andrew Krug

Yep. And I think we struggle today to build things simply. I think building simply is kind of a lost art. And simple systems are inherently securable. And because you have this smorgasbord of options like 300ish services in AWS, I think today, you’re tempted to use as many of them as you can.

Daniel Lowrie

Well, it’s right there. You just have to go into the AWS little square thing and click, and there’s all these lovely services. And it all integrates so nicely. Like, oh, yeah, I spun up the EC2 instance and I’ve got a web server going now.

Well, I’ll just go to Route 53 and buy a domain and then I’ll do some DNS there. Heck, I’m already here. Might as well make that happen. Let me get into the firewall real quick. Let me start spinning up some Lightsail stuff.

Oh, what’s this? There’s so many options for you to do stuff, and it’s fairly easy, like you said, out of the gate. It’s not too difficult to implement many of these types of technologies within AWS.

And anyway, I’m not as familiar with the others, but I can see why it could grow pretty quickly.

Andrew Krug

Yeah, so you want to manage that growth. And so I kind of mapped some of the biggest behaviors that prevent people from extracting the maximum value, or maximum energy, from the cloud provider that they’re on.

So type 1 would be: we got some workloads to AWS and we don’t necessarily know how to operate them. Like, maybe the person that did it doesn’t even work for you anymore.

But it works for you. You’re able to use that thing and extract value with basic care and feeding. Type 2 is you’ve leveraged infrastructure as code for most things and you have some observability in your environment.

Multiple accounts or multiple clouds are child’s play, and some security controls are in play. And then type 3 would be: you’re literally living in a reality where everything is automated. Infrastructure is code, and there is zero ClickOps.

You can leave a cloud provider at will because you’ve abstracted everything. If the price of your cloud provider doubles, you can just nope out and go to another cloud. You do things like chaos experiments, you observe 100% of your environments, and you practice this concept that I’m going to introduce next, which is called intent-based security.

Have you ever heard of intent based security before?

Daniel Lowrie

I can’t say I know that terminology, no. I mean, it seems like it’s kind of self-explanatory: intent-based, where I’m intentionally doing security in some way, shape or form.

So I’m building things with security as the focus, not necessarily the thing is the focus or if I am, I know that I have to do it securely, but no, not as a proper philosophy.

Andrew Krug

So intent-based security is kind of getting kicked around. There was a talk at re:Invent by Chris Farris on a very similar concept, which he called security invariants, which I think is just a fancy way of saying intent-based security.

Sometimes people call it guardrails, but I don’t think that really captures it. What it really is, is the ability to declare the things that you never, ever want to have happen, and to create statements that always hold true for your business and applications.

And this is a really great way to get people to think beyond the technology that they’re using and just like say, well this is what we want. In an ideal world, if you don’t want anybody to ever be able to make a public S3 bucket, that’s like a really simple example.

No one should ever be able to make anything public, because my company doesn’t have any public data.
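To make the "nothing is ever public" statement concrete, here is a minimal sketch of a check against a resource policy document. It assumes the standard IAM policy JSON shape; the function names and the sample bucket are hypothetical, and the check is deliberately incomplete (it ignores ACLs, access points, and what a Condition actually says).

```python
def _is_wildcard(principal):
    """True if the Principal element names everyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        if isinstance(aws, str):
            aws = [aws]
        return "*" in aws
    return False


def allows_public_access(policy):
    """True if any Allow statement is open to every principal
    and has no Condition narrowing it."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if _is_wildcard(stmt.get("Principal")) and not stmt.get("Condition"):
            return True
    return False


# Hypothetical bucket policy that violates the stated intent.
public_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}
print(allows_public_access(public_policy))
```

A check like this only detects a violation after the fact; the controls discussed later in the session are about making the violation impossible in the first place.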

Daniel Lowrie

And so this is just basically taking some of the security ideas that we’ve all kind of heard before and saying, we’re going to make this our, again, I want to use the word philosophy, of how we do business.

So we’re trying to change the philosophical view of the culture behind your organization on how you implement new and existing technologies, which is to be always intently security focused.

I want to do the things that are secure, and we’re going to maybe even make a policy that says you will do it that way.

Andrew Krug

Yeah. The difference now, and what’s game changing with intent-based security, is actually the ability to ask an AI assistant: what is the shortest distance between my intent and enforcing this control?

Daniel Lowrie

Mhm.

Andrew Krug

So that’s pretty cool.

Daniel Lowrie

Real quick. Yeah.

Andrew Krug

So if you want to check out Chris’s talk on this, which is an hour long as well, I’ve linked it here in the slide deck. It was one of my favorite sessions from this year’s re:Invent when I went back through the sessions.

But I just want to call out that a big prerequisite to this is that you are already using AWS Organizations, which is the way that you get multiple accounts linked to the same bill.

And with that you get the ability to enforce a whole bunch of different security controls. We talk about this in the Securing the Cloud Foundations class, and we discuss at length what it is to architect one of these things.

But the biggest thing I want to highlight here and a little preview of the course is that this is something that most people think that they need to do when they want to move out of a single AWS account into a multi account organization.

Which is they take the crusty account that they’ve had for a decade, they promote it to an organization master and then they start a five year long migration to move resources out of it.

Instead of doing that, why not just create a brand new account and then adopt your crusty old account into that and then you don’t have to do a five year migration and then all of a sudden you have instant observability of your crusty old account.

Instant ability to enforce these intents at the organization level for all your existing workloads and monitor the impact of that.

Daniel Lowrie

Andrew, what, what is it then? that is kind of driving people or organizations to take the old and busted approach instead of this new hotness of just create a new thing and then migrate everything in and make it work that way.

Why are they doing it the hard way?

Andrew Krug

I think they just don’t think about it; they see the button in the console that says, like, enable AWS Organizations. And by the way, once you do that to an account you can never undo it. You can’t, like, demote an organization.

So at that point you’re forced to do a migration. But basically, the anti-pattern here is running resources in your organization master account, because this is where you’re enforcing security controls.

And because AWS doesn’t want you to lock yourself out, you can’t enforce security controls on the organization’s root. You can enforce them on subordinates, just like Active Directory, like Group Policy inheritance.

Daniel Lowrie

Right.

Andrew Krug

You can’t enforce anything at the domain level still, I believe; you have to enforce it at an OU level, very broadly. Correct me if I’m wrong, somebody that has worked with AD in the last 10 years, but you used to not be able to do it at the domain level.

Daniel Lowrie

So this is, you’re just talking. It’s the path of least resistance and it’s right there. It seems intuitive. Like that’s what you would do.

Andrew Krug

Yep.

Daniel Lowrie

Sounds like we’re going right back to that whole thing: a good piece of documentation or wording is very important in how random people might understand what it is they think they should be doing.

Andrew Krug

Yep.

Daniel Lowrie

And are there any kind of, for lack of a better term, guardrails or safeguards or prompts or anything that’s telling you along the way, like, are you sure this is what you want to do?

Andrew Krug

I think that there might be one box that says this cannot be undone.

Daniel Lowrie

Gotcha. So if you feel like that’s what you have to do, though, like, you’re like, okay, it’s fine.

Andrew Krug

Most companies also already have these in place, because the DevOps team had to create them to do cloud cost management or FinOps, financial ops, type stuff.

And so a lot of times security folks don’t even know that they have organization masters, or they don’t have access to them because they’re so locked down. So a great way to get into these in your company is to ask the DevOps people if you can just have a security audit role to explore and see what features are enabled, how it’s laid out, etc.

So we only have 10 minutes left, so I kind of gotta motor through the announcements here. But I’ll just tell you what the four most impactful announcements are from re:Invent last year.

And for three of the four of these, you have to be in an Organizations environment to use them. The first one is assume root, which is just a fancy way of saying now you can finally get rid of the root users from your AWS accounts, if that’s what you want.

The first, most all-powerful user: they have turned that into something that you can just disable, and then you can create a policy that allows certain people to assume it.

We’ll talk about that. Resource control policies, which is a brand new system that allows you to do intent-based security for public access.

Declarative organization policy, which is also an intent-based security system at the organization level. And then Security Lake and OCSF support; if you haven’t heard about OCSF, we’re going to talk about how cool and impactful that project is for the community.

So going right along here, assume root is obviously a big one. We spent a lot of time in the foundations class talking about what you can and can’t do with the root user account, what you need it for, how to put it in a safe, et cetera.

I’m really excited that I get to just say once upon a time we used to do this and now you can do this instead. So this is great. It’s a one button click and you can actually delegate the access.

This is a really good use case for what’s called time based access. So like obviously nobody really needs root access a bunch of the time and probably if you have a credential in a safe and it’s really inconvenient, almost nobody ever takes it out.

But if you just grant your entire team the ability, all of a sudden, to assume root and become the most privileged user in the account, we actually have a different problem. On the good side of this feature, we’re eliminating one more long-lived credential, and on the downside, we’re potentially creating a privilege escalation path from a regular user account to the most powerful user inside of the account.

So.
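One way to keep an eye on that escalation path is to look for identity policies whose Allow statements cover the `sts:AssumeRoot` action, including through wildcards. The sketch below assumes the standard IAM policy JSON shape; the function name is hypothetical, and it ignores `NotAction`, explicit denies, conditions, and anything an SCP might block, so treat a hit as a prompt to look closer and not as a verdict.

```python
from fnmatch import fnmatchcase


def grants_assume_root(policy):
    """True if any Allow statement has an Action pattern matching sts:AssumeRoot."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for pattern in actions:
            # IAM action names are case-insensitive; * and ? act as wildcards.
            if fnmatchcase("sts:assumeroot", pattern.lower()):
                return True
    return False


# Hypothetical policy attached to an ordinary team role.
team_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "sts:*", "Resource": "*"}],
}
print(grants_assume_root(team_policy))
```

Broad grants such as `sts:*` are exactly how a convenience policy can quietly become a route to the most powerful identity in an account.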

Daniel Lowrie

Gotta love those double edged swords, don’t you?

Andrew Krug

Yeah, it’s a sword that cuts both ways. The next one I just want to mention here is resource control policies, where for the first time in eight years AWS changed the way that they evaluate access decisions and added this RCP filter.

So we used to say that there was no way for you to actually declare in policy that a file or a bucket or an SQS queue or an SNS topic could never be public.

But now because they added this additional layer of policies right at the top of the decision making stack, now we can actually declare at the account level, at the OU level, that we can’t, we actually can’t have public resources regardless of what a user does downstream of that, no matter how powerful that user is.

Daniel Lowrie

So it’s, it’s kind of like a, a deny statement saying if I stick something in here, it doesn’t matter how crazy you get with your permissions and your provision and whatever you do, this will always make sure and ensure that that policy is adhered to.

Andrew Krug

Yep. And this, in concert with strong security boundaries, like account-level separation of public and private data, is going to be the prevention for so many breaches.

But again you have to have that ability to say at the account level, I never want any resource in this account to be public. We have an island over there that we put the public stuff on and that’s fine, but not ever in here.

Daniel Lowrie

I like it.

Andrew Krug

So RCP is a very cool long awaited feature. It took, I’m sure it took moving the earth, in terms of what AWS had to do for architecture to put that decision in flow.

Because if you think about the IAM API, it makes billions of decisions per second and they have a sub millisecond SLA on the amount of time it can take to make that decision.

So some serious engineering there. So this is an example of what an RCP looks like. It looks like the tried and true Identity and Access Management Policy system, but they added a lot of really cool condition keys.

I’ll just call out one here that we talk about in the course, which is StringNotEqualsIfExists on aws:SourceOrgID. This is just a fancy way of saying this data can get shared, but it can only get shared with accounts that I pay the bill for.

Like, only inside my company. That’s such a great guardrail to just put in place, if you don’t know. Yeah.
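For readers who have not seen one, here is a hedged sketch of what an "only inside my organization" resource control policy might look like, built as a Python dictionary and printed as JSON. It is not the policy from the slide. The organization ID is a placeholder, the function name is hypothetical, and this sketch uses the `aws:PrincipalOrgID` condition key for ordinary callers; the `aws:SourceOrgID` key named in the talk applies to requests that AWS services make on your behalf. Check the AWS documentation before adapting any of it.

```python
import json


def build_org_only_rcp(org_id, actions=("s3:*",)):
    """Sketch of an RCP that denies access from principals outside one organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessFromOutsideMyOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    # "IfExists" means a request with no org ID at all
                    # (for example an anonymous one) also hits the Deny.
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    # Leave calls made by AWS service principals alone.
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


rcp = build_org_only_rcp("o-exampleorgid")  # placeholder organization ID
print(json.dumps(rcp, indent=2))
```

The point of the shape is that it is a Deny: it does not grant anything, it only removes the possibility of access from outside the organization no matter what a bucket policy downstream says.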

Daniel Lowrie

Because there’s a lot of ways, or at least there used to be. It’s been a while since I’ve done some AWS security stuff, but there’s some things that you can call as long as you’re an AWS subscriber or whatever, and you have an API key that can connect with the AWS CLI.

You can access all sorts of stuff just by the very fact that you have a login, even if it’s not yours. It’s crazy how these things are able to be done.

I mean, it feels like it’s vestiges of that old flat network you were talking about, where everybody had access to everything. Is that where that came from, or is it just...

Andrew Krug

It really came from the fact that public access in S3 was a feature; it was part of the core set of features at launch. It was designed for web hosting and for exchanging files between companies.

I don’t know that anybody that was an original architect on S3 said to themselves, you know what? This is going to be used to store some of the world’s most sensitive data sets.

Daniel Lowrie

Yeah, makes sense. Somebody’s asking, how will, existing resources be evaluated against this policy? Or will they?

Andrew Krug

So if you put an RCP in place that says nothing can be public, and you have like a bucket policy or an ACL that’s marked a file as public, all of a sudden those access decisions are going to be denied.

Daniel Lowrie

And then the phone will ring, and you will.

Andrew Krug

Which is. It’s different than the old control because we’ve had a control for a few years where you were able to say at the account level, don’t allow anybody to add a policy to a bucket that granted public access.

So if you had bucket A that already was public and a new bucket B, new bucket B couldn’t be public ever until you relaxed that control, and bucket A would still be public, because all it was doing was denying the API call to set the policy.

Daniel Lowrie

Gotcha.

Andrew Krug

It wasn’t in the decision flow for file retrieval. And now this is actually an identity decision that’s being made at read time.

Daniel Lowrie

Like you said, the engineering that went behind this had to be something. I bet they spent a few pizza and wings nights at the keyboard.

Andrew Krug

Yep. So, in order here, the next is declarative organization policy. This is very cool. This follows the security invariants pattern. It is the ability, for a bunch of services (snapshots, AMIs, VPCs, and IMDSv2), for you to make statements like a human would.

I only want instances to be able to launch in US West 2 that are T2 mediums, and they have to have IMDSv2. And then you have that enforced at the organization level and also reported on.
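To illustrate the idea of stating intent and then evaluating requests against it, here is a toy checker. This is not AWS declarative policy syntax and it calls no AWS API; the attribute names, the `INTENT` structure, and the `violations` function are all hypothetical stand-ins for the example statement above.

```python
# Toy statement of intent: each attribute maps to the set of allowed values.
INTENT = {
    "region": {"us-west-2"},
    "instance_type": {"t2.medium"},
    "http_tokens": {"required"},  # "required" is the IMDSv2-only setting
}


def violations(launch_request, intent=INTENT):
    """Return the sorted attribute names that break the declared intent."""
    return sorted(
        key
        for key, allowed in intent.items()
        if launch_request.get(key) not in allowed
    )


compliant = {"region": "us-west-2", "instance_type": "t2.medium", "http_tokens": "required"}
drifted = {"region": "eu-west-1", "instance_type": "t2.medium", "http_tokens": "optional"}

print(violations(compliant))
print(violations(drifted))
```

The same list of violations serves both purposes mentioned in the session: an empty list means the request is allowed, and a non-empty one is what would show up in a report.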

Daniel Lowrie

That’s impressive. That’s going to be a game changer right there. Because then all of a sudden, I, guess it’s going to. Time will tell whether or not it’s a game changer. It seems like it will be.

Depends on how well that functionality works and how well it interprets what you mean by what you say. LLMs obviously have proven that that can be done to a certain extent with, much greater accuracy than we have in the past.

But, barring the fact that it doesn’t go weird on you, that should be very useful.

Andrew Krug

Yep. And this is a complementary feature to another feature that we call service control policies, which have been the way that historically you’ve done this. The downside is, because service control policies sit at the front of that access decision chain, you can’t have a ton of them.

So there was a major service limit on the maximum size that those could ever be. And so this is shifting a lot of that burden out to something that’s very simple and that doesn’t count against the maximum SCP size you can still use alongside this feature.

Some of the most common use cases, I guess, have been moved out to this. So the last thing here on the list that I want to talk about is not really an AWS feature so much.

It is like an industry movement right now, which is that all the major SIEM providers and all the major cloud providers are moving their log taxonomies to a format called OCSF, the Open Cybersecurity Schema Framework.

And we as security people have tried to do this a bunch of times. I don’t know if you remember, like, CEF or any of those other standards that everybody was championing that never really got adoption because they weren’t comprehensive enough.

Everybody kind of had their own flavor. Those were failures, for reasons. This is a very comprehensive framework that all the cloud providers are doing a great job both contributing to and generating logs in, and that’s a game changer on the detection and response side, because we can land this OCSF-format schema in a data lake.

And that is one of the coolest things that didn’t launch at re:Invent this year. It actually launched at re:Inforce two years ago. It’s the ability to take all of these different data sources that AWS knows how to generate, put them in a very cheap data store, and then be able to search them very rapidly using a SQL query interface.

And so what we can start to do is not just detection and response using CloudTrail data, S3 data, and network log data. We can use the same tools and tactics that we use for detection engineering in SOCs to measure behaviors, detect misconfigurations, and really see what’s going on across the entire environment with a single query.
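The appeal of a shared schema is that one query can span several log sources. The sketch below uses an in-memory SQLite table as a stand-in so that it runs anywhere; the table and column names are simplified placeholders, not real OCSF attribute names and not the Security Lake table layout, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (source TEXT, account_uid TEXT, operation TEXT, status TEXT)"
)
# Made-up rows from three different log sources, already normalized
# into the same columns.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("cloudtrail", "111111111111", "PutBucketPolicy", "Failure"),
        ("cloudtrail", "111111111111", "RunInstances", "Success"),
        ("s3_data", "222222222222", "GetObject", "Failure"),
        ("vpc_flow", "222222222222", "Traffic", "Success"),
    ],
)


def failures_by_account(connection):
    """Count failed events per account across every source in one query."""
    rows = connection.execute(
        "SELECT account_uid, COUNT(*) FROM events "
        "WHERE status = 'Failure' GROUP BY account_uid ORDER BY account_uid"
    )
    return rows.fetchall()


print(failures_by_account(conn))
```

Because every source lands in the same columns, the query does not need to know which service produced a row, which is the property that makes a normalized lake useful for measuring behavior and not just for alerting.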

Daniel Lowrie

That’s pretty impressive.

Andrew Krug

So on the right here I just put a snippet that shows the CloudTrail event for IMDSv2 adoption. There’s a line in there, ec2RoleDelivery, that says the version that’s pinned to that instance.

So this is a great way, for example, to measure how many v1 launches versus v2 launches. But you can really do a lot with this. And the rise of detection as code is another thing I think that we’re going to see a lot of this year.
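A minimal sketch of that measurement over a CloudTrail log file body follows. It assumes the version appears as `ec2RoleDelivery` inside `userIdentity.sessionContext` with values such as "1.0" and "2.0", and that the file has a top-level `Records` array; verify both against your own events. The sample records are made up.

```python
import json
from collections import Counter


def count_role_delivery(cloudtrail_json):
    """Count events per ec2RoleDelivery value in one CloudTrail log file body."""
    records = json.loads(cloudtrail_json).get("Records", [])
    counts = Counter()
    for record in records:
        session = record.get("userIdentity", {}).get("sessionContext", {})
        version = session.get("ec2RoleDelivery")
        if version is not None:  # calls not made with instance credentials lack it
            counts[version] += 1
    return counts


# Made-up log body with two instance-credential calls and one console call.
sample_log = json.dumps(
    {
        "Records": [
            {
                "eventName": "ListBuckets",
                "userIdentity": {"sessionContext": {"ec2RoleDelivery": "1.0"}},
            },
            {
                "eventName": "GetObject",
                "userIdentity": {"sessionContext": {"ec2RoleDelivery": "2.0"}},
            },
            {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}},
        ]
    }
)
print(count_role_delivery(sample_log))
```

Tracking the "1.0" count over time gives a simple signal for when enforcement can be turned on without surprises.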

In fact, last summer, one of the folks that works on my team released a tool, an open source, totally free tool called Grimoire, that’s designed to help people write good detection as code or better understand CloudTrail events for specific use cases.

And the way that this works is that you actually just run a command that detonates an attack, where it runs like a script against AWS that is the behavior that you want to detect, and then it goes and waits for all the log events, and then it extracts only those log events and gives them to you.

So then you fully understand what the actual JSON payloads look like that model that behavior.
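The "extract only those log events" step can be pictured as a filter over a time window plus some marker that identifies the test run. The sketch below is not Grimoire’s code or interface; the marker-in-user-agent approach, the function name, and the sample events are assumptions for illustration, and it expects `eventTime` values in the `YYYY-MM-DDTHH:MM:SSZ` form.

```python
from datetime import datetime, timezone


def events_for_detonation(events, marker, started_at, finished_at):
    """Keep only events inside the window whose userAgent carries the marker."""
    selected = []
    for event in events:
        when = datetime.strptime(
            event["eventTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        in_window = started_at <= when <= finished_at
        if in_window and marker in event.get("userAgent", ""):
            selected.append(event)
    return selected


# Made-up events: one from the test run, one unrelated, one outside the window.
events = [
    {"eventTime": "2024-01-22T18:00:05Z", "eventName": "CreateAccessKey",
     "userAgent": "aws-cli/2.x run-1234"},
    {"eventTime": "2024-01-22T18:00:07Z", "eventName": "ListBuckets",
     "userAgent": "aws-cli/2.x"},
    {"eventTime": "2024-01-22T19:30:00Z", "eventName": "CreateAccessKey",
     "userAgent": "aws-cli/2.x run-1234"},
]
window_start = datetime(2024, 1, 22, 18, 0, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 22, 18, 5, 0, tzinfo=timezone.utc)

matched = events_for_detonation(events, "run-1234", window_start, window_end)
print([e["eventName"] for e in matched])
```

What comes out is the small set of payloads a detection rule actually has to match, without the surrounding noise of everything else happening in the account.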

Daniel Lowrie

That’s going to be super cool to be able to do.

Andrew Krug

Yep.

Daniel Lowrie

So you can just verify whether or not your detections are working without actually having to attack the thing.

Andrew Krug

Yep. So this is a complementary tool to another attack simulation tool called Stratus Red Team. But really you could use any combination of things. This idea of detection as code is very, very cool.

And I think we’re just going to see more and more of it over the next year or so, and AI agents helping people generate those as well. So, final question to the audience before we wrap up today, and then I’ll answer any questions.

Which one of the controls that I showed you on screen do you think you need to adopt right away, or what was the coolest for you?

Looks like people are giving some love to Security Lake.

Daniel Lowrie

Yeah, it shifted around pretty quickly there. I was, it was kind of impressive and now, oh, there it goes again.

Andrew Krug

I have a whole webinar out there on the Internet somewhere on Security Lake. But one of the coolest things about it is that it’s designed for compliance use cases, like seven-year data retention, so you can retain tons of logs very inexpensively that you would otherwise have to spend a ton of money on to put in a hot index.

It also allows you to hot load out to third party tools with hot indexes so you can use it as your durable storage and then just pay the expensive bits when you want to pay the expensive bits.

Daniel Lowrie

The, the convenience of the cloud. Right. Is that you only pay for what you use.

Andrew Krug

Yeah, I mean, I go back and forth on this, because if you use Security Lake you’re paying for effectively double ingest. But the longevity of the data and the ability to replay it into a threat detection platform outweighs the double ingest cost for me.

Looks like folks like Security Lake the most and RCPs the least, which is so interesting, because I actually would rank RCPs the most impactful, given that most data breaches are S3 data breaches.

There’s a ton of questions here. It looks like we got some upvotes. We don’t have time for a ton. Oh, somebody put in the slides, which is great.

How do I keep up with all the new services? I just do, is the answer. I attend as many conferences as I can, like AWS re:Inforce and re:Invent. I spend a lot of time watching all the talks from re:Invent after re:Invent is over.

These other two I will answer in the chat. And with that, again, this is me. This is how to get ahold of me, if you want to get ahold of me. If you want to ask questions about the class or cloud security or career stuff, I would love to hear from you.

I always do a survey at the end of these as well. So if you can fill out this survey form, it also has an opportunity to express interest in having a career coaching coffee session with me.

They’re just 15 minutes. I do these with people on a rolling basis. I only have so many slots per month. But, if you get on the list I will get to you eventually. And I love these with people because I get to hear what changes they’re trying to make in their career, what brought them to cloudy stuff.

And I restructure some of my content as a result.

Daniel Lowrie

Very cool. Andrew, thank you so much for joining us today and presenting all this really amazing information about cloud and some of the things that are coming up in the security that’s going to be wrapped around it. If we’re not using it yet, we probably will be very, very soon.