This webcast was originally published on October 30, 2024.
In this video, Andrew Krug discusses the intricacies of blob stores, with a focus on Amazon’s S3 service. He provides an insightful overview suitable for beginners, explaining the default configurations that facilitate data sharing across boundaries, and highlighting the common issues and misconfigurations that can arise. Through the session, Andrew offers practical guidance on securing S3 buckets, including encryption practices and lifecycle policies, while encouraging interaction and addressing audience questions.
- S3 buckets are often misconfigured due to a lack of understanding about their global namespace and public accessibility features.
- Effective security for S3 involves implementing guardrails at multiple levels, including logging, account-level controls, and encryption of data at rest and in transit.
- Lifecycle policies for S3 data are important for both compliance and operational resilience, ensuring data is archived or deleted as necessary to reduce liability and storage costs.
Transcript
Andrew Krug
So today we're going to talk about blob stores, particularly with regard to S3, and if you haven't used these services in the past, this is going to be a good overview of just how people use them.
So don't be concerned if you're coming from zero knowledge. I'd like to keep this one relatively interactive, because I think this is a service that people traditionally have a lot of questions about operating, but they're not necessarily always in a position where they can ask somebody who's operated a bunch of S3 buckets at scale.
So, if you have any questions, I'd encourage you to drop those in that Discord chat and I'll try to either answer them now or at the end. The other notable thing here is that I did already put all the links for today in the scroll-back.
So there's a GitHub gist in there with all the links that you'll see in this presentation, a link to the slides, as well as a link to a survey on the content that we'll cover at the end of the presentation.
In case we haven't met, I'm Andrew Krug. I lead the security advocacy team at Datadog. I'm also an instructor for Antisyphon training. I do a bunch of stuff besides that, which we talked plenty about in the pre-show banter.
I've been kicking around the cloud security space for roughly a decade now. In fact, if you're going to be at AWS re:Invent, I'm also going to be at AWS re:Invent. I'd love to connect, grab a beverage or something, catch up on site.
Today, I'll be your chief blob officer, and the title of this webcast is in fact Securing the Blobs. I actually hate the acronym here. I looked up the origin of blob because I didn't know it before I started putting the webcast together.
It's crazy how we use these terms sometimes. It stands for binary large object, and the last B is actually taken from the word object. So it's not a cool recursive acronym or anything.
But that's what BLOB stands for. It's just a representation of a bunch of bytes that are arbitrarily stored. It could be in a database of some kind.
It could be in something like Apache Presto, an in-memory data store, whatever. Every cloud provider has a blob store of some kind.
So Google has Cloud Storage, Amazon has the S3 storage service, and Azure has Blob Storage. And these all operate in really, really similar ways.
And we’re going to kind of talk about some of the rules for how blob stores have to work in order to do what they do. One of the things that I think is really confusing about these services is that they are designed by default to operate in a way that lends itself really well to sharing data across organizational boundaries, across account boundaries, and also to the public.
In fact, if you want to host a reliable website in 2024, with 99.99997% uptime, Amazon S3 is probably the cheapest way to go about that.
So public access is kind of part of the package that you're paying for. S3 is notably one of the oldest services in the AWS cloud.
Look at this: this is actually a copy of the press release from 2006 that lives in an archive. So that's quite an old service. When Amazon was just barely getting started, they launched this as one of the very first services that they were selling, when they were scaling out the extra compute that they had from what was then a bookstore.
Fun fact: if you Ctrl+F in this press release for how they're selling the value of the service, the word security does not appear even one time. Neither does access management.
So that really wasn't the focus in the design for a lot of stuff back in the day. And like I mentioned, cross-account and multi-account access is in fact a feature of the service.
According to the Wikipedia page (because AWS doesn't really publish numbers on this), S3 stores something like 280 trillion objects today. So it is the right tool for the job if you need to store something very inexpensively.
If you want massive volumes of highly available storage, it's not the wrong choice. You just have to understand how it deploys by default, what some of the guardrails are that we can put in place, and then how to detect when weird stuff happens.
And that's exactly what we're going to cover inside the context of our conversation today. Yeah, lots of chatter in the Discord about the lack of security mentioned in the press release.
Backups are also something that we're not going to cover in the webcast today, because that's kind of an operational resilience thing. But notably, I think there's also an incorrect perception that because you put something in S3, it's just inherently backed up, without having to copy it to another bucket or something.
And we've seen that happen lots of times. Wow, I triggered the balloons there. So let's talk about the common issues here in the service, and break those down in terms of where this goes wrong for people: their understanding of how the storage works and what some of the rules of engagement are for that service.
And we talk about this in securing the cloud in detail. but this is going to kind of be the very quick version of S3 issues.
Most often we see issues with S3, and you've seen plenty of headlines, where somebody has misconfigured what is called a resource policy for a bucket.
And we'll talk a little bit about how resource policies are different here in a bit. But what you're looking at on screen is probably the most common misconfiguration, which is that folks just grant access to the principal star, which in the case of AWS means anonymous.
They allow s3:GetObject on any object in the bucket, and then somebody discovers that bucket and manages to pull all the data. Or worse than this would be if they granted all actions to the anonymous principal, so that somebody could learn the names of all the objects and then fetch them all.
Really, the fact that listing isn't present in this policy just slows somebody down. It doesn't prevent them from getting data out of the bucket. So we see this relatively frequently.
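For readers following along, the kind of wide-open resource policy described on screen looks roughly like the sketch below. The bucket name is a placeholder, and this is the anti-pattern, not a recommendation:

```python
import json

# ANTI-PATTERN: the classic public-bucket misconfiguration described above.
# "Principal": "*" means the anonymous principal -- anyone on the internet.
# The bucket name is hypothetical.
dangerous_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AccidentallyPublic",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",   # adding "s3:ListBucket" makes it even worse
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
print(json.dumps(dangerous_policy, indent=2))
```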
This still comes up in real-world audits that I do today. And we'll talk about the right way to do this in just a bit. The second one is the now-deprecated access control list system.
This actually launched with the S3 storage service. Before resource policies existed, this was the way that you granted access to single files or folders inside of the S3 object system. You could grant access to the owner of the bucket,
so anybody inside of the account boundary; to all users, which was public access; or to authenticated users, which folks didn't really understand at the time. That actually meant anyone with an AWS account, period, not anyone inside of your account.
And that's a pretty common mistake that people make. You could also grant access to a log delivery group principal. But even though this ACL system was very, very simple, it was still a common source of misconfiguration.
Because I think as a user inside of AWS, you believe that if you say everyone, that's scoped to the boundary of your account. We'll talk about why that's not true for S3 and a couple of other services. But I don't think AWS did a great job, in the console in particular, of telling its user community: hey, this is something that is scoped in a very different way than some of the other resources.
And then one of the new and spicy attacks that we're starting to see in S3 is actually the abuse of what we call predictable names. There's a great Aqua Security article that I linked here, in the deck and the doc, that shows how attackers, who now really understand how to use the AWS cloud,
are predicting the names of buckets inside of specific folks' accounts. And through misconfigurations, they're sometimes able to put infrastructure as code inside of those buckets and then trick a service like CloudFormation or the CDK into deploying that.
And that is a pretty massive problem. So we're going to talk about a mitigation for that as well. But this article is a really great read. You're definitely going to see more research like this pop up in the cloud security community over time, because there's all sorts of implicit trust placed on service principals inside of the AWS cloud, where you have a service that's very broadly trusted, potentially from an account that's outside of your organization, that can pivot in and do things like drop a file in a bucket, so long as you understand the name.
So this is an example of an infrastructure as code tool that's called the CDK, which we do use in the class. And it was generating predictable bucket names.
It's like cdk, region, account number, something like that. It's very easy to guess if you have a little bit of information about the account that you want to take over, and then you can plan or see if there's a way for you to actually get code into that bucket that will then be executed by the CloudFormation service, which operates with a lot of privilege inside of the account.
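As a rough illustration of why these names are guessable, a sketch of constructing a CDK-style bootstrap bucket name. The account ID and region below are made up, and the default qualifier shown is an assumption based on the CDK's documented bootstrap naming:

```python
# Sketch: why "predictable" bucket names are enumerable. The CDK bootstrap
# bucket follows a well-known pattern; the account ID and region here are
# invented for illustration.
account_id = "123456789012"   # hypothetical victim account
region = "us-east-1"
qualifier = "hnb659fds"       # the CDK's default bootstrap qualifier

guessed_bucket = f"cdk-{qualifier}-assets-{account_id}-{region}"
# An attacker can pre-create this exact name in their own account if the
# victim's copy has been deleted and the name has been released.
print(guessed_bucket)
```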
This has all been reported, and it's all been mitigated for this specific example. But this is a really good example, I think, of how attackers are getting really clever in how they're exploiting misconfigs inside of S3.
Thanks for linking that in the chat. Again, this is just the same diagram, a little bit bigger, without the link.
If you want to check out the flow through that: you have a user that creates the bucket in the victim account, and then what the attacker was doing is duplicating a bucket that had been created.
So in the blue account, what would happen is that the tool would create the bucket. Maybe that bucket would get deleted at some point. And because the names of those buckets are globally unique across the entire AWS cloud,
not just your account, somebody would go and create that same bucket name in account B and open it up so that any account could read the objects, and then the CloudFormation service would pull data from that bucket that lived in an untrusted account.
So it's just another form of something like a subdomain takeover, if you're familiar with subdomain takeovers. Somebody in the chat points out: sad that people use CloudFormation and not Terraform. It could be the same type of attack for Terraform or Terraform state, depending on how you're storing the state, or if you're using an S3 bucket to store IaC stuff.
So really, this isn't a unique scenario for just CloudFormation. CloudFormation in this case is the target service, but lots of services in AWS work this way.
And in fact, there's an increasing body of research that shows us that even behind the scenes, for certain types of operations, AWS itself will create these predictably named buckets with specific permissions. And if you can stomp those by pre-creating them inside of a malicious account, there's a potential pivot there.
Or you could get a service to pull a file from an untrusted location. So obviously if this happens, it's really, really bad. Like I said, most of this has been mitigated.
But this is really just the beginning of this kind of research into how AWS services trust each other implicitly, behind the scenes in some cases.
So you might be asking yourself at this point in the presentation: why the heck is this stuff so hard? And why do people get it wrong so much of the time, in this case and for many of the data leak cases as well?
That's because there's a fundamental lack of understanding that S3 is what we call a global namespace. That means that when you create an S3 bucket, something like bob.example.com, that bucket name is globally unique inside of the entire AWS cloud. You home it to a region, but nobody else can ever go and create a bucket with the same name, until you delete that bucket and release the name.
So, the same concept applies to the way that the permissions work because again, cross account and multi account access is a feature.
So for the shadow bucket example that we talked about, the answer to this, or at least the prescriptive guidance from AWS, has been to generate random names for all your buckets.
And in fact, in CloudFormation, when you just declare, hey, make me an S3 bucket, it will generate a bucket name that is effectively a GUID.
And that's all well and good, except the downside is that now the bucket name for that thing is in fact a GUID. And because we actually don't want attackers to know the names of these buckets, now all of a sudden we have to treat them as secrets.
Which is kind of unfortunate, because that means we don't want to put them in public source code, and we don't want to put them in documentation. So where do we end up putting them?
The answer is that we're starting to see patterns where people are moving these relatively benign-seeming parameters out to things like Parameter Store.
Parameter Store is just another service in the AWS cloud. You can store secret parameters and not-secret parameters in there. But putting the bucket names into Parameter Store, and having the application retrieve the bucket name at the time that it needs to use that bucket, is now kind of being guided as the best practice.
And that does create a lot more work for developers. It does make it harder, as a human being looking at your bucket inventory, to understand what the heck different buckets are for.
And we'll get to a little bit of guidance on how to structure that in a bit. But this is the best practice. It certainly doesn't make life any easier, but it's what we're doing.
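A minimal boto3 sketch of that pattern, assuming a hypothetical parameter name that your deployment tooling writes at provision time:

```python
import boto3

# Sketch: resolve a randomized bucket name at runtime instead of hardcoding
# it in source or docs. The parameter name "/myapp/reports-bucket" and the
# object key are hypothetical.
ssm = boto3.client("ssm")
s3 = boto3.client("s3")

bucket = ssm.get_parameter(Name="/myapp/reports-bucket")["Parameter"]["Value"]
s3.put_object(Bucket=bucket, Key="report.csv", Body=b"hello")
```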
So beyond data breaches, we also have compliance and operational concerns with regard to S3 buckets. We have to have certain compliance controls, obviously, to comply with minimum standards or legislation in your region.
Some examples of that would be encryption at rest, encryption in transit, and file-level encryption, depending. We won't get to file level inside of this hour, because we simply don't have enough time to break down the crypto SDK, but we could do that in a follow-up webcast,
if anybody's really interested in how you use AWS KMS to encrypt stuff. The lifecycle of data is really, really important, and we'll talk about a couple of ways to manage that at the end of the session.
And then we'll also talk about data residency a little bit as well. One of the unique things about S3 is that it does allow you to home data in a specific region of the AWS globe.
All of what are called the primary regions in AWS have S3 storage. So you can of course decide that your bucket is going to live in a specific region, which is relevant for things like GDPR, where data can't cross certain borders,
and you need to be able to actually attest to that. On the operational resilience side, this is not necessarily a security thing, but I do like to always talk about it, because resilience goes increasingly hand in hand with security in terms of the way that we do incident response.
So we consistently ask: is the secure system that you're designing performant, and is it also resilient? And that comes into the S3 conversation as well, when we talk about multi-region data replication, or potentially what happens when you have an issue with an availability zone inside of AWS.
I think the S3 service today, if memory serves from the last time I looked at the SLA, promises seven nines of uptime inside of a single region.
So it is very, very highly resilient. And now they have different storage tiers, so you can put data in different classes of IOPS: really, really speedy storage,
normally speedy storage, and then, when we get to the lifecycle part of the conversation, we'll talk about the Glacier service, which is very, very slow storage for archival purposes.
So you may be wondering here, given that the service is relatively basic: why is it so confusing, and what can we do about it?
First I want to address the problem of why it is so confusing. I think the reason why this service in particular is misconfigured more frequently than most other services is because it doesn't follow the regular pattern for access.
And this is like the 101-level slide of the Identity and Access Management lecture that I have in the course, which is that normal access requires two things.
You have a user principal. That principal has a credential. That credential has been granted some access, either explicitly or in the role that it can assume.
You use that credential to call an API. That API first authorizes the request in the access decision facility, and then it carries out that request to do things.
Right. That is not the way it works for the S3 service. Normally, by the way, this also happens inside of a single account boundary. The S3 access requirement, because this is a service that is inherently public in many of the use cases, is really only that you have curl available.
So it has anonymous as an option, and attackers use this, and they exploit it all the time to hunt down S3 buckets. If you've heard about things like bucket enumeration attacks: what people will do is scrape the HTML or the JavaScript off of a website, match that against a regex for what looks to be an S3 URL, and then just attempt to snag things from a word list out of that bucket.
And based on the responses that come back from the API, we can actually know: is that really a URL that points at S3, even if it's not an s3.amazonaws.com URL? Is that bucket public?
Is there content in the bucket that's public? And so we can actually trick the AWS API into leaking some information about the company, about the bucket, about the account, just by going through this relatively benign-looking scraping exercise.
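To make those response differences concrete, here is a small sketch of what the probing looks like from the defender's point of view, using the requests library; the bucket name is hypothetical:

```python
import requests

# Sketch: how response codes leak bucket existence. The name is hypothetical.
# 404 -> no bucket with that name; 403 -> bucket exists but access is denied;
# 200 -> bucket exists and listing is public.
name = "example-bucket"
resp = requests.head(f"https://{name}.s3.amazonaws.com", timeout=5)
status = {404: "no such bucket", 403: "exists, access denied", 200: "public"}
print(name, "->", status.get(resp.status_code, resp.status_code))
```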
There are also other ways that S3 buckets leak information. And this is a really clever one I actually found at the URL that I linked in the slide deck, where an attacker is working through a bunch of emails that they had from a password list, like a username dump or something.
And they're actually trying to apply an ACL to a file that they know happens to be in the bucket. If you do this for a word list, you can actually get S3 to disclose back to you what the root user's email is, because the API will give you a different response for a valid email address versus an invalid email address.
So this is just another way that S3 potentially increases the attack surface for a specific account.
So, one of the folks in the chat asks, if a bucket is not public, is it still possible to scrape for URLs or buckets? And the answer is yes.
At a minimum, you can get the API to disclose that there is a bucket there. You won't actually get any information about the underlying bucket, but in the headers that are returned, even if it's a denial, it will return an Amazon request header that leaks the information that there is a bucket with that name inside of the account.
So, good question. There are also some ways, depending on the configuration, to trick the bucket service into leaking the account ID. We're not going to dive deep on that, because this is a defensive presentation, not a red teaming presentation.
But, there are plenty of articles out there to talk about that. So, like the TLDR here is that for years people have been scrubbing their account IDs out of screenshots or scrubbing their bucket IDs out of things.
And there’s kind of been two camps. One is that the less information that you can share with attackers, the more secure you are. And there’s been one camp that was kind of saying, what’s the big deal?
I don’t know why you care about sharing account IDs or I don’t know why you care about sharing bucket names. And the answer is really here that knowing more information does in fact make an attacker’s job easier.
So the more you safeguard this information, the more difficult you make it to guess. You're really just going to slow people down, but it is a good first line of defense here.
So the guidance is now quickly trending towards make the names of these things a lot less predictable and then really, really safeguard the account IDs for your S3 buckets.
So another reason why I think this is hard: we talked about that classical IAM policy model, where you have a credential, the credential is granted something, it's authorized against the API, and then you carry out the actions.
S3 and a number of other services, which are actually listed on the slides, use a system that's called resource policies. And resource policies are a way that you dictate who can access that resource. Somebody in another account can grant themselves access to call the API, but then that's validated against the policy on the bucket, the queue, the VPC endpoint, whatever, in order to decide if they're authorized.
So it's kind of a two-part decision. And because these resource policies are inherently designed to give very, very broad access, this is where people really mess up.
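In contrast to the anonymous-principal policy from earlier, a hedged sketch of what an intentional cross-account grant looks like; the account ID and bucket name are placeholders:

```python
import json

# Sketch: a resource policy that grants a *specific* external principal,
# not "*". The account ID and bucket name are placeholders.
cross_account_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "IntentionalCrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-shared-bucket",    # ListBucket applies here
            "arn:aws:s3:::example-shared-bucket/*",  # GetObject applies here
        ],
    }],
}
print(json.dumps(cross_account_policy, indent=2))
```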
And we'll talk about the public versus not-public guardrails here in just a minute. A good question here in the chat: is there any point at which invalid requests trigger an abuse lockout, or is there no rate limiting?
So that's a two-parter question. There is no point at which an invalid request would trigger an abuse lockout. There is, however, an exponential backoff rate limit for API calls in AWS.
And depending on the account, I think it's like a thousand per second or something; don't quote me on that. But there is a rate limit on the S3 endpoint itself.
It's not low, so it's really just slowing you down. And if you know how to handle rate limiting of any kind, you can find examples out there of exponential backoff loops and things for enumeration.
And there's really no downside to doing it. So people just do it.
I'm going to save this question about denial of wallet type stuff until the end. So please bring that question about money stuff back at the end of the presentation, because that's a big topic that I didn't have time to write in here.
So now we're about halfway through our time together this morning. Let's talk about the guardrails that we can put in play, because I really do want to arm you with some advice on how you can handle these different levels of controls for securing S3.
Building bottom up: obviously, logging is a big part of any defense in depth strategy. S3 is often overlooked from a logging perspective, and I'll talk about why here in just a bit.
There are organization-level guardrails you can put in place, if you are the owner of, say, the billing account and you have multiple AWS accounts. There are things that you do at a very high level, things that you can do at the account level, and things that you can do at the bucket level to prevent data from becoming public if you don't intend it to.
Then there's the network level, so things like TLS transport for data, required by default from the bucket. And then there's also the file level, which we don't necessarily have time to get to today, which would be actually encrypting the data that you store inside of S3.
The easiest one: if you are a company that does not have a use case for using S3 for public data, there is now this thing that's called a public access block, which is on by default in all AWS accounts created as of about five years ago or so.
The problem is that most of us have had our AWS accounts longer, so we actually have to go and enable this. And it's kind of a two-parter. If you're in a billing organization, what you do is go and enable this public access block for the accounts that have no public data whatsoever.
And then you actually apply what's called a service control policy, so that no one can ever relax that setting without contacting the billing owner. So this is a really nice separation of concerns, in that you have to be really, really intentional and go through a lot of different steps to make an S3 bucket public.
This setting, once it's applied, says that for an entire account or an entire organization, you can just never have public data, regardless of what somebody puts in a policy.
And that's really, really safe. Of course, that doesn't usually work for everyone, because most people have at least one use case where they want to make data public from an S3 data store.
The next one is at the account level. There's an account-level setting: block public access for this account. So this is just one step down. If you're in a single-account reality, where you're not in a billing organization, you can turn this on.
But of course, anyone who is an administrator can go in and relax that setting in order to make data public.
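A minimal boto3 sketch of flipping that account-level setting on; the account ID is a placeholder, and, as described above, you'd pair this with a service control policy so it can't be quietly relaxed:

```python
import boto3

# Sketch: turn on the account-wide S3 Block Public Access setting.
# The account ID is a placeholder.
s3control = boto3.client("s3control")
s3control.put_public_access_block(
    AccountId="123456789012",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```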
So we have to address this use case sometimes, because obviously it's great to host data in S3. In fact, you could argue that you actually decrease the attack surface for your company by hosting data in S3, because all of a sudden you don't have to maintain nginx or Apache or IIS servers to host simple static website content.
So what did we learn, though? In the Anti-Cast before this, in the pre-show banter, we were talking about how we don't seem to learn from the last 10 years of security.
We learned this a long time ago: mixing content of different types inside of the same boundary, in any system, is not a great idea. And I put up a good old-fashioned Internet Explorer mixed content warning.
But really the piece of advice that I’m giving you is that if you have to have public data, separate that at an account level and then have buckets that only host public data.
Don’t mix and match private and public data inside of the same blob store. So if you’re a visual learner, account A has public stuff and account B has not public stuff.
And then you can enforce that guardrail at the appropriate level and then you can home all your public stuff in one location. Really, really makes sense. Especially if this is an intentional design.
The problem is, if you're already in an AWS account, it's a little bit more difficult to back yourself into it. So somebody asked in the Discord chat: are we now considering accounts to be a boundary instead of a resource?
And the answer is yes. The guidance from AWS for some time, ever since Organizations really came out, is that more accounts are inherently better.
In fact, they're the strongest security boundary that you can really have. And it's free to create more accounts. Of course, if you have platforms or systems that have a financial impact as you scale the number of accounts, you have to think about that.
But really, it's quite easy to create new accounts. So the next thing is: if you have to host public data, be really, really intentional with public grants. And you'll hear me talk about this in the class, which is that if you have public data, one of the most powerful things that you can do in the way that you are storing that data is to tag it with a tag that just says, hey, it was intentional that this piece of data is meant to be public.
And so what we can actually do is create a strong contract on the policy side. This is an example of that policy using what's called a condition key. And that condition key just says: if I'm going to serve data to an anonymous principal, it has to have the tag public.
And that's a really nice separation of concerns. It just requires somebody to place a tag; otherwise this policy will not be in effect. And that prevents the accidental scenario like, hey, I was copying this entire directory of stuff, and when I did, I accidentally copied up a salary spreadsheet or something.
True story, happens all the time. People use the aws s3 sync command and they have a temp file or something that didn't belong. Obviously, if that tag wasn't present, that data wouldn't be served to an anonymous principal.
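A sketch of that kind of tag-gated policy, assuming a hypothetical bucket and a tag key of public; the s3:ExistingObjectTag condition key does the work:

```python
import json

# Sketch: only serve an object anonymously if it carries an explicit
# public=true tag. The bucket name and tag key are illustrative.
tag_gated_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicOnlyIfTagged",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-public-bucket/*",
        "Condition": {"StringEquals": {"s3:ExistingObjectTag/public": "true"}},
    }],
}
print(json.dumps(tag_gated_policy, indent=2))
```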
Next in line: we want to ensure data is encrypted at rest and in transit. S3 does have one-click transparent encryption by default that really just checks the box from a compliance perspective.
We actually should be going one level beyond that. Server-side encryption is very nearly free, or zero cost,
but using a customer-managed key is better. AWS has an HSM kind of situation where you can create a little bit of key material, and then that key becomes an additional way to grant access to data that's stored inside of S3.
So in this scenario, if something was encrypted with SSE-KMS and somebody gained access to the bucket, they couldn't actually gain access to the sensitive data, because they don't have a grant to decrypt the data using that piece of key material.
So it's a belt-and-suspenders approach for sensitive data. You'll see this in HIPAA applications in particular; in the fintech space it's kind of a default pattern.
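A boto3 sketch of defaulting a bucket to SSE-KMS with a customer-managed key; the bucket name and key ARN are placeholders:

```python
import boto3

# Sketch: default the bucket to SSE-KMS with a customer-managed key.
# The bucket name and key ARN are placeholders.
s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="example-sensitive-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": ("arn:aws:kms:us-east-1:123456789012:"
                                   "key/1234abcd-12ab-34cd-56ef-1234567890ab"),
            },
            "BucketKeyEnabled": True,  # reduces KMS request costs
        }]
    },
)
```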
The next thing in line here is that we really want to disable all the old systems that are no longer serving us. We talked about that access control list system that allows authenticated users, for example.
AWS has really recognized this is a problem, and in fact, ACLs are now disabled for all new buckets that are created. But there is a way for you to back yourself into disabling support for those old ACLs that get people in trouble all the time.
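One way to back yourself into that on an existing bucket is the bucket ownership controls setting; a sketch with a placeholder bucket name:

```python
import boto3

# Sketch: enforce bucket-owner ownership, which disables the legacy ACL
# system entirely for this bucket. The bucket name is a placeholder.
s3 = boto3.client("s3")
s3.put_bucket_ownership_controls(
    Bucket="example-bucket",
    OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
)
```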
Here's an example of a policy that pins a specific TLS version. So if you're in a regulated space where you have to ensure that data is only transited over specific major versions of TLS or greater, you can do that too, to ensure the integrity of data from S3 in transit.
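A sketch of that style of policy, denying plain HTTP and anything below TLS 1.2; the bucket name is a placeholder:

```python
import json

# Sketch: deny any request made over plain HTTP or a TLS version below 1.2.
# The bucket name is a placeholder; s3:TlsVersion does the version pinning.
tls_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": ["arn:aws:s3:::example-bucket",
                         "arn:aws:s3:::example-bucket/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            "Sid": "DenyOldTls",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": ["arn:aws:s3:::example-bucket",
                         "arn:aws:s3:::example-bucket/*"],
            "Condition": {"NumericLessThan": {"s3:TlsVersion": "1.2"}},
        },
    ],
}
print(json.dumps(tls_policy, indent=2))
```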
From a data integrity perspective, bucket versioning is an easy checkbox to turn on, which just means that any time a file is written or overwritten, it creates a copy-on-write snapshot.
This is incredibly useful for DR and IR, because you can tell when data has been tampered with, and by whom.
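That checkbox is a one-call change; a sketch with a placeholder bucket name:

```python
import boto3

# Sketch: turn on versioning so overwrites and deletes keep a recoverable
# history for DR and incident response. The name is a placeholder.
s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```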
Now we're going to talk a little bit about logging, which is where I really want to focus some time in the conversation, because I think this is one of the biggest misses when it comes to storing data in S3: folks assume that they're never going to have a problem, or that the relative cost of logging is so high that it's just not worth doing.
Lots of good questions coming in the chat here. Are there good tools for migrating large numbers of accounts into a master organization? And the answer is yes. If you want to move from many single accounts that you're paying the bill for to a single consolidated billing organization, it's like a one-click invite.
So if you have a way that you can assume an admin role there, you're one API call away from being able to group those things in. And until you do things like enable service control policies, there's no downside to consolidating them under one master organization.
That's like a brand new shiny account that you can then move into. Somebody asked: how are attackers able to determine when a bucket was deleted?
There's no actual way that they can. They just check over and over to see if that bucket name has been freed, and then they stomp it. So in the shadow bucket scenario, this would require a little bit of reconnaissance on the part of the attacker, to know what tooling somebody's using, or understand their process and practice, in order to predict those bucket names.
Unfortunately, humans are really predictable. Corey Ham has a good quote where he says that if a human being can think of a password, it's been part of a data breach.
And that's probably true for predictable bucket names in S3. All right, so option number one for logging events for S3 buckets is actually just to use CloudTrail.
And we talk about CloudTrail all the time as something that you want to have on by default. If you're not familiar with the CloudTrail service, all it does is log every single API call that anyone makes in AWS, except when it comes to storage.
There was a really intentional choice in CloudTrail that if you want data events, you have to opt into those. And you can certainly do that, and then you get every single object call, for anonymous and non-anonymous principals, and denies and allows and all that good stuff.
The problem with this is that it's really, really expensive. Logging every single get object, put object, access denied for S3 can really add up in terms of your CloudTrail volume.
And because CloudTrail charges an additional fee per gigabyte versus just regular log storage, that cost can rack up super, super fast.
So this is usually not my recommendation unless you have no other option. Generally, what I would recommend is that you take only your sensitive buckets that contain sensitive data and opt them into this.
But again, that requires you to understand, using tagging, labeling, or account boundaries, where the sensitive data is. And then, if that data is very, very frequently accessed, it's still going to be quite expensive.
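A sketch of that selective opt-in, scoping CloudTrail data events to a single sensitive bucket rather than the whole account; the trail and bucket names are placeholders:

```python
import boto3

# Sketch: opt only a sensitive bucket into CloudTrail S3 data events instead
# of logging object-level calls account-wide. Trail and bucket names are
# placeholders; the trail must already exist.
cloudtrail = boto3.client("cloudtrail")
cloudtrail.put_event_selectors(
    TrailName="example-trail",
    EventSelectors=[{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # the trailing slash scopes this to objects in the one bucket
            "Values": ["arn:aws:s3:::example-sensitive-bucket/"],
        }],
    }],
)
```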
Option two here is that we can use S3 to secure S3 from a logs perspective. We can use a feature called server logs in S3, and this is what I see the majority of people who are running storage accounts at scale actually doing: they're creating what are called S3 server logs, which are different from access logs, by the way.
And we'll talk about how they're different and how they're the same here in just a second. These are just Apache-style logs, because behind the scenes, whatever the S3 service is today, it probably started as something like an Apache proxy, and they just carried the format forward.
So this is what S3 storage logs look like. It's just a checkbox to enable these, and you can sync them to one location inside of AWS. You tell it: send the logs from this bucket to a dedicated logging bucket.
It can be just one single bucket for every S3 bucket inside of your entire account in a region, and then you can aggregate them however you aggregate logs.
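A sketch of enabling that, pointing a bucket's server logs at a dedicated logging bucket; both names are placeholders:

```python
import boto3

# Sketch: point a bucket's server access logs at a dedicated logging bucket.
# Both bucket names are placeholders; the log bucket needs to already exist
# and permit log delivery.
s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="example-app-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-central-log-bucket",
            "TargetPrefix": "s3-access/example-app-bucket/",
        }
    },
)
```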
The fun stuff here that you get: you get the bucket owner in the form of a UUID, the name of the bucket, timestamp information, the remote IP of the requester, and the user principal that was the caller.
So in the case that this is a privileged operation, you'll actually get the ARN or the identity of the user that was requesting that item, the operation itself (in this case it was a put call for a single object), and then you get the key.
So that's really a lot of information that you could use for detection and response. You could use this to establish patterns of normalcy for ML models inside of whatever your SIEM is.
So if you're not collecting these logs today, this is a very low-cost, efficient way to gather logs. The downside of this style of logging is that there isn't really a delivery SLA, or it's quite long, if memory serves, on when the events actually come in.
So if you're looking at very, very real-time detection for access of files, it's best to go back to either event triggers or that CloudTrail method for incredibly sensitive data.
Missed a slide here. So the next piece of advice, which is one of the better pieces of advice I can give: if you want to create lots of S3 buckets, or even if you're just creating a few, beyond two, you should probably have a pattern that you use for creating buckets.
And I don't care if this is a list of AWS CLI commands that you just have somebody paste into a terminal, or whether it's CloudFormation or Terraform, but make sure it's the same every time. So at least if you're making a mistake, you're making the same mistake over and over again.
But if you have desired best practices, this is the best way to make sure they're enforced on a consistent basis. Think of it as a way to create buckets as a service, and if I were you, I would probably have two versions of this:
one that is for public buckets and one that is for not-public buckets. I find Terraform modules are really great for this, and I linked a resource here.
Cloud Posse has a pretty good Terraform module for this, where they just allow you to true/false different functions and features, and then have it go ahead and create the bucket.
So it makes it really easy to organize. It makes the functionality that you're trying to achieve in terms of creating resources easily encapsulated. It allows you to reuse a lot of code over and over again, and that ultimately ensures that your security standard is really, really consistent across your entire environment.
In fact, you can even do something like place a tag on all the buckets that were created with IaC, so that you can go back and clean up the stuff that was created with human hands, if you're migrating to something like this.
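Andrew points at Terraform modules for this; as a hedged illustration of the same "buckets as a service" idea in boto3, a helper that bakes the guardrails from this talk into every bucket it creates (all names are placeholders):

```python
import boto3

# Sketch: a "bucket factory" that applies the same guardrails every time.
# All names are placeholders; add a CreateBucketConfiguration with a
# LocationConstraint for regions other than us-east-1.
s3 = boto3.client("s3")

def create_private_bucket(name: str) -> None:
    s3.create_bucket(Bucket=name)
    s3.put_public_access_block(
        Bucket=name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True, "IgnorePublicAcls": True,
            "BlockPublicPolicy": True, "RestrictPublicBuckets": True,
        },
    )
    s3.put_bucket_versioning(
        Bucket=name, VersioningConfiguration={"Status": "Enabled"})
    # tag the bucket so hand-made buckets stand out later
    s3.put_bucket_tagging(
        Bucket=name,
        Tagging={"TagSet": [{"Key": "managed-by", "Value": "iac"}]})

create_private_bucket("example-team-artifacts")
```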
So the last piece of advice here, before we go to open Q&A, is also to really pay attention to those lifecycle policies. It was kind of a minor point that I made in terms of the compliance aspect, but S3 has this feature that's called a lifecycle policy.
And what you can do is say that when data is written, it only has a certain shelf life inside of S3 before it's either deleted or sent to a long-lived archive. And you can determine the amount of time on this.
So some data, obviously, you're not going to want to lifecycle, because it's very long-lived data; it's user data that you're storing for somebody. But if it's log data or access information, something like that, you are sometimes legally required to have a lifecycle policy on it.
And if it is your own log data and you aren't legally required to lifecycle it, I usually advise people to do it anyway, because that data that you're storing does become a liability to the organization in some respects.
A great example would be: if you're in a lawsuit or something and you receive a spoliation letter for log data, and you have no policy for when you retire that log data, you are required to retain it indefinitely until the end of that lawsuit, which in some cases can be many, many years.
And that can become very, very expensive over time. So making use of these lifecycle policies, for the transition of data to slower archives and then ultimately deletion, is a great best practice that protects you, and it also makes sure you don't have data kicking around that you could leak later on.
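A sketch of such a lifecycle rule, archiving to Glacier and then expiring; the bucket name and timings are placeholders to match against your own retention requirements:

```python
import boto3

# Sketch: transition log data to Glacier after 90 days and delete it after a
# year. Bucket name and day counts are placeholders.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": ""},   # applies to the whole bucket
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```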
So with that, we're at 45 minutes after the hour. I'll go ahead and open it up to Q&A. I do have a couple of links in here that I would ask you to check out.
There's a survey in here, which is the same survey I give at the end of every session. The contact information is of course optional, but you can let me know if you learned something: did you find the content useful?
It's also got a sign-up in there for the virtual coffee sessions that I'm rolling out right now. So if you're in the queue for that, you will get links to a Calendly. I can just only accept so many per week.
And then, in terms of all the resources that you saw, including the slide deck, there's a Bitly link there that goes to a GitHub gist with all that information. And with that, I will take a few questions here.
Let's see. Somebody asked: could CloudTrail be set to only record denials?
The answer is no. CloudTrail will record allowed and denied by default. The only limit that you have inside of default CloudTrail is to restrict it to either read or write operations.
I know somebody did ask about the denial of wallet attack problem, which I do want to address, because I also didn't have time to build that into the slide deck. But there are cases where somebody had a public S3 bucket, and somebody literally just hammered that public resource so many times that they racked up a massive bill for the account.
And AWS has begun to put mitigations in place for that now, as a result of some really good research that was published on the topic. If there's another specific question about denial of wallet, I'd be happy to answer that as well.
Or really anything with regard to S3 cost.
Zach Hill
Hey, Andrew, great job, sir, as always.
Andrew Krug
Thanks so much. How would you advise we start playing with this ourselves if we don't have access to this? AWS free tier and Terraform?
Yeah, this comes up a lot. If you want to be a cloud practitioner, I would advise that you have a personal AWS account that you're playing around with.
A lot of this does fall into free tier usage. In fact, S3, if you don't store anything inside of the bucket, actually costs nothing. The only real piece of advice that I would say don't skip, if you're going to set up a home lab AWS environment, is setting up a billing alarm for yourself that'll send you a little note when your bill is going to exceed whatever your lower bound is.
I tell people $5 or $10 a month, if you can afford that, is a super reasonable lower threshold. How do you address public buckets
that should be private, and they're not a client? So I think that question is probably: if you find a public bucket that should be private in somebody else's account, where do you report it?
If you don't know how to get hold of the customer, you can actually report those to AWS. If you think it's a legit data leak, you can report that through the Vulnerability Reporting Program.
They have a vehicle for that. Sometimes it does take them a little while to get back to the individual, but I assure you that they do follow up on reports like that.
Zach Hill
Thanks.
Andrew Krug
Do you find AWS's Well-Architected labs cover this well? Or what resources would you suggest, in addition to the content, for self-learning? I do find that the AWS Well-Architected security pillar covers this pretty well.
The problem with those is that they are in fact pillars, and they're not actually tactical. So Well-Architected is going to cover some of this stuff; it's not going to get you down to the prescriptive guidance, strong-opinion type stuff.
When I actually built this presentation, I just went into the Cloud Security Forum Slack and asked a ton of people: hey, what are the best S3 resources you've seen in the last couple of years?
And I compiled those. So take advantage of the community too. A chat like this, or the Cloud Security Forum Slack, is a good one.
Zach Hill
How often do you reach out to the community in instances like that?
Andrew Krug
All the time. Because I don't get to work with every part of AWS in my job. It's pretty massive, and so I love to lean on other people. I find that in cloud security in particular, people are just generally really friendly, and they love to share resources.
So never hesitate to ask.
Zach Hill
For sure. I love to hear that, man. Did you see Static Tier's question there?
Andrew Krug
Is that the Well-Architected question? Yeah, I covered the Well-Architected question.
Zach Hill
And is your class going to be a good learning resource for that?
Andrew Krug
Yeah. We have an entire module in the class on S3, as well as CSPM-type stuff for detecting S3 misconfigs. There's one more question here on encryption, going back to server side, which is SSE-S3 encryption:
"I researched this briefly, but is the decryption of that data tied to read access on the resource?" The answer is yes. In the case of transparent encryption, behind the scenes it's being decrypted on the fly as it's being served.
Zach Hill
I was trying to get a link to your class to put in chat, but HK beat me. Thank you, HK. Yep. Now we have a couple questions in Zoom as well. I can grab a couple of those.
Andrew Krug
Yeah. Mhm.
Zach Hill
Matt asks what questions they should ask their vendor, who owns and maintains their S3 bucket, to make sure the vendor is doing all the things that you're describing.
Andrew Krug
In terms of third-party risk, I always like to ask people what their logging strategy is, and about detection and response, depending on the use case for the bucket.
It's also great to ask about labeling and isolation inside of the tenant, because anytime you have a vendor that's maintaining data for you,
depending on the sensitivity of that data, you want to make sure, if it is inside of a single account boundary: how are they ensuring that if somebody pops box A, they don't have access to every single customer's data?
Potential good answers are encryption, like different encryption keys per customer, which you generally won't find because they're expensive. But you get the idea.
Zach Hill
How do attackers identify open and misconfigured S3 buckets across the Internet? S3 bucket URLs are so lengthy.
Andrew Krug
Yeah, we went through this a little bit, about a third of the way into the presentation. Really, one of the easiest ways is to use word lists to just guess and pull predictable names or URLs.
And then, if you get certain headers back, you know that there's a bucket there. People are doing stuff like that all the time. In fact, there are lists of known S3 buckets, I'm sure, that you can go out and get from threat actor communities.
You can also scrape them from places like GitHub and web pages.
Zach Hill
Are buckets vulnerable to ransomware attacks?
Andrew Krug
It depends on your definition of ransomware attacks. A lot of the ransomware attacks that we're seeing today are actually not even really technical. In the early days of ransomware, you'd have something like a crypto locker that would come in, actually use an encryption key, phone home, and mess up a bunch of files.
Today, what we see more commonly in cloud is actually full-blown account takeovers. And then they just call you or mail you a letter and say: give me some money, or I won't give you back access to your account.
So yeah, in that sense, everything is vulnerable. And yes, you can definitely run traditional ransomware against data inside of S3, or you can use cloud to ransomware cloud, which is also a disturbing pattern that we're seeing, where somebody controls an HSM in another account and then uses that to encrypt data in a victim account.
Zach Hill
Jellybeanster asks, how do you mitigate denial of wallet attacks?
Andrew Krug
So there's not a great way in native S3 to mitigate denial of wallet, because that's really on the provider. That said, if that is really a concern, it's great to have something like a CDN in front of publicly accessible data, and then you can use WAF rules to control any rate limiting or blocking patterns on that content.
Zach Hill
Are there any common PowerShell or Python scripts available to scan and find buckets that don't meet best practice?
Andrew Krug
There are so many different tools out there now for CSPM-style scanning, like point-and-shoot tools. In a lot of the audits I've worked on as an individual consultant, I use Prowler for that kind of stuff.
Access Analyzer, which is the free tool that AWS makes, catches dangerous access conditions as well. Like any CSPM, you're going to see it detect this kind of stuff.
Zach Hill
Awesome. Just a friendly reminder for y'all: today we will be doing a breakout session after this. So if you have the Zoom application installed on your device, at the bottom of your screen you should see breakout rooms.
We'll be doing an AMA session in there, and everybody is welcome to join us. You guys can ask questions about anything cybersecurity related: certifications, your journey, anything.
It's a very open format. Everyone is welcome. I hope to see you there. But if you have any more questions for Andrew, be sure to throw them in Discord or in the Zoom Q&A.
We'll try to pick those up as they come in. You have a couple more minutes for questions, right, Andrew?
Andrew Krug
Yeah, yeah, definitely.
Zach Hill
And if you want to come see Andrew in person, he'll be at Wild West Hackin' Fest in Denver with us this year. He will be teaching there with us as well. Here's a link for that.
Andrew Krug
Yeah, Wild West Hackin' Fest Denver is the next opportunity. And after that, the course will be at Kernel Con.
Zach Hill
Kernel Con. I can't wait for Kernel Con. That was a fun conference.
Andrew Krug
Yeah, Kernel Con is a great conference.
Zach Hill
I wasn't expecting it to be, just because it's in Nebraska. You're like, what, they have cybersecurity in Nebraska? They have other things besides corn and steak?
But the cybersecurity community in Nebraska is actually pretty large, and they are all really amazing and friendly people.
It was so cool. I really plan to be there again.
Andrew Krug
Really, really fun con. Really great little downtown area that the conference is in. I had a great time in Omaha, but I also love both steaks and corn.
Zach Hill
Yeah, me too. It's hard to beat a good steak. But everybody there is so friendly too, and it was a great, great community.
A lot of thank yous for your presentation today, and of course, we always appreciate you coming on.
Andrew Krug
Yeah, well, it's always great to be here with so many live viewers.
Zach Hill
It’s so fun to do things live. I love being able to interact with everybody.
Andrew Krug
Yeah. Don't forget to fill out the surveys, because the surveys are really helpful. And I do read 100% of the surveys that come in. I will throw that link to my survey one more time in the chat before I take off.
So just one more quick plug: please fill out the survey. Even if you don't want to be contacted, just give feedback on the material.
Zach Hill
Feedback is so incredibly helpful. How do you deal with negative feedback that you get? How do you take it when you get it?
Andrew Krug
I will say, most of the time I consider it a failure if I get no constructive feedback. And sadly, I don't get a lot of constructive feedback on content.
So I'd love to see anything that folks would like to see done differently, because that does represent a set of the audience that I'm not reaching in a presentation like this.
So it's just an opportunity to make the content better, for sure.
Zach Hill
When I'm asking for feedback, I always ask people: give me your negative feedback first, please. I need to know. And sometimes it sucks, right? Sometimes it can be really painful to hear; you're like, oh wow, that hurt.
But it's also a great opportunity to reflect and see where you can improve and how you make those improvements. That's what's critical to me, and I really, really appreciate negative feedback.
Andrew Krug
Yep. As an instructor, it's so important too. Especially when you're just getting started teaching, you have this imposter syndrome where you don't know if you're actually reaching people.
And at least if somebody's giving me negative feedback, I know that they're listening.
Zach Hill
Yeah. Oh, Luke wants to know if you have a beginner’s course.
Andrew Krug
So Securing the Cloud Foundations is targeted at beginners. It is a little bit of a steeper on-ramp, so the recommendation is to have some prior experience on the command line,
and prior experience using a text editor. But that is really the get-started course, and then a tour of all the different jobs in cloud sec.
Looks like somebody wanted some more Microsoft content. I hear you. One of these days I'll make some Azure content. But it's just not my area.
I've never been a practitioner in Azure cloud, so we'd all be learning together.
Zach Hill
It's probably a pain, though; it's enough just keeping up with AWS and all of the different changes that they make constantly. So I couldn't imagine trying to maintain AWS and Azure and all of those changes that they make.
Andrew Krug
Ooh, that's a good question. Will there be a Securing the Cloud advanced edition? And the answer is probably. The current class that's out there is about three years old,
though it does get content refreshes, so it stays up to date with changes. I've had a roadmap item to make a volume two of that. I probably wouldn't call it the advanced class, but it would be like volume two:
some of the stuff that we covered the first time, in a little bit more difficult, mature fashion, and then some new content that we didn't cover in the first class.
Zach Hill
Oh, look at that, there's constructive feedback for the session: it seemed rushed; this could easily have been stretched out another half hour and gone a bit more in depth, especially the answers. For sure.
Andrew Krug
Unfortunately, we only have an hour for the Anti-Cast, so it's always tough to figure out what to smash in here.
Zach Hill
Sure. If anybody has any other questions for you, what's the best way to get a hold of you, sir?
Andrew Krug
The best way is to email me, or you can hit me on Discord, in any of these chats that I'm in. But really, just plain old email works really well. That's why I put my email on the second slide.
That is my personal email. I've never had a problem dealing with volume, so feel free to follow up any way you like.
Zach Hill
Awesome. I appreciate you being here, and I think everybody else has appreciated you being here as well. A lot of great feedback today. Can't wait to see you again next time, of course. And if you guys want to stay tuned and up to date with what's coming up next, you can go to poweredbybhis.com, and it'll show you all of the different webinars and things that we're doing.
So go check that out. We'll be back here again next week, same time, same place. If you want to join our breakout room, we're going to be getting that started here in just a few seconds.
But yeah, until next week: thank you all. Thank you so much, Andrew, for being here. Any final words for anybody?
Andrew Krug
Somebody just commented that the survey does not have a free-form feedback section. I will just add one right now.
Zach Hill
Look at that, updating on the fly. I love it.
Andrew Krug
The survey now has a free-form feedback section. Perfect.
Zach Hill
I love it, dude. Thank you. And thank you everybody else. I’ll see you in the breakout room or I’ll see you next week. Take care.
Andrew Krug
Thanks so much, everybody.