Keep your buckets tidy

There’s nothing stopping you from using bucket lifecycle rules, especially the ones that expire and delete objects. There are cases where it’s crucial to retain objects for a longer period, but that depends entirely on the type of data you’re handling. If you’re not legally required to keep data for a specific period to meet compliance requirements, you should definitely explore the optimization options AWS provides.

Note: The pricing mentioned throughout this post is based on the US East (N. Virginia) region.

Expiration lifecycle rules are free

Lifecycle rules are free to set up; you are only charged for transition actions, for example when an object is moved from the Standard storage class to the Infrequent Access (IA) class. Expiration actions, on the other hand, are completely free. You just define the rule and AWS handles the rest; rules run once per day, around midnight UTC.
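For illustration, a minimal expiration-only rule set with boto3 might look roughly like this (the bucket name, prefix, and 30-day window are placeholders; note that this call replaces any existing lifecycle configuration on the bucket):

    # Sketch: expire everything under a prefix after 30 days (names are placeholders)
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-tmp-after-30-days",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "tmp/"},
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )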

Potential issues with expiration rules

(Screenshot from the AWS console: the actions you can set up for a lifecycle rule.)

1. Might not work as expected when bucket versioning is enabled

When versioning is enabled, it’s important to be aware of delete markers and how they work.

When you expire the current version of an object, S3 does not immediately delete it. Instead, it adds a delete marker to the object. This means that the object itself is not physically removed, but the delete marker becomes the current version of the object. Previous versions are still retained.

A delete marker itself does not contain any data beyond the key name and its version metadata. However, it still incurs a minimal storage charge, billed at S3 Standard rates.

Therefore, you may want to set up multiple lifecycle policies to fully remove outdated versions, especially if you need different time periods or filtering criteria. One policy can handle expiring the current versions of objects, while additional policies can manage deleting noncurrent versions and removing delete markers.
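As a sketch (the prefix and day counts are placeholders), those rules could look like this and be added to the same configuration shown earlier:

    # Sketch: cleanup rules for a versioned bucket (day counts are placeholders)
    versioned_cleanup_rules = [
        {
            # Expire current versions after 90 days; S3 adds a delete marker
            "ID": "expire-current-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": "documents/"},
            "Expiration": {"Days": 90},
        },
        {
            # Permanently delete versions 30 days after they become noncurrent and
            # remove delete markers that have no noncurrent versions left behind them
            "ID": "cleanup-noncurrent-and-markers",
            "Status": "Enabled",
            "Filter": {"Prefix": "documents/"},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        },
    ]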

2. Not setting up filtering

When setting up a lifecycle rule, you can filter by tag, object size, and most importantly, by prefix. These options are great for achieving lifecycle granularity.

An important thing to note: when using a prefix, trailing slashes matter:

  • documents/sample/ will match objects like: documents/sample/2024-02.pdf and documents/sample/financial.json
  • documents/sample will match documents/sampleandimportant/2024-02.pdf and documents/sample.txt
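Since a prefix is a plain string match rather than a folder match, a quick local check (with hypothetical keys, just for illustration) makes the difference obvious:

    # Prefix filters are plain string matches, not folder matches
    keys = [
        "documents/sample/2024-02.pdf",
        "documents/sampleandimportant/2024-02.pdf",
        "documents/sample.txt",
    ]
    print([k for k in keys if k.startswith("documents/sample/")])  # only the first key
    print([k for k in keys if k.startswith("documents/sample")])   # all three keys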

Think twice about Intelligent-Tiering

While Intelligent-Tiering is a great feature, there are a few scenarios where you should really reconsider using it.

1. High number of objects

If you store a large number of objects, it might not be worth using Intelligent-Tiering due to monitoring, automation, and transition charges. For monitoring and automation, AWS charges $0.0025 per 1,000 objects per month for all objects larger than 128 KB. Additionally, transition requests are charged at $0.01 per 1,000 requests. Please visit the AWS documentation for the most up-to-date pricing.

Example scenario

Let’s say we need to store 10 million objects (200 KB each), which adds up to roughly 2 TB (2,000 GB) of total storage. Then, for Intelligent-Tiering, let’s assume 80% of the objects (8 million) will transition, while the remaining 20% (2 million) stay in their original class.

Option 1: S3 Intelligent-Tiering

Monitoring & automation:
    $0.0025 per 1,000 objects per month
    10 million objects → $25/month
Transition charges:
    $0.01 per 1,000 requests
    8 million objects → $80/month
Storage cost:
    $0.023 per GB/month
    2,000 GB → $46/month

Total monthly cost for Intelligent-Tiering:
$25 (monitoring) + $80 (transitions) + $46 (storage) = $151/month

Option 2: S3 Standard class

Storage costs:
    $0.023 per GB/month
    2,000 GB → $46/month

Total monthly cost for S3 Standard:
$46 (storage)
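For what it’s worth, here is the same back-of-the-envelope math as a short script, using the prices and assumptions above (and the simplification that all Intelligent-Tiering storage is billed at the Frequent Access rate):

    # Back-of-the-envelope comparison for the scenario above
    objects = 10_000_000
    transitioned = int(objects * 0.8)           # assume 8 million objects transition
    storage_gb = objects * 200 / 1_000_000      # 200 KB per object -> 2,000 GB

    monitoring = objects / 1_000 * 0.0025       # $25
    transitions = transitioned / 1_000 * 0.01   # $80
    storage = storage_gb * 0.023                # $46

    print(f"Intelligent-Tiering: ${monitoring + transitions + storage:.2f}")
    print(f"Standard:            ${storage:.2f}")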

With Intelligent-Tiering, the number of objects matters more than the total size. So, if you’re dealing with a large quantity of objects, the costs for monitoring, automation, and transitions can quickly add up, even if the total size isn’t that large.

For more detailed calculations, check out the official AWS Pricing Calculator.

2. Small object sizes

If you mostly store objects smaller than 128 KB, they are not eligible for auto-tiering and are billed at the Frequent Access tier rates. The good news is that the Frequent Access tier in Intelligent-Tiering costs the same as the Standard class, $0.023 per GB/month, and no monitoring or automation charges apply to these objects.

However, in this case, using Intelligent-Tiering doesn’t add much benefit and would be redundant, as you’re already paying the same rate as the Standard tier. It might be more efficient to go with regular lifecycle rules, which would let you move such objects directly to a cheaper class.

3. Known data lifecycle

Intelligent-Tiering excels when you don’t know how frequently objects will be accessed or when the data lifecycle is unpredictable. However, if you have a clear understanding of your data’s lifecycle and how it will flow through your system, it’s more efficient to use lifecycle rules to optimize and manage your objects according to your specific needs and requirements.
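As a rough sketch of that approach (the prefix, storage class, and day counts are placeholder assumptions, not recommendations), such a rule could transition objects to an archive class and later expire them:

    # Sketch: a known data lifecycle - archive after 90 days, delete after a year
    known_lifecycle_rule = {
        "ID": "archive-then-expire-reports",
        "Status": "Enabled",
        "Filter": {"Prefix": "reports/"},                        # placeholder prefix
        "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": 365},
    }

When picking the day counts, keep in mind the minimum storage durations discussed in the next section.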


Minimum Storage Duration

All storage classes except S3 Standard and Intelligent-Tiering charge based on a minimum storage duration. This means that objects that are deleted, transitioned, overwritten, or expired before reaching the minimum duration are still charged for the full minimum storage duration.

Example: If you store objects in the Standard class, you pay for the exact time the object is stored, whether it’s 2 days, 10 days, or 5 hours. However, when storing objects in Standard-IA (Infrequent Access), you are always charged for the 30-day minimum storage duration, even if the object is removed after just 5 minutes.

Therefore, hypothetically, setting up a lifecycle rule to remove data from the S3 Glacier Deep Archive class after 60 days doesn’t make sense when its minimum storage duration is 180 days; you would still be billed for the remaining 120 days.


Monitoring and alarms

It’s generally a good practice to maintain at least basic monitoring of your resources, especially when automated services can easily loop on errors, potentially resulting in millions of undetected objects in S3 (trust me, it can happen to anyone).

However, if continuous monitoring isn’t an option, S3 Storage Lens can serve as a useful alternative, allowing you to manually check your buckets and review their status. Alternatively, you can quickly set up a CloudWatch alarm based on storage size - it takes almost no time to set up and provides at least a bare minimum level of monitoring.
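For illustration, a bare-minimum alarm on the daily bucket-size metric might look roughly like this (the bucket name, threshold, and SNS topic ARN are placeholders):

    # Sketch: alert when a bucket unexpectedly grows past ~500 GB
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        AlarmName="s3-bucket-size-runaway",
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": "my-example-bucket"},      # placeholder
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        Statistic="Average",
        Period=86400,                    # S3 reports this metric once per day
        EvaluationPeriods=1,
        Threshold=500 * 1024**3,         # placeholder threshold (~500 GB)
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:my-alerts"],  # placeholder
    )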


Summary

S3 is a fundamental and straightforward service in AWS, but it can quickly become complex, especially in distributed systems. Understanding some of its nuances will provide significant long-term benefits.