How do I configure Object Lifecycle Policies with Wasabi?

    Lifecycle Policies on Wasabi manage an object's retention period in your bucket. Currently, Wasabi supports the ability to expire objects in newer buckets via the Lifecycle tab on your Bucket Settings page, or via the API. For more information on how to configure a Lifecycle Policy, please see our documentation available here.
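
    If your bucket does support Lifecycle Policies, a rule can also be applied programmatically. The snippet below is a minimal sketch using boto3's put_bucket_lifecycle_configuration to expire all objects 30 days after creation; the credentials, bucket name, and endpoint are placeholders, and the rule values are illustrative only.

        from boto3 import client

        # connect to Wasabi (placeholder credentials and endpoint)
        s3_client = client(
            's3',
            endpoint_url="https://s3.us-east-1.wasabisys.com",
            aws_access_key_id="<your-access-key>",
            aws_secret_access_key="<your-secret-key>")

        # apply a rule that expires every object 30 days after creation
        s3_client.put_bucket_lifecycle_configuration(
            Bucket="<your-bucket>",
            LifecycleConfiguration={
                'Rules': [{
                    'ID': 'expire-after-30-days',
                    'Status': 'Enabled',
                    'Filter': {'Prefix': ''},  # empty prefix applies to the whole bucket
                    'Expiration': {'Days': 30}
                }]
            })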

    If your bucket does not support Lifecycle Policies, a scripted approach can be used instead. This article demonstrates how to set that up: when an object reaches the end of its lifetime based on its retention period, the script removes the expired object.

    NOTE: Data uploaded by S3 backup applications that use their own proprietary algorithms to compress, de-duplicate, or create blocks in a non-native format before uploading to S3 storage must NOT be deleted manually or via any Lifecycle policy external to the application. Any deletion should be triggered by the backup application itself so that the indexes and chains within its databases remain intact.

    There are two scripts to achieve two different use cases:

    1. Delete ONLY the non-current (old versioned) objects via Lifecycle

    • Take 'X' days as user input to set the retention time

    • Check whether Last Modified (for all non-current objects) is more than X days ago

    • Delete those expired objects

    • Use a cron job to execute the script automatically once per day

    2. Delete BOTH the current & non-current (old versioned) objects via Lifecycle (see the sketch after this list)

    • Take 'X' days as user input to set the retention time

    • Check whether Last Modified (for all objects) is more than X days ago

    • Delete those expired objects

    • Use a cron job to execute the script automatically once per day
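
    The two use cases differ only in the version check. Use case 1 skips the latest version of each object so current data is preserved; use case 2 drops that check so current versions past the retention time are queued for deletion as well. As a minimal sketch, using the same variable names as the full script below, the use case 2 condition would look like this:

        # Use case 2: queue ANY version (current or non-current) past retention
        if (today - version['LastModified']).days > delete_after_retention_days:
            delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})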

    Before running the script:

    • Install the AWS SDK for Python (boto3) before running the script

    • Python 3 or later is required to run this script
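
    For example, boto3 can typically be installed with pip:

        pip install boto3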

    Execution and Details of the Script (output screenshot attached):

    For demonstration purposes, we show one of the use cases mentioned above, "Delete ONLY the non-current (old versioned) objects via Lifecycle". The scripts for both use cases, as well as instructions for setting up a cron job/scheduler, are attached to this KB document below.

    1. Enter the 'aws_access_key_id' and 'aws_secret_access_key' for your account in the script

    2. Enter the value of 'delete_after_retention_days' according to your requirements

    3. Enter your bucket name in the 'bucket' variable and the corresponding endpoint for that bucket in 'endpoint'.


      Note that this example discusses the use of Wasabi's us-east-1 storage region, whose service URL is https://s3.us-east-1.wasabisys.com. To use other Wasabi storage regions, please use the appropriate Wasabi service URL as described in this article.

    4. Enter the prefix in the 'prefix' variable based on your requirements; the script will then delete only the expired object(s) under that prefix.

    NOTE: If you are specifying a prefix, please be sure to enter the FULL PREFIX PATH (bucket name NOT included).
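
    For reference, here is what the editable variables at the top of the script might look like once filled in (all values below are placeholders, not real credentials):

        aws_access_key_id = "AKIAXXXXXXXXXXXXXXXX"       # placeholder access key
        aws_secret_access_key = "xxxxxxxxxxxxxxxxxxxx"   # placeholder secret key
        delete_after_retention_days = 30                 # objects older than 30 days expire
        bucket = "my-example-bucket"                     # placeholder bucket name
        prefix = "logs/2023/"                            # full prefix path, bucket name not included
        endpoint = "https://s3.us-east-1.wasabisys.com"  # us-east-1 service URL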

    # Copyright (c) 2022. This script is available as fair use for users. This script can be used freely with Wasabi
    # Technologies, LLC. Distributed by the support team at Wasabi Technologies, LLC.
    
    from boto3 import client
    from botocore.exceptions import ClientError
    from datetime import datetime, timezone
    
    if __name__ == '__main__':
        # Editable variables
        # __
        aws_access_key_id = ""
        aws_secret_access_key = ""
        delete_after_retention_days = 1  # number of days
        bucket = ""
        prefix = ""
        endpoint = ""  # Endpoint of bucket
        # __
    
        # get current date
        today = datetime.now(timezone.utc)
    
        try:
            # create a connection to Wasabi
            s3_client = client(
                's3',
                endpoint_url=endpoint,
                aws_access_key_id=aws_access_key_id,
                aws_secret_access_key=aws_secret_access_key)
        except Exception as e:
            raise e
    
        try:
            # validate the credentials by listing the buckets under the account
            s3_client.list_buckets()
        except ClientError:
            # invalid access keys
            raise Exception("Invalid Access or Secret key")
    
        # create a paginator for all objects.
        object_response_paginator = s3_client.get_paginator('list_object_versions')
        if len(prefix) > 0:
            operation_parameters = {'Bucket': bucket,
                                    'Prefix': prefix}
        else:
            operation_parameters = {'Bucket': bucket}
    
        # instantiate temp variables.
        delete_list = []
        count_current = 0
        count_non_current = 0
    
        print("$ Paginating bucket " + bucket)
        for object_response_itr in object_response_paginator.paginate(**operation_parameters):
            if 'DeleteMarkers' in object_response_itr:
                for delete_marker in object_response_itr['DeleteMarkers']:
                    if (today - delete_marker['LastModified']).days > delete_after_retention_days:
                        delete_list.append({'Key': delete_marker['Key'], 'VersionId': delete_marker['VersionId']})
            if 'Versions' in object_response_itr:
                for version in object_response_itr['Versions']:
                    if version["IsLatest"] is True:
                        count_current += 1
                    elif version["IsLatest"] is False:
                        count_non_current += 1
                    if version["IsLatest"] is False and (
                            today - version['LastModified']).days > delete_after_retention_days:
                        delete_list.append({'Key': version['Key'], 'VersionId': version['VersionId']})
    
        # print objects count
        print("-" * 20)
        print("$ Before deleting objects")
        print("$ current objects: " + str(count_current))
        print("$ non-current objects: " + str(count_non_current))
        print("-" * 20)
    
        # delete objects 1000 at a time
        print("$ Deleting objects from bucket " + bucket)
        for i in range(0, len(delete_list), 1000):
            response = s3_client.delete_objects(
                Bucket=bucket,
                Delete={
                    'Objects': delete_list[i:i + 1000],
                    'Quiet': True
                }
            )
            print(response)
    
        # reset counts
        count_current = 0
        count_non_current = 0
    
        # paginate and recount (respecting the same prefix filter, if any)
        print("$ Paginating bucket " + bucket)
        for object_response_itr in object_response_paginator.paginate(**operation_parameters):
            if 'Versions' in object_response_itr:
                for version in object_response_itr['Versions']:
                    if version["IsLatest"] is True:
                        count_current += 1
                    elif version["IsLatest"] is False:
                        count_non_current += 1
    
        # print objects count
        print("-" * 20)
        print("$ After deleting objects")
        print("$ current objects: " + str(count_current))
        print("$ non-current objects: " + str(count_non_current))
        print("-" * 20)
        print("$ task complete")

    OUTPUT: (see the sample output screenshot attached to this KB document)

    Now, you may set this script up as a cron job/scheduler task so that it runs automatically at the same time every day, say at 9 AM, to clean up expired objects in your Wasabi bucket(s).
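
    For example, on Linux the following crontab entry would run the script every day at 9 AM (the script path and log path are placeholders):

        0 9 * * * /usr/bin/python3 /path/to/delete_expired_objects.py >> /var/log/wasabi_lifecycle.log 2>&1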

    The above script, as well as instructions for setting up a cron job/scheduler, are attached to this KB document.

