How to get total size of current & non-current objects inside a bucket
    • 09 Sep 2024
    • 8 Minutes to read
    • PDF

    How to get total size of current & non-current objects inside a bucket

    • PDF

    Article summary

    There are some s3 applications/tools which recognize versioning feature and allow you to check the total size of current and non-current objects.

    For the Windows system, you may use the S3 Browser application as shown below

    In the screenshot, the first red box shows "current objects" total size and the second red box shows "current + non-current objects" total size combined.

    Screen_Shot_2021-03-22_at_8.16.54_AM.png

    We are currently not aware of any S3 application for macOS and Linux system that has a native feature to show this Information and hence we can use a scripted method to get the total size of current & non-current versioned objects using the following python code

    • Make sure you have installed  AWS SDK boto3 for python on your CLI and turned off the versioning feature on your bucket before running the script

    • Install  Python 3+ version to run this script

    Executions and Details of the Script (output & screenshot attached):

    1. When you execute the script, it will prompt you to select the profile or enter the API keys of the admin who is executing this script

      1. If you already have a profile configured on your CLI, you may Press 1


        You may configure the  AWS CLI profile for the Wasabi account using the Wasabi keys ahead of time


        NOTE that it is optional for you to use credential files to run your code but it is always a best practice to use such implementation wherein your credential keys are in a file stored on your local machine rather than being part of your actual code or entering Keys at runtime prompt


        In this example, we have set the profile name as " profilename" in the "~/.aws/credentials" file.

      2. If you do not wish to use the existing profile, you may press 2 and enter your API Keys.

    2. Enter your own Bucket Name and the region it belongs to


      Note that this example discusses the use of Wasabi's us-east-1 storage region. To use other Wasabi storage regions, please use the appropriate Wasabi service URL as described in this article

    #  Copyright (c) 2021. This script is available as fair use for users. This script can be used freely with Wasabi
    #  Technologies LLC. Distributed by the support team at Wasabi Technologies.
    
    """
    Overview
    This Script will take the following inputs:
     1. profile name / Access key and Secret Key
     2. Bucket name
     3. region
     4. Prefix
    
     Calculate the size and count of the total number of delete markers, current and non current objects.
    """
    
    import sys
    from boto3 import client, Session
    from botocore.exceptions import ProfileNotFound, ClientError
    
    
    def calculate_size(size, _size_table):
        """
        This function dynamically calculates the right base unit symbol for size of the object.
        :param size: integer to be dynamically calculated.
        :param _size_table: dictionary of size in Bytes. Created in wasabi-automation.
        :return: string of converted size.
        """
        count = 0
        while size // 1024 > 0:
            size = size / 1024
            count += 1
        return str(round(size, 2)) + ' ' + _size_table[count]
    
    
    def get_credentials():
        """
        This function gets the access key and secret key by 2 methods.
        1. Select profile from aws credentials file.
           Make sure that you have run AWS config and set up your keys in the ~/.aws/credentials file.
        2. Insert the keys directly as a string.
        :return: access key and secret key
        """
        credentials_verified = False
        aws_access_key_id = None
        aws_secret_access_key = None
        while not credentials_verified:
            choice = input("$ Press 1 and enter to select existing profile\n"
                           "$ Press 2 and enter to enter Access Key and Secret Key\n"
                           "$ Press 3 to exit: ")
            if choice.strip() == "1":
                aws_access_key_id, aws_secret_access_key = select_profile()
                if aws_access_key_id is not None and aws_secret_access_key is not None:
                    credentials_verified = True
            elif choice.strip() == "2":
                aws_access_key_id = input("$ AWS access key").strip()
                aws_secret_access_key = input("$ AWS secret access key").strip()
                credentials_verified = True
            elif choice.strip() == "3":
                sys.exit(0)
            else:
                print("Invalid choice please try again")
        return aws_access_key_id, aws_secret_access_key
    
    
    def select_profile():
        """
        sub-function under get credentials that selects the profile form ~/.aws/credentials file.
        :return: access key and secret key
        """
        profile_selected = False
        while not profile_selected:
            try:
                profiles = Session().available_profiles
                if len(profiles) == 0:
                    return None, None
                print("$ Available Profiles: ", profiles)
            except Exception as e:
                print(e)
                return None, None
            profile_name = input("$ Profile name: ").strip().lower()
            try:
                session = Session(profile_name=profile_name)
                credentials = session.get_credentials()
                aws_access_key_id = credentials.access_key
                aws_secret_access_key = credentials.secret_key
                profile_selected = True
                return aws_access_key_id, aws_secret_access_key
            except ProfileNotFound:
                print("$ Invalid profile. Please Try again.")
            except Exception as e:
                raise e
    
    
    def region_selection():
        """
        This function presents a simple region selection input. Pressing 1-5 selects the corresponding region.
        :return: region
        """
        region_selected = False
        _region = ""
        while not region_selected:
            _choice = input("$ Please enter the endpoint for the bucket: ").strip().lower()
            if len(_choice) > 0:
                _region = _choice
                region_selected = True
        return _region
    
    
    def create_connection_and_test(aws_access_key_id: str, aws_secret_access_key: str, _region, _bucket):
        """
        Creates a connection to wasabi endpoint based on selected region and checks if the access keys are valid.
        NOTE: creating the connection is not enough to test. We need to make a method call to check for its working status.
        :param aws_access_key_id: access key string
        :param aws_secret_access_key: secret key string
        :param _region: region string
        :param _bucket: bucket name string
        :return: reference to the connection client
        """
        try:
            _s3_client = client('s3',
                                endpoint_url=_region,
                                aws_access_key_id=aws_access_key_id,
                                aws_secret_access_key=aws_secret_access_key)
    
            # Test credentials are working
            _s3_client.list_buckets()
    
            try:
                _s3_client.head_bucket(Bucket=bucket)
            except ClientError:
                # The bucket does not exist or you have no access.
                raise Exception("$ bucket does not exist in the account please re-check the name and try again: ")
    
            return _s3_client
    
        except ClientError:
            print("Invalid Access and Secret keys")
        except Exception as e:
            raise e
        # cannot reach here
        return None
    
    
    if __name__ == '__main__':
        # Generate a table for SI units symbol table.
        size_table = {0: 'Bs', 1: 'KBs', 2: 'MBs', 3: 'GBs', 4: 'TBs', 5: 'PBs', 6: 'EBs'}
    
        print("\n")
        print("\n")
        print("$ starting script...")
    
        # generate access keys
        access_key_id, secret_access_key = get_credentials()
    
        # get bucket name
        bucket = input("$ Please enter the name of the bucket: ").strip()
    
        # prefix
        prefix = input("$ Please enter a prefix (leave blank if you don't need one): ").strip()
    
        # get region
        region = region_selection()
    
        # test the connection and access keys. Also checks if the bucket is valid.
        s3_client = create_connection_and_test(access_key_id, secret_access_key, region, bucket)
    
        # create a paginator with default settings.
        object_response_paginator = s3_client.get_paginator('list_object_versions')
        if len(prefix) > 0:
            operation_parameters = {'Bucket': bucket,
                                    'Prefix': prefix}
        else:
            operation_parameters = {'Bucket': bucket}
    
        # initialize basic variables for in memory storage.
        delete_marker_count = 0
        delete_marker_size = 0
        versioned_object_count = 0
        versioned_object_size = 0
        current_object_count = 0
        current_object_size = 0
    
        print("$ Calculating, please wait... this may take a while")
        for object_response_itr in object_response_paginator.paginate(**operation_parameters):
            if 'DeleteMarkers' in object_response_itr:
                for delete_marker in object_response_itr['DeleteMarkers']:
                    delete_marker_count += 1
    
            if 'Versions' in object_response_itr:
                for version in object_response_itr['Versions']:
                    if version['IsLatest'] is False:
                        versioned_object_count += 1
                        versioned_object_size += version['Size']
                    elif version['IsLatest'] is True:
                        current_object_count += 1
                        current_object_size += version['Size']
    
        print("\n")
        print("-" * 10)
        print("$ Total Delete markers: " + str(delete_marker_count))
        print("$ Number of Current objects: " + str(current_object_count))
        print("$ Current Objects size: ", calculate_size(current_object_size, size_table))
        print("$ Number of Non-current objects: " + str(versioned_object_count))
        print("$ Non-current Objects size: ", calculate_size(versioned_object_size, size_table))
        print("$ Total size Current + Non-current: ",
              calculate_size(versioned_object_size + current_object_size, size_table))
        print("-" * 10)
        print("\n")
    
        print("$ process completed successfully")
        print("\n")
        print("\n")

    OUTPUT Screenshot:

    mceclip1.png

    The script is also attached to this KB document.