How to download ALL versions of all files within a Bucket
    • 28 Aug 2024

    Some S3 applications and tools support versioning: they let you download either a single object version at a time or the current versions of all objects in your bucket.

    Using AWS CLI to Download Versioned Objects

    1. With the AWS CLI, you can list all objects along with their version IDs using the list-object-versions command. For example:

      aws s3api list-object-versions --bucket <bucket-name> --endpoint-url https://s3.us-east-1.wasabisys.com

      Example output:

      aws s3api list-object-versions --bucket download-versions-bucket --endpoint-url https://s3.us-east-1.wasabisys.com
      {
       "VersionId": "001595000443450881443-klYCL1RDCV",
       "IsLatest": true,
       "ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
       "LastModified": "2020-07-17T15:40:43.000Z",
       "Owner": {
       "DisplayName": "sahani.p",
       "ID": "EE5775E47FC856DD908EFBCE0E69EBD91CA400F4A86B6445F2F74A7A389C7840"
       },
       "StorageClass": "STANDARD",
       "Size": 3855,
       "Key": "Wasabi-read-write-1MB.xml"
       },
      
      ...
    2. Use the get-object command with the version ID to download that particular version. The last argument is the local file name to save it as:

      aws s3api get-object --bucket <bucket-name> --key <object-key> --version-id <version-id> <local-file-name> --endpoint-url=https://s3.us-east-1.wasabisys.com

      Example output:

      aws s3api get-object --bucket download-versions-bucket --key Wasabi-read-write-1MB.xml --version-id 001595000443450881443-klYCL1RDCV new-name.xml --endpoint-url=https://s3.us-east-1.wasabisys.com
      {
       "Metadata": {},
       "VersionId": "001595000443450881443-klYCL1RDCV",
       "ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
       "ContentType": "text/xml",
       "ContentLength": 3855,
       "AcceptRanges": "bytes",
       "LastModified": "Fri, 17 Jul 2020 15:40:43 GMT"
      }
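The two steps above can be chained so that one get-object command is generated per version in the listing. This is a minimal sketch, assuming the JSON printed by `aws s3api list-object-versions` has been saved and parsed into a dict (its "Versions" and "DeleteMarkers" arrays are used). The function name `get_object_commands` and the local naming scheme `<version-id>-<filename>` are hypothetical, not part of the AWS CLI.

```python
from typing import Dict, List

# Endpoint from the examples above; swap in your region's service URL.
ENDPOINT = "https://s3.us-east-1.wasabisys.com"


def get_object_commands(listing: Dict, bucket: str, endpoint: str = ENDPOINT) -> List[List[str]]:
    """Build one `aws s3api get-object` argument list per entry in listing["Versions"].

    `listing` is the parsed JSON printed by `aws s3api list-object-versions`.
    Entries under listing["DeleteMarkers"] have no content, so they are ignored.
    """
    commands = []
    for version in listing.get("Versions", []):
        key = version["Key"]
        version_id = version["VersionId"]
        # Hypothetical local naming scheme: <version-id>-<last path component of the key>.
        outfile = "{}-{}".format(version_id, key.rsplit("/", 1)[-1])
        commands.append([
            "aws", "s3api", "get-object",
            "--bucket", bucket,
            "--key", key,
            "--version-id", version_id,
            "--endpoint-url", endpoint,
            outfile,  # positional outfile argument required by get-object
        ])
    return commands
```

For example, load the saved JSON with `json.load`, then run each command with `subprocess.run(argv, check=True)`, or print it with `shlex.join(argv)` (Python 3.8+) to review it first. Because only the file name is kept, keys with the same file name and version ID (such as null) in different folders would overwrite each other.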

    Using S3 Browser to Download Versioned Objects

    When you use applications like S3 Browser, you can download all current versions of all objects, as shown below. For more information, review How do I use S3 Browser with Wasabi?

    1. Select the bucket.

    2. Right-click the Versions menu.

    3. Select Download.
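S3 Browser's download covers only the current versions. To make the same selection from the AWS CLI listing, one option is to keep the entries whose IsLatest field is true. This is a minimal sketch, assuming the output of `aws s3api list-object-versions` has been parsed into a dict; the function name `current_versions` is hypothetical.

```python
from typing import Dict, List


def current_versions(listing: Dict) -> List[Dict]:
    """Return the entries of listing["Versions"] that are the current (latest) version.

    `listing` is the parsed JSON printed by `aws s3api list-object-versions`.
    If a key's latest version is a delete marker, none of its "Versions"
    entries has IsLatest set, so that key is left out.
    """
    return [v for v in listing.get("Versions", []) if v.get("IsLatest")]
```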

    To download ALL versions (current and older) of ALL objects in your bucket, use a scripted approach.

    The following Python script has been tested with Wasabi for this use case.

    Before running the script, make sure the AWS SDK for Python (boto3) and the Click package are installed in your Python environment (for example, with pip install boto3 click). For more information, review Welcome to Click.

    This code example uses Wasabi's us-east-1 storage region. To use another region, replace the endpoint URL with the appropriate Wasabi service URL as described in Service URLs for Wasabi's Storage Regions. Also replace the placeholder access key and secret key with your own.

    #!/usr/bin/env python3
    """List every version of every object in a bucket, optionally downloading them all."""
    
    import shutil
    import sys
    
    import boto3
    import click
    from botocore.exceptions import ClientError
    
    
    @click.command(help='List S3 versions, optionally download all versions as well')
    @click.option('--bucket', required=True, help='The s3 bucket to scan')
    @click.option('--prefix', default='', help='Prefix of files to scan')
    @click.option('--download', default=False, is_flag=True,
                  help='Download all versions (prefix filenames with ISO datetime of edit and version), paths not preserved')
    def s3versions(bucket, prefix, download):
        '''List all versions of files in s3.'''
        # Replace the placeholder keys with your Wasabi access key and secret key.
        s3 = boto3.resource(
            's3',
            endpoint_url='https://s3.us-east-1.wasabisys.com',
            aws_access_key_id='Wasabi-Access-Key',
            aws_secret_access_key='Wasabi-Secret-Access-Key')
        bucket = s3.Bucket(bucket)
        # Pages through every version and delete marker under the prefix.
        versions = bucket.object_versions.filter(Prefix=prefix)
    
        for version in versions:
            try:
                # GET this specific version; the response carries its metadata and a Body stream.
                obj = version.get()
            except ClientError as err:
                # Delete markers have no content, so GET fails on them; report and skip.
                print(version.object_key, version.id, 'skipped: {}'.format(err), sep='\t', file=sys.stderr)
                continue
    
            path = version.object_key
            last_modified = obj.get('LastModified')
            version_id = obj.get('VersionId')
            print(path, last_modified, version_id, sep='\t')
    
            if download:
                # Keep only the file name; folder structure is not recreated locally.
                filename = path.rsplit('/', 1)[-1]
                with open('{}-{}-{}'.format(last_modified, version_id, filename), 'wb') as fout:
                    # Reuse the Body from the GET above instead of downloading the version twice.
                    shutil.copyfileobj(obj['Body'], fout)
    
    
    if __name__ == '__main__':
        s3versions()

    Execution syntax for the script (saved as s3versions.py); --prefix and --download are optional:

    python s3versions.py --bucket <bucket-name> [--prefix <prefix>] [--download]
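The script hardcodes the us-east-1 endpoint and placeholder keys. One alternative is sketched below, assuming Wasabi service URLs follow the pattern https://s3.<region>.wasabisys.com (as https://s3.us-east-1.wasabisys.com does; confirm your region's URL in Service URLs for Wasabi's Storage Regions) and that your keys are exported in the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which boto3 reads automatically. The helper names `wasabi_endpoint` and `wasabi_resource` are hypothetical.

```python
import re


def wasabi_endpoint(region: str) -> str:
    """Build a Wasabi service URL, assuming the pattern https://s3.<region>.wasabisys.com."""
    # Guard against passing a full URL or a typo instead of a region name such as "eu-central-1".
    if not re.fullmatch(r"[a-z]{2}-[a-z]+-\d+", region):
        raise ValueError("unexpected region name: {!r}".format(region))
    return "https://s3.{}.wasabisys.com".format(region)


def wasabi_resource(region: str = "us-east-1"):
    """Create a boto3 S3 resource for Wasabi without hardcoding keys.

    No key arguments are passed, so boto3 falls back to its standard credential
    chain (environment variables, the shared credentials file, and so on).
    """
    import boto3  # imported here so wasabi_endpoint() works without boto3 installed

    return boto3.resource("s3", endpoint_url=wasabi_endpoint(region))
```

In s3versions.py, the `s3 = boto3.resource(...)` call could then become `s3 = wasabi_resource("us-east-1")`, keeping secrets out of the source file.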

    Here are example outputs of s3versions.py:

    1. Listing all versions of all objects:

    The "download-versions-bucket" bucket contains multiple versions of several files. The following command lists all of them along with their version IDs.

    Syntax:

    python s3versions.py --bucket <bucket-name>

    Example output:

    $ python s3versions.py --bucket download-versions-bucket
    
    Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
    Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
    Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
    Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
    Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
    Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
    Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
    Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
    bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
    bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null

    2. Listing based on prefixes:

    Instead of listing the whole bucket, you can list only the objects whose keys start with a given prefix. This example lists only objects whose keys begin with Wasabi.

    Syntax:

    python s3versions.py --bucket <bucket-name> --prefix <prefix>

    Example output:

    $ python s3versions.py --bucket download-versions-bucket --prefix Wasabi
    
    Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
    Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
    Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
    Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
    Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
    Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
    Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
    Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null

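    3. Downloading ALL objects (current + old versions):

    Each version is saved in the current directory as <last-modified>-<version-id>-<filename>; folder paths are not preserved.

    Syntax:

    python s3versions.py --bucket <bucket-name> --download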

    Example output:

    $ python s3versions.py --bucket download-versions-bucket --download
    
    Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
    Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
    Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
    Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
    Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
    Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
    Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
    Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
    bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
    bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null

    4. Downloading ALL objects (current + old versions) based on prefixes:

    Syntax:

    python s3versions.py --bucket <bucket-name> --prefix <prefix> --download

    Example output:

    $ python s3versions.py --bucket download-versions-bucket --prefix Wasabi --download
    
    Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
    Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
    Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
    Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
    Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
    Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
    Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
    Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
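The --download option writes every version into the current directory and keeps only the file name, so objects with the same name in different folders land side by side, and the timestamp's colons are not valid in Windows file names. Below is a minimal sketch of an alternative naming scheme, assuming you want to mirror the key's folders under a local root directory. The function `version_path` and the layout `<folders>/<UTC timestamp>-<version-id>-<filename>` are hypothetical, not part of the script above.

```python
from datetime import datetime, timezone
from pathlib import Path


def version_path(root: Path, key: str, last_modified: datetime, version_id: str) -> Path:
    """Return a local path for one object version that keeps the key's folders.

    The timestamp is formatted as YYYYMMDDTHHMMSSZ in UTC, avoiding the colons
    that str(datetime) produces. `last_modified` should be timezone-aware, as
    the values returned by boto3 are.
    Raises ValueError for keys ending in "/" (folder placeholder objects) and
    for keys containing ".." segments that would escape `root`.
    """
    stamp = last_modified.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    folder, _, filename = key.rpartition("/")
    parts = [p for p in folder.split("/") if p]
    if ".." in parts or filename in ("", ".", ".."):
        raise ValueError("refusing unsafe key: {!r}".format(key))
    return root.joinpath(*parts, "{}-{}-{}".format(stamp, version_id, filename))
```

In the script's download branch, you could compute `target = version_path(Path("downloads"), path, last_modified, version_id)`, call `target.parent.mkdir(parents=True, exist_ok=True)`, and open `target` instead of the flat file name.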