How to download ALL versions of all files within a Bucket


Some S3 applications and tools recognize the versioning feature and either allow you to download one versioned object at a time or download all current revisions of all objects inside your bucket.

Using AWS CLI to Download Versioned Objects

  1. With the AWS CLI, you can list all objects along with their version IDs using the list-object-versions command. For example:

    aws s3api list-object-versions --bucket <bucket-name> --endpoint-url https://s3.us-east-1.wasabisys.com

    Example output:

    aws s3api list-object-versions --bucket download-versions-bucket --endpoint-url https://s3.us-east-1.wasabisys.com
    {
        "VersionId": "001595000443450881443-klYCL1RDCV",
        "IsLatest": true,
        "ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
        "LastModified": "2020-07-17T15:40:43.000Z",
        "Owner": {
            "DisplayName": "sahani.p",
            "ID": "EE5775E47FC856DD908EFBCE0E69EBD91CA400F4A86B6445F2F74A7A389C7840"
        },
        "StorageClass": "STANDARD",
        "Size": 3855,
        "Key": "Wasabi-read-write-1MB.xml"
    },

    ...
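  2. Use the get-object command with the version ID to download that particular version of the object. The final positional argument is the local file name to save it as:

    aws s3api get-object --bucket <bucket-name> --key <object-key> --version-id <version-id> <output-file> --endpoint-url=https://s3.us-east-1.wasabisys.com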

    Example output:

    aws s3api get-object --bucket download-versions-bucket --key Wasabi-read-write-1MB.xml --version-id 001595000443450881443-klYCL1RDCV new-name.xml --endpoint-url=https://s3.us-east-1.wasabisys.com
    {
     "Metadata": {},
     "VersionId": "001595000443450881443-klYCL1RDCV",
     "ETag": "\"4bd22a5ff53e7cd0160e3f51917b9393\"",
     "ContentType": "text/xml",
     "ContentLength": 3855,
     "AcceptRanges": "bytes",
     "LastModified": "Fri, 17 Jul 2020 15:40:43 GMT"
    }
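The two CLI steps above map directly onto boto3 client calls. The following is a minimal sketch, not an official Wasabi tool: it assumes boto3 is installed and that you pass in an S3 client already configured with the Wasabi endpoint URL. The helper names list_versions and download_version are hypothetical. The list_object_versions API returns results in pages, with object versions and delete markers in separate lists (Versions and DeleteMarkers); delete markers have no content, so the sketch only yields real versions.

```python
import shutil


def list_versions(s3_client, bucket, prefix=""):
    """Yield (key, version_id, last_modified) for each stored object version.

    Rough equivalent of `aws s3api list-object-versions`. The API returns
    delete markers under "DeleteMarkers", not "Versions", so they are not
    yielded here.
    """
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for version in page.get("Versions", []):
            yield version["Key"], version["VersionId"], version["LastModified"]


def download_version(s3_client, bucket, key, version_id, dest_path):
    """Save one specific version of an object to dest_path.

    Rough equivalent of `aws s3api get-object --version-id ... <output-file>`.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    with open(dest_path, "wb") as fout:
        # Stream the body to disk instead of reading it all into memory.
        shutil.copyfileobj(response["Body"], fout)
```

For example, with a client created by boto3.client('s3', endpoint_url='https://s3.us-east-1.wasabisys.com', aws_access_key_id=..., aws_secret_access_key=...), calling download_version(client, 'download-versions-bucket', 'Wasabi-read-write-1MB.xml', '001595000443450881443-klYCL1RDCV', 'new-name.xml') corresponds to the get-object example above. Passing the client in, rather than creating it inside the helpers, keeps the endpoint and credentials in one place.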

Using S3 Browser to Download Versioned Objects

When you use applications like S3 Browser, you can download all current versions of all objects, as shown below. For more information, review How do I use S3 Browser with Wasabi?

  1. Select the bucket.

  2. Right-click the Versions menu.

  3. Select Download.

If you need to download ALL versions (current and previous) of ALL objects in your bucket, use a scripted approach.

The following script has been tested with Wasabi for this use case.

Before running the script, make sure the AWS SDK for Python (boto3) and the Click package are installed, for example with pip install boto3 click. For more information, review Welcome to Click.

This code example uses Wasabi's us-east-1 storage region. To use another Wasabi storage region, use the appropriate Wasabi service URL as described in Service URLs for Wasabi's Storage Regions. Replace Wasabi-Access-Key and Wasabi-Secret-Access-Key with your own access keys.

#!/usr/bin/env python3

import shutil

import boto3
import click


@click.command(help='List S3 versions, optionally download all versions as well')
@click.option('--bucket', required=True, help='The S3 bucket to scan')
@click.option('--prefix', default='', help='Prefix of files to scan')
@click.option('--download', default=False, is_flag=True,
              help='Download all versions (prefix filenames with ISO datetime of edit and version), paths not preserved')
def s3versions(bucket, prefix, download):
    '''List all versions of files in S3.'''

    s3 = boto3.resource(
        's3',
        endpoint_url='https://s3.us-east-1.wasabisys.com',
        aws_access_key_id='Wasabi-Access-Key',
        aws_secret_access_key='Wasabi-Secret-Access-Key')
    bucket = s3.Bucket(bucket)
    # The collection pages through ListObjectVersions and returns both
    # object versions and delete markers.
    versions = bucket.object_versions.filter(Prefix=prefix)

    for version in versions:
        # Delete markers have no Size and no content to download; skip them.
        if version.size is None:
            continue

        # Use the listing data directly so listing alone does not GET every version.
        path = version.object_key
        last_modified = version.last_modified
        version_id = version.id
        print(path, last_modified, version_id, sep='\t')

        if download:
            # Fetch this specific version and stream its body to a local file.
            response = version.get()
            filename = path.rsplit('/', 1)[-1]
            with open(f'{last_modified}-{version_id}-{filename}', 'wb') as fout:
                shutil.copyfileobj(response['Body'], fout)


if __name__ == '__main__':
    s3versions()

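To run the program, save it as s3versions.py and use the following syntax. Only --bucket is required; --prefix and --download are optional:

python s3versions.py --bucket <bucket-name> [--prefix <prefix>] [--download]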
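With --download, the script writes every version to the current working directory and names each file {last_modified}-{version_id}-{filename}, where filename is the part of the key after the last slash. The sketch below reproduces that naming with a hypothetical helper, download_name, so you can predict which files will appear. Note that the default string form of the timestamp (for example 2020-07-17 19:01:44+00:00) contains colons, which Windows does not allow in file names; the optional windows_safe replacement is an illustrative assumption, not something the script above does.

```python
from datetime import datetime, timezone


def download_name(key, last_modified, version_id, windows_safe=False):
    """Build the local file name used for one downloaded version.

    Follows the pattern used by s3versions.py: '{last_modified}-{version_id}-{filename}'.
    Folder components of the key are dropped, so paths are not preserved.
    """
    filename = key.rsplit("/", 1)[-1]
    name = f"{last_modified}-{version_id}-{filename}"
    if windows_safe:
        # Illustrative only: ':' is not allowed in Windows file names.
        name = name.replace(":", "-")
    return name


stamp = datetime(2020, 7, 17, 19, 1, 44, tzinfo=timezone.utc)
print(download_name("Wasabi Pings.rtf", stamp, "001595012503828293579-Iey7J7ecdX"))
# 2020-07-17 19:01:44+00:00-001595012503828293579-Iey7J7ecdX-Wasabi Pings.rtf
```

Because the version ID is part of every name, two versions of the same key never overwrite each other, even though folder paths are discarded.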

The following examples show the script's output for each combination of options:

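1. Listing all versions:

The download-versions-bucket bucket contains multiple versions of several files. The following command lists every version along with its version ID. A version ID of null marks a version that was stored before versioning was enabled on the bucket (or while versioning was suspended).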

Syntax:

python s3versions.py --bucket <bucket-name>

Example output:

$ python s3versions.py --bucket download-versions-bucket

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null

2. Listing based on a prefix:

Instead of listing the whole bucket, you can list only the objects whose keys begin with a given prefix. In this example, only objects whose keys start with Wasabi are listed.

Syntax:

python s3versions.py --bucket <bucket-name> --prefix <prefix>

Example output:

$ python s3versions.py --bucket download-versions-bucket --prefix Wasabi

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null

3. Downloading all objects (current and previous versions):

Besides printing each version, the script saves it to the current working directory.

Syntax:

python s3versions.py --bucket <bucket-name> --download

Example output:

$ python s3versions.py --bucket download-versions-bucket --download

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
bucket-utilization-for-invoice.csv 2020-07-17 19:01:44+00:00 001595012504138434824-ieVYu1_rZj
bucket-utilization-for-invoice.csv 2020-07-17 15:39:58+00:00 null

4. Downloading all objects (current and previous versions) that match a prefix:

Syntax:

python s3versions.py --bucket <bucket-name> --prefix <prefix> --download

Example output:

$ python s3versions.py --bucket download-versions-bucket --prefix Wasabi --download

Wasabi Pings.rtf 2020-07-17 19:01:44+00:00 001595012503828293579-Iey7J7ecdX
Wasabi Pings.rtf 2020-07-17 15:40:43+00:00 001595000443240234132-qUmZ0PZcBs
Wasabi Pings.rtf 2020-07-17 15:39:58+00:00 null
Wasabi-read-write-1MB.xml 2020-07-17 15:40:43+00:00 001595000443450881443-klYCL1RDCV
Wasabi-read-write-1MB.xml 2020-07-17 15:39:59+00:00 null
Wasabi_Invoice_373103.pdf 2020-07-17 19:01:45+00:00 001595012504561887231-tN92rg7sq4
Wasabi_Invoice_373103.pdf 2020-07-17 15:40:44+00:00 001595000443683495545-WeujjiU8LK
Wasabi_Invoice_373103.pdf 2020-07-17 15:39:59+00:00 null
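Prefix filtering is applied by the server: the script passes --prefix as the Prefix parameter of the version listing, and the match is a plain, case-sensitive "starts with" comparison on the full object key, not a wildcard pattern. That is why bucket-utilization-for-invoice.csv does not appear in the --prefix Wasabi listings. The short sketch below reproduces that comparison locally using the keys from the example output; matching_keys is a hypothetical helper for illustration only.

```python
keys = [
    "Wasabi Pings.rtf",
    "Wasabi-read-write-1MB.xml",
    "Wasabi_Invoice_373103.pdf",
    "bucket-utilization-for-invoice.csv",
]


def matching_keys(all_keys, prefix):
    """Return the keys a Prefix filter would select: a case-sensitive starts-with match."""
    return [key for key in all_keys if key.startswith(prefix)]


print(matching_keys(keys, "Wasabi"))
# ['Wasabi Pings.rtf', 'Wasabi-read-write-1MB.xml', 'Wasabi_Invoice_373103.pdf']
print(matching_keys(keys, "wasabi"))
# []
```

An empty prefix (the script's default) matches every key, which is why running without --prefix lists the whole bucket.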