Active Read with MultiPartUpload
    • 18 Oct 2024


    To run GET (download) operations concurrently with a multipart upload to Wasabi S3, follow the steps below. The examples use Python with Boto3, a common choice for interacting with S3-compatible services.

    Step 1: Set Up Your Environment

    Install Boto3:

    pip install boto3
    

    Configure Your Credentials: Ensure you have your Wasabi access key and secret key ready.
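
    Rather than hardcoding keys in source files, one option is to read them from environment variables. The sketch below assumes the variable names WASABI_ACCESS_KEY and WASABI_SECRET_KEY and a helper called client_kwargs; all three are illustrative and are not defined by Wasabi or Boto3.

```python
import os

# Assumed (hypothetical) environment variable names; choose your own.
ACCESS_KEY_VAR = "WASABI_ACCESS_KEY"
SECRET_KEY_VAR = "WASABI_SECRET_KEY"


def client_kwargs(env=os.environ, endpoint_url="https://s3.wasabisys.com"):
    """Build keyword arguments for boto3.client('s3', ...) from the environment."""
    missing = [name for name in (ACCESS_KEY_VAR, SECRET_KEY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": env[ACCESS_KEY_VAR],
        "aws_secret_access_key": env[SECRET_KEY_VAR],
    }

# Usage, once boto3 is installed:
#   import boto3
#   s3 = boto3.client("s3", **client_kwargs())
```

    Failing early with a clear message when a variable is missing tends to be easier to debug than an authentication error returned by the service later on.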

    Step 2: Initialize Your Multipart Upload

    Start the Multipart Upload: You need to initiate the multipart upload and get an upload ID.

    import boto3
    
    # Initialize S3 client
    s3 = boto3.client('s3', 
                      endpoint_url='https://s3.wasabisys.com', 
                      aws_access_key_id='YOUR_ACCESS_KEY', 
                      aws_secret_access_key='YOUR_SECRET_KEY')
    
    bucket_name = 'your-bucket-name'
    file_name = 'large_file.zip'
    multipart_upload = s3.create_multipart_upload(Bucket=bucket_name, Key=file_name)
    upload_id = multipart_upload['UploadId']
    


    Step 3: Upload Parts

    Upload the File in Parts: You can read the file in chunks and upload each part.

    part_size = 5 * 1024 * 1024  # 5 MiB, the minimum size for every part except the last
    parts = []
    file_path = '/path/to/your/large_file.zip'
    
    with open(file_path, 'rb') as f:
        part_number = 1
        while True:
            data = f.read(part_size)
            if not data:
                break
            part = s3.upload_part(
                Bucket=bucket_name,
                Key=file_name,
                PartNumber=part_number,
                UploadId=upload_id,
                Body=data
            )
            parts.append({'ETag': part['ETag'], 'PartNumber': part_number})
            part_number += 1
    
    # Complete the multipart upload
    s3.complete_multipart_upload(Bucket=bucket_name, Key=file_name, UploadId=upload_id, MultipartUpload={'Parts': parts})
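
    Before uploading, it can help to check how a file will be split. The following is a minimal sketch of that arithmetic; the helper name plan_parts is hypothetical, and the 10,000-part cap is the standard S3 multipart limit, assumed here to apply to Wasabi as well.

```python
MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MiB minimum for every part except the last
MAX_PARTS = 10_000                # standard S3 limit on parts per upload (assumed for Wasabi)


def plan_parts(file_size, part_size=MIN_PART_SIZE):
    """Return (part_number, offset, length) tuples covering file_size bytes."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Integer ceiling division avoids float rounding on very large files.
    count = (file_size + part_size - 1) // part_size
    if count > MAX_PARTS:
        raise ValueError(f"{count} parts exceeds the limit of {MAX_PARTS}; use a larger part_size")
    plan = []
    for number in range(1, count + 1):
        offset = (number - 1) * part_size
        plan.append((number, offset, min(part_size, file_size - offset)))
    return plan
```

    For example, plan_parts(os.path.getsize(file_path), part_size) (with os imported) lists the parts the loop above would upload, with only the final part allowed to be shorter than part_size.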
    


    Step 4: Start Downloading Files Concurrently

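    Downloads can run while an upload is in progress if they are started in a separate thread or with asynchronous code. Because the code in Step 3 runs to completion before any later statement executes, start the download thread before the upload loop if the two should overlap. As with standard S3 multipart uploads, the object being uploaded becomes available for download only after the upload has been completed, so concurrent downloads should target objects that already exist in the bucket.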

    Example Using Threading

    Download a File in a Separate Thread:

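    import threading
    
    # Key of an object that already exists in the bucket (placeholder).
    # An object still being uploaded in parts cannot be downloaded until
    # complete_multipart_upload has succeeded.
    download_key = 'existing_file.zip'
    
    def download_file():
        s3.download_file(bucket_name, download_key, '/path/to/downloaded_file.zip')
    
    # Start the download in a separate thread; to overlap with the upload,
    # start it before running the upload loop from Step 3
    download_thread = threading.Thread(target=download_file)
    download_thread.start()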
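
    To make the overlap between upload and download explicit, both operations can be submitted to a small thread pool. The sketch below is one possible arrangement, not a prescribed method: run_concurrently is a hypothetical helper, and its two arguments are assumed to be zero-argument callables that wrap the upload loop from Step 3 and a download call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(upload_fn, download_fn):
    """Run an upload callable and a download callable at the same time.

    Returns (upload_result, download_result). If either callable raises,
    the exception is re-raised here once both have finished.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        upload_future = pool.submit(upload_fn)
        download_future = pool.submit(download_fn)
        return upload_future.result(), download_future.result()
```

    Boto3 documents its low-level clients as generally thread-safe, so a single s3 client can usually be shared by both callables.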
    


    Step 5: Monitor Progress

    Monitor Upload Progress: You can print progress messages or use a progress library to give feedback on the upload.
    Join the Thread: Ensure that the main thread waits for the download thread to finish if needed.

    download_thread.join()
    print("Download completed!")
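
    Progress reporting can be sketched with a small callable object. Boto3's download_file accepts a Callback argument that is invoked with the number of bytes transferred since the previous call; the class name ProgressTracker and its output format are assumptions made for illustration.

```python
import threading


class ProgressTracker:
    """Thread-safe byte counter usable as a Boto3 transfer Callback."""

    def __init__(self, total_bytes, label="transfer"):
        self.total_bytes = total_bytes
        self.label = label
        self.seen = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_transferred):
        # Boto3 may invoke the callback from several worker threads.
        with self._lock:
            self.seen += bytes_transferred
            if self.total_bytes > 0:
                percent = 100.0 * self.seen / self.total_bytes
                print(f"{self.label}: {self.seen}/{self.total_bytes} bytes ({percent:.1f}%)")
            else:
                print(f"{self.label}: {self.seen} bytes")

# Usage (assumes the s3 client and names from the steps above):
#   size = s3.head_object(Bucket=bucket_name, Key=file_name)["ContentLength"]
#   s3.download_file(bucket_name, file_name, "/path/to/downloaded_file.zip",
#                    Callback=ProgressTracker(size, "download"))
#   # In the upload loop, call tracker(len(data)) after each upload_part.
```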
    


    Additional Considerations

    • Error Handling: Implement error handling for both upload and download processes to retry on failures.
    • Adjust Part Size: The part size can be adjusted based on your network conditions and file size. Each part must be between 5 MB and 5 GB, except the last part, which may be smaller.
    • Cleanup: Abort multipart uploads that fail or are abandoned so that their already-uploaded parts are discarded, and verify that each upload and download completed successfully.
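
    The error handling and cleanup points above can be combined in one routine. The following is a minimal sketch under stated assumptions: upload_parts_safely is a hypothetical helper, the retry count and delay are arbitrary, and catching Exception is deliberately broad for brevity (production code would more likely catch botocore.exceptions.ClientError). The s3 argument is the Boto3 client created in Step 2.

```python
import time


def upload_parts_safely(s3, bucket, key, chunks, max_attempts=3, delay_seconds=1.0):
    """Upload an iterable of byte chunks as one multipart object.

    Retries each part up to max_attempts times and aborts the multipart
    upload if anything still fails, so uploaded parts are not left behind.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        for part_number, data in enumerate(chunks, start=1):
            for attempt in range(1, max_attempts + 1):
                try:
                    response = s3.upload_part(
                        Bucket=bucket, Key=key, PartNumber=part_number,
                        UploadId=upload_id, Body=data,
                    )
                    break
                except Exception:
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay_seconds)
            parts.append({"ETag": response["ETag"], "PartNumber": part_number})
        return s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Discard the parts uploaded so far, then surface the original error.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

    Boto3 also applies its own configurable retries to individual requests, so the explicit loop here mainly illustrates where a retry policy would sit.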

    Conclusion

    With this setup, you can upload large files to Wasabi S3 using multipart uploads while simultaneously downloading other files. Running both operations concurrently helps you make fuller use of your available bandwidth.