Uploading Large Files Up to 5TB to Amazon S3 using Boto3 in Python



Amazon Simple Storage Service (S3) is a widely used cloud storage service that allows users to store and retrieve any amount of data at any time. A single S3 object can be as large as 5 TB, but a single PUT request is limited to 5 GB, so uploading files at the terabyte scale requires a different approach. Boto3, the AWS SDK for Python, provides a powerful and flexible way to interact with S3, including handling large file uploads through its multipart upload feature.


Prerequisites:

Before we begin, make sure you have the following:

  1. Python 3 installed on your machine.
  2. Boto3 installed (pip install boto3).
  3. AWS credentials (an access key and secret key, or an IAM role) with permission to upload to the target S3 bucket.
  4. An existing S3 bucket to upload to.
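
If you want to confirm the environment is ready before attempting a large upload, a quick check like the following works. This is an optional sketch; it assumes your credentials are already configured via environment variables, the shared credentials file, or an IAM role.

import boto3

# Optional sanity check: confirm Boto3 is importable and credentials resolve
session = boto3.Session()
print("Boto3 version:", boto3.__version__)
print("Default region:", session.region_name)
print("Credentials found:", session.get_credentials() is not None)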

Why Multipart?


Benefits

S3 Multipart Upload is beneficial for handling large files efficiently. Here are the key reasons to use it (a sketch of the underlying part-by-part API follows the list):

  1. Efficiency for Large Files: Splits large files into smaller parts for better handling.
  2. Resilience to Failures: Reduces the risk of failure by allowing resumption from the point of interruption.
  3. Parallel Uploads: Speeds up uploads by enabling parallel uploading of file parts.
  4. Optimal for Unstable Connections: Minimizes the impact of network failures by retrying only the failed parts.
  5. Support for Transfer Acceleration: Compatible with S3 Transfer Acceleration for faster uploads.
  6. SDK Support: AWS SDKs offer built-in support, simplifying implementation.
  7. Concurrency Control: Allows control over the number of parallel uploads.
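
To make the "parts" concrete, the sketch below shows the low-level multipart calls that the higher-level transfer utilities wrap: create the upload, send each part, then complete it (or abort on failure). It assumes an existing boto3 client named s3 and placeholder values for bucket_name, object_key, and local_file_path; the full script in the next section handles all of this for you, including parallelism and retries.

part_size = 50 * 1024 * 1024  # each part must be at least 5 MB, except the last

# 1. Start the multipart upload and keep its UploadId
mpu = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)
upload_id = mpu['UploadId']

parts = []
try:
    # 2. Upload the file chunk by chunk, recording each part's ETag
    with open(local_file_path, 'rb') as f:
        part_number = 1
        while True:
            data = f.read(part_size)
            if not data:
                break
            response = s3.upload_part(
                Bucket=bucket_name, Key=object_key,
                PartNumber=part_number, UploadId=upload_id, Body=data
            )
            parts.append({'ETag': response['ETag'], 'PartNumber': part_number})
            part_number += 1

    # 3. Ask S3 to assemble the uploaded parts into a single object
    s3.complete_multipart_upload(
        Bucket=bucket_name, Key=object_key, UploadId=upload_id,
        MultipartUpload={'Parts': parts}
    )
except Exception:
    # 4. Abort on failure so incomplete parts do not accumulate and incur storage costs
    s3.abort_multipart_upload(Bucket=bucket_name, Key=object_key, UploadId=upload_id)
    raise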


Writing the Python Script


Let's create a Python script that utilizes Boto3 to upload a large file to S3 in a multipart fashion.

import boto3
from boto3.s3.transfer import TransferConfig

# Set your AWS credentials and region
aws_access_key_id = 'YOUR_ACCESS_KEY_ID'
aws_secret_access_key = 'YOUR_SECRET_ACCESS_KEY'
region_name = 'YOUR_REGION'

# Set your S3 bucket and object key
bucket_name = 'YOUR_BUCKET_NAME'
object_key = 'your-prefix/your-large-file.tar.gz'

# Specify the local file to upload
local_file_path = 'path/to/your-large-file.tar.gz'

# Set the desired part size (in MB) and number of concurrent threads
part_size_mb = 50   # You can adjust this based on your requirements
num_threads = 10

# Create an S3 client
s3 = boto3.client(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name
)

# Configure multipart transfers: files larger than multipart_threshold are
# split into chunks of multipart_chunksize and uploaded by up to
# max_concurrency threads in parallel
transfer_config = TransferConfig(
    multipart_threshold=part_size_mb * 1024 * 1024,
    multipart_chunksize=part_size_mb * 1024 * 1024,
    max_concurrency=num_threads,
    use_threads=True
)

try:
    # upload_file automatically switches to a multipart upload
    # when the file exceeds multipart_threshold
    s3.upload_file(local_file_path, bucket_name, object_key, Config=transfer_config)
    print(f"File uploaded successfully to {bucket_name}/{object_key}")
except Exception as e:
    print(f"Error uploading file: {e}")

Understanding the Script


Let's break down the key components of the script:

  1. boto3.client('s3'): Creates the S3 client used for the upload, with your credentials and region.
  2. TransferConfig: Controls how the transfer behaves. multipart_threshold sets the file size above which a multipart upload is used, multipart_chunksize sets the size of each part, and max_concurrency sets how many parts are uploaded in parallel.
  3. part_size_mb and num_threads: Tunable knobs. Larger parts mean fewer requests, while more threads increase parallelism at the cost of memory and bandwidth.
  4. upload_file: Performs the upload and automatically handles splitting the file into parts, uploading them concurrently, retrying failed parts, and completing the multipart upload.
  5. The try/except block: Reports success or surfaces any error raised during the transfer.
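
One optional addition: upload_file accepts a Callback that is invoked with the number of bytes transferred, which is handy for tracking progress on multi-gigabyte files. The ProgressPercentage helper below is an illustrative sketch (it is not part of Boto3 itself), modelled on the pattern shown in the Boto3 documentation and adapted to this script's variable names.

import os
import sys
import threading

class ProgressPercentage:
    """Prints cumulative upload progress; pass an instance to upload_file via Callback."""

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # Callbacks can fire from multiple upload threads, so guard the counter
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                f"\r{self._filename}  {self._seen_so_far:.0f} / {self._size:.0f} bytes  ({percentage:.2f}%)"
            )
            sys.stdout.flush()

# Usage with the script above:
# s3.upload_file(local_file_path, bucket_name, object_key,
#                Config=transfer_config,
#                Callback=ProgressPercentage(local_file_path))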

Running the Script

To run the script:

  1. Save the script to a file (e.g. upload_to_s3.py).
  2. Open a terminal and navigate to the script's directory.
  3. Run the script using the command python upload_to_s3.py.

Ensure that the AWS credentials have the necessary permissions to perform S3 uploads (at a minimum, s3:PutObject on the target bucket).
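
Note that if an upload is interrupted and never completed or aborted, the parts already uploaded remain in the bucket and continue to incur storage charges until they are removed. The sketch below, which assumes the same s3 client and bucket_name as the script, lists and aborts any leftover incomplete uploads; it additionally requires the s3:ListBucketMultipartUploads and s3:AbortMultipartUpload permissions. A bucket lifecycle rule with AbortIncompleteMultipartUpload can automate the same cleanup.

# List any in-progress (incomplete) multipart uploads in the bucket
response = s3.list_multipart_uploads(Bucket=bucket_name)

for upload in response.get('Uploads', []):
    print(f"Aborting incomplete upload of {upload['Key']} (UploadId: {upload['UploadId']})")
    s3.abort_multipart_upload(
        Bucket=bucket_name,
        Key=upload['Key'],
        UploadId=upload['UploadId']
    )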

Conclusion

Uploading large files to Amazon S3 using Boto3 in Python becomes a manageable task with the multipart upload feature. By breaking down the file into smaller parts and uploading them concurrently, you can efficiently transfer large datasets to S3. Adjusting parameters such as part size and concurrency allows you to optimize the upload process based on your specific requirements. Incorporating this approach into your workflow facilitates the seamless transfer of large files to the cloud, unlocking the full potential of Amazon S3 for scalable and reliable storage.
