Friday, September 21, 2012

Working with s3curl and Amazon S3

Overview

The goal of this post is to show how to use s3curl, a tool for handcrafting HTTP requests to Amazon Simple Storage Service (Amazon S3), on a Linux/Unix instance. Some definitions:
  • Amazon S3 - a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere.
  • s3curl - a Perl-based wrapper around cURL that calculates the authentication parameters for a request to Amazon S3, which is often the most difficult part of working with Amazon S3 using direct HTTP requests.
  • cURL - a command line tool for transferring data with URL syntax.


Set up an Amazon Linux AMI to run s3curl

This is an optional step. If you can run s3curl in your existing environment, great. If you can't, then you can spin up a Linux instance in just a few minutes and use s3curl on the instance. Setting up the Linux instance falls within free tier pricing, as long as you make the free-tier choices when launching it. Also, don't forget to terminate your instance when you are done.

EC2 Linux Instance Details


1. Set up an EC2 Linux instance, e.g. Amazon Linux AMI 2012.03. We won't go over the details, but make sure you
  • Select a key pair whose private key you have saved locally, so you can Secure Shell (SSH) in to the instance.
  • Have a security group defined and use it when creating the instance. Make sure that port 22 inbound to the instance is open.
  • All of these details and more are covered in the Launch an Instance topic in the Amazon EC2 Getting Started Guide.
2. Connect to the instance.
  • For example, from Windows you can use PuTTY, an SSH client. All information about connecting to the instance is in Connect to Your Linux/UNIX Instance in the Amazon EC2 Getting Started Guide.
3. Copy over s3curl.pl after getting it from the download site.
  • You can just put it in the /home/ec2-user directory, which is where you land by default when you log in.
  • If you are using Windows, you can use a tool like WinSCP to transfer the files to the instance.
4. Log on to the EC2 instance. You should be in the ec2-user directory.

5. Try "perl --help" and "curl --help" to make sure you have these working.
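The same check can be scripted. The following is a minimal Python sketch, assuming Python 3 is available on the instance (the post itself does not require it); the function name find_tools is made up for illustration.

```python
import shutil


def find_tools(names=("perl", "curl")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If either tool is reported as not found, install it before continuing with the s3curl setup.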

PuTTY Connection Window (left), EC2 Instance Login Screen (right)

Set up s3curl

1. Download and copy the s3curl.pl to a directory you will work in, if you haven't done so already.

2. Create a file .s3curl as instructed in the README file.
%awsSecretAccessKeys = (
    # personal account
    personal => {
        id  => 'YOURACCESSID',
        key => 'YOURSECRETKEYID',
    },
);
3. Change the permissions on the .s3curl file so that only you (the owner) can read and write the file. Use the command "chmod 600 .s3curl".

4. Run one of the commands below. Put "perl " in front of each command. The commands assume that the bucket is called travelmarxbuck.
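Steps 2 and 3 can also be done in one go. The following is a rough Python sketch, not part of s3curl itself: it writes a .s3curl file in the format shown above and makes sure the file ends up with mode 600. The function name write_s3curl_config and the template are assumptions made for illustration, and the sketch assumes the credentials contain no single quotes.

```python
import os
import stat

# Same layout as the .s3curl example from the README.
TEMPLATE = """%awsSecretAccessKeys = (
    # {name} account
    {name} => {{
        id  => '{access_id}',
        key => '{secret_key}',
    }},
);
"""


def write_s3curl_config(path, name, access_id, secret_key):
    """Write the config file and return its final permission bits."""
    # Passing the mode to os.open means a newly created file is never
    # readable by other users, even for a moment.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(TEMPLATE.format(name=name, access_id=access_id,
                                     secret_key=secret_key))
    # A file that already existed keeps its old mode, so set it explicitly.
    os.chmod(path, 0o600)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    target = os.path.expanduser("~/.s3curl")
    mode = write_s3curl_config(target, "personal", "YOURACCESSID", "YOURSECRETKEYID")
    print(f"wrote {target} with mode {mode:o}")
```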


s3curl Examples

Some notes about using the examples:
  • The Amazon S3 API is here: http://docs.amazonwebservices.com/AmazonS3/latest/API/Welcome.html
  • In some cases below, a StringToSign and CanonicalResource are shown. This is useful to know if you are planning on signing your own requests. If not, disregard.
  • The examples are shown with the travelmarxbuck bucket in the US Standard region. Therefore, we don't have to explicitly include the region in the URI. However, it is easy to modify the URI as appropriate. See the endpoints in the Region and Endpoint table.
  • The bucket naming rules state that we can use virtual host-style and path-style requests interchangeably for a bucket in the US Standard region, as long as the bucket name is all lowercase. If the bucket name were mixed-case (e.g. TravelmarxBuck), we could only use the path-style request because a mixed-case name is not DNS compliant. Furthermore, if travelmarxbuck were created in a region other than US Standard (e.g. us-west-2), a request to https://s3.amazonaws.com/travelmarxbuck would return a 301 Moved Permanently. You would need to add the region and use https://s3-us-west-2.amazonaws.com/travelmarxbuck instead. Or, because the bucket name is lowercase, you could specify https://travelmarxbuck.s3.amazonaws.com/ and that would work too.
  • There are two variations of specifying the Amazon S3 resource: virtual host-style (https://travelmarxbuck.s3.amazonaws.com) and path-style (https://s3.amazonaws.com/travelmarxbuck) request.
  • Use “perl” before all the commands below.
  • All the commands below are on one line, even though they may look otherwise.
  • The concept of a "folder" doesn't really exist in S3, but it is useful for thinking about how objects in your bucket are organized. (The S3 console uses the folder concept.) Each folder maps to one part of the full path to the object, for example, "https://s3.amazonaws.com/travelmarxbuck/folder1/folder2/object".
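To make the two URL styles concrete, here is a small Python sketch that builds both forms for a bucket and an object key. It is only an illustration: the helper names are made up, and the DNS check is a simplified version of the bucket naming rules (lowercase letters, digits, hyphens and dots, 3 to 63 characters), not the full rule set.

```python
import re

# Simplified DNS-compliance check; the real naming rules have more cases.
_DNS_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_dns_compliant(bucket):
    """Return True if the bucket name passes the simplified check."""
    return bool(_DNS_NAME.match(bucket))


def path_style_url(bucket, key="", endpoint="s3.amazonaws.com"):
    """Bucket goes in the path; works for any bucket name."""
    base = f"https://{endpoint}/{bucket}"
    return f"{base}/{key}" if key else base


def virtual_host_url(bucket, key=""):
    """Bucket goes in the host name; needs a DNS-compliant bucket name."""
    if not is_dns_compliant(bucket):
        raise ValueError(f"{bucket!r} is not DNS compliant; use a path-style URL")
    return f"https://{bucket}.s3.amazonaws.com/{key}"


if __name__ == "__main__":
    print(path_style_url("travelmarxbuck", "folder1/folder2/object"))
    print(virtual_host_url("travelmarxbuck", "folder1/folder2/object"))
    # A bucket outside US Standard needs the regional endpoint for path-style.
    print(path_style_url("travelmarxbuck", endpoint="s3-us-west-2.amazonaws.com"))
```

Each "folder" is simply part of the key, so the same helpers cover objects at the top level and objects nested several folders deep.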


GET object

Retrieve an object with a path-style request and stream the output.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com/travelmarxbuck/temp.txt -v

Retrieve an object with a path-style request and save the object to a local file.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com/travelmarxbuck/temp.txt -vv -o retrievedFile.txt

Retrieve an object in a folder with a path-style request and stream the output.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com/travelmarxbuck/subfolder/temp.txt -v

Retrieve an object in a folder with a virtual host-style request.

s3curl.pl --id personal --debug -- https://travelmarxbuck.s3.amazonaws.com/subfolder/temp.txt -vv
StringToSign='GET\n\n\nFri, 24 Aug 2012 19:20:07 +0000\n/travelmarxbuck/subfolder/temp.txt'
CanonicalResource= "/travelmarxbuck/subfolder/temp.txt"
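If you plan to sign your own requests, the StringToSign above is what gets signed. The following Python sketch assumes the legacy AWS Signature Version 2 scheme (HMAC-SHA1 over the StringToSign, Base64-encoded, sent as "Authorization: AWS id:signature"). The function names are made up for illustration, and the optional Content-MD5, Content-Type and x-amz- header fields are left empty, as in the GET examples here.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, date, canonical_resource,
                   content_md5="", content_type="", amz_headers=""):
    """Assemble the StringToSign in the layout s3curl prints with --debug."""
    return (f"{verb}\n{content_md5}\n{content_type}\n{date}\n"
            f"{amz_headers}{canonical_resource}")


def sign(secret_key, to_sign):
    """Base64-encoded HMAC-SHA1 of the StringToSign."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_id, secret_key, to_sign):
    """Value for the Authorization request header."""
    return f"AWS {access_id}:{sign(secret_key, to_sign)}"


if __name__ == "__main__":
    sts = string_to_sign("GET", "Fri, 24 Aug 2012 19:20:07 +0000",
                         "/travelmarxbuck/subfolder/temp.txt")
    # Placeholder credentials from the .s3curl example, not real keys.
    print(authorization_header("YOURACCESSID", "YOURSECRETKEYID", sts))
```

The date in the StringToSign has to match the Date header sent with the request, which is why s3curl prints both when --debug is on.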


DELETE object

Delete an object with a path-style request.

s3curl.pl --id personal --debug --delete -- https://s3.amazonaws.com/travelmarxbuck/temp.txt -v

Delete an object in a folder with a path-style request.

s3curl.pl --id personal --debug --delete -- https://s3.amazonaws.com/travelmarxbuck/subfolder/temp.txt -v

GET bucket

List items in a bucket with a path-style request.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com/travelmarxbuck -vv
StringToSign='GET\n\n\nFri, 24 Aug 2012 18:39:31 +0000\n/travelmarxbuck'
CanonicalResource= "/travelmarxbuck"

List items in a bucket with a filter (prefix) and a path-style request.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com/travelmarxbuck?prefix=b -vv
StringToSign='GET\n\n\nFri, 24 Aug 2012 19:07:20 +0000\n/travelmarxbuck/'
CanonicalResource= "/travelmarxbuck"


List items in a bucket with a filter (prefix) and a virtual host-style request.

s3curl.pl --id personal --debug -- https://travelmarxbuck.s3.amazonaws.com?prefix=b -vv
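Note that the prefix parameter does not show up in the CanonicalResource. As a rough illustration of how the resource can be derived from either URL style, here is a Python sketch. It rests on several assumptions: it only recognizes the s3.amazonaws.com virtual host suffix, the list of signed sub-resources is partial, and the function name is made up.

```python
from urllib.parse import urlsplit

S3_HOST_SUFFIX = ".s3.amazonaws.com"

# Partial list of sub-resources that are included in the signed resource.
# Ordinary query parameters such as prefix are not.
SUBRESOURCES = {"acl", "lifecycle", "location", "logging", "policy",
                "torrent", "versioning", "versions", "website"}


def canonical_resource(url):
    """Derive the CanonicalResource for a path-style or virtual host-style URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    if host.endswith(S3_HOST_SUFFIX):
        # Virtual host-style: the bucket comes from the host name.
        bucket = host[:-len(S3_HOST_SUFFIX)]
        resource = f"/{bucket}{path}"
    else:
        # Path-style: the bucket is already the first path segment.
        resource = path
    signed = sorted(p for p in parts.query.split("&")
                    if p.split("=", 1)[0] in SUBRESOURCES)
    if signed:
        resource += "?" + "&".join(signed)
    return resource


if __name__ == "__main__":
    for url in ("https://s3.amazonaws.com/travelmarxbuck?prefix=b",
                "https://travelmarxbuck.s3.amazonaws.com?prefix=b",
                "https://travelmarxbuck.s3.amazonaws.com/?policy"):
        print(url, "->", canonical_resource(url))
```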


GET object lifecycle

s3curl.pl --id personal --debug -- https://travelmarxbuck.s3.amazonaws.com/subfolder/temp.txt?lifecycle -vv

GET service

List buckets.

s3curl.pl --id personal --debug -- https://s3.amazonaws.com -vv

StringToSign='GET\n\n\nFri, 24 Aug 2012 18:35:08 +0000\n/'
CanonicalResourceString = "/"

HEAD object

Retrieve an object's headers with a virtual host-style HEAD request.

s3curl.pl --id personal --debug --head -- https://travelmarxbuck.s3.amazonaws.com/temp.txt -vv

Retrieve an object's headers with a path-style HEAD request.

s3curl.pl --id personal --debug --head -- https://s3.amazonaws.com/travelmarxbuck/temp.txt -vv

PUT object

Put an item in the bucket, at the top level.

s3curl.pl --id personal --debug --put file.txt -- https://travelmarxbuck.s3.amazonaws.com/temp2.txt -vv

Put an item in the bucket, in a "folder".

s3curl.pl --id personal --debug --put file.txt -- https://travelmarxbuck.s3.amazonaws.com/subfolder/temp2.txt -vv

DELETE object

Delete an item in the bucket, at the top level.

s3curl.pl --id personal --debug --del -- https://travelmarxbuck.s3.amazonaws.com/temp2.txt -vv

Delete an item in the bucket, in a folder.

s3curl.pl --id personal --debug --del -- https://travelmarxbuck.s3.amazonaws.com/subfolder/temp2.txt -vv

PUT bucket policy

s3curl.pl --id personal --debug --put bucketpolicy.txt -- https://travelmarxbuck.s3.amazonaws.com/?policy -vv

Where the bucketpolicy.txt file might be something like this:

{
  "Version": "2008-10-17",
  "Id": "MyPolicy123",
  "Statement": [
    {
      "Sid": "MyStmt123",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::travelmarxbuck/*"
    }
  ]
}
 
Replace 111122223333, a fake account ID, with a real one.
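If you would rather generate the policy file than edit it by hand, something like the following Python sketch could be used. The function names are hypothetical; the policy fields mirror the example above, and the only check performed is that the account ID has 12 digits.

```python
import json


def bucket_policy(account_id, bucket, action="s3:GetObject"):
    """Build the example policy for a given account ID and bucket."""
    if len(account_id) != 12 or any(c not in "0123456789" for c in account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2008-10-17",
        "Id": "MyPolicy123",
        "Statement": [
            {
                "Sid": "MyStmt123",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": action,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def write_policy(path, account_id, bucket):
    """Write the policy as JSON, ready to pass to --put."""
    with open(path, "w") as handle:
        json.dump(bucket_policy(account_id, bucket), handle, indent=2)


if __name__ == "__main__":
    # 111122223333 is the fake account ID from the example above.
    write_policy("bucketpolicy.txt", "111122223333", "travelmarxbuck")
```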


References
http://www.cantoni.org/2012/01/10/curl-cheat-sheet
http://www.eucalyptus.com/eucalyptus-cloud/tools/s3curl
http://aws.amazon.com/code/128
http://aws.amazon.com/free/
http://curl.haxx.se/
http://en.wikipedia.org/wiki/Secure_Shell
