Friday, January 14, 2011

UNIX command to analyze Amazon S3 logs

I wanted to monitor the download activity on a particular file that I made publicly available on my Amazon S3 account.  Here's how I do it from a bash command line:
~/dev/s3-curl/s3curl.pl --id= -- https://s3.amazonaws.com/ 2> /dev/null | xpath -e '//Contents/Key/text()' 2> /dev/null | grep '^logs/' | xargs -i ~/dev/s3-curl/s3curl.pl --id= -- https://s3.amazonaws.com//{} 2> /dev/null | grep 'GET\.OBJECT.*'
where the placeholders stand for the name of your S3 bucket, the filename you'd like to monitor, and the name of the profile you configured in your ~/.s3curl file.

The above command assumes these S3 Logging options for the bucket:
  • Enabled is checked
  • Target Bucket is the same as the bucket containing the file being monitored
  • Target Prefix is "logs/"
In a nutshell, this works by first downloading the file listing of the bucket, then extracting the log file names, then downloading each log file in turn and finally grep'ing them for the download activity (GET.OBJECT).
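If you'd rather get a per-day tally than raw log lines, the final grep stage could be swapped for something like the sketch below. It assumes the standard space-separated S3 access log layout, where the third field holds the timestamp, the eighth the operation (REST.GET.OBJECT for a download) and the ninth the object key. The function name count_downloads_by_day is my own invention for illustration and is not part of s3curl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads S3 access log lines on stdin and prints
# "day count" pairs for downloads of a single object key.
# Assumes the standard log layout: $3 = "[dd/Mon/yyyy:hh:mm:ss",
# $8 = operation, $9 = object key.
count_downloads_by_day() {
  local key="$1"
  awk -v key="$key" '
    $8 == "REST.GET.OBJECT" && $9 == key {
      day = substr($3, 2, 11)   # strip the leading "[" and keep dd/Mon/yyyy
      count[day]++
    }
    END { for (d in count) print d, count[d] }
  ' | sort                      # plain text sort, not chronological across months
}
```

To use it, pipe the downloaded log contents into count_downloads_by_day followed by the name of the file you're monitoring, in place of the last grep in the command above.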

You'll need to have these command-line tools available:
  • s3curl.pl (from the s3-curl package)
  • xpath
  • grep and xargs
