
Cleaning up failed multi-part uploads on Amazon S3

This problem hit me, too. I back up stuff with s3cmd every night: I kick off a sync at 2 AM and kill the script at 6 AM. For a while, I had one large file that could never finish, so I built up a lot of orphaned file fragments. My monthly S3 bill kept growing, and Amazon reported more and more storage in use, but when I used s3cmd to add up all the files, the total came out way below what Amazon reported. It took a while to sort out, because Amazon's tools are somewhat primitive, and I think they discouraged third parties from developing tools for analyzing usage by changing formats frequently.

Anyway, now I’m cleaning out incomplete multi-part uploads every night using the technique described below.
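Scheduling the cleanup nightly is just a crontab entry along these lines (the script path here is a hypothetical stand-in for the boto script at the end of this post):

    # Abort stale multi-part uploads every night at 3 AM.
    0 3 * * * /usr/local/bin/cleanup-multipart-uploads.py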

Cleaning up failed multi-part uploads on Amazon S3

I’ve recently been using s3cmd to back up a lot of data to Amazon S3. Version 1.1.0 (currently in beta) supports multi-part uploads, but it has borked a few times halfway through large uploads without properly aborting the operation server-side. This meant that the parts uploaded so far were not removed from the server, which is bad because Amazon charges for that storage.

s3cmd doesn’t currently have any way to list or abort interrupted multi-part uploads, which meant I had to figure out some other way to do it. It turned out to be quite simple using Python and the boto library:
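Here is a minimal sketch of that approach with boto 2, assuming credentials are picked up from the environment or ~/.boto; the bucket name is a placeholder:

    import boto

    # Connect to S3 using credentials from the environment or ~/.boto.
    conn = boto.connect_s3()
    bucket = conn.get_bucket('my-backup-bucket')  # placeholder bucket name

    # get_all_multipart_uploads() lists uploads that were initiated but
    # never completed or aborted; their parts still incur storage charges.
    for upload in bucket.get_all_multipart_uploads():
        print upload.key_name, upload.initiated
        # Abort the upload, which deletes all of its parts server-side.
        upload.cancel_upload()

Each MultiPartUpload also carries an initiated timestamp, so the loop can easily be made to skip uploads that are only a few hours old and might still be in progress.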