s3nukem – Delete large Amazon S3 buckets

s3nukem is a slightly improved version of s3nuke, a Ruby script by Steve Eley that deletes an Amazon Web Services (AWS) Simple Storage Service (S3) bucket containing many objects (millions) relatively quickly, using multiple threads to retrieve and delete the individual objects.

Improvements include:

  • The key-retrieval thread pauses when the queue contains 1000 * thread_count items. The original script’s queue grew unabated, eating up memory unnecessarily.
  • All output is automatically flushed, so you can keep an eye on progress in real time.
  • The output includes the number of seconds elapsed since the script started, so you can calculate the rate at which items are being deleted.
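The bounded queue in the first improvement is a standard producer/consumer pattern. Below is a minimal Python sketch of the idea, not s3nukem’s actual Ruby code: `list_keys` and `delete_key` are hypothetical stand-ins for paging through a bucket’s keys and issuing a single S3 delete, so the sketch runs without AWS. A `queue.Queue` created with `maxsize=1000 * thread_count` makes the lister’s `put()` block while the queue is full, and each progress line carries the elapsed seconds and is flushed immediately.

```python
import queue
import threading
import time

SENTINEL = None  # tells a worker that no more keys are coming


def list_keys(n):
    """Hypothetical stand-in for paging through a bucket's keys."""
    for i in range(n):
        yield f"object-{i:08d}"


def delete_key(key):
    """Hypothetical stand-in for a single S3 delete request."""
    pass


def nuke(total_keys, thread_count=10):
    # put() blocks once the queue holds 1000 * thread_count keys, so the
    # lister pauses instead of buffering every key in memory.
    keys = queue.Queue(maxsize=1000 * thread_count)
    deleted = 0
    lock = threading.Lock()
    start = time.monotonic()

    def worker():
        nonlocal deleted
        while True:
            key = keys.get()
            if key is SENTINEL:
                return
            delete_key(key)
            with lock:
                deleted += 1
                if deleted % 10_000 == 0:
                    elapsed = time.monotonic() - start
                    # flush=True so progress appears immediately, even when piped
                    print(f"{elapsed:8.1f}s  {deleted} deleted", flush=True)

    workers = [threading.Thread(target=worker) for _ in range(thread_count)]
    for w in workers:
        w.start()
    for key in list_keys(total_keys):
        keys.put(key)        # blocks while the queue is full
    for _ in workers:
        keys.put(SENTINEL)   # one sentinel per worker
    for w in workers:
        w.join()
    return deleted
```

For example, `nuke(25_000)` prints a progress line at 10,000 and 20,000 deletions and returns 25000. The back-pressure comes from the queue itself, so the lister needs no separate pause logic.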

Background

Like many others, I have needed to delete an S3 bucket with many items. As you may know, you first have to delete all the objects in the bucket, which is not a quick task if the bucket has hundreds of thousands or millions of objects.

The bucket I needed to delete had 99 million objects. Attempts to delete it through S3fox and even through the AWS Management Console failed!

s3cmd, which deletes with a single thread, was deleting objects at a rate of about 1,800/minute (roughly 2.6 million/day). At that rate, the deletion would have taken nearly 40 days.

s3nuke/s3nukem, which I ran with the default 10 threads, deleted objects at a rate of about 9,000/minute (13 million/day), reducing the job to about 7-1/2 days.
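As a check on the figures above, here is the arithmetic in Python. The 99 million objects and the two per-minute rates come from the post; everything else is unit conversion. Ten threads gave about five times the single-threaded rate (9,000 / 1,800).

```python
MINUTES_PER_DAY = 60 * 24      # 1,440
BUCKET_OBJECTS = 99_000_000    # size of the bucket in the post


def per_day(per_minute):
    """Convert a per-minute deletion rate to a per-day rate."""
    return per_minute * MINUTES_PER_DAY


def days_to_delete(objects, per_minute):
    """Days needed to delete `objects` at a steady per-minute rate."""
    return objects / per_day(per_minute)


for tool, rate in [("s3cmd (1 thread)", 1_800), ("s3nukem (10 threads)", 9_000)]:
    print(f"{tool}: {per_day(rate):,}/day, "
          f"{days_to_delete(BUCKET_OBJECTS, rate):.1f} days")
# s3cmd (1 thread): 2,592,000/day, 38.2 days
# s3nukem (10 threads): 12,960,000/day, 7.6 days
```

Both results are consistent with the estimates of roughly 40 days and 7½ days.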

Since my deletion was a bit larger than Steve’s (his bucket had about 260,000 objects), I had to make a couple of improvements to s3nuke (listed above) so that it wouldn’t slow down and crash, and so that I could keep an eye on its progress. You can find my fork at http://github.com/lathanh/s3nukem

Quick download: http://github.com/lathanh/s3nukem/raw/master/s3nukem

Posted in Web Technology

WREST (Website REST)

WREST n.

  1. A RESTful API service that is made available to its own website. What distinguishes it from a regular RESTful API is that calls coming from the client are identified the same way as the other requests made by the client’s browser (namely, by the client’s cookies) rather than by API keys and secrets/signatures.

I’m currently working on a service that has both a website (usable by the general public) and a RESTful API (currently used by our iPhone app, and later usable by partners). A Flash component of the website also uses the RESTful API when it needs data from the server. And while partners will need to obtain an API key, get user approval to make calls on their behalf, and sign calls, it would not be appropriate to expect the same of the Flash component.

So I made some of the RESTful API calls available in a way that lets the client be identified by its cookies instead of an API authorization token.

This results in the service having four classes of HTTP calls: […]
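To make the cookie-versus-signature distinction concrete, here is a minimal Python sketch of a server deciding how to identify a caller: either a signed request from a partner, or a WREST-style request identified by the same session cookie the browser sends with ordinary page requests. Everything in it is hypothetical and for illustration only: the `X-Api-Key` and `X-Signature` header names, the `session` cookie name, the in-memory `SESSIONS` and `API_SECRETS` stores, and the HMAC-SHA256 signing scheme are assumptions, not the service’s actual design.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie

# Hypothetical in-memory stores; a real service would use its own database.
SESSIONS = {"s3cr3t-session-id": "alice"}             # session cookie -> user
API_SECRETS = {"partner-key-123": b"partner-secret"}  # API key -> shared secret


def expected_signature(secret: bytes, method: str, path: str) -> str:
    """Hypothetical signing scheme: hex HMAC-SHA256 of 'METHOD PATH'."""
    message = f"{method} {path}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def identify(method: str, path: str, headers: dict):
    """Return ("api", key), ("website", user), or None if unidentified.

    Header names are matched case-sensitively here for simplicity.
    """
    key = headers.get("X-Api-Key")
    signature = headers.get("X-Signature")
    if key and signature:
        # Partner-style call: API key plus a signature over the request.
        secret = API_SECRETS.get(key)
        if secret and hmac.compare_digest(
                signature, expected_signature(secret, method, path)):
            return ("api", key)
        return None
    # WREST-style call: the same session cookie the browser sends for pages.
    cookies = SimpleCookie(headers.get("Cookie", ""))
    if "session" in cookies:
        user = SESSIONS.get(cookies["session"].value)
        if user:
            return ("website", user)
    return None
```

For example, `identify("GET", "/api/photos", {"Cookie": "session=s3cr3t-session-id"})` returns `("website", "alice")`. Because the browser attaches cookies automatically, cookie-identified API calls generally need the same cross-site request forgery protection as the website’s own forms.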

Posted in Web Technology, portmanteau

haircro

hair + Velcro

haircro n.

  1. When hair is cut very short (using, say, #2 on clippers), the hair holds hoodie hoods on like it’s Velcro — but it’s hair!
Posted in portmanteau

Two Subnetworks on One LAN, and Linux arp_filter

In a small network, it’s rare for two subnetworks on one broadcast domain to cause problems. I would normally avoid such a setup (and it’s usually easy to do so), but I recently got AT&T’s U-verse, and the do-it-all device it requires (a 2Wire 3300HGV-B “residential gateway”) has forced me to put my private (NAT’d) subnetwork on the same broadcast domain as my public (DMZ’d) subnetwork. While undesirable, this isn’t usually a problem, except that my dual-homed Linux box had trouble working with the 2Wire gateway.

The Two-Interface Linux Box

[Diagram: Two Subnets on One Broadcast Domain]

As you can see in the diagram above, there’s a Linux box on this network that has two network interfaces, […]
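The excerpt stops before the details, but the setting in the title is the Linux kernel’s `arp_filter` sysctl (`net.ipv4.conf.all.arp_filter` or `net.ipv4.conf.<interface>.arp_filter`). By default (0), Linux may answer an ARP request for any of its addresses on any interface, so with two subnets on one broadcast domain both of a dual-homed box’s interfaces can reply. With `arp_filter` set to 1, the kernel replies on an interface only if it would route a packet from the ARP’d address back to the requester out of that interface. The Python model below illustrates that rule; it is neither the kernel’s code nor a reconstruction of the post’s fix, and the interface names, addresses, and simplified destination-only routing table are all hypothetical.

```python
import ipaddress

# Hypothetical dual-homed box: both interfaces sit on one broadcast domain.
INTERFACES = {
    "eth0": ipaddress.ip_interface("192.168.1.10/24"),   # private (NAT'd) side
    "eth1": ipaddress.ip_interface("203.0.113.10/29"),   # public (static) side
}

# Simplified, destination-only routing table: (network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),
    (ipaddress.ip_network("203.0.113.8/29"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),         # default route
]


def route_out(dst: str) -> str:
    """Longest-prefix match: which interface a packet to `dst` would leave on."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in ROUTES if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def arp_repliers(sender_ip: str, target_ip: str, arp_filter: bool) -> list:
    """Interfaces that answer 'who has target_ip? tell sender_ip'.

    The request is a broadcast, so on a shared broadcast domain it arrives
    on every interface of the box.
    """
    owned = {str(iface.ip) for iface in INTERFACES.values()}
    if target_ip not in owned:
        return []                      # not our address: nobody replies
    if not arp_filter:
        # Default (0): any interface may answer for any local address.
        return sorted(INTERFACES)
    # arp_filter=1: reply only on the interface that a packet from
    # target_ip back to sender_ip would be routed out of (this model has
    # no source-based routing, so only sender_ip affects the lookup).
    return [route_out(sender_ip)]
```

With a hypothetical gateway at 192.168.1.254 asking for 192.168.1.10, the default setting lets both `eth0` and `eth1` answer, so the gateway can cache the wrong MAC address for that IP; with `arp_filter` on, only `eth0` answers.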

Posted in Networking

Beware of DVI-I Cables (They Are Not Compatible with DVI-D Devices)

I recently moved my flat-panel displays further away from my computer, but the DVI cables they came with weren’t long enough to connect the displays to the computer in their new location. So, I ordered some longer DVI cables from Newegg.com.

It turns out that both of my displays’ ports are DVI-D and that the cables were DVI-I. Well, DVI-I cables have four extra pins that carry an analog signal, in case you want to use the cable to hook up an analog display to an analog video adapter. And you can’t stick a 29-pin DVI-I cable into a 25-hole DVI-D port (but you can put a 25-pin DVI-D cable into a 29-hole DVI-I port)!
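The mismatch is one-directional because a plug fits a socket only if the socket has a hole for every pin on the plug. The short Python model below expresses that as a subset check over contact sets for dual-link connectors (24 digital pins plus a flat ground blade, with DVI-I adding the four analog pins C1 to C4). The contact labels are illustrative, and the model covers physical fit only, not signal compatibility.

```python
# Illustrative contact sets for dual-link DVI connectors.
DIGITAL = {str(n) for n in range(1, 25)} | {"flat ground blade"}  # 24 + 1 = 25
ANALOG = {"C1", "C2", "C3", "C4"}                                  # DVI-I's extra pins

CONTACTS = {
    "DVI-D": DIGITAL,            # 25 contacts
    "DVI-I": DIGITAL | ANALOG,   # 29 contacts
}


def plug_fits(plug: str, socket: str) -> bool:
    """A plug fits only if the socket has a hole for every pin on the plug."""
    return CONTACTS[plug] <= CONTACTS[socket]


for plug in CONTACTS:
    for socket in CONTACTS:
        print(f"{plug} cable into {socket} port: {plug_fits(plug, socket)}")
# DVI-D cable into DVI-D port: True
# DVI-D cable into DVI-I port: True
# DVI-I cable into DVI-D port: False
# DVI-I cable into DVI-I port: True
```

Only the DVI-I cable into a DVI-D port fails, which is exactly the situation described above.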

Posted in Personal Technology

AT&T U-Verse — A Network Geek’s Perspective

I just got AT&T U-verse, which delivers Internet, TV (IPTV), and phone service (VoIP) to the home, all over a single pair of copper wires from the VRAD (Video Ready Access Device). My upgrades to the service include HD TV, DVR, and a static IP block for my servers. The device they provided, a 2Wire 3300HGV-B (“Residential Gateway”), is responsible for a lot:

  • TV: Streaming IPTV to the set-top boxes
  • Phone: Providing VoIP phone service to the plain old telephones in the home
  • Public IPs: Routing public (static) IP traffic to the public-IP machines
  • NAT: Providing NAT’d Internet access to the “private” computers

[Photo: Telecom Closet with AT&T U-Verse]

Posted in Networking

Open Personal Portable Lifetime Store (Perpolis) v0.1.0

Perpolis gives you ownership and control of your own data, whether it’s your blog posts, status updates, pictures, bookmarks, address book, calendar, or to-do lists.

  • Personal: You own your data, all of it.
  • Portable: You can take your data with you (whether from one service to another, or physically on your own data drive).
  • Lifetime: Use your data store to store everything that’s yours (address book, bookmarks, blog posts, pictures, etc.), forever.
  • Hostable/Accessible: You could keep all your data on your own data drive, but for easiest access, let a service host it for you.
  • Cacheable/Syncable (Versioned): You should be able to use your data even when not connected to your primary data store, including on your phone, your laptop, or the web browser of a public computer.
  • Integrated: Your data are designed to be used, shared, and integrated across services, giving you the flexibility to mix and match who stores your data and where.
  • Discoverable: If you want it to be, your data should be easy for other services, or even other people, to find so that they can interact with it.
  • Safe: Your store is easy to back up. Because it’s Portable, it can be backed up in a way that can be restored independently of the current host. By hosting your store with one service and backing it up somewhere else (to your own portable drive, for example), you’ll never lose your data.
  • Secure: You define who gets to access what data, whether it’s another service or another person. Data on your devices can be encrypted, of course, so if the devices are lost or stolen, the data can’t be accessed.

Posted in Personal Technology