More email fun

I love parsing public data. I blogged about it here about four years ago (wow, how time flies).

Now there is a new set of email data from Supreme Court Justice nominee Elena Kagan, which the Sunlight Foundation folks put into a nice Gmail interface here:

Unfortunately, the dump from the archives looks to be in PDF format. I’m hoping there is a way to get a plain-text dump of these emails. I’ve contacted the Sunlight guys and hope to get a chance to run some parsing algorithms shortly ;)

Update: Tom Lee and Jake Brewer quickly responded and shared their methodology with me (thanks, guys!). I’m downloading now and will be parsing shortly ;)

Last update: After getting everything converted over to text, I ran a series of checks for things like checking/savings account numbers, SSNs, credit card numbers, pr0n, etc. The only hits were a password to a non-existent site and some pr0n hits in the received box. All in all, very tame stuff.
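
For anyone curious what checks of this kind can look like, here is a minimal Python sketch. It is an illustration only, not the actual scripts run against the Kagan emails: the regular expressions, the `luhn_ok` and `scan` names, and the choice to filter card-like numbers with the Luhn checksum are all assumptions made for the example.

```python
import re

# Assumed patterns, for illustration only.
# SSN-like: three digits, two digits, four digits, separated by hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-like: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str) -> dict:
    """Collect SSN-like strings and Luhn-valid card-like numbers."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            cards.append(digits)
    return {"ssn": SSN_RE.findall(text), "card": cards}


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111 1111 1111 1111."
    print(scan(sample))
```

Pattern matching like this only flags candidates. Any hit still needs a human look, since phone numbers, case numbers, and other long digit strings can match by accident.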

