Friday, January 22, 2016

Digging through Ashley Madison

After the release of the Ashley Madison dumps, many of us downloaded them and searched through them. It wasn't very long thereafter that several web apps came up and allowed you to do it yourself, but most of those apps censor much of the data for legal reasons. If you want to go through the dumps, I can show you how in a few steps:

Get the torrent; it is available on The Pirate Bay.

Once you have the download, you will want to load the SQL into a MySQL database, so you must have MySQL or MariaDB installed. I think most people actually go with MariaDB these days. If you don't know SQL, it's probably a good idea to learn you some. Loading the dumps is a very simple step; with an empty database named am already created, it will be something like:

# cat *.dump | mysql -p am

YMMV

After that you can fire up sql:

mysql

At the sql prompt you will then have to select the db:

use am;

Then you will have to construct a search, for example:

select * from am_am_member where city="Nashville" and gender="2" LIMIT 4;

This will select 4 males in Nashville; the gender code for a woman is "1".

It does appear that id refers to the same person across all tables, so id 102 is id 102 throughout.
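
Since id lines up across tables, a join on id pulls one person's rows together. Here's a minimal sketch using Python's built-in sqlite3 with made-up rows; the table names member and member_login and all of the values are hypothetical stand-ins, not the real schema, and only the shared id column mirrors what I saw.

```python
import sqlite3

# Hypothetical two-table layout with invented rows; only the shared
# "id" column reflects the observation in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, city TEXT, gender INTEGER);
    CREATE TABLE member_login (id INTEGER PRIMARY KEY, username TEXT);
    INSERT INTO member VALUES (101, 'Memphis', 1), (102, 'Nashville', 2);
    INSERT INTO member_login VALUES (101, 'user_a'), (102, 'user_b');
""")


def rows_for_id(conn, member_id):
    """Join the two tables on the shared id and return the matching rows."""
    query = """
        SELECT m.id, m.city, m.gender, l.username
        FROM member AS m
        JOIN member_login AS l ON l.id = m.id
        WHERE m.id = ?
    """
    return conn.execute(query, (member_id,)).fetchall()


print(rows_for_id(conn, 102))
```

The same JOIN ... ON should carry over to MySQL or MariaDB once you swap in the real table names.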


With the CSVs, one of the things I wanted to do was get unique email addresses that represented people tech savvy enough to own their own domains. So I wanted to filter out anything like gmail/yahoo/aol etc., but I left in comcast and clearwire because those users might be using their real names as their email addresses, which struck me as interesting.


To do that, you just navigate to the csv directory and then:

awk -F, '{print $21}' *.csv | grep -vi -P "(gmail.com|hotmail.com|aol.com|ymail.com|yahoo.com|live.com|\"US\"|\"P\"|\"TW\"|\"\"|\n)" | grep -i -P "(com|org|net|ninja|gov|edu)"

As you can see, the 21st field is the email field in the credit card transaction CSVs. I don't know why the country codes are getting in there; I suspect my awk delimiter isn't quite right, but I obtained a wealth of information with that: approximately 134,949 unique email addresses. I almost reported that as above a million because of all the recurring ones.

It might be interesting, and certainly wouldn't require a lot of scripting, to just sort by total amount spent on the site.
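
One plausible cause of the stray values, though I haven't confirmed it against the real files, is that splitting on every comma also splits quoted fields that contain a comma, which shifts the later columns over. Here's a small sketch on an invented three-column sample (the column layout and the example.com address are made up) comparing a naive split with Python's csv module:

```python
import csv
import io

# Made-up sample: the second field contains a comma inside quotes.
SAMPLE = 'id,address,email\n1,"12 Main St, Apt 4",someone@example.com\n'


def naive_column(text, index):
    """Split each line on every comma, the way awk -F, does."""
    return [line.split(",")[index] for line in text.splitlines()]


def csv_column(text, index):
    """Parse with the csv module, which keeps quoted fields intact."""
    return [row[index] for row in csv.reader(io.StringIO(text))]


print(naive_column(SAMPLE, 2))  # the data row yields part of the address
print(csv_column(SAMPLE, 2))    # the data row yields the email field
```

If that is what's happening, a quote-aware parser would make the country-code filters in the grep unnecessary.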
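
A rough sketch of that idea, assuming you have already reduced the data to (email, amount) pairs; the function name total_by_email and the sample charges are invented for illustration:

```python
from collections import defaultdict
from decimal import Decimal


def total_by_email(pairs):
    """Sum the amounts per email (case-insensitive), largest total first."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for email, amount in pairs:
        totals[email.lower()] += Decimal(amount)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample charges, not taken from the dump.
charges = [
    ("a@example.com", "79.00"),
    ("b@example.net", "19.99"),
    ("A@example.com", "49.00"),
    ("b@example.net", "19.99"),
]

for email, total in total_by_email(charges):
    print(email, total)
```

Decimal avoids floating point rounding on currency, and lowercasing the address merges entries that differ only in case.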


I certainly wouldn't count this guy out:

531 "ALN3269@AOL.COM", "79.00"


531 individual charges of 79 dollars.

That's $41,949. That's a lot of money to spend cheating on your spouse. Seems unnecessary. Remember my list doesn't include gmail accounts, so this guy probably actually scores a bit further down the list.



I'm just doing some uniq -c sorting, so it's possible that email address spent more money in other amounts, but like I said, I wouldn't count them out as a contender.
