10 Things I Learned from the AOL Search Data

by Pete Abilla on August 8, 2006


Most of you already know about the HUGE mess-up AOL committed by releasing MASSIVE amounts of private search data. Arrington wrote about it, AOL took the data down after realizing what it had done, and today AOL apologized in the New York Times. As a quick summary, AOL released 20 million web queries from 650,000 AOL customers. Although AOL took down the site, the data is freely available from several other sites: a mirror, a web interface to the data, and another one. Playing with the data is a researcher's dream, but there are some very disturbing queries, for sure. Here are ten things I've learned from the AOL search data:

  1. “Suicide” was queried by 179 users, 542 different times, and led to 257 different sites.
  2. User 9486162 queried "asphixiation by gas ovens" at least 5 times; several minutes later, user 9486162 queried "monster.com" and "lendingtree.com", and his or her last query in the data set was "hold 'em poker school", logged at 2006-05-14 09:35:43. One could argue that there is a relationship here: unemployment leads to financial problems, which are made worse by a gambling problem, and the cumulative effect of all of that leads one to want to commit suicide.
  3. “viagra” was queried by 409 users, 1201 different times, and led to 779 different sites.
  4. User 3152318 is interested in learning "how to order viagra online"; this query is followed by queries for "indian clip art" and "new york bartending school". Does this mean user 3152318 is interested in taking viagra while viewing indian clip art in a new york bartending school?
  5. “Klu Klux Klan” was queried by 19 users, 71 different times, and led to 35 different sites.
  6. User 4359145 queried "Klu Klux Klan", followed by 26 queries for "combat knives" and then by some other very disturbing queries.
  7. “Bill Gates” was queried by 105 users, 242 different times, and led to 135 different sites.
  8. User 21319088 queried "whitney houston" at 2006-05-04 19:59:09, then queried "Bill Gates" at 2006-05-04 20:28:18, then queried "play chess free online" at 2006-05-04 22:44:19. User 21319088 is apparently interested in playing chess with Bill Gates while listening to Whitney Houston.
  9. "David Letterman" was queried by 108 users, 189 different times, and led to 244 sites.
  10. User 6296532 queried “David Letterman”, “Sexy Feet”, “Sean Connery”, “Adam Sandler”, “Robert Danerio”, and “Rapsong Whisperer” — all within 1 minute.
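Tallies like the ones in items 1, 3, 5, 7, and 9 are simple aggregations over the released log. Below is a minimal Python sketch of how such a tally could be computed; it is not necessarily how the figures above were produced. It assumes each log file is tab-separated with a header row naming the columns AnonID, Query, QueryTime, ItemRank, and ClickURL, that "queried" means a case-insensitive substring match on the query text, that one query event is a distinct (AnonID, Query, QueryTime) triple (a single search can appear on several rows when the user clicked several results), and that "sites" means distinct non-empty ClickURL values. The function names `load_rows` and `query_stats` are hypothetical.

```python
import csv
from collections import namedtuple

# users = distinct AnonIDs, times = distinct query events, sites = distinct clicked URLs
QueryStats = namedtuple("QueryStats", ["users", "times", "sites"])


def load_rows(path):
    """Yield one dict per log row (assumed tab-separated file with a header row)."""
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        # QUOTE_NONE: query text may contain stray quote characters
        yield from csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def query_stats(rows, term):
    """Tally users, query events, and clicked sites for queries containing `term`."""
    term = term.lower()
    users, events, sites = set(), set(), set()
    for row in rows:
        query = row["Query"] or ""
        if term not in query.lower():
            continue
        users.add(row["AnonID"])
        # Several clicks on one search share the same user, query, and timestamp,
        # so count the triple once rather than once per row.
        events.add((row["AnonID"], query, row["QueryTime"]))
        if row["ClickURL"]:  # empty when the user clicked nothing
            sites.add(row["ClickURL"])
    return QueryStats(len(users), len(events), len(sites))
```

For example, `query_stats(load_rows("user-ct-test-collection-01.txt"), "bill gates")` would tally one file; the filename is an assumption about how the released files are named. Counting distinct triples rather than raw rows keeps a single search with several clicks from being counted as several searches.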

This was a huge privacy blunder by AOL. The mistake damages AOL's brand even further, and it will be very difficult for the company to recover. Since the data is public, research on it raises many social and legal questions, such as (a) whether law enforcement ought to act on queries involving murder and other terrible things, (b) whether the public has a social responsibility to seek out the people who enter queries about "suicide" in order to get them some help, and (c) whether law enforcement ought to find and apprehend the people who enter queries involving crimes against children.

These are tough questions that we now have to face and answer. It was only a matter of time before we had to deal with them, but AOL expedited things quite a bit. This is a lot to think about, and the fallout from AOL's blunder could last for a long time.



Xensen August 8, 2006 at 10:37 pm

Here are a few preliminary things I learned from the data:

http://stutteringhand.blogspot.com/2006/08/romance-hogs-and-aol-user-66.html

Mark August 9, 2006 at 11:15 pm

I think the conclusions you make are ridiculous, and the searches you chose to profile in your post mean nothing. If I search for a post office and then for a shotgun, does that mean I am about to go nuts and kill people in a post office?

No. Please research more if you want to post on this topic. There are search trends much more obvious and disturbing than the phony conclusions you listed.

Mark from Dumb Little Man

psabilla August 10, 2006 at 4:27 am

@Mark,

Thanks for reading shmula.

To your point about my conclusions — I actually didn’t make any in the post. Notice that I used words such as “does that mean…”, “One could argue…”, “Apparently user xxxx is interested…”. These phrases are weak inductive conjectures, not conclusions. I don’t have enough data to conclude or judge these people’s intentions, nor is that my style anyway. Those conjectures were my attempts at some entertainment and a little humor.

Perhaps they weren’t funny — I concede that; but conclusions, they certainly were not.

Again, thanks for reading shmula.

Pete Abilla

ty August 12, 2006 at 9:29 am

A site where you can search this data is here:

http://www.datablunder.com/logitems/query/

Andy August 18, 2006 at 9:39 am

dontdelete.com is another good site for browsing the data and trying to backfill characters from people's search behaviour:

3673629: confused teen. starts doing his homework, gets distracted.
5623435: this one is like a novel
1432749: world’s greatest redneck
4030260: hilarious. easily distracted stoner failing to do his homework
8408184: planet random
