I’ve wanted to find a way to organize what people talk about and present it meaningfully. Blogs, for example, are a fascinating word-of-mouth tool, but their unstructured text isn’t easy to work with. So I began building something to help me better understand what blogs talk about.
The phrase “Web 2.0” is a nebulous term that not even O’Reilly could define well. Yet it’s now part of our vocabulary: we use it constantly, and somehow or other people understand one another. If Ludwig Wittgenstein were alive today, he would have a field day with this: rather than one of the “things of which we do not speak”, “Web 2.0” becomes “the thing of which we speak much yet do not know.”
Anyhow, I was curious to see how often the term “Web 2.0” was used in blogs. My first test case is the chief cheerleader of the Web 2.0 generation: TechCrunch.
Here’s what I wrote in Ruby:
+++++
# @author: Pete Abilla
# @date: 27 September 2006
# @function: crawls a hard-coded feed URL and
# computes basic statistics on how often a given
# phrase appears in its posts.
require 'rubygems'
require 'feed_tools'

feed = FeedTools::Feed.open('http://www.techcrunch.com/feed')
keyword = 'web 2.0'
total_occurrences = 0
total_posts = 0
# Escape the keyword so '.' matches a literal period rather than any character.
keyword_pattern = Regexp.new(Regexp.escape(keyword), Regexp::IGNORECASE)

feed.entries.each do |entry|
  puts "Entry: #{entry.title}"
  # scan always returns an array (empty when there is no match);
  # to_s guards against entries that have no content.
  matches = entry.content.to_s.scan(keyword_pattern)
  puts "Occurrences of '#{keyword}': #{matches.size}"
  total_occurrences += matches.size
  total_posts += 1 if matches.size > 0
end

entry_count = feed.entries.size
percentage = entry_count > 0 ? (100.0 * total_posts / entry_count).round(1) : 0.0
puts "Total number of posts in feed: #{entry_count}"
puts "Total occurrences of '#{keyword}': #{total_occurrences}"
puts "Total number of posts in which '#{keyword}' appeared: #{total_posts}"
puts "Percentage of posts with the phrase '#{keyword}' in it: #{percentage}% (#{total_posts}/#{entry_count})"
+++++
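To try the counting logic without fetching a feed, here is a minimal offline sketch that runs the same scan-and-tally loop over a few made-up post bodies. The helper name `keyword_stats` and the sample posts are hypothetical. The pattern `/web\s?2\.0/i` is an assumption chosen to catch both “Web 2.0” and “web2.0”; its escaped `\.` also keeps a string like “web2x0” from matching, which an unescaped `web2.0` pattern would allow.

```ruby
# Hypothetical offline helper: count how often a pattern occurs across
# post bodies, and in how many posts it occurs at least once.
def keyword_stats(posts, pattern)
  occurrences = 0
  posts_with_keyword = 0
  posts.each do |body|
    count = body.to_s.scan(pattern).size  # scan returns [] when nothing matches
    occurrences += count
    posts_with_keyword += 1 if count > 0
  end
  { occurrences: occurrences, posts: posts_with_keyword, total: posts.size }
end

# Assumed pattern: matches "Web 2.0" and "web2.0"; "\." is a literal period.
WEB2_PATTERN = /web\s?2\.0/i

# Made-up sample posts standing in for feed entries.
sample_posts = [
  'Another Web 2.0 startup launches today.',
  'Is web2.0 a bubble? Web 2.0 investors say no.',
  'A post about search engines.'
]

stats = keyword_stats(sample_posts, WEB2_PATTERN)
puts "#{stats[:occurrences]} occurrences in #{stats[:posts]}/#{stats[:total]} posts"
# prints "3 occurrences in 2/3 posts"
```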
Now, here are the results:

So, based on only the last 15 posts my code could grab, TechCrunch used the phrase “Web 2.0” in 10 of them, which puts its hype percentage at 66.7% (10/15).
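The hype percentage is simply the number of posts containing the phrase divided by the total number of posts. One Ruby detail matters here: dividing two integers truncates, so `10 / 15` evaluates to `0`, and converting to a float first is what yields 66.7%. A minimal sketch; `hype_percentage` is a hypothetical name:

```ruby
# Hypothetical helper: share of posts that mention the keyword, as a percentage.
def hype_percentage(posts_with_keyword, total_posts)
  return 0.0 if total_posts.zero?   # avoid dividing by zero on an empty feed
  # Multiply by a Float first: in Ruby, 10 / 15 is integer division and gives 0.
  (100.0 * posts_with_keyword / total_posts).round(1)
end

puts hype_percentage(10, 15)   # prints 66.7
puts 10 / 15                   # prints 0 (integer division)
```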
Eventually, I want to grab feeds daily, store them in my MySQL database, and run queries against that. That would give me a larger data set to work with than the 15 posts a single fetch returns.
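Talking to MySQL would need a database driver gem, so as a stand-in here is a plain-Ruby sketch of the two things such a table would have to do: key each post by its URL so that overlapping daily fetches don’t count the same post twice, and answer a date-range query for the keyword. The `PostArchive` class, its fields, and the example.com URLs are all hypothetical; in MySQL, keying on the URL would roughly correspond to a unique index on that column.

```ruby
require 'date'

# Hypothetical stand-in for the planned MySQL table: one record per post,
# keyed by URL, so repeated daily fetches of the same feed don't double-count.
class PostArchive
  def initialize
    @posts = {}   # url => { title:, body:, first_seen: }
  end

  # Store a post unless its URL is already archived; returns true if it was new.
  def add(url, title, body, seen_on)
    return false if @posts.key?(url)
    @posts[url] = { title: title, body: body.to_s, first_seen: seen_on }
    true
  end

  def size
    @posts.size
  end

  # Posts first seen between from and to (inclusive) whose body matches pattern,
  # roughly what a SQL query with a date range and a pattern filter would return.
  def matching(pattern, from, to)
    @posts.values.select do |post|
      post[:first_seen].between?(from, to) && post[:body] =~ pattern
    end
  end
end

archive = PostArchive.new
archive.add('http://example.com/a', 'A', 'New Web 2.0 app', Date.new(2006, 9, 27))
archive.add('http://example.com/b', 'B', 'Hardware news', Date.new(2006, 9, 27))
# The next day's fetch overlaps: post "a" is skipped, post "c" is new.
archive.add('http://example.com/a', 'A', 'New Web 2.0 app', Date.new(2006, 9, 28))
archive.add('http://example.com/c', 'C', 'web2.0 funding round', Date.new(2006, 9, 28))

hits = archive.matching(/web\s?2\.0/i, Date.new(2006, 9, 27), Date.new(2006, 9, 28))
puts "#{hits.size} of #{archive.size} archived posts mention web 2.0"
# prints "2 of 3 archived posts mention web 2.0"
```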
Still, 10 of Arrington’s last 15 posts contain “Web 2.0”. To be sure, that’s a lot of hype.