Hyping Web 2.0: Techcrunch on Repetition

I’ve wanted to find a way to organize and render, in a meaningful way, what people talk about. Blogs, for example, are a fascinating word-of-mouth tool, but unstructured data like that isn’t easy to work with. So I began building something to help me better understand what blogs talk about.

The phrase “Web 2.0” is a nebulous term that not even O’Reilly could define well. Yet it’s now in our vocabulary: it’s difficult to define, yet we use it and, somehow or other, people know what each other is talking about. If Ludwig Wittgenstein were alive today, he would have a field day with this: rather than “things of which we do not speak”, the phrase “Web 2.0” becomes “the thing of which we speak much yet do not know.”

Anyhow, I was curious to see how often the term “Web 2.0” is used in blogs. My first test case is the main cheerleader of the Web 2.0 generation: Techcrunch.

Here’s what I wrote in Ruby:

+++++

# @author: Pete Abilla
# @date: 27, September, 2006
# @function: crawls a hard-coded feed URL and
# computes basic statistics on a given keyword
# in its posts.

require 'rubygems'
require 'feed_tools'

feed = FeedTools::Feed.open('http://www.techcrunch.com/feed')
keyword = 'web 2.0'
total_occurrences = 0
total_posts = 0

# Escape the keyword so that "." matches a literal period, not any character.
keyword_pattern = Regexp.new(Regexp.escape(keyword), Regexp::IGNORECASE)

feed.entries.each do |entry|
  puts "Entry: #{entry.title}"
  # scan always returns an array (never nil); to_s guards against empty entries.
  matches = entry.content.to_s.scan(keyword_pattern)
  puts "Occurrences of '#{keyword}': #{matches.size}"
  total_occurrences += matches.size
  total_posts += 1 unless matches.empty?
end

# Compute a real percentage, avoiding division by zero on an empty feed.
percentage = feed.entries.empty? ? 0 : (100.0 * total_posts / feed.entries.size).round

puts "Total number of posts in feed: #{feed.entries.size}"
puts "Total occurrences of '#{keyword}': #{total_occurrences}"
puts "Total number of posts in which '#{keyword}' appeared: #{total_posts}"
puts "Percentage of posts containing '#{keyword}': #{percentage}% (#{total_posts}/#{feed.entries.size})"

+++++
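The counting step can also be tried without fetching a feed. What follows is only a minimal sketch, not the script that produced the results: `keyword_stats` and `sample_posts` are hypothetical names introduced for illustration, and the choice to allow optional whitespace inside the keyword (so that both "Web 2.0" and "web2.0" count) is an assumption rather than something the original script did.

```ruby
# Hypothetical helper: counts keyword occurrences over an array of post
# bodies (plain strings), so the logic can be checked offline.
def keyword_stats(posts, keyword)
  # Escape each word so "." is literal, and allow optional whitespace
  # between words: 'web 2.0' matches "Web 2.0" as well as "web2.0".
  source = keyword.split.map { |part| Regexp.escape(part) }.join('\s*')
  pattern = Regexp.new(source, Regexp::IGNORECASE)

  # One count per post; nil bodies are treated as empty strings.
  counts = posts.map { |body| body.to_s.scan(pattern).size }
  posts_with_keyword = counts.count { |n| n > 0 }
  percentage = posts.empty? ? 0.0 : (100.0 * posts_with_keyword / posts.size).round(1)

  {
    total_posts: posts.size,
    total_occurrences: counts.sum,
    posts_with_keyword: posts_with_keyword,
    percentage: percentage
  }
end

# Made-up sample data, purely for illustration.
sample_posts = [
  'Web 2.0 is here. Everyone loves web2.0!',
  'A post about funding rounds.',
  nil
]
puts keyword_stats(sample_posts, 'web 2.0').inspect
```

Keeping the counting in a function that takes plain strings means the same logic could later run over posts loaded from a feed or from a database.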

Now, here are the results:

[Image: shmula.com, ruby code]

So, based on only the last 15 posts that my code could grab, Techcrunch used the phrase “Web 2.0” in 10 of them, bringing its hype percentage to about 67% (10/15).

Eventually, I want to grab feeds daily and store them in my MySQL database, then run queries off of that. That would give me a larger set to play with.

Still, 10 of Arrington’s 15 posts have “Web 2.0” in them. To be sure, that’s a lot of hype.


