Thursday, December 22, 2005

Wikipedia Watch

There be net.kook-ery here, but this point is valid:

While Wikipedia itself does not run ads, they are the most-scraped site on the web. Scrapers need content — any content will do — in order to carry ads from Google and other advertisers. This entire effect is turning Wikipedia into a generator of spam.


Jerry Holkins, the writer over at Penny Arcade, put forth last week the most eloquent and erudite delineation of the inherent flaws of Wikipedia that I've ever seen. My opinion mirrors his, and he says it better than I ever could --

"As an encyclopedia, Wikipedia has some issues. As a model of how and where distributed intellect fails, it's almost shockingly comprehensive.

"Responses to criticism of Wikipedia go something like this: the first is usually a paean to that pure democracy which is the project's noble fundament. If I don't like it, why don't I go edit it myself? To which I reply: because I don't have time to babysit the Internet. Hardly anyone does. If they do, it isn't exactly a compliment.

"Any persistent idiot can obliterate your contributions. The fact of the matter is that all sources of information are not of equal value, and I don't know how or when it became impolitic to suggest it. In opposition to the spirit of Wikipedia, I believe there is such a thing as expertise.

"The second response is: the collaborative nature of the apparatus means that the right data tends to emerge, ultimately, even if there is turmoil temporarily as dichotomous viewpoints violently intersect. To which I reply: that does not inspire confidence. In fact, it makes the whole effort even more ridiculous. What you've proposed is a kind of quantum encyclopedia, where genuine data both exists and doesn't exist depending on the precise moment I rely upon your discordant fucking mob for my information."

I'm afraid I'm going to have to argue with Penny Arcade's critique of Wikipedia.

First and foremost:

Wikipedia *works*. If you doubt this, check out the Google Zeitgeist for 2005 -- Wikipedia is the #3 most clicked site from Google searches (!). Maybe you believe that really, secretly, a bunch of people are getting duped, but I think there's something else going on.

Any source of knowledge has to be evaluated probabilistically.

What's the probability that this is true?
What's the 'standard deviation' of the quality of the source?
What's the probability that your source will be of quality Y about topic T?

I think what's really going on is that some people are freaked out that the 'standard deviation' of Wikipedia is pretty broad. You can find articles that are wildly inaccurate -- but you're also going to find a lot of articles which are shockingly detailed on some subject areas [particularly technology].
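To make the 'standard deviation' idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two source names, the 0-10 quality scale, and the scores themselves are invented, not measurements of Wikipedia or anything else. It just shows how two sources with nearly the same average quality can differ a lot in spread, and in the odds of landing on an excellent article.

```python
from statistics import mean, pstdev

# Hypothetical per-article quality scores on a 0-10 scale.
# These numbers are invented for illustration, not measurements.
SCORES = {
    "narrow_source": [6, 7, 7, 6, 7, 7, 6, 7],
    "broad_source": [1, 10, 3, 9, 10, 2, 10, 7],
}

def summarize(scores, threshold):
    """Return (mean, standard deviation, share of articles at or above threshold)."""
    share = sum(1 for s in scores if s >= threshold) / len(scores)
    return mean(scores), pstdev(scores), share

for name, scores in SCORES.items():
    avg, spread, share = summarize(scores, threshold=9)
    print(f"{name}: mean={avg:.2f} stdev={spread:.2f} P(quality>=9)={share:.2f}")
```

In this toy data the broad source is the only one that ever produces a 9 or better, and also the only one that produces a 1. That is the trade-off I'm describing.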

Another take on this can be found on Chris Anderson's blog, where he talks about what sets emergent, probabilistic knowledge (Google, Wikipedia, Flickr, etc.) apart.

Really, believing in the power of experts does not at all contradict believing in the power of Wikipedia. Lots of questions remain to be answered about the long-term state of Wikipedia, but for right now, it is *amazingly* effective at what it does.

For a fun comparison, consider Wikipedia five years into its existence vs. Britannica five years into its own. I happen to have a reprint of the First Edition Britannica, and here's a gem from it.

Woman: The feminine form of the animal Man.

That's it, one sentence.

I think 'First Edition Wikipedia' holds up pretty well compared to that. ;)