07 Nov 2005 - 22:25 by Michael Daum
EACL 2006 is hosting a workshop on wikis and blogs in Trento, Italy.
Among other things, the workshop "Wikis and blogs and other dynamic text sources" focuses on the type of text produced in these kinds of online media. But is that text really so different from the text in your personal diary or an online news ticker? Somewhat, yes, but that should not be surprising, and linguists already suffer from bad text. Just recall the never-ending rants about the Penn Treebank, a collection of Wall Street Journal articles: from a linguist's perspective this is really bad text, with lots of dirt in the data. Note, however, that the WSJ is not some blog. For a decade or so, parsers for English, mostly stochastic ones, have been evaluated against that beast of a corpus. Rule-based systems such as ours have a hard time covering all these irregularities.

Irregular speech and language is quite normal, though: think of speech recognition in noisy environments, but also of ungrammatical language or child language. Every aspect of language can deviate from your grammar knowledge for one reason or another. So systems must cope with so-called everyday data as well as they cope with regular input, as robustly as possible. That's why I like our stuff so much: robustness is not implemented on top; the parser is robust out of the box, because parsing is reduced to a constraint optimization problem (read the papers for more info). So much for the new challenges.
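Why does treating parsing as constraint optimization give robustness for free? A toy example may help. The sketch below is not our actual parser. It is a minimal illustration under loudly stated assumptions: a hypothetical five-word lexicon, hand-picked constraint weights, and exhaustive search where a real system would need heuristic search. Each violated constraint contributes a penalty between 0 (a hard constraint that rules the analysis out) and 1. An analysis scores the product of its penalties, and the parser returns the best-scoring dependency tree. Ungrammatical input therefore does not make parsing fail. It only lowers the score of the best analysis.

```python
from itertools import product
from math import prod

# Hypothetical toy lexicon: word -> (category, number). Illustration only.
LEXICON = {
    "the": ("DET", None),
    "dog": ("NOUN", "sg"),
    "dogs": ("NOUN", "pl"),
    "barks": ("VERB", "sg"),
    "bark": ("VERB", "pl"),
}

def is_tree(heads):
    """True if every word reaches the root (index 0) without a cycle.
    heads[i - 1] is the head of word i; words are numbered from 1."""
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # cycle (including a word heading itself)
            seen.add(node)
            node = heads[node - 1]
    return True

def penalties(words, heads):
    """Yield one penalty per violated constraint: 0.0 = hard, (0, 1) = soft.
    The constraints and weights are assumptions made up for this sketch."""
    if not is_tree(heads):
        yield 0.0  # hard: the analysis must be a tree
        return
    if list(heads).count(0) != 1:
        yield 0.0  # hard: exactly one word attaches to the root
    for dep, head in enumerate(heads, start=1):
        cat, num = LEXICON[words[dep - 1]]
        if head == 0:
            if cat != "VERB":
                yield 0.1  # soft: the root should be the verb
            continue
        head_cat, head_num = LEXICON[words[head - 1]]
        if cat == "DET" and head_cat != "NOUN":
            yield 0.0  # hard: determiners modify nouns
        if cat == "NOUN" and head_cat != "VERB":
            yield 0.2  # soft: nouns usually depend on a verb
        if cat == "VERB":
            yield 0.1  # soft: verbs usually head the sentence
        if cat == "NOUN" and head_cat == "VERB" and num != head_num:
            yield 0.5  # soft: subject-verb agreement

def parse(sentence):
    """Return (score, heads) of the best analysis found by exhaustive search.
    The score is the product of all penalties (1.0 if nothing is violated)."""
    words = sentence.split()
    n = len(words)
    candidates = product(range(n + 1), repeat=n)
    return max((prod(penalties(words, h)), h) for h in candidates)

print(parse("the dog barks"))   # (1.0, (2, 3, 0))
print(parse("the dogs barks"))  # (0.5, (2, 3, 0))
```

The second sentence violates agreement, yet the parser still returns the intended tree, only with a lower score. Soft constraints make this possible: no extra robustness layer has to be bolted on.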

Again, wikis are perceived only through their most prominent installation, Wikipedia: "In contrast to blogs, wikis have high ambitions as regards factual correctness, persistence, editorial quality, and trustworthiness." (from the call for participation).


r8 - 05 Oct 2007 - 20:54:38 - Main.MichaelDaum
Copyright © 1999-2010 Michael Daum Consulting. All rights reserved. Impressum.