Wednesday, January 11, 2006

Languages.

Converting the linguistic corpus into electronic format is not an easy task, nor a minor one.
Over thousands of years, human beings have generated an incredible variety of information, represented in an equally incredible variety of fashions.
Computers offer an unprecedented opportunity to unify such representation and provide instruments for a better understanding of the thought and life of people who preceded us and confronted the difficult challenges that we all face.
So, in tracing a path through such an attempt, I would start from the Association for Literary and Linguistic Computing (ALLC), founded in 1973 with the purpose of supporting the application of computing in the study of language and literature. Such studies are important because of their theoretical nature. They research the form in which information is coded and the problems involved in developing the algorithms needed to process it. The reasons for processing such information vary, from pure study to profit.
A reasonable and interchangeable representation is advantageous for everybody. XML should be the main linguistic structure (well, I have difficulty calling it a language) used for such encoding, the reason being its incredible flexibility and consequent openness to the future.
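To make the point about an interchangeable representation a little more concrete, here is a minimal sketch of what an XML-encoded sentence might look like and how a program could read it back. The element and attribute names (`text`, `s`, `w`, `lemma`) are invented for this illustration and do not follow any particular encoding standard; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one sentence, each word tagged with its dictionary form.
SAMPLE = """<text lang="la">
  <s n="1">
    <w lemma="in">In</w>
    <w lemma="principium">principio</w>
    <w lemma="sum">erat</w>
    <w lemma="verbum">Verbum</w>
  </s>
</text>"""


def lemmas(xml_source):
    """Return (word form, lemma) pairs for every <w> element, in document order."""
    root = ET.fromstring(xml_source)
    return [(w.text, w.get("lemma")) for w in root.iter("w")]


pairs = lemmas(SAMPLE)
for form, lemma in pairs:
    print(form, "->", lemma)
```

The same file could be read by any other tool that understands XML, which is the whole advantage: the encoding carries its own structure with it, and new attributes can be added later without breaking older readers.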
Then, still looking at that site, it is impossible not to mention Fr. Roberto Busa SJ (God bless him) [Now, Wikipedia is intrinsically unreliable; however, it seems to work for plain data, whereas anything requiring a deeper treatment quickly becomes controversial or does not interest a wide enough audience to be written there] and the Corpus Thomisticum (so Saint Thomas becomes unavoidable...)
Now, from that, I would move to the Association for Computational Linguistics, which is even more theoretical in that its studies are not restricted to coded texts but embrace the very nature of language representation.
After that I would have a look at the Association for Computers and the Humanities, which looks like a group dedicated to disseminating knowledge more than developing it.
My current interests make me point out the lexicographic links page at the University of Cambridge.
Some readers might be interested in the project list page of the Text Encoding Initiative, and maybe in the Cursus project.