Monday, May 05, 2008

The Catalan language in the World

03.05.2008 - 21:04

We use statistical techniques to identify blog language. That means that our algorithm decides what language a blog is in by looking at the text content, and not at any language attributes in the markup. Weblogs with fewer than 500 bytes of text content are not included in this list.
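To make the idea concrete, here is a minimal sketch of one common statistical approach: comparing character-trigram frequencies against per-language profiles. It is an illustration only, not the census's actual algorithm, which the text does not describe. The `SAMPLES` training strings and the names `trigram_profile`, `cosine` and `identify` are hypothetical, and cosine similarity is an assumed scoring choice. Only the 500-byte cutoff comes from the text above.

```python
from collections import Counter
from math import sqrt

MIN_BYTES = 500  # cutoff stated in the text; everything else here is assumed

# Hypothetical, tiny training samples. A real system would learn its
# profiles from large corpora in each language.
SAMPLES = {
    "English": "this is a short sample of english text that the sketch uses "
               "to learn which letter sequences are common in the language",
    "Catalan": "el català és una llengua romànica parlada a catalunya, "
               "el país valencià i les illes balears",
    "Spanish": "el español es una lengua romance hablada en españa "
               "y en gran parte de américa",
}


def trigram_profile(text):
    """Count overlapping 3-character sequences in lowercased text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def identify(text):
    """Return the best-matching language, or None if the text is too short."""
    if len(text.encode("utf-8")) < MIN_BYTES:
        return None  # too little text content to classify
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

Calling `identify("Bon dia")` returns `None`, because the input is under 500 bytes. Note that the function looks only at the text itself, never at markup attributes, which is the property the paragraph above describes. With profiles this small, closely related languages such as Catalan and Spanish are easy to confuse, which is one way a count like the one below could be biased.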

How reliable is our algorithm? We're currently doing statistical sampling to find out, and will post results in this table when they are ready.

Language Count

English 1958443
Catalan 123320
French 83950
Spanish 80509
Portuguese 71561
German 35870
Italian 26659
Chinese-big5 25123
Farsi 19730
Chinese-gb2312 19324
Japanese 18576
Dutch 13133
Danish 9870
Indonesian 8831
Malay 6658
Japanese-euc_jp 5413
Swedish 5267
Czech 5089
Icelandic 3776
Tagalog 3608
Finnish 3326
Turkish 2817
Esperanto 2803
Slovak-ascii 2592




Welcome to the Internet service in Catalan: a free service, independent of any business, entity or association.
The objective of this service is to provide Catalan-speaking communities with a point of reference on the Internet which can link us together through our language (a database, a Catalan area): in short, a focal point on the Net for Catalan in the world. We want to communicate with all the regions in which Catalan is still a living language, and include all Catalan communities throughout the world.
At present this service has no commercial or profit motives. Its only objective is the Catalan language itself, a language minimized even in the lands where it belongs.
Our website, unique on the Net, is intended to communicate and reflect the opinions and feelings of Catalans, a people with their own language and age-old culture, yet without a formal State and currently divided by political and historical circumstances into several separate communities with little relationship between them.
In the upper left-hand box of the heading on this page, you can add a new URL. This service only accepts websites written in Catalan.
In the first row you will find information about our language. From left to right: different aspects of Catalan, the current situation of the language in its various linguistic zones, new developments on the Internet and graphic browsing, with comments.
In the next two rows are network resources in our language, which can be searched using a branching thematic tree. To search for information which is not in Catalan, use other search engines. Some of these have, like us, a Spanish version, but all of them, unlike this service, accept pages in Spanish. Please do not hesitate to contact us should you have any doubts, comments or problems.
If you would like to see the latest news in our language, try our service in Catalan: you will see how easy it is for Spanish speakers to understand our language. If a particular section does not interest you, click the blue bar and it will take you straight to the following section. Enjoy yourself! Thank you for visiting.

1 comment:

Dídac López said...

Mmm, it's hard to believe. Perhaps some biases in language recognition are introduced by the semantic techniques used by NTLE. Moreover, their data have been circulating in the Catalan blogosphere for the last 2 or 3 years.
