Language names and ISO 639

1 boekerij
Edited: Oct 16, 2006, 6:44 pm

It might be a good idea to have the language names (at present in English) followed by their ISO 639 codes. This would make it easier to figure out exactly which language is meant.

Then again, though users might prefer to have all language names in their own language of choice when looking at e.g. the fun statistics pages--and I think that is suitable indeed--it might be preferable for the "pick your language of choice" list to show each language name in that language itself. For easy reference, the ISO 639 code might be added there, too.

Thus, one might have e.g. :

Afrikaans (af)
Bokmål (nb)
Dansk (da)
Deutsch (de)
English (en)
Español (es)
français (fr)
Gaeilge (ga)
lingua Latina (la)
Nederlands (nl)
Nynorsk (nn)

and so on.

One might even prefer to have ISO 639 in front:

(af) Afrikaans
(nb) Bokmål
(da) Dansk
(de) Deutsch
(en) English
(es) Español
(fr) français
(ga) Gaeilge
(la) lingua Latina
(nl) Nederlands
(nn) Nynorsk

and so on.
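
The two layouts above could be generated from a single table. What follows is only a minimal sketch: the names `ENDONYMS`, `language_label` and `language_menu` are made up for illustration and have nothing to do with LT's actual code, and the ordering is a naive case-insensitive sort rather than a locale-aware one.

```python
# Endonyms keyed by ISO 639-1 code, copied from the list in this message.
# The table and function names are hypothetical, for illustration only.
ENDONYMS = {
    "af": "Afrikaans",
    "nb": "Bokmål",
    "da": "Dansk",
    "de": "Deutsch",
    "en": "English",
    "es": "Español",
    "fr": "français",
    "ga": "Gaeilge",
    "la": "lingua Latina",
    "nl": "Nederlands",
    "nn": "Nynorsk",
}


def language_label(code, code_first=False):
    """Return 'Nederlands (nl)' or, with code_first=True, '(nl) Nederlands'."""
    name = ENDONYMS[code]
    return f"({code}) {name}" if code_first else f"{name} ({code})"


def language_menu(code_first=False):
    """Build the full list, sorted case-insensitively by endonym."""
    codes = sorted(ENDONYMS, key=lambda c: ENDONYMS[c].casefold())
    return [language_label(c, code_first) for c in codes]


if __name__ == "__main__":
    for line in language_menu():
        print(line)
```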

2 timspalding
Oct 16, 2006, 10:55 am

I'd hazard a guess that far more people on LT are familiar with the MARC standard--which is longer and therefore (all things being equal) less likely to cause confusion--than ISO 639.

3 boekerij
Oct 16, 2006, 8:28 pm

>2 timspalding:

I'd hazard a guess that far more people on LT are familiar with the MARC standard--which is longer and therefore (all things being equal) less likely to cause confusion--than ISO 639.

Though I wouldn't dare to doubt the former--for at present LT's users are (mainly) English speakers--that might change when LT rolls out its different language versions. The "MARC standard" is mainly oriented towards English, i.e. if one knows the English name of a language, one might get a clue when seeing the ISO 639-2/B code--or "MARC standard" as you call it. However, someone who doesn't know the English name of the language is rather puzzled. There is no clue, or rather, the clue one might think one has is probably plain wrong.

Thus, I think a quite important question in this matter is: does LT want to keep everything "English only", or would it rather go for multilinguality?

You might have a look at my proposal at http://nl.librarything.com/talktopic.php?topic=2509#25881, too.

The ISO 639-1 language code "nl" for Dutch might look rather strange, unless of course you know--as most people speaking it probably do--that the name of this language in the language itself, i.e. in Dutch, is Nederlands (nl). In the same way, it might look rather puzzling that the ISO 639-1 language code for Irish is "ga". Obviously, there isn't the slightest correspondence--in English, that is. Since the Irish name of the Irish language is Gaeilge (ga), having "ga" for Gaeilge turns out to be not that strange at all, if not obvious indeed.

Though American-trained professional librarians might feel quite at ease with ISO 639-2/B ("MARC standard"), I think most non-English-speaking people who are not professional librarians might feel rather puzzled when staring at the "MARC standard" (ISO 639-2/B).

Whom does LT want to serve?

4 MMcM
Edited: Oct 17, 2006, 6:48 am

As you presumably know, Dutch is one of the 22 languages whose codes differ between ISO 639-2/B (Bibliographic = MARC: dut) and ISO 639-2/T (Terminology = extension of ISO 639-1: nld from nl). Unfortunately, these include many important languages, French, German and Chinese among them.

Strictly speaking, MARC, ANSI Z39.53, and ISO 639 are all separate standards. But they are converging. I believe MARC is the oldest; it really doesn't deserve inverted commas.

It's a bit hard for me to believe that the user community is selected by URL conventions.

But more constructively, can't LT just register them all? I assume the DNS sends everyone to the same server farm, so it's just a matter of supporting alternates in the lookup when you decode the URL into the translation set.

Base the language codes on the 3-letter ISO 639. Where /B and /T differ, use both. They never conflict. Qualify dialects with a region code as necessary, as mentioned in one of the other topics on this. Add select country codes as synonyms. Register in those country domains if warranted. Avoid the 2-letter ISO 639 codes because they conflict with TLDs. Flemish speakers in Belgium can go to nld; but nl is there too for the Dutch. Quebecois can go to fra.

nl.librarything.com = nld.librarything.com = dut.librarything.com = www.librarything.nl
br.librarything.com = por-br.librarything.com = www.librarything.com.br
pt.librarything.com = por.librarything.com = por-pt.librarything.com = www.librarything.pt
uk.librarything.com = eng-uk.librarything.com = www.librarything.co.uk
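
As a rough sketch of "supporting alternates in the lookup", the host names above could be folded into one table that maps every synonym onto a single translation set. The set keys (`nld`, `por-br`, ...), the `eng` default and the function name are assumptions made for this illustration; nothing here reflects how LT actually decodes its URLs.

```python
# Each translation set and the host names proposed as synonyms for it.
# Keys and the default below are hypothetical labels, not LT identifiers.
SYNONYMS = {
    "nld": [
        "nl.librarything.com",
        "nld.librarything.com",
        "dut.librarything.com",
        "www.librarything.nl",
    ],
    "por-br": [
        "br.librarything.com",
        "por-br.librarything.com",
        "www.librarything.com.br",
    ],
    "por-pt": [
        "pt.librarything.com",
        "por.librarything.com",
        "por-pt.librarything.com",
        "www.librarything.pt",
    ],
    "eng-uk": [
        "uk.librarything.com",
        "eng-uk.librarything.com",
        "www.librarything.co.uk",
    ],
}

# Reverse index: host name -> translation set.
HOST_TO_SET = {host: key for key, hosts in SYNONYMS.items() for host in hosts}


def translation_set(host, default="eng"):
    """Map a request host name to its translation set (case-insensitive)."""
    normalized = host.strip().lower().rstrip(".")
    return HOST_TO_SET.get(normalized, default)
```

Since the /B and /T codes never conflict, both can live in the same table without any ambiguity.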

5 timspalding
Oct 17, 2006, 2:49 pm

Short answer: Technically yes. But NO NO NO when it comes to search engines. You want as little confusion as possible when it comes to them. In theory a permanent redirect is enough, but theory and SEO don't see eye to eye immediately, or ever. For what it's worth, I'm worried about what Google's going to do with LT's foreign-language sites as it is.

6 timspalding
Oct 17, 2006, 2:53 pm

Incidentally, I don't think "nl" or "ga" are very odd for English-language speakers. "The Netherlands" is more commonly used in English than "Holland"; "Irish" is not a language term familiar to most Americans, who call it "Gaelic." Indeed, outside of Ireland--for example, in the Gaelic-speaking community of Portland, ME--it's called that.

7 MMcM
Oct 17, 2006, 5:09 pm

>5 timspalding:

Oh, of course. We know that's one of your areas of expertise.

Would it be better if LT only let the search engines index the main www site? Do you imagine varying content? That site could then give you a language selection based on a cookie. The others would override, including a new English site URL for people with a non-English cookie and explicit navigation through language list menus.

The preview would be funny because it would show English surrounding the searched terms, but you'd click through okay. (Well, I guess you could give the indexer a site where all the translation terms were blank so that it only indexed books and talk. And I guess you could handle referral from a language-specific Google specially in the absence of a cookie, too.)

The problem is that search for a word that occurs in two places in the real content will soon show you dozens of "similar pages" under librarything, being all language variants of the two. Is that it?

Legitimate SEO (as opposed to secrets offered by spam vendors) is about giving out stuff that will be indexed well without seeming to their algorithms to be tuning your results specially for the indexer, because that gets you banned altogether. Right?

8 LA2 (first post)
Mar 8, 2007, 1:14 pm

Currently, the Swedish version of LibraryThing is on se.librarything.com. While SE is the country code for Sweden, it is not the language code for the Swedish language (SV). As a matter of fact, SE is the language code for the Northern Sami language, which is also spoken in Sweden. The Swedish Wikipedia is sv.wikipedia.org and the Northern Sami Wikipedia is se.wikipedia.org.
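
A tiny illustration of why the same two letters can be read in two ways. The tables below are a hand-picked subset of ISO 639-1 and ISO 3166-1 alpha-2, and the function name is invented for this sketch.

```python
# Hand-picked subset of ISO 639-1 language codes (conventionally lowercase).
LANGUAGE_CODES = {"sv": "Swedish", "se": "Northern Sami", "da": "Danish"}

# Hand-picked subset of ISO 3166-1 alpha-2 country codes (conventionally uppercase).
COUNTRY_CODES = {"SE": "Sweden", "SV": "El Salvador", "DK": "Denmark"}


def readings(label):
    """Show how a two-letter label reads as a language and as a country."""
    key = label.strip()
    return {
        "as_language": LANGUAGE_CODES.get(key.lower()),
        "as_country": COUNTRY_CODES.get(key.upper()),
    }


if __name__ == "__main__":
    for label in ("se", "sv", "dk"):
        print(label, readings(label))
```

Read as a language code, "se" is Northern Sami; read as a country code, it is Sweden. A subdomain alone does not say which reading is meant.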

9 xtien
Mar 15, 2007, 4:05 pm

I agree with Tim that we should have as few URLs as possible, because of the search engines. Having multiple URLs point to the same content makes search engines believe you're advertising something, and thus they make it harder to find your pages. Can't we have just one URL (www.librarything.com) and link to pages in other languages from there?

10 kantelier
Mar 17, 2007, 9:25 am

I'm staying away from the URL/Marc/ISO discussion but want to express 2 observations:

In the head section of at least this page I'm missing something like


I don't exactly know the consequences of the former observation, but I do know that when I google for something that happens to be in my catalog, I get it in all kinds of unfamiliar languages but not in English or Dutch.

11 bvs
Dec 31, 2007, 3:16 pm

One suggestion is to use the same prefix as Wikipedia where applicable. *.wikipedia has many more articles in non-English languages, it is used quite heavily by many, many people (who won't know ISO 639), and Wikipedians have thought about these issues for much longer. As LT becomes more popular, the % of people that know MARC will also go down.

I am not a fan of redirects either (except perhaps in a transition phase).

12 boekerij
Jan 5, 2008, 1:44 am

> 11

But, AFAIK, Wikipedia--like most other renowned multilingual websites--uses ISO 639-1 language codes (alpha-2), except for languages where no such code is available; there it falls back on ISO 639-2/T ~= ISO 639-3 language codes (both alpha-3). Therefore, though most users probably have never heard of the name ISO 639 (nor care, for that matter), users of multilingual websites will most probably be quite familiar with ISO 639-1 codes (and, failing those, ISO 639-2/T codes) already.
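
The fallback rule can be put in a few lines. This is only a sketch with a handful of sample rows; the table layout and the function names are assumptions for illustration, not anything Wikipedia or LT publishes.

```python
# Sample rows: (English name, ISO 639-1 code or None, ISO 639-2/T code).
# Low German and Scots have no two-letter code, so the three-letter one is used.
LANGUAGES = [
    ("Dutch", "nl", "nld"),
    ("French", "fr", "fra"),
    ("German", "de", "deu"),
    ("Low German", None, "nds"),
    ("Scots", None, "sco"),
]


def preferred_code(alpha2, alpha3_t):
    """Prefer the ISO 639-1 code; fall back on ISO 639-2/T when there is none."""
    return alpha2 if alpha2 else alpha3_t


def site_prefixes():
    """Map each English language name to the prefix a site would use."""
    return {name: preferred_code(a2, a3) for name, a2, a3 in LANGUAGES}
```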

Many examples of and pointers at different multilingual websites that are providing a "pick your language of choice" tool have been given before. Take e.g. :
- Recommend Site Improvements : Less Javascript links? (Oct 14, 2006) (and reactions) (cf. the pointer in this topic's Message 3 (above), too)
- Recommend Site Improvements : Scan ISBNs by Bar Code (Oct 23, 2006), containing several functioning examples and pointers. Mind the tooltips with the example originating from Firefox Central | Mozilla Europe, too. They are providing the language name in full, as it is written in that very language indeed--i.e. : as an endonym.
- several others

The question has been bumped up several times--to no avail so far.

Worse: LT's present "pick your language" tool is defective in two ways:

1.) (technical)

1.a.) other page elements might hide part of the list, thus making several if not most language choices unreachable;

1.b.) the languages list might be longer than the page canvas height, thus making the latter part of it hidden and unreachable.

Examples :

(assuming you haven't signed in to that specific language edition; otherwise, you won't get any tool for choosing a different language at all)
- Tim's profile page in Arabic (ar)
- Tim's profile page in Greek (el)
- Tim's profile page in Russian (ru)
- Tim's profile page in Chinese (zh)
- etc.

Exercise :
Using the examples as provided above, try and pick at a glance your language of preference. Good luck!

2.) (usability)

Though in many places where translated language names would improve usability LT doesn't translate them yet, showing English language names only, whatever language the user has picked, the "Pick your language" list does the opposite: it shows the language names in the present language--i.e. as exonyms instead of endonyms. The latter are far more user-friendly, all the more so if one wants to serve an international, multilingual audience.

Examples :
(assuming you haven't signed in to that specific language edition; otherwise, you won't get any tool for choosing a different language at all)
- This page in Arabic
- This page in Greek (el)
- This page in Russian (ru)
- This page in Chinese (zh)
- etc.

Exercise :
Using the examples as provided above, try and pick at a glance your language of preference. Good luck!

If you are lucky, you might find the "LibraryThing.com" link at the bottom of the page or, when using www.LT.com, the "LibraryThing.fr/de/nl/it/es/dk" (*) links. Even if you can imagine or find out that the latter are links to different language editions of LT, you are out of luck when your language of preference is a different one. LT might be available in your language of preference, but you are on your own in trying to find out how to reach it.

(*) The latter are not language codes, but rather country codes (ISO 3166-1 alpha-2). (**)

(**) There is a problem indeed in that LT is using country codes (ISO 3166-1 alpha-2) as a stand-in for language codes, which puzzles the visitor/user and gives rise to confusion.

Country or language ?

As is mentioned, among others, by user LA2 in Message 8 (above), intermixing language codes and country codes--or worse yet, using the latter as a stand-in for the former--is confusing, to say the least.

As a matter of fact, many, if not most, countries in the world are multilingual--either officially or unofficially, i.e. in fact.

Example: Dutch language

More than a year ago, LT added country flags to the language pages. I thought this was a bad idea. They would be withdrawn soon, Tim said. Worse, on e.g. the Dutch language page, LT showed two and only two country flags, i.e. that of Belgium (BE/BEL) and that of the Netherlands (NL/NLD). That was and is quite odd, for Dutch (nl) has official status in five countries, i.e. Aruba (AW/ABW), Belgium (BE/BEL), the Netherlands (NL/NLD), the Netherlands Antilles (AN/ANT) and Suriname (SR/SUR), as well as in some supranational organisations, e.g. the European Union (EU/EUR) and the Union of South American Nations (ZASG or Unasur or Unasul). Of course, the language (i.e. Dutch) is a native language in many more countries, e.g. France (FR/FRA), Germany (DE/DEU), South Africa (ZA/ZAF), Namibia (NA/NAM) and Indonesia (ID/IDN), and an immigrant language in (among others) Australia (AU/AUS), Brazil (BR/BRA), Canada (CA/CAN), New Zealand (NZ/NZL) and the United States of America (US/USA). I failed to understand why any country flags at all would be used to indicate a language, and even if one thought that would do any good, I was and am all the more puzzled that only two of them were added for Dutch.

Example : Belgium

When LT thinks that I am accessing it from Belgium, it shows a yellow box at the top of the page containing a picture of a (deformed) Belgian flag and the text: "LibraryThing in your language? You might be interested in www.librarything.fr or www.librarything.nl."

Insult ?

Frankly speaking, this is quite odd, too, for Belgium has more than two official languages. Though Belgium is divided into different linguistic areas--each being either officially monolingual or officially bilingual--it has three (official) language "Communities", as it has three official languages, namely Dutch (nl), French (fr) and German (de). Why is the last one left out? Apart from the question of whether it is a good idea to try to detect from which country a visitor is accessing LT and offer some "suitable" language choice, leaving out one of the official languages of that country is most probably a very bad idea, for it might indeed be insulting.

Conclusions :

1. Which language code ?

If LT is targeting only an audience of American-style trained, English-language professional librarians, it might be a good idea to use and stick with ISO 639-2/B or MARC (family) language codes (ANSI Z39.53). In all other cases, using ISO 639-1 (and, where that is not available, ISO 639-2/T (~= ISO/DIS 639-3), i.e. as per RFC 3066/3066bis, too) might be, and most probably is, a far better idea.

REMARK: No mixing of codes, please
Please do NOT mix language codes and country codes, for this is very confusing (cf. LA2's Message 8 (above), too).
FYI: Code families
Language codes belong to the ISO 639 family, while geographic codes belong to the ISO 3166 family, of which the codes for countries and dependent territories form the ISO 3166-1 part.
FYI: Use of case
For quick and easy recognition, language codes (ISO 639 family) are most often and preferably written in lowercase, while country codes (ISO 3166-1 family) are most often and preferably written in uppercase.
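
The case convention is easy to apply mechanically. A small sketch, assuming the two codes are joined with a hyphen in the style of RFC 3066 language tags; the function name is made up for illustration.

```python
def format_tag(language, country=None):
    """Lowercase the language code, uppercase the optional country code.

    The hyphenated 'language-COUNTRY' form follows the style of RFC 3066
    language tags; validating the codes themselves is out of scope here.
    """
    lang = language.strip().lower()
    if country is None:
        return lang
    return f"{lang}-{country.strip().upper()}"
```

Written this way, "nl-BE" reads at a glance as the language Dutch as used in the country Belgium, and neither code stands in for the other.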

2. Language names and/or codes

2.1. Endonyms

When space is available, the most user-friendly option is to use the endonyms of the language names--i.e. the language name in full, the way it is written in that language itself.

Remark: in some exceptional cases, after a visitor/member has had the opportunity to pick his language of preference, as e.g. on a member's statistics page, exonyms--i.e. the name of a language in a different language--might be preferred.

2.2. Language codes : target your audience

Cf. Conclusion 1: MARC (family) language codes (ANSI Z39.53) are an American national standard aimed at and used by (American-style) trained English-language professional librarians. AFAIK, that is a rather specific and, what's more, a rather narrow audience.

OTOH, the general public, even in the US--IIRC, e.g. member bvs (Msg. 11) lives in the SF Bay Area, CA--is (far) more familiar with ISO 639-1 and ISO 639-2/T language codes, even if they haven't got the slightest idea that those language codes are in fact ISO 639-1 and ISO 639-2/T (fallback)--or RFC 3066/3066bis, if you'd prefer that designation. On the internet, too, these language codes are quite omnipresent. Imagine--or guess?--which language codes will be easier, more familiar and better known when dealing with an international, multilingual general audience, most of whom are not (American-style) trained English-language professional librarians.

3. Tooltips

For advanced usability, when using a language code (any language code), add the endonym of that language as a tooltip (and vice versa !).
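
In HTML terms, a tooltip is usually just a `title` attribute. A minimal sketch that generates both directions (code with endonym tooltip, and vice versa); the function names and the choice of a `span` element are assumptions for illustration.

```python
from html import escape


def code_with_tooltip(code, endonym):
    """Render the language code, with the endonym as a tooltip."""
    return f'<span title="{escape(endonym, quote=True)}">{escape(code)}</span>'


def endonym_with_tooltip(code, endonym):
    """Render the endonym, with the language code as a tooltip."""
    return f'<span title="{escape(code, quote=True)}">{escape(endonym)}</span>'


if __name__ == "__main__":
    print(code_with_tooltip("nl", "Nederlands"))
    print(endonym_with_tooltip("ga", "Gaeilge"))
```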

4. Language name and language code

An even better solution is to add the language code behind or in front of the language name (the latter of course being an endonym).

Example : IATE - IATE : Search by Query.

Note that in the example, the interface language choice (top right of the page) shows the present language (endonym), followed by its ISO 639-1 language code (in brackets).

self-explanatory : fits every language

One of the major advantages of using this type of drop-down list box (and its content) is that it is self-explanatory. One doesn't need any "pick your language" hint--in whatever language one would have to put that message so that everyone can understand it. At present, the latter is a major problem at LT, too. This solution would provide a usability boost with regard to offering an interface language choice.

Note, too, that in the same example, the source language pick is given as an ISO 639-1 language code, followed by the language name in full, the latter as an exonym. Hey, it is in the interface language one has picked before--and which one can change at a glance by using the self-explanatory interface language drop-down list box in the upper right-hand corner.

Final

@ bvs: though you might not be aware of it, it turns out you apparently did and do know (part of) ISO 639.

This might do--for now.

Happy New Year to all of you.

Sincerely,
boekerij.

13 Anneli
Jan 5, 2008, 3:18 am

>12 boekerij:

I'm not sure if I understood everything that boekerij wrote, but I agree that the language selection should be more user friendly. I like the upper right hand corner drop down menu in boekerij's example site IATE - IATE : Search by Query.

14 bvs
Jan 5, 2008, 4:24 am

Boekerij,
You explain it better in the first para in msg 12. I had exactly the same thing in mind. My point was only that this is not a new problem requiring a new solution, and LT should simply do what Wikipedia does as far as multi-language support is concerned. The fact that it is ISO 639 is something only pedants (like me) & developers care about, not the majority of users.

15 timspalding
Edited: Jan 9, 2008, 2:11 am

I've said it before re this sort of thing. I won't read messages that long. I don't read love letters that long.

And we've been around on these issues before; you've left in a huff after making life hell for everyone else on the Dutch groups--including me, for having to referee an endless series of arguments prompted by your incessant translation-reversals--and committing numerous TOS violations. You're not getting your way; you're not even being read. Honestly, what motivates you?

I glanced up and saw anti-MARC griping. There are good, shorter arguments against it but, honestly, it's never going to move us. I know the MARC standard is imperfect—imperfect at its time and very much a product of it. (Today's standards are also a product of their time, of course and are equally guilty of chopping up a beast with few non-arbitrary joints.) But we're not going to change to some international standard and break the connection with library data. Any move from the MARC standard to another is going to be lossy and that's not something we're willing to lose. When you get down to it, the standards you care about are the hobby-horse of a small number of people.

Some stats that do matter:

*There are more libraries in the United States than there are McDonalds, all of them running off the MARC standard.
*There is no sizable body of bibliographic data with languages in any other standard.
*There are at least three orders of magnitude more MARC records in use than Wikipedia entries.
*There are more members of Librarians who LibraryThing than any other group on LT.
*And if evil motives are suspected, we also make a substantial portion of our money selling to libraries. We'd like to talk their language.

As regards LibraryThing's insult to the less than 1% of Belgians who speak Belgium's third official language, German, natively, I can only say two things:

1. Far more Belgian residents speak Turkish natively than German.
2. The German-speakers of Belgium who are offended that we suggest Dutch and French are invited to leave. You may imagine me hurling some nasty anti-Belgian-German slur, that they all drive badly or have bad hair, or whatever.

PS: By the way, the fact that the MARC standard is US-centric is irrelevant to the fact that MARC is a world standard. It's not that it's winning over local standards or something. It's absolutely and completely dominant, and nobody gives it a second thought. Fighting against it is like arguing that decimal numbers discriminate against people who think they're tree sloths. There are various plans to go beyond MARC. I support them. But LibraryThing is about using data that exists, not planning how to catalog things ten years from now.

16 GerardM (first post)
Jan 31, 2008, 10:50 am

Hoi,
Given the number of languages supported by the MARC standard, essentially the languages supported in ISO 639-2, the standard is woefully inadequate. So many languages accepted in ISO 639-3 are just missing that it is painful.
Thanks,
GerardM