Template talk:Lang/Archive 6

Source: Wikipedia, the free encyclopedia.

Parameter to selectively disable auto-italics in the Lang-xx templates

We need to be able to selectively disable (e.g. with |italic=no) the auto-italicization of non-English content in the {{lang-xx}} templates that auto-italicize ({{lang-es}}, etc.), so that the style is not applied to proper names (e.g. placenames, titles of songs, etc.).

For example, the present code of {{lang-es}} is:

{{Language with name|es|Spanish|''{{{1}}}''|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}

It hard-codes the italics.

The brute-force way around this is to go template-by-template and do something like:

{{Language with name|es|Spanish|{{#if:{{{italic|}}}|{{{1}}}|''{{{1}}}''}}|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}

A more elegant solution is to:

  1. Put this test into {{Language with name}}, to do italics automatically by default, but exclude it when |italic=no (or |italic=0, etc., etc.) if passed into it.
  2. Change all the {{lang-es}} type templates that should auto-italicize by default, to do:
    {{Language with name|es|Spanish|{{{1}}}|italic={{{italic|}}}|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}
    (and whatever other parameters they need, case by case)
  3. Change all the {{lang-ru}} type templates (the non-Latin-script ones) that should not italicize, to do:
    {{Language with name|ru|Russian|{{{1}}}|italic=no|links={{{links|{{{link|yes}}}}}}|lit={{{lit|}}}}}
    (and whatever other parameters they need, case by case)

 — SMcCandlish ¢ >ʌⱷ҅ʌ<  07:09, 30 October 2017 (UTC)
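The test proposed for {{Language with name}} can be sketched roughly as follows. This is an illustration in Python, not template or module code; the function name render_text and the exact set of values treated as "no" are assumptions made for the example.

```python
# Values that switch italics off. The proposal only names "no" and "0";
# the rest of this set is an assumption for illustration.
NO_VALUES = {"no", "n", "0", "false", "off"}


def render_text(text, italic=""):
    """Wrap text in wiki italic markup unless |italic= holds a 'no'-like value."""
    if italic.strip().lower() in NO_VALUES:
        return text
    # Default behaviour: italicize, as the Latin-script templates do today.
    return "''" + text + "''"


print(render_text("Don Quixote"))          # italic by default
print(render_text("Don Quixote", "no"))    # italics suppressed
```

With this arrangement a {{lang-es}} type template would pass |italic= through unchanged, while a {{lang-ru}} type template would always pass |italic=no.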

I was hoping you could just put italics around the template when you use it in an article, but that doesn't work:
Spanish: Di me con quien andas....
Spanish: Don Quixote
It looks like a systematic solution within {{Language with name}} is necessary. – Jonesey95 (talk) 13:43, 30 October 2017 (UTC)
Yeah, the presence of the language name necessitates a template-internal fix. There is a grotesque hack one can do in situ, but we should not have to do this, and it's so brittle and ugly that later editors are likely to break or revert it: {{lang-es|<nowiki />''Don Quixote''<nowiki />}} – [Don Quixote] Error: {{Lang-xx}}: text has italic markup (help). An even-worse kluge: {{lang-es|1=<span style="font-style:normal;">Don Quixote</span>}} – Spanish: Don Quixote.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  00:39, 31 October 2017 (UTC)
This template's documentation suggests:
{{lang-es|{{noitalic|Don Quixote}}}}
[[Spanish language|Spanish]]: <i lang="es"><span class="noitalic">Don Quixote</span></i>
Spanish: Don Quixote
Trappist the monk (talk) 11:30, 31 October 2017 (UTC)

converting to lua

Because it amused me to do it, I have hacked up Module:Lang (I was surprised to see that name still available). Not complete but in this first iteration it appears to correctly render {{lang-??}} for languages supported by MediaWiki (not the whole 900+ languages supported by the {{lang-??}} templates; see Category:ISO 639 name from code templates) so the module will need a table of the language names not supported by MediaWiki. The module supports |italic= and appears to correctly render when that parameter is used. It also appears to handle rtl languages when |rtl= is set. The module doesn't deal well with erroneous input and does not yet support categorization; basic rendering of {{lang-??}} and {{lang}} templates first. In these examples, the live {{lang-??}} template is followed by the module {{#invoke:lang|lang_xx}}:

  • {{lang-es}}: Spanish: Don Quixote
    • module, |italic=yes: Spanish: Don Quixote
  • {{lang-de}}: German: Don Quixote
    • module, |italic=no: German: Don Quixote
  • {{lang-es}}: Spanish: Don Quixote
    • module, |italic=: Spanish: Don Quixote
  • {{lang-he}}: Hebrew: הורביץ, אלוף ("לופי")
    • module, |italic=no |rtl=yes: Hebrew: הורביץ, אלוף ("לופי")
    • [[Hebrew language|Hebrew]]: <span lang="he" dir="rtl" style="font-style: normal;">הורביץ, אלוף ("לופי")</span>

Trappist the monk (talk) 14:46, 31 October 2017 (UTC)

Schweet. I'm not sure what the "for languages supported by MediaWiki" means; we'd want it, surely, to try to do the right thing for any arbitrary value given for ?? in {{lang-??}}. We're more apt to need something like {{lang-fy}} or {{lang-hop}} than {{lang-es}} in most contexts (how often do we really need a wikilink explaining what the Spanish language is)? Ideally, {{lang-en-GB}}, etc. would also work after the Lua adaptation, since we have specific articles on various dialects of English. I guess that's a lot of work, but hopefully the {{lang}} code with 900+ of these already worked up can be dumped and munged in a way that makes it easy to adapt to the new Lua code. If there's a convenient way to extrapolate the language code to WP article correspondences in an array that is included that would probably make maintenance and expansion easier.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  16:20, 31 October 2017 (UTC)
"for languages supported by MediaWiki" refers to the languages supported by the {{#language:}} magic word. ISO 639-1 code ar (Arabic) is supported:
{{#language:ar|en}} → Arabic
but ISO 639-2 code ara (also Arabic) is not:
{{#language:ara|en}} → ara
Of those languages that are supported, there are likely to be differences:
in this case 'Western Frisian' agrees with the ISO 639 custodians; see loc 639-1 and 639-2, and sil 639-3
I think that the rule we can apply to 639-2 and -3 language codes is to fall back on 639-1 when there is a 639-1 code: ara → ar; fry → fy; etc. We can keep a table specifically for fallback codes and another table to hold language names for 639-2 and -3 codes that don't fall back to 639-1 (Hopi, for example).
Trappist the monk (talk) 17:21, 31 October 2017 (UTC)
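The two-table fallback rule described above might look something like this. It is a sketch in Python rather than Lua; the table names, the function name language_name, and the handful of entries are placeholders chosen for illustration, not the contents of any existing data module.

```python
# Hypothetical fallback table: 639-2/-3 codes that have a 639-1 equivalent.
FALLBACK = {"ara": "ar", "fry": "fy"}

# Hypothetical name tables. Real data would come from the ISO 639 code lists.
NAMES_639_1 = {"ar": "Arabic", "fy": "Western Frisian"}
NAMES_OTHER = {"hop": "Hopi"}  # codes with no 639-1 equivalent


def language_name(code):
    """Return a language name for a 639-1/-2/-3 code, or None if unknown."""
    code = code.lower()
    code = FALLBACK.get(code, code)   # ara -> ar, fry -> fy
    if code in NAMES_639_1:
        return NAMES_639_1[code]
    return NAMES_OTHER.get(code)      # None when the code is not known


print(language_name("ara"))  # resolved through the fallback table
print(language_name("hop"))  # resolved through the second table
```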
I haven't been following the discussion, so apologies if this is irrelevant, but there exists Module:Language. – Uanfala 17:48, 31 October 2017 (UTC)
Yep, am aware of that. I haven't given it a close line by line reading but to me it looks to be more tailored to Wiktionary's needs than to Wikipedia's needs. I'm not opposed to merging this with that if it makes sense to do so.
Trappist the monk (talk) 17:59, 31 October 2017 (UTC)
I support the module-ization of this template, especially if it means that categories like Category:Articles containing unknown ISO 639 language template will be easier to deal with. I spent a while creating (hundreds?) of ISO 639 templates and matching categories for obscure languages; the error category should more properly be used to track actual errors. I would be happy to help create a list of language codes and their matching full language names. – Jonesey95 (talk) 20:05, 31 October 2017 (UTC)
If there should be an array matching ISO 639-3 codes to language names, then it should ideally be in sync with Module:Language/data/ISO 639-3 as well as – whenever possible – with the comprehensive series of ISO 639:xxx redirects. — Preceding unsigned comment added by Uanfala (talk • contribs) 20:17, 31 October 2017 (UTC)
Perhaps better for initial experimentation is Module:Language/data/iana_languages which also has 639-1 codes. That file may be dated since a comment at the top of it reads 2014-04-10 and I haven't wrapped my brain around the documentation in Module:Language/name/data.
Trappist the monk (talk) 21:05, 31 October 2017 (UTC)
The documentation for this template seems to suggest that
BCP47 (IETF language tags) should be used when choosing the code for the template. That being the case, Module:Language/name/data would seem to be the best choice ... except that it includes a file called Module:Language/data/wp languages
which has, as its accompanying 'documentation', this: "Wikimedia wikis uses some non-standard codes and a subset of IANA codes, plus composite codes". Why? Why 'spoil' the standard that way?
Trappist the monk (talk) 23:16, 31 October 2017 (UTC)
Erutuon might have an opinion here, as he was the last to work on this module. – Uanfala 23:25, 31 October 2017 (UTC)
And there is more ... There are lang-xx templates that don't use BCP47 codes:
Presumably we can troll through Category:Articles containing unknown ISO 639 language template and find what appear to be legitimate language codes that aren't part of 639-anything and create a table for use by the module.
Trappist the monk (talk) 12:56, 1 November 2017 (UTC)
One answer to my 'why spoil the standard' question might be because the 'official' name associated with code el is 'Modern Greek (1453-)' so we use Module:Language/data/wp languages to overwrite the 'official' name with 'Greek'.
Trappist the monk (talk) 16:56, 1 November 2017 (UTC)
The fallback idea sounds good to me. I have to note that many 639-2 codes do not work, even with the current non-Lua templates (including some of the other Frisian languages/dialects). I think we have a big win if we end up with a system in which none of the lang-family templates will redlink (or break entirely) unless a) we have no article on the language/dialect, or b) the code given is simply invalid.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  02:31, 1 November 2017 (UTC)
Module:Language/name/data has flaws. For example, that data would return these language names for these codes:
fy → Frisian
frr → Northern Frisian
frs → Eastern Frisian
fry → West Frisian
stq → Saterfriesisch
So, I've created an override table in Module:Lang/data so that we can override the BCP47 language names if needs be. The initial values assigned produce these results:
fy → West Frisian: some text
frr → North Frisian: some text
frs → East Frisian Low Saxon: some text
fry → West Frisian: some text
stq → Saterland Frisian: some text
Trappist the monk (talk) 15:56, 2 November 2017 (UTC)
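The override idea amounts to consulting a small table before the main name data. A minimal sketch in Python, using only the Frisian values quoted in this thread; the dictionary and function names are invented for the example and do not reflect the actual layout of Module:Lang/data.

```python
# Names as the base data would return them (values quoted in this thread).
BASE_NAMES = {
    "fy": "Frisian",
    "frr": "Northern Frisian",
    "frs": "Eastern Frisian",
    "fry": "West Frisian",
    "stq": "Saterfriesisch",
}

# Override table: entries here win over the base data.
OVERRIDE = {
    "fy": "West Frisian",
    "frr": "North Frisian",
    "frs": "East Frisian Low Saxon",
    "stq": "Saterland Frisian",
}


def display_name(code):
    """Return the overridden name if one exists, else the base name (or None)."""
    return OVERRIDE.get(code, BASE_NAMES.get(code))


for code in ("fy", "frr", "frs", "fry", "stq"):
    print(code, "->", display_name(code))
```

Keeping the overrides separate means the base data can be regenerated from its source without losing local naming decisions.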

I saw that my name was mentioned above. It's a wide-ranging discussion, and I'm not sure exactly what I'm being asked.

But I guess I can explain something about Wiktionary's treatment of languages and scripts, which is very different. Language codes that are allowed in language-tagging and linking templates are listed in language data modules. Each language code corresponds to a single language name that we call a "canonical name". The canonical name appears in level-2 headers in entries. There are two subtypes of languages: what could be called "full" language codes are allowed in regular linking or tagging templates, and etymology languages (codes for subtypes of full languages) are allowed in etymology templates: for instance, grc-att for Attic Greek, a dialect of Ancient Greek (grc). Some of the codes are Wiktionary-specific: for instance, ine-pro for Proto-Indo-European.

We also have a script data module that contains information on scripts, such as Ustring patterns for the Unicode characters included in the script. Each language may have an array of script codes indicating which scripts it is written with, either in real life, in linguistic works, or on Wiktionary (for instance, {"Latn", "Brai", "Shaw", "Dsrt"} for English). This list of scripts is used by findBestScript in wikt:Module:scripts to automatically detect the script of text that is being tagged. Thus, script codes are generally not required in tagging templates.

Script codes are used as class names (for instance, <span class="Latn" lang="en">word</span> for English). Many script codes are from ISO 15924 (for instance, Arab); others were created to allow wikt:MediaWiki:Common.css to select different fonts for a variant of the script, either for their looks or their character set. (The script code fa-Arab has the same character pattern as Arab, but having a distinct script code for Persian allows it to be displayed in Nastaliq-style fonts. We don't use the ISO 15924 code Aran because it does not involve a different character set.)

We don't allow any modifiers to be appended onto language codes: placing ru-petr1708, ru-Cyrl, or en-US into a linking or tagging template results in a module error.

As you can see, Wiktionary is much more restrictive than Wikipedia. Many of the features are probably not applicable, but at least you have an overview. One feature that would be nice is script recognition, at least if Wikipedia starts adding CSS classes for scripts. (Or the module could add the very verbose inline CSS that is currently found in {{Script}} and its subtemplates. But inline CSS is best avoided because, to overrule it, you have to add !important to every rule in your personal stylesheet that contradicts it.) I started Module:Language/scripts and Module:Language/scripts/data based on wikt:Module:scripts and wikt:Module:scripts/data, but didn't go anywhere with it, because it would only be for my own use until Wikipedia has a coordinated approach to script tagging and the associated CSS.

As to Module:Lang, I have no objections to it being merged with Module:Language eventually if possible. It's unfortunate to have two modules that do similar things. I did attempt to make Module:Language generate the content of {{lang}} and considered the idea of doing the same for the lang-xx templates, but I don't have the motivation to sort out the crazy IETF tags (crazy from my perspective because I don't have to deal with them on Wiktionary), non-Wiktionary language codes, language names, colons, italicization, and the lack of any CSS classes for scripts. But if the distinct purposes of generating a Wiktionary-compatible tagging and linking template ({{wikt-lang}}) and a Wikipedia-style one ({{lang}}) can be coordinated, that would be great. — Eru·tuon 07:24, 4 November 2017 (UTC)

Thanks for that; it'll take a bit to digest but my initial reaction is that there is a basic lack of compatibility between Wiktionary and en.wiki in that en.wiki attempts, for the most part, to adhere to IETF/IANA language coding and attempts to minimize custom language coding. I do like the css-classes-for-scripting idea.
I think that you were mentioned here because you were the last editor to touch Module:Language/name/data so I guess that the mentioning editor presumed that by doing so, you had become the expert.
Trappist the monk (talk) 10:09, 4 November 2017 (UTC)
Another feature I forgot to mention is that Wiktionary uses a data module to determine whether a script is RTL. It's probably a bad idea to set text direction for a given language, because languages are written in multiple scripts, and direction is a characteristic of the script, and as script direction can be determined automatically, editors should not have to deal with it at all. (On Wiktionary, this item in the data module is almost never used, because text direction is set for many RTL scripts in wikt:MediaWiki:Common.css with the CSS property direction: rtl;.) I've added script direction data to Module:Language/scripts/data.
Another thing I could mention is that we use language and script objects that have several methods (for basic things like retrieving the code and canonical name, or more complex things like retrieving the scripts used by a language, transliterating, or counting the characters in a string that belong to the script). These methods are shared across all objects of the same type using a metatable. This is convenient, because you can use a single variable for the language or the script and retrieve the code or the name from it when needed, and cleaner, because the code that handles the retrieval of the code and name is removed from the functions that use the code and name. But an object is probably overkill at this point if just the code and name are used. Another possibility would be a table containing the code and first name (for instance, { code = "en", name = "English" }). — Eru·tuon 21:20, 4 November 2017 (UTC)
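The point that direction belongs to the script rather than the language can be illustrated with a small lookup keyed on ISO 15924 script codes. This is a Python sketch only; the table holds a few sample entries, and treating unknown scripts as left-to-right is an assumption of the example, not a description of Module:Language/scripts/data.

```python
# Sample ISO 15924 script codes with their writing direction.
SCRIPT_DIRECTION = {
    "Arab": "rtl",
    "Hebr": "rtl",
    "Latn": "ltr",
    "Cyrl": "ltr",
}


def text_direction(script_code):
    """Return 'rtl' or 'ltr' for a script code; unknown scripts default to 'ltr'."""
    return SCRIPT_DIRECTION.get(script_code, "ltr")


print(text_direction("Hebr"))
print(text_direction("Latn"))
```

Keyed this way, a language written in more than one script needs no per-language direction setting.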

categorization

I've added categorization code to the module. The live {{lang-??}} and {{lang}} templates use {{lang}} to do their categorization. {{lang}} will add Category:Articles containing unknown ISO 639 language template when there isn't a Category:ISO 639 name from code templates template that matches the language code. The module doesn't use these templates so it uses a different category when the code isn't in Module:Language/name/data: Category:Articles containing unknown language template codes – that name could certainly be less wordy and more concise. Suggestions?

The live templates do not categorize pages that are not in article space. For the time being, I have disabled that discrimination in the module for the purposes of debugging so you will see red-linked categories produced by the module at the bottom of this page (all hidden categories if 'Show hidden categories' is checked at Special:Preferences#mw-prefsection-rendering). If {{lang}} and {{lang-??}} templates ever call Module:Lang, namespace discrimination will be reinstated.

The red-linked categories attached to this page are Category:Articles containing Frisian-language text, because 'West Frisian' (the current category name) does not match the code/name defined by BCP47+Module:Language/data/wp languages; and Category:Articles containing Hopi-language text, because the {{ISO 639 name hop}} template has no matching category. For the Hopi case, the live {{lang-hop}} dumps all Hopi-language instances into Category:Articles containing non-English-language text. I think that philosophy is misguided. I think that red-linked categories are more likely to get 'fixed' than a blue-linked dumping-ground category.

Trappist the monk (talk) 09:44, 2 November 2017 (UTC)
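The categorization behaviour described in this post could be sketched along these lines. It is a Python illustration, not the module's code: the function name, the debug flag, and the use of None for "no category" are assumptions, while namespace 0 is MediaWiki's number for article space.

```python
ARTICLE_NAMESPACE = 0  # MediaWiki's main (article) namespace

# Category name as given in the post above.
ERROR_CATEGORY = "Category:Articles containing unknown language template codes"


def category_for(language_name, namespace=ARTICLE_NAMESPACE, debug=False):
    """Return the tracking category for one template use, or None.

    language_name is None when the code was not found in the name data.
    With debug=True the namespace check is skipped, as it is while testing.
    """
    if namespace != ARTICLE_NAMESPACE and not debug:
        return None
    if language_name is None:
        return ERROR_CATEGORY
    return "Category:Articles containing {}-language text".format(language_name)


print(category_for("Hopi"))
print(category_for(None))
print(category_for("Hopi", namespace=10))  # template namespace: not categorized
```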

Yeah, I wasn't going to get into those yet. Getting all the ISO stuff to work would be first priority, but it would be nice to support codes introduced by others like Glottolog, at least for languages and dialects with no ISO code.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  17:27, 1 November 2017 (UTC)
I'm pretty sure that {{ISO 639 name hop}} has existed since 2011, but it looks like the non-existence of the category causes the generic categorization. You can see a couple hundred other such templates with gaps at Category:ISO 639 name from code templates without a category. I created a bunch of them, but it gets tedious, especially because three other categories are also requested by the documentation for each ISO 639 name xxx template. A bot might be helpful in creating all of these red-linked categories. – Jonesey95 (talk) 00:32, 2 November 2017 (UTC)
You're right, I've edited my post.
I can now see why this 'simple' task of converting the {{lang}} and {{lang-??}} templates to a module has been started before but never been completed. On the face of it, conversion to a module is simple but then you look under the bonnet ...
Trappist the monk (talk) 09:44, 2 November 2017 (UTC)
Keep going! If anyone can do it, you can. Let us know how we can help. – Jonesey95 (talk) 21:45, 2 November 2017 (UTC)
Category:Articles containing unknown language template codes has become Category:Lang and lang-xx template errors. I have also created Category:Lang and Lang-xx templates using Module:Lang to track those templates that are using the module during the transition period. Once all templates that can be have been changed to use the module, this category can go away.
Trappist the monk (talk) 13:06, 6 November 2017 (UTC)

translation and transliteration

The {{lang-??}} templates have support for translation rendering and some support transliteration rendering. I have attempted to add that support to Module:Lang.

Literal translation
{{lang-de|Im Westen nichts Neues|lit=In the West Nothing New}}
  • German: Im Westen nichts Neues, lit.'In the West Nothing New'
    • [[German language|German]]: <i lang="de">Im Westen nichts Neues</i>, <small>[[Literal translation|lit.]]&thinsp;</small>&#39;In the West Nothing New&#39;
{{#invoke:lang|lang_xx_italic|code=de|text=Im Westen nichts Neues|italic=|translation=In the West Nothing New}}
  • German: Im Westen nichts Neues, lit.'In the West Nothing New'
    • [[German language|German]]: <i lang="de">Im Westen nichts Neues</i>, <small>[[Literal translation|lit.]]&thinsp;</small>&#39;In the West Nothing New&#39;
Literal translation with generic transliteration
{{Lang-el|Θεοτόκος|links=yes|translation=God-bearer|translit=Theotokos}}
  • Greek: Θεοτόκος, romanizedTheotokos, lit.'God-bearer'
    • [[Greek language|Greek]]: <span lang="el">Θεοτόκος</span>, <small>[[Romanization of Greek|romanized]]:&nbsp;</small><span title="Greek-language romanization"><i lang="el-Latn">Theotokos</i></span>, <small>[[Literal translation|lit.]]&thinsp;</small>&#39;God-bearer&#39;
{{#invoke:lang|lang_xx_inherit|code=el|text=Θεοτόκος|italic=no|translation=God-bearer|translit=Theotokos}}
  • Greek: Θεοτόκος, romanizedTheotokos, lit.'God-bearer'
    • [[Greek language|Greek]]: <span lang="el" style="font-style: normal;">Θεοτόκος</span>, <small>[[Romanization of Greek|romanized]]:&nbsp;</small><span title="Greek-language romanization"><i lang="el-Latn">Theotokos</i></span>, <small>[[Literal translation|lit.]]&thinsp;</small>&#39;God-bearer&#39;
Literal translation with ISO 843 transliteration
{{transl}} which does; confused yet?
  • Greek: Θεοτόκος, romanizedTheotókos, lit.'God-bearer'
    • [[Greek language|Greek]]: <span lang="el" style="font-style: normal;">Θεοτόκος</span>, <small>[[Romanization of Greek|romanized]]:&nbsp;</small><span title="ISO 843 Greek (Greek language) transliteration"><i lang="el-Latn">Theotókos</i></span>, <small>[[Literal translation|lit.]]&thinsp;</small>&#39;God-bearer&#39;

Trappist the monk (talk) 14:06, 2 November 2017 (UTC)

Well, you were definitely right about this being more complicated than it seemed! Definitely appreciate the effort you're putting into this. We've needed to Lua-ize this for so long (and I don't have the Lua skillz to do it).  — SMcCandlish ¢ >ʌⱷ҅ʌ<  17:07, 2 November 2017 (UTC)

I got to wondering about the html/css markup around transliteration renderings when it occurred to me that the module doesn't (because {{transl}} doesn't) include the lang attribute in the enclosing <span>...</span>:

{{transl|ar|al-Khwarizmi}} → al-Khwarizmi
<span title="Arabic-language romanization"><i lang="ar-Latn">al-Khwarizmi</i></span>

For this example, shouldn't the module output something like this:

<span lang="ar-Latn" title="Arabic transliteration" class="Unicode" style="white-space:normal; text-decoration: none">al-Khwarizmi</span>

As I understand it, in css, white-space:normal and text-decoration:none are the defaults. If they are used here then that suggests that the css class="Unicode" class somehow alters those two properties. Where is class="Unicode" defined? Pinging Editors Dbachmann, the author of {{transl}}, and Ruud Koot, the author of these edits.

Trappist the monk (talk) 12:53, 14 November 2017 (UTC)

Found it, and it appears to be gone:
So then, does that not mean that the html/css markup around transliteration renderings should be:
<span lang="ar-Latn" title="Arabic transliteration">al-Khwarizmi</span>
Trappist the monk (talk) 13:46, 14 November 2017 (UTC)
Changed. Results can be seen in the transliteration example above.
Trappist the monk (talk) 15:52, 16 November 2017 (UTC)
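The simplified markup settled on above can be produced by a helper of roughly this shape. This is a Python sketch; the function name transl_span and its parameters are invented for the example, and the default of Latn for the script subtag is an assumption.

```python
from html import escape


def transl_span(code, language_name, text, script="Latn"):
    """Build the <span> for a transliteration, with lang and title attributes."""
    lang_tag = "{}-{}".format(code, script)            # e.g. ar-Latn
    title = "{} transliteration".format(language_name)
    return '<span lang="{}" title="{}">{}</span>'.format(
        escape(lang_tag, quote=True),
        escape(title, quote=True),
        escape(text),                                   # guard against stray markup
    )


print(transl_span("ar", "Arabic", "al-Khwarizmi"))
```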

links=no

If I have a template that renders like this:

{{lang-he/sandbox|פרת|Perat|lit=Euphrates|links=}} → Hebrew: פרת, romanized: Perat, lit.'Euphrates'

If I set |links=no, shouldn't that unlink the primary language (Hebrew) and the transliteration and literal translation static texts?

{{lang-he/sandbox|פרת|Perat|lit=Euphrates|links=no}} → Hebrew: פרת, romanized: Perat, lit.'Euphrates'

Trappist the monk (talk) 00:03, 5 November 2017 (UTC)

I would certainly think so. Another issue I was just thinking of again today (and grinding my teeth) is that we need a way to suppress these things entirely e.g. with a |labels=no and |labels=lang; we don't need the language name, the "translit.", or the "lit." labels after the first occurrence in the same block of material, or sometimes we need the language one only, e.g. when comparing cognates. What we're doing now is using the template once, then abandoning it for manual markup with a {{lang-xx}} and driving readers nuts by repeating the same crap over and over at them as if they have dain bramage. ;-/  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:18, 5 November 2017 (UTC)
For the time being, I'm going to limit 'new features' to the |italic= switch and perhaps unlinking the translation and transliteration static text so that I can think about making the templates function correctly given a variety of inputs. That I think is mostly done so I'm about to take the module live on a handful of {{lang-??}} templates to see what happens – to see if anyone outside of this conversation notices. You should probably start a new wish-list topic for the label thing.
Done, below.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:28, 6 November 2017 (UTC)
Trappist the monk (talk) 21:04, 5 November 2017 (UTC)

sandbox testing

Category:Lang-x templates lists several templates that have sandboxen. Of those, where the template also has a /testcases page, I have edited the sandbox to use Module:Lang. So far, these:

Template:Lang-ar/testcases
Template:Lang-arc/testcases
Template:Lang-el/testcases
Template:Lang-en/testcases
Template:Lang-es/testcases
Template:Lang-hbs/testcases
Template:Lang-he/testcases

Doing this found a handful of coding errors that have been fixed. The interesting case in these templates is {{lang-hbs}} Serbo-Croatian. This language uses both Latin characters and Cyrillic characters (not at the same time, I think) so the issue of italics arises. Rendering is controllable with |italic=no but it might be better to create another script parameter (|script= is currently used to override |code= when rendering the transliteration tool tip – though I don't know how useful that actually is). In this scheme, if |lang-script= is set to a valid IANA script, then we would write <span lang="hbs-<lang-script>"> and, if the script is not Latn, would override whatever |italic= is to no-italic.

The previous sandbox version of {{lang-hbs}} had some module code that would automatically transliterate the input text to the other script. That apparently didn't ever become live because there are/were problems transliterating Cyrillic to Latin in the presence (or lack – I'm not quite sure) of certain Unicode characters. I don't think that Module:Lang wants to go there.

The other one that I have found, though I've done nothing with it yet, is {{lang-sco}}. That template introduces |l=, an alias of |link=; |i=, to control italic rendering; and |abbr=, to replace the language name with an unlinked abbreviation of the name. I am sure that we really don't need |l= because in the text editor l looks too much like 1 and because to someone unfamiliar with the internals of these templates, |l=no is meaningless; this latter reason applies to |i= as well. Is there a standardized list of language abbreviations? If yes, then perhaps we should support |abbr=; if no, then we should not support |abbr=. Without a standard list, editors can (and will) write whatever suits them but what they concoct may not be understandable by readers and other editors.

Trappist the monk (talk) 12:55, 3 November 2017 (UTC)

I suppose one could poke through the hundreds of templates to look for parameters, but another way to do it would be to convert the templates one by one to the new module, and have module code that detects unsupported parameters. Like the proposed |script=, such parameters could be evaluated for their utility and potentially incorporated into the module. Parameters that are determined to be unneeded or non-standard could be removed or converted to standard parameters. – Jonesey95 (talk) 14:53, 3 November 2017 (UTC)
Isn't [poking] through the hundreds of templates to look for parameters more-or-less the same as [converting] the templates one by one because to do the latter you are in effect doing the former? These templates are basically similar enough that we will see the oddball parameters straight away; no need for the module to detect anything. Compare this edit to {{lang-el/sandbox}} as an example or this apparently more complex edit to {{lang/sandbox}}.
Trappist the monk (talk) 15:39, 3 November 2017 (UTC)
Modifying the templates will tell us whether or not the unusual parameters are actually used, not just whether they exist in the template. Unused parameters can be discarded. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)
Editing {{lang/sandbox}} to use Module:Lang showed how it is necessary for the module to support IETF language tags so I've modified the module accordingly. When processing {{lang}}, because that template receives its language code directly from the template in wikitext, editors will be creative in how they set that parameter. The module now supports the most commonly used (I think) IETF tags:
primary language code-script-region
where
primary language code is the two- or three-character ISO 639 language code lowercase (ll)
script is the four-character IANA script code; title case (Ssss)
region is the two-character IANA region code; uppercase (RR)
in these forms
ll
ll-Ssss
ll-RR
ll-Ssss-RR
The module emits an error message when IETF tags don't match these forms, or when they look right but have invalid content. These tests should probably be added to the {{lang-??}} support as well so that we can, if appropriate, create new templates that might make use of it (perhaps {{lang-hbs-Cyrl}} and {{lang-hbs-Latn}}).
Trappist the monk (talk) 15:55, 3 November 2017 (UTC)
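The four accepted forms (ll, ll-Ssss, ll-RR, ll-Ssss-RR) can be checked with a single pattern. The following Python sketch tests shape only; whether the subtags are actually registered would need a lookup against the IANA data, which is left out here. The names TAG_RE and parse_tag are invented for the example.

```python
import re

# language: 2-3 lowercase letters; script: title case, 4 letters; region: 2 uppercase.
TAG_RE = re.compile(r"([a-z]{2,3})(?:-([A-Z][a-z]{3}))?(?:-([A-Z]{2}))?")


def parse_tag(tag):
    """Split an IETF-style tag into (language, script, region).

    Missing parts are returned as None. Raises ValueError when the tag
    does not have one of the four supported shapes.
    """
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError("malformed language tag: " + tag)
    return match.groups()


for tag in ("ru", "ru-Cyrl", "ru-RU", "ru-Cyrl-RU"):
    print(tag, "->", parse_tag(tag))
```

Tags such as ru-Cyril or RU fail the shape test and raise ValueError in this sketch.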
I don't know how the ISO 639 name xx templates fit into all of this, but this list of redirects to Template:ISO 639 name ru might provide some useful examples of scripts that are in use. Some of the redirects appear to be for invalid scripts. – Jonesey95 (talk) 20:51, 3 November 2017 (UTC)
This is why we want to make a module. The article Film speed transcludes {{lang|ru-Cyrl|ГОСТ}} which transcludes {{ISO 639 name ru-Cyrl}} which redirects to {{ISO 639 name ru}} which returns 'Russian' so that the article is properly categorized in Category:Articles containing Russian-language text. With the module, Film speed transcludes {{lang|ru-Cyrl|ГОСТ}} which invokes Module:Lang which renders and categorizes in one go.
I imagine that the others serve similar purposes. {{ISO 639 name RU}} is wrong-case language code; should be ru because RU is the ISO 3166 country code for Russian Federation. {{ISO 639 name ru-Cyril}} is a misspelling of the IANA script code Cyrl. I have no idea where ru-1708 came from. Its only use is in Russian Empire; the redirect {{ISO 639 name ru-1708}} was created at the same minute, both by Editor OwenBlacker who can perhaps explain.
I think that the module handles all of these correctly:
{{lang/sandbox|ru-Cyrl|ГОСТ}} → [ГОСТ] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-Cyril|ГОСТ}} → [ГОСТ] Error: {{Lang}}: unrecognized variant: cyril (help)
{{lang/sandbox|ru-Latn|GOST}} → GOST
{{lang-ru|ГОСТ|translit=GOST|script=Latn}} → Russian: ГОСТ, romanized: GOST
{{lang/sandbox|RU|ГОСТ}} → ГОСТ
{{lang/sandbox|ru-1708|ГОСТ}} → [ГОСТ] Error: {{Lang}}: unrecognized variant: 1708 (help)
Trappist the monk (talk) 22:45, 3 November 2017 (UTC)
That is an excellent explanation. I look forward to getting rid of the current morass of hundreds of templates, redirects, and other madness. Keep up the good work. – Jonesey95 (talk) 23:01, 3 November 2017 (UTC)
Hey there, saw your {{
regex -(\d{4}|[a-z]{5,8})), merging templates together like this is an awesome project. Anything that makes it easier for editors to add language tags to content gets my support :) — OwenBlacker (talk) 23:48, 3 November 2017 (UTC)
Are you sure? There does not appear to be a 1708 variant listed. There is this, extracted from the current IANA language-subtag-registry file:
%%
Type: variant
Subtag: petr1708
Description: Petrine orthography
Added: 2010-10-10
Prefix: ru
Comments: Russian orthography from the Petrine orthographic reforms of
  1708 to the 1917 orthographic reform
Same thing? de-1911 and de-1996 yes, but the others that you mentioned, no. The data files that the new Module:Lang depends on aren't necessarily current so at the moment I'm working on code that will extract language, script, and region information from the language-subtag-registry file. Currently there is no 'variant' data file but that could be extracted as well.
Trappist the monk (talk) 00:44, 4 November 2017 (UTC)
I have extended the iana data extraction tool so that it also extracts variant data. The result is Module:Language/data/iana_variants. With that data module, and a bit of new code, Module:lang can support:
{{lang/sandbox|ru|Россійская Имперія}} → Россійская Имперія
{{lang/sandbox|ru-Cyrl|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-Cyrl-RU|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-Cyrl-RU-petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-petr1708|Россійская Имперія}} → Россійская Имперія
but rejects improperly formed tags and emits an error message:
{{lang/sandbox|RU|Россійская Имперія}} → Россійская Имперія
{{lang/sandbox|ru-Cyril|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: cyril (help)
{{lang/sandbox|ru-Cyrl-ru|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-Cyrl-RU-Petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
{{lang/sandbox|ru-1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: 1708 (help)
The variant data records in the iana language-subtag-registry file include a Prefix item that specifies the language code used with the variant. For variant petr1708 the Prefix is ru so using that variant with another language code is rejected:
{{lang/sandbox|de-petr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: unrecognized variant: petr1708 for code: de (help)
These changes also apply to the {{lang-??}} template support in Module:Lang.
Trappist the monk (talk) 20:54, 5 November 2017 (UTC)
BCP47 says that IETF language tags are case insensitive so I have relaxed the checking to allow any mixture of case. The code does, however, prettify its output (not that anyone will see it):
{{lang/sandbox|RU-cYRL-ru-PeTr1708|Россійская Имперія}} → [Россійская Имперія] Error: {{Lang}}: script: cyrl not supported for code: ru (help)
[Россійская Имперія] <span style="color:#d33">Error: {{Lang}}: script: cyrl not supported for code: ru ([[:Category:Lang and lang-xx template errors|help]])</span>
I have also added support for three-digit region codes:
{{lang/sandbox|es-419|Spanish in Latin America and the Caribbean}} → Spanish in Latin America and the Caribbean
Trappist the monk (talk) 13:23, 6 November 2017 (UTC)
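The checks described above (case-insensitive matching, prettified output, three-digit region codes, and the variant Prefix rule) can be sketched roughly as follows. This is a Python illustration only, not the Lua in Module:Lang; the function name normalize_tag, the regular expression, and the tiny data tables are assumptions made for the example, standing in for the IANA-derived data modules.

```python
import re

# Hypothetical, tiny stand-ins for the IANA-derived data modules.
LANGUAGES = {"ru", "de", "es"}
SCRIPTS = {"cyrl", "latn"}
REGIONS = {"ru", "419"}
# Each variant lists the language codes named by its registry Prefix items.
VARIANT_PREFIXES = {"petr1708": {"ru"}, "1996": {"de"}}

# language[-script][-region][-variant], matched on the lowercased tag.
TAG_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|\d{3}))?"
    r"(?:-(?P<variant>\d{4}|[a-z0-9]{5,8}))?$"
)


def normalize_tag(tag: str) -> str:
    """Validate an IETF-style tag and return it in conventional case."""
    match = TAG_RE.match(tag.lower())  # tags are case insensitive
    if match is None:
        raise ValueError(f"malformed tag: {tag}")
    lang, script, region, variant = match.group(
        "lang", "script", "region", "variant"
    )
    if lang not in LANGUAGES:
        raise ValueError(f"unrecognized language code: {lang}")
    if script and script not in SCRIPTS:
        raise ValueError(f"unrecognized script: {script} for code: {lang}")
    if region and region not in REGIONS:
        raise ValueError(f"unrecognized region: {region} for code: {lang}")
    if variant:
        if variant not in VARIANT_PREFIXES:
            raise ValueError(f"unrecognized variant: {variant}")
        if lang not in VARIANT_PREFIXES[variant]:
            raise ValueError(
                f"unrecognized variant: {variant} for code: {lang}"
            )
    # Prettify: language lowercase, script title case, region uppercase.
    parts = [lang]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    if variant:
        parts.append(variant)
    return "-".join(parts)
```

With these assumed tables, a mixed-case tag such as RU-cYRL-ru-PeTr1708 normalizes to ru-Cyrl-RU-petr1708, while de-petr1708 is rejected because the variant's prefix is ru.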
Fantastic work. Should we also be warning against or disallowing language tags with suppressed script codes, e.g. ru-Cyrl?
Quoth (talk) 11:51, 6 November 2017 (UTC)
I have not thought about that. Can you make a separate wish-list topic to hold this and other ideas so that they don't get lost?
Trappist the monk (talk) 13:23, 6 November 2017 (UTC)
I set up a section for that, and put both my and Quoth's items in it.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  14:28, 6 November 2017 (UTC)

iana data

Module:Lang uses Module:Language/data/iana languages, Module:Language/data/iana scripts, and Module:Language/data/iana regions which are, I believe, derived from the 2014-04-10 IANA language-subtag-registry file. There is a new version that is current as of 2017-08-15. I believe that we should update our data files to be in line with the current registry file. To that end I have cobbled-up a data extraction tool that creates the tables held in the data files from the IANA source. You can see the result.

Like the current version of the data modules, the data created by the extraction tool does not have codes that are deprecated, codes that have preferred alternatives, nor codes that are marked as private use. I do not believe that there is a need for these particular codes but I could be wrong. I'm going to update the data files. If anyone knows of a reason to include the codes that the tool skips, let us know.

Trappist the monk (talk) 16:16, 4 November 2017 (UTC)

Along these lines I've hacked another data extraction tool that will generate a table for Module:Language/data/ISO 639-3. I have used this tool to update that module and the other tool to update the IANA data modules.
But what about Module:Language/data/wp languages? Anyone know where the data in that module came from? Is there an 'official source'?
Trappist the monk (talk) 20:22, 5 November 2017 (UTC)
problems with the data set

List of native plants of Flora Palaestina (E-O) times out before it can be fully rendered. I guess I'm not all that surprised because the data set (all of those modules mentioned in §iana data) is recompiled every time a {{lang}} or {{lang-??}} template is called (in this case the template is {{rtl-lang}}). The Lua processing time limit is 10 seconds. As an experiment, I forced the module to use only one of the data modules, Module:Language/data/iana languages, and 'included' it in Module:Lang with mw.loadData() instead of with require(). The page rendered properly in about 2 seconds. The differences are significant. require() allows the included modules to hold executable code but must be reloaded with every {{#invoke:}} (every 'template' in the wikisource). The modules 'included' with mw.loadData() must not hold executable code but are loaded only once per page.
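The cost difference can be illustrated with a small simulation. This is a Python sketch, not Scribunto code: simulate_page and its arguments are hypothetical names, and the "build" step merely stands in for compiling a data module. It counts how often the data table is rebuilt when every template call reloads it (the require() behaviour described above) versus when it is loaded once per page (the mw.loadData() behaviour).

```python
def simulate_page(invocations: int, load_once: bool) -> int:
    """Return how many times the data table is built while rendering a page."""
    builds = 0
    cache = None

    def build_table():
        # Stand-in for compiling a large data module.
        nonlocal builds
        builds += 1
        return {"ru": "Russian", "abq": "Abaza", "he": "Hebrew"}

    for _ in range(invocations):
        if load_once:
            # Analogous to mw.loadData(): build on first use, then reuse.
            if cache is None:
                cache = build_table()
            table = cache
        else:
            # Analogous to require() under a fresh #invoke: rebuild every time.
            table = build_table()
        _ = table["ru"]  # the lookup itself is cheap either way

    return builds
```

On a page with several hundred template calls, the first mode builds the table several hundred times and the second builds it once, which is consistent with the timeout disappearing in the experiment.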

The obvious solution is to create some sort of static version of the table of tables created by require ('Module:Language/name/data'). These tables don't need to be recompiled for every use because they will only change when the standards from which they were created change.

Trappist the monk (talk) 17:54, 17 November 2017 (UTC)

You should be able to do mw.loadData ('Module:Language/name/data'), and the data will not be recompiled each time one of these templates is transcluded. That is the way we load data modules on Wiktionary. — Eru·tuon 20:50, 17 November 2017 (UTC)
That works. Thanks. Failure on my part to grasp this in the documentation: "The value returned from the loaded module must be a table ... [of] booleans, numbers, strings, and other tables" For a long time I somehow misunderstood that (perhaps not necessarily from the documentation; could have been from other reading or conversation) because modules always return tables (even if they are tables of functions – something that is used quite a bit in Module:Citation/CS1). Clearly it means that it doesn't matter how the table is built, just that when the module returns, it can only return a table containing a limited subset of data types.
Trappist the monk (talk) 21:08, 17 November 2017 (UTC)
Exactly. The rationale is that functions can "trap" values from one module invocation that could then be transferred to another, or can otherwise change their behavior each time they are called. (For instance, the iterator function returned by ipairs(array) giving a new index and value from the array each time it's called.) So functions would in many cases make unexpected things happen if they were saved in memory and accessed by multiple invocations. Other types (number, string, boolean, nil) don't behave in this way, so they can safely be saved in a table by mw.loadData, accessed through the metatable of a dummy table, and shared between modules. In any case, you can always try loading a module with mw.loadData, and it'll tell you if you're not allowed to. — Eru·tuon 22:14, 17 November 2017 (UTC)
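The restriction on what a data module may return can be pictured as a recursive type check. The following Python sketch is an analogy only; is_loadable is a hypothetical helper, and Python dicts stand in for Lua tables. It accepts a table built by any code as long as the returned value contains nothing but booleans, numbers, strings, and other tables.

```python
def _plain(value) -> bool:
    """True if value is a boolean, number, string, or table of such values."""
    if isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, dict):
        return all(_plain(k) and _plain(v) for k, v in value.items())
    return False  # functions and any other types are rejected


def is_loadable(module_result) -> bool:
    """The top-level value must itself be a table of plain data."""
    return isinstance(module_result, dict) and _plain(module_result)
```

Under this check, a table of language names passes regardless of how it was assembled, while a table of functions (the usual shape of a code module) does not.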

multiple text scripts in a single template

There are a couple of issues here:

{{lang-abq|Къарча-Черкес автоном область ''Q̇arća-Ćerkes avtonom oblast’''}}

Abaza apparently has both Cyrillic and Latin scripts so the italicized part could be the correct abq-Latn or it could simply be a transliteration of the abq-Cyrl. I don't know how to tell the difference. My gut would say that switching alphabets 'midstream' is inappropriate. The same applies to transliterations; {{{1}}} should not hold text in two alphabets.

Module:Lang detects italic markup in {{{1}}} (also incorrectly finds bold markup – I'll fix that) because the correct way to control italicization of {{{1}}} is with |italic=

All of this suggests that the correct way of writing this would be:

{{lang-abq|Къарча-Черкес автоном область}} {{lang|abq|Q̇arća-Ćerkes avtonom oblast’|italic=yes}}

Trappist the monk (talk) 11:07, 7 November 2017 (UTC)

Trappist the monk, some languages use three scripts (at least) – kk.wp is available in Latin, Cyrillic and Farsi script, for example. It would be convenient if all could be accommodated within a single template, but the sort of workaround you illustrate above could work too. Justlettersandnumbers (talk) 16:47, 7 November 2017 (UTC)

As a solution to this languages-with-multiple-scripts problem, I have renamed the existing {{#invoke:}} parameter |script= to |transl-script= and created a new |script= that applies to the text and to the language code.

In the example above, both alphabets are contained in a single template. That is still wrong and this change does nothing to permit that. But, it does start us on the way to supporting multiple alphabets in a single template as I have suggested at #Wish list for future enhancement

{{#invoke:Lang|lang_xx_inherit|code=abq|text=Къарча-Черкес автоном область|script=Cyrl}}
Abaza: Къарча-Черкес автоном область
[[Abaza language|Abaza]]: <span lang="abq-Cyrl">Къарча-Черкес автоном область</span>
{{#invoke:Lang|lang_xx_inherit|code=abq|text=Q̇arća-Ćerkes avtonom oblast’|script=Latn}}
Abaza: Q̇arća-Ćerkes avtonom oblast’
[[Abaza language|Abaza]]: <i lang="abq-Latn">Q̇arća-Ćerkes avtonom oblast’</i>

Above, because |script=Cyrl, the text is not italicized. When |italic= is not set and |script= is set, the module will apply italic markup only when the specified script is Latn (case ignored). When |italic= is set, it controls:

{{#invoke:Lang|lang_xx_inherit|code=abq|text=Къарча-Черкес автоном область|script=Cyrl|italic=yes}}
Abaza: Къарча-Черкес автоном область
[[Abaza language|Abaza]]: <i lang="abq-Cyrl">Къарча-Черкес автоном область</i>
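The italic rule described above can be written as a short decision function. This is a Python sketch of the stated behaviour, not the module's Lua; use_italic and wrap are hypothetical names, the default parameter is an assumption covering the case where neither |italic= nor |script= is set, and treating any |italic= value other than yes as "no italics" is likewise an assumption.

```python
def use_italic(script=None, italic=None, default=True) -> bool:
    """Decide whether the text gets italic markup."""
    if italic is not None:
        # When |italic= is set, it controls (assumption: only 'yes' enables).
        return italic.lower() == "yes"
    if script is not None:
        # Otherwise italicize only Latin script, ignoring case.
        return script.lower() == "latn"
    # Neither parameter set: fall back to an assumed template default.
    return default


def wrap(code, text, script=None, italic=None) -> str:
    """Wrap text in <i> or <span> with the language tag (label omitted)."""
    tag = f"{code}-{script}" if script else code
    element = "i" if use_italic(script, italic) else "span"
    return f'<{element} lang="{tag}">{text}</{element}>'
```

For the three Abaza examples, this produces a span for |script=Cyrl, an i element for |script=Latn, and an i element for |script=Cyrl with |italic=yes, matching the markup shown.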

The module emits an error message if the value assigned to |script= is not recognized:

{{#invoke:Lang|lang_xx_inherit|code=abq|text=Къарча-Черкес автоном область|script=Cyril}}
[Къарча-Черкес автоном область] Error: {{Lang-xx}}: unrecognized script: cyril for code: abq (help)

The module does not now, but will, compare the IETF script subtag provided to {{lang}} or received from a {{lang-??}} to |script=. If they are not the same, the module will emit a mismatch error message.

Another reason to do this? So we don't have to fork a bunch of templates to properly support script subtags. —Trappist the monk (talk) 13:55, 9 November 2017 (UTC)

Revision; |script= is not needed with {{lang}}. Because the template gets the language code directly from {{{1}}}, editors can simply add the appropriate IETF script subtag:
abq → abq-Cyrl or abq-Latn
Now emits an error message when the script subtag in |code= does not match the value assigned to |script=:
{{#invoke:Lang|lang_xx_inherit|code=abq-latn|text=Къарча-Черкес автоном область|script=Cyrl}}
[Къарча-Черкес автоном область] Error: {{Lang-xx}}: redundant script tag (help)
This error message should be rare because it should not be necessary to have {{lang-??}} templates that specifically set |code= to a value that includes an IETF script subtag.
I suppose, for completeness, the {{lang-??}} templates should also support |region= and |variant= (also not required in {{lang}}).
Trappist the monk (talk) 14:40, 9 November 2017 (UTC)
I wonder if |transl-script= should be |trans-script= instead, to match the |trans-title= parameter style used in the popular Citation Style 1 templates. – Jonesey95 (talk) 15:27, 9 November 2017 (UTC)
Because too close to |transcript=? Because |translit-script= just felt too long? Because {{transl}} is the subsidiary template used by the current {{lang-??}} templates that support transliteration? Of course, none of these are good reasons.
For the most part, there are four different groups, if you will, of parameters in {{lang-??}} templates:
  1. main group has:
    fixed by the {{lang-??}} template – language code; module parameter |code=
    {{{1}}} – text; module parameter |text=
    |script= – language script (only templates rendered by the module); module parameter |script=
  2. transliteration group:
    |translit= or {{{2}}} – transliteration of the text in {{{1}}}; module parameter |translit=
    |script= – not part of {{lang-??}} but introduced in {{Language with name and transliteration}}; module parameter |transl-script=
    |std= – transliteration standard (only templates rendered by the module); module parameter |std=
  3. translation group:
    |lit= or {{{2}}} – literal translation; module parameter |lit=
  4. control group:
    |rtl= – fixed by the template; module parameter |rtl=
    |italic= – italic display of {{{1}}} (only templates rendered by the module); module parameter |italic=
Can't do much about existing template parameters here and now (|lit=? who thought that was a good parameter name?)
Still, your point is taken, I'll change |transl-script= to |translit-script=, |std= to |translit-std=, and the module parameter |lit= to |translation=.
Trappist the monk (talk) 16:12, 9 November 2017 (UTC)
That all looks better to me. If we have both translation and transliteration, we should not have any parameters that are abbreviated "trans" or "transl". That's just begging for confusion. – Jonesey95 (talk) 20:27, 9 November 2017 (UTC)
Would want |lit= to continue working; lots of use that, since it's short and mnemonic for what it outputs.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  17:34, 10 November 2017 (UTC)
The problem with |lit= is that in the mind and in the mouth it too much mimics |translit= whereas |translation= doesn't. A possible, and perhaps better, alias for |lit= instead of |translation= is |literal=. For the time being, |lit= isn't going away. And if you are concerned that typing |literal= or |translation= or even |lit= is too onerous, don't use any of them; positional parameters aren't going away either:
{{lang-he/sandbox|פרת|Perat|Euphrates}} → Hebrew: פרת, romanized: Perat, lit. 'Euphrates'
Trappist the monk (talk) 20:40, 10 November 2017 (UTC)
Following up on my musing that for completeness, the {{lang-??}} templates should also support |region= and |variant=, implemented:
{{#invoke:Lang|lang_xx_inherit|code=ru|text=какой-то кириллический текст|script=Cyrl|region=ru|variant=luna1918}}
[какой-то кириллический текст] Error: {{Lang-xx}}: script: cyrl not supported for code: ru (help)
[какой-то кириллический текст] <span style="color:#d33">Error: {{Lang-xx}}: script: cyrl not supported for code: ru ([[:Category:Lang and lang-xx template errors|help]])</span>
Trappist the monk (talk) 13:53, 10 November 2017 (UTC)

live testing

I have implemented the module in {{lang-aa}}, {{lang-bn}}, and {{lang-grc}}.

Trappist the monk (talk) 14:42, 6 November 2017 (UTC)

+{{lang-ku}}, {{lang-mix}}, and {{lang-sco}}
Trappist the monk (talk) 13:21, 7 November 2017 (UTC)
+{{lang-aec}}, {{lang-af}}, {{lang-ain}}, {{lang-ain}}, {{lang-akk}}
Trappist the monk (talk) 17:16, 11 November 2017 (UTC)

switching |lang= to the module

I am at the point of switching {{lang}} to use the module. I don't anticipate that this will cause problems. But, with 625,000-ish transclusions, problems may arise. The number is so large because a majority of the {{lang-??}} templates use {{lang}} to create the <span>...</span> around the text. I have disabled the italic checking for {{lang}} because such checking will detect the hardcoded italic markup added by many (most) of the {{lang-??}} templates that have not been converted to the module.

Objections to proceeding?

Trappist the monk (talk) 16:54, 13 November 2017 (UTC)

Sounds good, though it may not be ideal for lang-xx to be transcluding lang this way; better that it does this in Lua with a call to the same function, to reduce the transclusion count.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  21:05, 13 November 2017 (UTC)
The module supports both. The old versions of {{lang-??}} transclude {{lang}}. {{lang-??}} templates that use the module don't transclude {{lang}} because the module does it all.
Because the old templates transclude {{lang}}, the module will be doing the {{lang}} work that is now done by the wikitext version of {{lang}} until all of the {{lang-??}} templates are converted to the module.
Trappist the monk (talk) 21:41, 13 November 2017 (UTC)

Switched.

Trappist the monk (talk) 23:23, 18 November 2017 (UTC)

what about lang-?? with this ?

From {{lang-am}}:

[[Help:Multilingual support (Ethiopic)|<sup><span class="t nihongo icon" style="color:#00e;font:bold 80% sans-serif;text-decoration:none;padding:0 .1em;">?</span></sup>]]

which gives us the '?' and a link to Help:Multilingual support (Ethiopic):

{{lang-am|text}}
Amharic: text

An insource search conducted in the template namespace found:

{{Lang-am}}
{{Lang-ti}}
{{Lang-gez}}

All of these are Ethiopic languages. If this is all that use this markup, then, for standardization, it would seem best to discontinue support.

Trappist the monk (talk) 19:57, 13 November 2017 (UTC)

Not sure I follow.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  21:05, 13 November 2017 (UTC)
What don't you understand?
Trappist the monk (talk) 21:45, 13 November 2017 (UTC)
@Trappist the monk: I work very closely with articles containing Ethiopic script. I agree with discontinuing support. Most modern browsers support rendering Ethiopic script. This is an outdated help page that should be archived. It is no longer necessary. The ? is not needed or helpful any more. —አቤል ዳዊት?(Janweh64) (talk) 08:31, 8 December 2017 (UTC)
In fact, it has become a page for software developers to add promotional spam. —አቤል ዳዊት?(Janweh64) (talk) 08:44, 8 December 2017 (UTC)

recent changes and lang-ar

I am minded to revert to this version of the module. A problem was introduced with these edits that made the module ignore the |italic=no setting in {{lang-ar}} so that all Arabic script was rendered in italic font when it should not have been.

The purpose of the module edits was to simplify a handful of if statements. Were this code running on a micro-controller, such optimization might be required. It is not so we can afford to spend some processor cycles and use up memory space evaluating if 'yes' == args.italic then. There is the added benefit that editors who come after us can know specifically what it is that is needed at that particular point in the code.

Trappist the monk (talk) 11:16, 18 November 2017 (UTC)

Because we managed to break the module and because there are currently some 41k transclusions of it, I have protected it and created Module:Lang/sandbox.
Trappist the monk (talk) 11:32, 18 November 2017 (UTC)
Additionally, I have started Module:Lang/testcases; results at Module talk:Lang/testcases. The sandbox produces different (correct) results for these tests.
Trappist the monk (talk) 14:38, 18 November 2017 (UTC)

Auto-italicization of Latin scripts

The module currently seems to auto-italicize language tags which include a Latn script code, while the previous template didn't. Because the previous template didn't automatically do it, the correct way to format these words was to italicize them using wiki markup, which means that the module now appears to render them with two sets of encapsulating <i> tags (presumably one from the mark-up and one from the module). This also means the module auto-italicizes Latin scripts some of the time, but not most of the time (such as in the common cases where the Latn script is redundant/should be suppressed, e.g. for fr, es, it). I think this should be reverted to the previous behaviour to both avoid this inconsistency and the duplicate HTML.

If, however, anyone wants to go the opposite direction and make the module output for Latin scripts more consistent by auto-italicizing all Latin scripts, I'd also be fine with the relatively small amount of redundant HTML generated by the current formatting in order to remove the need for doing it manually in the future. That might be doable by checking a language's suppressed script codes for Latn when no script tag has been supplied, and italicizing it if true. – Quoth (talk) 16:12, 19 November 2017 (UTC)
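The suggestion in the last sentence, falling back to a language's suppressed script when the tag carries no script subtag, might look something like this. It is a Python sketch under stated assumptions: the SUPPRESS_SCRIPT table holds only a few entries copied from the IANA registry's Suppress-Script field, and script_of and auto_italic are hypothetical names rather than anything in Module:Lang.

```python
# A few Suppress-Script values from the IANA language subtag registry.
SUPPRESS_SCRIPT = {"fr": "Latn", "es": "Latn", "it": "Latn", "ru": "Cyrl"}


def script_of(tag: str):
    """Return the explicit script subtag, else the suppressed script, else None."""
    parts = tag.split("-")
    for subtag in parts[1:]:
        # After the language subtag, a four-letter alphabetic subtag is a script.
        if len(subtag) == 4 and subtag.isalpha():
            return subtag.title()
    return SUPPRESS_SCRIPT.get(parts[0].lower())


def auto_italic(tag: str) -> bool:
    """Italicize whenever the effective script is Latin."""
    return script_of(tag) == "Latn"
```

With this approach fr and cmn-Latn would both be italicized, ru would not, and a bare cmn (which has no suppressed script in the table) would be left upright.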

Examples of what you mean are always appropriate. Which template are we talking about? Many of the {{lang-??}} templates unconditionally italicize the text in {{{1}}}.
This is a work in progress. It is not possible (for this human, at least) to, in one go, switch all of the {{lang}} and {{lang-??}} templates to use Module:lang.
Trappist the monk (talk) 18:09, 19 November 2017 (UTC)
Right, sorry: you can find an example on this page under the Chinese Mandarin entry with its pinyin transliteration bàng, which uses cmn-Latn; and I'm only talking about usage of the main {{lang}} template. – Quoth (talk) 21:59, 19 November 2017 (UTC)
I'm having a difficult time understanding what the problem is. If I take a step back and view Open back unrounded vowel with the previous version of the template (the last one before Module:lang was introduced), the bàng text looks the same (to me) as it does when that page is rendered with the module. See for yourself:
  1. this link opens the edit window for the previous version of {{lang}}
    https://en.wikipedia.org/w/index.php?title=Template:Lang&action=edit&oldid=775049579
  2. in the Preview page with this template box put:
    Open back unrounded vowel
  3. click the adjacent Show preview button
That is how it 'used' to look. Compare it against the rendering made by the live template. How are they different? They don't seem different to me.
Trappist the monk (talk) 23:15, 19 November 2017 (UTC)
The look hasn't changed, only the HTML markup and the circumstances around when the text will be auto-italicized by {{lang}}. If you inspect the HTML you should see two sets of surrounding <i> tags instead of one; one set from the wiki markup, which was previously required for formatting, and one from the new lang module output. – Quoth (talk) 21:13, 20 November 2017 (UTC)
I did your experiment. First I viewed Open back unrounded vowel with the template as it was before the switch to the module (old). I right-clicked view source and to see the html the en.wiki serves, copy/pasted the markup for bàng. I repeated the procedure with the current template/module (new). Here are the results:
<span lang="cmn-Latn"><a href="/wiki/Pinyin" title="Pinyin"><i>b<b>à</b>ng</i></a></span> – old
<span lang="cmn-Latn"><a href="/wiki/Pinyin" title="Pinyin"><i>b<b>à</b>ng</i></a></span> – new
These look the same to me. Is it possible that you are looking at a cached version of an older page?
Trappist the monk (talk) 21:58, 20 November 2017 (UTC)
Curious. I've cleared my caches, and purged the page, but on the current version of that article I see this markup:
<span lang="cmn-Latn"><i><a href="/wiki/Pinyin" title="Pinyin"><i>b<b>à</b>ng</i></a></i></span>
I should note that I'm looking at the publicly available page, because I'm unable to use the template edit or preview functionality due to it being protected. – Quoth (talk) 20:00, 21 November 2017 (UTC)
I'm seeing the markup <span lang="cmn-Latn"><i><a href="/wiki/Pinyin" title="Pinyin"><i>b<b>à</b>ng</i></a></i></span> when I preview the relevant section too. There is no caching involved because I previewed the page before looking at the source code. — Eru·tuon 23:15, 21 November 2017 (UTC)

most lang-?? templates switched to the module

I have switched most {{lang-??}} templates to use Module:Lang. Most were relatively trivial to switch, the remaining templates less so. These remain to be switched, redirected, deleted, or not:

  • {{Lang-grc-gre}} – appears to be a sort of catch-all for 'hard to define' Greek text or for Greek text that doesn't have a specific IANA/ISO 639 language code; internally the template uses grc; the template labels this text 'Greek' but the documentation implies that this template is to be used with Ancient Greek text so perhaps the labeling is incorrect; this is another case where private use tags may be useful: grc-x-gre as the catch-all; grc-x-koine for Koine Greek; grc-x-attic for Attic Greek (or the linguist list code grc-att); etc – 1424 transclusions
  • {{Lang-he-n}} – special version of {{lang-he}} to use {{script/Hebrew}} to render Hebrew text with Niqqud diacritical marks; not sure what to do with this one – 3521 transclusions
  • {{Lang-ka}} – has support for automatic transliteration when {{{2}}} is set to tr; an insource search finds 83 instances of the template that use this functionality; not sure what to do with this one – 3819 transclusions
  • {{Lang-khb}} – calls {{script|Talu|{{{1}}}}} which calls {{Script/New Tai Lue}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 1 article transclusion
  • {{Lang-ksw}} – calls {{Script/ksw-Mymr}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 31 transclusions
  • {{Lang-ku-Arab}} – calls {{Script/Arabic}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 11 transclusions
  • {{
    Ligurian language (ancient)
    – there is no {{lang-xlg}}); may require article naming of the creation of suitable redirects to make this template work with Module:lang – 26 transclusions
  • {{Lang-mnc}} – has support for two simultaneous transliteration renderings – 47 transclusions
  • {{Lang-mnw}} – calls {{Script/mnw-Mymr}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 50 transclusions
  • {{Lang-mol}} – named using retired code mol (see sil.org); internally uses mo which does not exist in ISO 639-1 – 76 transclusions
  • {{
    North Azerbaijani but uses the code for Coatepec Nahuatl
    – no article transclusions; delete?
  • {{Lang-nod}} – calls {{Script/Tai Tham}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 25 transclusions
  • {{Lang-nsd}} – purportedly to be used for Dutch Low Saxon but uses the code for Southern Nisu – 1 article transclusion
  • {{Lang-os}} – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 197 transclusions
  • {{Lang-pra}} – IANA/ISO 639 define code pra as 'Prakrit languages', a collective of individual languages; special handling in Module:lang is required for collections – 2 article transclusions
  • {{Lang-roa}} – IANA/ISO 639 define code roa as 'Romance languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
  • {{Lang-rus}} – has support for IPA rendering plus transliteration none of which is documented and may only be used in a very few articles – 2073 transclusions
  • {{Lang-sal}} – IANA/ISO 639 define code sal as 'Salishan languages', a collective of individual languages; special handling in Module:lang is required for collections – 1 article transclusion
  • {{lang-sh2}} – has support for automatic transliteration when {{{2}}}, mechanism is different from that used in {{lang-ka}} – 3 article transclusions
  • {{Lang-shn}} – calls {{Script/shn-Mymr}} to wrap {{{1}}} in <span>...</span> tags with several fonts – 20 transclusions
  • {{Lang-sla}} – IANA/ISO 639 define code sla as 'Slavic languages', a collective of individual languages; special handling in Module:lang is required for collections – 4 article transclusions
  • {{Lang-son}} – IANA/ISO 639 define code son as 'Songhai languages', a collective of individual languages; special handling in Module:lang is required for collections – no article transclusions; delete?
  • {{Lang-su-fonts}} – wraps {{{1}}} in a <span>...</span> tag that applies special fonts and sizing; does not provide labeling in the manner of most other {{lang-??}} templates – 39 transclusions
  • {{Lang-tt}} – provides labeling for simultaneous rendering of Cyrillic, Latin, and Arabic scripts; this functionality apparently never documented – 402 transclusions
  • {{Lang-ug}} – provides for simultaneous rendering of multiple transliterations – 235 transclusions
  • {{Lang-vi-hantu}} – calls {{vi-nom}} which calls {{lang}} with text wrapped in <span>...</span> tags with several fonts – 23 transclusions
  • {{Lang-wen}} – IANA/ISO 639 define code wen as 'Sorbian languages', a collective of individual languages; special handling in Module:lang is required for collections – 8 article transclusions

Trappist the monk (talk) 14:04, 9 December 2017 (UTC)

As the purpose of the template {{lang-grc-gre}} is to label Classical Attic, Koine, or Byzantine Greek text as "Greek", I'd suggest using grc-x-greek. None of the other special subtags have been abbreviated to three characters, and grc-x-gre is kind of cryptic. — Eru·tuon 04:33, 4 January 2018 (UTC)
For the cases where a label different from the label provided by the {{lang-grc-x-??}} templates is desired, editors can, after the next update to the live module, use |label=Greek. It isn't clear to me how the reader benefits from that kind of obfuscation.
I don't think that we should specifically support a grc-x-greek code where the defined name associated with that code is 'Greek'. The module uses the defined name for the rendered label (the {{lang-??}} templates) and for categorization (both {{lang}} and the {{lang-??}} templates). Were we to create a separate {{lang-grc-x-greek}} template that directly calls the module, we would be lumping all of these various old Greek languages into the same category used for modern Greek (el) because they share the same display name. Using the {{lang-grc-x-??}} with |label=Greek categorizes properly.
Trappist the monk (talk) 11:50, 4 January 2018 (UTC)
completed
  • {{Lang-de-AT}} – this and similar templates will require special handling either in Module:Lang or by rewriting the templates to use the lang() function of the module instead of the lang_xx() function – 7 transclusions
  • {{Lang-de-CH}} – see Lang-de-AT – no article transclusions; delete?
  • {{Lang-en-AU}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
  • {{Lang-en-CA}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
  • {{Lang-en-IE}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)
  • {{Lang-en-NZ}} – see Lang-de-AT – no article transclusions; delete? (previous TfD)

Trappist the monk (talk) 17:39, 9 December 2017 (UTC)

Trappist the monk (talk) 18:27, 9 December 2017 (UTC)

  • {{Lang-en-emodeng}} – similar to Lang-de-AT, IETF language tags like this will require special handling by Module:lang – 4 transclusions in article space (previous TfD)
  • {{Lang-pan}} – redundant ISO 639-3 version of {{Lang-pa}} – 3 article transclusions; delete? redirect to {{Lang-pa}}?
    I think this can be safely redirected. – Uanfala (talk) 14:18, 9 December 2017 (UTC)
    redirected by Editor Jonesey95.
  • {{
    Module:Zh
    handles the complexities and nuances of Chinese text; nothing to do here

Trappist the monk (talk) 19:17, 9 December 2017 (UTC)

  • {{Lang-lez}} – sort of a version of {{Language with name and transliteration}} without the annotation; could be easily converted to use Module:lang – 32 transclusions
  • {{Lang-phn}} – sort of a version of {{Language with name and transliteration}} without the annotation; could be easily converted to use Module:lang – 14 transclusions
  • {{Lang-scr}} – uses deprecated code scr; the correct code is hbs – 2 article transclusions; redirect to {{lang-hbs}}?
    redirected

Trappist the monk (talk) 20:36, 9 December 2017 (UTC)

  • These two templates not redirect; instead, |script= set to the appropriate value; the names 'Serbian Cyrillic' and 'Serbian Latin' not preserved because that usage is inconsistent with other {{lang-??}} templates for languages that use multiple scripts and because it is easy to distinguish one script from the other.
  • {{Lang-xal-RU}} – see Lang-de-AT – 24 transclusions

Trappist the monk (talk) 15:08, 11 December 2017 (UTC)

  • {{Lang-yuf}} – IANA/ISO 639-3 name is 'Havasupai-Walapai-Yavapai'; this template requires the use of a code in {{{1}}} to choose one for the language label and link; 29 transclusions
    Converted to use the module; created three new templates that use private use codes, one each for the three language names:
    {{lang-yuf-x-hav}}
    {{lang-yuf-x-wal}}
    {{lang-yuf-x-yav}}

Trappist the monk (talk) 23:36, 24 December 2017 (UTC)

  • {{Lang-gem}} – probably an improper use of gem, defined by sil.org as a collective with the name 'Germanic languages' but used by this template as an individual language named 'Proto German'; we should not be redefining international standards, so if there is no international standard code for Proto German, we should not make one up except to perhaps create a private use variant de-x-proto; any private use IETF tags will require special handling by Module:lang or by rewriting the templates to use the lang() function of the module instead of the lang_xx() function – 5 article transclusions
    Created {{lang-gem-x-proto}} private-use code version; {{Lang-gem}} now redirects to {{lang-gem-x-proto}}.

Trappist the monk (talk) 00:53, 28 December 2017 (UTC)

  • {{Lang-gkm}} – template name uses a code that is not a legitimate IANA / ISO 639 code ostensibly to refer to Medieval Greek (internally the template uses grc, Ancient Greek); the correct solution may be to rename the template to use a private use variant: grc-x-medieval – 23 transclusions
    Created {{lang-grc-x-medieval}} private-use code version; {{Lang-gkm}} now redirects to {{lang-grc-x-medieval}}.

Trappist the monk (talk) 18:16, 3 January 2018 (UTC)

  • {{lang-ber}} – this one expects as {{{2}}} an ISO 15924 script identifier – 244 transclusions
    changed to use the module; the single Latn {{{2}}} script use fixed.

Trappist the monk (talk) 19:52, 10 January 2018 (UTC)

These templates have been nominated for deletion:

Trappist the monk (talk) 11:04, 25 December 2017 (UTC)

And relisted. Comments there appreciated.
Trappist the monk (talk) 10:54, 3 January 2018 (UTC)

These survived TfD; no consensus:

These deleted:

ISO 639-3 now has cnr for Montenegrin so there is a new {{lang-cnr}} template that replaces {{lang-mis-Cyrl}} and {{Lang-mis-Latn}}.

Trappist the monk (talk) 17:21, 15 January 2018 (UTC)

promoting ISO 639-2/3 codes to ISO 639-1

According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." This would explain why the IANA data set has both ISO 639-1 and 639-3 language codes but does not have both -1 and -3 codes for the same language. This issue was brought to my attention because code ltz was causing a mis-categorization to Letzeburgesch when it should have been Luxembourgish.

It is common practice to promote three-character language codes to equivalent two-character codes. We should adhere to this practice. To that end I have created a tool that creates a Lua table from the data in the table at the custodian's website. The result is Module:Lang/ISO 639 synonyms. Module:Lang uses that table to promote ISO 639-3 codes to ISO 639-1 codes. When this happens, a maintenance category is added so that the template call can be tweaked. Category:Lang and lang-xx code promoted to ISO 639-1 is currently only implemented for {{lang}} and cannot be turned off with |nocat=. If no issues or problems arise, this functionality will be extended to the {{lang-??}} templates and |nocat= control enabled.

Trappist the monk (talk) 17:54, 13 December 2017 (UTC)
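
The promotion step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the module's actual Lua code: the table is a small hand-picked excerpt, and the names ISO_639_SYNONYMS and promote are hypothetical.

```python
# Hypothetical excerpt of a synonym table: ISO 639-2/-3 code -> ISO 639-1 code.
# The real data lives in Module:Lang/ISO 639 synonyms; these entries are examples only.
ISO_639_SYNONYMS = {
    "eng": "en",
    "ltz": "lb",
    "pan": "pa",
    "rus": "ru",
    "ave": "ae",
}


def promote(code):
    """Return (code_to_use, maintenance_message).

    The message is None when the code has no two-character synonym,
    in which case the code is used unchanged (lowercased).
    """
    lowered = code.lower()
    promoted = ISO_639_SYNONYMS.get(lowered)
    if promoted is None:
        return lowered, None
    return promoted, f"code: {lowered} promoted to code: {promoted}"


if __name__ == "__main__":
    print(promote("pan"))
    print(promote("yuf"))
```

A three-character code with no two-character synonym passes through untouched, so only genuinely synonymous codes would trigger the maintenance category.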

So to fix these codes: I look for a three-letter code in a {{lang}} template within the page in question, then I look in Module:Lang/ISO 639 synonyms to see if there is an equivalent two-letter code. Then I change the three-letter code to the two-letter code. Like this? If that is correct, it would help to have an error message of some sort, perhaps shown in preview mode only, to give the editor a hint about how to fix the error(s). – Jonesey95 (talk) 20:03, 13 December 2017 (UTC)
Hadn't got there yet. Because it isn't really broken, I had thought to do something akin to the maintenance messages emitted by Module:Citation/CS1 but first I wanted to see if this stuff worked properly.
Yeah, for {{lang}} that is pretty much the fix. When {{lang-??}} gets categorization functionality, the usual fix will be a fix to the template itself – though it is possible to set |code= in a {{lang-??}} template to override its normal rendering:
{{lang-en|text|code=rus}}
Russian: text
Russian: <span lang="ru">text</span><span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">code: rus promoted to code: ru </span>
(not sure why one would want to do that – perhaps that is something that should be prevented for {{lang-??}})
Trappist the monk (talk) 20:20, 13 December 2017 (UTC)
The best fix for {{lang-???}} templates may be to redirect them to the appropriate {{lang-??}}. I did a lot of that when cleaning up those template calls in the pre-module days. – Jonesey95 (talk) 20:24, 13 December 2017 (UTC)
Concur.
Trappist the monk (talk) 20:26, 13 December 2017 (UTC)
Hidden messaging added. To see the messages, add this to your preferred css:
.lang-comment {display: inline !important;} /* show lang messages */
Trappist the monk (talk) 23:02, 13 December 2017 (UTC)
Categorization limited to article namespace, |nocat= supported.
Trappist the monk (talk) 00:03, 14 December 2017 (UTC)
Curious about the construction of Module:Lang/ISO 639 synonyms. Is there a reason for doing ["eng"] = {"en"} rather than ["eng"] = "en"? The latter uses less memory. — Eru·tuon 21:42, 13 December 2017 (UTC)
Copy/pasta from another of the tools, otherwise no reason.
Trappist the monk (talk) 23:02, 13 December 2017 (UTC)
fixed.
Trappist the monk (talk) 00:03, 14 December 2017 (UTC)
I'm not quite sure I see the benefit of running this task. On occasions, the 3-letter code is more intuitive than the 2-letter one: if anything we should encourage the use of for example ave for Avestan rather than ae. – Uanfala (talk) 13:15, 16 December 2017 (UTC)
First sentence of this topic says why: According to the ISO 639-2 custodian, "Multiple codes for the same language are to be considered synonyms." (which see). Promotion to ISO 639-1 is the generally accepted convention. If you look in the IANA language-subtag-registry file for subtag ave you will not find it; Wikipedia's {{#language:}} magic word does not understand eng but does understand en (the magic word code does not support either of ave or ae – which is why Module:Lang has its own data modules):
{{#language:eng}} → eng
{{#language:en}} → English
By promoting synonymous ISO 639-2/-3 codes to ISO 639-1, Module:Lang aligns with this convention.
With regard to your revert: the {{lang-pan}} all produce the same html markup and the latter two would produce the same visible display and links ({{lang-pan}} redirects to {{lang-pa}}). For completeness in my accounting here, {{lang-pun}} is deprecated, uses an invalid language code in its name, has no article transclusions, so should be deleted.
Most important though, is that w3c specifies the use of language codes from the IANA subtag registry so that browsers and other html readers understand what is meant by the value assigned to the lang= attribute. This is a prime argument for Module:Lang to discontinue support of the two linguist list codes it now supports.
Trappist the monk (talk) 14:35, 16 December 2017 (UTC)
So, if I understand correctly, the practical rationale behind the promotion to ISO 639-1 is that these codes are more likely to be understood by browsers? If this is so then it makes sense. But do we really want to have the maintenance burden of having to clean up every time someone uses an ISO 639-3 code instead of the 639-1 one? Won't it be possible for the template to do these conversions internally? – Uanfala (talk) 15:02, 16 December 2017 (UTC)
The module does do the promotion so that it produces correct html markup:
{{lang|pan|ਮਾਝੀ}}
<span title="Punjabi-language text"><span lang="pa">ਮਾਝੀ</span></span><span class="lang-comment" style="font-style: normal; display: none; color: #33aa33; margin-left: 0.3em;">code: pan promoted to code: pa </span>
ਮਾਝੀ
The maintenance message is only visible to those who turn on the display with the css code above. I have an AWB script that will help to clear the hidden maintenance Category:Lang and lang-xx code promoted to ISO 639-1 (you reverted an edit made by that script). Yesterday there were about 550 pages in that category. Most of what remains is there because I didn't let the script make the edit so that I have the opportunity to fix the italic markup that will cause errors when the italic error checking code for {{lang}} gets reenabled.
Trappist the monk (talk) 15:45, 16 December 2017 (UTC)
I might have said this somewhere in one of these threads, but it bears repeating: not all the three-letter codes are a 1:1 correspondence with two-letter ones. I have no issue with synonymous longer ones being made more concise (though yes, the longer ones are often more intuitive) as long as the longer ones aren't rejected as input, and most especially as long as three-letter codes for dialects, historical stages, etc., are never collapsed to the generic language name.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  23:07, 17 December 2017 (UTC)
There is no 1:1 mapping of all three-character codes to two-character codes. There is a 1:1 mapping of all two-character codes (ISO 639-1) to three-character codes (ISO 639-2/3). Three-character codes that have an associated two-character code are omitted from the IANA language-subtag-registry file so browsers and other html readers are not obligated to know about those synonymous three-character codes. We do not reject three-character codes as input but where there is a two-character synonym, we use the synonym.
The relationship between codes and language names is a frustrating one. ISO 639 establishes the base code-to-name mapping. When a code has more than one possible name, ISO 639 lists them in some sort of an order. IANA sometimes chooses to use a different order for the same code and names. Sometimes the ISO 639/IANA names are not suitable for direct use as a label by Wikipedia:
ang → Old English (ca. 450-1100)
So, we have a table of alternate names; of alternate spellings; of names we choose because of ISO 639/IANA list-order differences; of codes that improperly redefine the standard's definition:
ISO 639/IANA: mla → Malo
but in Module:Language/data/wp_languages
mla → Medieval Latin (there is no ISO 639/IANA code for Medieval Latin)
The provenance for the codes/names listed in that module is wholly unknown so is suspect. Cleaning that up is just one more task to be done.
Trappist the monk (talk) 11:43, 18 December 2017 (UTC)

using private-use tags

I have written elsewhere in these discussions that we should not be making up our own primary language tags; should not be redefining tags that have already been defined by international standards. Instead we should be operating within the permitted uses of the standard.

BCP47 (IETF language tags) provides for private use tags. I have tweaked Module:Lang/sandbox to accept private use IETF language tags in the form:

ll-x-private

where:

ll is the standard ISO 639-1, -2, -3 language code
x is the BCP47-required singleton that marks the beginning of a private use tag
private is the private use tag; one to eight alphanumeric characters
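
As a rough illustration of the ll-x-private form just described, a tag could be split into its parts like this. The sketch is Python rather than the module's Lua, the function name parse_private_use_tag is hypothetical, and it deliberately handles only the simple form given here (no script, region, or variant subtags).

```python
import re

# ll: two or three letters (ISO 639-1, -2, -3 code)
# x: the singleton that starts the private-use part
# private: one to eight alphanumeric characters
PRIVATE_USE_TAG = re.compile(r"([a-z]{2,3})-x-([a-z0-9]{1,8})", re.IGNORECASE)


def parse_private_use_tag(tag):
    """Return (language_code, private_subtag) in lowercase, or None if the tag does not fit."""
    match = PRIVATE_USE_TAG.fullmatch(tag)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


if __name__ == "__main__":
    for tag in ("yuf-x-hav", "grc-x-medieval", "yuf-x-ninechars", "yuf"):
        print(tag, parse_private_use_tag(tag))
```

A private subtag longer than eight characters is rejected rather than truncated, so a misspelled or over-long tag would surface as an error instead of silently matching.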

I have created three of these tags for yuf:

yuf-x-hav
{{lang-yuf/sandbox|sw=ha|Havasuuw}}
Havasupai: Havasuuw
[[Havasupai language|Havasupai]]: <i lang="yuf">Havasuuw</i>
yuf-x-wal
{{lang-yuf/sandbox|sw=hu|Hàkđugwi:v}}
Walapai: Hàkđugwi:v
[[Walapai language|Walapai]]: <i lang="yuf">Hàkđugwi:v</i>
yuf-x-yav
{{lang-yuf/sandbox|sw=ya|Wi:kaʼi:la}}
Yavapai: Wi:kaʼi:la
[[Yavapai language|Yavapai]]: <i lang="yuf">Wi:kaʼi:la</i>

I use Walapai instead of Hualapai for standardization and because it matches the existing category. The label will link Walapai to Havasupai–Hualapai language because there is an existing redirect. Categorization isn't quite noodled out yet. Simplest and best, I think, is to create three individual categories for the three languages and make them subcategories of Category:Articles containing Havasupai-Walapai-Yavapai-language text.

This sandbox template needs to be implemented as {{lang-yuf}}, {{lang-yuf-x-hav}}, {{lang-yuf-x-wal}}, {{lang-yuf-x-yav}} to be compliant with the other {{lang-??}} templates.

Trappist the monk (talk) 10:50, 23 December 2017 (UTC)

collective language codes

See this faq @ LOC for collective-language code description.

In general, I think, {{lang}} and {{lang-??}} templates should not use collective-language codes. Such use should be discouraged because these codes don't properly identify the language of the text held by the template:

{{lang|roa|<some text>}}

According to MARC Code List for Languages, code roa includes these languages:

Anglo-Norman (xno)
Cajun French (frc)
Franco-Provençal (frp – Arpitan or Francoprovençal in the current IANA list)
Franco-Venetian (not in IANA list – possibly vec Venetian)
Italian, Old (to 1300) (not in IANA list)
Ladin (lld)
Portuñol (not in IANA list)
Spanish, Old (to 1500) (not in IANA list by that name – possibly osp Old Spanish)

To which of them does the example template refer?

I am not suggesting that such codes should never be used, but they should be used with care.

There are about 110 collective codes listed in the IANA language-subtag-registry file (of which only a handful are in current use at en.wiki) where the language name ends with the word 'languages' (plural). This, according to the LOC faq, is how ISO 639-2 distinguishes individual and macro-language names from collective-language names.

The {{lang}} and the {{lang-??}} templates use language names obtained from the data set for categorization and for language labels. For the occasions when collective-language codes are used, I propose that Module:lang shall:

  1. use the proper collective language name for all {{lang-??}} template labels
    {{lang-roa|<some text>}} → Romance languages: some text
  2. standardize category naming for these language codes:
    Category:Articles with text from the Romance languages collective

Trappist the monk (talk) 14:41, 1 January 2018 (UTC)
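
The naming rule in this proposal can be sketched as below. This is a hedged Python illustration, not the module's code: the function names are hypothetical, the collective test is just the "name ends with 'languages'" convention mentioned above, and the non-collective category pattern is inferred from existing category names on this page.

```python
def is_collective(language_name):
    """A name ending in the plural word 'languages' is treated as a collective."""
    return language_name.endswith(" languages")


def category_for(language_name):
    """Build the category name for a language name taken from the data set."""
    if is_collective(language_name):
        return f"Category:Articles with text from the {language_name} collective"
    return f"Category:Articles containing {language_name}-language text"


if __name__ == "__main__":
    print(category_for("Romance languages"))
    print(category_for("Havasupai-Walapai-Yavapai"))
```

Codes whose names have been locally redefined (for example to 'Berber' or 'Slavic') would not be detected by a suffix test like this unless the data set carries the proper collective name.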

I have seen instances of these codes used when the derivation of a word is unclear, but where it does appear to be traceable to a root word in a collective set of languages. I agree that there should be a recommendation to use them only in that situation or similar situations. I support the proposal to match the language codes with the "collective" name; if editors want a more specific label, they can use a more specific language code.
All of that said, I expect that this change will have some unexpected side effects, and we should be open to refining it as we go. – Jonesey95 (talk) 15:11, 1 January 2018 (UTC)
I have tweaked the sandbox to use the category naming convention described above. In mainspace, this:
{{lang/sandbox|aav|text}}
renders this:
<i><span lang="aav">text</span></i>[[Category:Articles with text from the Austro-Asiatic languages collective]]
Module:Language/data/wp_languages redefines these collective codes:
bh → 'Bihari' – Bihari languages [category]; only two-character collective code
ber → 'Berber' – Berber languages [category]
cel → 'Proto-Celtic' – Celtic languages [category]; {{lang-cel}} now redirects to {{lang-cel-x-proto}}
gem → 'Proto-Germanic' – Germanic languages [category]; {{lang-gem}} now redirects to {{lang-gem-x-proto}}
myn → 'Mayan' – Mayan languages [category]
nah → 'Nahuatl' – Nahuatl languages [category]
pra → 'Prakrit' – Prakrit languages [category]
roa → 'Jèrriais' – overridden in Module:Lang/data to 'Romance'
sal → 'Salish' – Salishan languages [category]
sla → 'Slavic' – Slavic languages [category]
son → 'Songhay' – Songhai languages [category]
wen → 'Sorbian' – Sorbian languages [category]
Module:Lang/data redefines these collective codes
bat → 'Baltic' – Baltic languages [category]
nrf → 'Norman' [category] – not defined as a collective but has the appearance of a collective – IANA names: Jèrriais, Guernésiais; proper handling of this may require nrf-x-jer and nrf-x-gue private-use codes
roa → 'Romance' – Romance languages [category]; overridden in Module:Lang/data to 'Romance'
sem → 'other Semitic' – Semitic languages [category]
So, with the exception of nrf, all that should be required to implement the collective naming convention is to move the categories associated with these codes to the appropriate names and tweak the data set to correctly support them.
When the '<something> languages' name is undesirable in article text, |label= can be used to locally override the template-provided label (category name will remain the same).
Trappist the monk (talk) 13:21, 7 January 2018 (UTC)

latn script inside <poem>...</poem> tags

Because of this conversation, I noticed that {{lang}} was not italicizing Latn-script text inside of <poem>...</poem> tags. All of the text inside the {{lang}} template at Erde, singe §Text under the German current lyrics heading is written using characters belonging to the Unicode Latin character set so should have been rendered in italics.

It turns out that <poem>...</poem> tags insert poem strip markers that look like this:

?'"`UNIQ--++++-67--QINU`"'?

The '?' characters in the strip marker are used here as visual replacements of the invisible delete character (U+007F). I do not fully understand how <poem>...</poem> tag processing works but when it comes time for {{lang}} to do its work, the text has these strip markers and it has the original newline characters (U+000A, LF, '\n').

I have tweaked the sandbox to account for the delete and newline characters:

Erde, singe,
dass es klinge,
laut und stark dein Jubellied!
Himmel alle,
singt zum Schalle
dieses Liedes jauchzend mit!
Singt ein Loblied eurem Meister!
Preist ihn laut, ihr Himmelsgeister!
Was er schuf, was er gebaut,
preis ihn laut!

Trappist the monk (talk) 13:30, 5 January 2018 (UTC)
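
One way to picture the fix described in this thread: remove the strip markers, then ignore newlines and other non-letters when deciding whether the text is Latin script. The Python below is only a sketch under those assumptions, not the sandbox's Lua; the marker pattern is modeled on the example shown above (delete character, quote characters, UNIQ ... QINU), and the function names are hypothetical.

```python
import re
import unicodedata

# A strip marker is delimited by the delete character (U+007F) on both sides.
STRIP_MARKER = re.compile("\x7f'\"`UNIQ--.*?QINU`\"'\x7f")


def remove_strip_markers(text):
    """Drop strip markers so they do not take part in script detection."""
    return STRIP_MARKER.sub("", text)


def is_latin_text(text):
    """True when every letter left after marker removal is a Latin letter.

    Newlines, spaces, digits, and punctuation are ignored.
    """
    letters = [ch for ch in remove_strip_markers(text) if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


if __name__ == "__main__":
    marker = "\x7f'\"`UNIQ--poem-00000067-QINU`\"'\x7f"
    print(is_latin_text(marker + "Erde, singe,\ndass es klinge,\n"))
```

Checking Unicode character names is a convenient stand-in for a proper script lookup; it is enough to show why the delete and newline characters must be set aside before the test.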