Wikipedia talk:WikiProject Languages/Archive 8

This is an archive of past discussions on Wikipedia:WikiProject Languages. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


Articles needing help / input request

Merger Request: Need further input for the request received at W:PM: "Saraiki dialect is redundant with the Riasti dialect, Shah puri dialect, Multani dialect, Multani language, Thalochi dialect, Thalochi, Derawali dialect articles. I suggest merging these articles, as all these are same. And also be Redirected to Saraiki language. Also Jhangvi dialect is dialect of Saraiki." DISCUSSION HERE. Can your project help? GenQuest ^{"Talk to Me"} 17:47, 14 April 2013 (UTC)

Northern cities vowel shift map

A user keeps removing the map saying it's inaccurate. Can anyone who knows more about the topic than me see if there's any merit to it? — Lfdder (talk) 14:46, 3 July 2013 (UTC)

"Ћ"

Languages of Palestine, Languages of the State of Palestine or Languages of the Palestinian territories?

I noticed an editor (Greyshark09) just moved all the pages from Category:Languages of Palestine to Category:Languages of the State of Palestine, which I guess kind of makes sense 'cos there aren't modern-day borders for 'Palestine', but looking at {{Languages of Asia (category)}}, the category there is Category:Languages of the Palestinian territories. 'Palestinian territories' seems to be what we use for categories like Economy of the Palestinian territories and the Education in the Palestinian territories. Category:State of Palestine hasn't got any 'State of Palestine' child cats. So, uhh, yeah, what should this cat be called? — Lfdder (talk) 21:16, 15 July 2013 (UTC)

"Languages of Palestine" is ambiguous (or agnostic) about whether Palestine is a state, an occupied territory, a conventional label for a region, or something else. I see that as an advantage, though I can understand why those favoring "State" or "territories" might see it as a disadvantage. Still, in my personal opinion this ambiguity is good, in that it does not unduly favor opponents or supporters of the State of Palestine, which I see as being in the spirit of WP:NPOV. Cnilep (talk) 00:50, 16 July 2013 (UTC)

I agree with Cnilep. Categories are purely for logical organization, not for "labeling" or any other purpose. Also, Wikipedia:Categorization says: "Categorizations should generally be uncontroversial". With these two points in mind, Category:Languages of Palestine seems the most appropriate, NPOV, category to use. The move to "..of the State of Palestine" feels like it was politically motivated, and in fact, regardless of the motivations of the editor who did the moving, has political implications.--William Thweatt ^Talk^Contribs 01:35, 16 July 2013 (UTC)

Or we could go for "Languages of Danzig"? :-)

Seriously, though. Just "Palestine" is clearly the best choice. The political attributes don't seem at all relevant for this kind of category. Using just the name of the region would also allow for some leeway since it would refer to a geographic area that is flexible, and languages don't really follow borders that closely anyway. Less precise geographic limits seem beneficial in this case.

Peter ^Isotalo 08:35, 16 July 2013 (UTC)

List of most widely spoken languages (by number of countries)

I've added a proposal for a source and inclusion criteria for that article at talk:List of most widely spoken languages (by number of countries), and I would welcome a discussion there. Sjö (talk) 12:29, 18 July 2013 (UTC)

Nomination for deletion of Template:WikiIPA

Template:WikiIPA has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Lfdder (talk) 14:58, 27 July 2013 (UTC)

Article alerts

I've subscribed us to article alerts found here and transcluded on the main project page. — Lfdder (talk) 11:50, 13 August 2013 (UTC)

Arabic help

Is there anyone here who can help me with some Arabic-language stuff? I've tried the reference desk but got nothing. –Roscelese (talk ⋅ contribs) 21:40, 13 August 2013 (UTC)

Notability of a new language

I know nothing about constructed languages, but do the editors here find that Angos (Constructed Language) meets typical notability criteria? Thanks all, Arbitrarily0 ^(talk) 11:20, 2 August 2013 (UTC)

It doesn't look like it. WP:GNG requires multiple sources independent of the author, and I only see one (the article by Libert). If sources are added to show Angos being mentioned in two or three other sources whose content Benjamin Wood has no control over, then maybe it will be notable enough. At any rate it should be moved to Angos (constructed language) in accordance with our title conventions. Aɴɢʀ (talk) 11:50, 2 August 2013 (UTC)

Conlang notability has always been a difficult issue, but I agree that one independent publication is not enough. Exactly the same thing goes for Kotava: both articles give only one external source but don't actually quote from it, both rely almost exclusively on primary sources, both are stuffed with OR, and both are obviously written by (someone close to) the authors of these languages. In both cases other independent sources don't seem to exist. On the other hand, both articles are reasonably well-written and I see little reason to doubt their authors' honesty. In any case, I have exposed my views about this on both talk pages, but as far as I can see, none of these issues have really been addressed. In my opinion it's really too early for these articles and they should be submitted for deletion. Wikipedia is there to describe things that are notable, not to make things notable. —IJzeren Jan _Uszkiełtu? 20:26, 2 August 2013 (UTC)

Thank you all for your input. For those interested, the page has been nominated for deletion at Wikipedia:Articles for deletion/Angos (constructed language). Arbitrarily0 ^(talk) 04:38, 6 August 2013 (UTC)

For that matter, I have listed Wikipedia:Articles for deletion/Kotava (3rd nomination) as well. —IJzeren Jan _Uszkiełtu? 21:23, 15 August 2013 (UTC)

Problematic use of Ethnologue

Unfortunately, several articles about languages continue to use Ethnologue. This is problematic as Ethnologue is not a reliable source and does not conduct any research. Ethnologue is very much like Wikipedia in the sense that it uses sources and reports their findings, which would be fine if it were done consistently. Unfortunately, it's not. For many European languages, the data have not been updated since the 1970s. In far too many cases, the people at Ethnologue have misrepresented their sources and come up with bizarre "facts" that certainly amuse anyone familiar with linguistics or sociolinguistics. Ethnologue is a Christian missionary organization and should never take precedence over proper research done by experts. I can think of no article where Ethnologue adds facts not found in other articles, while I can easily find articles where Ethnologue is used to state outright absurd claims that run contrary to all scholarship and research. In short, I do not think Ethnologue satisfies WP:RS and I don't see a reason why it should be used in any article. Jeppiz (talk) 22:41, 26 July 2013 (UTC)

I can understand some of the misgivings you have with Ethnologue - although it is near and dear to my heart, I have to admit that the information in there is sometimes dated or misleading. So I would suggest that whenever there is a better WP:RS on a particular language, everyone should feel free to use it instead. Trouble is that for many languages, Ethnologue is all there is, and the info that is stated in there at least goes back to some kind of research or verifiable information, even if it is old. This is still better than having users with a nationalist bias come up with their own inflated speaker numbers (this happens regularly for many languages). In those cases it is actually quite helpful to rein them in by pointing to the Ethnologue and their responsibility to cite a reliable source. If something that is cited in Ethnologue only serves to "amuse anyone familiar with linguistics or sociolinguistics", this can only be because there is some published reliable information out there which states the real truth. This should always take precedence. But often those who claim to be amused fail to provide any better sources, and then one wonders on what knowledge they base their amusement. Your statement that "Ethnologue is a Christian missionary organization and should never take precedence over proper research done by experts" sounds like you want to say that a Christian missionary organization is not capable of conducting proper research, or that such an organization cannot have any experts working in it. I'm sure that you don't mean to say that. Landroving Linguist (talk) 23:23, 26 July 2013 (UTC)

You may be interested to read Wikipedia talk:WikiProject Languages/Archive 4#Accuracy of Ethnologue for Indo-Aryan, Wikipedia talk:WikiProject Languages/Archive 5#Political languages, and Wikipedia talk:WikiProject Languages/Archive 6#Creating language stubs for past discussions of the reliability of Ethnologue (maybe with Wikipedia talk:WikiProject Linguistics/Archive 5#Some doubt as to the nature of the Gorani language thrown in for good measure). While Ethnologue is frequently unreliable on specifics, its publisher SIL International is the registration authority for the ISO 639-3 standard, its religious mission notwithstanding. It seems obvious to me that where better sources are available these should be used, but equally obvious that Ethnologue should not be rejected out of hand for all uses. Cnilep (talk) 23:41, 26 July 2013 (UTC)

Over the past couple days I've replaced several population figures with the ones from E17 because they weren't ref'd. I've even replaced some that were ref'd, because the ref appeared to be political. I've also pointed out several hundred errors to SIL (many of which were corrected in E17), so I'm sympathetic to your concerns. But I think we could do worse than take Ethn. as our default until someone takes the time to find something more reliable, especially in the thousands of articles on minor languages that no-one but me or a couple other overworked editors has taken any time with at all. The problem is not the religious nature of SIL, but the fact that they are a relatively small organization without the resources to keep every language up to date – and Europe is not their area of interest. One thing I have seen in Europe and a few other places is people tracking down Ethnologue's sources and citing them directly. This can be useful, as often Ethn. leaves out important details, such as giving a figure of 450,000 (implying 445,000–455,000 or at least 400,000–500,000) when their source has 300,000–600,000. — kwami (talk) 00:25, 27 July 2013 (UTC)

Thanks to all of you for the good comments. I should start by clarifying myself, as I see that I expressed myself badly. The Christian nature of Ethnologue is no problem at all, and I did not intend to bring up that factor as a negative aspect - only to point out that it's not a linguistic research center (as you all know). My apologies if my poorly expressed statement mistakenly implied that the Christian nature of Ethnologue would be a problem per se. As for the rest, I very much agree with you all, using Ethnologue is of course better than using no source at all. My own area of expertise is European languages, and I frequently find Ethnologue way off the mark, which is understandable as Europe is not their number one priority. So I agree that Ethnologue could (and probably should) be used when no other WP:RS is available. For languages in Europe, I don't think that that is ever the case, so I would be reluctant to use it for any European country or language spoken in Europe, given the number of errors (the classification of Corsican, the number of French speakers in France, Welsh data from 1971 etc.). Jeppiz (talk) 09:19, 27 July 2013 (UTC)

Another reason for using Ethnologue is as a check on population inflation. If we allowed everyone to cherry-pick their sources, soon we'd be claiming there are 20 billion people in the world. This seems to be especially a problem in India, but can crop up anywhere. — kwami (talk) 17:18, 2 August 2013 (UTC)

I've added a section on the more common problems I've noticed with Ethnologue data. — kwami (talk) 21:50, 17 August 2013 (UTC)

Thanks. — Lfdder (talk) 11:04, 18 August 2013 (UTC)

Massed Tfd

A lot of language-related navboxes have been nominated for deletion at Wikipedia:Templates for discussion/Log/2013 August 19. De728631 (talk) 14:53, 19 August 2013 (UTC)

Template categories

Updated to Ethnologue 17

Articles with Ethnologue as a reference for speaker data have now been updated to E17. A few retain E15 or E16 in the ref field, either to show that an undated figure is old, because E17 does not support the language, or because of errors in the E17 entry. (These are tracked at Category:Language infobox tracking categories.) Except for a few odd cases where the reference field is not applicable, all infoboxes have something in the ref field, even if it's only 'citation needed'. All also have something in the speaker field, though there's a question what we should do with Standard Chinese: should it be 'none' (like Standard Arabic), or left as 'data unavailable'? (Or we could just blank it, and that article would be the sole one in its tracking category.)

Currently rounding off populations to 2 sig figs. It had been 3, mostly, but that's far more precise than our sources warrant. We're actually lucky if we get even a single significant figure for most languages, so even with reducing it to 2 we're being spuriously precise. I've noticed in updating from E16 to E17 that increases or reductions of 4× or even 10× are not uncommon, that a language may go from an alleged 5,000 speakers or more to extinct. I had been doing this by hand, rounding to evens, but now am using a template so that the original figures remain in the coding, and the template always rounds up. The one irregularity is that I'm counting 10 as a single digit, as I was taught in school, so you'll see 103,000 or 10.3 million. This helps even out the rounding %age across digits, and also distinguishes that 10.3M from a 10M meaning somewhere between 5 and 15M. — kwami (talk) 05:02, 23 August 2013 (UTC)
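As a rough illustration of the rounding rule described above, here is a minimal Python sketch. It is not the template's actual code, which is not shown in this discussion. It assumes that "always rounds up" means ties go upward (as opposed to the earlier by-hand rounding to evens), and the function name `round_speakers` is hypothetical.

```python
def round_speakers(n: int) -> int:
    """Round a speaker count to 2 significant figures.

    Assumption: a leading "10" counts as a single digit, so such
    numbers keep three written digits (103,000 or 10.3 million).
    Assumption: ties round upward (round half up).
    """
    if n <= 0:
        return 0
    digits = str(n)
    # A leading "10" is treated as one digit, so keep one extra place.
    keep = 3 if digits.startswith("10") else 2
    if len(digits) <= keep:
        return n  # already short enough, nothing to round
    step = 10 ** (len(digits) - keep)
    # Integer round-half-up to the nearest multiple of `step`.
    return (n + step // 2) // step * step


print(round_speakers(103_400))     # 103000
print(round_speakers(10_349_000))  # 10300000
print(round_speakers(452_000))     # 450000
```

Keeping the original figure as the input and rounding only for display matches the stated goal that the unrounded number stays in the coding.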

Project affiliation of articles on phonology

Assigning articles to a project may be tedious in some cases. One instance is articles on phonology, which relates both to phonetics and to general linguistics. I have been consistent in assigning those articles to WikiProject Linguistics|phonetics=yes, deleting WPLANG tags when present. The idea behind this was that iff we need two different WP projects on languages (which I would not necessarily support), there should be some kind of division of work. Very recently, User:Lfdder implemented his own view of the matter by deleting all the phonetics tags and replacing them by WPLANG tags instead. Given that this seems to remove part of Phonetics task force’s raison d’etre, I feel inclined to do a full scale revert, but maybe we should reach consensus first. So, do we want phonology articles to be in the scope of WP languages, WP linguistics/Phonetics, or in the scope of both? (I am about to create a link from the Phonetics project to our talk page, so please discuss here!) G Purevdorj (talk) 02:17, 18 August 2013 (UTC)

Looking at the description of the project WP linguistics/Phonetics, the phonology articles definitely belong there. Putting them in WP languages as well seems useful, but not really necessary. −Woodstone (talk) 06:13, 18 August 2013 (UTC)
Some were WPLANG tagged, some were WPLING tagged, some were tagged with both, and some were not tagged at all. It seemed reasonable to me to move them all to the same project, so I went for it. A "full scale revert" would accomplish nothing other than reinstate the mess that was. If there's consensus to move them to WPLING/phonetics, I will go thru each one and change the tag. Part of the reason I picked WPLANG is that G Purevdorj had assessed some for this project but not LING. — Lfdder (talk) 08:02, 18 August 2013 (UTC)
I'd go for WP:LING. In my understanding, phonology and phonetics are subdisciplines of linguistics, and articles about linguistics are a subcategory of articles about language. However, I don't see any particular problem with tagging phonology/phonetics articles with both project tags. — Mr. Stradivarius on tour ^{♪ talk ♪} 09:30, 18 August 2013 (UTC)
While other projects complement each other, I’d suppose (but I might be wrong) that the perspectives of both WPLANG and the phonetics task force on a phonology article should be about the same. This is different e.g. with an article in the scope of a country project and WPLANG which represent different forms of expertise. Then, in order not to duplicate the rating task, assigning them to a single project consistently seems to make most sense to me. Here, in my view, phonetics seems to take precedence over WPLANG. G Purevdorj (talk) 11:40, 18 August 2013 (UTC)

Input from other editors would be appreciated. Currently, there's no clear majority. G Purevdorj (talk) 11:42, 21 August 2013 (UTC)

I'd say articles on general phonology (i.e. not limited to a certain language) should be in WP Linguistics/Phonetics only, while the articles on the phonologies of specific languages should be in both WP Linguistics/Phonetics and WP Languages. Irish phonology, for example, is both an article about phonology/phonetics and an article about (an aspect of) a language. Syllable, on the other hand, is only an article about phonology/phonetics as it isn't about any particular language. Aɴɢʀ (talk) 16:39, 21 August 2013 (UTC)

This seems to be all the feedback we're getting. Should we add them to both projects then? — Lfdder (talk) 15:04, 26 August 2013 (UTC)

Yes, I guess that's the result of the discussion: both projects. G Purevdorj (talk) 00:18, 27 August 2013 (UTC)

recent moves

Can we get someone with knowledge of languages to look at all these moves? Just odd stable articles all being redirected. -- Moxy (talk) 23:49, 27 August 2013 (UTC)

Most of them are simple corrections that should be uncontroversial. User:Maunus ·ʍaunus·snunɐw· 00:12, 28 August 2013 (UTC)

Many of them aren't even that, just redirects for Voegelin (1977) names. (1600 red links to go, down from 1850 last week.) — kwami (talk) 00:49, 28 August 2013 (UTC)

Missing topics page

I have updated Missing topics about Languages - Skysmith (talk) 10:44, 29 August 2013 (UTC)

Creating redirects and fixing typos for some sections. — kwami (talk) 15:35, 29 August 2013 (UTC)

Na-Dene, Dene-Yeniseian and familycolor

Right now, Dene-Yeniseian has familycolor=Na-Dene in its infobox, which looks a bit silly. Can someone edit the (protected) colour table and add in Dene-Yeniseian, marked as uncertain and with the same colour as Na-Dene? KleptomaniacViolet (talk) 17:33, 7 September 2013 (UTC)

We don't need to change the template source code. In the article you can simply edit the infobox parameter to |familycolor = unclassified. That will display a white table header. De728631 (talk) 17:43, 7 September 2013 (UTC)

Whoops, my mistake. It turns out there is a Dene-Yeniseian familycolor already in there. I assumed that there wasn't because it wasn't explicitly on the quilt. KleptomaniacViolet (talk) 18:03, 7 September 2013 (UTC)

Languages of Oregon template

The new Languages of Oregon template could use proofreading and improvement. Anyone here familiar with these language families? Djembayz (talk) 01:19, 8 September 2013 (UTC)

Proposed Template:ISO 639 name change

See here. — Lfdder (talk) 18:38, 8 September 2013 (UTC)

Hungarian language

Edit warrior re. population and classification. Several editors rejected his earlier non-linguistic population sources, so now he's fighting for a cherry-picked linguistic source. We settled some time ago on omitting "Finno-Ugric" from all Uralic info-box classifications, but he's fighting to restore it to this one article. — kwami (talk) 02:30, 10 September 2013 (UTC)

Thank you for calling me "edit warrior" (although you were edit warring as much as I was) and thanks for presenting the issue in a "neutral" way. On the other hand, I am happy that you have raised the issue here, since I am very interested in unbiased, constructive comments. Please, take a look at the two threads below: "native speakers" and "Finno-Ugric group". You may also want to check these comments. Thanks and cheers, KœrteFa {ταλκ} 13:19, 10 September 2013 (UTC)

Lack of pages on Peruvian amazonian languages

The page for Orejón needs to be expanded because that is not the autonym of the language and there is no information on the page. — Preceding unsigned comment added by Efgoodrich (talk • contribs) 16:57, 12 September 2013 (UTC)

Template:Lang-ur

Can an admin who frequents this place fix this? Thanks — Lfdder (talk) 16:46, 12 September 2013 (UTC)

Done Aɴɢʀ (talk) 18:32, 12 September 2013 (UTC)

thanks. {{lang-ar}} too. — Lfdder (talk) 20:03, 12 September 2013 (UTC)

Child wikiprojects

Should we merge all 5 into this one? They're by all appearances dead. I tagged three a couple months ago as inactive; no one's untagged them. Anyway, merging them will mean less WikiProject banner clutter on talk pages. — Lfdder (talk) 01:59, 14 September 2013 (UTC)

It is not clear what projects you are talking about. G Purevdorj (talk) 09:35, 14 September 2013 (UTC)

Wikipedia:WikiProject_Languages#Related_WikiProjects — Lfdder (talk) 09:53, 14 September 2013 (UTC)

WP languages cannot and should not cover conlangs. Unless the rating process is to become completely unmanageable, the conlang project has to be kept apart! As for the other child projects, I wouldn't object to a merger if they are inactive. G Purevdorj (talk) 16:52, 14 September 2013 (UTC)

I agree with G Purevdorj (though I'm not a particularly active project member, mostly just a curious observer). There was a proliferation of wikiprojects over the last few years, many of which never saw much activity after the first few weeks. (A problem shared with Portals, but that's a different can o' worms). Merging inactive projects into active ones is often a wise course of action. Generally the fewer pages there are to watchlist, and to keep updated, the better. –Quiddity (talk) 18:50, 14 September 2013 (UTC)

I don't see why we couldn't cover conlangs. We don't need to assign iso-codes and language family color to all languages, and the article quality ratings are the same for all languages. I would be for the merger. User:Maunus ·ʍaunus·snunɐw· 19:08, 14 September 2013 (UTC)

I don't understand what the issue with conlangs is either. — Lfdder (talk) 19:20, 14 September 2013 (UTC)

I would keep any of them if they were active, but the only member post at conlangs in the past year was IJzeren Jan saying there isn't much left for them to do. Still, they attract a different crowd. I'd post on their page and notify people like IJzeren Jan to see if anyone is maintaining it. — kwami (talk) 22:33, 14 September 2013 (UTC)

How are closures done? Would moving them to "task forces" as at wp:ling be a way to preserve their history? Or do we just grey them out? — kwami (talk) 00:32, 15 September 2013 (UTC)

We could just switch talk page banners and set up redirects. I don't think there's any need to delete the pages. — Lfdder (talk) 00:41, 15 September 2013 (UTC)

@Lfdder: We almost never "delete" pages as part of merges. Always redirect, and preserve the old content in the history, in case it is ever needed, or in case it is revived. –Quiddity (talk) 17:20, 15 September 2013 (UTC)

Which is what I said. — Lfdder (talk) 19:54, 15 September 2013 (UTC)

There is a certain set of information that could be obtained for any natural language, but the things that could actually be said about a given conlang are far less clear. What can be known about a given conlang in advance without some actual familiarity with it? This would presumably cause problems for rating. Secondly, there’s enough work to do without bothering about typologically uninteresting and socially mostly irrelevant conlangs. (Socially irrelevant may not hold for Esperanto, but for at least almost all the others.) G Purevdorj (talk) 01:51, 15 September 2013 (UTC)

There are a lot of natural languages for which hardly any information can or ever will be found. It's not that different really. And no one will be required to waste their time on languages that don't interest them. User:Maunus ·ʍaunus·snunɐw· 02:20, 15 September 2013 (UTC)

Well, it looks like at least one person cares about WP:ENLANG: [1]. (Personal attack removed) — Lfdder (talk) 13:33, 16 September 2013 (UTC)

Though they've said they "don't even edit any more", so I don't know why they bothered. [2] (Personal attack removed). — Lfdder (talk) 13:36, 16 September 2013 (UTC)

Hello. I used to be Ling.Nut .. and frankly, I just flatly do not understand. I have just never understood people who confuse a smackdown on a Wikiproject with making a meaningful contribution to the encyclopedia. You aren't adding anything; you're subtracting. Subtracting spurious articles is worthwhile. Subtracting quiet Wikiprojects is self-gratification that doesn't help the encyclopedia. I've added HUGE amounts of content to this encyclopedia... I have 5 FAs and many very valuable lists (just go look at all the endangered languages lists... see who made them... or who radically revised them...) , lotsa... well.. just lots and lotsa stuff. So I know what "content" is. And I know what "productive contribution" is NOT. This is not. It helps no one. It helps nothing. It just gives people some false perception that they are being productive when they are NOT. So... WHY? I saw the lame "watch list" justification, but if no one is editing, it ain't firing up your watch list, right? So, why? No reason, but.... false feeling of productivity. Illusion of contribution (I am not saying you haven't contributed; I am saying this is not a contribution). And as for the project being quiet... people are still doing things, right? So its purpose has not faded. And so on. • Serviceable†Villain 09:54, 17 September 2013 (UTC)
- Oh, so you just demolish things you don't understand then? Oh, and the FA process isn't for self-gratification? There's people who like to pick up useless shit that's laying about and throw it in the bin. And it's not even that we're throwing them away, really; we're setting up a redirect. Is it better for someone to post to a project that's obviously inactive to not get a response, or is it better for them to be directed here? False perception yadda yadda. Fucking muppet. — Lfdder (talk) 10:30, 17 September 2013 (UTC)

Lfdder, what the hell is wrong with you? — kwami (talk) 12:47, 17 September 2013 (UTC)

- First of all, I understand being templated; I was around before "templated" was a dirty word. Second of all, you're screaming NPA at me (if I understood the edit summary correctly), and what part of "fucking muppet" isn't a personal attack? I just now put something in over at Wikipedia:Administrators' noticeboard/Incidents. You can go argue with them. As I said there, I am too old and too tired. • Serviceable†Villain 12:22, 17 September 2013 (UTC)
  - If you're tired, you should try not to tire other people who're tired too. — Lfdder (talk) 12:48, 17 September 2013 (UTC)

Lfdder talked about redirecting. I don't know why we'd do that. Just mark the main page as inactive, and maybe archive and redirect the talk page. — kwami (talk) 12:58, 17 September 2013 (UTC)

Cross-checking language classification in infoboxes

See: wikidata:User:RoboViolet/Language families. This is almost entirely from infobox data. A big chunk of the issues on display are caused by my script not being smart enough and being excessively paranoid about things it doesn't understand, but there are also some real things to think about (e.g., we don't consistently deal with the major sub-groupings of Western Romance). I think it's pretty nifty, if I say so myself! If you change an article (or tag a redirect) and want to see it reflected in the next iteration of output, leave a note on my talkpage and I'll add it to the list of pages updated since the database dump. KleptomaniacViolet (talk) 08:44, 15 September 2013 (UTC)

I don't understand. What does it show? — kwami (talk) 09:54, 15 September 2013 (UTC)

Each language infobox is processed and turned into a set of parent-child relations (e.g., Slavic languages -> { Indo-European has child Balto-Slavic, Balto-Slavic has child Slavic languages, Slavic languages has child East Slavic languages, Slavic languages has child West Slavic languages, Slavic languages has child South Slavic languages }). Then all these relations are lumped together and the script tries to build consistent family trees out of them (paying special attention to the immediate parents and children claimed in infoboxes). These generated trees are grouped together by macrofamily on the subpages linked by the headings; these families are hand-picked to try and keep any one page from being too huge, but don't have meaning beyond that. For each tree/subtree, it'll show the consistent parent/child relationships in a table, and also list possible children and possible parents that it couldn't include in the tree because they would make it not a tree or there's a missing immediate parent/child link. KleptomaniacViolet (talk) 10:24, 15 September 2013 (UTC)
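The process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual script: the function names, the input shape (an ancestor chain plus listed children per infobox), and the second infobox in the example are all hypothetical.

```python
from collections import defaultdict


def relations_from_infobox(ancestors, name, children=()):
    """Turn one infobox into a set of (parent, child) pairs."""
    chain = list(ancestors) + [name]
    edges = set(zip(chain, chain[1:]))          # consecutive ancestors
    edges |= {(name, child) for child in children}
    return edges


def build_tree(edge_sets):
    """Lump all relations together and separate consistent links
    from nodes that end up with more than one claimed parent."""
    parents = defaultdict(set)
    for edges in edge_sets:
        for parent, child in edges:
            parents[child].add(parent)
    tree = {c: next(iter(ps)) for c, ps in parents.items() if len(ps) == 1}
    conflicts = {c: sorted(ps) for c, ps in parents.items() if len(ps) > 1}
    return tree, conflicts


# The Slavic example from the comment above.
slavic = relations_from_infobox(
    ["Indo-European", "Balto-Slavic"],
    "Slavic languages",
    ["East Slavic languages", "West Slavic languages", "South Slavic languages"],
)
# Hypothetical second infobox that skips the Balto-Slavic node.
east_slavic = relations_from_infobox(
    ["Indo-European", "Slavic languages"], "East Slavic languages"
)

tree, conflicts = build_tree([slavic, east_slavic])
print(conflicts)
```

In this toy run, "Slavic languages" is reported with two possible parents because one infobox skipped an intermediate node, which is roughly the kind of "possible parent" a script of this sort would have to list separately rather than place in the tree.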

We sometimes skip intermediate nodes when the tree starts getting long, especially if they're not well supported. — kwami (talk) 13:11, 15 September 2013 (UTC)

Just curious, is there a reason why the two threads (below) aren't archived? When I visit a page, I go to the bottom of the Talk Page to see the most recent comments. When I saw the April 2013 date, I assumed this was not an active WikiProject. If these contain worthwhile information, they should be posted at the top of the discussion page. Liz ^{Read! Talk!} 21:45, 17 September 2013 (UTC)

Yeah, they were on the main page, and weren't very helpful there, so I moved them. I'll put them at the top. (No, one's reactivated, so moving it back down.) — kwami (talk) 22:56, 17 September 2013 (UTC)

Emilian

Slow edit war at Emilian language. According to E17 it's extinct, with speakers having shifted to Italian. People keep adding a population of 2M, with refs that refer either to Emiliano–Romagnolo or to the ethnicity, but not to actual Emilian speakers. — kwami (talk) 02:07, 9 October 2013 (UTC)

Kelao and YSL

any idea what "Kelao" is in Hmongic languages#Matisoff (2006)? a Hmongic influence on Gelao? — kwami (talk) 17:35, 18 September 2013 (UTC)

Got it. — kwami (talk) 02:11, 9 October 2013 (UTC)

Asked for feedback from wp:jew on Yiddish Sign Language. — kwami (talk) 19:20, 19 September 2013 (UTC)

Deleted claims that this exists. It apparently doesn't exist in the lit. — kwami (talk) 02:13, 9 October 2013 (UTC)

Automating the infobox

Template:Automatic language box

I really thought it was amazing how the Template:Automatic taxobox can take just a genus/species and give all of the other taxonomy. We could create something similar that shows the precursor languages of any language inserted. I would definitely be willing to do the coding. —Preceding signed comment added by Nicky Nouse (talk • contribs • wikia) 01:21, 24 January 2011 (UTC)

I've always thought this would be a good idea. It would make maintenance and updating classifications much easier. But there have been some concerns with the server load of the bio template. I don't know the details, but if we do do this, IMO it should be a community project. — kwami (talk) 05:11, 7 October 2011 (UTC)

[I've since tried following up with Nicky Nouse, but got no response. — kwami (talk) 03:21, 6 November 2012 (UTC)]

Anything new regarding this idea? -PC-XT+ 08:05, 18 September 2013 (UTC)

Nope, never heard back. — kwami (talk) 17:39, 18 September 2013 (UTC)

I'll look into this. — Lfdder (talk) 14:35, 19 September 2013 (UTC)

I have an interest in this as well, but I'd think that we should look to integrate this with Wikidata and pull data from there (once it's been imported). KleptomaniacViolet (talk) 15:49, 19 September 2013 (UTC)

Guess things get more attention here than on the main page!

There are some irregularities in our boxes that I'd like to see replicated:

Uncertain nodes may appear when immediately ascending, but not when further up.

Long trees may be abbreviated: see Malinke language and the nodes above Polynesian at Hawaiian language.

For language X in node X, the latter displays as "X languages", but for other languages as just "X".

Geographic groups such as Bornean languages appear in parentheses, as at Kayan–Murik languages.

— kwami (talk) 18:10, 19 September 2013 (UTC)
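The last two display rules in the list above could be sketched roughly as follows. This is only an illustration in Python, not the template's actual logic: the function name `display_label` and the `geographic` flag are hypothetical, and the rule for "X languages" is taken literally from the wording above.

```python
def display_label(node_display, language_name, geographic=False):
    """Return the label to show for one ancestor node in a language's box.

    node_display:  short display name of the ancestor node (e.g. "Tamil")
    language_name: short name of the language the box is for
    geographic:    hypothetical flag marking a geographic grouping
    """
    # A node that shares its name with the language is shown as "X languages"
    # so that it is not confused with the language itself.
    if node_display == language_name:
        label = f"{node_display} languages"
    else:
        label = node_display
    # Geographic (non-genealogical) groups are shown in parentheses.
    return f"({label})" if geographic else label
```

For example, `display_label("Tamil", "Tamil")` gives "Tamil languages", while `display_label("Bornean", "Kayan", geographic=True)` gives "(Bornean)".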

RE hiding nodes: on the display side I think the nicest solution is to show an abbreviated tree by default and a 'Show full tree' button nearby that'll show a full one. (I don't think that swapping the trees in-place is an option without some wizardry, and it would probably be fragile and non-accessible.) This could be done today with a little bit of Lua hackery.

On the decision side: abbreviating distant questionable nodes should be easy enough, so long as the data source knows what's credible and what isn't. Cutting down long trees is a bit harder: our articles on Mande languages just cut out the middle of the tree, and while that's easy to code it's also very blunt. Some scheme of assigning weightiness to nodes and then dropping lightweight distant ones might do it?

The other two concerns should be sort-out-able too, if attention is paid to them. I don't think the coding here is fundamentally hard; the issue is we don't have a cache of machine-readable typological data lying around to generate infoboxes from. The creation of such a thing is what I'm slowly working towards with my infobox cross-referencing thingy, but there's a lot of obvious-to-humans contextual stuff that the computer doesn't stand a chance with. KleptomaniacViolet (talk) 18:44, 19 September 2013 (UTC)
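To make the "weightiness" idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `abbreviate`, the numeric weights, the thresholds, and the placeholder node names are all hypothetical, and no such weights exist in the current data.

```python
def abbreviate(chain, keep_near=2, min_weight=0.5, marker="..."):
    """Shorten an ancestry chain by dropping lightweight distant nodes.

    chain: list of (name, weight) pairs, ordered from the language itself
           upwards to the root of the family.
    The nearest `keep_near` entries and the root are always kept; any other
    node is kept only if its (hypothetical) weight is at least `min_weight`.
    Each run of dropped nodes is replaced by a single marker.
    """
    result = []
    last = len(chain) - 1
    for i, (name, weight) in enumerate(chain):
        keep = i < keep_near or i == last or weight >= min_weight
        if keep:
            result.append(name)
        elif not result or result[-1] != marker:
            # Collapse consecutive dropped nodes into one marker.
            result.append(marker)
    return result


# Placeholder names and made-up weights, purely for illustration.
EXAMPLE_CHAIN = [
    ("Leaf", 1.0), ("Branch 1", 0.9), ("Branch 2", 0.4), ("Branch 3", 0.3),
    ("Branch 4", 0.9), ("Branch 5", 0.8), ("Root", 1.0),
]
```

With the example chain, the two low-weight middle nodes collapse into a single marker, which is less blunt than cutting out the whole middle of the tree.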

We could by default display everything, but have check boxes in the database for uncertain, hypothetical, geographic, optional.

Also, there's uncertain classification, where one or more ascending nodes need a question mark. Perhaps a comment field for node-1 etc.? (Would also be useful for adding references.) Also abbreviated display names: We'll often need explicit names for the database, but in context only want to display 'southern'. And we should have the option of manually adding the lowest nodes (field for node+1 etc), where they would only be used by two articles and inclusion in the database wouldn't make sense.

Allowing a manual field would obviate the need for names that only appear when immediately ascending. — kwami (talk) 01:34, 20 September 2013 (UTC)

I figure that the technical side will look very much like the automatic species taxobox stuff, even if we don't share any code, so I've hacked on the cross-referencer to also produce output that resembles their data. There's a sample here (picking Dravidian because our articles already largely agree with each other); for comparison, the biological data looks like this, so the live language data would look something like this:

Template:Automatic language infobox/Dravidian languages:

Display	Dravidian
Link	Dravidian languages
Always shown	Yes

Template:Automatic language infobox/Central Dravidian languages:

Display	Central
Parent	Template:Automatic language infobox/Dravidian languages
Always shown	Yes

(etc)

It's not currently outputting controversial/hypothetical/whatever fields, but that can be added. KleptomaniacViolet (talk) 14:21, 26 September 2013 (UTC)
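For anyone trying to picture how per-node records like the two above turn into a classification, a rough sketch in Python follows. It is not the sandbox module: the dictionary layout, the field names `display`, `link` and `parent`, and the function `ancestry` are assumptions made for this example, and the `link` of the Central node is left empty because the sample does not give one.

```python
# Hypothetical in-memory form of the two sample records above.
NODES = {
    "Dravidian languages": {
        "display": "Dravidian",
        "link": "Dravidian languages",
        "parent": None,
    },
    "Central Dravidian languages": {
        "display": "Central",
        "link": None,  # no Link field appears in the sample for this node
        "parent": "Dravidian languages",
    },
}


def ancestry(title, nodes):
    """Follow Parent fields upwards and return display names, root first."""
    chain = []
    seen = set()
    while title is not None:
        if title in seen:
            raise ValueError(f"cycle in parent links at {title!r}")
        if title not in nodes:
            raise KeyError(f"no node titled {title!r}")
        seen.add(title)
        node = nodes[title]
        chain.append(node["display"])
        title = node["parent"]
    chain.reverse()
    return chain
```

Here `ancestry("Central Dravidian languages", NODES)` yields the display names from the root down, which is the order an infobox would list them in.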

Proof-of-concept: Module:Sandbox/KleptomaniacViolet/Language families/Data and Module:Sandbox/KleptomaniacViolet/Language families/Autotree.

Code	Result
{{#invoke:Sandbox/KleptomaniacViolet/Language families/Autotree| gen_tree |Tamil languages}}	The fallback node is redundant. There is no node with the given title or fallback. (Should add to a category.)

I looked at the automatic taxobox stuff, and I don't think that we should use it as a basis. It's clever, but I do not fancy trying to maintain anything complicated written in wikicode when Lua is an option. In my PoC, the data is all in one monolithic blob, generated using a script. This is not very editor-friendly, and a better way to do it would be to put the data on subpages (one to a node), but that would require the Lua script to parse, which is a minor undertaking to do right. KleptomaniacViolet (talk) 18:37, 26 September 2013 (UTC)

This looks promising. Where are we getting the names from? Article titles? Infobox titles? Infobox entries? Would it not be better to use ISO codes and maybe pull the names from Module:ISO 639 name? — Lfdder (talk) 18:51, 26 September 2013 (UTC)

Article titles and infobox entries, but there's also a normalisation/canonicalisation pass (and a few dozen manual corrections for awkward cases). I haven't thought about cross-referencing it to ISO 639; one issue is that there are quite a lot of nodes in our infoboxes, many of which don't have ISO codes referring specifically to them. My infobox extractor currently just ignores the code fields, but it shouldn't be hard to get that data propagated through it. KleptomaniacViolet (talk) 19:05, 26 September 2013 (UTC)

I think he meant the ISO codes for the languages, not the families. As in generate the tree from the ISO3 field? — kwami (talk) 21:22, 26 September 2013 (UTC)

I can't tell you how many hours I've spent trying to harmonize all our language and family articles. Wish we'd had this earlier!

One thing we'd want, once we get this up and running, is accessibility for editors who don't know Lua. We should be able to change the field values to change a superior node, and if possible to create a new node. It looks straightforward, but the more obvious the better.

For dialects, would we need to add the languages as additional nodes? — kwami (talk) 20:24, 26 September 2013 (UTC)

A minor point: We don't need to be so wordy in our names as "Kolami–Naiki Central Dravidian languages". "Kolami–Naiki languages" should be sufficient. (For one thing, the current name presupposes a classification which might be abandoned.)

Question: You have individual languages in the DB. Is this how you intend to proceed? I was thinking that we'd enter the lowest node in the info box, and that the script would take care of the rest. That is, that we'd only have DB entries for nodes shared by more than one article. If the individual languages need to be in the DB, then people won't be able to easily create info boxes for new language articles. — kwami (talk) 20:37, 26 September 2013 (UTC)

Arbitrary section

I've been thinking about editor-friendliness, and I realised that one-node-per-subpage is still unfriendly. The thing is that the node titles (the "Kolami–Naiki Central Dravidian languages" above) are arbitrary as far as the final display is concerned: they only look like that right now because that's the unique canonicalised name derived for that node. And since it's arbitrary, discoverability would be a problem for editors changing the classification data.

Here's another idea: store complete trees in subpages somewhere, looking something like this:

{{ language family node | article = Dravidian languages | display = Dravidian }}
* {{ language family node | display = South }}
:* {{ language family node | article = [[Tamil-Kannada]] | display = Tamil–Kannada }}
::* {{ language family node | display = Tamil–Kodagu }}
:::* {{ language family node | display = Tamil–Malayalam }}
::::* {{ language family node | article = Tamil languages | display = Tamil }}
:::::* {{ language family node | article = Tamil language | display = Tamil }}
::* {{ language family node | display = Kannada–Badaga }}
:::* {{ language family node | article = Kannada dialects | display = Kannada }}
::::* {{ language family node | article = Kannada language | display = Kannada }}

These subpages would then be processed by a bot, which would then generate a Lua data file. I think we can't get away from a bot generating the data, since we'll also need links downwards from parent nodes to child nodes for their infoboxes, and manually including them denormalises the database. I'm not 100% happy with this, and if anyone has any ideas on the topic of how to store the canonical database and how to get that data into a form for easy module consumption, I'd love to hear them.

RE languages as nodes: the intention is for the infobox to use the title of the page it's transcluded on as the first node title. Ideally, editors won't have to care about what the internal names of the nodes are: nodes without articles will get weird-looking names, but that's okay because the nodes with articles will have their names correspond to the article title. (Moves may be a problem, but I assume they won't be so frequent that manual fixing can't keep up.) New articles will need placing in the tree, which would be an extra step in the workflow, but the manual infobox will be sticking around for editors who don't know about/don't want to edit the central database, and they can be reconciled later. KleptomaniacViolet (talk) 13:40, 27 September 2013 (UTC)
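A rough idea of what the bot's processing step for such a tree subpage might look like, sketched in Python rather than as real bot code. The regular expression, the function name `parse_tree`, and the output shape (two dictionaries for upward and downward links) are assumptions; node depth is taken to be the number of leading `:` and `*` characters, and non-article nodes get a title built from their parent's title plus their display name.

```python
import re

# Leading ':' and '*' characters give the depth; the rest is the template call.
NODE_RE = re.compile(r"^([:*]*)\s*\{\{\s*language family node\s*\|(.*)\}\}\s*$")


def parse_tree(text):
    """Parse a nested-bullet tree into (parents, children) link tables."""
    parents, children = {}, {}
    stack = []  # stack[d] is the title of the most recent node at depth d
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a node line
        depth = len(m.group(1))
        if depth > len(stack):
            raise ValueError(f"node nested too deeply: {line!r}")
        fields = {}
        for part in m.group(2).split("|"):
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip().strip("[]")
        del stack[depth:]
        parent = stack[-1] if stack else None
        if fields.get("article"):
            title = fields["article"]
        elif parent is None:
            raise ValueError("the root node needs an article")
        else:
            title = f"{parent}/{fields['display']}"
        parents[title] = parent
        children.setdefault(title, [])
        if parent is not None:
            children[parent].append(title)
        stack.append(title)
    return parents, children


SAMPLE = """\
{{ language family node | article = Dravidian languages | display = Dravidian }}
* {{ language family node | display = South }}
:* {{ language family node | article = [[Tamil-Kannada]] | display = Tamil–Kannada }}
::* {{ language family node | display = Tamil–Kodagu }}
::* {{ language family node | display = Kannada–Badaga }}
"""
```

Both link directions come out of the one source, so the parent-to-child and child-to-parent views cannot disagree with each other.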

I'm not sure I follow. This looks much less user-friendly. If I understand you, instead of changing just the parent of a node in cases of reclassification, we'd have to change every tree that includes that node, which is little better than the current situation. Keeping them all in sync would also be dicey. One of the expected advantages of automation lies in normalizing the trees across articles. That's the case with the bio boxes.

If we don't need to include the language in the DB, then it would be easy for new editors to create new articles: all they would have to do would be to specify the family/branch that the language is classified in, and the box would take care of the rest. This would also result in a much smaller DB: We'd only need to add languages in those relatively few cases where we have articles on individual dialects. For example, if Masica recognized a new Eastern Indic language, the creator of the article would enter "Eastern Indic" in the automated family field, and the box would render it the same as all the other Eastern Indic languages; if Eastern Indic were reclassified, we'd just change the parent of that single node and all the articles would be updated.

I don't understand what you mean by weird-looking names or why moves would be a problem. — kwami (talk) 23:52, 27 September 2013 (UTC)

I've put together a silly little example to demonstrate how this would work in practice.

User:KleptomaniacViolet/Example/data: This is the canonical tree, that editors would edit to change language classifications. In reality, this would be split across several pages for manageability (e.g., one per family), but there wouldn't be any duplication of data.
Module:Sandbox/KleptomaniacViolet/Language families/Breakfast data: This is the Lua data file used directly by the module to generate the classification shown in the infobox. It's machine-generated from the above. This should be done by a bot.
User:KleptomaniacViolet/Example/Infobox: The automatic infobox. It invokes the Lua module, passing the title of the page it's transcluded on as an argument.
User:KleptomaniacViolet/Example/Smoky Ham dialect, User:KleptomaniacViolet/Example/Breakfast languages, etc: The language articles. Note that just calling the automatic infobox, without any extra arguments passed in, will produce their classification.
User:KleptomaniacViolet/language node: Just a way of structuring the data in the data file and providing a nice way of looking at it for both humans and machines.

Not present in the example is automatic showing of child nodes, but that wouldn't be hard and I'll probably do it in a minute for completeness.

Let's consider how this affects various things:

Fiddling with language typology: Much improved. The classification data for many articles is stored in one place, and in a pretty human-friendly format.
Creating new articles: Requires editing the data file to get the automatic infobox to work. However, with that done, you also get links down from the parent families' infoboxes for free. I think that's more-or-less a wash.
Moving articles: Trickier. The nodes in the tree with articles are keyed by the article name. Moving the article will break that, because it can't see past redirects. The bot that generates the Lua tables may or may not be able to do something about this, but I'm really not sure.

KleptomaniacViolet (talk) 17:34, 28 September 2013 (UTC)

I think this is maybe too much magic. I don't like the idea of relying on a bot, either. I do generally agree with kwami above, though the list of exceptions (languages) could get fairly long, so a second parameter for parent language might be something to think about. — Lfdder (talk) 17:55, 28 September 2013 (UTC)

A bot will be necessary in one place or another if we want to generate parent -> child links and child -> parent links from the same data source without denormalising it (and therefore creating room for internal disagreement). The automatic taxobox system that inspired the whole thing runs a bot daily for this very purpose.

The other factor that's making me suggest a bot is that it means the canonical database can be in an editor-friendly format without forcing the automatic infobox to be an amazingly complicated system of templates and wikicode. I also think that the complete-tree-on-a-subpage model is better than the one-node-per-subpage model in the absence of specialised UI for the latter, but I don't think it's an option if the entire thing is built out of templates (at least, I can't see a way to do it that doesn't involve writing a wikicode-subset parser in wikicode).

One final consideration: in the long term, I'd like to move the classification data to Wikidata. Done right, that would mean there'd be no need for bots here on en. The Wikidata storage model is more-or-less subpage-per-node, but the UI is much better (e.g., compare the workflow for discovering the precise title of the parent node you want to link to). KleptomaniacViolet (talk) 18:24, 28 September 2013 (UTC)

I just realised that there's one more option that I'd forgotten: a wikitext-lite parser in Lua, which would obviate almost all of the need for an external bot. It wouldn't need to go anywhere near the whole way, but if it could parse things like User:KleptomaniacViolet/Example/data it could work... KleptomaniacViolet (talk) 18:57, 28 September 2013 (UTC)

We wouldn't wanna parse all these pages every time the script is run, so a bot would still be needed to trigger a 'reparse' every once in a while. — Lfdder (talk) 19:02, 28 September 2013 (UTC)

Or maybe we could have an invoke on every one of these pages to invoke the parser on save. — Lfdder (talk) 19:12, 28 September 2013 (UTC)

I don't really know how MediaWiki's caching and stuff works, I'll admit. There aren't more than 20,000 nodes to go through. Text munging should be pretty fast (right?), and an acceptable time for a page to be generated is on the order of seconds (right?), which suggests to me that it's doable. KleptomaniacViolet (talk) 19:44, 28 September 2013 (UTC)

I like having the trees laid out like this (though it would be nice if we could sometimes split off subtrees like Bantu or Oceanic), but breaking the links every time we move an article is a serious problem. — kwami (talk) 01:07, 29 September 2013 (UTC)

Okay. On consideration, using {{{PAGENAME}}} does introduce too much magic and makes moving articles too awkward. Here's another idea: a node_title parameter to the automatic template, that will normally be identical to an article's title unless it's been moved recently and the database doesn't reflect the new name. A default automatic infobox could be substed onto the article by a meta-template as part of the suggested new article process maybe. There'd also need to be an optional second parameter, node_title_fallback, to cut the knot when updating the article's title in the database until it filters through and the new name is ready to use.

I'm currently leaning towards the bot option, since it's less of an unknown for me to implement. I'll put my code up on github now, and I'll try and make sure that manual updating is feasible until the bot is running, and if it ever breaks down.

RE splitting up trees: that will work as it is, so long as the common node is an article node. There's a little bit of room for redundancy here, but it's the sort that can very easily be automatically checked and have the bot complain about. KleptomaniacViolet (talk) 14:04, 29 September 2013 (UTC)
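The node_title / node_title_fallback lookup described above might behave roughly like this Python sketch. The function name `resolve_node`, the warning strings, and the rule that a fallback counts as redundant once the main title resolves are all assumptions made for illustration, not a description of the actual module.

```python
def resolve_node(nodes, node_title, node_title_fallback=None):
    """Pick the database node for an infobox and collect maintenance warnings.

    nodes: mapping of node titles to node records (contents are irrelevant here).
    Returns (title_or_None, list_of_warning_strings).
    """
    warnings = []
    if node_title in nodes:
        # The database already knows the current title, so a leftover
        # fallback parameter can be removed from the article.
        if node_title_fallback is not None:
            warnings.append("fallback is redundant")
        return node_title, warnings
    if node_title_fallback is not None and node_title_fallback in nodes:
        # The article was probably moved and the database still has the old name.
        warnings.append("used fallback")
        return node_title_fallback, warnings
    warnings.append("no node with the given title or fallback")
    return None, warnings
```

In a live template the warnings would presumably feed a maintenance category rather than being shown to readers.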

Sounds good. One other complication with relying on titles, however: There are hundreds of languages with the same name. At Ethnologue, there have been many misclassification problems because the trees are automated, and that's without them cuing from the titles. If we generate the trees from the article titles / language names, I suspect we'll have even more of a problem, with boxes colored as Papuan but having Austronesian trees, etc. I still wonder if manually adding the lowest node in the classification will be a more reliable method. Or at least checking that the family color matches the tree, which will catch at least some of the errors. — kwami (talk) 20:57, 29 September 2013 (UTC)

Script prototype

Github. The README has instructions on how to run it, and also a list of pages it's got an interest in. It doesn't operate autonomously yet as a full-fledged bot. I put up all the generated trees from infoboxes on subpages of User:KleptomaniacViolet/Language families data. I don't suggest correcting them by hand yet, since my uploader is stupid and will overwrite your edits, and I think there's still a chunk of automatic inference that can be squeezed out.

@kwami: Languages with the same name aren't a problem since our articles are going to be distinct, and their node titles are equal to their article titles. Non-article nodes have titles specifically constructed to avoid collisions: they include the name of their nearest ancestor that has an article, as well as the intermediate nodes in between. The display name isn't factored in at all as far as walking the tree goes. I think that answers your concern, but I'm not quite sure I've understood it. Feel free to have a poke around with the stuff I put up, and see if you find anything that worries you. KleptomaniacViolet (talk) 18:46, 1 October 2013 (UTC)

Many of the intermediate nodes have redirects when there's no article, so the possibility of collision has already been worked out.

My concern re. the naming is with moving articles: If language X is currently at the ambiguous name, but we decide language Y is better placed there, and switch their places without updating the DB, then the DB would presumably generate the wrong trees. Such moves are done fairly frequently. If the lowest superior node is hard-coded into the info box, however, there shouldn't be any possibility of confusion. I just know that without some sort of permanent and editable check in the article itself, we're going to have all sorts of problems with garbled classifications, and they will be a royal pain to identify. We might be able to verify with ISO codes, but not all languages have them, and sometimes we put a language in a different family than Ethnologue does, creating another possible point of confusion. — kwami (talk) 01:08, 2 October 2013 (UTC)

Okay, I think I've got what you mean now. User:KleptomaniacViolet/Example/Infobox now takes a fallbacknode and an expectedparentarticle parameter, and checks that the nearest article ancestor matches what it's given. (I don't want to end up with non-article node titles on pages, because they may be fragile.) If there are any problems (needing to use the fallback, the fallback being redundant, parent mismatches, etc), it's currently dumping an error message into the infobox, but that's fixable--adding to a maintenance category and not showing an ugly red error is probably better when it goes live. I've also made User:KleptomaniacViolet/Example/Metainfobox, which is designed to be substed in and automatically fills out targetnode and expectedparentarticle. KleptomaniacViolet (talk) 20:14, 2 October 2013 (UTC)
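The expectedparentarticle check described above can be pictured with the following Python sketch. The data layout (a `parent` link and an `article` flag per node) and the function names are hypothetical; the sample tree is the Telugu branch as laid out elsewhere in this thread, with non-article nodes given slash-separated titles.

```python
NODES = {
    "Dravidian languages": {"parent": None, "article": True},
    "Dravidian languages/Southern": {
        "parent": "Dravidian languages", "article": False},
    "Dravidian languages/Southern/South-Central": {
        "parent": "Dravidian languages/Southern", "article": False},
    "Telugu languages": {
        "parent": "Dravidian languages/Southern/South-Central", "article": True},
}


def nearest_article_ancestor(nodes, title):
    """Walk upwards from `title` to the first ancestor that has an article."""
    parent = nodes[title]["parent"]
    while parent is not None:
        if nodes[parent]["article"]:
            return parent
        parent = nodes[parent]["parent"]
    return None


def check_parent(nodes, title, expected_parent_article):
    """Return None if the check passes, otherwise a short error message."""
    actual = nearest_article_ancestor(nodes, title)
    if actual != expected_parent_article:
        return (f"parent mismatch: expected {expected_parent_article!r}, "
                f"database has {actual!r}")
    return None
```

Because only article titles appear in the check, the page never has to mention the intermediate non-article node titles.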

Good. You say you don't want to break off at the +1 node because you're afraid that may be fragile, but the error-tracking categories you're talking about should handle that just as easily as what you propose, wouldn't they? Maybe we can see how your coding plays out in a test run if no-one else has comments, but I'm concerned that people won't bother to add the expected-parent parameter in the info boxes, and that this will lead to garbled classifications down the line that no-one will be monitoring. I expect it would be more robust if we require a parent node for the box to generate the classification.

We can maybe shorten the parameter names later so they're easier for people to type.

I'm not sure we want to completely automate the children part. We don't do that regularly now. In some cases it would just be a long redundant list of languages duplicating the text, and rather pointless, and there are also problems with possible children of uncertain classification. There are also cases where the children level does not exactly match the parent description in the child articles. Perhaps it would generate that level if we set the 'children' parameter to 'yes', and otherwise be set manually? — kwami (talk) 20:42, 2 October 2013 (UTC)

Not quite. I've got two separate concerns about including the parent field. 1) generating the tree from it means redundancy across articles. E.G., West Germanic languages would have parent = Germanic languages, but that parent-child relation is also contained in the database because the English, German, etc articles need that. If we've got the information that West Germanic's parent is Germanic in the database, we might as well use it on that article. 2) Intermediate non-linked node titles aren't stable and may be affected by distant changes. Consider Dravidian: as our articles currently have it, the relevant part of the tree for Telugu looks like this:

{{node | article = Dravidian languages | display = Dravidian}}
:{{node | display = Southern}}
::{{node | display = South-Central}}
:::{{node | article = Telugu languages | display = Telugu}}

If, on Telugu languages, we want to refer to the immediate parent node, what unique title should we give it? South-Central is probably ambiguous across the whole database, so that won't work. Right now, it comes out to something like Dravidian languages/Southern/South-Central, which is guaranteed to be unique by construction. But suppose that later there's a reorganisation of the major branches of Dravidian and South-Central is moved back to a primary branch, so that the tree looks like this:

{{node | article = Dravidian languages | display = Dravidian}}
:{{node | display = South-Central}}
::{{node | article = Telugu languages | display = Telugu}}

Now the generated unique-by-construction title is Dravidian languages/South-Central, and any page referring to Dravidian languages/Southern/South-Central is broken. Article nodes don't have this problem, since we're piggybacking off their uniqueness and determinateness in mainspace, but since there may well be intermediate nodes between the nearest article ancestor and the page it's being generated for, this can only be used as a check and not an input.

People leaving out the parent article name will just trigger some error output, it won't lead to garbling; the actual tree returned is entirely determined from the given title (or the fallback). Though, the meta-infobox for substitution is precisely to try and make it easy to include the parent article name without thinking too hard about it. And, yeah, the automatic stuff won't work in every case, but I'm hopeful that with sufficient detail in the database we can get 90%+ of the way there. Manual overrides will be an option, in some shape or another. I have vague ideas about a semi-automatic tool that'll show you an automatic infobox and the current infobox on a page, highlight any differences and (if there's no significant difference) make the switch for you. KleptomaniacViolet (talk) 21:50, 2 October 2013 (UTC)
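The instability of non-article node titles in the Telugu example can be reproduced with a short Python sketch. The nested-tuple tree format and the function `node_titles` are hypothetical; the title scheme (article title where there is one, otherwise the parent's title plus the display name) follows the description in this thread.

```python
def node_titles(node, parent_title=None):
    """List the unique-by-construction titles of a tree, parents first.

    node: (article_or_None, display, list_of_child_nodes)
    """
    article, display, children = node
    if article:
        title = article
    elif parent_title is None:
        raise ValueError("the root node needs an article")
    else:
        title = f"{parent_title}/{display}"
    titles = [title]
    for child in children:
        titles.extend(node_titles(child, title))
    return titles


# The two versions of the tree from the example above.
BEFORE = ("Dravidian languages", "Dravidian", [
    (None, "Southern", [
        (None, "South-Central", [
            ("Telugu languages", "Telugu", []),
        ]),
    ]),
])
AFTER = ("Dravidian languages", "Dravidian", [
    (None, "South-Central", [
        ("Telugu languages", "Telugu", []),
    ]),
])
```

Comparing the two lists shows that the article titles survive the reorganisation while the generated title for South-Central changes, which is why only article titles are safe to write into pages.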

Off topic slightly, I think we should just use the linguistic names of the nodes. South-Central Dravidian, for example, should just be South-Central Dravidian, and likewise Southern Dravidian. Most names are already unambiguous.

As long as we have a robust way of tracking and verifying that languages are in the proper family, we should be alright. But I keep thinking of the IPA fix-up categories, which have gotten away from us a bit. With close to 10,000 articles, this could get away from us too if we're not careful. — kwami (talk) 11:51, 4 October 2013 (UTC)

There are also languages which are not clearly part of any one branch, such as Menchum language and Esimbi language. — kwami (talk) 10:07, 16 October 2013 (UTC)

Hover notes for transliteration

Hi!

I originally wrote this at the Village pump, but it was suggested I try here first.

I got this idea a few months ago, ever since footnotes on en:Wiki were made able to hover above the mouse pointer when the pointer hovers above the footnote number, ^[1]. [Since floating notes do not seem to work at WikiProjects, you might want to copy-paste the code for this entire section into the sandbox and preview it, to see what I mean.]

I was thinking if it's possible to make a type of "invisible" footnote which would not be listed at the bottom of an article, but that would only exist as a hovering note. (Therefore it wouldn't even technically be a "footnote" any more!)

Reasons:

This would be especially useful in linguistics articles. Often, when a language is written in another alphabet, any examples would, in a best-case scenario, be given in at least three parts – i.e. 1) the original word/sentence 2) its English transliteration, and 3) the English translation. e.g.

Russian has four third-person pronouns – он, она, оно, они (on, ona, ono, oni – he, she, it, they).

This is an oversimplified example, but I hope you get the point.

The major problem arises due to the fact that, in language articles, such triple-script examples are avoided for clarity, and so stage 2, the transliteration, is necessarily sacrificed entirely. Which is all well and good for people that can read Cyrillic, but not for anyone else that may be interested in Russian. Therefore the article a priori excludes a whole cross-section of readers.

Because of this, language articles are currently extremely exclusive of whole swathes of readers who are not already partly fluent in a given language, for precisely this reason. But many linguistically-minded people are interested in different languages precisely because they appreciate the intricacies and beauty of different languages.

And Cyrillic is comparatively easy for speakers of Roman-script languages. What do we do when the article uses Arabic, Devanagari, Chinese, or whatever (especially for people who don't have East Asian fonts installed, that can't even see the letters, let alone attempt to understand them...)

This I feel would be improved if the stage 2 was therefore written by using this "invisible note" that I'm proposing, which will allow inclusion of all Wikipedia users. It would therefore look something like this –

Russian has four third-person pronouns – он, она, оно, они ^[2] (he, she, it, they).

Or, since we're at it, to actually go all the way –

Russian has four third-person pronouns – он, она, оно, они ^[3]

Or, perhaps simpler ^[4] (including the original Cyrillic, for clarity?)

Since such examples would be in the dozens for larger articles, having them as proper footnotes would be impractical, as they would fill up the majority of the article. Hence the use of zero/hovering notes that I'm proposing, that don't appear at the bottom of the articles, but only when the mouse hovers over them.

Obviously, the users who write the articles and thus create the hover notes would choose which, if any, of the lines (transliteration; IPA, whatever), to include, and these could then be added to by anyone that wants to. In the same vein, there could be a small "settings" cog in the corner, just as there is in current pop-up footnotes, and users can themselves choose which lines they want to have included – for example, maybe one user will want their hover notes to only include the respelling line, another may want only transliteration and IPA, or whatever.

This type of note would be created for this specific purpose by a technically-minded colleague. The good thing about this is that all this would not just work for linguistics, but maths, the sciences, and so many other subjects could find use for them. What do you all think, should we ask for their creation at the Village pump?

Thanks for your attention! BigSteve (talk) 19:57, 16 October 2013 (UTC)

^ thus
^ on, ona, ono, oni
^
trlit. – on, ona, ono, oni

resp. – ON, ən-NA, ən-NO, ən-NI

IPA – /ˈon/, /ɐˈna/, /ɐˈno/, /ɐˈni/

trlat. – he, she, it, they
^
– он, она, оно, они

– on, ona, ono, oni

– ON, ən-NA, ən-NO, ən-NI

– /ˈon/, /ɐˈna/, /ɐˈno/, /ɐˈni/

– he, she, it, they

What happens with touch devices? — Lfdder (talk) 21:40, 16 October 2013 (UTC)

Well, how do they deal with normal footnotes? The same way, I guess. What do you think of my suggestion generally, though? We can iron out any technical details if there is consensus to submit it for development. BigSteve (talk) 17:28, 17 October 2013 (UTC)

Proposed deletion of Free Greek language

The article Free Greek language has been proposed for deletion because of the following concern:

No indication of meeting the notability requirements at WP:GNG.

While all constructive contributions to Wikipedia are appreciated, content or articles may be deleted for any of several reasons.

You may prevent the proposed deletion by removing the {{proposed deletion/dated}} notice, but please explain why in your edit summary or on the article's talk page.

Please consider improving the article to address the issues raised. Removing {{proposed deletion/dated}} will stop the proposed deletion process, but other deletion processes exist. In particular, the speedy deletion process can result in deletion without discussion, and articles for deletion allows discussion to reach consensus for deletion. Aɴɢʀ (talk) 10:11, 4 October 2013 (UTC)

The PROD was contested and the article is now at AFD, see WP:Articles for deletion/Free Greek language. Aɴɢʀ (talk) 19:18, 17 October 2013 (UTC)

Notice of posting to TFA nominations

I've added Fuck (film) to TFA nominations, discussion is at Wikipedia:Today's_featured_article/requests#Fuck_.28film.29. — Cirt (talk) 22:34, 30 January 2014 (UTC)

Fuck peer review, again

I've listed the article Fuck: Word Taboo and Protecting Our First Amendment Liberties for peer review.

Help with furthering along the quality improvement process would be appreciated, at Wikipedia:Peer review/Fuck: Word Taboo and Protecting Our First Amendment Liberties/archive1.

Thank you for your time,

— Cirt (talk) 01:07, 26 January 2014 (UTC)

Three requested moves

There are currently three requested moves underway that are relevant to this WikiProject:

Standard German is requested to be moved to High German; see Talk:Standard German#Requested move to "High German";
Mandarin Chinese is requested to be moved to Mandarin dialects; see Talk:Mandarin Chinese#Requested move to Mandarin dialects;
Standard Chinese is requested to be moved to Putonghua; see Talk:Standard Chinese#Request move to Putonghua.

Please contribute to the discussion and help find consensus. Aɴɢʀ (talk) 12:17, 17 November 2013 (UTC)

List of Romanian words of possible Dacian origin

The List of Romanian words of possible Dacian origin article has been proposed for deletion. Your opinions are welcomed. --Norden1990 (talk) 17:39, 17 November 2013 (UTC)

What should we use for angle brackets?

Discussion here. — kwami (talk) 03:36, 19 November 2013 (UTC)

Cornish extinct?

There's a discussion at Cornish language over whether it's appropriate to say the language went extinct (before revival) in the info box. Every linguistic source I can find says the language went extinct, and if they mention Davies or other supposed native speakers, describe them as having some knowledge that had been passed down but not full language ability. But revivalist sources claim the language never did go extinct, and that people like Davies were a bridge to a new generation of native speakers. — kwami (talk) 00:41, 4 December 2013 (UTC)

Fuck: Word Taboo and Protecting Our First Amendment Liberties for Featured Article

I've nominated Fuck: Word Taboo and Protecting Our First Amendment Liberties for Featured Article candidacy.

Comments would be appreciated, at Wikipedia:Featured article candidates/Fuck: Word Taboo and Protecting Our First Amendment Liberties/archive1.

Thank you for your time,

— Cirt (talk) 05:32, 9 March 2014 (UTC)

Unami language original research

Hello, does anyone here work with the Unami language, Delaware languages, or Algonquian languages? I've had a headache over the last few weeks dealing with an editor who has his own orthographic system that he's determined to interject into Lenape-related articles, despite the fact that it isn't published anywhere. He's not a linguist or affiliated with the Delaware Tribe of Indians' Lenape Language Preservation Project (or any federally or state-recognized tribe). I'm not getting through with warnings against original research. He's mainly been working on Lenape, Susquehannock, Lenapehoking, and many geographical articles around Pennsylvania. This is the kind of material I've had to remove:

Almost every historian has misinterpreted the simple meaning of “Lenape.” According to interviews with those who have some familiarity of the ancient language, Doris Riverbird of Quitapahilla, Pennsylvania, and Gary "Deer Standing Schreckengost" (Ah-too Nee-poo We-po-schwa-gen She-pong of Neshaminy, Mahantango, Tionesta, and Cocalico, Pennsylvania...

Any assistance or advice on how to stem the tide of original research and original orthographies would be greatly appreciated. -Uyvsdi (talk) 21:28, 8 December 2013 (UTC)

This stuff is always difficult to deal with. The best approach I think is to enforce the verifiability policy aggressively, removing content that is not directly sourceable to a reliable source. User:Maunus ·ʍaunus·snunɐw· 21:34, 8 December 2013 (UTC)

Thanks. Some of the spellings simply don't exist anywhere on the internet, so I've pulled them. One problem with some of the edits is that books are listed with no page numbers, so I honestly don't know where to look. -Uyvsdi (talk) 21:38, 8 December 2013 (UTC)

Brunei English

It's not a dialect. Not even the book that's coined 'Brunei English' and that most of the article's based on claims that it is. There's no importance scale for this project presently. — Lfdder (talk) 07:35, 31 December 2013 (UTC)

Ranking Update (talk · contribs) assessed it as "high". If the project doesn't use importance ratings, that setting should be removed from the talk page, and that user informed of the situation. You may also wish to propose the article for deletion, based on your statement about the reliability of the article. -- 76.65.128.112 (talk) 00:31, 1 January 2014 (UTC)

Only the intro is obviously misleading. — Lfdder (talk) 01:03, 1 January 2014 (UTC)

IPA's recording of RP

Peter Roach uploaded his RP recording that JIPA uses for its IPA transcription of English. It's been marked for deletion as copyvio. I've made some suggestions on his talk page; notifying y'all in case s.o. has a better suggestion or can navigate the bureaucracy better than us. — kwami (talk) 19:01, 2 January 2014 (UTC)

Template for mentioning words

Apparently no such exists yet, so I've created one and documented it here. I'd like to move it to template namespace and start using it in articles as soon as possible so...comments are welcome! --Ivan Štambuk (talk) 02:45, 22 December 2013 (UTC)

1. In your template documentation I see no mention of the template adding the articles it is used in to Category:Articles containing ...-language text. This would be desirable.

2. As all your examples start with “***: ” (where *** is the language name) this should be automated (with the possibility to override), maybe using the template {{ISO 639 name}} or some such. —LiliCharlie (talk) 03:59, 22 December 2013 (UTC)

1) It does so already, but only when used in the main namespace. Now it's documented.

2) Those were just to illustrate what language we're dealing with. I think that most usages of example words in articles are without specifying the language name, so it would be overhead to enable that by default and disable it each time it's not needed. However, it's no problem to add it. The preferred way would be through some special parameter, and not by duplicating the template name (like there is the entire set of lang-x templates). Perhaps overload the first parameter with some kind of special syntax so that e.g. {{lang-ex|ru+|...}} (note the + sign) generates Russian: ..., and when the ISO code is used without '+' no language name is generated? --Ivan Štambuk (talk) 05:25, 22 December 2013 (UTC)

Err, isn't this the purpose of {{lang}}? Why shouldn't it be merged in there? --Joy [shallot] (talk) 18:41, 4 January 2014 (UTC)

Because of its additional functionality - support for transliterations, links to wiktionary, glosses. These are currently used but in an unsystematic and more complicated manner. It's a simpler replacement for both {lang} and {lang-xx}. --Ivan Štambuk (talk) 20:19, 4 January 2014 (UTC)
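The "+" convention proposed above can be sketched roughly as follows. This is only an illustration in Python, not the template's actual wikitext: the function name render_mention, the two-entry name table, and the HTML-like output are all assumptions made up for the example.

```python
# Hypothetical sketch of the proposed "+" convention; not the real template code.
# The name table is a tiny illustrative subset, not a full ISO 639 list.
LANGUAGE_NAMES = {"ru": "Russian", "el": "Greek"}


def render_mention(code_param: str, text: str) -> str:
    """Render a mentioned word; prefix the language name only if the code ends in '+'."""
    show_name = code_param.endswith("+")
    # Strip the trailing '+' so the remaining part is the plain language code.
    code = code_param[:-1] if show_name else code_param
    marked = f'<span lang="{code}"><i>{text}</i></span>'
    if show_name:
        # Fall back to the bare code when the name is not in the table.
        name = LANGUAGE_NAMES.get(code, code)
        return f"{name}: {marked}"
    return marked


print(render_mention("ru+", "он"))
print(render_mention("ru", "он"))
```

With this design a single first parameter carries both the language code and the choice of whether to display the name, so no second family of templates is needed.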

Merge discussion for Guwen and Classical Chinese

Two articles related to this WikiProject, Guwen and Classical Chinese, have been proposed for a merger. If you are interested in the merge discussion, please participate by going here, and adding your comments on the discussion page. Thank you. — Llywelyn II 08:11, 15 January 2014 (UTC)

Feedback request: VisualEditor special character inserter

The developers are working towards offering mw:VisualEditor to all users at about 50 Wikipedias that have complex language requirements. Many editors at these Wikipedias depend on being able to insert special characters to be able to write articles.

A special character inserter tool is available in VisualEditor now. They would like to know what you think about this tool, especially if you speak languages other than English. To try the special character inserter tool, please:

If you haven’t already opted-in, then opt-in to VisualEditor by going to Special:Preferences#mw-prefsection-betafeatures and choosing "VisualEditor". Save your preferences.
Edit any article or your user page in VisualEditor. See the mw:Help:VisualEditor/User guide for information on how to use VisualEditor.

To let the developers know what you think, please leave them a message with your comments and the language(s) that you tested at the feedback thread on Mediawiki.org or here at the English Wikipedia at Wikipedia:VisualEditor/Feedback. It is really important that the developers hear from as many editors as possible. Thank you, Whatamidoing (WMF) (talk) 20:28, 22 January 2014 (UTC)

Silesian

Move request at Talk:Silesian_language#Requested_move2. Difficult case due to the ambiguity of the name "Silesian" and the lack of good sources. — kwami (talk) 22:10, 22 January 2014 (UTC)

Population of Hebrew

Data at Hebrew language from newspapers etc. One problem is that many sources do not specify whether their figures are for native speakers, and for Hebrew this makes a huge difference. Granted, Ethn. is not the greatest source. Anyone have anything better? — kwami (talk) 04:15, 30 January 2014 (UTC)

Invitation to User Study

Would you be interested in participating in a user study? We are a team at University of Washington studying methods for finding collaborators within a Wikipedia community. We are looking for volunteers to evaluate a new visualization tool. All you need to do is to prepare your laptop/desktop, web camera, and speaker for video communication with Google Hangout. We will provide you with an Amazon gift card in appreciation of your time and participation. For more information about this study, please visit our wiki page (http://meta.wikimedia.org/wiki/Research:Finding_a_Collaborator). If you would like to participate in our user study, please send me a message at Wkmaster (talk) 12:13, 30 January 2014 (UTC).

San or Bushmen?

There's a move discussion at Bushmen. Since both terms are derogatory, it's a bit contentious. — kwami (talk) 01:25, 31 January 2014 (UTC)

Archived some threads

I've archived some inactive threads to subsections which were notifications about discussions that have since been closed. — Cirt (talk) 06:01, 31 January 2014 (UTC)

What does ISO 639-3 zom mean?

I'm trying to make sense of Zou language which seems to have been the victim of some sort of nomenclature wars. One of its sources is Ethnologue[3] and I'm not sure what it means so far as the name of this language goes. Thanks. Dougweller (talk) 15:56, 1 February 2014 (UTC)

ISO 639-3 is a set of codes for language names and zom happens to be the ISO code for the Zou language. For comparison, English would be eng. De728631 (talk) 16:28, 1 February 2014 (UTC)

Sorry De728631, missed this response. What puzzles me also is that the heading in upper case at the top is ZO - what does that refer to? Thanks. Dougweller (talk) 05:57, 3 February 2014 (UTC)

When you say "the heading in upper case at the top", do you mean the top of the Ethnologue page you've linked to, or somewhere on the Wikipedia article Zou language? I don't see "ZO" in either of those places, but the Ethnologue article spells the language name "Zo", which is apparently one of several alternate names. Ethnologue lists "Zou" as the spelling used in India (down near the bottom of the page). Cnilep (talk) 06:31, 3 February 2014 (UTC)

Above "A language of Myanmar" on the Ethnologue page it says ZO as though it is the name of the article and language. But I think I'm clear now, thanks. I thought this page was on my watch list but it wasn't. Dougweller (talk) 11:01, 6 February 2014 (UTC)

Apparently Zo and Zou are equivalent names. The Ethnologue page has a comment that reads "Also in India (Zou)" so that should be the local name. See also their list of alternate names. De728631 (talk) 13:19, 10 February 2014 (UTC)

Main Page appearance: Fuck (film)

This is a note to let the main editors of Fuck (film) know that the article will be appearing as today's featured article on March 1, 2014. If this article needs any attention or maintenance, it would be preferable if that could be done before its appearance on the Main Page. If you prefer that the article appear as TFA on a different date, or not at all, please ask Bencherlite (talk · contribs). You can view the TFA blurb at Wikipedia:Today's featured article/March 1, 2014. If it needs tweaking, or if it needs rewording to match improvements to the article between now and its main page appearance, please edit it, following the instructions at Wikipedia:Today's featured article/requests/instructions. The blurb as it stands now is below:

Fuck is a 2005 American documentary film by director Steve Anderson, which argues that the word is key to discussions about freedom of speech and censorship. The film provides perspectives from art, linguistics, society and comedy. Linguist Reinhold Albert Aman, journalism analyst David Shaw, language professor Geoffrey Nunberg and Oxford English Dictionary editor Jesse Sheidlower explain the term's history and evolution. The film features the last interview of author Hunter S. Thompson before his suicide. It was first shown at the AFI Film Festival at ArcLight Hollywood; it has subsequently been released on DVD in America and in the UK and used as a resource on several university courses. The New York Times critic A. O. Scott called the film a battle between advocates of morality and supporters of freedom of expression, while other reviews criticized its length and repetitiveness. Law professor Christopher M. Fairman commented on the film's importance in his 2009 book on the same subject. The American Film Institute said, "Ultimately, [it] is a movie about free speech ... Freedom of expression must extend to words that offend." (Full article...)

UcuchaBot (talk) 23:01, 11 February 2014 (UTC)

Above was posted to my user talk page, posting here as well. Cheers, — Cirt (talk) 23:19, 11 February 2014 (UTC)

Romanization in MoS/Japan-related articles

Someone with linguistic background is needed to make sense of romanization applied to Japan-related articles. Come to visit Wikipedia talk:Manual of Style/Japan-related articles#No standards, only deliberate differentiation and Wikipedia talk:Manual of Style/Japan-related articles#No standards outside Wikipedia, no standards in Wikipedia. --Nanshu (talk) 14:04, 15 February 2014 (UTC)

More opinions needed

Please submit your comments regarding on-going discussions at Talk:Latin_peoples 79.117.160.159 (talk) 11:56, 18 February 2014 (UTC)

Common practices/formatting on "language" pages?

Is there a style guide that is specific to pages about languages? Things like how to show phonology, grammar, things like that. What I'm looking for specifically is what consensus there is (if there is one) on including samples of a language in the form of a short list of common phrases, like we have on Zulu language. Personally I think those are not well-suited to an encyclopedic article, more suited to a dedicated phrasebook or a dictionary (that is, Wiktionary, which includes phrasebooks just like these). CodeCat (talk) 21:22, 27 February 2014 (UTC)

I removed a similar list from Modern Greek some time ago and it went unchallenged. — Lfdder (talk) 21:26, 27 February 2014 (UTC)

I would not pass an article as a GA that had extensive lists of vocabulary that do not serve to illustrate some specific purpose, or that gives lists of commonly used phrases. But apart from the MOS and WP:NOT I don't think there are specific guidelines about it. User:Maunus ·ʍaunus·snunɐw· 21:40, 27 February 2014 (UTC)

Four-paragraph leads -- a WP:RfC on the matter

Hello, everyone. There is a WP:RfC on whether or not the leads of articles should generally be no longer than four paragraphs (refer to WP:Manual of Style/Lead section for the current guideline). As this will affect Wikipedia on a wide scale, including WikiProjects that often deal with article formatting, if the proposed change is implemented, I invite you to the discussion; see here: Wikipedia talk:Manual of Style/Lead section#RFC on four paragraph lead. Flyer22 (talk) 16:35, 28 February 2014 (UTC)

Template:Lang-gkm

Anyone interested in commenting?!? Thanatos|talk|contributions 13:39, 5 March 2014 (UTC)

English words of (possibly) Malayalam origin

Someone may want to have a look at List of English words of Malayalam origin. I added some {{citation needed}} and other maintenance tags to some words whose origins are not precisely known. (Several words that entered English from Sanskrit have clear Dravidian roots, but just which Dravidian language should "get the credit" is probably an unanswerable question. At least, I know of no objective scholar who has answered it.) Another user has responded by creating a "Notes" section with historical arguments that look like original research. Cnilep (talk) 04:00, 23 February 2014 (UTC)

A little help here, please. The user continues to add words, citing sources that either are unreliable (blogs, Wiktionary) or fail to verify the assertions for which they are cited. (Some of the additions, though, are reliably sourced as English words of Malayalam origin.) See Talk:List of English words of Malayalam origin as well as numerous maintenance tags I have placed in the article. Cnilep (talk) 00:24, 6 March 2014 (UTC)

I've just found List of English words of Telugu origin – which, problematically, cites at least one of the same sources used to "verify" words as having Malayalam origin. I wonder if it would be worth the inevitable headaches to propose a List of English words of Dravidian origin, merging the content of these two lists? Cnilep (talk) 00:30, 6 March 2014 (UTC)

Yes, I think that's reasonable. It's often the case you can track down a loanword to a language family, but not to a particular language. There's also List of English words of Tamil origin. — Lfdder (talk) 00:57, 6 March 2014 (UTC)

Is Pala scholaris an English word? I would have said "no", but it's hard to prove a negative (and this user probably already thinks I'm out to get him/her). Does the fact that it's not in the Oxford English Dictionary matter? Cnilep (talk) 06:14, 14 March 2014 (UTC)

Fuck: Word Taboo and Protecting Our First Amendment Liberties promoted to Featured Article

Fuck: Word Taboo and Protecting Our First Amendment Liberties was promoted to Featured Article quality.

Thank you very much to all who helped with this successful quality improvement project related to freedom of speech and censorship,

— Cirt (talk) 00:39, 18 March 2014 (UTC)

Asash language

I can't figure out what this is. It had 3 links all to a Tanzanian language, but says it is related to Farsi, Pashto and Urdu and is almost extinct. There seems to be an article in the Urdu Wiki on this but also without sources.[4] Thanks. Dougweller (talk) 16:03, 23 March 2014 (UTC)

Can't locate immediately, will look later. It might be a literary form. — kwami (talk) 22:53, 23 March 2014 (UTC)

It would appear to be a hoax. The Urdu WP article has fake refs just as this did before they were deleted, and it can hardly be influenced by Urdu if it's a language of ancient SW Asia. I can find no mention of any such language on GBooks or in my own library. Requesting deletion. — kwami (talk) 06:48, 24 March 2014 (UTC)

[1] thus

[2] on, ona, ono, oni

[3] 
trlit. – on, ona, ono, oni

resp. – ON, ən-NA, ən-NO, ən-NI

IPA – /ˈon/, /ɐˈna/, /ɐˈno/, /ɐˈni/

trlat. – he, she, it, they

[4] 
– он, она, оно, они

– on, ona, ono, oni

– ON, ən-NA, ən-NO, ən-NI

– /ˈon/, /ɐˈna/, /ɐˈno/, /ɐˈni/

– he, she, it, they
