Wikipedia talk:WikiProject Tree of Life/Archive 50

From Wikipedia, the free encyclopedia

Monkbot/task 19

As I write this, Monkbot/task 19 has made some 40,000 edits. This morning, Editor BhagyaMani reverted 40 of those edits, most of which had the edit summary Ref iucn WAS already up to date (I have fixed the bug that added Sand cat to Category:Taxobox binomials not recognized by IUCN). Editor BhagyaMani also left a note on my talk page complaining that task 19 1) added obsolete parms like |volume= and |doi=; 2) removed the parm |author-link=; and 3) unnecessarily changed the ref name. 4) This ref and also the other one were already up to date.

I am addressing Editor BhagyaMani's complaints here because I believe that they deserve to be addressed by the community that uses {{cite iucn}} rather than by two editors having a discussion in the isolated backwater of a user talk page.

To address Editor BhagyaMani's complaints:

1 – I do not know that {{cite iucn}} parameters |volume= and |doi= are obsolete; are they? There is no discussion at Template talk:Cite iucn about that. Perhaps Editor BhagyaMani can link to the discussion(s) where these parameters are determined to be obsolete.
2 – yes it does (Editor Lyttle-Wight voiced the same complaint today). I think that author links in citations are useful only when it is necessary to establish that the author is sufficiently expert to be believed when the publisher is not known to be reliable or has a less than stellar reputation. IUCN is not one of those publishers so it is not necessary to wikilink the author names to establish credibility. Author-name links then is just WP:OVERLINKing; redlinks are worse (see this edit which also misuses |author=)
3 – task 19 changes the <ref> tag name= attribute as a result of a comment here by Editor Tom.Reding which see and my reply. The entire conversation is now archived here.
3 & 4 – of the 40 task 19 edits that Editor BhagyaMani reverted, many of the restored references do not have |access-date= (Tibetan fox). Many of those restored references that do have |access-date= are two and three years old (Hairy-nosed otter). When task 19 inspects the reference in {{speciesbox}} and {{taxobox}} parameter |status_ref=, it looks for a <ref> tag with a dated name= attribute or failing that, for |access-date=. Task 19 will confirm/update the {{speciesbox}} and {{taxobox}} template parameters |status=, |status_system=, and |status_ref= when:
  • |status_ref= has a dated <ref> tag where the date is older than six months; OR
  • the value assigned to |status_ref= has |access-date= where the date is older than six months; OR
  • both of these are true:
    • |status_ref= does not have a dated <ref> tag, AND
    • the value assigned to |status_ref= does not have |access-date=
Six months because it is assumed that IUCN update their database approximately twice a year. Without either of these dates, how is an editor to know that a status reference is already up to date? When the |access-date= was two or three years ago, how is an editor to know that that reference is already up to date? The IUCN database is not static; we should not treat it as a static thing.
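The three conditions above can be sketched in Lua roughly as follows. This is only an illustration, not task 19's actual code: the function names, the 'YYYY-MM-DD' date strings, and the reading of "older than six months" as "at least six whole calendar months" are all assumptions.

```lua
-- Hypothetical sketch of the "is this status reference stale?" rule.
-- Dates are assumed to be 'YYYY-MM-DD' strings; nil means "no date found".

local function parse_date(s)
	if not s then return nil end
	local y, m, d = s:match('^(%d%d%d%d)%-(%d%d)%-(%d%d)$')
	if not y then return nil end
	return {year = tonumber(y), month = tonumber(m), day = tonumber(d)}
end

-- true when at least six whole calendar months separate date and today
local function is_stale(date, today)
	local months = (today.year - date.year) * 12 + (today.month - date.month)
	if today.day < date.day then
		months = months - 1
	end
	return months >= 6
end

-- ref_name_date: date taken from the <ref> tag's name= attribute, if any
-- access_date:   value of |access-date=, if any
local function needs_update(ref_name_date, access_date, today)
	local ref_d = parse_date(ref_name_date)
	local acc_d = parse_date(access_date)
	local now = parse_date(today)
	if ref_d and is_stale(ref_d, now) then return true end
	if acc_d and is_stale(acc_d, now) then return true end
	-- no date of either kind: there is no way to tell, so update
	return not ref_d and not acc_d
end
```

Under these assumptions, a reference with no dated name and no |access-date= always qualifies for an update, which is the case the Tibetan fox example falls into.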

Comments welcome. —Trappist the monk (talk) 15:33, 19 November 2021 (UTC)

Author links are helpful for providing the full name of an author, instead of just one or two initials. Red links are helpful in indicating the lack of a WP article about an author, and useful as a starting point if someone should want to write the needed WP article. Lyttle-Wight (talk) 15:45, 19 November 2021 (UTC)
I agree that live links are useful, but I don't think it is useful to have redlinks in citations. If the author has made a significant contribution that warrants an article, the author should be mentioned somewhere in the article text with a redlink.—  Jts1882 | talk  15:50, 19 November 2021 (UTC)
I also agree that authorlinks are useful; disagree with the conclusion that "Author-name links then is just WP:OVERLINKing" and the rationale leading up to that conclusion; and agree that author redlinks are not useful (WP:REDLINKBIO should apply here, anyway). Esculenta (talk) 16:42, 19 November 2021 (UTC)
I also agree that author-links are useful. And in particular the author-link to Birdlife International seems to be standard, though not used in all the pages on birds. – BhagyaMani (talk) 17:24, 19 November 2021 (UTC)
I think it was Jts1882, who informed me more than a year ago that the |doi= is not needed as the link is generated automatically through the page number. And I recall that Tom.Reding ran an update of lots of pages removing the |doi= some time ago. – BhagyaMani (talk) 17:24, 19 November 2021 (UTC)
moved the above comment out of my post per WP:TPO 4th paragraph
It is true that {{cite iucn}} builds a url for |title=, but... First Module:Cite iucn looks at |url= to see if there is a valid url there (where valid is defined as a 'new-form' url – these will eventually go away). When |url= is missing, empty, or has an 'old-form' url, Module:Cite iucn will build a url from the taxon/assessment IDs found in |page= (first choice) or |doi= (second choice).
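The order of preference described above might look something like this in Lua. It is a sketch only and is not copied from Module:Cite iucn: the function names, the exact shape of a 'new-form' url, and the T<taxon>A<assessment> pattern used to pull IDs out of |page= and |doi= are assumptions made for illustration.

```lua
-- Hypothetical sketch of the url / page / doi fallback.
local BASE = 'https://www.iucnredlist.org/species/'

-- Pull taxon and assessment IDs out of a string such as 'e.T123A456'
-- (assumed format); returns nil when nothing matches.
local function ids_from(s)
	if not s then return nil end
	return s:match('T(%d+)A(%d+)')
end

local function choose_url(args)
	local url = args.url
	-- 1. keep |url= when it already has the (assumed) new form
	if url and url:match('^https://www%.iucnredlist%.org/species/%d+/%d+$') then
		return url
	end
	-- 2. otherwise build from |page=, 3. failing that from |doi=
	local taxon, assessment = ids_from(args.page)
	if not taxon then
		taxon, assessment = ids_from(args.doi)
	end
	if taxon then
		return BASE .. taxon .. '/' .. assessment
	end
	return nil
end
```

With made-up IDs, `choose_url({page = 'e.T123A456'})` would build a url ending in `/species/123/456`, while an old-form |url= is ignored in favour of |page=.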
If |doi= in {{cite iucn}} has been deprecated and should no longer be supported, you should be able to point me to the discussion that reached that consensus. With that, I will modify task 19 and {{make cite iucn}} to skip the doi url found in IUCN taxon citations and further, modify task 19 to remove |doi= from existing references and modify {{cite iucn}} to remove support for |doi=. Until such time as I can read the consensus discussion, task 19 will continue to include |doi= when it updates IUCN references, {{make cite iucn}} will continue to translate the doi url from IUCN taxon citations, and {{cite iucn}} will continue to support |doi=.
Trappist the monk (talk) 23:59, 19 November 2021 (UTC)
It is not necessary to change ref name from iucn to "iucn+a date". Without a date in ref name is easier to update manually. – BhagyaMani (talk) 19:05, 19 November 2021 (UTC)
Certainly agree with that - when the status changes in the future, I wouldn't want to have to choose between either a) updating the ref name to an appropriate date version AND scanning the entire text to update all repeat uses, or b) stick with an old and probably confusing outdated ref name. Much better to just have it named "iucn" or "IUCN" and avoid this dilemma. --Elmidae (talk · contribs) 03:09, 20 November 2021 (UTC)

Categories for discussion

The broader discussion at Wikipedia:Categories for discussion/Log/2021 November 20#Works by people not currently known to be notable may be of interest to this WikiProject. It is related to a previous discussion by this WikiProject, now located at Wikipedia talk:WikiProject Tree of Life/Archive 49#Bulk category creation. UnitedStatesian (talk) 23:36, 21 November 2021 (UTC)

The automatic taxonomy system

Hello! I'm an admin at the Albanian Wikipedia and lately I went to update some of our infoboxes related to taxonomy to make them up to date with their EnWiki homologues. 2 of my changes (taxobox and speciesbox) were reverted to an old state by an editor saying that those changes introduced errors in our articles and that the automatic system EnWiki uses can't be used in SqWiki because of the lack of other templates, the sheer number of which is gigantic. This surprised me a bit and made me start reading more about the said system here which was a rabbit hole on its own. I have a very naive question: Why is it that we're using templates in a very unusual manner instead of devising a better overall mechanism that deals specifically with this? What that mechanism would be? I don't know (hence why the question is naive) but my first reaction for this was that it should have been sorted out in Phabricator, not with modules here. Having said that, can someone find the time to explain very-very shortly to me how does everything work? I read a lot of documentation pages and so I have the general idea but I rarely work with articles per se personally, mostly dealing with the technical part, so it was still a bit confusing to me. I totally understand if the whole thing can't really be TL;DR-ed. And lastly, is there any guide on internationalization and localization in regard to this system? As I said, the editor who reverted me did mention a lot of missing templates but didn't really specify which, being that he doesn't really deal much with the technical aspect of the project. At this point, I'm confused as to what I should be importing and updating to not break anything. Any kind of help would be appreciated - Klein Muçi (talk) 08:32, 13 September 2021 (UTC)

Documentation for the automated taxobox system

Taxonomy templates

Taxobox templates

In short the taxonomy templates contain the parent of any taxon and the taxobox builds the classification by following the parents up to the top level, with rules on which ranks to display. The navbox in the {{automatic taxobox}} template documentation provides the details.
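As a rough picture of "following the parents up to the top level", here is a small Lua sketch. It is not the code of the automated taxobox system. Only the Felis entry reflects data quoted in this discussion; the rest of the chain is shortened and made up so the example stays small, and the function name and depth limit are assumptions.

```lua
-- Toy data: each entry plays the role of one taxonomy template.
-- The chain above Felinae is deliberately shortened and NOT real taxonomy.
local taxa = {
	['Felis']   = {rank = 'genus',      parent = 'Felinae'},
	['Felinae'] = {rank = 'subfamilia', parent = 'Felidae'},
	['Felidae'] = {rank = 'familia',    parent = 'Life'},
	['Life']    = {rank = 'unranked'},  -- terminus: no parent
}

-- Walk from a taxon to the top, collecting 'rank: name' strings.
local function lineage(name, max_depth)
	local out = {}
	local depth = 0
	while name do
		local entry = taxa[name]
		if not entry then
			error('no taxonomy entry for ' .. name)
		end
		depth = depth + 1
		if depth > (max_depth or 100) then
			error('hierarchy too deep; possible loop')
		end
		out[#out + 1] = entry.rank .. ': ' .. name
		name = entry.parent
	end
	return out
end
```

A missing entry anywhere in the chain stops the walk, which is why every parent of every taxon with an article has to exist on the importing wiki.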
The template system for {{automatic taxobox}} was designed back in 2010 and has been fairly successful. A new system would probably be very different. I think there are something like 70k taxonomy templates, so any change would involve a lot of work. To duplicate it on your Wikipedia you would need to import all the taxonomy templates for any parent taxon of any taxon with an article. The only completely reliable method would be to import all of them as it would be difficult to be selective. Some taxonomic hierarchies are 50 parents high and identifying the templates to import would probably require a bot (Lua is used for taxoboxes but can't get info across wikipedias).
An alternative for small Wikipedias is to use Wikidata. Several Wikipedias have such systems (Hebrew, Catalan?). Here on English Wikipedia there is a consensus not to use Wikidata for a number of reasons, including ease of use and flexible control of the taxonomy system, which is limited by the data model on Wikidata (Peter coxhead could expand on this and has an essay somewhere). The system works and English Wikipedia has sufficient editors to keep it working smoothly. —  Jts1882 | talk  09:24, 13 September 2021 (UTC)
There are currently 120,038 templates and 10 subcategories listed in Category:Taxonomy templates.
I can imagine a bot that creates one or more lua data modules from the data in these taxonomy templates. Then, with appropriate modifications, Module:Autotaxobox could use that (those) data module(s) so that other wikis wouldn't need to import 120,038 templates to have the full benefit of the autotaxobox system.
Trappist the monk (talk) 13:18, 13 September 2021 (UTC)
I have thought about how data subpages could be done with Lua. The problem is how to organise them. You want the data to be editable by any editor in most cases, but a few higher ranks are template protected, as they are used in thousands of pages. A Lua array would also be more sensitive to exact syntax than the templates.
The best solution might be a modification to Lua allowing it to access templates on other Wikipedias using the title library. —  Jts1882 | talk  15:43, 13 September 2021 (UTC)
Oh, wow... Now I understand what that editor meant when he talked about the number of templates. Assuming we'd have a bot capable of dealing with the template migration it still would require immense work maintaining and localizing each one of them. I'm swimming in VERY unknown waters now but... Can't we get ready-made data somehow from Wikispecies while also having Wikispecies have a Meta like structure so all the info would be maintained in 1 place and called in the specific local language when needed? A similar thing to how we already use Translatewiki. Wikispecies looks like it is created specifically for taxonomical purposes. - Klein Muçi (talk) 17:55, 13 September 2021 (UTC)
The obvious first choice for organization would be alpha because that is how Category:Taxonomy templates is currently organized – the first three 'groups' (", , and ×) would all go into a single 'non-alpha' category. Yeah, editors will certainly want to add new taxa to a data module. That might be most easily accomplished by writing a template, perhaps {{add new taxon}}, which would take as input the same parameters as any of the current taxonomy templates. A lua module would error-check the provided inputs, check that the new taxon isn't already in the appropriate data module, add the new entry to the data module in the proper format, sort, and return a wikitext version of the data table that the editor would then copy/paste over the old version. This is much like creating a {{cite iucn}} template with {{make cite iucn}}.
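A minimal sketch of what such an {{add new taxon}} helper might do, assuming the data module is a plain table keyed by taxon name. No such template or module exists as far as this discussion shows; the function names, the required fields, and the output layout are all hypothetical.

```lua
-- Validate a new entry and add it to the in-memory copy of the data table.
local function add_taxon(data, name, fields)
	if type(name) ~= 'string' or name == '' then
		return nil, 'missing taxon name'
	end
	for _, key in ipairs({'rank', 'link', 'parent'}) do
		if not fields[key] or fields[key] == '' then
			return nil, 'missing ' .. key
		end
	end
	if data[name] then
		return nil, name .. ' already exists'
	end
	data[name] = fields
	return true
end

-- Return the whole table as sorted Lua source for copy/paste.
local function render(data)
	local names = {}
	for name in pairs(data) do
		names[#names + 1] = name
	end
	table.sort(names)
	local lines = {}
	for _, name in ipairs(names) do
		local f = data[name]
		lines[#lines + 1] = string.format('\t[%q] = {rank = %q, link = %q, parent = %q},',
			name, f.rank, f.link, f.parent)
	end
	return 'return {\n' .. table.concat(lines, '\n') .. '\n}'
end
```

The editor would then paste the text returned by `render()` over the old data module, in the same way that {{make cite iucn}} output is pasted into an article.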
I'm not going to hold my breath for a modification to Lua allowing it to access templates on other Wikipedias using the title library. It's possible to fetch data tables from commons (mw.ext.data.get()). That might be a good solution so long as some sort of tool is available to edit/create entries in the data table. The data are then available to all wikis without en.wiki violating the no-wikidata rule (if there is such a thing – aren't there several infoboxen that draw their contents from wikidata?).
Trappist the monk (talk) 18:16, 13 September 2021 (UTC)
It is possible to convert the taxonomy templates into lua data modules but ...
I hacked an awb script that fetched taxonomy templates from Category:Taxonomy templates, read them, converted what was read into lua k/v pairs where the key is the template's name and the value is a lua table of the values found in the template. For example, Template:Taxonomy/Felis converted to:
['Felis']={['rank']='genus', ['link']='Felis', ['parent']='Felinae'}
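For readers who want to see the shape of such a conversion, here is a Lua sketch (the actual conversion was done with an awb script, which is not shown in this discussion). It assumes the template source has one |name=value pair per line, keeps the pairs in source order, and escapes single quotes; the function names are made up.

```lua
-- Wrap a string in single quotes, escaping backslashes and single quotes.
local function quote(s)
	s = s:gsub('\\', '\\\\')
	s = s:gsub("'", "\\'")
	return "'" .. s .. "'"
end

-- Convert the wikitext of one taxonomy template into one k/v pair.
local function convert(name, wikitext)
	local fields = {}
	for line in wikitext:gmatch('[^\n]+') do
		-- only lines of the form '|key = value' are parameters
		local key, value = line:match('^|%s*([%w_][%w_ ]-)%s*=%s*(.-)%s*$')
		if key and value ~= '' then
			fields[#fields + 1] = '[' .. quote(key) .. ']=' .. quote(value)
		end
	end
	return '[' .. quote(name) .. ']={' .. table.concat(fields, ', ') .. '}'
end
```

Fed the three parameter lines |rank=genus, |link=Felis and |parent=Felinae, `convert('Felis', ...)` returns the same text as the example line above.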
I have also hacked some crude test code that crawls the tree from a starting taxon with the goal of getting to 'Life' (the terminus). Detail and an example is available at Module talk:Sandbox/trappist the monk/taxonomy.
The problem is memory. The largest data table is Module:Sandbox/trappist the monk/taxonomy P which has 11,419 k/v pairs (1,817,540 bytes – perilously close to the 2,097,152 limit). A simple test function that only loads Module:Sandbox/trappist the monk/taxonomy P and then gets the rank from Pabstia consumes 8,503,169 bytes. If the simple test merely returns the static text 'genus', the module consumes 619,967 bytes. The limit is 52,428,800 bytes. Not sure how to address this problem. The obvious next step is to make more data modules by splitting the larger modules into sections. Of course there's no guarantee that it won't be necessary to load all of the sections which might be less memory efficient than loading the whole section ...
Wikidata is probably the best solution because, after all, keeping track of data is its reason for existence. But en.wiki editors don't like it because they believe that wikidata is unreliable... I only looked at one genus entry and noticed right away that the parent was something different from the parent in the taxonomy template so getting the data right at wikidata could be a monumental headache.
Trappist the monk (talk) 01:24, 11 October 2021 (UTC)
Most taxonomy templates don't have references. For those that do, the reference usually takes more bytes than everything else in the template. Taxonomy templates should have references, so that would be a lot of memory. I suppose memory could be saved if each unique reference were given an ID (a DOI would work in many cases) that could be looked up in another table and expanded into a full citation. Although that seems to be getting back into Wikidata territory. Plantdrew (talk) 01:42, 11 October 2021 (UTC)
@Trappist the monk: you can't achieve getting the data right at wikidata. There are three main reasons:
  • In spite of items being said to be instances of 'taxon', it is well understood that they are instances of 'taxon name'. The same taxon may be represented by multiple Wikidata items, which may have (and always will in the case of taxon names below the level of genus) different classification hierarchies. Thus the species Acmispon procumbens is represented by at least six Wikidata items in four different genera. To model taxonomy templates, Wikidata would first have to model taxa, not taxon names. It appears from long discussions over there that no-one knows how to do this.
  • Wikidata is, rightly, neutral between different classification systems, so in a significant number of cases a so-called 'taxon item' has more than one parent. Wikidata taxon items form a net, not a tree. But articles and taxoboxes have to form a tree; we have to choose a preferred classification for this purpose, although explaining alternative classifications in the text. Taxonomy and classification is subjective; different language wikipedias can legitimately choose different ones.
  • Even within a language wikipedia, different groups of organisms may have incompatible classifications, as mammals, birds and dinosaurs do here, based on reliable taxonomic sources for that group. The automated taxobox system has devices to allow this; for example, the parent in a taxobox can be set to a "/skip" variant. Thus at Template:Taxonomy/Mammalia the parent is set to "Mammaliaformes/skip", whereas at Template:Taxonomy/Kuehneotheria the parent is set to "Mammaliaformes". This allows extant mammal classifications to skip a long list of clades used in the classification of fossil taxa. The meaning of the term "plant" varies. In taxoboxes for flowering plants, the traditional kingdom Plantae is used. In taxoboxes for groups of green algae, like Chlorophyta, the kingdom is Viridiplantae. Variant taxonomy templates are used, in this case those with or without "/Plantae" attached to the name of the parent. This is not data of the kind that Wikidata should try to represent, unless it chooses to model language wikipedias rather than taxonomy.
See also my essay User:Peter coxhead/Wikidata issues. Peter coxhead (talk) 06:45, 11 October 2021 (UTC)
There are some tools where you can visualise the child-parent relationships on Wikidata. A Sparql query can show the child relationships of a taxon, e.g. Felidae and child taxa. The parent relationships and multiple pathways can be seen with Scolia, e.g. Felidae and parent taxon hierarchy. —  Jts1882 | talk  12:40, 11 October 2021 (UTC)
Thanks for that. I could get seasick watching those diagrams...
Trappist the monk (talk) 14:39, 11 October 2021 (UTC)
Ok, no wikidata. Thanks for the education.
Trappist the monk (talk) 14:39, 11 October 2021 (UTC)
moved here because this part of the discussion was inserted in the middle of my post...
What is the cause of the overhead? It seems to take five times the memory of the file. The total file size of all your subpages is about 14MB, well within the memory limits.
You could make the module subpages smaller by using rank='genus' instead of ['rank']='genus'. Saving 12-16 characters per entry adds up to over 100,000 characters in the larger files. You could go further and not use keys at all for the properties:
['Felis']='genus', 'Felis', 'Felinae', "", "", ""
This would leave plenty of room for more entries in even the largest files (P,C,A,S). I assume it would make little difference to the Lua memory use, though. —  Jts1882 | talk  13:44, 11 October 2021 (UTC)
Umm, as written that doesn't work because fetching the value assigned to ['Felis'] will only ever return 'genus'. The value assigned to ['Felis'] needs to be a table so that rank, link, parent, etc can be fetched. I thought about doing more-or-less what you suggest but I wanted the lua tables to be as 'user-friendly' as possible because you wrote above You want the data to be editable by any editor in most cases. I may still shift to something similar:
['Felis']={'genus', 'Felis', 'Felinae'}
only including values for the omitted elements of the table when there is something else to include:
['Felis']={'genus', 'Felis', 'Felinae', nil, nil, 'reference'}
I can write a bit of code that will show which are the most commonly used indexes so that the sequence tables are ordered in that way.
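If the positional form were adopted, a small lookup table could keep the code that reads it legible. The sketch below is an assumption about how that might look: only positions 1 to 3 and 6 follow the examples above, positions 4 and 5 are guesses, and the 'Example' entry is invented.

```lua
-- Map field names to positions in the sequence table.
-- Positions 4 and 5 are assumed for illustration only.
local FIELD_INDEX = {rank = 1, link = 2, parent = 3, extinct = 4, same_as = 5, refs = 6}

local taxa = {
	['Felis']   = {'genus', 'Felis', 'Felinae'},
	['Example'] = {'genus', 'Example', 'Exampleidae', nil, nil, 'reference'},
}

-- Fetch one field of one taxon by name; nil when either is unknown or unset.
local function get(name, field)
	local record = taxa[name]
	local index = FIELD_INDEX[field]
	if not record or not index then
		return nil
	end
	return record[index]
end
```

One caveat with this layout: the length operator (#) is not reliable on a table with nil holes, so code should index by position as above rather than count the elements.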
I don't know what causes Scribunto to suck up so much memory. Metatables? I confess that I have never been able to wrap my brain around metatables...
Trappist the monk (talk) 14:39, 11 October 2021 (UTC)
Oops, that code example was supposed to be a table, but I omitted the curly brackets. For something more readable, then, how about my first suggestion? It's clear what each entry is and makes a significant file size saving (~8% for A templates?).
['Felis']= { rank='genus', link='Felis', parent='Felinae', refs='citation' }
You can get the default order from the create taxonomy page (see the preload template or an example). I'd expect the order of usage to be rank=link=parent > refs > extinct > same_as [Edit: the actual numbers are: rank=9417, link=9415, parent=9417, extinct=1396, same_as =5, refs =2926] —  Jts1882 | talk  16:39, 11 October 2021 (UTC)
I used the bracketed form because at Wikipedia:Automated taxobox system/taxonomy templates § same as it says:
(same as, i.e. without the underscore, can also be used.)
which suggested that |always_display= might also have an alias |always display=. At the time I was more interested in getting the templates translated accurately than in the actual prettiness of the taxon tables so I simply copied the parameter names from the template, wrapped them, (['...']), and added them to the new lua table in the order that they appeared in the source template. The next iteration of the script that reads the template can better organize the lua tables.
I manually split Module:Sandbox/trappist the monk/taxonomy P into a dozen smaller data modules (~Taxonomy P1 – ~Taxonomy P11, 1000 taxa each, and ~Taxonomy P12, the remaining 400ish). A simple experiment (not yet saved) tests the taxon name against the first and last entries of the data modules and then loads only the data module where the taxon name is located, renders Module talk:Sandbox/trappist the monk/taxonomy using 36,813,949 bytes instead of 41,847,892 bytes (a 5,033,943 byte improvement). That suggests that further improvements might be gained by splitting all of the modules above a certain size. Even if we split all of the current data modules that have more than 1000 taxa, that would be about 100 modules; much more manageable than the 87,000+ individual templates that other wikis now have to import if they want the full suite.
Trappist the monk (talk) 17:54, 11 October 2021 (UTC)
Further splitting of all data modules with more than 2500 taxa has reduced the memory consumption (for this simple experiment only) to 16,820,090 bytes from 41,847,892 bytes (a 25,027,802 byte improvement).
Trappist the monk (talk) 19:20, 12 October 2021 (UTC)
A few more characters could be saved if you switch to " instead of ' as you won't need to escape those characters like in |\'\'Rubus\'\' subg. \'\'Chamaemorus\'\'. No idea if that helps in the byte section though. Gonnym (talk) 22:45, 13 October 2021 (UTC)
True. But, in today's creation of the biggest file (the P file that makes Module:Sandbox/trappist the monk/taxonomy P1 ...), there are 1490 instances of \'. P is 1,676,975 bytes of which 1490 is a trivial amount. If it gets to the point that we have to worry about such a small number of bytes, this experiment is doomed.
Trappist the monk (talk) 23:13, 13 October 2021 (UTC)
Another thing which might help save a few more is handling shared refs better. /P1 has 75 https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=, with only the |search_value= number being different. Maybe the ref can store only that number and in the main module handle the ref? That source would be bigger if it used the complete citation information. Gonnym (talk) 13:32, 14 October 2021 (UTC)
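One way the idea could look, sketched in Lua: the data module stores only the number, and the main module rebuilds the url. The `itis` key and the function name are invented for this sketch, and nothing here is taken from the sandbox modules.

```lua
-- Shared prefix stored once in the main module instead of in every entry.
local ITIS_PREFIX = 'https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value='

-- A ref is either a plain string (kept as is) or a table such as
-- {itis = <number>} (hypothetical short form) that is expanded here.
local function expand_ref(ref)
	if type(ref) == 'table' and ref.itis then
		return ITIS_PREFIX .. tostring(ref.itis)
	end
	return ref
end
```

Each shortened entry saves the length of the prefix (about 80 bytes), at the cost of a convention that editors of the data modules would have to know about.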
Possible but I think it would be painful to invent some sort of mechanism that maps the long part of the reference to the truncated short part. Once the taxonomy templates go away, how is the mechanism maintained? I suspect that whatever the mechanism, it will be a burden on editors. It will be difficult enough to maintain the modules because there will be times when the taxomap table (currently in Module:Sandbox/trappist the monk/taxonomy) will need to be updated when a new taxon is added or an old one removed.
Trappist the monk (talk) 16:46, 14 October 2021 (UTC)
What is the reason for this improvement? Does the module handle smaller blocks more efficiently or is it just loading less because it only loads the smaller blocks it needs? —  Jts1882 | talk  10:17, 14 October 2021 (UTC)
Smaller blocks loaded only as needed require less memory. If you edit and preview Module talk:Sandbox/trappist the monk/taxonomy and then look in Parser profiling data (bottom of the page) and [Show] Lua logs you can see a list of the modules that were loaded to render the left-hand taxa list. The table taxomap_t and the function module_select() in Module:Sandbox/trappist the monk/taxonomy use the taxon name to select a module to load for that taxon.
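For anyone who does not want to read the sandbox module, the general idea can be sketched as below. This is not the sandbox code: the layout of taxomap_t, the range boundaries, the module names, and the injected loader are all assumptions (in Scribunto the loader would typically be mw.loadData).

```lua
-- Hypothetical map: each data module covers a sorted, non-overlapping
-- range of taxon names.
local taxomap_t = {
	{first = 'Paa', last = 'Pez', module = 'taxonomy P1'},
	{first = 'Pfa', last = 'Poz', module = 'taxonomy P2'},
	{first = 'Ppa', last = 'Pzz', module = 'taxonomy P3'},
}

-- Binary search for the data module whose range contains the taxon name.
local function module_select(taxon)
	local lo, hi = 1, #taxomap_t
	while lo <= hi do
		local mid = math.floor((lo + hi) / 2)
		local range = taxomap_t[mid]
		if taxon < range.first then
			hi = mid - 1
		elseif taxon > range.last then
			lo = mid + 1
		else
			return range.module
		end
	end
	return nil
end

-- Load a data module at most once, and only when a taxon needs it.
local loaded = {}
local function get_taxon(taxon, loader)
	local module_name = module_select(taxon)
	if not module_name then
		return nil
	end
	loaded[module_name] = loaded[module_name] or loader(module_name)
	return loaded[module_name][taxon]
end
```

Only the modules actually touched while climbing a hierarchy are loaded, which is where the memory saving comes from.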
Trappist the monk (talk) 11:15, 14 October 2021 (UTC)
The righthand list at Module talk:Sandbox/trappist the monk/taxonomy now uses Module:Autotaxobox/sandbox which I have modified so that its data (taxa and rank tables) are extracted from the lua data modules by Module:Sandbox/trappist the monk/taxonomy. A small step.
Trappist the monk (talk) 16:46, 14 October 2021 (UTC)
@Trappist the monk: your changes are causing errors at Template:Taxonomy key/testcases, which then pollute the error tracking categories. Please fix a.s.a.p. Peter coxhead (talk) 21:02, 14 October 2021 (UTC)
Pings don't work when added after the fact. What categories are being polluted by sandbox changes? The errors occurred because the data being used is the original data without contributions from Category:Taxonomy templates for species, Category:Taxonomy templates with qualified names, and Category:Taxonomy templates with query. I started a new build of the data set last night which includes those categories so I've disabled my change until the new data set is complete.
Trappist the monk (talk) 21:39, 14 October 2021 (UTC)
@Trappist the monk: the errors have gone now, so thanks. They were only in the output using the sandbox, not in the output using the live version, so it was definitely a problem with the sandbox version. Taxobox and taxonomy template errors are tracked by a wide variety of categories; see Category:Taxobox cleanup. Because of the way the automated taxobox system works, a single error often results in pages showing up in multiple error-tracking categories. Peter coxhead (talk) 07:04, 15 October 2021 (UTC)
Since we're talking about errors, here is a list of what I think are errors that the conversion script detected:
errors detected by script when converting taxonomy templates to lua data 2021-10-13/14
Trappist the monk (talk) 14:04, 15 October 2021 (UTC)
This is useful. I'm working on fixing these errors, going through the alphabet backwards (completed G-Z). I should be able to finish this off today, but if anybody else wants to join in, just let me know what portion of the alphabet you're working on. Plantdrew (talk) 15:35, 15 October 2021 (UTC)
@Plantdrew: I'm working from A, but don't have much time today. A–F done. Peter coxhead (talk) 17:14, 15 October 2021 (UTC)
I think I've got them all fixed now. @Trappist the monk:, could you rerun this to make sure we got all the errors? (there were a few cases with some strangeness in a reference that I'm not positive I repaired). Plantdrew (talk) 19:31, 15 October 2021 (UTC)
I reran the script over the list of templates above. It did not like these two:
  • Template:Taxonomy/Lorisoidea – presentation 'code' does not belong in the data so {{plainlist}} and the unordered-list markup (*) should go away. If you want two citations to appear on separate lines in the 'Taxonomic references' line of the template doc, put them on separate lines in the template code and that is all you need do.
  • Template:Taxonomy/Macroderma – |pages={{#if:||312–529}} doesn't make any sense. Essentially it means:
    If <nothing> evaluates to <something> then return <nothing> else return 312–529
    And, because <nothing> does not evaluate to <something>, it returns 312–529. Who thought that such a construct was a good idea? For this template, it should simply be |pages=312–529.
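The same evaluation written out as a Lua sketch, for anyone unfamiliar with #if. The function is a stand-in written for this example, not MediaWiki's implementation; it only mirrors the rule that an empty or whitespace-only test string counts as false.

```lua
-- Stand-in for {{#if: test | value if true | value if false }}
local function parser_if(test, if_true, if_false)
	-- an empty or whitespace-only test string is treated as false
	if test:match('^%s*$') then
		return if_false or ''
	end
	return if_true or ''
end

-- {{#if:||312–529}}: empty test, empty 'true' branch, pages in the 'false' branch
local pages = parser_if('', '', '312–529')
```

Because the test string is a constant empty string, the 'false' branch is always taken, so the construct can never produce anything but the page range.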
Trappist the monk (talk) 22:39, 15 October 2021 (UTC)

@Trappist the monk: it's been a useful clean up, so thanks! As the citation(s) given by |refs= aren't used by the system – and can't be, even if complete, for all the reasons that references in templates are problematic – I think it doesn't matter whether a limited amount of presentation code appears (e.g. list mark up, br tags), but I agree that templates, like {{plainlist}} shouldn't. The purpose of the citation(s) is to show other editors what classification system was used; they are woefully incomplete.

It's interesting, although not entirely surprising, how many creators of taxonomy templates left the HTML comment saying "don't use ref tags", and then did just that. Ideally the interface to the data currently in taxonomy templates would be done in the standard way for web interfaces to databases: generate a dynamic page from a backend, use Javascript to check the entries locally as far as possible, post to the backend for further checking and storage. This would bypass a lot of the current checking, which relies on regular monitoring of error-tracking categories, followed by manual corrections. Peter coxhead (talk) 06:33, 16 October 2021 (UTC)

You could add something like this to l.doShowRefs() at line 396:
	elseif refs:match ('\127[^\127]*UNIQ%-%-(%a+)%-[%a%d]+%-QINU[^\127]*\127') then
		-- choose a better error message
		-- and add an error category
		error ('has stripmarker')
The above should find any construct that uses Strip markers. Choose a better error message and don't use the lua error() function to format it and add a category to track these kinds of errors.
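A standalone version of that check, returning a message and a tracking category instead of calling error(), might look like this. The function name, the message wording, and the category name are placeholders invented for the sketch; only the pattern is the one given above.

```lua
-- Same pattern as above: matches a MediaWiki strip marker and captures
-- the tag name (ref, nowiki, ...).
local STRIP_MARKER = '\127[^\127]*UNIQ%-%-(%a+)%-[%a%d]+%-QINU[^\127]*\127'

-- Returns refs unchanged when clean; otherwise nil, a message, and a
-- (hypothetical) tracking category name.
local function check_refs(refs)
	local tag = refs:match(STRIP_MARKER)
	if tag then
		return nil,
			'refs contains a <' .. tag .. '> strip marker',
			'Taxonomy templates with strip markers in refs'
	end
	return refs
end
```

The caller can then format the message and emit the category however the rest of the module does, rather than halting the render.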
Trappist the monk (talk) 12:04, 16 October 2021 (UTC)
Yesterday I rebuilt the data set with tightened error reporting. Here is a list of things that my script reported:
errors detected by script when converting taxonomy templates to lua data 2021-10-22
Trappist the monk (talk) 12:28, 23 October 2021 (UTC)
Gone through A-E, although I've left extinct=no. —  Jts1882 | talk  16:50, 23 October 2021 (UTC)
Completed list (F-Z). Loopy30 (talk) 12:42, 24 October 2021 (UTC)
It's not actually an error to put a value of "no" or "false" where "yes" or "true" is allowed. It's redundant, yes, but not an error. It may occasionally be useful to emphasize that a negative value is intended, rather than just accepted by default (e.g. if a taxon was previously thought to be extinct, but has since been rediscovered). Peter coxhead (talk) 16:28, 24 October 2021 (UTC)
The template documentation does not support the notion of a distinction between |extinct= (empty or omitted) and explicit |extinct=no or explicit |extinct=false; see Template:Automatic taxobox/editintro/preload and Wikipedia:Automated taxobox system/taxonomy templates#extinct.
There should be no distinction between default empty/omitted meaning boolean no/false and explicitly stated no/false. If the extinction status of a taxon is not definitively yes/true and not definitively no/false, that distinction should be made by the use of a non-boolean keyword to indicate something other than extinct/not extinct: unknown, dubious, disputed or some-such and the chosen keyword(s) and definitions should be specified in the template documentation.
Editors will 'fill-in the blanks' without intending to convey any emphasis. If emphasis is required for |extinct= then use of a non-boolean keyword and an appropriate citation in |refs= that supports that keyword choice is better than assuming emphasis where emphasis may or may not exist.
My script ignored |extinct=no, |extinct=false, |always_display=no, and |always_display=false because the template documentation does not distinguish between default and explicit and because it cannot 'know' that an explicit no/false is actually there as emphasis – I suspect that the same applies to most humans, even those who are expert in the intricacies of the automated taxobox system.
Trappist the monk (talk) 13:08, 25 October 2021 (UTC)
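The rule argued for above (empty, omitted, no and false all carry the same meaning) can be illustrated with a small normaliser. This is only a sketch in plain Lua: is_affirmative is a hypothetical helper name, not a function from the automated taxobox modules, and the set of accepted keywords is an assumption.

```lua
-- Hypothetical helper: collapse a boolean-style taxonomy parameter to true/false.
-- Assumption: only 'yes' and 'true' (any letter case, surrounding whitespace ignored)
-- count as affirmative; omitted, empty, 'no' and 'false' are all treated alike.
local function is_affirmative (value)
	if value == nil then
		return false;	-- parameter omitted
	end
	local v = tostring (value):lower():match ('^%s*(.-)%s*$');	-- lowercase and trim
	return v == 'yes' or v == 'true';
end
```

Under this sketch an explicit |extinct=no cannot be told apart from an omitted |extinct=, which is the behaviour the template documentation describes.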
Template:Taxonomy/Abraeomorphus not detected by my script but is that an appropriate use of |refs=?
Trappist the monk (talk) 18:14, 23 October 2021 (UTC)
@Trappist the monk: any link to a source is better than none, although clearly it would be better not to use a bare URL. Peter coxhead (talk) 08:38, 24 October 2021 (UTC)
I wrote the above at 18:14, 23 October 2021. Editor Jts1882 edited the template at 19:20, 23 October 2021. See the original version.
Trappist the monk (talk) 11:22, 24 October 2021 (UTC)
@Trappist the monk: ah, sorry. Peter coxhead (talk) 16:28, 24 October 2021 (UTC)

Issues

Although converting taxonomy templates to entries in Lua tables is an interesting intellectual exercise, I have serious reservations about it ever being deployed. It solves one issue (moving taxonomy data to another language wiki) at the expense of creating problems that I see as major.

  • An important general principle of good software design is modularity. Information and actions on that information should be kept in self-contained units, with limited and strictly defined interfaces. This approach has been repeatedly shown to lead to better quality software, less prone to error and easier to maintain. Moving the data in disparate taxonomy templates into one Lua module seriously violates this principle. The only reason for the inclusion of particular entries appears to be that they start with the same letter.
  • It's vital that ordinary editors who create and maintain articles about taxa are able to create and edit the taxonomic information used in taxoboxes.
    • Clicking on the red pencil icon in a taxobox takes you directly to the relevant taxonomy template. How would this work with a Lua module? Taking you to the whole module is not likely to be helpful to new or less experienced editors.
    • If you create an automated taxobox targeting a taxon without a taxonomy template, a "fix" link appears which takes you directly to a partially filled in taxonomy template (including special cases like "/?" variants). How will this work with a Lua module?
    • Most taxonomy templates are open to editing by any editor. But some top-level templates that have been subject to repeated vandalism in the past and where changes would affect potentially thousands of articles are protected (mostly template editor status is required). How would different levels of protection for different entries in a Lua table work?
  • I monitor the main error-tracking categories for taxoboxes and taxonomy templates almost every day, and often more than once a day. It's rare for a day to pass without my needing to fix errors; some are trivial, others not. Even very experienced editors can occasionally make mistakes, and newer editors frequently get bits wrong. Making a mistake in (or deliberately vandalizing) a single taxonomy template can cause problems if it's high up the classification hierarchy, but doing the same in a Lua module holding who knows how many items of taxonomic information could cause literally tens of thousands of article taxoboxes to break if the error results in bad Lua code, which I suspect is highly likely given my experience of creating and editing Lua.

Peter coxhead (talk) 09:05, 15 October 2021 (UTC)

You seem intent on killing this idea before it has even got started. How many editor-hours has it taken to get the taxonomy template system to where it is now? Converting the existing system of 87000+ individual taxonomy templates, the handful of support templates, and a lua module to some other data structure (possibly lua, possibly something else) is not going to happen overnight. I don't yet have answers for all of your issues but I have given some thought to them and will continue to do so.
I think that there is merit in your first bullet point but I don't entirely agree. Keeping a data set separate from the processes that operate on it is also modularity and is why databases exist and why wikidata seemed to be the right place for all of this taxonomy tree data. Sure, we could adopt a data organization that isn't alphabetical; suggest a better organization.
Trappist the monk (talk) 14:44, 15 October 2021 (UTC)
It's not a question of killing the idea, merely noting some issues that, in my view, absolutely must be satisfactorily resolved in any alternative data storage, i.e. working towards a specification of the user interface that any implementation of the automated taxobox system must meet.
Note that to users, Wikidata doesn't present itself as a relational database composed of tables, but as individual items, one per page. How the taxonomic data underlying the automated taxobox system is stored is one thing; there's certainly nothing sacred about using templates, which weren't intended for this purpose. My concern is not how the data is stored, but how it is presented and edited. If an underlying Lua storage can be combined with a single template-style presentation per taxon for viewing and editing, possibly looking more like a Wikidata item, then it will indeed be a useful change. If editors had to edit Lua tables, it would not be. Peter coxhead (talk) 16:36, 15 October 2021 (UTC)
A couple of thoughts on these issues
  • The alphabetical breakdown is just a simple practical split. In principle it should be possible to break up the taxonomy templates by biological group (e.g. by phylum or class). This wouldn't be simple as the sorting would need to traverse the taxonomy tree. The higher level taxa could all be placed in a protected module, which would deal with the vandalism issue.
  • The editing and creation of new taxonomy templates presents a difficult problem. At present, when a new taxonomy template is required, editors are presented with a page to create a new taxonomy template with some information pre-entered. If they enter the data incorrectly it will only affect the page they are working on. If they had to add a new entry to a Lua module and made a syntax error, that would affect a large number of pages. Most editors won’t be familiar with Lua syntax.
  • The alternative is to make editing the taxonomy modules a restricted task, but this goes against the ethos of Wikipedia. Any such system would make it more difficult for editors to create automatic taxoboxes or to change the taxonomy. Neither this situation nor the potential for errors when editing the Lua modules is a satisfactory solution.
  • The Lua module approach could be a good option for a small Wikipedia wanting to piggyback on the efforts of the English Wikipedia and its large editor base. If only a few dedicated editors are making the changes then Lua errors are less of a problem. Or they could use a set of modules at Commons that could be periodically updated from the new taxonomy template categories. I suspect a Commons solution might get resistance from those who would prefer a Wikidata solution.
  • I do see some additional uses of the taxonomy data modules. They could help with examining the taxonomy systems in use (an equivalent to my taxonomy browser script in Lua), comparing to Wikidata (much easier in Lua than Javascript), and pinpointing issues (e.g. lack of references).
  • With the Lua memory issue being solved (by freeing memory of loaded modules), will the larger modules (e.g. for P) be recreated?
—  Jts1882 | talk  10:13, 18 October 2021 (UTC)
Yeah,
  • organization by rank doesn't make much sense; you'd have to know the rank of the taxon before you could get its data (which lists the rank)
  • acknowledged. We could limit the broken module syntax issue to some extent by creating a template that creates a taxon entry much like the unsubst'd output of {{make cite iucn}} that editors could add to Module:New taxon data (or some such). A sufficiently clever template could read that module as wikitext, insert the new taxon data and then render the table from the new data module as wikitext so that editors simply replace the current module content with the new via copy/paste. When the number of entries in the new data module exceeds some threshold, its contents could be swept into the appropriate data modules by awb or somesuch. The new data module would always be consulted before each main data module is consulted when walking the taxon tree. We might also use it to mark 'deleted' taxa data by supporting ['obsolete taxon name']={deleted=true},. The next time the taxa data are swept into the main data modules, the sweeping script would know to remove the deleted taxa data. Attempts to use taxa data that has been marked for deletion would cause an error message.
  • I've been wondering some more about wikidata. Is there any restriction that would prevent the creation of, for lack of a better term, a 'private' data set specifically dedicated to autotaxoboxen? There is objection to use of the 'generic' taxa data at wikidata but what if there were qids and properties dedicated solely to autotaxoboxen? The real pain there is that you have to know the qid for each taxon so that you could use {{#property:P...|from=Q...}} so instead of {{#property:P105|from=Q228283}} to get the taxon rank (genus) we would have a separate set of qids and perhaps some additional dedicated properties (taxon name (P225), taxon rank (P105), parent taxon (P171) already exist) so we'd need P numbers for taxon link (and some mechanism to link to language appropriate targets?), extinct, always_display, refs, and same_as. We'd want a new qid for use with instance of (P31) to identify autotaxobox taxa. No doubt, no doubt, there is stuff I haven't thought about here ...
  • I'm not sure if the large data modules should come back. We just don't know how much memory will be available to us when the autotaxoboxen are rendered. Using the large data modules may overrun the memory limit depending on when in the page rendering the autotaxoboxen are rendered. Segmented data reduce the chances that memory overrun will occur.
Trappist the monk (talk) 13:59, 18 October 2021 (UTC)
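The 'new data module is consulted first, and may mark entries as deleted' idea in the second bullet above could look roughly like the following. It is a sketch only: new_data and main_data are hypothetical in-memory stand-ins for Module:New taxon data and one of the main data modules, the field names rank and parent are assumptions, and 'Oldname' is an invented taxon.

```lua
-- Hypothetical stand-ins for the new-data module and one main data module.
local new_data = {
	['Felis'] = {rank = 'genus', parent = 'Felinae'},
	['Oldname'] = {deleted = true},	-- marked for removal at the next sweep
};
local main_data = {
	['Felinae'] = {rank = 'subfamilia', parent = 'Felidae'},
	['Oldname'] = {rank = 'genus', parent = 'Felidae'},	-- stale entry not yet swept out
};

-- Consult the new-data table first, then the main table.
-- Returns the entry on success, or nil plus an error message.
local function get_taxon (name)
	local entry = new_data[name];
	if entry then
		if entry.deleted then
			return nil, 'taxon marked for deletion: ' .. name;
		end
		return entry;
	end
	entry = main_data[name];
	if entry then
		return entry;
	end
	return nil, 'no taxonomy data for: ' .. name;
end
```

Because the new-data table is checked first, a deletion marker there hides the stale copy in the main table until the sweep removes it.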
Linking to language-appropriate wikipedia articles is simple. This returns the name of the local language's article for Felis (Q228283) (the names and associated language codes are listed in the panel top right):
mw.wikibase.getSitelink ('Q228283', mw.getContentLanguage():getCode() .. 'wiki')
Trappist the monk (talk) 14:54, 18 October 2021 (UTC)
On Wikidata, in theory you could add an "English Wikipedia Taxobox" qualifier (PXXX) to a parent taxon (P171) that would specify the parent to use with automatic taxoboxes. This would be similar to how taxon name (P225) is qualified by taxon author (P405) and year of publication of scientific name for taxon (P574). This would allow the parent to use to be specified, but wouldn't allow the flexibility to allow alternative classifications the way the {{Taxonomy/skip}} templates allow different schemes for extant and extinct birds or different groups of plants. —  Jts1882 | talk  15:17, 18 October 2021 (UTC)
Umm, but if the goal is to make this not English Wikipedia Taxobox but rather, to make this 'All-Wikipedia Taxobox' (for all Wikipedias), then, to me, such a qualifier doesn't seem appropriate. And, we would also have to add properties to the generic taxa qids for stuff that the generic items don't need: extinct, always_display, refs, and same_as (to support ~/? and ~/skip). Isn't it better to create these data things cleanly without trying to shoehorn them into the existing data structure?
Trappist the monk (talk) 15:54, 18 October 2021 (UTC)
I'm all in favour of clean data structures, so I agree that Wikidata isn't the way to go.
I suspect that what would be most useful to a small wikipedia would be the logic: if there's local taxonomic data, use that, otherwise use the import from the English wikipedia.
Here's an idea. When trying to retrieve taxonomic data for a taxon, first try a taxonomy template. If this doesn't exist, try the Lua module datastore. Then have a bot running that moves data from templates to the Lua datastore. This would allow the use of the existing creation/editing interface with its advantages plus the advantages of a more compact data storage. Almost all that would need changing inside the automated taxobox system is the access to taxonomy templates via Module:Autotaxobox:getTaxonInfoItem. Peter coxhead (talk) 17:48, 18 October 2021 (UTC)
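The 'template first, Lua datastore second' lookup suggested above might be sketched like this. Everything here is illustrative: get_taxon_info is a hypothetical function (it is not Module:Autotaxobox's getTaxonInfoItem), and the two tables stand in for existing taxonomy templates and for the Lua datastore; on a real wiki the template check would need some page-existence test.

```lua
-- Try each source in order; return the first entry found and the name of its source.
local function get_taxon_info (taxon, sources)
	for _, source in ipairs (sources) do
		local entry = source.fetch (taxon);
		if entry then
			return entry, source.name;
		end
	end
	return nil, nil;	-- not found in any source
end

-- Hypothetical stand-ins for the two data stores (contents are made up for the example).
local templates = {
	Passer = {rank = 'genus', parent = 'Passeridae'},
};
local datastore = {
	Passer = {rank = 'genus', parent = 'Passerinae'},
	Passeridae = {rank = 'familia', parent = 'Passeroidea'},
};

local sources = {
	{name = 'template', fetch = function (t) return templates[t] end},
	{name = 'datastore', fetch = function (t) return datastore[t] end},
};
```

With this ordering an editor-created template always overrides the datastore copy, so the existing creation and editing interface keeps working while a bot moves data across.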
I think that you have misread what I wrote. I was suggesting a dedicated set of autotaxobox taxonomy data held separately from the taxonomy data already at wikidata. I have started a discussion: d:Wikidata:Project chat#is this possible?
Your idea of retrieving taxonomic data from an available template before falling back on the lua data modules has occurred to me though I haven't yet given it much thought. What else besides getTaxonInfoItem() would need to change?
Trappist the monk (talk) 14:09, 19 October 2021 (UTC)
Difficult to say for sure. I tried to map the operation of the system at Wikipedia:Automated taxobox system/map, although it may not be entirely up to date. The problem, as I discovered when I converted the core recursive parts to Lua, is that the system was constructed in typical Wikipedia fashion, i.e. piecemeal, by different editors (albeit only a few) with different coding styles, and above all with no planning and limited documentation of the overall system. In principle, I would expect only the access function and the preload templates to need revising – when an editor wants to edit data not held in a template, the generated taxonomy template would need to be preloaded from the alternative data source. But you'll only find out by experiment, I suspect. Peter coxhead (talk) 18:03, 19 October 2021 (UTC)
Regarding the use of a Lua database and how it would be editable: "The alternative is to make editing the taxonomy modules a restricted task, but this goes against the ethos of Wikipedia" - not being able to edit and see your changes at the same moment is not the same as not being able to edit at all. There are a great many pages with restricted access for any number of reasons.
This leads directly to "I monitor the main error-tracking categories for taxoboxes and taxonomy templates almost every day, and often more than once a day. It's rare for a day to pass without my needing to fix errors; some are trivial, others not" - if the process of adding and modifying an entry revolves around an edit request (which can be created with some kind of preload template to make it user-friendly), then both of the above points are handled. Anyone can create or modify an entry and, instead of going around fixing issues, those same issues can be handled before being deployed. Gonnym (talk) 08:35, 21 October 2021 (UTC)
@Gonnym: no, they can't. If an editor creates a new article about a taxon, they will, rightly, also create a taxobox. If the taxonomy templates don't exist to support the taxobox, then they can create them. Small errors in these templates can then be fixed by other editors – I'm not the only one monitoring the tracking categories. Similarly if an article is updated to a revised taxonomy, the overwhelming majority of taxonomy templates can be updated by the editor. Having to submit an edit request first will result in totally non-functioning or inconsistent taxoboxes. I insist that the acceptability and success of the automated taxonomy system is dependent on as much as possible of it being open to editing by ordinary editors. How this is achieved is another matter; I hold no brief for taxonomy templates as such (as I've said repeatedly) but I do for the user interface they provide. Peter coxhead (talk) 08:50, 21 October 2021 (UTC)
What you are describing is how the current system works. That doesn't mean that it is the only option. Gonnym (talk) 09:00, 21 October 2021 (UTC)
The edit request method is impractical. Someone wants to add an automatic taxobox and finds the taxonomy template for the taxon is missing. They submit an edit request but can't finish editing the page with the article until this is approved. Then they find that the parent is missing a taxonomy template and have to add another edit request. More delay. Sometimes new taxa require four or more higher level taxonomy templates to be created. Currently these can be made quickly in sequence. The delays with edit requests would discourage people from using the automatic taxobox system. —  Jts1882 | talk  09:20, 21 October 2021 (UTC)
Exactly. Peter coxhead (talk) 16:43, 21 October 2021 (UTC)

Using Wikidata and module data pages in taxoboxes

As it might help with portability to other wikipedias I've added the options to use data from Wikidata and the module data pages to the prototype Lua taxonbox, {{biota infobox}} as an experiment.

{{automatic taxobox}}
Felis
Scientific classification
Domain: Eukaryota
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Suborder: Feliformia
Family: Felidae
Subfamily: Felinae
Genus: Felis

{{biota infobox|db=templates}}
Felis
Scientific classification
Domain: Eukaryota
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Suborder: Feliformia
Family: Felidae
Subfamily: Felinae
Genus: Felis

{{biota infobox|db=module}}
Felis
Scientific classification

{{biota infobox|db=wikidata}}
Felis
Scientific classification
Superkingdom: Holozoa
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Felidae
Subfamily: Felinae
Genus: Felis

More examples, including species and subspecies can be found at User:Jts1882/taxobox. More taxoboxes with Wikidata can be found at User:Jts1882/taxobox/Wikidata.

The Wikidata versions are functional if just the major ranks are required, but lack flexibility. The big issue is what to do when multiple parents are presented. One can opt to take the first or last parent with quite different results. The following tables show an example (for Passer) selecting first or last parent when multiple parents are entered. The automated taxonomy version is shown after them for comparison.

Using first parent when multiple parents on Wikidata

Ancestral taxa taken from Wikidata.
Bold taxa show those that will be displayed in taxobox.

Ancestral taxa (from Wikidata)
Rank | Taxon | Wikidata ID | Parents at Wikidata
Superdomain | Biota | |
Superkingdom | Eukaryota | Q19088 | Biota (Q2382443); Cytota (Q3322575)
Unranked | Amorphea | Q474156 | eukaryote (Q19088); Podiata (Q48995893)
Unranked | Obazoa | Q22087764 | Amorphea (Q474156)
Unranked | Opisthokonta | Q129021 | Obazoa (Q22087764); Unikont (Q964455); Amorphea (Q474156)
Superkingdom | Holozoa | Q1205110 | Opisthokont (Q129021)
Unranked | Filozoa | Q1131559 | Holozoa (Q1205110)
Unranked | Apoikozoa | Q24966129 | Filozoa (Q1131559)
Kingdom | Animalia | Q729 | Apoikozoa (Q24966129)
Subkingdom | Eumetazoa | Q5174 | animal (Q729)
Subkingdom | Bilateria | Q5173 | Eumetazoa (Q5174); ParaHoxozoa (Q16976127)
Unranked | Nephrozoa | Q3059449 | Bilateria (Q5173)
Superphylum | Deuterostomia | Q150866 | Nephrozoa (Q3059449)
Phylum | Chordata | Q10915 | deuterostome (Q150866)
Subphylum | Vertebrata | Q25241 | Chordata (Q10915); Olfactores (Q3280581)
Infraphylum | Gnathostomata | Q26214 | Vertebrata (Q25241)
Unranked | Eugnathostomata | Q3059636 | Gnathostomata (Q26214)
Megaclass | Osteichthyes | Q27207 | Eugnathostomata (Q3059636)
Superclass | Sarcopterygii | Q160830 | Osteichthyes (Q27207); Euteleostomi (Q1378800)
Unranked | Dipnotetrapodomorpha | Q23809240 | Sarcopterygii (Q160830)
Unranked | Tetrapodomorpha | Q1209254 | Dipnotetrapodomorpha (Q23809240); Rhipidistia (Q150598)
Unranked | Eotetrapodiformes | Q5381965 | Tetrapodomorpha (Q1209254)
Infraclass | Elpistostegalia | Q150821 | Eotetrapodiformes (Q5381965)
Unranked | Stegocephalia | Q7460384 | Elpistostegalia (Q150821)
Superclass | Tetrapoda | Q19159 | Stegocephalia (Q7460384)
Unranked | Reptiliomorpha | Q134683 | Tetrapoda (Q19159)
Unranked | Amniota | Q181537 | Reptiliomorpha (Q134683)
Unranked | Sauropsida | Q329457 | Amniota (Q181537)
Class | Reptilia | Q10811 | Sauropsida (Q329457)
Unranked | Eureptilia | Q3060510 | Reptilia (Q10811)
Unranked | Romeriida | Q1061761 | Eureptilia (Q3060510)
Unranked | Diapsida | Q134688 | Romeriida (Q1061761)
Unranked | Neodiapsida | Q3497035 | Diapsids (Q134688)
Unranked | Sauria | Q2254408 | Neodiapsida (Q3497035)
Unranked | Archelosauria | Q19595522 | Sauria (Q2254408)
Unranked | Archosauromorpha | Q134676 | Archelosauria (Q19595522); Sauria (Q2254408)
Unranked | Archosauriformes | Q282487 | Archosauromorpha (Q134676); Crocopoda (Q24067255)
Unranked | Eucrocopoda | Q28647762 | Archosauriformes (Q282487)
Unranked | Crurotarsi | Q131341 | Eucrocopoda (Q28647762)
Unranked | Archosauria | Q130910 | Crurotarsi (Q131341)
Unranked | Avemetatarsalia | Q133187 | archosaur (Q130910)
Unranked | Ornithodira | Q133190 | Avemetatarsalia (Q133187)
Unranked | Dinosauromorpha | Q616657 | Ornithodira (Q133190)
Unranked | Dinosauriformes | Q2740164 | Dinosauromorpha (Q616657)
Unranked | Dracohors | Q52798775 | Dinosauriformes (Q2740164)
Superorder | Dinosauria | Q430 | Dracohors (Q52798775)
Order | Saurischia | Q186334 | dinosaur (Q430)
Unranked | Eusaurischia | Q2013709 | Saurischia (Q186334)
Suborder | Theropoda | Q188438 | Eusaurischia (Q2013709); Saurischia (Q186334)
Unranked | Neotheropoda | Q145206 | Тheropod (Q188438)
Unranked | Averostra | Q4828332 | Neotheropoda (Q145206)
Infraorder | Tetanurae | Q131391 | Averostra (Q4828332)
Unranked | Orionides | Q4188599 | Tetanurae (Q131391)
Unranked | Avetheropoda | Q138921 | Orionides (Q4188599)
Clade | Coelurosauria | Q131092 | Avetheropoda (Q138921)
Unranked | Tyrannoraptora | Q2043179 | Coelurosauria (Q131092)
Unranked | Maniraptoromorpha | Q52762328 | Tyrannoraptora (Q2043179)
Clade | Neocoelurosauria | Q128863356 | Maniraptoromorpha (Q52762328)
Clade | Maniraptoriformes | Q134143 | Neocoelurosauria (Q128863356)
Clade | Maniraptora | Q131793 | Maniraptoriformes (Q134143)
Unranked | Pennaraptora | Q4828939 | Maniraptora (Q131793)
Unranked | Paraves | Q136586 | Pennaraptora (Q4828939)
Unranked | Avialae | Q782930 | Paraves (Q136586)
Unranked | Euavialae | Q18389731 | Avialae (Q782930)
Unranked | Avebrevicauda | Q2200094 | Euavialae (Q18389731)
Unranked | Pygostylia | Q135800 | Avebrevicauda (Q2200094)
Unranked | Ornithothoraces | Q135333 | Pygostylia (Q135800)
Unranked | Euornithes | Q2752642 | Ornithothoraces (Q135333)
Unranked | Ornithuromorpha | Q13965050 | Euornithes (Q2752642)
Unranked | Ornithurae | Q3239179 | Ornithuromorpha (Q13965050)
Class | Aves | Q5113 | Ornithurae (Q3239179)
Subclass | Neornithes | Q19163 | bird (Q5113)
Subclass | Neognathae | Q19168 | Neornithes (Q19163)
Superorder | Neoaves | Q2330918 | Neognathae (Q19168)
Unranked | Passerea | Q19598418 | Neoaves (Q2330918)
Unranked | Telluraves | Q20645445 | Passerea (Q19598418)
Unranked | Australavis | Q14635103 | Telluraves (Q20645445)
Unranked | Eufalconimorphae | Q326483 | Australaves (Q14635103)
Unranked | Psittacopasserae | Q5856078 | Eufalconimorphae (Q326483)
Order | Passeriformes | Q25341 | Psittacopasserae (Q5856078)
Unranked | Eupasseres | Q104864059 | Passeriformes (Q25341)
Suborder | Passeri | Q194240 | Eupasseres (Q104864059)
Parvorder | Passerida | Q764420 | songbirds (Q194240)
Superfamily | Passeroidea | Q749521 | Passerida (Q764420)
Family | Passeridae | Q28922 | Passeroidea (Q749521)
Subfamily | Passerinae | Q7369467 | true sparrows (Q28922)
Genus | Passer | Q28753 | Passerinae (Q7369467)

Using last parent when multiple parents on Wikidata

Ancestral taxa taken from Wikidata.
Bold taxa show those that will be displayed in taxobox.

Ancestral taxa (from Wikidata)
Rank | Taxon | Wikidata ID | Parents at Wikidata
Superdomain | Biota | |
Superdomain | Cytota | Q3322575 | Biota (Q2382443)
Superkingdom | Eukaryota | Q19088 | Biota (Q2382443); Cytota (Q3322575)
Unranked | Orthokaryotes | Q48836620 | eukaryote (Q19088)
Unranked | Neokaryotes | Q48836623 | Orthokaryotes (Q48836620)
Unranked | Podiata | Q48995893 | eukaryote (Q19088); Neokaryotes (Q48836623)
Unranked | Amorphea | Q474156 | eukaryote (Q19088); Podiata (Q48995893)
Unranked | Opisthokonta | Q129021 | Obazoa (Q22087764); Unikont (Q964455); Amorphea (Q474156)
Superkingdom | Holozoa | Q1205110 | Opisthokont (Q129021)
Unranked | Filozoa | Q1131559 | Holozoa (Q1205110)
Unranked | Apoikozoa | Q24966129 | Filozoa (Q1131559)
Kingdom | Animalia | Q729 | Apoikozoa (Q24966129)
Subkingdom | Eumetazoa | Q5174 | animal (Q729)
Unranked | Parahoxozoa | Q16976127 | Eumetazoa (Q5174)
Subkingdom | Bilateria | Q5173 | Eumetazoa (Q5174); ParaHoxozoa (Q16976127)
Unranked | Nephrozoa | Q3059449 | Bilateria (Q5173)
Superphylum | Deuterostomia | Q150866 | Nephrozoa (Q3059449)
Phylum | Chordata | Q10915 | deuterostome (Q150866)
Unranked | Olfactores | Q3280581 | Chordata (Q10915)
Subphylum | Vertebrata | Q25241 | Chordata (Q10915); Olfactores (Q3280581)
Infraphylum | Gnathostomata | Q26214 | Vertebrata (Q25241)
Unranked | Eugnathostomata | Q3059636 | Gnathostomata (Q26214)
Unranked | Teleostomi | Q134681 | Gnathostomata (Q26214); Eugnathostomata (Q3059636)
Unranked | Euteleostomi | Q1378800 | Gnathostomata (Q26214); Teleostomi (Q134681)
Superclass | Sarcopterygii | Q160830 | Osteichthyes (Q27207); Euteleostomi (Q1378800)
Unranked | Rhipidistia | Q150598 | Sarcopterygii (Q160830)
Unranked | Tetrapodomorpha | Q1209254 | Dipnotetrapodomorpha (Q23809240); Rhipidistia (Q150598)
Unranked | Eotetrapodiformes | Q5381965 | Tetrapodomorpha (Q1209254)
Infraclass | Elpistostegalia | Q150821 | Eotetrapodiformes (Q5381965)
Unranked | Stegocephalia | Q7460384 | Elpistostegalia (Q150821)
Superclass | Tetrapoda | Q19159 | Stegocephalia (Q7460384)
Unranked | Reptiliomorpha | Q134683 | Tetrapoda (Q19159)
Unranked | Amniota | Q181537 | Reptiliomorpha (Q134683)
Unranked | Sauropsida | Q329457 | Amniota (Q181537)
Class | Reptilia | Q10811 | Sauropsida (Q329457)
Unranked | Eureptilia | Q3060510 | Reptilia (Q10811)
Unranked | Romeriida | Q1061761 | Eureptilia (Q3060510)
Unranked | Diapsida | Q134688 | Romeriida (Q1061761)
Unranked | Neodiapsida | Q3497035 | Diapsids (Q134688)
Unranked | Sauria | Q2254408 | Neodiapsida (Q3497035)
Unranked | Archosauromorpha | Q134676 | Archelosauria (Q19595522); Sauria (Q2254408)
Unranked | Crocopoda | Q24067255 | Archosauromorpha (Q134676)
Unranked | Archosauriformes | Q282487 | Archosauromorpha (Q134676); Crocopoda (Q24067255)
Unranked | Eucrocopoda | Q28647762 | Archosauriformes (Q282487)
Unranked | Crurotarsi | Q131341 | Eucrocopoda (Q28647762)
Unranked | Archosauria | Q130910 | Crurotarsi (Q131341)
Unranked | Avemetatarsalia | Q133187 | archosaur (Q130910)
Unranked | Ornithodira | Q133190 | Avemetatarsalia (Q133187)
Unranked | Dinosauromorpha | Q616657 | Ornithodira (Q133190)
Unranked | Dinosauriformes | Q2740164 | Dinosauromorpha (Q616657)
Unranked | Dracohors | Q52798775 | Dinosauriformes (Q2740164)
Superorder | Dinosauria | Q430 | Dracohors (Q52798775)
Order | Saurischia | Q186334 | dinosaur (Q430)
Suborder | Theropoda | Q188438 | Eusaurischia (Q2013709); Saurischia (Q186334)
Unranked | Neotheropoda | Q145206 | Тheropod (Q188438)
Unranked | Averostra | Q4828332 | Neotheropoda (Q145206)
Infraorder | Tetanurae | Q131391 | Averostra (Q4828332)
Unranked | Orionides | Q4188599 | Tetanurae (Q131391)
Unranked | Avetheropoda | Q138921 | Orionides (Q4188599)
Clade | Coelurosauria | Q131092 | Avetheropoda (Q138921)
Unranked | Tyrannoraptora | Q2043179 | Coelurosauria (Q131092)
Unranked | Maniraptoromorpha | Q52762328 | Tyrannoraptora (Q2043179)
Clade | Neocoelurosauria | Q128863356 | Maniraptoromorpha (Q52762328)
Clade | Maniraptoriformes | Q134143 | Neocoelurosauria (Q128863356)
Clade | Maniraptora | Q131793 | Maniraptoriformes (Q134143)
Unranked | Pennaraptora | Q4828939 | Maniraptora (Q131793)
Unranked | Paraves | Q136586 | Pennaraptora (Q4828939)
Unranked | Avialae | Q782930 | Paraves (Q136586)
Unranked | Euavialae | Q18389731 | Avialae (Q782930)
Unranked | Avebrevicauda | Q2200094 | Euavialae (Q18389731)
Unranked | Pygostylia | Q135800 | Avebrevicauda (Q2200094)
Unranked | Ornithothoraces | Q135333 | Pygostylia (Q135800)
Unranked | Euornithes | Q2752642 | Ornithothoraces (Q135333)
Unranked | Ornithuromorpha | Q13965050 | Euornithes (Q2752642)
Unranked | Ornithurae | Q3239179 | Ornithuromorpha (Q13965050)
Class | Aves | Q5113 | Ornithurae (Q3239179)
Subclass | Neornithes | Q19163 | bird (Q5113)
Subclass | Neognathae | Q19168 | Neornithes (Q19163)
Superorder | Neoaves | Q2330918 | Neognathae (Q19168)
Unranked | Passerea | Q19598418 | Neoaves (Q2330918)
Unranked | Telluraves | Q20645445 | Passerea (Q19598418)
Unranked | Australavis | Q14635103 | Telluraves (Q20645445)
Unranked | Eufalconimorphae | Q326483 | Australaves (Q14635103)
Unranked | Psittacopasserae | Q5856078 | Eufalconimorphae (Q326483)
Order | Passeriformes | Q25341 | Psittacopasserae (Q5856078)
Unranked | Eupasseres | Q104864059 | Passeriformes (Q25341)
Suborder | Passeri | Q194240 | Eupasseres (Q104864059)
Parvorder | Passerida | Q764420 | songbirds (Q194240)
Superfamily | Passeroidea | Q749521 | Passerida (Q764420)
Family | Passeridae | Q28922 | Passeroidea (Q749521)
Subfamily | Passerinae | Q7369467 | true sparrows (Q28922)
Genus | Passer | Q28753 | Passerinae (Q7369467)

Bold ranks show taxa that will be shown in taxoboxes
because rank is principal or always_display=yes.

Ancestral taxa
(from automated taxonomy templates)
Domain: Eukaryota /displayed
Clade: Amorphea
Clade: Obazoa
(unranked): Opisthokonta
(unranked): Holozoa
(unranked): Filozoa
Clade: Choanozoa
Kingdom: Animalia
Subkingdom: Eumetazoa
Clade: ParaHoxozoa
Clade: Bilateria
Clade: Nephrozoa
Superphylum: Deuterostomia
Phylum: Chordata
Clade: Olfactores
Subphylum: Vertebrata
Infraphylum: Gnathostomata
Clade: Eugnathostomata
Clade: Teleostomi
Superclass: Tetrapoda
Clade: Reptiliomorpha
Clade: Amniota
Clade: Sauropsida
..... .....
Clade: Archosauria /skip
..... .....
Clade: Avemetatarsalia /skip
..... .....
Clade: Dinosauria /skip
..... .....
Clade: Theropoda /skip
..... .....
Clade: Ornithurae /skip
Class: Aves
Infraclass: Neognathae
Clade: Neoaves
(unranked): Passerea
Clade: Telluraves
Clade: Australaves
Clade: Eufalconimorphae
Clade: Psittacopasseres
Order: Passeriformes
Clade: Eupasseres
Suborder: Passeri
Infraorder: Passerida
Superfamily: Passeroidea
Family: Passeridae
Genus: Passer

The main difference is the parent for Aves, which can be Tetrapoda or Paraves. The latter then gets a much more detailed taxonomy with all the archosaur classification. The first mimics the skip templates, but unfortunately there is no useful pattern to how the parents are entered on Wikidata, apart from first come first served.
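To make the first-versus-last choice concrete, here is a small sketch in plain Lua that walks up a parent table picking either the first or the last listed parent. The parents table is a hand-copied excerpt of the parent taxon (P171) values shown in the tables above, and lineage is a hypothetical function name; a real implementation would read the statements through the Wikibase client rather than from a literal table.

```lua
-- Excerpt of parent taxon (P171) values, in the order listed in the tables above.
local parents = {
	Archosauriformes = {'Archosauromorpha', 'Crocopoda'},
	Crocopoda = {'Archosauromorpha'},
	Archosauromorpha = {'Archelosauria', 'Sauria'},
	Archelosauria = {'Sauria'},
	Sauria = {'Neodiapsida'},
};

-- Walk from taxon towards the root, choosing the first or last parent at each step.
-- pick is 'first' or 'last'; the seen table guards against loops in the data.
local function lineage (taxon, pick)
	local chain = {};
	local seen = {};
	while taxon and not seen[taxon] do
		seen[taxon] = true;
		chain[#chain + 1] = taxon;
		local p = parents[taxon];
		if p and #p > 0 then
			taxon = (pick == 'last') and p[#p] or p[1];
		else
			taxon = nil;	-- no parent data in this excerpt; stop
		end
	end
	return table.concat (chain, ' < ');
end
```

With 'first' the walk passes through Archelosauria and never visits Crocopoda; with 'last' it is the other way round, which matches the difference between the two Wikidata tables for this part of the tree.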

Anyway, I thought I'd share the results of my experiments in case any one is interested. —  Jts1882 | talk  15:33, 23 November 2021 (UTC)

Nicely done. Alas, we still have the 'how-do-editors-edit-a-taxonomy-item' problem when using |db=module: the pencil links to the template, which for the module data won't exist, right?
Trappist the monk (talk) 17:55, 23 November 2021 (UTC)
Yes, that is the major stumbling block. The biggest problem with Lua is editors unfamiliar with Lua breaking the tables with poor syntax. A couple of thoughts. When previewing modules there is a message about bad code; could there be a setting that prevents saving a module when there are errors? Or a hybrid scheme might be possible where editors create templates and these get used when the module data is missing. The module could then be updated by others later. —  Jts1882 | talk  18:05, 23 November 2021 (UTC)

easing the article-name portability problem

One of the big portability problems that I can see (beyond the sheer number of taxonomy templates and how to edit the data) is that the value assigned to |link= is the name of an article at en.wiki. Editors at other-language wikis want the taxonomy templates to link to articles on their local wiki. I've been wondering of late if one thing that might be done is to replace the article name in the taxonomy templates' |link= parameter with its wikidata qid. Then, we add a snippet of code to l.makeLink() just ahead of the current line 512 that might look something like this:

if linkTarget:match ('^Q%d+$') then	-- is |link= holding a wikidata qid?
	linkTarget = mw.wikibase.getSitelink (linkTarget, mw.getContentLanguage():getCode() .. 'wiki') or '';	-- get this wiki's article title; empty string else; getSitelink() returns nil when no sitelink for language
end

When there is no qid or qid doesn't have an article title for the local language, use whatever is in |link= or in the positional parameter for linking to the target (as l.makeLink() does already).
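
The fallback described above could be sketched roughly as follows. This is only an illustration, not the code in l.makeLink(): the function name resolve_link_target, its fallback argument, and the injected get_sitelink lookup are all assumptions made here so that the logic can be run outside Scribunto. On-wiki the lookup would presumably be mw.wikibase.getSitelink; the stand-in below uses a one-entry table built from the Flowering plant example.

```lua
-- Sketch only: resolve a |link= value that may hold a Wikidata qid.
-- get_sitelink is injected (an assumption of this sketch) so the logic can be
-- tried outside Scribunto; on-wiki it would be mw.wikibase.getSitelink.
local function resolve_link_target (link_target, fallback, get_sitelink)
	if link_target and link_target:match ('^Q%d+$') then	-- |link= holds a bare qid
		local title = get_sitelink (link_target);			-- nil when the local wiki has no article
		if title and title ~= '' then
			return title;
		end
		return fallback;									-- e.g. the taxon name from the positional parameter
	end
	return link_target or fallback;						-- ordinary article title, or nothing supplied
end

-- hypothetical stand-in for mw.wikibase.getSitelink; the data is illustrative only
local function fake_get_sitelink (qid)
	local sitelinks = {Q25314 = 'Flowering plant'};
	return sitelinks[qid];
end

print (resolve_link_target ('Q25314', 'Angiosperms', fake_get_sitelink))			-- qid with a local article
print (resolve_link_target ('Q99999999', 'Angiosperms', fake_get_sitelink))		-- qid without a local article
print (resolve_link_target ('Flowering plant', 'Angiosperms', fake_get_sitelink))	-- plain article title
```

Returning the fallback rather than an empty string keeps a usable link text when the local wiki has no article for the qid; whether that suits l.makeLink() depends on how it treats an empty target.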

A quick hunt through the documentation does not show how to get the qid that matches an en.wiki article title... A bot can do it by reading the rendered en.wiki article html from which the qid can be extracted ("wgWikibaseItemId":"Q25314" from Flowering plant for example).

Trappist the monk (talk) 17:55, 23 November 2021 (UTC)

Doesn't this do what you want for the code called from another wiki?
local qid = mw.wikibase.getEntityIdForTitle(link, 'enwiki') -- qid of the en.wiki page named by link in the module data; nil when there is none
local otherWikipediaPage = qid and mw.wikibase.getSitelink(qid) or nil -- page title on the local wikipedia; nil when there is no qid or no sitelink
The link in the module data includes the Wikipedia page title for the taxon. We want the qid of that page and then get the link to the local Wikipedia page, which will default to the language of the local Wikipedia.
An alternative might be to forget the link and rely on redirects for the taxon to the page title. —  Jts1882 | talk  10:02, 24 November 2021 (UTC)
Doh! Don't know why or how I missed that. "When the age is in the wit is out." (Dogberry: Much Ado About Nothing Act III, scene V. Shakespeare) So, I've hacked a bit at Module:Sandbox/trappist the monk/taxonomy to make a function that trolls through a taxonomy template data module fetching qids:
{{#invoke:Sandbox/trappist_the_monk/taxonomy|qids_get|<list selector>|<taxonomy module suffix>}}
where:
<list selector> – a keyword that selects one of three lists to render:
no links – lists taxon names in the data module that do not have link values; these are most commonly taxon names that use |same_as= to redirect to another name
no qids – lists links that do not have qids and the associated taxon name; items in this list appear when there is no article matching the link name or when the link name exists as a redirect to another article
qids – lists links, their associated qid, and their associated taxon name
<taxonomy module suffix> – the suffix that identifies the lua taxonomy data module of interest: Q for Module:Sandbox/trappist the monk/taxonomy Q
Here is {{#invoke:Sandbox/trappist_the_monk/taxonomy|qids_get|no qids|Q}}

failed to load: Module:Sandbox/trappist_the_monk/taxonomy_Q

But, what to do with links like the one in this taxon data:
['Ficus subg. Ficus']={rank='subgenus', parent='Ficus', link='Ficus#Subgenus Ficus|\'\'F.\'\' subg. \'\'Ficus\'\''},
The module hack returns Q59798 (Ficus (Q59798)). If we are to replace all of the link targets in the taxonomy templates with an appropriate qid, then what to do with those that have fragments (section links)? We could write:
|link=Q59798#Subgenus Ficus|''F.'' subg. ''Ficus''
A slight tweak to the example Module:Autotaxobox code above can also extract a fragment if present and append that to the link returned from wikidata.
Trappist the monk (talk) 23:59, 24 November 2021 (UTC)
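
For what it's worth, the fragment handling mentioned above might look something like the sketch below. It is not the Module:Autotaxobox code: split_link, localize_link, and the injected get_sitelink lookup are names invented for this illustration, and the stand-in lookup only knows the Ficus qid quoted above. One caveat that the sketch does not address: the section name is an en.wiki heading, so the fragment may not exist in the article on another wiki.

```lua
-- Sketch only: split a |link= value of the form  <page or qid>#<fragment>|<label>
local function split_link (link)
	local target, label = link:match ('^([^|]*)|(.*)$');	-- display label follows the first pipe
	target = target or link;								-- no pipe: the whole value is the target
	local page, fragment = target:match ('^([^#]*)#(.*)$');	-- section fragment follows the first '#'
	return page or target, fragment, label;
end

-- Replace a leading qid with the local article title, then re-attach fragment and label.
-- get_sitelink is injected (an assumption of this sketch); on-wiki it would be mw.wikibase.getSitelink.
local function localize_link (link, get_sitelink)
	local page, fragment, label = split_link (link);
	if page:match ('^Q%d+$') then
		page = get_sitelink (page);
		if not page then
			return nil;										-- no local article; caller decides what to do
		end
	end
	local out = page;
	if fragment then out = out .. '#' .. fragment; end
	if label then out = out .. '|' .. label; end
	return out;
end

-- hypothetical stand-in for mw.wikibase.getSitelink; the data is illustrative only
local function fake_get_sitelink (qid)
	local sitelinks = {Q59798 = 'Ficus'};
	return sitelinks[qid];
end

print (localize_link ("Q59798#Subgenus Ficus|''F.'' subg. ''Ficus''", fake_get_sitelink))
```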
Most of those no_qid items have no qid because they are redirects. Qianshanornithidae redirects to Qianshanornis, which is associated with item Qianshanornis (Q21369173). The parent item Qianshanornithidae (Q21400383) exists but has no English Wikipedia sitelink because there is no article for the family and the appropriate Wikipedia article is already linked to the item for the genus. Qianshanornis rapax is the only species in its genus and family, so there is one article covering the species, genus and family. The same English Wikipedia article is the appropriate target for Qianshanornis rapax (Q20721984), Qianshanornis (Q21369173) and Qianshanornithidae (Q21400383), but an article can be linked to only one Wikidata item. Ideally the redirect for the family would be linked to the Wikidata item for the family, but this was discouraged until recently. Wikidata software still won't allow linking of redirects, so it needs a workaround.
The others are redlinks and also seem to be monotypic taxa. For instance, Quadrigyrinae is the immediate parent (from the taxonomy template) of Quadrigyrus and doesn't have an article or redirect. Quadrigyrus redirects straight to the family Quadrigyridae.
Wikidata has items for all three taxa: genus, subfamily and family.
  • If you look at Quadrigyrus (Q2237361), four Wikipedias have articles sitelinked (ceb,nl,sv,war).
  • If you look at Quadrigyrinae (Q2030648), there are two Wikipedias with articles sitelinked (nl,fr).
  • If you look at Quadrigyridae (Q2197866), nine Wikipedias are sitelinked: eight to the family article (ca, ceb, en, es, fr, nl, sv, war) and one (sl) to the order article (Gyracanthocephala).
  • If you look at Gyracanthocephala (Q3122720), there are links to articles on three Wikipedias (fr, hr, nl). Note that the Slovenian Wikipedia isn't listed because its order article is connected to the family item (Q2197866).
The link to Quadrigyrus in a taxobox on the Dutch Wikipedia should link to nl:Quadrigyrus (similarly for ceb, sv, and war), that on the French to fr:Quadrigyrinae, three others (ca, en, es) to the family and one (sl) to the order. If the link item from the taxonomy template is used, it would give the wrong target for Wikipedias with articles at lower taxonomic ranks. However, the appropriate link could be obtained by traversing Wikidata until it finds a value for that taxon. The Slovenian one would get the right link (to the order) from the family Wikidata item. It's all a bit messy but seems possible. —  Jts1882 | talk  11:46, 25 November 2021 (UTC)
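
A rough sketch of that traversal, for anyone who wants to experiment. Everything here is an assumption for illustration: the tables are hand-built mock data using the qids listed above (only some sitelinks are included, the article titles are guesses, and the parent chain skips any intermediate ranks), and nearest_sitelink is an invented name. A real module would read the parents from the taxonomy data and the sitelinks from mw.wikibase.

```lua
-- Mock taxonomy data: taxon name -> parent and Wikidata qid (simplified chain)
local taxa = {
	['Quadrigyrus'] = {parent = 'Quadrigyrinae', qid = 'Q2237361'},
	['Quadrigyrinae'] = {parent = 'Quadrigyridae', qid = 'Q2030648'},
	['Quadrigyridae'] = {parent = 'Gyracanthocephala', qid = 'Q2197866'},
	['Gyracanthocephala'] = {qid = 'Q3122720'},
};

-- Mock sitelinks: qid -> site id -> article title (partial, titles are guesses)
local sitelinks = {
	Q2237361 = {nlwiki = 'Quadrigyrus', svwiki = 'Quadrigyrus'},
	Q2030648 = {nlwiki = 'Quadrigyrinae', frwiki = 'Quadrigyrinae'},
	Q2197866 = {enwiki = 'Quadrigyridae', frwiki = 'Quadrigyridae', slwiki = 'Gyracanthocephala'},
	Q3122720 = {frwiki = 'Gyracanthocephala', hrwiki = 'Gyracanthocephala'},
};

-- Walk up the parent chain from taxon until some item has a sitelink for site.
-- Returns the article title and the taxon whose item supplied it, or nil.
local function nearest_sitelink (taxon, site)
	local seen = {};										-- guard against loops in bad data
	while taxon and taxa[taxon] and not seen[taxon] do
		seen[taxon] = true;
		local title = (sitelinks[taxa[taxon].qid] or {})[site];
		if title then
			return title, taxon;
		end
		taxon = taxa[taxon].parent;
	end
	return nil;
end

for _, site in ipairs ({'nlwiki', 'frwiki', 'enwiki', 'slwiki', 'dewiki'}) do
	print (site, nearest_sitelink ('Quadrigyrus', site));
end
```

With this mock data the Slovenian lookup finds its title on the family item, which is the case described above where the order article is connected to the family item.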
Because Wikidata does not correctly model either taxa or taxon articles in language wikipedias (as I discuss at User:Peter coxhead/Wikidata issues), I predict that any approach that goes via Wikidata items will fail in some cases, although if Wikidata editors could be persuaded to allow links to redirects to be set up without subterfuge, it would help. (Another problem is caused by synonyms: language wikis have articles on the same taxon but at different subjective synonyms. The Wikidata item that is connected to an article is often at a different taxon name. Some wikis, like viwiki, frequently have different articles at several synonyms of the same taxon. There would be a hopeless mess if they tried to use our taxonomy templates.) Peter coxhead (talk) 19:39, 25 November 2021 (UTC)
It seems there has been a change of opinion on linking redirects to Wikidata that the software has yet to support. d:Wikidata:Sitelinks_to_redirects suggests links to redirects are acceptable (also see this recent discussion at Wikipedia:Village_pump_(technical)/Archive_193#wikidata and the Wikidata deliberations). The close of the Wikidata discussion says "those redirects which help to solve existing problems are welcome", but also suggests that a change in the software might be far off and we will need to continue to use the workaround. —  Jts1882 | talk  08:10, 26 November 2021 (UTC)
However, (a) because redirects were not allowed to be linked earlier, they weren't linked when created, and most editors don't know that taxon redirects can now be linked, and (b) it still needs a workaround (temporarily removing the redirect status) that shows up in editors' watchlists as if it were an error. Unless and until the Wikidata software is fixed, it's unlikely that many redirects will be linked to the relevant Wikidata taxon name item. Peter coxhead (talk) 08:51, 26 November 2021 (UTC)
Looking at those discussions, I don't think there is any prospect of that happening soon. Linking appropriate redirects would go a long way to solving the problem of one-to-one linking preventing relevant articles from being connected. I'll do a few as I find them, but it's tedious and inconvenient to use the workaround. —  Jts1882 | talk  07:56, 2 December 2021 (UTC)
@Jts1882: I agree about the prospect of change at Wikidata. To repeat myself, it's part of the now long history of editors there simply refusing to accept that their modelling of language wikipedia articles and of taxa is incorrect. Linking redirects here doesn't solve some problems, e.g. where we split a taxon and another language wiki doesn't, since it's the redirect there that needs to be linked, or where a language wiki (like vi wiki) incorrectly has articles at multiple synonyms of the same taxon, since we don't want our redirect to be linked to their wrongly duplicated article, but the Wikidata item usually is. Peter coxhead (talk) 10:10, 2 December 2021 (UTC)