Wikipedia talk:Wikidata/2017 State of affairs/Archive 10
This is an archive of past discussions about Wikipedia:Wikidata. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Extended disambiguator analogy
We have disambiguators in article titles, usually between brackets or after a comma (see #2 and #3 WP:ATDAB). For the purpose of this discussion I'll call these "short" disambiguators.
Then we have the somewhat longer disambiguating statements at, for instance, disambiguation pages, e.g. at Sarah Brown: "charity director, wife of former British Prime Minister Gordon Brown" and "American middle distance runner", plus often some type of time indicator (year of birth between brackets etc). Similar disambiguating statements may appear at the top of article pages, for instance implemented by {{this}}. These I'd call "extended" disambiguators; they are, afaics, the closest thing we have to Wikidata descriptors.
Thinking a bit further, it is clear how different Wikipedia's system for disambiguation (i.e. the system for getting readers to the content they are looking for) is compared to, for instance, Google's (which is the prevailing model on a broader scale). The systems are completely different, but are actually extremely complementary (Wikipedia's success owes quite a lot to that complementarity). Wikidata-based search systems are a kind of hybrid: they generate a kind of disambiguation pages (if looked at from the Wikipedia angle), or a sort of Google-like search for small screens (if comparing to the market leader for such search operations) (tempted now to go on a tangent here about Jimbo Wales's ideas). But they're Wikidata, not Wikipedia, so I'd title them "Wikidata search", or whatever else that is not "Wikipedia", and look for fruitful complementarity instead of merging two fundamentally incompatible systems, which would be going completely and utterly nowhere.
As for the short descriptions under article titles of Wikipedia content pages: the development of these should be steered by those who actually use them. Within Wikipedia, the apparent overlap with existing extended disambiguators could be exploited and expanded. But no live harvesting from another project (Wikidata) resulting in mixed content, for which neither project can be the ultimate stakeholder. It is extremely divisive (i.e. steering for disagreement between sibling projects) to suggest otherwise, and WMF spokespersons who keep harping on this have no place in this discussion: practically, the tactics are a sort of "divide et impera" (divide and rule), whether WMF spokespersons are aware of that or not (I prefer to think they are not, but in either case it should stop).
For VE and other uses of Wikidata descriptors: either it is some sort of disambiguator/search use (in which case: OK if clearly indicated as a "Wikidata" aid, see two paragraphs above), or it is something that should be developed in Wikipedia (in order to avoid mixed content being displayed as if it were all Wikipedia; the in-Wikipedia development should be steered by those actually using the systems). I think that, for these other uses, which of the two routes sketched above applies can usually be clarified in a few steps. --Francis Schonken (talk) 08:38, 1 October 2017 (UTC)
- Expanding on "the apparent overlap with existing extended disambiguators could be exploited and expanded" as applied to the {{this}} (i.e. {{About}}) template:
- currently "109,000+ pages" contain information in this format:
- {{About|*Use1*[|...]}}
- Which results in:
- This page is about *Use1*. For other uses, see [...]
- being displayed on these 109,000+ pages.
- It is my contention that these 109,000+ "*Use1*" descriptions are the short descriptions we are looking for:
- First step: these 109,000+ short descriptions can be used immediately on these 109,000+ pages in mobile/app view: just leave out the preceding "This page is about" and the ensuing ". For other uses, see [...]", and you have a first base to start from (a base of 109,000+ descriptions not being too bad to get started; also, these 109,000+ pages are exactly those which are most in need of a short description, since they are ambiguous with another similar term). A harvesting sketch follows at the end of this thread.
- Second step: it is incredibly easy to program this template so that, in normal Wikipedia surroundings, nothing shows up when the suggested "For other uses" page is a redlink (in normal screen view). The template could then still be used for giving a short description of the article which can be harvested for app/mobile view; and if the disambiguating or variant page gets created later, nothing has to be done in Wikipedia normal view to make the DAB sentence pop up at the top of the page.
- Pages currently using similar templates (e.g. {{for}}, {{other uses}}, ...) could be integrated into the system, e.g. by allowing an "|about= ..." parameter, or by merging their content into the {{about}} template once that template has been given a short description of the content of the page.
- currently "109,000+ pages" contain information in this format:
- --Francis Schonken (talk) 10:00, 1 October 2017 (UTC)
- At face value, looks like a good idea. I see no significant down side, though I have not searched very hard... Some work may be needed, but probably not difficult work. Cheers · · · Peter (Southwood) (talk): 12:17, 1 October 2017 (UTC)
- There are over 121000 pages using {{for}} and 45000 pages using {{other uses}}, a total of 166000 pages which will have to be checked and where appropriate, converted to {{about}}. I have no idea of what percentage will need a short description/disambiguator, but the point here is that this would be a good fix where necessary. It would improve the encyclopaedia consistently, while remaining entirely within Wikipedia both in terms of control and policy compliance. · · · Peter (Southwood) (talk): 17:21, 1 October 2017 (UTC)
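A minimal sketch of the harvesting step discussed above, assuming the short description is the first unnamed parameter of {{About}} (the page title is illustrative, and the naive regex ignores nested templates and named parameters):
<syntaxhighlight lang="python">
import re
import requests

API = "https://en.wikipedia.org/w/api.php"

def about_description(title):
    """Fetch a page's wikitext and pull the first unnamed {{About}} parameter."""
    resp = requests.get(API, params={
        "action": "parse", "page": title, "prop": "wikitext",
        "format": "json", "formatversion": 2,
    })
    wikitext = resp.json()["parse"]["wikitext"]
    # Match {{About|*Use1*|...}} or {{About|*Use1*}}; stops at the first
    # pipe or closing brace, so nested templates are not handled.
    m = re.search(r"\{\{\s*[Aa]bout\s*\|\s*([^|{}]+)", wikitext)
    return m.group(1).strip() if m else None

# Illustrative title; any of the 109,000+ pages carrying {{About|...}} would do.
print(about_description("Mercury (planet)"))
</syntaxhighlight>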
Example of blatant Wikidata vandalism affecting multiple enwiki articles at once and for hours
Yesterday, the Wikidata item for the country Romania (not really an obscure topic I would think, one that is used in thousands of other Wikidata items) was moved (english label changed) to Moldavia, with vandalism to the description (!) and alias as well. This was only reverted nearly three hours later[1]. This is pretty quick by Wikidata standards, e.g. yesterday as well Fernando Alonso (another high profile article) was vandal-moved in English, Spanish and Catalan for more than two hours[2], while more obscure articles take about 8 hours to be reverted (labeling a living person a "fascist" seems like a quite obvious case of vandalism[3]), if they get spotted at all (I'm watching another high profile BLP, with articles in over 70 languages and more than 4000 pageviews per day on enwiki alone, which has been vandalized on Wikidata in September and is unlikely to be reverted any time soon).
Anyway, back to the issue at hand: the Romania vandalism was reflected in quite a few enwiki articles, including the UNESCO world heritage sites but basically every infobox that fetches information from Wikidata and includes the field "Romania" (biographies, location of artworks, observatories, ...). The end result was similar to what you can see in the images; in these cases, the country name was changed, but also the wrong map was shown, and the red dot location indicator was somewhere in the middle of the page instead of in the infobox. Basically, using Wikidata makes vandalism on many articles much easier, and is almost guaranteed to remain in the articles for a lot longer as well. Coupled with the recent changes delay (so even if you have Wikidata in recent changes enabled, chances are you wouldn't see this happening) and the problem that these don't appear in the page history, and you end up with a situation which is beneficial to vandals and negative for recent changes patrollers. Fram (talk) 07:46, 18 October 2017 (UTC)
- This one is the real problem. High-profile articles on Wikidata cannot be protected indefinitely (at least not in large numbers) because this would block small projects (where editors typically are not autoconfirmed on Wikidata) from linking these articles on Wikidata, adding descriptions in their language and moving them. I raised the topic on the Wikidata Project Chat some time ago (I believe in September, difficult to find now), and the only reasonable suggestion was to run an anti-vandal bot, but nobody volunteered to write the bot, and the topic was eventually archived. I think indeed that until the vandalism problem in high-profile articles has been resolved, integration with Wikidata will remain very limited.--Ymblanter (talk) 08:13, 18 October 2017 (UTC)
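A minimal sketch of what such an anti-vandal bot could look like, polling the wbgetentities API for a hand-maintained list of high-profile items (the item IDs, polling interval, and plain-print alerting are all illustrative; a real bot would revert or notify a human):
<syntaxhighlight lang="python">
import time
import requests

API = "https://www.wikidata.org/w/api.php"
WATCHED = ["Q218", "Q142"]  # illustrative high-profile item IDs

def english_labels(ids):
    """Return {item id: current English label} via wbgetentities."""
    resp = requests.get(API, params={
        "action": "wbgetentities", "ids": "|".join(ids),
        "props": "labels", "languages": "en", "format": "json",
    })
    return {qid: ent.get("labels", {}).get("en", {}).get("value")
            for qid, ent in resp.json()["entities"].items()}

baseline = english_labels(WATCHED)
while True:
    time.sleep(300)  # poll every five minutes
    current = english_labels(WATCHED)
    for qid, label in current.items():
        if label != baseline.get(qid):
            print(f"label change on {qid}: {baseline.get(qid)!r} -> {label!r}")
    baseline = current
</syntaxhighlight>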
- Possibly a finer grained protection on Wikidata might be useful. Or a completely different mechanism. All the best: Rich Farmbrough, 19:36, 18 October 2017 (UTC).
- An anti-vandalism bot is running and can be extended. It is d:User:Dexbot. --Lydia Pintscher (WMDE) (talk) 13:51, 19 October 2017 (UTC)
- Hmm, first example I looked at, [4], is a helpful, good IP edit (or certainly not the kind of vandalism edit that needs reverting!) being reverted by the bot, while what for humans looks like blatant vandalism remains undetected. This as well seems like an incorrect bot edit. Does Wikidata have a WP:BITE policy? Perhaps it needs one... Fram (talk) 14:10, 19 October 2017 (UTC)
- Seems more like a problem with the ORES algorithm. I've reported it on d:Wikidata:ORES/Report mistakes. Jo-Jo Eumerus (talk, contributions) 14:39, 19 October 2017 (UTC)
- Good, tnx Lydia.--Ymblanter (talk) 17:06, 19 October 2017 (UTC)
Persondata-like system
I got involved in the last stages of the deprecation of persondata, that is, the stage where we were hammering out a system to actually completely remove all persondata from mainspace. I rather tried to slow it down, or at least impede an uncontrolled deletion, thinking we might still need this some day. But the reasons for removal were overwhelming, and thus we eradicated uncountable hours of work by thousands and thousands of Wikipedians. Sure, the system that is proposed now will be "different" in several respects, but the reasons for not having it will, in whatever form we have it, still overlap to a great extent with the reasons for eradicating the persondata system (and those reasons were sound; nobody doubted that).
Long story short: I don't think we need a system that knowingly or unknowingly mimics the old persondata (again, even when several people will contend it is "different" in many respects). There are too many reasons for not having it, without even speaking about the insult to the Wikipedia editing community: first they were asked to provide and maintain data for a system that on most interfaces didn't show up; then the content in that system was completely and utterly eradicated, flushing countless man-hours down the drain; and now there is talk about embarking on a similar adventure – to which my answer would be: no, no and again no, whatever we do, not something that has too many similarities to that debacle. --Francis Schonken (talk) 08:38, 1 October 2017 (UTC)
- It was extremely annoying, given the amount of work I had put into systematising alternative names, in particular. I'm not sure if the data is stored anywhere retrievable. All the best: Rich Farmbrough, 19:29, 18 October 2017 (UTC).
- @Rich Farmbrough: There is a wikidata game that uses the info from persondata, which implies that the data is still available. Thanks. Mike Peel (talk) 19:31, 19 October 2017 (UTC)
Wikidata in recentchanges
Wikidata has been removed from recent changes in Commons and ruwiki, will be removed from a dozen other wikis (including Italian, French, and Swedish), but remains enabled on enwiki for the time being.
Fine, but it is utterly useless here. At the moment, I have to restrict recent changes to mainspace only, and manually override the limits to show 5000 edits instead of the default maximum of 500, to get some Wikidata changes. Basically, when I ran "recent changes" at 08.01, the most recent Wikidata change displayed is from 07.32, or half an hour old. For recent changes, this is utterly pointless (the standard "max" of 500 changes only goes back 10 minutes now), and makes recent changes on enwiki useless to catch Wikidata vandalism.
This isn't the first time this has happened, in my experience the delay fluctuates between 2 minutes at the very best to 24 hours at the worst.
As long as this problem occurs, every claim of "but you can check Wikidata changes from your watchlist and recent changes" becomes basically void, as it means that most changes won't be seen, or at best will be seen much too late. Fram (talk) 08:15, 17 October 2017 (UTC)
Delay now nearly an hour (09.54, most recent Wikidata change 08.58). The same applies to the Watchlist, but is of course less obvious there because it depends on having a recent Wikidata change to an item on your watchlist. Fram (talk) 09:58, 17 October 2017 (UTC)
- That is fairly terrible. If this isn't fixed soon (I hope it is), we will have to stop accepting data from Wikidata. Or at least apply the same time delay to updates from Wikidata that our monitoring feeds have. —Kusma (t·c) 11:31, 17 October 2017 (UTC)
- "will be removed from a dozen other wikis" - citation needed. If you want to catch Wikidata vandalism immediately, then you can always use the recent changes page on Wikidata. For most cases of watching changes to the content of infoboxes etc., then a delay of this kind of length isn't vital (particularly as pages are cached, so any changes from Wikidata might not show up immediately anyway unless the page is purged). You can see the dispatch stats on Wikidata at d:Special:DispatchStats. Mike Peel (talk) 12:10, 17 October 2017 (UTC)
- You don't understand it, do you? First off, for the dozen further wikis where this will happen, you could look at the relevant phabricator ticket, where dba jcrespo has today announced:
"Given the above data, I would like to disable recentchanges from wikidata (temporarily), purge and defragment on the following wikis:
bgwiki itwiki svwiki zhwiki bewiki cewiki dawiki hywiki ttwiki frwiki arwiki cawiki huwiki rowiki ukwiki
These will delete more than 1 million rows, and have 90%+ rcs coming from wikidatawiki." Replying to this, admin Reedy said "I think we should just disable it everywhere till Wikidata confirm it's fixed."
- So: not a dozen but 15 other language versions, including the three I mentioned; or all of them, depending on which opinion prevails in the end. Back to the issue at hand: "If you want to catch Wikidata vandalism immediately, then you can always use the recent changes page on Wikidata." But that's not what the main Wikidata-supporters are always claiming, is it? The idea is that Wikidata can be safely used here, because you can patrol the changes here, on your watchlist and recent changes (though not in page history). To now dismiss these concerns with "but you can watch the changes there as well" is quite a big leap, and a position which will be very unpopular with many editors who may reluctantly accept Wikidata data (in infoboxes and so on) otherwise. Requiring people to watch two watchlists or two pages with recent changes on different sites, and without any indication on Wikidata Recent Changes whether these items even have an enwiki page, will most likely not fly at all. "For most cases of watching changes to the content of infoboxes etc., then a delay of this kind of length isn't vital" That's not the problem though, is it? The delay means that not a single recent changes patroller, even if they have "Wikidata" enabled, will ever see these changes. Basically, this delay completely disables one of the two main checks enwiki has of Wikidata content, and makes it harder to check on the watchlist as well (as you can't just check new changes at the top, but also have to check for suddenly appearing changes from 1/2 hour or longer ago). Whether the changes themselves would be visible in the articles before they would reach the watchlist isn't clear to me: if so, it would make it even worse. I note from your link that enwiki is the slowest to get the changes (presumably because we are the largest?). The lag described at that page doesn't seem to match the end result at enwiki though; it now claims a 3-minute lag (down from a 9-minute lag just before), but in recent changes the most recent Wikidata entry is 6 minutes old, still way off the 500-changes limit (standard is 100 most recent changes, 500 is the most the dropdown provides, 1000 is the most the page happily accepts, and 5000 is the most I can force if necessary). Fram (talk) 13:19, 17 October 2017 (UTC)
- @Fram: Probably the more important point however, Fram, is that in the longer term this issue is ultimately likely to have a highly positive outcome. People have been complaining for at least two years that activating Wikidata changes was impractical because all it did was create a flooded watchlist with changes that were irrelevant to the en-wiki articles they were actually interested in. The current difficulties have (at last) now forced that issue to the top of the Wikidata dev team's agenda.
- Already there was a trial fix being piloted on el-wiki, to only show changes in statement-values or descriptions that were actually accessed by the page.
- Active study of the present issues is leading to further fixes, eg to limit changes that have very very wide effects; also some coding tweaks to templates and lua modules to reduce statement accesses that are not in fact needed.
- It's now a priority to make sure only relevant Wikidata changes are propagated to recent changes. An end to the flooding should make Wikidata-activated watchlists (at last) finally usable. Jheald (talk) 22:32, 17 October 2017 (UTC)
- The more important and positive point is that more people will probably realise that Wikidata isn't a cost-free solution and that not relying on Wikidata is the more robust, intuitive, maintainable and easy solution (easy, not lazy, that is; relying on Wikidata is the lazy solution). Fram (talk) 04:30, 18 October 2017 (UTC)
Delay is now 1 hour and 23 minutes (according to this, thanks to Mike Peel for showing me that page) and rising. It has been less than 2 minutes, but has been steadily increasing again. Other wikis have similar issues (I have seen it at 44 minutes for Swedish wiki), but enwiki is most often the slowest. Fram (talk) 11:38, 18 October 2017 (UTC)
It has gone up to about 2 hours and 20 minutes; but is now slowly decreasing again (currently at 2 hours 15 minutes). So this wasn't just a spike or glitch, but a persistent problem. At the moment more than 100,000 changes have not been dispatched to enwiki yet. Now, this may not seem like a problem, as it means that vandalism isn't dispatched here either; but a) it won't appear on recent changes and the like, and b) if it gets noted and corrected on Wikidata anyway, the correction will also take hours to get here (just imagine the frustration of someone spotting an error or vandalism coming from Wikidata, correcting it there, and then purging and refreshing fruitlessly for hours on enwiki to get rid of the vandalism and to see his correction; and compare this to the situation with errors and vandalism directly on enwiki). Fram (talk) 14:30, 18 October 2017 (UTC)
The decrease swiftly ended; it was still above 2 hours last night and was above 3 hours this morning, having now again dropped minimally to 2 hours and 51 minutes. This is a persistent and serious problem, rendering Wikidata changes totally invisible in recent changes and for all purposes invisible on long watchlists (my watchlist screen is filled with changes from the last 40 minutes; I would have no reason to scroll down to see whether some Wikidata entry has suddenly appeared among the already-checked changes from 3 hours ago). Fram (talk) 08:20, 19 October 2017 (UTC)
And now up to 4 hours and 19 minutes. Time to have that RfC, me thinks. Fram (talk) 12:39, 19 October 2017 (UTC)
@Whatamidoing (WMF): I think you'll be interested in this discussion. -- Magioladitis (talk) 16:41, 19 October 2017 (UTC)
Update: We got the dispatch lag down to a few minutes now, as it should be. Seems there was a hiccup, but we'll look more into it tomorrow. --Lydia Pintscher (WMDE) (talk) 22:21, 19 October 2017 (UTC)
The one time on Saturday I checked this, it was a lag of more than 1 hour. On Sunday it looked okay again, but this morning it was constantly between 30 minutes and 1 hour for a few hours before it dropped now to 20 minutes. Seems to be a constantly returning problem, not a one-off hiccup. Fram (talk) 11:40, 23 October 2017 (UTC)
Relation to dictionary definitions?
A discussion here got me thinking: what is the difference between the descriptor and a dictionary definition? Should the Wikidata descriptor be identical to the Wiktionary definition for a term that appears in that dictionary? Does the policy Wikipedia:Wikipedia is not a dictionary come into play when what we're obviously trying to do here is "enrich" Wikipedia (an encyclopedia) with short dictionary-like definitions? --Francis Schonken (talk) 08:23, 21 October 2017 (UTC)
- A dictionary-type definition as part of an article is not the same as a dictionary-type definition as the article, but there are only some articles for which a dictionary-type definition is appropriate, or in some cases even possible. However, I think that is what we should aim for as the descriptor. · · · Peter (Southwood) (talk): 20:44, 27 October 2017 (UTC)
the override feature is worthless / Related Pages / WMF putting images on articles
Above, Fram asked that we decide which image, if any, to use; no images are sent by default, ever (I hope, or has the WMF done this as well somewhere?)
File:Andrex_puppy_(1994_advert).jpg: The Andrex Puppy, seen here in a British advertisement from 1994.
Unfortunately the answer is YES. In fact it took me two minutes to find that the WMF literally slapped an advertisement for a specific brand of facial tissues on our Facial tissue article. You can (maybe) see the problem at the bottom of the mobile view of Facial_tissue. However the Related Pages feature selects "see also" articles (and images) dynamically, so you may see different article links with different images. But if you randomly go to articles on "product" type items, it shouldn't take long to find one with a specific brand being promoted on the page. The Related Pages feature links to articles containing many of the same words, so companies that sell that product are likely to be selected. You can also get some seriously bad BLP violations.
- The community previously complained about this issue, but the WMF apparently didn't consider the concerns important. Not even when people were finding serious BLP violations on biographies. (I think one politician was linked to fascism, and if I recall correctly, Pig-faced women appeared somewhere bad.) They decided the "solution" was to hide the feature from editors by removing it from desktop view, and to offer the same worthless "override" solution being proposed here. The Related Pages that are given change pseudorandomly, so the only way the override feature would actually fix the problem is if we run a bot to set three (blank) overrides on EVERY article.
Returning to the original topic: If the WMF insists on the "override keyword" for wikidata descriptions, then I now propose running a bot to apply the override on every article. The bot can either copy the lead sentence, or leave it blank with a hidden comment saying to fill it in. The same bot may as well apply three Related Pages overrides at the same time. Alsee (talk) 20:09, 29 September 2017 (UTC)
- Hang on, didn't this note that non-free images would be excluded from that feature? That Andrex image is tagged as non-free - quite apart from advert concerns, it shouldn't be showing up there. Nikkimaria (talk) 20:27, 29 September 2017 (UTC) @Deskana (WMF): seems to have been active on the phab task linked. Nikkimaria (talk) 20:33, 29 September 2017 (UTC)
- Nikkimaria, I'm pretty sure that appending a ping to an existing comment like that doesn't work. If you look at the diff for your edit you'll see that the blue diff-text does NOT contain a proper timestamp. Pings are only sent on a signed edit, and the software doesn't see a proper signature in that diff. I'll add your intended ping to @Deskana (WMF):. Alsee (talk) 21:09, 29 September 2017 (UTC)
- @Nikkimaria and Alsee: I don't work in this area any more, but I took a look and managed to fix it. See my comment below. --Dan Garry, Wikimedia Foundation (talk) 10:09, 30 September 2017 (UTC)
- Yes, quite aside from the advertising issues, this is a clear violation of our image use policy, as we have no fair use rationale for this use of the image. —David Eppstein (talk) 20:41, 29 September 2017 (UTC)
- Holy crap, it didn't occur to me that this was a non-free image. WTF?! I saw the Phab task that supposedly prevented that.
- Also, I had added the image to this section without noticing the non-free issue. I tried to remove it, but edit conflicted with Ymblanter fixing it. (Thanks Ymblanter.) Replaced with link to file page. Alsee (talk) 21:01, 29 September 2017 (UTC)
- Apologies for reverting the second time, I misread the diff. Now I restored it.--Ymblanter (talk) 21:07, 29 September 2017 (UTC)
- (ec)I thought phab:T124225 was resolved with the result that our non-free content policy won against the wishes of some of WMF's various coding teams. Has anything changed there? —Kusma (t·c) 21:09, 29 September 2017 (UTC)
- The clear attitude on display here by WMF is that whenever we come to a decision or consensus that goes against the wishes of the coding teams, they will go ahead and follow their wishes anyway. I can predict that their response will be that the phab thread only talked about "mobile apps, RelatedArticles, Gather, and mobile web search" and that the related article links on the mobile view are something other than these things and therefore not covered by this decision. —David Eppstein (talk) 21:21, 29 September 2017 (UTC)
- I am disappointed but not surprised by the WMF's response on the Wikidata descriptions... however I bet this non-free image issue is either a bug or (at worst) someone unaware who made an undiscussed change. I get the impression that WMF staff were OK on the non-free image issue: they explicitly built code to restrict when non-free images would be returned. (There is explicit consensus that non-free images can appear in the Popup-article-preview feature.) Alsee (talk) 22:34, 29 September 2017 (UTC)
- @David Eppstein: It's a bug caused by a bad template. Please don't jump to conclusions. --Dan Garry, Wikimedia Foundation (talk) 10:09, 30 September 2017 (UTC)
- I just created a Phab listing for the non-free image appearing in RelatedPages. Alsee (talk) 22:20, 29 September 2017 (UTC)
I don't work in this area anymore, but I took a quick look. The non-free page image associated with the Andrex article was due to an issue with Template:Non-free television screenshot, which was not tagging the associated images as non-free. I fixed the issue with that specific template. Maintenance of content-related templates is normally outside Foundation jurisdiction, so I encourage other people to take a look and see if the problem exists elsewhere. This is why we need structured data on Commons, by the way; this problem never would've happened if the data were properly structured. --Dan Garry, Wikimedia Foundation (talk) 10:09, 30 September 2017 (UTC)
- @Deskana (WMF): (a) we do have such a data structure, the coders just chose to use something else instead of our Category:All non-free media. (b) Wouldn't it make more sense to fix the issue by adding the invisible code you're looking for to Template:Non-free media instead? Adding specialised code to lots of individual non-free licensing templates seems not the best way to do this. —Kusma (t·c) 16:19, 30 September 2017 (UTC)
- The Non-free media template is doing that already. This is just a conflict between two templates. Dan made an intermediate fix, but that fix should not be necessary long term. I suggest people stop fingerpointing and stop blowing everything up. No one is able to keep up with what is going on right here; it fires off in every direction, and this whole discussion is becoming completely useless this way. —TheDJ (talk • contribs) 17:26, 30 September 2017 (UTC)
- Deskana (WMF), thanks. I suspected this was some unintentional flaw. However I'm extremely surprised and confused at the method the WMF is using: an invisible, redundant, and apparently-undocumented-on-EnWiki "span" class. Is there any chance that the WMF could use the (obvious) Category:All non-free media? That's how we track non-free files. I expect more files will continue to slip through the cracks based on the span method. Alsee (talk) 02:05, 1 October 2017 (UTC)
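For comparison, a category-based check is a single API call; a minimal sketch, assuming the file page is tagged on enwiki (the file title is the one from the example above):
<syntaxhighlight lang="python">
import requests

API = "https://en.wikipedia.org/w/api.php"

def is_non_free(file_title):
    """True if the local file page is in Category:All non-free media."""
    resp = requests.get(API, params={
        "action": "query", "titles": file_title, "prop": "categories",
        "clcategories": "Category:All non-free media",
        "format": "json", "formatversion": 2,
    })
    pages = resp.json()["query"]["pages"]
    # The category list is only present when the page is in the category.
    return bool(pages and pages[0].get("categories"))

print(is_non_free("File:Andrex puppy (1994 advert).jpg"))  # expected: True
</syntaxhighlight>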
Note to WMF
You wrote here about "we want to give English Wikipedia control of the descriptions" and here about "the people at the WMF getting more nervous about community control".
Control over en-WP content is not WMF's to "give". If WMF staff are "nervous" about the fundamental deal in the movement, then you should train them better. Control over en-WP content published by WMF as "Wikipedia" resides solely with the en-WP editing community. That is the fundamental deal in the movement.
The sprawling conversation above is exactly the kind of thing we get at AN/ANI when someone is being seriously out of step with community norms and we end up with drama exploding in all kinds of directions - when people are persistently doing what they can do, and not what they should do, in big ways. In this case it is more explosive because it is not a person, but an organization. In this case it is people using Wikidata in ways that it can be used, but not using it as it should be used in en-WP, in ways that respect the policies and guidelines here.
This conversation would be totally different, if it were being conducted on appropriate foundations - if it were 2015 and the WMF had come to en-WP and said - "Hey short descriptions would be really useful in a bunch of ways. We have a field in Wikidata we could use, but it would be best if en-WP content came from en-WP. How can the en-WP community help out here?" That is an entirely different conversation. That is an appropriate conversation.
We can still have it. There are people here willing to try to work within the en-WP community and in conversation with the WMF to generate en-WP native short descriptions that will be useful to WMF, but the conversation must happen on appropriate foundations that respect the respective roles of WMF and the en-WP community within the movement.
Please:
- Acknowledge that WMF overstepped in unilaterally and systematically changing en-WP content.
- Express an understanding that if the WMF wants to do this kind of thing in the future, it will get consensus beforehand.
- Remove the Wikidata descriptions from the app content.
The WMF is creating a sort of constitutional crisis that is entirely unnecessary. Rushing ahead to solve the problem, without dealing with what has caused the problem, is foolish. We should never be in this situation again.
Please reply and say yes to the three things above. Please. Jytdog (talk) 19:38, 2 October 2017 (UTC)
- Hi, Jytdog. I agree with you in a lot of ways. I think there should be way more communication and sincere partnership between the WMF product teams and the communities. That's something that I've been working on for a while, as the PM for Community Tech. I agree with you about what the 2015 conversation should have been, and I'm currently talking with people on the WMF product team about how we get closer to that, as an organization. That's why they asked me to be the lead on communication here, so that I can help us move in that direction. It's a process. So I wouldn't use the same words that you do, but yeah, this should have been done better in 2015, and I'm trying to help it be done better now.
- For #2, I have a similar answer. We're working on it. I know it doesn't mean very much when I say that; the thing that matters is what we actually do, from now on.
- For #3, I've answered that before. Short descriptions are very useful for the app. We're currently talking with folks on this page about how to help transition from a Wikidata-controlled system to an English Wikipedia-controlled system. During these conversations and while we're implementing the solution that we agree on, we're not going to take the existing descriptions down, because that would damage the users' learning experience, for the sake of making a point. I don't think that's necessary, or a good thing to do.
- I know, my answers are more complicated than "yes", and that's not what you asked for. But these are my honest answers, if that helps. -- DannyH (WMF) (talk) 02:42, 3 October 2017 (UTC)
- How much can you be "working on" item 2? We want you to agree to follow the rules and you are "working on it"? That does not sound like good faith; it sounds like you are weaseling for loopholes or stonewalling. · · · Peter (Southwood) (talk): 06:10, 4 October 2017 (UTC)
- @DannyH (WMF):. That's not acceptable. If the subject of a biography (BLP) objects to being labelled as "Jewish", for example, or any other contentious ethnic/religious/racial categorisation, and there is no strong sourcing to support that label, then our policy is that it "should be removed immediately and without waiting for discussion" – WP:BLP, quoting "WikiEN-l Zero information is preferred to misleading or false information" Jimbo (@Jimbo Wales: FYI). Have you forgotten the Wikipedia Seigenthaler biography incident already? When the information is curated on Wikipedia, we see changes on our watchlists, and can insist on sourcing. There is no sourcing for Wikidata labels. It is a fundamental mistake to draw possibly contentious data from a source that is incapable of showing a reference, and it's stupid mistakes like this that make it difficult for any of us who are working on legitimate ways of using Wikidata in the English Wikipedia. You will take the existing descriptions down – despite what you think about "users' learning experience" – because sooner or later, you'll put the WMF in the position of being sued by an aggrieved BLP subject and I'll make sure that they know you were warned about the potential danger and chose to do nothing. --RexxS (talk) 14:43, 6 October 2017 (UTC)
- @RexxS: Hold on. I thought there was consensus that sources were not required for infoboxes or the above-the-content lead section if the material was supported by sources in the main body of the article. It is straightforward for editors to check if there is sourcing for the claim in question in the rest of the article. (Or, for that matter, in an appropriate statement on Wikidata). The Wikidata description can be changed right now if we got a complaint, and I am sure WP:OFFICE would not hesitate in doing so, if a complaint came in. So I can't see any danger of any problem that would be outwith Section 230. The proposed opt-out mechanism would allow an even more direct approach -- it would make it very straightforward to put in an override description here, and lock it. Jheald (talk) 20:35, 6 October 2017 (UTC)
- @Jheald: With all due respect, I don't think I'll be holding on. Please look at the description field of any article on Wikidata. There is no guarantee that a source for that is present on English Wikipedia. Nor is there any way for Wikidata to store a source for any description. I'm not talking about editors; I'm talking about readers. How would the average person who reads a BLP about themselves on the Wikipedia app and sees themselves labelled as something they strongly disagree with go about editing that? Have we forgotten already about the alt-right nutjobs who go around adding Category:People of Jewish descent to everybody they didn't like? What if they find Wikidata's description field? How long before we see Bernie Sanders described as "Jewish-American Politician"? It's anything but straightforward for 90+ percent of Wikipedia editors to find where the description on the app comes from, let alone someone who doesn't even edit. How would they even know where to complain? I'm sorry, but that's just not good enough. We must use the precautionary principle in our descriptions of living people, and text which has no chance of being verified doesn't belong in the second line of every BLP displayed on the Wikipedia app. Nor, for that matter, does the decision on whether text is displayed in any version of English Wikipedia belong with a staffer, rather than the community. Especially when it's directly in violation of our BLP policy. --RexxS (talk) 21:10, 6 October 2017 (UTC)
- With luck, we have pointers to direct concerned individuals to WP:BLP/H, or WP:BLP/N, or the article talk page. If you're worried that it's too hard for them to find WP:BLP/H, then that's a different question: what we can do to make WP:BLP/H better signposted or easier to find.
- On the "difficulty to edit" point, that is precisely what the latest updates to the system were designed to make easier -- direct editability of the description from the app, or (in future) from desktop Wikipedia. (CSS patches may even already be available for this, for wikis currently supporting the broad use of these descriptions)
- As to your point about unverified text in the second line -- well, at the moment there is no requirement for inline sourcing for text in the second line of desktop Wikipedia, as I said above, so long as it is supported by the rest of the article.
- It's also quite a slide to go from "There is no guarantee that a source is present" to "no chance of being verified".
- Besides, if one did want to create a requirement for sourced support before particular keywords were allowed in the description, it would actually technically be much easier to implement on Wikidata, where an automated script could look for an appropriately sourced supporting statement much more easily than having to parse the natural language of a complete page of wikitext. Jheald (talk) 00:06, 7 October 2017 (UTC)
- @Jheald: Why do we have to trust to luck to avoid another Wikipedia Seigenthaler biography incident? I'm now looking at a BLP on the Wikipedia app. Please tell me how I find these pointers on the article talk page. Or even find the article talk page. You can't.
- Where is the direct editability of the description from the app? Looking at a couple of BLPs, no matter what I try, I just get "Sorry, this page cannot be edited anonymously at this time". That's a big help to anybody looking at their own BLP. Ok I found one that an unregistered editor could edit: Billie Jean King American tennis player. How would I edit "American tennis player" if it changed to something I found contentious? You can't.
- The desktop version of Maria Callas doesn't have "American-born Greek operatic soprano" as its second line, or anywhere else (the app does!). Are you sure you understand where this problem is occurring? Callas held both Greek and American citizenship for most of her life, so the description "American-born Greek" is contentious, by any standards. Not to mention wholly unsourced.
- It wouldn't be any easier to implement on Wikidata than on Wikipedia, because neither you nor I (nor the devs) are capable of writing an algorithm that detects contentious statements where there is no sourcing available to test them against. Even if you could, how would the fact that this magical script has found appropriate sourcing be recorded? There's nowhere in the Wikidata entry to store a reference for the description, much less the fact that a script had found one. --RexxS (talk) 00:41, 7 October 2017 (UTC)
- @RexxS: To flesh out what I was suggesting, if there was a list of words one was sensitive about in the description -- eg "Jewish" -- one could test for whether there was a supporting statement on the item with an appropriate value, eg religion or worldview (P140) = Judaism (Q9268) (or a subclass of it), and what kind of source (if any) that statement had on it. One could then use that to auto-classify whether the word "Jewish" in the description on the face of it appeared potentially acceptably supported, not well enough supported, or unsupported (a sketch follows after this comment).
- But that would be for the future. My more fundamental point is that if we look at eg Billie Jean King, there is no sourcing in the first line for "is an American former World No. 1 professional tennis player". But it is acceptable, without specific inline cites, because it summarises facts in the rest of the article that are required to support it. Similarly Maria Callas begins "was a Greek-American soprano", with no inline cite to back that up. Yes, the article doesn't begin "American-born Greek operatic soprano", but somebody could have edited that in, and they wouldn't have been required to provide an inline cite to back it up. So why make such a fuss that a definition pulled from Wikidata might say that and not have the equivalent of an inline cite? The two cases seem directly parallel to me.
- As to the direct editability, my understanding was that it was the addition of this feature on the Android app that sparked this current discussion. Jheald (talk) 01:07, 7 October 2017 (UTC)
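A minimal sketch of the auto-classification described above, reading the item's claims from wbgetentities and treating a reference as potentially adequate only if it is more than an "imported from Wikimedia project" (P143) snak (the three-way classification rule is illustrative, not a vetted policy test, and the "subclass of Judaism" case is not handled):
<syntaxhighlight lang="python">
import requests

API = "https://www.wikidata.org/w/api.php"
P_RELIGION = "P140"       # religion or worldview
Q_JUDAISM = "Q9268"       # Judaism
P_IMPORTED_FROM = "P143"  # imported from Wikimedia project

def classify_support(qid):
    """Rough triage of whether P140 = Judaism backs a 'Jewish' description."""
    resp = requests.get(API, params={
        "action": "wbgetentities", "ids": qid,
        "props": "claims", "format": "json",
    })
    claims = resp.json()["entities"][qid].get("claims", {})
    for claim in claims.get(P_RELIGION, []):
        value = claim["mainsnak"].get("datavalue", {}).get("value", {})
        if value.get("id") != Q_JUDAISM:
            continue
        refs = claim.get("references", [])
        # Any reference that is more than a bare Wikipedia import counts.
        if any(set(ref["snaks"]) != {P_IMPORTED_FROM} for ref in refs):
            return "potentially acceptably supported"
        return "not well enough supported"
    return "unsupported"
</syntaxhighlight>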
- All words are potentially problematic, machine intelligence is not up to the task of evaluating whether the sourcing is adequate to justify specific wording, and WP:BLP is not optional and something we can put off "for the future". —David Eppstein (talk) 01:59, 7 October 2017 (UTC)
- No, but we may be able to identify that a source has the potential to be adequate, even if we can't guarantee that the actual text is supported.
- To get the descriptions we're going to have to start from somewhere (and we need them right now in searches). We realistically aren't going to build 5 million descriptions, or even x hundred thousand for BLPs, starting over from scratch. It's not helpful to say you wouldn't start from here, when here is where we are. What is more helpful is to think how to move forward from here, in a prioritised, achievable way. Jheald (talk) 02:15, 7 October 2017 (UTC)
- @DannyH (WMF): you may recall the unease that followed President Trump failing to mention the Jews on International Holocaust Remembrance Day because, the White House said, other groups had suffered too. [5] The WMF description follows suit, with "programme of systematic state-sponsored murder by Nazi Germany". It isn't wrong, but it's the product of one person on Wikidata deciding in 2013 to override the academic debate about the definition of the Holocaust, without anyone on the English Wikipedia realizing they had done it. (Please don't reply that therefore we must watchlist Wikidata; no one wants to be forced to take part in another project.) Before the change, the Wikidata description was: "Nazi German genocide of approximately six million Jews during World War II." [6] (It could be worse; at one point it said: "The 6 million Lie".) [7] SarahSV (talk) 22:00, 6 October 2017 (UTC)
- Worth noting that "6 million lie" was reverted within 19 minutes, back in March. Since then anti-vandalism has continued to improve. Increased visibility, less flooded watchlists, watchlist-based reversion, and better integration with en-wiki tools should all help further, for those that do want to do anti-vandalism.
- Also worth noting that the description currently reads "state-sponsored genocide of Jews by Nazi Germany", even if only since last night, and even if as you note some might see this as an overly narrow definition. Jheald (talk) 23:30, 6 October 2017 (UTC)
- Besides, the proposed override mechanism addresses exactly this, allowing en-wiki to supply hand-crafted descriptions for the highest value and most sensitive articles, without requiring such measures for the multitude of run-of-the-mill stubs. Jheald (talk) 23:44, 6 October 2017 (UTC)
- Ymblanter changed it 13 minutes after I posted here. [8] (Jheald, I didn't say anything about an overly narrow definition.) The point is not what it says, but that one person on another website can override the debate. SarahSV (talk) 23:54, 6 October 2017 (UTC)
- (ec) You noted that there was "academic debate" as to the most appropriate definition. At The_Holocaust#Terminology, where we review that debate, we review three potential definitions presented by Gray; and that Niewyk and Nicosia preferred a broader definition. You noted there is a debate; the position of some in that discussion is to prefer a broader definition.
- As to the proposition that "one person on another website can override the debate", this is precisely what the suggested "local override" option addresses, putting ultimate control back in the hands of en-wiki. Jheald (talk) 00:21, 7 October 2017 (UTC)
- More importantly, how is a patroller to know that a particular description may or may not be contentious? The announcement of this idea used two screenshots of Maria Callas at mw:Reading/web/Projects/Wikidata Descriptions. The potential problem was explained (and ignored) on 26 August 2016 on the talk page mw:Talk:Reading/web/Projects/Wikidata Descriptions:
- [Melamrawy (WMF):] "In that specific example, why would the "American born Greek" description be problematic if Maria Callas was alive?"
- [RexxS:] "Because completely unsourced descriptions of a living person's ethnicity are potentially offensive or libellous, or both. Callas was an American of Greek descent and may have rightly objected to being identified otherwise without any supporting evidence. The idea that we might be using somebody's unverifiable judgement of a person's identity is so far from WMF's policy on living people that you really ought not to have to ask why it's a bad idea."
- Nothing I've seen since has led me to believe that WMF staff have any better understanding of "Contentious material about living persons ... that is unsourced or poorly sourced—whether the material is negative, positive, neutral, or just questionable—should be removed immediately and without waiting for discussion." Nor have I revised my opinion that somebody's unsourced opinion (the Wikidata description) is acceptable in any way, shape or form as a description of a living person. No source = no inclusion on enwp. That's not negotiable. --RexxS (talk) 00:12, 7 October 2017 (UTC)
- And yet, we allow free summarisation in the lead paragraphs of an article of the rest of that article. Don't we? Jheald (talk) 00:25, 7 October 2017 (UTC)
- Jheald, I believe you said something above about leads not requiring sources, but they do for anything contentious, including BLP issues. WP:LEADCITE: "Any statements about living persons that are challenged or likely to be challenged must have an inline citation every time they are mentioned, including within the lead." And "Complex, current, or controversial subjects may require many citations; others, few or none." SarahSV (talk) 01:29, 7 October 2017 (UTC)
Concerning vandalism on Wikidata, notably on WP:BLP: the Suga entry on Wikidata has been the subject of unabashed IP vandalism for many days already, without anybody apparently noticing it, with the vandal edits alternating with regular bot-like edits, messing up the whole thing. I was testing an infobox using Wikidata information on that article at the Portuguese Wikipedia, but had to remove it from there. Not only do the Wikidata entries not seem to be properly monitored at all, they stay subject to vandalism/edit wars after the Wikipedia article is protected (as was the case with Suga). The presence of the Wikidata infobox there, with direct links to easily edit the Wikidata entry, was also apparently serving as a magnet attracting vandalism from our project straight into the Wikidata entries, which are outside the Wikipedia watchlist (and which I'm not interested in watching, anyway; I already have plenty of things to follow in my regular projects). As it is now, I do not consider Wikidata capable of being used on WP:BLP infoboxes, at all.-- Darwin Ahoy! 03:57, 7 October 2017 (UTC)
Wikidata values appearing in Wikipedia are effectively bot edits
Various discussion above addresses whether or not Wikidata values should only be included when sourced, or subject to other restrictions or limitations.
Most editors on Wikipedia are familiar with things like BLP issues, what sort of information should be included where, and what does or doesn't need to be sourced. For example information in an infobox doesn't need a source if the information is sourced in the body of the article. We allow new users to edit without learning all of those rules first. We allow that as an expected and accepted part of teaching new contributors how to edit here productively.
Bot edits are subject to far stricter scrutiny than human edits. These kinds of automated mass-changes are expected to affirmatively demonstrate (1) the clear positive value of the class of edits, (2) very high scrutiny on any error rate that might downgrade existing content or require human cleanup, and (3) strong observance of any relevant policies such as sourcing requirements.
Wikidata values appearing on Wikipedia are effectively bot edits built directly into the Wikimedia software. A staggering proportion of edits at Wikidata are bot edits, and the software then performs a bot-like transfer of those bot edits to Wikipedia.
Where Wikidata edits are performed by humans, the software still performs a bot-like transfer of that edit to Wikipedia. Neither the Wikidata edit nor the transfer to Wikipedia is effectively subject to Wikipedia policies. The Wikidata edit is often done by people who have no knowledge of Wikipedia policies, and we are unable to educate/integrate that person to make more appropriate edits in the future. If a new user makes an edit here and we have to clean it up, we're investing in teaching that user how to edit here. If we have to clean up an edit that was bot-transferred to Wikipedia, we have to keep cleaning up future edits. Even if we go to Wikidata and clean it up there, even if we contact that user on Wikidata trying to educate them about what kind of edits we want, Wikidata isn't subject to our policies. (Not to mention the fact that there's no reason to expect that user to speak English.)
If we look at wikidata-on-wikipedia as bot edits, it becomes clear that the concerns raised here aren't new. Wikidata isn't facing random attacks, and it isn't being baselessly held to random higher standards by haters. If the software for displaying wikidata-on-wikipedia were separated out, if it were a bot copying these values onto Wikipedia, and if that bot were submitted as a routine bot-approval request, I think that bot-approval request would go down in flames. Alsee (talk) 14:56, 8 October 2017 (UTC)
- I do not quite understand your allegations. There is a process on Wikidata for bot task approval, quite similar to the English Wikipedia process and subject to community scrutiny, and as a Wikidata crat I personally approved most of the requests. Currently, the vast majority of bot edits are actually transfers of data from reliable databases, which provide sources and would even satisfy the BLP policies of any Wikipedia.--Ymblanter (talk) 15:08, 8 October 2017 (UTC)
- Ymblanter, I'll try to explain it more simply and explicitly for you.
- 1. Someone goes to the wikidata item for city, and adds an unsourced statement that it is: capital_of country.
- 2. The Wikidata community then approves a bot to create a massive number of reciprocal statements.
- 3. That bot sees city says it is capital_of country. It copies that info into the wikidata item for country, saying that country has capital city. The bot helpfully adds a reference for that edit: Stated_in Q(city). Note that this is a circular reference, saying this information was sourced from the wikidata item for city.
- 4. The wikimedia software then copies that information from Wikidata to the EnWiki infobox for country, stating the capital is city.
- I am saying that step 4 is functionally a bot edit, copying the information from Wikidata to Wikipedia. And as an added bonus, independent of the "bot" issue, this bypasses the filter against "unsourced or wikipedia_sourced" information. It was unsourced information when it was added to Q(city), and it did not magically become sourced when it was copied to Q(country). And it was effectively a bot that copied the unsourced information into Wikipedia. There are a massive number of unsourced/wikipedia_sourced statements with WP:circular-references bypassing the "sourced only" filter. I identified over a million wikipedia-sourced statements bypassing the filter[9], and I can't even begin to determine how many circular "Stated in: Q(other wikidata item)" references are bypassing the filter. All I know is that humans and bot-runs have been creating them in large numbers. I ran into a whole pile of them in just a few minutes of browsing an arbitrary list of wikidata items. Alsee (talk) 10:59, 20 October 2017 (UTC)
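For concreteness, the circular pattern described in steps 1-4 above can be sketched as a query against the Wikidata Query Service. This is a minimal SPARQL sketch limited to a single property pair, not a general audit; it assumes the standard WDQS prefixes, with P36 = "capital" and P248 = "stated in" taken from the example above.

# Minimal sketch: find "capital" (P36) statements whose reference is just
# "stated in" (P248) the very item the statement points at -- i.e. the
# circular pattern described above. Limited to one property pair; a
# general scan across all properties would be far more expensive.
SELECT ?country ?city WHERE {
  ?country p:P36 ?stmt .
  ?stmt ps:P36 ?city ;
        prov:wasDerivedFrom ?ref .
  ?ref pr:P248 ?city .    # the "source" is the item the value was copied from
}
LIMIT 100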
- No, I do not think we approve such bot tasks, at least definitely not now.--Ymblanter (talk) 11:07, 20 October 2017 (UTC)
- I don't know what bots are being approved at this very moment; however, the proposal to create "inferred from"[10] explicitly states such bots were "currently" running as of a year ago:
Currently we have bots that automatically add inverse statements. They sometimes use "stated in". I think it would be worthwhile to have a more specific property for the use case. ChristianKl (talk) 10:07, 27 September 2016 (UTC)
And as I said, I came across many of these WP:circular references in just a few minutes when I was randomly skipping through Wikidata items matching EnWiki articles bearing a specific template. So these are far from rare. Alsee (talk) 13:35, 20 October 2017 (UTC)
Wikidata values appearing in Wikipedia are not effectively bot edits
Most of the 48,261,252 registered users are not familiar with things like BLP issues, what sort of information should be included where, and what does or doesn't need to be sourced. Judging by the number of times we have to clean up mistakes, most of the 121,930 active users aren't particularly familiar with those issues either. Here's an example: with very few exceptions, all information displayed in an infobox must also be included in the body of the article, which makes the information subject to the normal sourcing rules, with the caveat that we don't normally repeat citations for summaries such as the lead or the infobox (although even there information must be sourced if challenged). We have editors who have been editing since 19 September 2006 who don't know that.
Wikidata values displayed on Wikipedia are passed through a filter which rejects unsourced data. In that way, they are better sourced than statements made directly on Wikipedia, where the local editor may choose, deliberately or through ignorance, not to include a source. Of course a cite on Wikidata may be bogus, but that applies equally to a cite made directly on Wikipedia. At least when the data comes from Wikidata, we are able to guarantee it's sourced. That means that we insulate Wikipedia from whatever is used to provide the data to Wikidata: if a bot or human adds a source, we get sourced data; if the bot or human doesn't add a source, we don't let it into the infobox. There is nothing bot-like about the process of fetching Wikidata into Wikipedia. The code is written by a dinosaur, not a bot. Whatever policies need to be met, we can write the software to meet them, or make it easy for editors to amend any inaccuracies that they spot in the value returned from Wikidata. When we fix mistakes on Wikidata, we don't just improve things for us, but for all the other 288 active Wikipedias.
We have the ability to make any infobox Wikidata-aware without disturbing a single article already using that infobox ("opt-in"). We can give the editors who curate a particular article the ability to switch the fetching of data from Wikidata on or off, in the infobox as a whole or field-by-field. We can provide links from the infobox directly to the statement on Wikidata where the information is fetched from. With that degree of control available in a suitably designed infobox, there's no doubt that if the Lua modules that actually do the fetching were subject to the same sort of scrutiny as bot applications, they would pass with flying colours. --RexxS (talk) 17:36, 8 October 2017 (UTC)
- I don't agree with you here, RexxS, and I ask you not to obscure the underlying issue: enabling Wikidata in an infobox and then deploying that infobox widely is way more "bot"-like than any kind of normal editing -- a) the coding to set up these infoboxes is advanced, like bot programming, and not something in the realm of non-coders (I haven't invested the time to learn, and am not interested at this time in doing so); and b) it does allow changing one thing (one data field in Wikidata, or one bot run in Wikidata) to change many articles in Wikipedia, which is way more bot-like than normal editing. Enabling something like this should be thought through and approved infobox-by-infobox before deployment.
- And please don't use this old saw about "go fix it on Wikidata"; since you have, the standard responses to that remain the same as always -- a) WP editors are not necessarily volunteering to edit Wikidata (some may; I for one don't); b) this sets up a situation where WP editors need to go to Wikidata to fix policy violations or mistakes, which is essentially blackmail and a hijacking of WP volunteer time; and c) driving WP editors to go edit WD just sets up inter-project edit wars between projects that have different policies and guidelines. Jytdog (talk) 11:52, 10 October 2017 (UTC)
- Advanced coding is not the same as bot work. I agree that there are potential issues around differing policies, but I think that particular point is the one obscuring the underlying concerns. Nikkimaria (talk) 14:24, 10 October 2017 (UTC)
- RexxS wtf? If a human adds unsourced information to an infobox, that's not a bot edit. That is a newbie, and we are willing to invest cleanup work in the hope of them making better edits tomorrow. If a piece of software copies unsourced information (or copies sourced information) from wikidata into wikipedia, that is a bot edit. The fact that someone at WMF shoves that bot into the wikimedia software itself doesn't change anything.
- Regarding your comment
Wikidata values displayed on Wikipedia are passed through a filter which rejects unsourced data
, that filter doesn't work. I just found two vast classes of wikidata statements bypassing that filter. Over a million wikipedia-sourced statements were bypassing the filter via a tools.wmflabs ref. Those could plausibly be added to the filter. I found an unknown but vast number of unsourced statements bypassing the filter via circular refs "Stated in: Q(other wikidata item)". Not only have humans been creating those circular refs, there were bot runs creating them en masse. I invite you to add that to the filter - trying to filter them would pretty much nuke the import of most other wikidata statements. But more importantly, all of this highlights the utter worthlessness of the "sourced-only" option in general. Those are obviously not the only problems. The wikidata community has jack-squat expectations for sourcing or anything else. Alsee (talk) 11:22, 20 October 2017 (UTC)
- Stated in Q [item] is designed to be used when Wikidata has an entry on a particular source, such as a book or journal article, so it would not be a circular reference. Richard Nevell (talk) 20:16, 31 October 2017 (UTC)
- Richard Nevell, I am discussing the references that are circular.
- Aquitaine (Q1179) has capital (P36) Bordeaux (Q1479). UNSOURCED. Added 7 February 2013.[11]
- Bordeaux (Q1479) capital of (P1376) Aquitaine (Q1179). REFERENCE stated in (P248) Aquitaine (Q1179). The information was copied from Aquitaine (Q1179) on 7 July 2015.[12]
- In fact that second diff shows eleven circular references in a row, all "Stated in: (other wikidata item)". Humans and bots have been creating these circular references en masse. The presence of a circular reference means a statement bypasses the infobox filter: the filter that is supposed to import only information with a source (other than Wikipedia). Either we filter out naked "Stated_in:" refs (which would massively nuke any import of wikidata into infoboxes), or we have to acknowledge that the filter is a sham. Alsee (talk) 21:58, 31 October 2017 (UTC)
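The specific case above can be checked with a short Wikidata Query Service sketch (assuming the standard WDQS prefixes; the item and property IDs are the ones quoted above). If it returns true, the "capital of" statement on Bordeaux carries a reference that is nothing more than "stated in: Aquitaine" -- the very item the value was copied from.

# Sketch: does Bordeaux's "capital of" (P1376) statement pointing at
# Aquitaine carry a "stated in" (P248) reference to Aquitaine itself?
ASK {
  wd:Q1479 p:P1376 ?stmt .           # Bordeaux's "capital of" statement
  ?stmt ps:P1376 wd:Q1179 ;          # ... whose value is Aquitaine
        prov:wasDerivedFrom ?ref .
  ?ref pr:P248 wd:Q1179 .            # reference: "stated in" Aquitaine
}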
- Thanks for spotting that, I'll keep an eye out for similar cases where the referencing can be improved. Richard Nevell (talk) 01:25, 1 November 2017 (UTC)
- Richard Nevell, I apologize if I'm over-explaining, but I want to be sure I'm clear here. This is not a stray bad ref I spotted. Editing this one wikidata item, or editing a thousand wikidata items, makes no real difference. New users as well as many of the more prolific wikidata editors and bot operators have been systematically creating these refs en masse. This is standard practice at Wikidata. I found upwards of a million wikipedia-sourced items bypassing the filter because the references don't include the word "Wikipedia", and I can't even begin to determine how many circular refs have been created by humans and bots over the years. I think "upwards of a million" circular refs is a very conservative lower bound, given that many bots were doing this. In just those two classes of references we've got millions and millions of items bypassing the filter. And those are hardly the only classes of junk references on Wikidata. The whole idea of a filter to exclude unsourced and Wikipedia-sourced items is little more than a hollow sales pitch. If we use Wikidata-in-Wikipedia then we have to accept that all of this content is exempt from the normal Wikipedia policies, expectations, and standards. It is instead largely subject to Wikidata's mostly-nonexistent standards. Alsee (talk) 13:14, 1 November 2017 (UTC)
- That's interesting, is there a query that figure is based on? That would help with assessing the scale. Richard Nevell (talk) 18:58, 1 November 2017 (UTC)
- I don't think it's possible to do a query searching for circular refs in general. They look just like any other bare "Stated_in: Q#" ref. All I know is that bots have been mass-creating them, and they are apparently extremely common. In just a few minutes of browsing arbitrary wikidata items I came across a lot of them. However I did teach myself how to do wikidata queries a few days ago, specifically investigating Wikipedia-sourced items which bypass the filter against Wikipedia-sourced data. Here is a very crude query which searches for refs to tools.wmflabs.org/heritage. Those are actually refs to Wiki Loves Monuments data extracted from Wikipedia. I couldn't determine how many instances of that specific reference exist because the query times out before it can complete the search. My highest search count that didn't die to a timeout confirmed that there were upwards of 1.1 million of these particular refs bypassing the filter. I also think it unlikely that circular refs and wmflabs.org/heritage refs are the only two classes of refs bypassing the filter. They're just two that immediately popped out at me when I started considering the filter issue. Alsee (talk) 20:14, 1 November 2017 (UTC)
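A rough sketch of the kind of crude query described above (assumptions: the standard WDQS prefixes and P854 = "reference URL"; the LIMIT-ed subquery is one way to get a partial count back before the service times out, so the result is a lower bound, not a total):

# Sketch: count references whose URL points at the Wiki Loves Monuments
# data on tools.wmflabs.org -- data that was itself extracted from
# Wikipedia. The inner LIMIT caps the scan so a partial count returns
# instead of timing out; the true total may be higher.
SELECT (COUNT(*) AS ?refs) WHERE {
  { SELECT ?ref WHERE {
      ?ref pr:P854 ?url .    # P854 = reference URL
      FILTER(CONTAINS(STR(?url), "tools.wmflabs.org/heritage"))
    } LIMIT 1100000 }
}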
How does Wikidata handle non-overlapping meanings in coupled articles?
Example:
A problem, causing a lot of fruitless discussion on en.Wikipedia talk pages, is that de:Choral, correctly coupled to, among others, chorale at en.Wikipedia, can mean a lot of other things too in German (including e.g. Gregorian chant) not covered by any meaning of the English-language counterpart; while, on the other hand, some English-language meanings of chorale might be translated to another term in German (e.g. de:Geistliches Lied).
For clarity: the meanings of English chorale and German Choral are exactly the same in most cases, e.g. Fantasy and Fugue on the chorale "Ad nos, ad salutarem undam" = Fantasie und Fuge über den Choral "Ad nos, ad salutarem undam" (emphasis added) – just trying to take away any potential doubt about the correctness of the coupling made by Q724473.
What can Wikidata do to help sort out such issues of not completely overlapping meanings in different languages? --Francis Schonken (talk) 12:00, 23 October 2017 (UTC)
- As I understand it, you can report such issues at d:Wikidata:Interwiki conflicts. It may then take a long time until they are resolved. —Kusma (t·c) 13:04, 23 October 2017 (UTC)
- Thanks, but no: the thus-labelled interwiki conflicts are all about "how-to-couple?" issues. As I tried to make extensively clear above, there's no residue of a "how-to-couple?" question in the given example: the couplings are correct, they are satisfactory and afaics there is nothing to be resolved there (it would seem almost devious to recast this as an interwiki conflict: there is none). --Francis Schonken (talk) 13:50, 23 October 2017 (UTC)
- Francis Schonken, it is a design_flaw/design_limitation that Wikidata simply can't deal with language differences. The way Wikidatans "handle" it is to link an arbitrary pair of articles and then mark it as "resolved-for-wikidata" despite nothing being resolved. They have been discussing this for years with zero progress (as far as I'm aware).[13][14][15] You can't have a wikidata interlanguage link from EnWiki Seesaw to the Polish/Czech articles, and a Polish/Czech wikidata interlanguage link back to EnWiki Seesaw. The languages divide up the concept of playground-objects differently, and wikidata simply chokes on the idea that languages can handle concepts differently. I believe there is a similar problem with Soldier in Indonesian. Wikidata's #1 priority is to cater to machines, which means Wikidata says "screw you" to human-reality whenever human-reality doesn't fit into neat little computer-boxes. What is needed is the ability to interlink one page to multiple pages in a foreign language, but the wikidata community has rejected all approaches for doing so. Alsee (talk) 15:55, 1 November 2017 (UTC)
- I rather meant something to this effect (but then more visible in Wikidata's mainspace than on a less noticeable talk page). The rest of your explanation was what I more or less understood before asking the question (whether or not a "design_flaw/design_limitation", the question is: "What can Wikidata do to help sort out such issues ... ?", understood in that question: from now on). --Francis Schonken (talk) 09:52, 2 November 2017 (UTC)