User talk:OrangeCorner
Nomination of Mastercoin for deletion
A discussion is taking place as to whether the article Mastercoin is suitable for inclusion in Wikipedia according to Wikipedia's policies and guidelines or whether it should be deleted.
The article will be discussed at Wikipedia:Articles for deletion/Mastercoin until a consensus is reached, and anyone is welcome to contribute to the discussion. The nomination will explain the policies and guidelines which are of concern. The discussion focuses on high-quality evidence and our policies and guidelines.
Users may edit the article during the discussion, including to improve the article to address concerns raised in the discussion. However, do not remove the article-for-deletion notice from the top of the article. Smite-Meister (talk) 22:36, 7 January 2014 (UTC)
Current Work on the Public Domain Books Stub Project
List of TO DOs as suggested by AnomieBOT
- There are a number of things that must be done before such a bot can run:
- You must obtain access to the information in some machine-parsable format, so we don't have to crawl a million Google pages scraping the information. This need not be direct access to a live database of any sort; a dump of the necessary metadata, or a way to download the list of PD books and the metadata for each book, is fine. Periodic notification of new and updated PD books would also be nice, even though it would probably take over a year for the bot to get through the first million at normal editing rates (10 seconds per book that doesn't already have an article, plus 10 seconds per image if applicable, plus downtime whenever the Wikipedia servers are more than 5 seconds lagged).
- In fact, the ability to download a list of all PD books last modified in a given date range plus the ability to download the metadata just for specific books would probably be the most convenient, especially if the server supports HTTP persistent connection. The bot could then just download each month's worth of titles, check if each book's article already exists, and download the metadata for just the books it needs.
- The metadata should contain as many of the fields in {{Infobox Book}} as possible; the more you have, the better your chances of getting community consensus for the proposal. Also, if available, a synopsis would be helpful for including more than just "X is a book written by AUTHOR and published by COMPANY in YEAR" in the stub. And, of course, we need whatever information is necessary to generate a link back to Google's human-readable page for the book.
- If the metadata does include the synopses, you'd probably need to get permission sent from Google to WP:OTRS for those synopses to be uploaded as part of the article under the CC-BY-SA (or, better yet, Wikipedia's CC-BY-SA/GFDL dual license) as there may be sufficient original work in summarizing the book to garner copyright protection for the summary. Or get Google to just officially and explicitly state somewhere on their site that their synopses of PD books are themselves PD or CC-BY or CC-BY-SA or CC-BY-SA/GFDL dual licensed.
- If the metadata contains images (or reference to images) appropriate for the infobox, you'd also need to either determine that those images must be PD (e.g. as slavish reproductions of a 2D image; asking at an appropriate Commons page (e.g. Commons talk:Licensing) would be your best course of action for that), get permission sent from Google to WP:OTRS for those to be uploaded to Commons under a free license of their choice, or get Google to just officially and explicitly state somewhere on their site that their images of PD books are themselves PD or are released under an appropriate free license.
- Even if the bot proposal doesn't get community support, such permission would be beneficial to the project anyway for use in manually-created articles on these books.
- A strong community consensus must be obtained for a bot to create all these stubs. This probably means a full 30+ day RFC advertised on WP:VPR, Template:Cent, WT:BOOKS, WT:BK, and anywhere else you can think of. Since the details of the proposal in the RFC will depend on just what metadata is available (e.g. having synopses and images would be a big plus), it may be best to wait on this until the above are successfully completed.
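The incremental strategy in these TO DOs — fetch a month's worth of PD titles, skip books that already have articles, and download metadata only for the rest — can be sketched roughly as below. This is a hypothetical Python outline only: `fetch_metadata` and `create_stub` are stand-ins for whatever dump-download and page-creation machinery would actually be used, since no such Google Books endpoint is confirmed to exist.

```python
def titles_needing_articles(month_titles, existing_articles):
    """Return the subset of this month's PD book titles that have no
    Wikipedia article yet, preserving the original order."""
    existing = {t.lower() for t in existing_articles}
    return [t for t in month_titles if t.lower() not in existing]


def run_month(month_titles, existing_articles, fetch_metadata, create_stub):
    """Process one month's worth of titles: download metadata only for
    the books that still need an article, then create each stub."""
    for title in titles_needing_articles(month_titles, existing_articles):
        metadata = fetch_metadata(title)   # hypothetical metadata download
        create_stub(title, metadata)       # hypothetical page creation
```

The point of filtering first is exactly the one made above: the bot never pays the metadata-download cost for books that already have articles.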
My two pennorth about Google Books stubs
Hi OrangeCorner,
- I don't know if you've asked for consensus yet, as it was late in the list. Personally I would oppose such a large-scale bot addition.
- I think we should remember that Wikipedia is supposed to be an encyclopaedia, not a search engine. Therefore, on the whole I regard stubs – and especially thousands, or millions, of bot-generated stubs – to be essentially empty content. If they simply replicate information in Google Books, why not just let people find that in Google Books?
- The value of the encyclopaedia is that people can find useful content about subjects, not that such content exists. I've experienced this on a smaller scale in the realm I am currently concentrating on, Hungary-related articles. There are a couple of thousand place stubs, probably 90% of which are essentially empty. I argue that these are literally worse than useless, since they give the impression of WP having content where none actually exists. I am not suggesting this is your view, but I think some editors like the cachet of having "created an article" when in fact that article is a cipher, and hard-working editors like me then have to go to it, correct the information, translate it or whatever, and get slapped for not having reliable sources and all that, whereas it was apparently entirely acceptable for it to sit simply as a waste of electrons.
- I would certainly strongly oppose importing a million stubs. That would be about a quarter of the entire encyclopaedia. I imagine this has happened a bit with the taxonomic articles of the natural world, though I can slightly see a more coherent case for that: the existing presence of the taxonomy means the articles can be automatically structured on that taxonomy, whereas for the book articles it would seem very difficult, I think, to find a good structure. I think you must think very carefully about the structure into which you intend to fit these stubs (categories etc), and how you deal with cases where a book already has an article, or something else has the title you want, etc. It may be, of course, that Google Books already has a structure that can be replicated, in which case all well and good, but that leads me back to my original question: why not leave it as a redlink and have people use Google to search for it in the first place? WP is not, and should not be, the entire World Wide Web.
- You have not explicitly stated you intend to import a million stubs, but you seem to imply it, and if that is your intention, I strongly oppose it. To have a million stub articles on one subject is simply so much junk. Although I agree the synopses would perhaps be useful, and leaving licensing issues aside, I would worry that many if not most would be considered Original Research, and also about how we deal with author attribution, even under a PD license, since in its current incarnation WP does not allow articles to be attributed to an author.
- I would, however, support assistance through templates, tools, bots etc to allow such book stubs (e.g. for the infoboxes etc) to be automatically or semi-automatically generated for books where an editor does indeed wish to add meaningful content, i.e. to use as a starting point. That kind of automation process is extremely useful, I think, to reduce the tedium of creating the scaffolding for each article, and ensuring consistency as far as possible. But simply providing this scaffolding with no intent to add content, be it for these stubs or other bot-generated stubs, I find does not improve, and often detracts from, the quality of Wikipedia. It is like going into a shop and finding that nothing is for sale because it is all just display items. Although in theory nobody here is better than any other, people also get a kinda cachet for the number of edits they make or articles they create, and this actually in my mind is counterproductive as it favours creating many many content-free articles instead of actually working hard on constructing a few content-rich ones. In short, I would prefer a WP with fewer articles that had greater content to one with more articles which contain next to nothing.
- As a compromise position, I would be more inclined to support the idea in principle if you were to suggest that you only take the books that are of a certain standard of notability etc. How an editor or bot could measure that qualitatively or quantitatively I am not in a position to say, but I think it must be taken into account. Then the bar can be set very high, and slowly lowered if the, say, 1,000 stubs that are created on the first run are found either to have subsequently been edited or to have been hit by readers.
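This set-the-bar-high-then-lower-it idea can be stated concretely. The sketch below is purely illustrative: the 50% engagement threshold and the 10% lowering step are invented numbers, not anything agreed by consensus.

```python
def adjust_bar(current_bar, created_stubs, engagement,
               min_engagement=0.5, step=0.9):
    """After a batch run, lower the notability bar only if enough of the
    created stubs were subsequently edited or viewed by readers.
    `engagement(stub)` returns a count of edits/hits for that stub.
    All numeric defaults here are placeholder assumptions."""
    touched = sum(1 for stub in created_stubs if engagement(stub) > 0)
    if created_stubs and touched / len(created_stubs) >= min_engagement:
        return current_bar * step   # lower the bar for the next batch
    return current_bar              # otherwise hold the bar steady
```

The key property is that the bar only ever moves down in response to measured reader/editor interest, which matches the cautious incremental rollout suggested above.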
- I am sorry to make this long, but I hope that sets out my position reasonably clearly. I do appreciate that your idea is in good faith, and you have been nothing but constructive in the way you have gone about proposing it (e.g. you did not just try to get any old bot approved and then import a million pieces of garbage; the bot approval process is I believe flawed in that more or less it is approved that it functions correctly, not that it should be applied at a particular scale or for a particular subject etc. This is not a fault of that process as such, just that some bot proposers seem to take a bot's approval as giving them carte blanche for its use, regardless of whether it is appropriate for a particular area of Wikipedia.) Similarly, I hope you treat my reply here as constructive. Let's all make WP better. Si Trew (talk) 11:09, 5 December 2009 (UTC)
- PS please feel free to move, copy or refer to this from other fora that may be more appropriate. I don't know where, if anywhere, you've decided to ask for consensus. I'd recommend we try to keep it in one place. Si Trew (talk) 11:12, 5 December 2009 (UTC)
Reply to Input on Google Books Proposal
Hi Simon Trew,
Thank you for your thoughtful and in-depth reply to my proposal. I appreciate the input you gave about the proposal and the constructive nature of your comments. Let me address your comments paragraph by paragraph.
One. I have not yet requested consensus on this proposal, given that the idea is still in the early stages of planning and evaluation. That said, I'm glad to have your input and all other input as early in the process as possible, in order to refine the ideas around the concept and begin to gauge community interest and support. I suppose when I do ask for consensus I'll do it here on this page in order to keep all the discussion in one place; excellent suggestion.
Two and Three. I agree with much of what you said about stubs, and certainly we want to be careful that any auto-generated content is appropriate and meets the general criteria of Wikipedia. Though I shall say a few words in defense of stubs. I believe many who have been here for a while forget the habits of the new or casual Wikipedia editor. It is true that the vast majority of new users are very unlikely to "create" a new article; rather, they are much more likely to edit an existing article. Therefore, if we are to capture the input of new and casual editors, this can only be done where the article they are searching for already exists. It falls to the older users to create articles of note and general interest and help draw in the input of new users. My goal with Google Books is to do just this: leverage the knowledge made available to us from a previously locked-away resource, public domain books, and use this new tool to find and create notable articles. As to the third paragraph, I agree that WP is about "content" and not simply having a shell article on everything in existence.
Four and Five. No worries, I'm NOT proposing the importing of 1 million stubs sourced from Google Books on day one. I think the ultimate number of articles that should or could be created from this resource will depend greatly on what level of notability the community agrees upon for this task. Technically speaking, I believe we could easily ascertain the relative notability of a book based on a simple Google search for the "book title + author name" and base our analysis on the number of links that appear. For instance, if I search "The Innocents Abroad By Mark Twain" and see that there are greater than 200,000 links on the internet which contain that title and author, it can reasonably be said this book is notable. On the other hand, if we Google "Saggi di critica By Silvio Federici" and the search returns just 1 link on the internet that contains the title and author, then it can reasonably be said this book is NOT notable. Somewhere between hundreds of thousands of links and one link lies a gray zone wherein is contained a wealth of notable public domain books that deserve articles, but which older users have not yet taken initiative on, given the wealth of literary history. I think this also addresses your comment about third-party citation, though certainly Google Books itself is a third party, not the author of the book.
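As a rough illustration of the hit-count heuristic described above, a classifier might look like the sketch below. The 200,000 and single-digit figures come from the Twain and Federici examples in this paragraph; the exact cutoffs would of course be set (and adjusted) by community consensus, and the hit counts themselves would have to come from some permitted source.

```python
HIGH_BAR = 200_000   # illustrative: clearly notable, per the Twain example
LOW_BAR = 10         # illustrative: below this, almost certainly not notable


def classify_notability(hit_count):
    """Bucket a book by the number of web search hits for its
    '"TITLE" "AUTHOR"' query. Thresholds are placeholder assumptions,
    not community-agreed values."""
    if hit_count >= HIGH_BAR:
        return "notable"
    if hit_count <= LOW_BAR:
        return "not notable"
    return "gray zone"
```

Books landing in the "gray zone" are the interesting ones: candidates to hold back until the bar is lowered, per the incremental rollout discussed elsewhere on this page.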
Six and Seven. I agree with your position of wanting quality vs. quantity on Wikipedia. However, I would point out that it is not necessarily a question of one or the other. Even as WP has expanded exponentially, the number of edits per article has also increased over time, from 5 or 10 in the beginning to now more than 35 per article (see http://wiki.riteme.site/wiki/File:Number_of_edits_01.jpg). This again goes to my point of older Wikipedians enabling casual editors to make contributions. I like your compromise position of creating some articles on the basis of a very high notability bar and slowly lowering the bar over time if the results are positive. I believe this is certainly a reasonable position to take.
Eight and Nine: I appreciate that you understand my proposal is in good faith, and indeed I fully intend to push this effort forward in a fashion that can be supported by my fellow editors. Thank you for the input. And I hope that I can recruit you to join in this effort as we refine and work to implement this proposal in a workable manner.
Most Sincerely,
OrangeCorner (talk) 07:03, 7 December 2009 (UTC)
Reply to reply
I'm creating a subsection so as not to have to indent stacks of text. Thanks for your well-considered reply, which covered the points I made well. I have numbered them in my original, so that it is easier to refer to them as you did, and in the future (but did not change the original otherwise).
One: Just in case of doubt, I was not complaining you had not canvassed, just noting that if you had, I'd not seen any other editors' responses and did not know where.
- Two, three: I am not against stubs as such, but am against the mass generation of them when there is no intent to add content. As a bit of a "scaffolder" I am well aware how difficult it can be to create just the skeleton of an article, and have no problem with an automated process doing that. (I frequently do it when User:Monkap and I translate articles together; she translates the Hungarian text as she is a native speaker, I do the scaffolding, putting files onto Commons, wikifying, etc.) If creating stubs encourages new(ish) editors to add content, then I am all for that. I have a bit of an advantage in that I am a software developer by trade, so the technicalities of the Wiki language (templates, markup etc) do not faze me, but I am well aware that for others they could. To that end, I would go so far as to say a stub should contain empty section titles, e.g. (very much an e.g.) "Background", "Synopsis", "References", and maybe include a template succession box of some kind indicating the previous and next publications in the author's oeuvre – I am not suggesting these exact things, but you see what I mean: the more that can be automatically generated, the more should be, and I would agree (I think it is agreement) that an empty or partially filled succession box is better than none at all, if it encourages other editors to fill in the details (finding the right template can be very tedious). I am, however, against creation of stubs just to fulfill the purpose of being able to say "we have an article on that", e.g. to avoid a redlink or for some cachet that articles are being created, when really they are not. There are of course occasions when such stubs should be created, e.g. if the intent is to create a series of articles it may be useful to create all articles as stubs first, but even so I would argue that this kind of scaffolding should be included from the outset (by which I do not literally mean the first edit).
- Four, Five: Yes, an incremental application of the bot seems a good idea – of course it is for others to decide where and how to set the bar. The idea of using the Google hits to gauge notability I think is a very good one, and since it is of course the essence of Google itself, i.e. put what has most links to what is most popular, there may be an easy, fast way to get this through the Google API. (Are you familiar with that?)
- Six, Seven: I'm really not sure what to make of those figures. I guess it indicates an increasing active user base (pace the recent press, at least in the UK, suggesting that users are ceasing to be editors), and I guess also that the actual editing process, I mean just the business of adding text and stuff, has become easier with loads of templates etc to help (though it is still too difficult, I think, for many users; one really needs a good "Wikimedia Editing Environment", perhaps there are some, that is more WYSIWYG). It must amaze new users that they have to write text in this odd Wiki language, and one could do stuff like link lookup and all that really nicely in an Integrated Development Environment. I could write one, at least for Windows, but I would imagine it is reinventing the wheel. Whether it means that more edits = good I am not so sure; I am not saying it is bad, just that without the context of what kind of edits there are, it is somewhat meaningless. What would be interesting is how many edits by distinct users; for example I typically make very many small, incremental edits, but for your purposes these should really be counted as one edit, at least if they are consecutive.
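The "count consecutive edits by one user as a single edit" idea at the end of this paragraph is easy to pin down. A minimal sketch, assuming the edit history is available simply as a chronological list of usernames:

```python
def collapse_consecutive(edit_log):
    """Count an unbroken run of edits by the same user as one edit.
    `edit_log` is a chronological list of usernames, oldest first."""
    count = 0
    prev = None
    for user in edit_log:
        if user != prev:      # a new run starts whenever the user changes
            count += 1
            prev = user
    return count
```

So five raw edits like SiTrew, SiTrew, SiTrew, Monkap, SiTrew would count as three, which better reflects distinct editing sessions than the raw edit count does.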
Eight, Nine: Ditto, and I'm inclined to un-number these paras, with your approval (and in your reply, and here) since they were just tailing off, and it will become a bit ridiculous. Except I guess in any continuing discussion we could just sign off with:
"Eight, Nine Si Trew (talk) 09:11, 8 December 2009 (UTC)"
First Letter Sent to Google Books Team Requesting Assistance
Dear Google Books Team,
I'm a big fan of your efforts to make the store of human knowledge currently held in books available to the world and I would like to offer you some assistance in this goal.
Summary: I'm currently working with other editors and administrators on Wikipedia to create new articles based on the public domain works available through Google Books. The goal of our proposal is to utilize Google Books to find notable public domain books that do not yet have an article on WP. We believe this project will help fill in the gaps when it comes to the many publications of authors and other important information that is still trapped in a non-digital format. For more information and to read the ongoing discussion surrounding this topic, you can visit either my user page or user talk page: http://wiki.riteme.site/wiki/User:OrangeCorner http://wiki.riteme.site/wiki/User_talk:OrangeCorner
Assistance: I was hoping you would be able to provide us some general assistance in this task. Below is a short Wish list of items that would greatly help us in accomplishing this project.
The Wish List: There are a number of things that must be done before we can code and run a bot to access the wealth of knowledge on Google Books and begin the task of creating stub articles on WP.
1. Obtain access to the information in some machine-parsable format, so we don't have to crawl a million Google pages scraping the information. This need not be direct access to a live database of any sort; a dump of the necessary metadata, or a way to download the list of PD books and the metadata for each book, is fine.
2. Periodic notification of new and updated PD books would also be nice, even though it would probably take over a year for the bot to get through the first million at normal editing rates (10 seconds per book that doesn't already have an article, plus 10 seconds per image if applicable, plus downtime whenever the Wikipedia servers are more than 5 seconds lagged).
3. The ability to download a list of all PD books last modified in a given date range plus the ability to download the metadata just for specific books would probably be the most convenient, especially if the server supports HTTP persistent connection. The bot could then just download each month's worth of titles, check if each book's article already exists, and download the metadata for just the books it needs.
4. The metadata should contain as many of the fields in the infobox (http://wiki.riteme.site/wiki/Template:Infobox_Book) as possible; the more we have, the easier it will be to get community consensus for the proposal.
5. Also, if available, a synopsis would be helpful for including more than just "X is a book written by AUTHOR and published by COMPANY in YEAR" in the stub.
6. We need whatever information is necessary to generate a link back to Google's human-readable page for the book.
7. If the metadata does include the synopses, we probably need to get permission sent from Google to WP:OTRS http://wiki.riteme.site/wiki/Wikipedia:OTRS for those synopses to be uploaded as part of the article under the CC-BY-SA (or, better yet, Wikipedia's CC-BY-SA/GFDL dual license) as there may be sufficient original work in summarizing the book to garner copyright protection for the summary. Or Google can just officially and explicitly state somewhere on their site that their synopses of PD books are themselves PD or CC-BY or CC-BY-SA or CC-BY-SA/GFDL dual licensed.
8. If the metadata contains images (or references to images) appropriate for the infobox, we'll also need to either determine that those images must be PD (e.g. as slavish reproductions of a 2D image; asking at an appropriate Commons page such as Commons talk:Licensing, http://commons.wikimedia.org/wiki/Commons_talk:Licensing, would be the best course for that), get permission sent from Google to WP:OTRS for those to be uploaded to Commons under a free license of their choice, or have Google just officially and explicitly state somewhere on their site that their images of PD books are themselves PD or are released under an appropriate free license.
9. We'll be going through a strong community consensus process to obtain the blessing for our proposal and bot to create all these stubs. This probably means a full 30+ day RFC http://wiki.riteme.site/wiki/Wikipedia:RFC advertised on WP:VPR, http://wiki.riteme.site/wiki/Wikipedia:VPR Template:Cent, http://wiki.riteme.site/wiki/Template:Cent WT:BOOKS, http://wiki.riteme.site/wiki/Wikipedia_talk:BOOKS WT:BK, http://wiki.riteme.site/wiki/Wikipedia_talk:BK and anywhere else we can think of. Since the details of the proposal in the RFC will depend on just what metadata is available (e.g. having synopses and images would be a big plus), our goal right now is to obtain the items above in order to successfully complete the community consensus process.
Conclusion: If you could assist us by providing all or some of these wish list items, pointing us to appropriate links where they may already exist, or offering your guidance in the process, it would be very much appreciated.
I hope to hear from you soon.
Most Sincerely,
OrangeCorner Email: OrangeCorner88@gmail.com Wikipedia: http://wiki.riteme.site/wiki/User:OrangeCorner
Google Books Team Response to Request for Assistance
Hello,
Thank you for your email and for your interest in Google Books. At this time, we don't allow the specific kind of access that you've requested.
As noted in Google's Terms of Service, automated queries are not allowed: http://www.google.com/accounts/TOS.
Sincerely,
The Google Books Team
Archon X Prize error rate
Hi OrangeCorner,
Thanks for your contribution to the Archon X Prize article. I had to revert it, but I wanted to give you an explanation. This debate was had a while ago and we had a member of the X Prize committee provide a final answer. See here. Clerks. (talk) 13:44, 29 March 2010 (UTC)
European Stability Mechanism
Thanks for your help over at the European Stability Mechanism page. --Smart30 (talk) 10:22, 28 March 2011 (UTC)