
User talk:GreenC/WorksByProject

From Wikipedia, the free encyclopedia

Greetings


Hi! I needed to add an article on a Gutenberg author in 2006, and I therefore fell into the habit of cleaning up the Gutenberg list on an occasional basis starting then. I also created the Worldcat Id template and began adding it. I realized at that time that we needed a much better method of resolving the various online resources, but I never did the hard work to actually resolve these resources. Thank you very much for your effort. May I make a suggestion? Do not try for perfection. Use algorithms where possible, but emit lists of ambiguous results when the algorithms cannot converge. A small team of humans can then work from these lists. This is more or less what I was doing with the Gutenberg lists. I will happily join your team and abandon the ancient Gutenberg effort, and recommend that it be terminated. But hey, notice that I was able to add a few hundred Gutenberg templates using manual methods. -Arch dude (talk) 03:49, 18 February 2015 (UTC)

Hi Arch dude, good to hear from you. Yes, that is what happened: the algorithm picked up the majority, and the rest were sent to a log file where they were sorted out manually. The job is done; I've gone through the 20,000+ names and found them all (about 9000 matches). I'm currently in the process of adding them to Wikipedia, which is time-consuming. I add about 100-200 names a day, and there are over 9000, so I figure I'll be finished by spring. It's more than Gutenberg; it also covers LibriVox, Internet Archive and some other sites. Regarding WorldCat, the template is still useful in certain situations (such as when an author has multiple LCCNs), but mostly it's been superseded by {{Authority control}}, which aggregates all the library resources. Otherwise WorldCat gets listed twice in an article.
If you would like to help with adding, it should be done with a script, as there is so much going on that the script handles. If you're interested, it uses WP:AWB and Linux tools via Cygwin on Windows. -- GreenC 14:46, 18 February 2015 (UTC)
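The match-or-log approach Arch dude suggests, and GreenC describes, can be sketched roughly as follows. This is a minimal Python sketch, not the script actually used: `find_candidates` is a hypothetical lookup against one site's catalogue, and treating "exactly one candidate" as a confident match is an assumption.

```python
from typing import Callable, Iterable


def triage(
    names: Iterable[str],
    find_candidates: Callable[[str], list[str]],
) -> tuple[dict[str, str], list[tuple[str, list[str]]]]:
    """Split names into automatic matches and a log for human review.

    Hypothetical rule: exactly one candidate counts as a match; zero or
    several candidates are written to the review log instead.
    """
    matched: dict[str, str] = {}
    review_log: list[tuple[str, list[str]]] = []
    for name in names:
        candidates = find_candidates(name)  # hypothetical catalogue lookup
        if len(candidates) == 1:
            matched[name] = candidates[0]
        else:
            # Ambiguous or missing: leave it for a human to sort out.
            review_log.append((name, candidates))
    return matched, review_log
```

The review log is what a small team of humans would work from; names with no candidates and names with several candidates both land there rather than being guessed.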

Is this a useful exercise


Can we discuss this edit to the article James Wadsworth (Spanish scholar and pursuivant).

What was the point of the edit to this specific article? -- PBS (talk) 17:17, 21 February 2015 (UTC)

Yes, I'm aware Internet Archive sometimes returns false positives (FP). It's a rare condition, and I didn't notice it until well into the process. I plan on writing another script to re-examine the {{Internet Archive author}} "what links here" for FP conditions. It's not easy to determine via a script, but I have some ideas how, and I'll take care of it once Step 4 is completed. -- GreenC 18:17, 21 February 2015 (UTC)
That does not explain the addition of
  • Works by James Wadsworth at Project Gutenberg
  • Works by James Wadsworth at LibriVox (public domain audiobooks)
which appear to be empty. It is better to add no information than false information. WP:AWB specifically states "Warning: You take full responsibility for any action you perform using AutoWikiBrowser." You should be checking each edit by AWB, editing it if necessary, before you click the save button. In this case you should have clicked the skip button. -- PBS (talk) 17:12, 23 February 2015 (UTC)
I don't know what you mean. They both have books. -- GreenC 18:20, 23 February 2015 (UTC)

HathiTrust


This has been (and continues to be) an extremely valuable exercise. May I ask whether you have also considered doing something similar for books in the HathiTrust repository? GrindtXX (talk) 20:07, 2 June 2015 (UTC)

Hello GrindtXX. Yes, agreed: at a minimum it needs a template and conversion of the existing bare URLs. I'm still working on the three projects, which have taken over 6 months, so hopefully I'll get to it. Other commonly used book sites that also need work: unz.org, The Online Books Page, Open Library, Manybooks.net, Adelaide.edu.au. Of these, Hathi is the highest priority, since it represents Google Books, the second-largest originator of scanned books. -- GreenC 15:40, 4 June 2015 (UTC)

Open Library?


Any intent to do likewise for Open Library? It is admittedly problematic in that various versions of an author's name can appear there as distinct records, but in most regards it is better structured than the corresponding Internet Archive records. LeadSongDog come howl! 03:24, 24 June 2015 (UTC)

Yes, as noted above in my reply about Hathi, there are a dozen or so websites that could be added. I just focused on the three I thought were most important. Each requires its own coding. Of course the real work is not the coding, but actually running AutoWikiBrowser for each name and adding it manually in the External links section. That's something anyone could do. I've automated things as much as possible, so it's considerably faster than doing it manually, but it is still slow drudge work. If anyone wants the code to do it, let me know (it is based on Unix scripts running on Cygwin/Windows). -- GreenC 14:16, 24 June 2015 (UTC)
I've done a fair amount of work on Open Library, and it is quite common to encounter multiple authors with the same name. These often have several records each, sometimes "John Smith", "Smith John", "Smith, John", "Jean Smith", sometimes with birth or death dates. To further confuse matters, book records are often linked to the wrong author record. They really need to do some work on automating cleanup, but they are understaffed and rather overwhelmed. The result for Wikipedia is that an OL record needs close verification before it is used. It is much less reliable than ISNI or VIAF. LeadSongDog come howl! 03:19, 27 June 2015 (UTC)
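The name-variant problem described above can be partly screened for by grouping author records under an order-insensitive key. This is a minimal Python sketch under stated assumptions, not Open Library's own cleanup method: the id-to-name mapping is hypothetical input, and the key deliberately ignores word order, punctuation and dates, so a shared key only flags records for manual checking.

```python
import re
from collections import defaultdict


def name_key(raw: str) -> tuple[str, ...]:
    """Order-insensitive key: lowercase letter runs, with dates and punctuation dropped.

    Assumption: "Smith, John", "Smith John" and "John Smith (1850-1920)" get
    the same key. Different people sharing a name also collide, so a shared
    key means "check by hand", not "same person".
    """
    tokens = re.findall(r"[^\W\d_]+", raw.lower())  # Unicode letters only
    return tuple(sorted(tokens))


def group_variants(records: dict[str, str]) -> dict[tuple[str, ...], list[str]]:
    """Group hypothetical author record IDs by name key; keep only shared keys."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for record_id, display_name in records.items():
        groups[name_key(display_name)].append(record_id)
    # Keys held by more than one record are candidates for manual review.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

A variant like "Jean Smith" stays in its own group under this key, and book records linked to the wrong author are not detected at all, which is why close verification is still needed.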