User talk:PockBot

From Wikipedia, the free encyclopedia

Archive: Old threads are moved to User:PockBot/Archive once responded to.

Category:Long Beach, California

I just had PockBot do a run on Category:Long Beach, California (talk). For both Long Beach, California and Signal Hill, California, it said "not yet classified", yet both of the articles do have ratings. I saw three or four 404 errors, even though the PockBot page said it was running when the Wikipedia load was very low. Is that the reason that PockBot didn't record classifications for those pages? BlankVerse 10:34, 11 January 2007 (UTC)

Thanks for your message. It appears that the WikiProject that rated that article uses a slightly different text format to output its ratings, so it didn't pattern-match in the bot's code. I have now updated the bot to take account of this variation and, if re-run, it should correctly flag those articles. 404 errors when server load is low would be very unusual, but the bot just makes requests as you and I do, so if it got a 404 response, that means there was either a server problem at Wikipedia (this does happen occasionally) or a network problem somewhere between Wikipedia and the bot. I just got a couple of Wikipedia errors myself when browsing, so the site may be having some small problems. As an aside, I note that you commented out the header on the results page. Just out of interest for further bot development, what was the reason for this? Did it cause a problem in your browser at all? It's meant to enable easy sorting of the results by either column. Many thanks, and thanks for using the bot - PocklingtonDan 10:49, 11 January 2007 (UTC)
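To illustrate the kind of pattern-matching problem described in the reply above, here is a minimal Python sketch. It is not PockBot's code (the bot itself is a Perl CGI script), and the rating phrasings it accepts are assumptions made up for the example; the real banner text differs between WikiProjects. The names `RATING_PATTERN` and `find_rating` are hypothetical.

```python
import re

# Hypothetical phrasings only: tolerate an optional "as" and either a
# hyphen or a space before "Class", in any letter case.
RATING_PATTERN = re.compile(
    r"rated\s+(?:as\s+)?(FA|GA|A|B|Start|Stub)[-\s]class",
    re.IGNORECASE,
)

def find_rating(talk_page_text):
    """Return the first rating found in the text, or None if there is none."""
    match = RATING_PATTERN.search(talk_page_text)
    return match.group(1) if match else None
```

A pattern that is too strict about one project's wording would return `None` for an article that is in fact rated, which matches the "not yet classified" symptom reported above.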

Cycle handling; descendant-Cat reports; setting crawl limits

Several matters that may be worth covering on, or linking to from, User:PockBot:

  1. I believe it is now regarded at least as inevitable that the structure of the Cat system should not be a DAG. Is it feasible to provide a brief statement on the bot's reaction to fully traversing a cycle?
  2. At some point, the Cat system was enhanced to dynamically explore on a single page the descendant (quasi-)tree of a Cat. But can PockBot provide usable output reflecting the full descendant (quasi-)tree that it visits for a given request?
  3. Long preface: It would, AFAICS, require significant extensions of the Cat system if it were to support touring algorithms aimed at including only what i might call "common sense" descendants of a Cat. (For example, Spain is probably not a child of Category:Countries, but is surely a descendant, and is also a common-sense descendant because it is a country. On the other hand, Category:Spain surely has many descendant articles whose topics are not countries but rather aspects of the country of Spain: e.g., Bullfighting is presumably a descendant, but not a common-sense one, of Category:Countries.)
Short question: Can/could PockBot be instructed against following Cat memberships that would include non-common-sense descendants? -- i suppose with some syntax like
Exclude/Include <Cat>/<Catlist> [Except <Cat>/<Catlist>]
and an include-the-parent-but-not-the-children option, if that's not implicit in what i said.

--Jerzyt 23:38, 31 January 2007 (UTC)

Hi. PockBot works as follows: it fetches a list of all subcats in a category, then all subcats of each of those subcats, and so on. Then it starts fetching articles on a hierarchical basis (i.e., first from the master cat, then from the subcats, then the sub-sub-cats and so on).
PockBot is a bot and so is stupid. It would be hopeless to try to program it to make common-sense judgements about category descendants. I have noticed the phenomenon you describe of sub-sub-cats being related to their parent cat, but not to that parent's parent cat. However, in most cases, I have found that this is simply a case of mis-categorisation at some level, which I have then corrected. In a smaller number of cases I have found that it is not miscategorisation per se, but rather different interpretations of what a category should be. For example, for the category "People", I would expect to find only real historical or contemporary human persons. However, subcats might contain cartoon or fictional people such as "Jessica Rabbit" and "Zorro". This is really more a problem of category names being too vague, with one person interpreting people as "real people, alive or dead" and another as "anything that can be considered a person, fictional or real".
In an ideal world, based on my experience with PockBot, I would like each article to be placed in one, and only one, hierarchical category, and for each category to have an auxiliary name or tag that states what elements should be in it. For example, "bullfighting" should NOT be listed under a subcat of Spain, because that subcat's parent is a country; it should clearly only be listed under "sports" or similar.
In conclusion, PockBot shows up a lot of the flaws with the current categorisation system, but sadly doesn't really have a way of working around them.
Would it be useful to have an "ignore subcats and get only articles in root cat" option?
Thanks - PocklingtonDan 07:19, 1 February 2007 (UTC)
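The traversal described in the reply above can be sketched as follows. This is an illustration in Python, not PockBot's own code: the `subcats` and `articles` dictionaries are hypothetical stand-ins for fetching live category pages, the toy data is invented, and the `seen` set is simply one common way to survive a category cycle. The reply does not say whether the bot uses this technique.

```python
from collections import deque

def collect_categories(root, subcats):
    """Breadth-first list of the root category and all its descendants.

    `subcats` maps a category name to its direct subcategories. The `seen`
    set ensures a category reached twice (for example through a cycle) is
    visited only once, so the loop always terminates.
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        cat = queue.popleft()
        order.append(cat)
        for child in subcats.get(cat, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def collect_articles(root, subcats, articles):
    """Articles from the root cat first, then subcats, then sub-sub-cats."""
    found = []
    for cat in collect_categories(root, subcats):
        for title in articles.get(cat, []):
            if title not in found:  # an article may sit in several cats
                found.append(title)
    return found

# Toy data, invented for illustration; note the cycle Spain -> Countries.
SUBCATS = {
    "Countries": ["Spain"],
    "Spain": ["Sport in Spain", "Countries"],
    "Sport in Spain": [],
}
ARTICLES = {
    "Countries": ["Country"],
    "Spain": ["Spain"],
    "Sport in Spain": ["Bullfighting"],
}
```

With this data, `collect_articles("Countries", SUBCATS, ARTICLES)` still picks up "Bullfighting", which shows why a purely structural crawl cannot separate common-sense descendants from the rest. An "articles in root cat only" option would amount to reading `articles.get(root, [])` and skipping the crawl.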

Cat People

How bad an idea would it be to run PockBot against Category:People, in light of

  1. the unknown number and size of cycles,
  2. performance impact, given the 6-digit scale of real people descended from Category:People (Category:Living people alone has 152K entries), and
  3. the difficulty of weeding out descendants, on that scale, that are not real people?

--Jerzyt

Hi. It wouldn't be a bad idea so much as it wouldn't be particularly useful to you. PockBot has code built in so that it will cut out after a cycle if the number of articles has reached a certain threshold (I think about 2000). It would then notify you of the articles fetched thus far and that this represented a subset of all the articles in that category. So you would see the list and status of a certain number of articles but, for a category of that size, nowhere near all of them. The bot is only really intended to be used on smaller, manageable cats, in order to give an indication of article statuses that can then be investigated or graded manually etc. - PocklingtonDan 07:04, 1 February 2007 (UTC)
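A rough sketch of the cut-off behaviour described above, in Python rather than the bot's own Perl. It assumes that "after a cycle" means "after one category has been fetched in full"; the function name `fetch_with_limit`, its parameters and the default of 2000 (the reply only says "about 2000") are all hypothetical.

```python
def fetch_with_limit(categories, articles, limit=2000):
    """Gather article titles category by category, stopping at a threshold.

    Returns (titles, truncated). The check runs only after a whole category
    has been fetched, so the result can overshoot `limit` slightly.
    `truncated` is True when categories were left unvisited, i.e. the
    titles are only a subset of the full set.
    """
    titles = []
    for index, cat in enumerate(categories):
        titles.extend(articles.get(cat, []))
        more_to_visit = index < len(categories) - 1
        if len(titles) >= limit and more_to_visit:
            return titles, True
    return titles, False
```

The `truncated` flag corresponds to the notice that the list is only a subset of the category's articles.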

This is an automated message to all bot operators

Please take a few moments and fill in the data for your bot on Wikipedia:Bots/Status. Thank you. Betacommand (talk • contribs • Bot) 19:44, 12 February 2007 (UTC)

Listing subcategories without the articles in them

Over at WT:UCFD there is a plan to look at all existing user categories in order to spot structural issues and to propose naming conventions. User:Jc37 has volunteered to browse through every user category, but there are a lot of them, so I'm wondering whether a bot like PockBot could go through Category:Wikipedians and dump all the subcategories into an indented list, without listing the individual user pages that belong to those subcategories. This is the only bot I have found that performs a similar task, but if this is too different from what it does, I will ask at Wikipedia:Bot requests. –Pomte 17:51, 18 April 2007 (UTC)

PockBot could be easily modified to do this but I don't have the time. I can make source code available to anyone over at bot requests who would like to use my code to do this. Cheers - PocklingtonDan (talk) 15:51, 19 April 2007 (UTC)
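For anyone picking this up, the requested output (an indented list of subcategories with no member pages) could look roughly like the Python sketch below. It is not derived from PockBot's source: the `subcats` mapping is a hypothetical stand-in for fetching category pages, and the subcategory names in the example are placeholders.

```python
def subcategory_outline(root, subcats, indent="  "):
    """Return an indented, depth-first outline of a category tree.

    `subcats` maps a category name to its direct subcategories. A category
    that has already been expanded is listed again but not re-expanded, so
    a cycle in the category graph cannot cause endless recursion.
    """
    lines = []
    expanded = set()

    def visit(cat, depth):
        if cat in expanded:
            lines.append(indent * depth + cat + " (already listed)")
            return
        lines.append(indent * depth + cat)
        expanded.add(cat)
        for child in sorted(subcats.get(cat, [])):
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)

# Placeholder data for illustration only.
EXAMPLE = {
    "Wikipedians": ["Example subcat B", "Example subcat A"],
    "Example subcat A": ["Example sub-subcat"],
    "Example subcat B": ["Wikipedians"],  # deliberate cycle
}
```

Calling `print(subcategory_outline("Wikipedians", EXAMPLE))` prints one category per line, indented by depth. For a tree as deep as the real Category:Wikipedians, an explicit stack instead of recursion might be the safer choice.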

doesn't work

I tried to use PockBot on the categories Category:Brisbane Broncos and Category:Brisbane Broncos rugby league players, but it has been 24 hours now and nothing has appeared. SpecialWindler talk 08:09, 18 July 2007 (UTC)

Bug?

Hello, just to let you know that I ran PockBot on Category:Chess and I got the following error message:
"[Thu Aug 16 11:07:03 2007] PockBot.cgi: thread failed to start: Can't call method "find_input" on an undefined value at /files/home2/thepaty/cgi-bin/PockBot.cgi line 443." SyG 10:24, 16 August 2007 (UTC)

Number of pages counter

I have been looking for a tool that could count the number of pages in a category. This bot seems to be a possible basis for doing so with a minor modification. Would this be something you would consider doing? Or writing a simpler bot that only counts the pages? Dbiel (Talk) 20:04, 26 September 2007 (UTC)

Bot off

The bot is off. -- Preceding unsigned comment added by Rabbit67890 (talk • contribs) 20:24, 20 February 2008 (UTC)