User talk:DamianZaremba/Archives/2011/August
This is an archive of past discussions with User:DamianZaremba. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
CBNG hasn't reverted anything for over 4 days
Letting you know per the notice here. I've also alerted Rich. —SMALLJIM 09:27, 17 August 2011 (UTC)
- This should be running again. I've also made some changes to how it runs, so hopefully it will restart itself if it crashes. - DamianZaremba (talk • contribs) 10:05, 17 August 2011 (UTC)
- Thanks! I'll notify you quicker if I see that it's down again. Some means of automatically alerting yourself when it fails may be a good idea - I suspect that everyone who cares assumed it was offline for retraining or maintenance or something like that. After all, it does provide an invaluable service these days. —SMALLJIM 12:13, 17 August 2011 (UTC)
- If the processes crash now then they will attempt to auto restart and, failing that, will email me and Rich. It could still "break" in a number of ways that are much harder to spot quickly, such as the API not returning good data, UDP sockets not binding properly on the bot etc. I'll try and come up with some sort of test that can cover all the components, but it is quite complex. Possibly even a cronjob to see if it has made any edits in the past hour would work, but it would not be so quick to alert. - DamianZaremba (talk • contribs) 12:39, 17 August 2011 (UTC)
- I'd say a cronjob sounds best - simple and fast enough. Periodically checking the timestamp of its latest edit via the API would work, I think. —SMALLJIM 13:36, 17 August 2011 (UTC)
- This script is now running hourly and seems to work ok, updates at User:ClueBot NG/running. - DamianZaremba (talk • contribs) 14:57, 17 August 2011 (UTC)
- Nice one. Hope it doesn't wake you up in the middle of the night :) Is the updates page for anything other than a visual confirmation that CBNG is still running? If not, I'd suggest that looking at CBNG's contribs would be just as easy, making it perhaps unnecessary.
Out of interest, do you ever use MediaWiki::API to wrap LWP? It supports retries and error reporting. —SMALLJIM 18:16, 17 August 2011 (UTC)
- Nope, it's there as a sanity check and just because I can. I've never used the MediaWiki package, as I've never done any heavy lifting with MediaWiki, just simple page grabbing/uploading. - DamianZaremba (talk • contribs) 18:27, 17 August 2011 (UTC) - from phone
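For illustration, the kind of check discussed in this thread (look up the bot's latest edit via the API and compare its timestamp against a threshold) could be sketched roughly as follows. This is not the script that was actually deployed: the API URL, the User-Agent string, the function names and the one-hour default threshold are all assumptions made for the example.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: the English Wikipedia API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def parse_timestamp(ts):
    """Parse a MediaWiki ISO 8601 timestamp such as 2011-08-18T07:31:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_edit_time(user, api_url=API_URL):
    """Ask the API for the timestamp of the user's most recent contribution."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": 1,
        "ucprop": "timestamp",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{api_url}?{params}",
        headers={"User-Agent": "cbng-watchdog-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return parse_timestamp(data["query"]["usercontribs"][0]["timestamp"])


def is_stale(last_edit, now, threshold=timedelta(hours=1)):
    """Return True if the last edit is older than the threshold."""
    return now - last_edit > threshold
```

A cron job could call `is_stale(latest_edit_time("ClueBot NG"), datetime.now(timezone.utc))` and send an alert when it returns True. Keeping the comparison separate from the network call means the threshold logic can be tested without touching the API.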
Down again
Well that was a worst case scenario: CBNG stopped just after 0730 and your script should have alerted you at 0900. —SMALLJIM 08:40, 18 August 2011 (UTC)
- Indeed it did, the bot auto restarted as it should have done:
2011-08-18 09:53:47,117 INFO exited: cluebotng_bot (terminated by SIGKILL; not expected)
2011-08-18 09:53:48,150 INFO spawned: 'cluebotng_bot' with pid 4564
2011-08-18 09:53:49,414 INFO success: cluebotng_bot entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2011-08-18 09:54:22,002 INFO exited: cluebotng_bot (terminated by SIGKILL; not expected)
2011-08-18 09:54:23,006 INFO spawned: 'cluebotng_bot' with pid 4832
2011-08-18 09:54:24,149 INFO success: cluebotng_bot entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The interesting point is that the bot is actually running now, but looking at the output, every revert just gets an API error back. I'm going to have a dig through the code and try to figure out what on earth is going on, or poke Cobi later when it's daytime for him. - DamianZaremba (talk • contribs) 11:04, 18 August 2011 (UTC)
- OK, I fixed an issue in the code that was caused by IPv6 being stupid (the server is dual-stacked). It should be functioning fine again now. - DamianZaremba (talk • contribs) 12:22, 18 August 2011 (UTC)
- Oh, and the cause of the bot crashing was the OOM killer killing it. I'm going to see about moving it to a server with 4 times more RAM, but it's no straightforward task. I blame PHP for being so fat ;) - DamianZaremba (talk • contribs) 12:24, 18 August 2011 (UTC)
- Thanks for the updates. I had to look up the OOM killer, which I'm sure didn't exist in my Linux days. I guess memory shortage is going to cause all sorts of ongoing instabilities, so I hope you can fix that. Could you set /proc/<pid>/oom_adj to OOM_DISABLE and let something else break instead? ;) —SMALLJIM 15:29, 18 August 2011 (UTC)
- I'd rather the bot crash and supervisord restart it a few seconds later than it crash the entire server ;) - DamianZaremba (talk • contribs) 15:33, 18 August 2011 (UTC)
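As a rough aid for spotting this kind of restart loop, the supervisord log lines quoted above could be scanned for unexpected exits. The sketch below is only an assumption-laden example: the regular expression covers just the "exited ... terminated by SIG...; not expected" form shown in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches only the "exited ... (terminated by SIGxxx; not expected)" lines
# in the form quoted in this thread; other supervisord messages are ignored.
EXIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO exited: "
    r"(?P<name>\S+) \(terminated by (?P<signal>SIG\w+); not expected\)$"
)


def unexpected_exits(lines):
    """Count unexpected exits per (process name, signal) pair."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.match(line.strip())
        if match:
            counts[(match.group("name"), match.group("signal"))] += 1
    return counts
```

Repeated SIGKILL exits for the same process, as in the excerpt above, are consistent with the OOM killer explanation given in this thread, although the log alone does not say which component sent the signal.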
CBNG not happy
Hi Damian - as I'm sure you know, it's down for the second time today. Is this still the oom issue?
BTW, it would be good if you could add "up" or "down" to the edit summary of the automated posts made to /running, as this would benefit anyone with the page in their watchlist (i.e. me!)
Reducing the threshold before a "not running" warning is posted might help too: I know it's important to avoid false alarms, but has there ever been more than 15 mins between reverts when CBNG is up? At present any failures after half past the hour have to wait another hour before they get reported. —SMALLJIM 13:07, 26 August 2011 (UTC)
- I've dropped the check to 30min with a 15min threshold. It seems the Wikipedia API was "randomly" timing out, causing the bot to not start up or do anything correctly. It seems to be working as expected now; however, I'll have a poke around the logs and try to figure out what's been going on in a while when I'm actually awake. - DamianZaremba (talk • contribs) 14:15, 26 August 2011 (UTC)
- Thanks again - and for the edit summary change. Sorry if I disturbed you - I imagined you sitting at work waiting for the weekend - one never knows... It's a mighty burden you're apparently carrying alone: ensuring that the world's largest encyclopaedia stays free of vandalism :) —SMALLJIM 14:44, 26 August 2011 (UTC)
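The effect of the check interval and threshold on how quickly a failure gets reported can be worked through with a little arithmetic. The sketch below assumes that checks run at exact multiples of the interval counted from midnight, and that an alert fires once the last edit is at least the threshold old. The earlier hourly check having a 30-minute threshold is an inference from the "half past the hour" remark above, not something stated outright, and the function names are illustrative.

```python
def first_alert_minute(failure_minute, interval, threshold):
    """Minute of day of the first check that reports a failure.

    Assumes the last edit happened at failure_minute and that checks run
    at multiples of `interval` minutes after midnight.
    """
    earliest = failure_minute + threshold
    # Round up to the next scheduled check (ceiling division).
    return -(-earliest // interval) * interval


def worst_case_delay(interval, threshold):
    """Upper bound, in minutes, on the delay between failure and alert."""
    return interval + threshold
```

Under those assumptions, a failure at 07:31 (minute 451) with an hourly check and a 30-minute threshold is first reported at minute 540, i.e. 09:00, which matches the 18 August incident. With a 30-minute check and a 15-minute threshold the same failure would be reported at 08:00, and the delay can never exceed 45 minutes.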
Hi, I'll use the computer tomorrow, okay — Preceding unsigned comment added by Sabrina1908 (talk • contribs) 23:58, 28 August 2011 (UTC)
erm listening talk to me about stuff — Preceding unsigned comment added by Sabrina1908 (talk • contribs) 12:20, 29 August 2011 (UTC)
A cupcake for you!
meet me in my talk Sabrina1908 (talk) 12:22, 29 August 2011 (UTC)
It wasn't vandalism
I was trying to send him a message — Preceding unsigned comment added by SomedayCameSuddenly (talk • contribs) 02:08, 31 August 2011 (UTC)