
B3 dies without errors

Posted: 13 Nov 2010, 10:15
by Cheeseboy
Hello all,

I've had the following occurring twice since I got my B3:
- WiFi service disappears
- It is no longer possible to connect via SSH
- It is not possible to shut it down using the tiny button on the back plate

When this is happening the LED is blue as normal.
The only way I have been able to resolve it is by pulling the power out of the B3 and then restarting it.

When it comes up again, I find no errors in:
/var/log/messages
/var/log/syslog
/var/log/kern.log

This is the last activity in syslog prior to me pulling the plug:

Nov 13 15:45:01 b3 /USR/SBIN/CRON[4085]: (root) CMD (test -x /usr/bin/php && /usr/bin/php /usr/share/horde3/scripts/alarms.php > /dev/null 2>&1)
Nov 13 15:45:01 b3 /USR/SBIN/CRON[4086]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 13 15:45:01 b3 /USR/SBIN/CRON[4087]: (root) CMD (test -x /usr/lib/web-admin/notify-dispatcher.pl && /usr/lib/web-admin/notify-dispatcher.pl)
I think I have modified the first of the 3 cron jobs to get rid of a ton of errors in the logs...

My questions:
- Any ideas of what might be causing this?
- If the cron job whose output I have redirected to /dev/null crashes the entire system, will that redirection keep me from seeing the source of the problem in the logs?
- Any suitable flags I can set to enable relevant logging in kern.log or others?

Thankful for any input!

/Cheeseboy

EDIT:
I've removed the redirection from /etc/cron.d/bubba-horde just in case...

Re: B3 dies without errors

Posted: 13 Nov 2010, 12:48
by DanielM
Feels familiar. I also had this behaviour: my B3 did this three times in the first two days after I got it. I then got the tip from Tor to look at this thread: http://forum.excito.net/viewtopic.php?p=12592. After setting my min_free_kbytes to 4096 I haven't experienced any more hangs, and my uptime is now 21 days.

Later I discovered, though, that I had page allocation failures in my syslog, so I raised min_free_kbytes to 8192, which also made the errors disappear from the log.
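
For anyone wanting to try the same setting, here is a minimal sketch of one way to make it persistent. It assumes a standard /etc/sysctl.conf; the helper name persist_min_free_kbytes is made up for illustration and is not something shipped on the B3.

```shell
#!/bin/sh
# Hypothetical helper: write "vm.min_free_kbytes = VALUE" into a sysctl
# config file, replacing any earlier line for the same key.
persist_min_free_kbytes() {
    value="$1"                      # size in kilobytes, e.g. 8192
    conf="${2:-/etc/sysctl.conf}"   # assumed default location
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except an existing vm.min_free_kbytes entry.
    grep -v '^[[:space:]]*vm\.min_free_kbytes' "$conf" > "$tmp" || true
    echo "vm.min_free_kbytes = $value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Run as root, `persist_min_free_kbytes 8192` followed by `sysctl -p` should load the value from the file. `sysctl -w vm.min_free_kbytes=8192` applies it immediately but does not survive a reboot.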

/Daniel

Re: B3 dies without errors

Posted: 13 Nov 2010, 13:41
by Cheeseboy
Hi Daniel,

Thanks for your reply.
I saw your thread the other day and I had actually set it to 8192 before yesterday's hang/crash :-)
Now I'm glad that I had the same problem before I changed that, as I would otherwise have to remove it as a possible cause...

/Cheeseboy

Re: B3 dies without errors

Posted: 14 Nov 2010, 07:37
by Asad
I have had the same problem. It should be set to 8192 kbytes by default.

Re: B3 dies without errors

Posted: 14 Nov 2010, 07:58
by Cheeseboy
Just to make things clear:
I have had the problem I described before and after setting vm.min_free_kbytes = 8192
I did not see any kernel stacktraces in my logs when my problem occurred, so I'm not at all sure it is related to this setting or the problem of the original poster (in the thread that led me to introduce this setting on my system).

The B3 just stopped responding to any input and left no errors in log files.
So my question remains:
What should I do to enable more log output so I have some information about the problem when it chooses to occur again?

Thanks for your replies btw, they are much appreciated!

/Cheese

Re: B3 dies without errors

Posted: 14 Nov 2010, 13:18
by Ubi
I don't believe the cron process is causing your network to drop. But if you want to be sure just stop the cron process altogether and see if it improves.
One way of handling this is to create a cron job that (hourly) pings some server, displays the ifconfig data and restarts the network. Append all output (with >>) to /tmp/something.txt together with a timestamp and you can do some easy postmortem analysis without recompiling the kernel.
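
A rough sketch of what such a job could look like. The function name log_net_state and the choice of host are assumptions for illustration. The network restart is left out here because the right command depends on the system.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot of the network
# state to a log file.
log_net_state() {
    logfile="$1"   # file to append to, e.g. /tmp/something.txt
    host="$2"      # a host that is normally reachable (assumption)
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # A single ping is enough to record reachable / not reachable.
        if ping -c 1 "$host" > /dev/null 2>&1; then
            echo "ping $host: ok"
        else
            echo "ping $host: FAILED"
        fi
        # Dump interface state; tolerate systems without ifconfig.
        ifconfig -a 2>&1 || echo "ifconfig not available"
    } >> "$logfile"
    # Flush to disk so the entry has a chance of surviving a power pull.
    sync
}
```

Saved into a script that ends with a call such as `log_net_state /tmp/something.txt 192.168.1.1` (the address is only a placeholder), it could be run from an hourly crontab entry. The last block in the file then shows the state of the network shortly before the hang.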

ubi

Re: B3 dies without errors

Posted: 14 Nov 2010, 13:27
by [vEX]
Well, /tmp is (at least on normal Linux distros) flushed clean on reboot, so the output should preferably be saved somewhere else.

Re: B3 dies without errors

Posted: 14 Nov 2010, 16:02
by Ubi
No, it's not. I don't know where you got this knowledge, but there is no major distro that empties out /tmp by default on boot.

ubi

Re: B3 dies without errors

Posted: 15 Nov 2010, 07:57
by Cheeseboy
Thanks for your suggestion, Ubi.
I'll consider it. Or just wait and see if it happens again :-)

Re: B3 dies without errors

Posted: 15 Nov 2010, 12:54
by RandomUsername
Ubi wrote: no it's not. I don't know where you got this knowledge but there is no major distro that empties out /tmp by default on boot.

ubi
I can't speak for any other distro but this definitely happens on my B2.

[EDIT]It would also appear to happen on Suse at least.

Re: B3 dies without errors

Posted: 15 Nov 2010, 12:59
by Ubi
I've only been running SUSE up until 9.3, so I don't know about the current versions. RHEL does not do this. The proper way of cleaning /tmp is to do a find with a timestamp of 3 days or so and remove whatever it matches. Cleaning on boot is silly, but maybe the Excito crew hacked this in. If they did, then indeed /tmp is not a good place. Try /root instead :)
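
As a minimal sketch of the find-based cleanup (the helper name clean_old_files is made up, and -delete is a GNU/BSD find extension rather than POSIX):

```shell
#!/bin/sh
# Hypothetical helper: delete regular files under a directory that have
# not been modified for more than N days (default 3).
clean_old_files() {
    dir="$1"
    days="${2:-3}"
    # -xdev stays on one filesystem; -mtime +N matches files whose age
    # in whole days is greater than N.
    find "$dir" -xdev -type f -mtime +"$days" -delete
}
```

Called as `clean_old_files /tmp 3` from a daily cron job, this would leave recent files alone, so a log written just before a hang is still there after the reboot.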

ubi

Re: B3 dies without errors

Posted: 15 Nov 2010, 13:52
by [vEX]
At least Arch Linux clears out /tmp on boot (and I wouldn't want it any other way).

Re: B3 dies without errors

Posted: 15 Nov 2010, 15:37
by Ubi
That's really nice. What's the output of that script (at least I thought this was still a thread about suicidal bubbas)?

Re: B3 dies without errors

Posted: 18 Nov 2010, 11:52
by philgaskin
Can anyone explain what the cron jobs are doing? I am having problems losing routing between LAN and WAN periodically (particularly after or during streaming large files to the LAN). When I do lose routing, it is without any errors in the syslog but it seems to always be at the same time as the following cron jobs (which appear to be the same as cheeseboy's).
Nov 18 16:35:01 B3 /USR/SBIN/CRON[17944]: (root) CMD (test -x /usr/lib/web-admin/notify-dispatcher.pl && /usr/lib/web-admin/notify-dispatcher.pl)
Nov 18 16:35:01 B3 /USR/SBIN/CRON[17945]: (root) CMD (test -x /usr/bin/php && /usr/bin/php /usr/share/horde3/scripts/alarms.php)
On the odd occasion WiFi is lost for a few minutes too, and on one occasion I was unable to get into the B3 at all. I have tried increasing my vm.min_free_kbytes to 8192 but it is still occurring.
I am not using any of B3's email services (all turned off) so I think I could probably stop the horde cron job although I just want to know the consequences and likely impact.

Re: B3 dies without errors

Posted: 18 Nov 2010, 14:24
by Asad
I am having the exact same problem. It also happens during rsync backups, and the only way to recover is to reboot the device manually by pulling the power.

Edit:

The problem might just be that it doesn't handle heavy load. Even if the CPU usage isn't high, the memory could fill up, but that should have shown up in syslog.