Forum OpenACS Q&A: Re: Fatal: received fatal signal 11 - new error after years!

This has just got a whole lot more bizarre.

We have a development machine that was working fine before but has now started having the exact same problem. Its data is probably about six months old, and very little has been done to it.

could there possibly be some sort of expiration date

We have built an entirely new server and restored from an older backup, and it is still giving us the fatal signal 11 error.

Going to look into the core dump diagnostics now.

Cheers for all of the help so far.

Sorry, I didn't finish this sentence.

Could there possibly be some sort of expiration date on a notification being sent, or something that isn't happening, that is causing a problem? The only pattern I can see between the live and development machines is that a similar amount of time has passed!

> Something very similar can happen when you have the news aggregator installed and the sources table is so big that the system can't load it into memory, causing the server to get a signal 11 and crash.

Just saw this and have had a look. We only had two entries in the na_sources table, so I have removed them both, along with all entries in the na_items table. This would fit with my theory that the only consistent factor is the passing of time!
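For anyone following along, this is roughly what I ran to clear them out (we're on PostgreSQL; swap in your own database name, and take a backup first). If the deletes complain about foreign keys, the rows may be acs_objects and need removing through the package instead:

    # See how much is sitting in the aggregator tables
    psql -d openacs -c "SELECT count(*) FROM na_items;"
    psql -d openacs -c "SELECT count(*) FROM na_sources;"

    # Clear them out - items first, then the sources they point to
    psql -d openacs -c "DELETE FROM na_items;"
    psql -d openacs -c "DELETE FROM na_sources;"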

With regards to the core dump: I have set the core dump size to unlimited and configured all of the dumps to go into the /coredumps folder on my server. When it crashed, however, it did not dump anything into that folder. Do I need to recompile AOLserver with some flags set? You mention that the dump will happen in the application folder. I have AOLserver installed at /usr/local/aolserver and the actual OpenACS instance at /var/lib/aolserver, so whereabouts will this dump file go? For information, we are using Red Hat ES 4.
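For reference, this is roughly what I set up, in case I've missed a step. As I understand it, the ulimit has to be in effect in the shell or init script that actually starts nsd, not just my login shell, and without a core_pattern the dump lands in the process's current working directory:

    # Allow core files of any size in the shell that launches AOLserver
    ulimit -c unlimited

    # Send dumps to /coredumps, named after the program and PID
    echo "/coredumps/core.%e.%p" > /proc/sys/kernel/core_pattern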

Cheers again.

What a ridiculous scenario. We had 2 news feeds pointing to the BBC website, and they were causing the crash. I looked at the scheduled procs in the monitoring application and saw that the exact point at which it crashed was when the news aggregator updated.

I removed them both and the backup server worked fine as it passed through the proc. It's now been up for an hour, after previously crashing every 10 minutes.

Unbelievable really: there were 55,000 rows in the na_items table but only 2 in the na_sources table. Perhaps there was a dodgy entry in one of the feeds being downloaded?

Anyway, thanks for all of the help. We will work on the other recommendations for tweaking, and on fixing the duplicate email address error. That problem is caused by a single proc which attempts to grab a user ID by email address (a very wrong way of doing it!).
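For the record, the duplicates should be easy to spot with something like this, if I've read the data model right (assuming the email lives on the standard parties table):

    # List email addresses that map to more than one party
    psql -d openacs -c "SELECT email, count(*) FROM parties GROUP BY email HAVING count(*) > 1;"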

Cheers

Hi Matthew,

Unfortunately, this issue is quite common. The news aggregator UI doesn't have an option to delete old items, and this kind of thing usually happens when your server is old enough to have accumulated lots of feed items.

We solved the problem by building a UI where you can delete the feed entries. Take a look at our code and see if it can be useful to you: http://svn.softwarepublico.gov.br/svn/openacs/branches/spb-2.0/packages/news-aggregator/
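Until you have something like that in place, a periodic purge of old items might tide you over. A rough sketch only: the creation_date column name is a guess, so check your na_items schema first, and take a backup:

    # Drop aggregator items older than 90 days
    psql -d openacs -c "DELETE FROM na_items WHERE creation_date < now() - interval '90 days';"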