Unplanned downtime 10/23/2012.
Hey folks, just an FYI on what happened on the server this morning. Shortly before 10 AM Eastern, the server started to run out of memory. I'm still not entirely sure what caused the abnormal memory consumption, but I suspect one of the message queues on the server got backlogged. Either way, these messages appeared in the kernel log:

```
[7311113.986736] Out of memory: Kill process 11395 (beam.smp) score 227 or sacrifice child
[7311113.992553] Killed process 11395 (beam.smp) total-vm:5712704kB, anon-rss:5262008kB, file-rss:100kB
```

beam.smp is the process name for RabbitMQ, the message queue that sits between the various web processes and actors that make OpenStudy tick. When Rabbit failed, everything else took a nosedive and the site pretty much went down. Additionally, when we realized what was going on and tried to bring Rabbit back, we had difficulty getting it to start up correctly. Finally, we decided to blow away the data directory where it holds its queues and do some reconfiguration before bringing it back up for good.

TL;DR: The server, which has 23GB of memory, ran out of memory and things got sticky. But it's all good now, and we're going to try to figure out what caused the memory failure.
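For anyone curious how to read the "Killed process" line quoted above, here is a minimal Python sketch that pulls the fields out of it. The regex, the `parse_oom_kill` function, and the `kb_to_gib` helper are all made up for illustration; this is not a tool we actually run, and it only assumes the line format shown in the log excerpt.

```python
import re

# Pattern for the "Killed process ..." line as it appears in the excerpt above.
# Assumption: other kernel versions may print extra or different fields.
OOM_KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return the fields of an OOM 'Killed process' line, or None if no match."""
    match = OOM_KILLED.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_kb": int(match.group("total_vm")),
        "anon_rss_kb": int(match.group("anon_rss")),
        "file_rss_kb": int(match.group("file_rss")),
    }


def kb_to_gib(kb):
    """Convert kilobytes (as printed by the kernel) to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    line = (
        "[7311113.992553] Killed process 11395 (beam.smp) "
        "total-vm:5712704kB, anon-rss:5262008kB, file-rss:100kB"
    )
    info = parse_oom_kill(line)
    print(info["name"], info["pid"], round(kb_to_gib(info["anon_rss_kb"]), 2), "GiB resident")
```

Run against the line from the log, this reports beam.smp (PID 11395) with roughly 5 GiB of anonymous resident memory at the moment it was killed.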
looks like you guys need a trip to TigerDirect
sorry about that, 23GB just isn't enough
I see the first log message says: "Out of memory: Kill process 11395 (beam.smp) score 227 or sacrifice child" Surely asking you to sacrifice your own child is going a bit too far! :D
CS people are known for their melodramatic ways