Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#397 closed defect (fixed)

rasmgr segfaults when restarting rasserver

Reported by: Dimitar Misev
Owned by: mrusu
Priority: critical
Milestone: 8.4.4
Component: rasmgr
Version: 8.4
Keywords:
Cc: Peter Baumann, abeccati
Complexity: Medium

Description

This bug was probably introduced by the p2p patch, changeset:90fa2688a862f227e7f52b0423041c185f1eed34

It seems to happen when the countdown/timeout runs out and a rasserver has to be restarted. To reproduce, run source:systemtest/testcases_petascope/test_wcps; with the default countdown parameters in rasmgr.conf, it fails around the 48th/49th test.

Workaround: set countdown and timeout in rasmgr.conf to large values.

Segfault stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000423811 in MasterComm::askOutpeer (this=0x6c1b80 <masterCommunicator>, peer=1,
    outmsg=0x7fff64b70a40 "POST peerrequest HTTP/1.1\r\nAccept: text/plain\r\nUserAgent: RasMGR/1.0\r\nAuthorization: ras  ras rasguest:8e70a429be359b6dace8b5b2500dedb0\r\nContent-length: 47\r\n\r\nhifi RASBASE RNP ro 2130706689:1369740188"...) at rasmgr_master_nb.cc:877
877        struct hostent *hostinfo = gethostbyname(config.outpeers[peer]);
(gdb) bt
#0  0x0000000000423811 in MasterComm::askOutpeer (this=0x6c1b80 <masterCommunicator>, peer=1,
    outmsg=0x7fff64b70a40 "POST peerrequest HTTP/1.1\r\nAccept: text/plain\r\nUserAgent: RasMGR/1.0\r\nAuthorization: ras  ras rasguest:8e70a429be359b6dace8b5b2500dedb0\r\nContent-length: 47\r\n\r\nhifi RASBASE RNP ro 2130706689:1369740188"...) at rasmgr_master_nb.cc:877
#1  0x00000000004231fd in MasterComm::getFreeServer (this=0x6c1b80 <masterCommunicator>, fake=false, frompeer=false) at rasmgr_master_nb.cc:802
#2  0x000000000042178f in MasterComm::processRequest (this=0x6c1b80 <masterCommunicator>, currentJob=...) at rasmgr_master_nb.cc:482
#3  0x00000000004206e2 in MasterComm::processJob (this=0x6c1b80 <masterCommunicator>, currentJob=...) at rasmgr_master_nb.cc:214
#4  0x00000000004200ce in MasterComm::Run (this=0x6c1b80 <masterCommunicator>) at rasmgr_master_nb.cc:165
#5  0x000000000040d066 in main (argc=1, argv=0x7fff64b71548, envp=0x7fff64b71558) at rasmgr_main.cc:172 
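The crash is at rasmgr_master_nb.cc:877, where config.outpeers[peer] (with peer=1) is passed directly to gethostbyname(). The ticket does not say what the actual fix was; one possible cause is that index 1 refers to an outpeer slot that is not configured. Below is a minimal, hypothetical sketch of a defensive lookup, assuming the outpeers are stored as a fixed-size array of C strings with a separate count. PeerConfig, outpeerHost and resolveOutpeer are illustrative names, not rasmgr's actual code.

```cpp
#include <cassert>
#include <cstdio>
#include <netdb.h>

// Hypothetical stand-in for rasmgr's configuration (assumption): a fixed-size
// table of outpeer host names plus a count of how many slots are filled.
struct PeerConfig {
    static constexpr int MAX_PEERS = 8;
    const char* outpeers[MAX_PEERS] = {};  // unused slots stay nullptr
    int outpeerCount = 0;
};

// Returns the host name for a peer index, or nullptr if the index is out of
// range or the slot is empty, so the caller never reads an invalid entry.
const char* outpeerHost(const PeerConfig& config, int peer) {
    if (peer < 0 || peer >= config.outpeerCount || peer >= PeerConfig::MAX_PEERS)
        return nullptr;
    return config.outpeers[peer];
}

// Resolves an outpeer and reports failures instead of crashing.
const hostent* resolveOutpeer(const PeerConfig& config, int peer) {
    const char* host = outpeerHost(config, peer);
    if (host == nullptr) {
        std::fprintf(stderr, "outpeer %d is not configured\n", peer);
        return nullptr;
    }
    // gethostbyname() itself returns nullptr on lookup failure (see h_errno).
    hostent* info = gethostbyname(host);
    if (info == nullptr)
        std::fprintf(stderr, "cannot resolve outpeer %d (%s)\n", peer, host);
    return info;
}
```

The point of the sketch is only that both the index and the gethostbyname() result are checked before use; a request for an unconfigured peer then fails with a message rather than a SIGSEGV.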

Change History (4)

comment:1 by Dimitar Misev, 11 years ago

Milestone: 8.4.3 → 8.4.4

comment:2 by mrusu, 11 years ago

Resolution: fixed
Status: new → closed

comment:3 by ungarj, 11 years ago

FYI, ticket #395 seems to be affected by this issue as well.

comment:4 by mrusu, 11 years ago

Yes, hopefully the fix for this ticket solves the other problem as well, since both seem to be caused by the same bug in rasmgr's new logic.
