Opened 8 years ago

Closed 6 years ago

#1276 closed defect (fixed)

rasserver dies with segfault when configured with a postgres backend

Reported by: Alex Dumitru
Owned by: Dimitar Misev
Priority: major
Milestone: 9.7
Component: rasserver
Version: development
Keywords:
Cc: Peter Baumann
Complexity: Hard

Description

rasserver segfaults after a couple of queries when rasdaman is configured with the postgresql backend.
Logs can be found here: http://codereview.rasdaman.org/jenkins/job/vagrant-exec/ws/logs/build146.tar.gz. A new Jenkins job is available for testing this setup.

Stacktrace:

 [INFO] - 24/03/2016 10:50:05.101563: Segmentation fault caught, stacktrace:
 [INFO] - 24/03/2016 10:50:05.108553: [bt]: (1) /usr/lib/libpq.so.5 (??:0) - PQtransactionStatus+0xe [0x7f237a9a8f6e]
 [INFO] - 24/03/2016 10:50:05.113760: [bt]: (2) /usr/lib/libecpg.so.6 (??:0) - +0x56ad [0x7f237a68e6ad]
 [INFO] - 24/03/2016 10:50:05.117092: [bt]: (3) /usr/lib/libecpg.so.6 (??:0) - ECPGdo+0x18d [0x7f237a68f0ad]
 [INFO] - 24/03/2016 10:50:05.117117: [bt]: (4) /bin/rasserver() [0x5a75cd]
 [INFO] - 24/03/2016 10:50:05.117120: [bt]: (5) /bin/rasserver() [0x5caef9]
 [INFO] - 24/03/2016 10:50:05.117123: [bt]: (6) /bin/rasserver() [0x475322]
 [INFO] - 24/03/2016 10:50:05.117125: [bt]: (7) /bin/rasserver() [0x464b9c]
 [INFO] - 24/03/2016 10:50:05.117127: [bt]: (8) /bin/rasserver() [0x6380bb]
 [INFO] - 24/03/2016 10:50:05.117129: [bt]: (9) /bin/rasserver() [0x652d5e]
 [INFO] - 24/03/2016 10:50:05.117131: [bt]: (10) /bin/rasserver() [0x670172]
 [INFO] - 24/03/2016 10:50:05.117134: [bt]: (11) /bin/rasserver() [0x6c4422]
 [INFO] - 24/03/2016 10:50:05.117136: [bt]: (12) /bin/rasserver() [0x6cc832]
 [INFO] - 24/03/2016 10:50:05.117141: [bt]: (13) /bin/rasserver() [0x6cca23]
 [INFO] - 24/03/2016 10:50:05.121870: [bt]: (14) /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (??:?) - +0xb1a60 [0x7f2377fa4a60]
 [INFO] - 24/03/2016 10:50:05.124316: [bt]: (15) /lib/x86_64-linux-gnu/libpthread.so.0 (??:0) - +0x8182 [0x7f2379503182]
 [INFO] - 24/03/2016 10:50:05.127571: [bt]: (16) /lib/x86_64-linux-gnu/libc.so.6 (??:?) - clone+0x6d [0x7f237770c47d]
 [INFO] - 24/03/2016 10:50:05.127593: rasserver terminated.

Change History (17)

comment:1 by George Merticariu, 8 years ago

Owner: changed from Alex Dumitru to George Merticariu
Status: new → assigned

comment:2 by George Merticariu, 8 years ago

Status: assigned → accepted

comment:3 by Dimitar Misev, 7 years ago

Owner: changed from George Merticariu to Dimitar Misev
Status: accepted → assigned

Has this been fixed?

comment:4 by Dimitar Misev, 7 years ago

Milestone: 9.2 → 9.3

comment:5 by Dimitar Misev, 7 years ago

Milestone: 9.3 → 9.4

comment:6 by Dimitar Misev, 7 years ago

Owner: changed from Dimitar Misev to bbell

comment:7 by Dimitar Misev, 7 years ago

Please check how to enable postgres with cmake (and document it if it isn't documented yet), and then look into the segfault problem.

comment:8 by bbell, 7 years ago

Complexity: Medium → Hard

Enabling postgres with cmake (v3+) (see http://www.rasdaman.org/wiki/InstallFromSource/cmake):

in your build directory…
cmake …<arguments>… -DDEFAULT_DB=postgresql
make
make install

Remark: if your "service" name is postgresql-version#, you will run into trouble here. On Linux, the service name should be postgresql.

Regarding the segfault: it occurs in different places and at different times, usually for requests that take longer to run. For example, when gdb is attached to rasserver during data ingestion (e.g. insert into <args>), the backtraces of the segfaults point to the objectbroker files.

A possible approach for tackling this ticket in the future: run git bisect with the patch for ticket:945 marked as good. The associated sha1 is d302f450a9837388754199cd4a051561e2d08f64
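A minimal sketch of how such a bisect session could be started. The helper name start_bisect is hypothetical, and using HEAD as the default bad commit is an assumption; only the good sha1 comes from this ticket.

```shell
#!/bin/sh
# start_bisect GOOD [BAD]
# Hypothetical helper: starts a git bisect session between a commit
# known to work (GOOD) and one that shows the segfault (BAD).
# BAD defaults to HEAD, which is an assumption for illustration.
start_bisect() {
    good="$1"
    bad="${2:-HEAD}"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Usage, from inside a rasdaman checkout:
#   start_bisect d302f450a9837388754199cd4a051561e2d08f64
# Then rebuild and re-run the failing queries at each step, and mark it:
#   git bisect good    # no segfault at this commit
#   git bisect bad     # segfault reproduced at this commit
# When git reports the first bad commit, finish with:
#   git bisect reset
```

Since the segfault appears at varying times, a commit should probably be marked good only after several query runs without a crash.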

comment:9 by Dimitar Misev, 7 years ago

Milestone: 9.4 → 9.5

comment:10 by Dimitar Misev, 7 years ago

Owner: changed from bbell to Dimitar Misev

comment:11 by Dimitar Misev, 7 years ago (in reply to: comment:8)

Replying to bbell:

Enabling postgres with cmake (v3+) (see http://www.rasdaman.org/wiki/InstallFromSource/cmake):

in your build directory…
cmake …<arguments>… -DDEFAULT_DB=postgresql

The option is actually -DDEFAULT_BASEDB=postgresql, not -DDEFAULT_DB=postgresql.

comment:12 by Dimitar Misev, 7 years ago

Cc: Peter Baumann added

On further inspection, the issue seems to happen only with rasnet. Compiling with --with-protocol=rnp and postgresql works without issue. It also works when compiled with rasnet but using directql instead of rasql.

The problem is that it segfaults randomly within libpq, in places where you wouldn't expect any segfault. It's very unclear at the moment what the issue is.

comment:13 by Peter Baumann, 7 years ago

Just a suspicion: pg dies because the stack gets corrupted, i.e. some pointer on the stack is overwritten by a memory write beyond boundaries → try valgrind?

comment:14 by Dimitar Misev, 7 years ago

Unfortunately valgrind is not an option, as the segfault only happens when the query goes through rasnet; there are no issues with directql.
I did notice, however, that it tends to happen starting from the second query, so it's possible that something is not cleaned up after the first query.

comment:15 by Vlad Merticariu, 6 years ago

Milestone: 9.5 → 9.6

comment:16 by Dimitar Misev, 6 years ago

Milestone: 9.6 → Future

comment:17 by Dimitar Misev, 6 years ago

Milestone: Future → 9.7
Resolution: fixed
Status: assigned → closed

It seems to have been resolved in the meantime; at least on Debian buster I no longer get the issue.
