Opened 8 years ago

Closed 8 years ago

#1059 closed defect (fixed)

rasnet: query fails when executed right after start_rasdaman.sh

Reported by: Dimitar Misev Owned by: Alex Toader
Priority: critical Milestone: 9.1.x
Component: rasnet Version: development
Keywords: Cc: George Merticariu, Alex Dumitru, Vlad Merticariu, Peter Baumann
Complexity: Medium

Description

I'm writing a script that automatically restarts rasdaman, and it took me a while to figure out the issue. See below for how to reproduce it.

$ start_rasdaman.sh && rasql -q 'select c from RAS_COLLECTIONNAMES AS c' --out string -d RASBASE

----------------Checking starting Rasdaman Servers----------------

start_rasdaman.sh: starting rasdaman server complex...
nohup: appending output to ‘nohup.out’
start_rasdaman.sh: starting all rasdaman servers...


start_rasdaman.sh: done.

rasql: rasdaman query tool v1.0, rasdaman v9.1.0-g1c8a741 -- generated on 04.11.2015 16:50:03.
opening database RASBASE at localhost:7001...terminate called after throwing an instance of 'std::runtime_error'
  what():  There is no available server for the client.
Aborted (core dumped)

Change History (11)

comment:1 by Dimitar Misev, 8 years ago

Adding

sleep 0.1

after start_rasdaman.sh fixes the issue, but it would be better to fix it within rasdaman: once start_rasdaman.sh has finished executing, rasdaman should be ready.
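A less timing-dependent variant of the sleep workaround, sketched here as an assumption rather than anything rasdaman provides, is to retry a cheap probe query until it succeeds or a retry budget runs out. `retry_until_ok` is a hypothetical helper, not part of rasdaman. Fractional delays such as 0.5 assume GNU coreutils `sleep`, as the `sleep 0.1` above already does.

```shell
#!/bin/sh
# Hypothetical helper, not part of rasdaman: run a command repeatedly until
# it exits with status 0 or MAX_ATTEMPTS runs have failed.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Discard the probe's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Give up once the attempt budget is used, without a final sleep.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    # Fractional delays such as 0.5 assume GNU coreutils sleep.
    sleep "$delay"
  done
}
```

For example, reusing the query from the description as the probe:

$ start_rasdaman.sh && retry_until_ok 20 0.5 rasql -q 'select c from RAS_COLLECTIONNAMES AS c' --out string -d RASBASE && echo "rasdaman accepts queries"

Unlike a fixed sleep, this waits only as long as the servers actually need and returns a non-zero exit status if they never become available within the budget.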

Last edited 8 years ago by Dimitar Misev

comment:2 by Alex Toader, 8 years ago

Cc: Peter Baumann added

rasmgr is ready, but the rasserver processes have not yet completed the startup sequence.

The more important question is: What should happen in rasmgr when a request for a new server is received but there are no available servers? Should the client wait until a server becomes available? Should the rasmgr attempt to find an available server a fixed number of times and if no server is found return an error?

comment:3 by Dimitar Misev, 8 years ago

Perhaps rasmgr should wait for all rasservers to finish starting before it finishes its own startup?

comment:4 by Dimitar Misev, 8 years ago

Also note the "Aborted (core dumped)" at the end; I'm not sure whether it happens within rasnet.

comment:5 by Peter Baumann, 8 years ago

"core dump" is critical. The startup sequence does indeed depend on executing the config commands first. We might think about a specific command for enabling/disabling queries (not server control); the question is whether it is worth the effort. Service ops just need to wait at most 1 second before sending queries.

Wrt Alex's question: currently we have a synchronous paradigm. Once we go async we could think about functionality as described.

in reply to:  2 comment:6 by Dimitar Misev, 8 years ago

Replying to atoader:

rasmgr is ready, but the rasserver processes have not yet completed the startup sequence.

Yes, I understand this, but the script is called start_rasdaman.sh, and rasdaman is evidently not fully started after calling it. I think it would be best if rasmgr simply waited for the servers before returning.

The more important question is: What should happen in rasmgr when a request for a new server is received but there are no available servers? Should the client wait until a server becomes available? Should the rasmgr attempt to find an available server a fixed number of times and if no server is found return an error?

This is a separate concern and should go into another ticket, I'd say.

comment:7 by Peter Baumann, 8 years ago

Indeed, please file a separate ticket for the core dump.

comment:8 by Dimitar Misev, 8 years ago

Now, when you start a rasserver process, there is no function to determine when the process has started, so rasmgr waits for a registration message in which the server announces that it is online. This incurs a delay of several milliseconds. Moreover, this process is fully controlled by the server manager inside rasmgr, which has a thread that checks the list of registered servers at a set interval and starts or shuts down servers.

Because the client request is instant and there is no waiting involved, there is no time for the process described above to take place.

OK, so in rasmgr's main() we could just wait until a registration message has arrived before declaring that rasmgr is ready to receive messages from clients? Maybe I'm missing something, though.

Alternatively, if option 2 in #1061 is supported, that would also fix this ticket.

comment:9 by Alex Toader, 8 years ago

When you start rasmgr from the script, you do not wait for any particular signal from rasmgr before you proceed. So even if rasmgr's main() waited for at least one server to register, it wouldn't change anything.

It looks like option 2 is a cleaner way to fix this.

comment:10 by Dimitar Misev, 8 years ago

Ah yes, you're right, we start it in the background. Agreed about option 2.

comment:11 by Dimitar Misev, 8 years ago

Resolution: fixed
Status: new → closed