luxi: close and reopen the socket on errors

This is less of an issue for regular gnt-* clients, but it is easily reproducible with burnin and possible with RAPI (depending on how the program uses luxi.Client(s)).

With burnin, if we interrupt the client (^C) while it polls a job, the poll aborts and raises an error. Burnin then issues a remove-instance job. At this point we send the submit-job (remove) call, but the first thing we read from the socket is the response to the previous poll-job request, since the master had already queued it.

To solve this, whenever we detect an error in Transport.Call(), we close that transport and create a new one, to start anew.

The alternative would be to introduce a sequence number to the protocol, but that would be a design-level change and is not recommended at this stage.

Reviewed-by: imsnah
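
The stale-reply problem and the close-and-reopen fix can be illustrated with a small self-contained sketch. It is not the actual luxi code: FakeTransport, InterruptedCall, the reopen_on_error flag and demo() are hypothetical names, and the "master" is simulated by an in-memory queue instead of a real socket.

```python
import collections


class InterruptedCall(Exception):
    """Hypothetical error standing in for a call aborted mid-flight."""


class FakeTransport:
    """Simulated connection to the master (not the real luxi.Transport)."""

    def __init__(self):
        # Replies the simulated master has queued on this connection.
        self._pending = collections.deque()
        self.interrupt_next = False

    def Call(self, request):
        # The master queues a reply even if the client never reads it.
        self._pending.append("reply:" + request)
        if self.interrupt_next:
            self.interrupt_next = False
            raise InterruptedCall(request)
        # The client always reads the oldest unread reply.
        return self._pending.popleft()

    def Close(self):
        # Closing the connection discards any unread replies.
        self._pending.clear()


class Client:
    """Simplified client; reopen_on_error toggles the fix for comparison."""

    def __init__(self, reopen_on_error):
        self._reopen_on_error = reopen_on_error
        self.transport = None

    def Call(self, request):
        if self.transport is None:
            self.transport = FakeTransport()
        try:
            return self.transport.Call(request)
        except Exception:
            if self._reopen_on_error:
                # Drop the connection so no stale reply can be read later.
                self.transport.Close()
                self.transport = None
            raise


def demo(reopen_on_error):
    client = Client(reopen_on_error)
    client.Call("submit-job")
    client.transport.interrupt_next = True  # simulate ^C during the poll
    try:
        client.Call("poll-job")
    except InterruptedCall:
        pass
    return client.Call("remove-instance")
```

Without the fix, demo(False) returns the queued poll-job reply in answer to the remove-instance request; with it, demo(True) returns the remove-instance reply, because the second request goes out on a fresh transport.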