Commit b544cfe0 authored by Iustin Pop's avatar Iustin Pop
Browse files

Reduce chance of ssh failures in verify cluster

The cluster verify builds a sorted list of nodes and passes that to all
the nodes (in parallel) for ssh checks. This means that for a cluster
with N nodes, there will be approximately N simultaneous connections to
the first node, then to the second node, etc. This, coupled with the
ssh daemon's “MaxStartups” parameter, can create false alarms about ssh
connectivity.

This patch randomizes the node list in the backend (therefore, each node
should have it's own order of ssh-ing to the other nodes) and the chance
of these alarms should be reduced.

Reviewed-by: ultrotter
parent 6c896e2f
......@@ -30,6 +30,7 @@ import stat
import errno
import re
import subprocess
import random
from ganeti import logger
from ganeti import errors
......@@ -200,6 +201,7 @@ def VerifyNode(what):
if 'nodelist' in what:
result['nodelist'] = {}
random.shuffle(what['nodelist'])
for node in what['nodelist']:
success, message = _GetSshRunner().VerifyNodeHostname(node)
if not success:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment