Commit a53772a0 authored by Klaus Aehlig

Merge branch 'stable-2.11' into master

* stable-2.11
  Update design doc wrt to improved SSL design
  Test node certificate renewal in QA
  Use node UUID as client certificate serial number
  Revert "Temporarily remove SSL changes from NEWS file"
  Revert "Disabling client certificate usage"
  Fix watcher tampering with instance userdown QA

	qa/ (trivial)
Signed-off-by: Klaus Aehlig <>
Reviewed-by: Petr Pudlak <>
parents a2a1a8ca 0565f862
......@@ -57,6 +57,11 @@ Incompatible/important changes
as well as the --enable-split-queries configuration option.
- Orphan volumes errors are demoted to warnings and no longer affect the exit
code of ``gnt-cluster verify``.
- RPC security got enhanced by using different client SSL certificates
for each node. In this context 'gnt-cluster renew-crypto' got a new
option '--new-node-certificates', which renews the client
certificates of all nodes. After a cluster upgrade from pre-2.11, run
this to create client certificates and activate this feature.
New features
......@@ -29,6 +29,20 @@ This will carry out the steps described below in the section on upgrades from
way, specifying the smaller version on the ``--to`` argument.
When upgrading to 2.11, first apply the instructions of ``2.11 and
above``. 2.11 comes with the new feature of enhanced RPC security
through client certificates. This feature needs to be enabled after the
upgrade by::
$ gnt-cluster renew-crypto --new-node-certificates
Note that new node certificates are generated automatically without
warning when upgrading with ``gnt-cluster upgrade``.
2.1 and above
......@@ -105,8 +105,8 @@ distribution to master candidates only.
(Re-)Adding nodes to a cluster
According to ``design-node-add.rst``, Ganeti transfers the ssh keys to every
node that gets added to the cluster.
According to :doc:`design-node-add`, Ganeti transfers the ssh keys to
every node that gets added to the cluster.
We propose to change this procedure to treat master candidates and normal
nodes differently. For master candidates, the procedure would stay as is.
......@@ -192,7 +192,10 @@ in the design.
- Instead of using the same certificate for all nodes as both, server
and client certificate, we generate a common server certificate (and
the corresponding private key) for all nodes and a different client
certificate (and the corresponding private key) for each node.
certificate (and the corresponding private key) for each node. All
those certificates will be self-signed for now. The client
certificates will use the node UUID as serial number to ensure
uniqueness within the cluster.
- In addition, we store a mapping of
(node UUID, client certificate digest) in the cluster's configuration
and ssconf for hosts that are master or master candidate.
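The serial-number choice described in this hunk can be illustrated with a minimal sketch. It assumes that node UUIDs are standard UUID strings and that the serial number is simply the UUID's 128-bit integer value; the helper name `uuid_to_serial` and the sample UUIDs are made up for illustration and are not taken from the commit.

```python
import uuid


def uuid_to_serial(node_uuid):
    """Return the 128-bit integer value of a UUID string.

    Hypothetical helper: distinct UUIDs always map to distinct integers,
    which is what makes the value usable as a cluster-wide unique serial.
    """
    return uuid.UUID(node_uuid).int


# Illustrative UUIDs, not taken from a real cluster.
serial_a = uuid_to_serial("12345678-1234-5678-1234-567812345678")
serial_b = uuid_to_serial("12345678-1234-5678-1234-567812345679")
```

Because the conversion is a plain re-encoding of the UUID, the serial number stays the same whenever a certificate is regenerated for the same node.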
......@@ -248,6 +251,9 @@ Drawbacks of this design:
design (limiting ssh keys to master candidates), but it will be
eliminated with the second part of the design (separate ssh keys for
each master candidate).
- Even though this proposal is an improvement over the previous
situation in Ganeti, it still does not use the full power of SSL. For
further improvements, see Section "Related and future work".
Alternative proposals:
......@@ -379,7 +385,7 @@ in this design so far. Also, other daemons than the ones mentioned so far
perform intra-cluster communication. Neither the keys nor the daemons will
be affected by this design for several reasons:
- The hmac key used by ConfD (see ``design-2.1.rst``): the hmac key is still
- The hmac key used by ConfD (see :doc:`design-2.1`): the hmac key is still
distributed to all nodes, because it was designed to be used for
communicating with ConfD, which should be possible from all nodes.
For example, the monitoring daemon which runs on all nodes uses it to
......@@ -392,7 +398,7 @@ be affected by this design for several reasons:
RPC requests is maintained with this design.
- The rapi SSL key certificate and rapi user/password file 'rapi_users' is
already only copied to the master candidates (see ``design-2.1.rst``,
already only copied to the master candidates (see :doc:`design-2.1`,
Section ``Redistribute Config``).
- The spice certificates are still distributed to all nodes, since it should
......@@ -407,9 +413,51 @@ be affected by this design for several reasons:
Related and Future Work
Ganeti RPC calls are currently done without server verification.
Establishing server verification might be a desirable feature, but is
not part of this design.
There are a couple of suggestions on how to improve the SSL setup even more.
As a trade-off with respect to complexity and implementation effort, we did not
implement them yet (as of version 2.11) but describe them here for
future reference.
- All SSL certificates that Ganeti uses so far are self-signed. It would
increase the security if they were signed by a common CA. There is
already a design doc for a Ganeti CA which was suggested in a
different context (related to import/export). This would also be a
benefit for the RPC calls. See design doc :doc:`design-impexp2` for
more information. Implementing a CA is rather complex, because it
would also mean supporting renewal of the CA certificate and providing
infrastructure to revoke compromised certificates.
- An extension of the previous suggestion would be to even enable the
system administrator to use an external CA. Especially in bigger
setups, where already an SSL infrastructure exists, it would be useful
if Ganeti can simply be integrated with it, rather than forcing the
user to use the Ganeti CA.
- A lighter version of using a CA would be to use the server certificate
to sign the client certificate instead of using self-signed
certificates for both. The problem here is that this would make
renewing the server certificate rather complicated, because all client
certificates would need to be re-signed and redistributed as well,
which leads to interesting chicken-and-egg problems when this is done
via RPC calls.
- Ganeti RPC calls are currently done without checking if the hostname
of the node complies with the common name of the certificate. This
might be a desirable feature, but would increase the effort when a
node is renamed.
- The typical use case for SSL is to have one certificate per node
rather than one shared certificate (Ganeti's noded server certificate)
and a client certificate. One could change the design in a way that
only one certificate per node is used, but this would require a common
CA so that the validity of the certificate can be established by every
node in the cluster.
- With the proposed design, the serial numbers of the client
certificates are set to the node UUIDs. Technically, this also does
not comply with how SSL is supposed to be used, as the serial numbers
should reflect the enumeration of certificates created by the CA. Once
a CA is implemented, it might be reasonable to change this
accordingly. The implementation of the proposed design also has the
drawback of the serial number not changing even if the certificate is
replaced by a new one (for example when calling
``gnt-cluster renew-crypto``), which also does not comply with the way
SSL was designed to be used.
.. vim: set textwidth=72 :
.. Local Variables:
......@@ -1206,6 +1206,8 @@ def GetCryptoTokens(token_requests):
if token_type == constants.CRYPTO_TYPE_SSL_DIGEST:
if action == constants.CRYPTO_ACTION_CREATE:
# extract file name from options
cert_filename = None
if options:
cert_filename = options.get(constants.CRYPTO_OPTION_CERT_FILE)
......@@ -1216,8 +1218,25 @@ def GetCryptoTokens(token_requests):
raise errors.ProgrammerError(
"The certificate file name path '%s' is not allowed." %
# extract serial number from options
serial_no = None
if options:
serial_no = int(options[constants.CRYPTO_OPTION_SERIAL_NO])
except ValueError:
raise errors.ProgrammerError(
"The given serial number is not an integer: %s." %
except KeyError:
raise errors.ProgrammerError("No serial number was provided.")
if not serial_no:
raise errors.ProgrammerError(
"Cannot create an SSL certificate without a serial no.")
True, cert_filename,
True, cert_filename, serial_no,
"Create new client SSL certificate in %s." % cert_filename)
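The serial-number handling in this hunk lost some lines in extraction, so the control flow is hard to follow. The sketch below shows one way such a check could look as a standalone function. It is an assumption-laden illustration, not the code from the commit: `OptionError` and `extract_serial_no` are hypothetical names, and only the `"serial_no"` key mirrors the constant added elsewhere in this change.

```python
class OptionError(Exception):
    """Hypothetical stand-in for the error type raised on bad input."""


def extract_serial_no(options, key="serial_no"):
    """Return a non-zero integer serial number from an options dict.

    Raises OptionError if the key is missing, the value cannot be
    converted to an integer, or the resulting number is zero.
    """
    serial_no = None
    if options:
        try:
            serial_no = int(options[key])
        except KeyError:
            raise OptionError("No serial number was provided.")
        except (TypeError, ValueError):
            raise OptionError(
                "The given serial number is not an integer: %r." %
                (options[key],))
    if not serial_no:
        raise OptionError(
            "Cannot create an SSL certificate without a serial no.")
    return serial_no
```

Note that a serial number of `0` is rejected by the final truthiness check, just like a missing one.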
......@@ -3689,7 +3708,7 @@ def CreateX509Certificate(validity, cryptodir=pathutils.CRYPTO_KEYS_DIR):
(key_pem, cert_pem) = \
min(validity, _MAX_SSL_CERT_VALIDITY))
min(validity, _MAX_SSL_CERT_VALIDITY), 1)
cert_dir = tempfile.mkdtemp(dir=cryptodir,
prefix="x509-%s-" % utils.TimestampForFilename())
......@@ -138,7 +138,7 @@ def GenerateClusterCrypto(new_cluster_cert, new_rapi_cert, new_spice_cert,
# pylint: disable=R0913
# noded SSL certificate
new_cluster_cert, nodecert_file,
new_cluster_cert, nodecert_file, 1,
"Generating new cluster certificate at %s" % nodecert_file)
# confd HMAC key
......@@ -153,7 +153,7 @@ def GenerateClusterCrypto(new_cluster_cert, new_rapi_cert, new_spice_cert,
new_rapi_cert, rapicert_file,
new_rapi_cert, rapicert_file, 1,
"Generating new RAPI certificate at %s" % rapicert_file)
......@@ -173,7 +173,7 @@ def GenerateClusterCrypto(new_cluster_cert, new_rapi_cert, new_spice_cert,
logging.debug("Generating new self-signed SPICE certificate at %s",
(_, cert_pem) = utils.GenerateSelfSignedSslCert(spicecert_file)
(_, cert_pem) = utils.GenerateSelfSignedSslCert(spicecert_file, 1)
# Self-signed certificate -> the public certificate is also the CA public
# certificate
......@@ -3380,6 +3380,7 @@ class LUClusterVerifyGroup(LogicalUnit, _VerifyErrors):
feedback_fn("* Verifying configuration file consistency")
self._VerifyClientCertificates(self.my_node_info.values(), all_nvinfo)
# If not all nodes are being checked, we need to make sure the master node
# and a non-checked vm_capable node are in the list.
absent_node_uuids = set(self.all_node_info).difference(self.my_node_info)
......@@ -1275,6 +1275,7 @@ def CreateNewClientCert(lu, node_uuid, filename=None):
options = {}
if filename:
options[constants.CRYPTO_OPTION_CERT_FILE] = filename
options[constants.CRYPTO_OPTION_SERIAL_NO] = utils.UuidToInt(node_uuid)
result = lu.rpc.call_node_crypto_tokens(
......@@ -610,7 +610,7 @@ class HttpBase(object):
if ssl_verify_peer:
ctx.set_verify(OpenSSL.SSL.VERIFY_PEER |
# Also add our certificate as a trusted CA to be sent to the client.
# This is required at least for GnuTLS clients to work.
......@@ -36,6 +36,7 @@ import base64
import pycurl
import threading
import copy
import os
from ganeti import utils
from ganeti import objects
......@@ -97,15 +98,23 @@ def Shutdown():
def _ConfigRpcCurl(curl):
noded_cert = str(pathutils.NODED_CERT_FILE)
noded_client_cert = str(pathutils.NODED_CLIENT_CERT_FILE)
# FIXME: The next two lines are necessary to ensure upgradability from
# 2.10 to 2.11. Remove in 2.12, because this slows down RPC calls.
if not os.path.exists(noded_client_cert):
  logging.info("Using server certificate as client certificate for RPC")
  noded_client_cert = noded_cert
curl.setopt(pycurl.FOLLOWLOCATION, False)
curl.setopt(pycurl.CAINFO, noded_cert)
curl.setopt(pycurl.SSL_VERIFYHOST, 0)
curl.setopt(pycurl.SSL_VERIFYPEER, True)
curl.setopt(pycurl.SSLCERTTYPE, "PEM")
curl.setopt(pycurl.SSLCERT, noded_cert)
curl.setopt(pycurl.SSLCERT, noded_client_cert)
curl.setopt(pycurl.SSLKEYTYPE, "PEM")
curl.setopt(pycurl.SSLKEY, noded_cert)
curl.setopt(pycurl.SSLKEY, noded_client_cert)
curl.setopt(pycurl.CONNECTTIMEOUT, constants.RPC_CONNECT_TIMEOUT)
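The upgrade fallback in this hunk boils down to a small path-selection rule: use the per-node client certificate if it exists, otherwise fall back to the shared server certificate. A minimal sketch of that rule follows; the function name `pick_client_cert` and the injectable `exists` parameter are assumptions made for illustration and testing, not part of the commit.

```python
import os


def pick_client_cert(server_cert, client_cert, exists=os.path.exists):
    """Return the client certificate path, or the server one as a fallback.

    `exists` is injectable so the rule can be exercised without touching
    the file system (a hypothetical convenience, not Ganeti's API).
    """
    if exists(client_cert):
        return client_cert
    return server_cert
```

The file-existence check runs on every call, which matches the FIXME's remark that the fallback slows down RPC calls and should be removed once it is no longer needed.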
......@@ -25,6 +25,7 @@
import logging
import OpenSSL
import os
import uuid as uuid_module
from ganeti.utils import io
from ganeti.utils import x509
......@@ -33,6 +34,11 @@ from ganeti import errors
from ganeti import pathutils
def UuidToInt(uuid):
uuid_obj = uuid_module.UUID(uuid)
return uuid_obj.int # pylint: disable=E1101
def AddNodeToCandidateCerts(node_uuid, cert_digest, candidate_certs,
                            info_fn=logging.info, warn_fn=logging.warn):
"""Adds an entry to the candidate certificate map.
......@@ -94,13 +100,15 @@ def GetCertificateDigest(cert_filename=pathutils.NODED_CLIENT_CERT_FILE):
return cert.digest("sha1")
def GenerateNewSslCert(new_cert, cert_filename, log_msg):
def GenerateNewSslCert(new_cert, cert_filename, serial_no, log_msg):
"""Creates a new SSL certificate and backs up the old one.
@type new_cert: boolean
@param new_cert: whether a new certificate should be created
@type cert_filename: string
@param cert_filename: filename of the certificate file
@type serial_no: int
@param serial_no: serial number of the certificate
@type log_msg: string
@param log_msg: log message to be written on certificate creation
......@@ -111,7 +119,7 @@ def GenerateNewSslCert(new_cert, cert_filename, log_msg):
x509.GenerateSelfSignedSslCert(cert_filename, serial_no)
def VerifyCertificate(filename):
......@@ -254,7 +254,7 @@ def LoadSignedX509Certificate(cert_pem, key):
return (cert, salt)
def GenerateSelfSignedX509Cert(common_name, validity):
def GenerateSelfSignedX509Cert(common_name, validity, serial_no):
"""Generates a self-signed X509 certificate.
@type common_name: string
......@@ -273,7 +273,7 @@ def GenerateSelfSignedX509Cert(common_name, validity):
cert = OpenSSL.crypto.X509()
if common_name:
cert.get_subject().CN = common_name
......@@ -286,7 +286,8 @@ def GenerateSelfSignedX509Cert(common_name, validity):
return (key_pem, cert_pem)
def GenerateSelfSignedSslCert(filename, common_name=constants.X509_CERT_CN,
def GenerateSelfSignedSslCert(filename, serial_no,
"""Legacy function to generate self-signed X509 certificate.
......@@ -303,8 +304,8 @@ def GenerateSelfSignedSslCert(filename, common_name=constants.X509_CERT_CN,
# TODO: Investigate using the cluster name instead of X509_CERT_CN for
# common_name, as cluster-renames are very seldom, and it'd be nice if RAPI
# and node daemon certificates have the proper Subject/Issuer.
(key_pem, cert_pem) = GenerateSelfSignedX509Cert(common_name,
validity * 24 * 60 * 60)
(key_pem, cert_pem) = GenerateSelfSignedX509Cert(
common_name, validity * 24 * 60 * 60, serial_no)
utils_io.WriteFile(filename, mode=0400, data=key_pem + cert_pem)
return (key_pem, cert_pem)
......@@ -766,7 +766,7 @@ RENEW-CRYPTO
| **renew-crypto** [-f]
| [\--new-cluster-certificate]
| [\--new-cluster-certificate] | [\--new-node-certificates]
| [\--new-confd-hmac-key]
| [\--new-rapi-certificate] [\--rapi-certificate *rapi-cert*]
| [\--new-spice-certificate | \--spice-certificate *spice-cert*
......@@ -779,6 +779,11 @@ options ``--new-cluster-certificate`` and ``--new-confd-hmac-key``
can be used to regenerate respectively the cluster-internal SSL
certificate and the HMAC key used by **ganeti-confd**\(8).
The option ``--new-node-certificates`` will generate new node SSL
certificates for all nodes. Note that the regeneration of the node
certificates takes place after the other certificates are created
and distributed and the ganeti daemons are restarted again.
To generate a new self-signed RAPI certificate (used by
**ganeti-rapi**\(8)) specify ``--new-rapi-certificate``. If you want to
use your own certificate, e.g. one signed by a certificate
......@@ -1049,7 +1049,7 @@ def TestClusterRenewCrypto():
# Ensure certificate doesn't cause "gnt-cluster verify" to complain
validity = constants.SSL_CERT_EXPIRATION_WARN * 3
utils.GenerateSelfSignedSslCert(, validity=validity)
utils.GenerateSelfSignedSslCert(, 1, validity=validity)
tmpcert = qa_utils.UploadFile(master.primary,
......@@ -1074,7 +1074,12 @@ def TestClusterRenewCrypto():
# Normal case
AssertCommand(["gnt-cluster", "renew-crypto", "--force",
"--new-cluster-certificate", "--new-confd-hmac-key",
"--new-rapi-certificate", "--new-cluster-domain-secret"])
"--new-rapi-certificate", "--new-cluster-domain-secret",
# Only renew node certificates
AssertCommand(["gnt-cluster", "renew-crypto", "--force",
# Restore RAPI certificate
AssertCommand(["gnt-cluster", "renew-crypto", "--force",
......@@ -33,6 +33,7 @@ from ganeti import query
from ganeti import pathutils
import qa_config
import qa_daemon
import qa_utils
import qa_error
......@@ -1182,7 +1183,9 @@ def TestInstanceUserDown(instance, master):
(constants.HT_XEN_HVM, _TestInstanceUserDownXen),
(constants.HT_KVM, _TestInstanceUserDownKvm)]:
if hv in enabled_hypervisors:
fn(instance, master)
print "%s hypervisor is not enabled, skipping test for this hypervisor" \
% hv
......@@ -4231,6 +4231,10 @@ cryptoActions = ConstantUtils.mkSet [cryptoActionGet, cryptoActionCreate]
cryptoOptionCertFile :: String
cryptoOptionCertFile = "cert_file"
-- Serial number of the certificate
cryptoOptionSerialNo :: String
cryptoOptionSerialNo = "serial_no"
-- * SSH key types
sshkDsa :: String
......@@ -90,6 +90,7 @@ import Data.Maybe (fromMaybe)
import qualified Text.JSON as J
import Text.JSON.Pretty (pp_value)
import qualified Data.ByteString.Base64.Lazy as Base64
import System.Directory
import Network.Curl hiding (content)
import qualified Ganeti.Path as P
......@@ -228,8 +229,15 @@ getOptionsForCall cert_path client_cert_path call =
executeRpcCalls :: (Rpc a b) => [(Node, a)] -> IO [(Node, ERpcError b)]
executeRpcCalls nodeCalls = do
cert_file <- P.nodedCertFile
let (nodes, calls) = unzip nodeCalls
opts = map (getOptionsForCall cert_file cert_file) calls
client_cert_file_name <- P.nodedClientCertFile
client_file_exists <- doesFileExist client_cert_file_name
-- FIXME: This is needed to ensure upgradability to 2.11
-- Remove in 2.12.
let client_cert_file = if client_file_exists
then client_cert_file_name
else cert_file
(nodes, calls) = unzip nodeCalls
opts = map (getOptionsForCall cert_file client_cert_file) calls
opts_urls = zipWith3 (\n c o ->
case prepareHttpRequest o n c of
Left v -> Left v
......@@ -1086,6 +1086,140 @@ class TestLUClusterVerifyGroup(CmdlibTestCase):
class TestLUClusterVerifyClientCerts(CmdlibTestCase):
def _AddNormalNode(self):
self.normalnode = copy.deepcopy(self.master)
self.normalnode.master_candidate = False
self.normalnode.uuid = "normal-node-uuid"
self.cfg.AddNode(self.normalnode, None)
def testVerifyMasterCandidate(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {self.master.uuid: client_cert}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
def testVerifyMasterCandidateInvalid(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {self.master.uuid: client_cert}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (666, "Invalid Certificate")}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("Client certificate")
self.mcpu.assertLogContainsRegex("failed validation")
def testVerifyNoMasterCandidateMap(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
"list of master candidate certificates is empty")
def testVerifyNoSharingMasterCandidates(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {
self.master.uuid: client_cert,
"some-other-master-candidate-uuid": client_cert}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
"two master candidates configured to use the same")
def testVerifyMasterCandidateCertMismatch(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {self.master.uuid: "different-cert-digest"}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("does not match its entry")
def testVerifyMasterCandidateUnregistered(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {"other-node-uuid": "different-cert-digest"}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("does not have an entry")
def testVerifyMasterCandidateOtherNodesCert(self):
client_cert = "client-cert-digest"
self.cluster.candidate_certs = {"other-node-uuid": client_cert}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("using a certificate of another node")
def testNormalNodeStillInList(self):
client_cert_master = "client-cert-digest-master"
client_cert_normal = "client-cert-digest-normal"
self.cluster.candidate_certs = {
self.normalnode.uuid: client_cert_normal,
self.master.uuid: client_cert_master}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert_normal)}) \
{constants.NV_CLIENT_CERT: (None, client_cert_master)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("not a master candidate")
self.mcpu.assertLogContainsRegex("still listed")
def testNormalNodeStealingMasterCandidateCert(self):
client_cert_master = "client-cert-digest-master"
self.cluster.candidate_certs = {
self.master.uuid: client_cert_master}
self.rpc.call_node_verify.return_value = \
RpcResultsBuilder() \
{constants.NV_CLIENT_CERT: (None, client_cert_master)}) \
{constants.NV_CLIENT_CERT: (None, client_cert_master)}) \
op = opcodes.OpClusterVerifyGroup(group_name="default", verbose=True)
self.mcpu.assertLogContainsRegex("not a master candidate")
"certificate of another node which is master candidate")
class TestLUClusterVerifyGroupMethods(CmdlibTestCase):
"""Base class for testing individual methods in LUClusterVerifyGroup.
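The test cases in this hunk imply a set of consistency rules between the configured candidate certificate map and the digests reported by the nodes. The sketch below restates those rules as a standalone function. It is inferred from the test names and expected log fragments only and is not the actual `_VerifyClientCertificates` implementation; the function name, the `(uuid, is_candidate, digest)` tuple layout and the exact message wording are assumptions.

```python
def check_client_certs(candidate_certs, nodes):
    """Return a list of problem descriptions (empty if all is consistent).

    candidate_certs: dict mapping node UUID -> client certificate digest
    nodes: iterable of (node_uuid, is_master_candidate, reported_digest)
    """
    problems = []
    if not candidate_certs:
        return ["list of master candidate certificates is empty"]

    digests = list(candidate_certs.values())
    if len(set(digests)) < len(digests):
        problems.append(
            "two master candidates configured to use the same certificate")

    for node_uuid, is_candidate, digest in nodes:
        # Does some *other* listed node already use this digest?
        used_by_other = any(d == digest for (u, d) in candidate_certs.items()
                            if u != node_uuid)
        if is_candidate:
            if node_uuid in candidate_certs:
                if candidate_certs[node_uuid] != digest:
                    problems.append(
                        "%s: certificate does not match its entry" % node_uuid)
            elif used_by_other:
                problems.append(
                    "%s: using a certificate of another node" % node_uuid)
            else:
                problems.append("%s: does not have an entry" % node_uuid)
        else:
            if node_uuid in candidate_certs:
                problems.append(
                    "%s: not a master candidate, but still listed" % node_uuid)
            if used_by_other:
                problems.append(
                    "%s: not a master candidate, but using the certificate of"
                    " another node which is master candidate" % node_uuid)
    return problems
```

Returning a list of messages instead of logging directly keeps the sketch easy to test; the real code reports such findings through the cluster verification error machinery.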
......@@ -97,7 +97,7 @@ class TestGetCryptoTokens(testutils.GanetiTestCase):
def testCreateSslToken(self):
result = backend.GetCryptoTokens(
{constants.CRYPTO_OPTION_SERIAL_NO: 42})])
self.assertTrue((constants.CRYPTO_TYPE_SSL_DIGEST, self._ssl_digest)
in result)
......@@ -106,7 +106,16 @@ class TestGetCryptoTokens(testutils.GanetiTestCase):
result = backend.GetCryptoTokens(
constants.CRYPTO_OPTION_SERIAL_NO: 42})])