1. 20 May, 2011 12 commits
    • Cluster verify: check for nodes/instances with no group · adfa3b26
      Adeodato Simo authored

      Previously, all nodes and instances would *always* be visited/verified.
      Now that verification is driven by node group, we would miss nodes and
      instances that don't belong to any existing node group, should that rare
      and bogus circumstance ever occur.
      
      We safeguard against that by checking for unreachable nodes and instances
      explicitly. (These will not be further verified.)
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
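
      A minimal sketch of the safeguard in plain Python (illustrative names,
      not the actual Ganeti code): collect everything any group visited, then
      flag whatever is left over.

          def find_unreachable(all_nodes, all_instances, visited_by_group):
              """Return (nodes, instances) that no node group covers.

              visited_by_group: group name -> (node names, instance names).
              """
              covered_nodes = set()
              covered_insts = set()
              for nodes, insts in visited_by_group.values():
                  covered_nodes.update(nodes)
                  covered_insts.update(insts)
              # Anything not covered can't be reached from any group; it is
              # reported once and not verified further.
              return (set(all_nodes) - covered_nodes,
                      set(all_instances) - covered_insts)
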
    • Cluster verify: fix LV checks for split instances · fe870648
      Adeodato Simo authored

      When sharding by group, if a mirrored instance is split (primary and
      secondary) between two groups, its volumes will not be properly checked:
      the group of the primary will warn about a missing volume in the secondary,
      and the group of the secondary about an unknown volume (in the secondary as
      well).
      
      To solve the "missing volumes" bit, we detect this case and perform an
      extra verify RPC call to these split secondaries (querying only for
      NV_LVLIST), and introduce the results into the node images
      appropriately. We do this detection early, in ExpandNames/CheckPrereq,
      so as to properly lock the extra nodes.
      
      As for the "unknown volumes" warning in the secondary, we update the volume
      mapping with split instances before checking for orphaned volumes.
      
      Finally, we mark nodes as "ghost" only if they really don't exist in the
      cluster configuration, which avoids spurious "instance lives in ghost
      node" warnings.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
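
      A hedged sketch of the detection step (attribute names follow the
      commit's terminology; this is not the actual implementation):

          def split_secondaries(group_nodes, instances):
              """instances: objects with .primary_node and .secondary_nodes."""
              extra = set()
              for inst in instances:
                  if inst.primary_node in group_nodes:
                      # Secondaries outside the group still hold volumes we
                      # must query (NV_LVLIST only) to avoid false "missing
                      # volume" warnings.
                      extra.update(sec for sec in inst.secondary_nodes
                                   if sec not in group_nodes)
              return extra
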
    • Cluster verify: make NV_NODELIST smaller · 2dad1652
      Adeodato Simo authored

      To cope with increasing cluster sizes, we now make nodes try to contact all
      other nodes in their group, and one node from every other group.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
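
      A sketch of the reduced target list, assuming a plain mapping of group
      names to members (picking the first sorted node as the representative
      of a foreign group is an illustrative choice):

          def nodelist_targets(node, groups):
              """groups: dict mapping group name -> list of node names."""
              targets = []
              for members in groups.values():
                  if node in members:
                      # All peers in our own group.
                      targets.extend(m for m in members if m != node)
                  elif members:
                      # Just one representative from every other group.
                      targets.append(sorted(members)[0])
              return targets
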
    • Cluster verify: verify hypervisor parameters only once · d23a2a9d
      Adeodato Simo authored

      The list of all hypervisor parameters has to be computed in
      LUClusterVerifyGroup, since it needs to be passed to nodes as
      NV_HVPARAMS. However, it is better to verify those parameters only
      once, from LUClusterVerifyConfig.

      For this, we refactor the code that constructs the parameter list into
      a module-level _GetAllHypervisorParameters() function that both LUs can
      use.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
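
      A simplified sketch of such a shared helper (the real function works
      on Ganeti's config objects; the triple layout here is an assumption):

          def _GetAllHypervisorParameters(cluster_hvparams, instances):
              """Yield (source, hypervisor, params) triples to verify."""
              for hv_name, params in sorted(cluster_hvparams.items()):
                  yield ("cluster", hv_name, params)
              for inst in instances:
                  # Instance-level overrides win over cluster defaults.
                  params = dict(cluster_hvparams.get(inst.hypervisor, {}))
                  params.update(inst.hvparams)
                  yield ("instance %s" % inst.name, inst.hypervisor, params)
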
    • Split LUClusterVerify into LUClusterVerify{Config,Group} · bf93ae69
      Adeodato Simo authored

      With this change, LUClusterVerifyConfig becomes a "light" LU that only
      verifies the global config and other, master-only settings, and the bulk of
      node/instance verification is done by LUClusterVerifyGroup, which only acts
      on nodes and instances of a given group.
      
      To ensure that `gnt-cluster verify` continues to operate on the whole
      cluster, the client creates an OpClusterVerifyGroup job per node group; for
      convenience, the list of node groups is returned by LUClusterVerifyConfig.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
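
      A hedged sketch of the client-side flow (the opcode names come from
      the commit; the submission helper stands in for the real job-queue
      client):

          class OpClusterVerifyConfig(object):
              pass

          class OpClusterVerifyGroup(object):
              def __init__(self, group_name):
                  self.group_name = group_name

          def run_cluster_verify(submit_and_wait):
              """submit_and_wait: runs one opcode as a job, returns result."""
              # LUClusterVerifyConfig returns the group list for convenience.
              groups = submit_and_wait(OpClusterVerifyConfig())
              # One job per node group keeps whole-cluster coverage.
              return [submit_and_wait(OpClusterVerifyGroup(g))
                      for g in groups]
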
    • Cluster verify: factor out error codes and functions · a5c30dc2
      Adeodato Simo authored

      We move all error code definitions, plus the _Error and _ErrorIf
      helpers, to a private _VerifyErrors mix-in class that can later be
      shared by the two new cluster verify LUs.
      
      (_Error and _ErrorIf code was moved around verbatim, except to disable
      "_VerifyError class does not have 'op' or '_feedback_fn' members" errors
      from pylint.)
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
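
      A simplified sketch of the mix-in shape (the real helpers carry error
      codes and item types; signatures here are illustrative):

          class _VerifyErrors(object):
              # The inheriting LU is expected to provide self._feedback_fn
              # and self.bad -- hence the pylint suppressions noted above.
              def _Error(self, ecode, item, msg, *args):
                  if args:
                      msg = msg % args
                  self._feedback_fn("  - ERROR: %s: %s: %s"
                                    % (ecode, item, msg))
                  self.bad = True

              def _ErrorIf(self, cond, *args, **kwargs):
                  if cond:
                      self._Error(*args, **kwargs)
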
    • Cluster verify: make "instance runs in wrong node" node-driven · 14970c32
      Adeodato Simo authored

      Previously, the "instance should not be running in this node" error
      was computed by verifying, for each instance, whether any node other
      than its primary was running it. But that approach is ill-suited to
      sharded cluster verification (because, for each instance, we wouldn't
      know whether it's running *outside* the current set of nodes).

      By reversing the logic of the check and asking instead, for each node,
      "is it running any instance for which it's not primary?", we catch all
      occurrences of the problem even when running sharded.
      
      Because of this, we can also detect orphan instances at the same time
      (instances that are not known in the cluster config). We warn about them
      here too, and drop the later _VerifyOrphanInstances check.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
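
      A plain-Python sketch of the reversed check (returning messages is
      illustrative; the real code reports through the _ErrorIf helpers):

          def check_node_instances(node, running, primaries):
              """running: instance names reported by `node`;
              primaries: instance name -> primary node, from the config."""
              errors = []
              for inst in sorted(running):
                  if inst not in primaries:
                      errors.append("orphan instance %s on %s" % (inst, node))
                  elif primaries[inst] != node:
                      errors.append("instance %s runs on %s, primary is %s"
                                    % (inst, node, primaries[inst]))
              return errors
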
    • Verify an absent vm_capable node for files · 4e272d8c
      Guido Trotter authored

      If we're not verifying all nodes, adding a node outside the current
      group to the file-checksum check helps us make sure checksums are the
      same across the whole cluster.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
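
      A sketch of the idea, assuming a plain per-node attribute mapping (the
      selection rule is illustrative):

          def extra_checksum_node(all_nodes, group_nodes):
              """Pick one vm_capable node outside the verified group."""
              for name in sorted(all_nodes):
                  if name not in group_nodes and all_nodes[name]["vm_capable"]:
                      return name
              return None  # single-group cluster: nothing to add
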
    • Cluster verify: master must be present for _VerifyFiles · 2f10179b
      Adeodato Simo authored

      This commit prepares the call to _VerifyFiles for the case when the
      master node is not one of the nodes being verified (which will be the
      case for all node groups but one). We fix this by always passing the
      master's info and checksums to _VerifyFiles, which ensures a
      cluster-wide consistency check.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
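
      A hedged sketch of that shape (the signature is an assumption; only
      anchoring the comparison on the master's checksums comes from the
      commit):

          def verify_files(node_checksums, master_name, master_checksums):
              """node_checksums: node -> {path: digest} for verified nodes."""
              errors = []
              for node, checksums in sorted(node_checksums.items()):
                  for path, digest in sorted(checksums.items()):
                      ref = master_checksums.get(path)
                      if ref is not None and ref != digest:
                          errors.append("%s: %s differs from master %s"
                                        % (node, path, master_name))
              return errors
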
    • Cluster verify: don't assume we're verifying all nodes/instances · cf692cd0
      Adeodato Simo authored

      This commit fixes a few initial simple cases in which it was assumed that
      we're always working over the whole cluster. With this change, we
      differentiate between "nodes/instances to verify" and "checks that need
      cluster-wide information".
      
      In particular:
      
        - retrieve hypervisor parameters always from all instances
        - always specify full node list in NV_NODELIST
        - retrieve OOB path from all nodes
        - verify DRBD devices against the full set of instances (this ensures
          minors get properly verified even if an instance is split between groups)
        - look up node groups against the set of all nodes (to avoid tracebacks
          in case instances are split between groups)
        - determine whether running instances are unknown by checking against the
          full list of instances
      
      Behavior in all cases stays the same if still running over the whole
      cluster.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
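
      As a sketch of the last bullet above (plain Python, illustrative
      names): a running instance counts as unknown only when it is missing
      from the *full* instance list, not merely from the list being
      verified.

          def unknown_instances(reported_running, all_instances):
              """reported_running: instance names seen on verified nodes;
              all_instances: every instance in the cluster config."""
              return sorted(set(reported_running) - set(all_instances))
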
    • Cluster verify: gather node/instance list in CheckPrereq · c711d09e
      Adeodato Simo authored

      This commit introduces no behavior changes; it is only a minor
      refactoring that aids a cleaner division of future LUClusterVerify
      work. The change consists of:

        - replacing the {node,instance}{list,info} structures previously
          created in Exec() with member variables created in CheckPrereq; and

        - mechanically converting all references to the old variables to the
          new member variables.
      
      Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both
      with the same contents at the moment, is not capricious. We've now made
      Exec use the my_* variables pervasively; in future commits, we'll break the
      assumption that all nodes and instances are listed there, and it'll become
      clear when the all_* variables have to be substituted instead.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
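
      A minimal sketch of the resulting shape (config access is simplified
      to plain dicts; the member names come from the commit):

          class LUClusterVerifySketch(object):
              def CheckPrereq(self, cfg):
                  # Cluster-wide view, for checks that need full information.
                  self.all_node_info = dict(cfg["nodes"])
                  self.all_inst_info = dict(cfg["instances"])
                  # Identical contents for now; later commits narrow these
                  # down to the node group actually being verified.
                  self.my_node_info = dict(self.all_node_info)
                  self.my_inst_info = dict(self.all_inst_info)
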
    • Merge remote branch 'origin/devel-2.4' · 6aac5aef
      Iustin Pop authored

      * origin/devel-2.4:
        Fix errors in hooks documentation
        Clarify a bit the noded man page
        Note --no-remember in NEWS
        Switch QA over to using instance stop --no-remember
        Implement no_remember at RAPI level
        Implement no_remember at CLI level
        Introduce instance start/stop no_remember attribute
        Bump version for the 2.4.2 release
        Fix a bug in LUInstanceMove
        Abstract ignore_consistency opcode parameter
        Preload the string-escape code in noded
        Fix error in iallocator documentation reg. disk mode
        Try to prevent instance memory changes N+1 failures
        Update NEWS file for the 2.4.2 release
      
      Conflicts:
              NEWS                (trivial)
              doc/iallocator.rst  (kept our version)
              lib/cli.py          (trivial)
              lib/opcodes.py      (removed duplicated work, both branches
                                   introduced the same new variable
                                   PIgnoreConsistency :)
              lib/rapi/client.py  (trivial)
              lib/rapi/rlib2.py   (almost trivial)
              qa/ganeti-qa.py     (below trivial)
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  2. 19 May, 2011 5 commits
  3. 17 May, 2011 4 commits
  4. 16 May, 2011 10 commits
  5. 13 May, 2011 3 commits
  6. 12 May, 2011 6 commits