1. 17 Jul, 2009 1 commit
  2. 16 Jul, 2009 3 commits
  3. 14 Jul, 2009 3 commits
  4. 13 Jul, 2009 2 commits
  5. 08 Jul, 2009 1 commit
  6. 07 Jul, 2009 4 commits
  7. 01 Jul, 2009 1 commit
  8. 30 Jun, 2009 5 commits
    • Cleanup config data when draining nodes · dec0d9da
      Iustin Pop authored

      Currently, when draining nodes we reset their master candidate flag, but
      we don't instruct them to demote themselves. This leads to “ERROR: file
      '/var/lib/ganeti/config.data' should not exist on non master candidates
      (and the file is outdated)”.
      
      This patch simply adds a call to node_demote_from_mc in this case.
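      A minimal, self-contained sketch of the idea (the dict-based node object and the RPC callable are invented stand-ins; only the node_demote_from_mc name follows the patch):

```python
def drain_node(node, rpc_demote_from_mc):
    """Drain a node: drop its master-candidate flag and tell it to demote.

    node is a simple dict stand-in for the real config node object; the
    callable mirrors the node_demote_from_mc RPC the patch adds.
    """
    node["drained"] = True
    if node.get("master_candidate"):
        node["master_candidate"] = False
        # Without this call the node keeps stale master files such as
        # config.data, triggering the verify error quoted above.
        rpc_demote_from_mc(node["name"])
```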
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix node readd issues · a8ae3eb5
      Iustin Pop authored

      This patch fixes a few node readd issues.
      
      Currently, the node readd consists of two opcodes:
        - OpSetNodeParms, which resets the offline/drained flags
        - OpAddNode (with readd=True), which reconfigures the node
      
      The problem is that between these two, the configuration is inconsistent
      for certain cluster configurations. Thus, this patch removes the first
      opcode and modifies LUAddNode to deal with this case too.
      
      The patch also modifies the computation of the intended master_candidate
      status, and actually sets the readded node to master candidate if
      needed. Previously, we didn't modify the existing node at all.
      
      Finally, the patch modifies the bottom of the Exec() function for this
      LU to:
        - trigger a node update, which in turn redistributes the ssconf files
          to all nodes (and thus the new node too)
        - if the new node is not a master candidate, then call the
          node_demote_from_mc RPC so that old master files are cleared
      
      My testing shows this behaves correctly for various cases.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • backend.DemoteFromMC: don't fail for missing files · 9a5cb537
      Iustin Pop authored

      If the config file is missing when the DemoteFromMC() function is
      called, it will raise a ProgrammerError. Instead of changing the
      utils.CreateBackup() function, which is called from multiple places,
      for now we only change DemoteFromMC() to skip the call if the file
      does not exist (we rely on the master to prevent race conditions
      here).
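      The guard can be sketched like this (create_backup stands in for utils.CreateBackup; the real function does more than this):

```python
import os

def demote_from_mc(config_path, create_backup):
    """Back up and remove the master config file, tolerating its absence.

    Rather than changing the backup helper itself, we simply skip it
    when the config file is absent (the master prevents races here).
    """
    if os.path.isfile(config_path):
        create_backup(config_path)
        os.remove(config_path)
```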
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Allow GetMasterCandidateStats to ignore some nodes · 23f06b2b
      Iustin Pop authored

      This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to
      ignore some nodes in the calculation, so that we can use it to predict
      cluster state without some nodes (which we know we will modify, and thus
      we should not rely on their state).
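      A simplified stand-in for the idea (the real method works on config node objects and the candidate pool size; only the exceptions argument mirrors the patch):

```python
def get_mc_stats(nodes, exceptions=()):
    """Count master candidates, optionally ignoring some nodes.

    nodes is a list of dict stand-ins; passing node names in exceptions
    lets callers predict pool state as if those nodes were absent.
    """
    mc_now = pool_size = 0
    for node in nodes:
        if node["name"] in exceptions:
            continue
        pool_size += 1
        if node.get("master_candidate"):
            mc_now += 1
    return mc_now, pool_size
```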
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Fix error message for extra files on non MC nodes · e631cb25
      Iustin Pop authored

      Currently the message for extraneous files on non-master-candidate
      nodes is confusing, to say the least. This patch hopefully makes it
      clearer.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
  9. 29 Jun, 2009 2 commits
  10. 23 Jun, 2009 2 commits
  11. 17 Jun, 2009 3 commits
    • Fix handling of 'vcpus' in instance list · c1ce76bb
      Iustin Pop authored

      Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
        Unhandled Ganeti error: vcpus
      
      This is due to multiple issues:
        - in some corner cases cmdlib.py raises an errors.ParameterError but
          this is not handled by cli.py
        - LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't
          handle it, so instead of failing with an unknown-parameter error,
          e.g.:
            Failure: prerequisites not met for this operation:
            Unknown output fields selected: vcpuscd
          it raises the ParameterError message
      
      This patch:
        - adds handling of 'vcpus' to LUQueryInstances
        - adds handling of the ParameterError exception to cli.py
        - changes the 'else: raise errors.ParameterError' in the field handling of
          LUQueryInstance to an assert, since it's a programmer error if we reached
          this step
      
      With this, a future unhandled parameter will show:
        gnt-instance list -o+vcpus
        Unhandled protocol error while talking to the master daemon:
        Caught exception: Declared but unhandled parameter 'vcpus'
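      The assert change can be sketched with a simplified stand-in for the field-dispatch loop (the handlers mapping and function name are invented for illustration):

```python
def query_instance_row(instance, fields, handlers):
    """Build one output row of an instance query.

    By the time we get here, every requested field has already passed
    the declared-fields prereq check, so a miss in the handlers mapping
    is a programmer error: hence an assert, not a ParameterError.
    """
    row = []
    for field in fields:
        assert field in handlers, \
            "Declared but unhandled parameter '%s'" % field
        row.append(handlers[field](instance))
    return row
```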
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix checking for valid OS in instance create · 6dfad215
      Iustin Pop authored

      The current check in LUCreateInstance.CheckPrereq() is wrong: it only
      checks that an OS was given, not that it is a valid one. This patch
      fixes it.
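      The corrected check boils down to something like this (a sketch; valid_oses stands in for whatever list of known OSes the node reports, and the exception type is illustrative):

```python
def check_os_valid(os_name, valid_oses):
    """Reject both a missing OS name and one the cluster does not know.

    Checking membership in the list of known OSes is the part the
    original test missed: it only verified that os_name was set.
    """
    if not os_name or os_name not in valid_oses:
        raise ValueError("OS '%s' not found or not valid" % os_name)
```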
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Show disk size in instance info · c98162a7
      Iustin Pop authored

      The size of the instance's disk was not shown in “gnt-instance info”.
      This patch adds it and formats it nicely if possible.
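      The "nice" formatting can be sketched like this (a loose stand-in for the cluster tools' M/G/T size helper, not the exact function):

```python
def format_disk_size(mebibytes):
    """Render a size given in MiB human-readably (M, G or T)."""
    if mebibytes < 1024:
        return "%dM" % mebibytes
    if mebibytes < 1024 * 1024:
        return "%.1fG" % (mebibytes / 1024.0)
    return "%.1fT" % (mebibytes / (1024.0 * 1024.0))
```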
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  12. 16 Jun, 2009 2 commits
  13. 15 Jun, 2009 1 commit
  14. 08 Jun, 2009 3 commits
    • Enable striped LVs · fecbe9d5
      Iustin Pop authored

      This patch enables striped LVs, falling back to non-striped creation
      if the striped creation fails. If the configure-time lvm-stripecount
      is 1, this patch becomes a no-op (with an insignificant Python-level
      overhead, but no extra lvm calls).
      
      The effect of this patch is that new instances will get striped LVs
      from the start, whereas old instances will have their LVs striped as
      soon as replace-disks is run for them.
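      The try-striped-then-fall-back logic can be sketched as follows (run_cmd is a stand-in command runner returning an exit status; the lvcreate -L/-n/-i flags are real LVM options, the rest is illustrative):

```python
def create_lv(run_cmd, vg, name, size_mb, stripes):
    """Try a striped lvcreate first, falling back to non-striped.

    With stripes == 1 exactly one plain lvcreate is issued, so the
    change is a no-op for a stripecount of 1.
    """
    base = ["lvcreate", "-L%dm" % size_mb, "-n", name]
    if stripes > 1:
        if run_cmd(base + ["-i%d" % stripes, vg]) == 0:
            return True
        # striped creation failed; fall through to the plain attempt
    return run_cmd(base + [vg]) == 0
```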
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Add a lvm stripecount configure parameter · 3736cb6b
      Iustin Pop authored

      This patch adds a configure-time customizable parameter that will be
      used to enable striped LVs. The default value of the parameter is 3.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Add more constants for DRBD and change sync tests · 3c003d9d
      Iustin Pop authored

      This patch adds constants for the connection status, peer roles and disk
      status, and it changes the rules for when the disk is considered as
      “resyncing” - previously it was only for syncsource/synctarget, but
      there are many other transient statuses which could be misinterpreted as
      ‘degraded’ (because they were not considered as resyncing, but the
      disk is not consistent in these statuses).
      
      Furthermore, cmdlib.py:WaitForSync determines if a device is syncing or
      not based on sync_percent being not None. Not all DRBD resync statuses
      offer a percent done, so if we are syncing but don't have a sync
      percent, we'll report a zero sync percent (and no time estimate).
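      The percent fallback amounts to this (a simplified stand-in; the real code works on a parsed /proc/drbd status object):

```python
def sync_progress(is_in_resync, sync_percent):
    """Return (syncing, percent) for a DRBD device.

    Resync statuses that offer no percent figure still count as
    syncing, reported as 0.0 rather than as finished.
    """
    if not is_in_resync:
        return False, None
    return True, sync_percent if sync_percent is not None else 0.0
```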
      
      The patch also removes a few unused variables (is_sync_target,
      peer_sync_target, is_resync) whose value doesn't make sense anymore with
      the new sync rules.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  15. 04 Jun, 2009 1 commit
    • Wait for a while in failed resyncs · fbafd7a8
      Iustin Pop authored

      This patch is an attempt at fixing some very rare occurrences of messages like:
        - "There are some degraded disks for this instance", or:
        - "Cannot resync disks on node node3.example.com: [True, 100]"
      
      What I believe happens is that drbd has finished syncing, but not all
      fields are updated in 'Connected' state; maybe it's in WFBitmap[ST], or
      in some other transient state we don't handle well.
      
      The patch will change the _WaitForSync method to recheck up to a
      hardcoded number of times if we're finished syncing but we're degraded
      (using the same condition as the 'break' clause of the loop).
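      The recheck loop can be sketched as follows (poll is a stand-in returning (done, degraded); the retry count and delay are hardcoded placeholders, as in the patch):

```python
import time

def wait_for_sync(poll, retries=3, delay=1.0):
    """Poll until sync finishes; if finished-but-degraded, recheck.

    The rechecks cover transient DRBD states (e.g. WFBitmap) in which
    not all fields are updated yet even though syncing is done.
    """
    while True:
        done, degraded = poll()
        if not done:
            continue  # the real code sleeps and keeps polling here
        if not degraded:
            return True
        if retries <= 0:
            return False
        retries -= 1
        time.sleep(delay)
```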
      
      The downside of this change is that a disk which is genuinely
      degraded (due to network or disk failure) will cause an extra delay
      before we abort. I'm happy to choose other values for this.
      
      A better, long term fix is to handle more DRBD state correctly (see the
      bdev.DRBD8Status class).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  16. 03 Jun, 2009 2 commits
    • Assemble DRBD using the known size · f069addf
      Iustin Pop authored

      This patch changes DRBD disk attachment to force the wanted size, as opposed to
      letting the device auto-discover its size.
      
      This should make the disks more resilient with regard to small differences in
      size (e.g. due to LVM rounding). This still works with regard to disk
      growth, but the instance needs to be fully restarted (including its
      disks) in that case.
      
      This passes a full burnin without problems, but it's still a tricky
      change - if the config.data is not synced with the reality, we might
      tell DRBD a wrong size. At least this will fail outright (and not
      introduce silent errors), as DRBD (per a quick check at the sources)
      tracks the size in the meta-dev and also does not allow shrinking
      consistent devices.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix two issues with exports and snapshot errors · a97da6b7
      Iustin Pop authored

      This patch fixes two issues related to failed snapshots during exports:
        - first, the error messages used disk.logical_id[1], which is a node
          name for DRBD, and it resulted in strange error messages like
          "cannot snapshot block device node1 on node2"
        - second, if snapshotting fails for any disk, rpc.call_finalize_export
          fails, as it didn't handle boolean values (which
          backend.FinalizeExport returns)
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  17. 28 May, 2009 2 commits
  18. 25 May, 2009 2 commits
    • rapi: rework error handling · 59b4eeef
      Iustin Pop authored

      Currently the rapi code doesn't have any custom error handling; any
      exceptions raised are simply converted into an HTTP 500 error, without
      much explanation.
      
      This patch adds a couple of generic SubmitJob/GetClient functions that
      handle some errors specially so that they are transformed into HTTP
      errors, with more detailed information.
      
      With this patch, the behaviour of rapi when the queue is full or
      drained, or when the master is down, is easier to understand.
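      The SubmitJob wrapper idea can be sketched like this (the exception classes, HTTP codes and messages here are illustrative, not the exact ones the patch uses):

```python
class HttpError(Exception):
    """Stand-in for the HTTP-layer error the rapi server understands."""
    def __init__(self, code, message):
        Exception.__init__(self, message)
        self.code = code

class QueueFullError(Exception):
    pass

class MasterDownError(Exception):
    pass

def submit_job(client, op):
    """Submit a job, mapping backend failures to meaningful HTTP errors
    instead of letting them surface as a bare HTTP 500."""
    try:
        return client.submit(op)
    except QueueFullError:
        raise HttpError(503, "Job queue is full or drained")
    except MasterDownError:
        raise HttpError(502, "Master daemon is not running")
```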
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix backend.OSEnvironment be/hv parameters · 030b218a
      Iustin Pop authored

      Commit 67fc3042 added some more variables to be exported to
      OSEnvironment, but it has two bugs:
        - wrong variable name (env vs. result)
        - in OSEnvironment we don't have the automatic conversion to strings
          that we do in hooks, so we must enforce it manually
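      The manual stringification boils down to something like this (a simplified stand-in for backend.OSEnvironment; the variable names are illustrative):

```python
def os_environment(instance_params):
    """Build the environment dict passed to the OS scripts.

    Unlike the hooks path, there is no automatic conversion to strings
    here, so every value must be converted explicitly: environment
    dicts only accept string values.
    """
    result = {}
    for key, value in instance_params.items():
        result["INSTANCE_%s" % key.upper()] = str(value)
    return result
```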
      
      With this patch instance creations work again.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>