Ganeti walk-through
===================

Documents Ganeti version |version|

.. contents::

.. highlight:: shell-example

Introduction
------------

This document serves as a more example-oriented guide to Ganeti; while
the administration guide takes a conceptual approach, here you will
find a step-by-step example of managing instances and the cluster.

Our simulated, example cluster will have three machines, named
``node1``, ``node2``, ``node3``. Note that in real life machines will
usually have FQDNs but here we use short names for brevity. We will use
a secondary network for replication data, ``192.0.2.0/24``, with nodes
having the last octet the same as their index. The cluster name will be
``example-cluster``. All nodes have the same simulated hardware
configuration, two disks of 750GB, 32GB of memory and 4 CPUs.

On this cluster, we will create up to seven instances, named
``instance1`` to ``instance7``.


Cluster creation
----------------

Follow the :doc:`install` document and prepare the nodes. Then it's time
to initialise the cluster::

  $ gnt-cluster init -s %192.0.2.1% --enabled-hypervisors=xen-pvm %example-cluster%
  $

The creation was fine. Let's check that the one node we have is
functioning correctly::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  $ gnt-cluster verify
  Mon Oct 26 02:08:51 2009 * Verifying global settings
  Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
  Mon Oct 26 02:08:52 2009 * Verifying node status
  Mon Oct 26 02:08:52 2009 * Verifying instance status
  Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
  Mon Oct 26 02:08:52 2009 * Verifying remaining instances
  Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:08:52 2009 * Other Notes
  Mon Oct 26 02:08:52 2009 * Hooks Results
  $

Since this proceeded correctly, let's add the other two nodes::

  $ gnt-node add -s %192.0.2.2% %node2%
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node2) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  Unable to verify hostkey of host xen-devi-5.fra.corp.google.com:
  f7:…. Do you want to accept it?
  y/[n]/?: %y%
  Mon Oct 26 02:11:53 2009  Authentication to node2 via public key failed, trying password
  root password:
  Mon Oct 26 02:11:54 2009  - INFO: Node will be a master candidate
  $ gnt-node add -s %192.0.2.3% %node3%
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node3) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it


  Mon Oct 26 02:12:43 2009  - INFO: Node will be a master candidate

Checking the cluster status again::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  $ gnt-cluster verify
  Mon Oct 26 02:15:14 2009 * Verifying global settings
  Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
  Mon Oct 26 02:15:16 2009 * Verifying node status
  Mon Oct 26 02:15:16 2009 * Verifying instance status
  Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
  Mon Oct 26 02:15:16 2009 * Verifying remaining instances
  Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:15:16 2009 * Other Notes
  Mon Oct 26 02:15:16 2009 * Hooks Results
  $

And let's check that we have a valid OS::

  $ gnt-os list
  Name
  debootstrap
  node1#

Running a burn-in
-----------------

Now that the cluster is created, it is time to check that the hardware
works correctly, that the hypervisor can actually create instances,
etc. This is done with the burn-in tool, using the debootstrap OS as
described in the admin guide. Similar output lines are replaced with
``…`` in the below log::

  $ /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
  - Testing global parameters
  - Creating instances
    * instance instance1
      on node1, node2
    * instance instance2
      on node2, node3

    * instance instance5
      on node2, node3
    * Submitted job ID(s) 157, 158, 159, 160, 161
      waiting for job 157 for instance1

      waiting for job 161 for instance5
  - Replacing disks on the same nodes
    * instance instance1
      run replace_on_secondary
      run replace_on_primary

    * instance instance5
      run replace_on_secondary
      run replace_on_primary
    * Submitted job ID(s) 162, 163, 164, 165, 166
      waiting for job 162 for instance1

  - Changing the secondary node
    * instance instance1
      run replace_new_secondary node3
    * instance instance2
      run replace_new_secondary node1

    * instance instance5
      run replace_new_secondary node1
    * Submitted job ID(s) 167, 168, 169, 170, 171
      waiting for job 167 for instance1

  - Growing disks
    * instance instance1
      increase disk/0 by 128 MB

    * instance instance5
      increase disk/0 by 128 MB
    * Submitted job ID(s) 173, 174, 175, 176, 177
      waiting for job 173 for instance1

  - Failing over instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 179, 180, 181, 182, 183
      waiting for job 179 for instance1

  - Migrating instances
    * instance instance1
      migration and migration cleanup

    * instance instance5
      migration and migration cleanup
    * Submitted job ID(s) 184, 185, 186, 187, 188
      waiting for job 184 for instance1

  - Exporting and re-importing instances
    * instance instance1
      export to node node3
      remove instance
      import from node3 to node1, node2
      remove export

    * instance instance5
      export to node node1
      remove instance
      import from node1 to node2, node3
      remove export
    * Submitted job ID(s) 196, 197, 198, 199, 200
      waiting for job 196 for instance1

  - Reinstalling instances
    * instance instance1
      reinstall without passing the OS
      reinstall specifying the OS

    * instance instance5
      reinstall without passing the OS
      reinstall specifying the OS
    * Submitted job ID(s) 203, 204, 205, 206, 207
      waiting for job 203 for instance1

  - Rebooting instances
    * instance instance1
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'

    * instance instance5
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    * Submitted job ID(s) 208, 209, 210, 211, 212
      waiting for job 208 for instance1

  - Adding and removing disks
    * instance instance1
      adding a disk
      removing last disk

    * instance instance5
      adding a disk
      removing last disk
    * Submitted job ID(s) 213, 214, 215, 216, 217
      waiting for job 213 for instance1

  - Adding and removing NICs
    * instance instance1
      adding a NIC
      removing last NIC

    * instance instance5
      adding a NIC
      removing last NIC
    * Submitted job ID(s) 218, 219, 220, 221, 222
      waiting for job 218 for instance1

  - Activating/deactivating disks
    * instance instance1
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)

    * instance instance5
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    * Submitted job ID(s) 223, 224, 225, 226, 227
      waiting for job 223 for instance1

  - Stopping and starting instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 230, 231, 232, 233, 234
      waiting for job 230 for instance1

  - Removing instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 235, 236, 237, 238, 239
      waiting for job 235 for instance1

  $

You can see in the above what operations the burn-in does. Ideally, the
burn-in log would proceed successfully through all the steps and end
cleanly, without throwing errors.

Instance operations
-------------------

Creation
++++++++

At this point, Ganeti and the hardware seem to be functioning
correctly, so we'll follow up with creating the instances manually::

  $ gnt-instance add -t drbd -o debootstrap -s %256m% %instance1%
  Mon Oct 26 04:06:52 2009  - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
  Mon Oct 26 04:06:53 2009 * creating instance disks...
  Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
  Mon Oct 26 04:06:57 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 04:06:57 2009  - INFO: - device disk/0: 20.00\% done, 4 estimated seconds remaining
  Mon Oct 26 04:07:01 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
  Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
  Mon Oct 26 04:07:14 2009 * starting instance...
  $ gnt-instance add -t drbd -o debootstrap -s %256m% -n %node1%:%node2% %instance2%
  Mon Oct 26 04:11:37 2009 * creating instance disks...
  Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
  Mon Oct 26 04:11:41 2009  - INFO: Waiting for instance instance2 to sync disks.
  Mon Oct 26 04:11:41 2009  - INFO: - device disk/0: 35.40\% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:42 2009  - INFO: - device disk/0: 58.50\% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:43 2009  - INFO: - device disk/0: 86.20\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 92.40\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 97.00\% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: Instance instance2's disks are in sync.
  Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
  Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
  Mon Oct 26 04:11:57 2009 * starting instance...
  $

The above shows one instance created via an iallocator script, and one
being created with manual node assignment. The other three instances
were also created and now it's time to check them::

  $ gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node3        running   128M
  instance5 xen-pvm    debootstrap node2        running   128M

Accessing instances
+++++++++++++++++++

Accessing an instance's console is easy::

  $ gnt-instance console %instance2%
  [    0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
  [    0.000000] Linux version 2.6…
  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000]  Xen: 0000000000000000 - 0000000008800000 (usable)
  [13138176.018071] Built 1 zonelists.  Total pages: 34816
  [13138176.018074] Kernel command line: root=/dev/sda1 ro
  [13138176.018694] Initializing CPU#0

  Checking file systems...fsck 1.41.3 (12-Oct-2008)
  done.
  Setting kernel variables (/etc/sysctl.conf)...done.
  Mounting local filesystems...done.
  Activating swapfile swap...done.
  Setting up networking....
  Configuring network interfaces...done.
  Setting console screen modes and fonts.
  INIT: Entering runlevel: 2
  Starting enhanced syslogd: rsyslogd.
  Starting periodic command scheduler: crond.

  Debian GNU/Linux 5.0 instance2 tty1

  instance2 login:

At this moment you can log in to the instance and, after configuring
the network (and doing this on all instances), we can check their
connectivity::

  $ fping %instance{1..5}%
  instance1 is alive
  instance2 is alive
  instance3 is alive
  instance4 is alive
  instance5 is alive
  $

Removal
+++++++

Removing unwanted instances is also easy::

  $ gnt-instance remove %instance5%
  This will remove the volumes of the instance instance5 (including
  mirrors), thus removing all the data of the instance. Continue?
  y/[n]/?: %y%
  $


Recovering from hardware failures
---------------------------------

Recovering from node failure
++++++++++++++++++++++++++++

We are now left with four instances. Assume that at this point, node3,
which has one primary and one secondary instance, crashes::

  $ gnt-node info %node3%
  Node name: node3
    primary ip: 198.51.100.1
    secondary ip: 192.0.2.3
    master candidate: True
    drained: False
    offline: False
    primary for instances:
      - instance4
    secondary for instances:
      - instance1
  $ fping %node3%
  node3 is unreachable

At this point, the primary instance of that node (instance4) is down,
but the secondary instance (instance1) is not affected, except that it
has lost its disk redundancy::

  $ fping %instance{1,4}%
  instance1 is alive
  instance4 is unreachable
  $

If we try to check the status of instance4 via the instance info
command, it fails because it tries to contact node3 which is down::

  $ gnt-instance info %instance4%
  Failure: command execution error:
  Error checking node node3: Connection failed (113: No route to host)
  $

So we need to mark node3 as being *offline*, so that Ganeti won't talk
to it anymore::

  $ gnt-node modify -O yes -f %node3%
  Mon Oct 26 04:34:12 2009  - WARNING: Not enough master candidates (desired 10, new value will be 2)
  Mon Oct 26 04:34:15 2009  - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
  Modified node node3
   - offline -> True
   - master_candidate -> auto-demotion due to offline
  $

And now we can failover the instance::

  $ gnt-instance failover %instance4%
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: %y%
  Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
  Failure: command execution error:
  Disk disk/0 is degraded on target node, aborting failover.
  $ gnt-instance failover --ignore-consistency %instance4%
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: y
  Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
  Mon Oct 26 04:35:47 2009 * shutting down instance on source node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
  Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Mon Oct 26 04:35:48 2009 * starting the instance on the target node
  $

Note that in our first attempt, Ganeti refused to do the failover since
it wasn't sure about the status of the instance's disks. Passing the
``--ignore-consistency`` flag allowed the failover to proceed::

  $ gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node1        running   128M
  $

But at this point, both instance1 and instance4 are without disk
redundancy::

  $ gnt-instance info %instance1%
  Instance name: instance1
  UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
  Serial number: 2
  Creation time: 2009-10-26 04:06:57
  Modification time: 2009-10-26 04:07:14
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node3
    Operating system: debootstrap
    Allocated network port: None
    Hypervisor: xen-pvm
      - root_path: default (/dev/sda1)
      - kernel_args: default (ro)
      - use_bootloader: default (False)
      - bootloader_args: default ()
      - bootloader_path: default ()
      - kernel_path: default (/boot/vmlinuz-2.6-xenU)
      - initrd_path: default ()
    Hardware:
      - VCPUs: 1
      - maxmem: 256MiB
      - minmem: 512MiB
      - NICs:
        - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
    Disks:
      - disk/0: drbd8, size 256M
        access mode: rw
        nodeA:       node2, minor=0
        nodeB:       node3, minor=0
        port:        11035
        auth key:    8e950e3cec6854b0181fbc3a6058657701f2d458
        on primary:  /dev/drbd0 (147:0) in sync, status *DEGRADED*
        child devices:
          - child 0: lvm, size 256M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
          - child 1: lvm, size 128M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)

The output is similar for instance4. In order to recover this, we need
to run the node evacuate command, which will move the instances from
the current secondary node to a new one (in this case, we only have two
working nodes, so all instances will end up on nodes one and two)::

  $ gnt-node evacuate -I hail %node3%
  Relocate instance(s) 'instance1','instance4' from node
   node3 using iallocator hail?
  y/[n]/?: %y%
  Mon Oct 26 05:05:39 2009  - INFO: Selected new secondary for instance 'instance1': node1
  Mon Oct 26 05:05:40 2009  - INFO: Selected new secondary for instance 'instance4': node2
  Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 on node2
  Mon Oct 26 05:05:40 2009  - INFO: Checking volume groups
  Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 consistency on node node2
  Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:40 2009  - INFO: Adding new local storage on node1 for disk/0
  Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:41 2009  - INFO: activating a new drbd on node1 for disk/0
  Mon Oct 26 05:05:42 2009  - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:42 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:42 2009       Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:42 2009  - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:42 2009  - INFO: Updating instance configuration
  Mon Oct 26 05:05:45 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:46 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 05:05:46 2009  - INFO: - device disk/0: 13.90\% done, 7 estimated seconds remaining
  Mon Oct 26 05:05:53 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:05:53 2009  - INFO: Remove logical volumes for 0
  Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 on node1
  Mon Oct 26 05:05:53 2009  - INFO: Checking volume groups
  Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:54 2009  - INFO: Adding new local storage on node2 for disk/0
  Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:54 2009  - INFO: activating a new drbd on node2 for disk/0
  Mon Oct 26 05:05:55 2009  - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:55 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:55 2009       Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:55 2009  - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:55 2009  - INFO: Updating instance configuration
  Mon Oct 26 05:05:55 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:56 2009  - INFO: Waiting for instance instance4 to sync disks.
  Mon Oct 26 05:05:56 2009  - INFO: - device disk/0: 12.40\% done, 8 estimated seconds remaining
  Mon Oct 26 05:06:04 2009  - INFO: Instance instance4's disks are in sync.
  Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:06:04 2009  - INFO: Remove logical volumes for 0
  Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
  $

And now node3 is completely free of instances and can be repaired::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3      ?     ?      ?     ?     ?     0     0

Re-adding a node to the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's say node3 has been repaired and is now ready to be
reused. Re-adding it is simple::

  $ gnt-node add --readd %node3%
  The authenticity of host 'node3 (198.51.100.1)' can't be established.
  RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
  Are you sure you want to continue connecting (yes/no)? yes
  Mon Oct 26 05:27:39 2009  - INFO: Readding a node, the offline/drained flags were reset
  Mon Oct 26 05:27:39 2009  - INFO: Node will be a master candidate

And it is now working again::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3   1.3T  1.3T  32.0G  1.0G 30.4G     0     0

.. note:: If Ganeti has been built with the htools component enabled,
   you can shuffle the instances around to make better use of the
   nodes.
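
   For example, assuming the ``hbal`` tool from the htools component is
   installed, a rebalancing can first be simulated and then, if the
   proposed moves look reasonable, executed via the master daemon::

     $ hbal -L
     $ hbal -L -X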

Disk failures
+++++++++++++

A disk failure is simpler than a full node failure. First, a single disk
failure should not cause data-loss for any redundant instance; only the
performance of some instances might be reduced due to more network
traffic.

Let's take the cluster status in the above listing, and check what
volumes are in use::

  $ gnt-node volumes -o phys,instance %node2%
  PhysDev   Instance
  /dev/sdb1 instance4
  /dev/sdb1 instance4
  /dev/sdb1 instance1
  /dev/sdb1 instance1
  /dev/sdb1 instance3
  /dev/sdb1 instance3
  /dev/sdb1 instance2
  /dev/sdb1 instance2
  $

You can see that all instances on node2 have logical volumes on
``/dev/sdb1``. Let's simulate a disk failure on that disk::

  $ ssh node2
  # on node2
  $ echo offline > /sys/block/sdb/device/state
  $ vgs
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    Volume group xenvg not found
  $

At this point, the node is broken; if we examine instance2 we get
(simplified output shown)::

  $ gnt-instance info %instance2%
  Instance name: instance2
  State: configured to be up, actual state is up
    Nodes:
      - primary: node1
      - secondaries: node2
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status ok
        on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*

This instance only has its secondary on node2. Let's verify a primary
instance of node2::

  $ gnt-instance info %instance1%
  Instance name: instance1
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node1
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
        on secondary: /dev/drbd3 (147:3) in sync, status ok
  $ gnt-instance console %instance1%

  Debian GNU/Linux 5.0 instance1 tty1

  instance1 login: root
  Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
  instance1:~# date > test
  instance1:~# sync
  instance1:~# cat test
  Tue Oct 27 01:25:20 UTC 2009
  instance1:~# dmesg|tail
  [5439785.235448] NET: Registered protocol family 15
  [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
  [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
  [5439785.235517] XENBUS: Device with no driver: device/console/0
  [5439785.236576] kjournald starting.  Commit interval 5 seconds
  [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
  [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
  [5439785.236663] Freeing unused kernel memory: 172k freed
  [5439787.533779] EXT3 FS on sda1, internal journal
  [5440655.065431] eth0: no IPv6 routers present
  instance1:~#

As you can see, the instance is running fine and doesn't see any disk
issues. It is now time to fix node2 and re-establish redundancy for the
involved instances.

.. note:: For Ganeti 2.0 we need to fix manually the volume group on
   node2 by running ``vgreduce --removemissing xenvg``

::

  $ gnt-node repair-storage %node2% lvm-vg %xenvg%
  Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
  $ ssh %node2% vgs
  VG    #PV #LV #SN Attr   VSize   VFree
  xenvg   1   8   0 wz--n- 673.84G 673.84G
  $

This has removed the 'bad' disk from the volume group, which is now left
with only one PV. We can now replace the disks for the involved
instances::

  $ for i in %instance{1..4}%; do gnt-instance replace-disks -a $i; done
  Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node1
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node2
  Mon Oct 26 18:15:38 2009  - INFO: Checking volume groups
  Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 18:15:39 2009  - INFO: Adding storage on node2 for disk/0
  Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 18:15:39 2009  - INFO: Detaching disk/0 drbd from local storage
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the old LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the new LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Adding new mirror component on node2
  Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
  Mon Oct 26 18:15:41 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 18:15:41 2009  - INFO: - device disk/0: 12.40\% done, 9 estimated seconds remaining
  Mon Oct 26 18:15:50 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:15:50 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
  Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:01 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
  Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:09 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:18 2009  - INFO: Remove logical volumes for disk/0
  $

At this point, all instances should be healthy again.

.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks, so
   for it you have to run the loop twice, once over primary instances
   with argument ``-p`` and once over secondary instances with argument
   ``-s``, but otherwise the operations are similar::

     $ gnt-instance replace-disks -p instance1

     $ for i in %instance{2..4}%; do gnt-instance replace-disks -s $i; done

Common cluster problems
-----------------------

There are a number of small issues that might appear on a cluster that
can be solved easily as long as the issue is properly identified. For
this exercise we will consider the case of node3, which was broken
previously and re-added to the cluster without reinstallation. Running
cluster verify on the cluster reports::

  $ gnt-cluster verify
  Mon Oct 26 18:30:08 2009 * Verifying global settings
  Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:30:10 2009 * Verifying node status
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 0 is in use
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 1 is in use
  Mon Oct 26 18:30:10 2009 * Verifying instance status
  Mon Oct 26 18:30:10 2009   - ERROR: instance instance4: instance should not run on node node3
  Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 * Verifying remaining instances
  Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:30:10 2009 * Other Notes
  Mon Oct 26 18:30:10 2009 * Hooks Results
  $

Instance status
+++++++++++++++

As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous as the
instance will have the same IP and MAC address, wreaking havoc on the
network environment and anyone who tries to use it.

Ganeti doesn't directly handle this case. It is recommended to log on to
node3 and run::

  $ xm destroy %instance4%
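
If in doubt, the list of domains actually running on node3 can be
checked first::

  $ xm list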

Unallocated DRBD minors
+++++++++++++++++++++++

There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up via DRBD commands::

  $ ssh %node3%
  # on node 3
  $ drbdsetup /dev/drbd%0% down
  $ drbdsetup /dev/drbd%1% down
  $

Orphan volumes
++++++++++++++

At this point, the only remaining problem should be the so-called
*orphan* volumes. These can also appear after an aborted disk replace,
or in a similar situation where Ganeti was not able to recover
automatically. Here you need to remove them manually via LVM commands::

  $ ssh %node3%
  # on node3
  $ lvremove %xenvg%
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: %y%
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: %y%
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: %y%
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: %y%
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
  node3#

At this point cluster verify shouldn't complain anymore::

  $ gnt-cluster verify
  Mon Oct 26 18:37:51 2009 * Verifying global settings
  Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:37:53 2009 * Verifying node status
  Mon Oct 26 18:37:53 2009 * Verifying instance status
  Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
  Mon Oct 26 18:37:53 2009 * Verifying remaining instances
  Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:37:53 2009 * Other Notes
  Mon Oct 26 18:37:53 2009 * Hooks Results
  $

N+1 errors
++++++++++

Since redundant instances in Ganeti have a primary/secondary model,
each node needs to set aside enough free memory so that, if one of its
peer nodes fails, all the instances that have the failed node as
primary and this node as secondary can be failed over to it. More
specifically, if instance2 has node1 as primary and node2 as secondary
(and node1 and node2 do not have any other instances in this layout),
then node2 must have enough free memory so that if node1 fails, we can
failover instance2 without any other operations (to reduce the downtime
window). Let's increase the memory of the current instances to 4G, and
add three new instances, two on node2:node3 with 8GB of RAM and one on
node1:node2, with 12GB of RAM (numbers chosen so that we run out of
memory)::

  $ gnt-instance modify -B memory=%4G% %instance1%
  Modified instance instance1
   - be/maxmem -> 4096
   - be/minmem -> 4096
  Please don't forget that these parameters take effect only at the next start of the instance.
  $ gnt-instance modify …

  $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance5%

  $ gnt-instance add -t drbd -n %node2%:%node3% -s %512m% -B memory=%8G% -o %debootstrap% %instance6%

  $ gnt-instance add -t drbd -n %node1%:%node2% -s %512m% -B memory=%8G% -o %debootstrap% %instance7%
  $ gnt-instance reboot --all
  The reboot will operate on 7 instances.
  Do you want to continue?
  Affected instances:
    instance1
    instance2
    instance3
    instance4
    instance5
    instance6
    instance7
  y/[n]/?: %y%
  Submitted jobs 677, 678, 679, 680, 681, 682, 683
  Waiting for job 677 for instance1...
  Waiting for job 678 for instance2...
  Waiting for job 679 for instance3...
  Waiting for job 680 for instance4...
  Waiting for job 681 for instance5...
  Waiting for job 682 for instance6...
  Waiting for job 683 for instance7...
  $

We rebooted the instances for the memory changes to take effect. Now the
cluster looks like::

  $ gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
  node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
  $ gnt-cluster verify
  Mon Oct 26 18:59:36 2009 * Verifying global settings
  Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:59:37 2009 * Verifying node status
  Mon Oct 26 18:59:37 2009 * Verifying instance status
  Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
  Mon Oct 26 18:59:37 2009 * Verifying remaining instances
  Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:59:37 2009   - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
  Mon Oct 26 18:59:37 2009 * Other Notes
  Mon Oct 26 18:59:37 2009 * Hooks Results
  $

The cluster verify error above shows that if node1 fails, node2 will not
have enough memory to failover all primary instances on node1 to it. To
solve this, you have a number of options:

- try to manually move instances around (but this can become complicated
  for any non-trivial cluster)
- try to reduce the minimum memory of some instances on the source node
  of the N+1 failure (in the example above ``node1``): this will allow
  it to start and be failed over/migrated with less than its maximum
  memory (see the sketch after this list)
- try to reduce the runtime/maximum memory of some instances on the
  destination node of the N+1 failure (in the example above ``node2``)
  to create additional available node memory (check the :doc:`admin`
  guide for what Ganeti will and won't automatically do in regards to
  instance runtime memory modification)
- if Ganeti has been built with the htools package enabled, you can run
  the ``hbal`` tool which will try to compute an automated cluster
  solution that complies with the N+1 rule
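
A loose sketch of the memory-reduction and rebalancing options, with
purely illustrative values (``instance3`` and ``2G`` are only
examples), could look like this::

  $ gnt-instance modify -B minmem=%2G% %instance3%
  $ hbal -L -X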

Network issues
++++++++++++++

In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for ganeti commands), it will show up in cluster verify as::

  $ gnt-cluster verify
  Mon Oct 26 19:07:19 2009 * Verifying global settings
  Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
  Mon Oct 26 19:07:23 2009 * Verifying node status
  Mon Oct 26 19:07:23 2009   - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 * Verifying instance status
  Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
  Mon Oct 26 19:07:23 2009 * Verifying remaining instances
  Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 19:07:23 2009 * Other Notes
  Mon Oct 26 19:07:23 2009 * Hooks Results
  $

This shows that both node1 and node2 have problems contacting node3 over
the secondary network, and node3 has problems contacting them. From this
output is can be deduced that since node1 and node2 can communicate
between themselves, node3 is the one having problems, and you need to
investigate its network settings/connection.
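
To confirm where the problem lies, the secondary addresses can also be
tested by hand, for example by pinging node3's secondary IP while
forcing node1's secondary IP as the source address::

  $ ping -c 3 -I %192.0.2.1% %192.0.2.3%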

Migration problems
++++++++++++++++++

Since live migration can sometimes fail and leave the instance in an
inconsistent state, Ganeti provides a ``--cleanup`` argument to the
migrate command that does:

- check on which node the instance is actually running (has the
  command failed before or after the actual migration?)
- reconfigure the DRBD disks accordingly

It is always safe to run this command as long as the instance has good
data on its primary node (i.e. not showing as degraded). If so, you can
simply run::

  $ gnt-instance migrate --cleanup %instance1%
  Instance instance1 will be recovered from a failed migration. Note
  that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
  y/[n]/?: %y%
  Mon Oct 26 19:13:49 2009 Migrating instance instance1
  Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
  Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
  Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:50 2009 * changing into standalone mode
  Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:51 2009 * done
  $

In use disks at instance shutdown
+++++++++++++++++++++++++++++++++

If you see something like the following when trying to shutdown or
deactivate disks for an instance::

  $ gnt-instance shutdown %instance1%
  Mon Oct 26 19:16:23 2009  - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

It most likely means something is holding open the underlying DRBD
device. This can be bad if the instance is not running, as it might mean
that there was concurrent access from both the node and the instance to
the disks, but not always (e.g. you could only have had the partitions
activated via ``kpartx``).

To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used:

- check if (in the above example) ``/dev/drbd0`` on node2 is being
  mounted somewhere (``cat /proc/mounts``)
- check if the device is not being used by device mapper itself:
  ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if so
  remove them with either ``kpartx -d`` or ``dmsetup remove`` (a
  possible command sequence is sketched below)
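
For the checks above, a possible command sequence on node2 (adjust the
DRBD device name to the one reported in the warning) would be::

  $ ssh %node2%
  # on node2
  $ grep drbd0 /proc/mounts
  $ dmsetup ls | grep drbd0
  $ kpartx -d /dev/drbd%0%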

For Xen, check if it's not using the disks itself::

  $ xenstore-ls /local/domain/%0%/backend/vbd|grep -e "domain =" -e physical-device
  domain = "instance2"
  physical-device = "93:0"
  domain = "instance3"
  physical-device = "93:1"
  domain = "instance4"
  physical-device = "93:2"
  $

You can see in the above output that the node exports three disks, to
three instances. The ``physical-device`` key is in major:minor format in
hexadecimal, and ``0x93`` represents DRBD's major number. Thus we can
see from the above that instance2 has /dev/drbd0, instance3 /dev/drbd1,
and instance4 /dev/drbd2.
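
As a quick check, the hexadecimal major can be converted to decimal and
compared with DRBD's registered major number (147)::

  $ python -c 'print 0x93'
  147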

LUXI version mismatch
+++++++++++++++++++++

LUXI is the protocol used for communication between clients and the
master daemon. Starting in Ganeti 2.3, the peers exchange their version
in each message. When they don't match, an error is raised::

  $ gnt-node modify -O yes %node3%
  Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 2030000

Usually this means that server and client are from different Ganeti
versions, or that they import the Ganeti libraries from different paths
(e.g. an older version installed in another place). You can print the
import path for Ganeti's modules using the following command (note that
depending on your setup you might have to use an explicit version in
the Python command, e.g. ``python2.6``)::

  $ python -c 'import ganeti; print ganeti.__file__'
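
To check which software version the cluster itself reports, one can
also run (on the master node)::

  $ gnt-cluster version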

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: