Ganeti walk-through
===================

Documents Ganeti version |version|

.. contents::

.. highlight:: text

Introduction
------------

This document serves as a more example-oriented guide to Ganeti; while
the administration guide shows a conceptual approach, here you will
find a step-by-step example of managing instances and the cluster.

Our simulated, example cluster will have three machines, named
``node1``, ``node2``, ``node3``. Note that in real life machines will
usually have FQDNs but here we use short names for brevity. We will use
a secondary network for replication data, ``192.0.2.0/24``, with nodes
having the last octet the same as their index. The cluster name will be
``example-cluster``. All nodes have the same simulated hardware
configuration, two disks of 750GB, 32GB of memory and 4 CPUs.

On this cluster, we will create up to seven instances, named
``instance1`` to ``instance7``.


Cluster creation
----------------

Follow the :doc:`install` document and prepare the nodes. Then it's time
to initialise the cluster::

  node1# gnt-cluster init -s 192.0.2.1 --enabled-hypervisors=xen-pvm example-cluster
  node1#

The creation went fine. Let's check that the one node we have is
functioning correctly::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node1# gnt-cluster verify
  Mon Oct 26 02:08:51 2009 * Verifying global settings
  Mon Oct 26 02:08:51 2009 * Gathering data (1 nodes)
  Mon Oct 26 02:08:52 2009 * Verifying node status
  Mon Oct 26 02:08:52 2009 * Verifying instance status
  Mon Oct 26 02:08:52 2009 * Verifying orphan volumes
  Mon Oct 26 02:08:52 2009 * Verifying remaining instances
  Mon Oct 26 02:08:52 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:08:52 2009 * Other Notes
  Mon Oct 26 02:08:52 2009 * Hooks Results
  node1#

Since this proceeded correctly, let's add the other two nodes::

  node1# gnt-node add -s 192.0.2.2 node2
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node2) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  The authenticity of host 'node2 (192.0.2.2)' can't be established.
  RSA key fingerprint is 9f:…
  Are you sure you want to continue connecting (yes/no)? yes
  root@node2's password:
  Mon Oct 26 02:11:54 2009  - INFO: Node will be a master candidate
  node1# gnt-node add -s 192.0.2.3 node3
  -- WARNING --
  Performing this operation is going to replace the ssh daemon keypair
  on the target machine (node3) with the ones of the current one
  and grant full intra-cluster ssh root access to/from it

  The authenticity of host 'node3 (192.0.2.3)' can't be established.
  RSA key fingerprint is 9f:…
  Are you sure you want to continue connecting (yes/no)? yes
  root@node3's password:
  Mon Oct 26 02:11:54 2009  - INFO: Node will be a master candidate

Checking the cluster status again::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node2   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     0
  node1# gnt-cluster verify
  Mon Oct 26 02:15:14 2009 * Verifying global settings
  Mon Oct 26 02:15:14 2009 * Gathering data (3 nodes)
  Mon Oct 26 02:15:16 2009 * Verifying node status
  Mon Oct 26 02:15:16 2009 * Verifying instance status
  Mon Oct 26 02:15:16 2009 * Verifying orphan volumes
  Mon Oct 26 02:15:16 2009 * Verifying remaining instances
  Mon Oct 26 02:15:16 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 02:15:16 2009 * Other Notes
  Mon Oct 26 02:15:16 2009 * Hooks Results
  node1#

And let's check that we have a valid OS::

  node1# gnt-os list
  Name
  debootstrap
  node1#

Running a burnin
----------------

Now that the cluster is created, it is time to check that the hardware
works correctly, that the hypervisor can actually create instances,
etc. This is done via the ``burnin`` tool (here using the debootstrap
OS), as described in the admin guide. Similar output lines are replaced
with ``…`` in the log below::

  node1# /usr/lib/ganeti/tools/burnin -o debootstrap -p instance{1..5}
  - Testing global parameters
  - Creating instances
    * instance instance1
      on node1, node2
    * instance instance2
      on node2, node3

    * instance instance5
      on node2, node3
    * Submitted job ID(s) 157, 158, 159, 160, 161
      waiting for job 157 for instance1

      waiting for job 161 for instance5
  - Replacing disks on the same nodes
    * instance instance1
      run replace_on_secondary
      run replace_on_primary

    * instance instance5
      run replace_on_secondary
      run replace_on_primary
    * Submitted job ID(s) 162, 163, 164, 165, 166
      waiting for job 162 for instance1

  - Changing the secondary node
    * instance instance1
      run replace_new_secondary node3
    * instance instance2
      run replace_new_secondary node1

    * instance instance5
      run replace_new_secondary node1
    * Submitted job ID(s) 167, 168, 169, 170, 171
      waiting for job 167 for instance1

  - Growing disks
    * instance instance1
      increase disk/0 by 128 MB

    * instance instance5
      increase disk/0 by 128 MB
    * Submitted job ID(s) 173, 174, 175, 176, 177
      waiting for job 173 for instance1

  - Failing over instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 179, 180, 181, 182, 183
      waiting for job 179 for instance1

  - Migrating instances
    * instance instance1
      migration and migration cleanup

    * instance instance5
      migration and migration cleanup
    * Submitted job ID(s) 184, 185, 186, 187, 188
      waiting for job 184 for instance1

  - Exporting and re-importing instances
    * instance instance1
      export to node node3
      remove instance
      import from node3 to node1, node2
      remove export

    * instance instance5
      export to node node1
      remove instance
      import from node1 to node2, node3
      remove export
    * Submitted job ID(s) 196, 197, 198, 199, 200
      waiting for job 196 for instance1

  - Reinstalling instances
    * instance instance1
      reinstall without passing the OS
      reinstall specifying the OS

    * instance instance5
      reinstall without passing the OS
      reinstall specifying the OS
    * Submitted job ID(s) 203, 204, 205, 206, 207
      waiting for job 203 for instance1

  - Rebooting instances
    * instance instance1
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'

    * instance instance5
      reboot with type 'hard'
      reboot with type 'soft'
      reboot with type 'full'
    * Submitted job ID(s) 208, 209, 210, 211, 212
      waiting for job 208 for instance1

  - Adding and removing disks
    * instance instance1
      adding a disk
      removing last disk

    * instance instance5
      adding a disk
      removing last disk
    * Submitted job ID(s) 213, 214, 215, 216, 217
      waiting for job 213 for instance1

  - Adding and removing NICs
    * instance instance1
      adding a NIC
      removing last NIC

    * instance instance5
      adding a NIC
      removing last NIC
    * Submitted job ID(s) 218, 219, 220, 221, 222
      waiting for job 218 for instance1

  - Activating/deactivating disks
    * instance instance1
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)

    * instance instance5
      activate disks when online
      activate disks when offline
      deactivate disks (when offline)
    * Submitted job ID(s) 223, 224, 225, 226, 227
      waiting for job 223 for instance1

  - Stopping and starting instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 230, 231, 232, 233, 234
      waiting for job 230 for instance1

  - Removing instances
    * instance instance1

    * instance instance5
    * Submitted job ID(s) 235, 236, 237, 238, 239
      waiting for job 235 for instance1

  node1#

The above shows which operations the burnin performs. Ideally, the
burnin log proceeds successfully through all the steps and ends
cleanly, without throwing errors.
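
Each burnin step submits regular Ganeti jobs, so if a step fails you
can inspect the corresponding job afterwards. A minimal sketch, reusing
job ID 157 from the log above::

  node1# gnt-job list
  node1# gnt-job info 157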

Instance operations
-------------------

Creation
++++++++

At this point, Ganeti and the hardware seem to be functioning
correctly, so we'll follow up with creating the instances manually::

  node1# gnt-instance add -t drbd -o debootstrap -s 256m -I hail instance1
  Mon Oct 26 04:06:52 2009  - INFO: Selected nodes for instance instance1 via iallocator hail: node2, node3
  Mon Oct 26 04:06:53 2009 * creating instance disks...
  Mon Oct 26 04:06:57 2009 adding instance instance1 to cluster config
  Mon Oct 26 04:06:57 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 04:06:57 2009  - INFO: - device disk/0: 20.00% done, 4 estimated seconds remaining
  Mon Oct 26 04:07:01 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 04:07:01 2009 creating os for instance instance1 on node node2
  Mon Oct 26 04:07:01 2009 * running the instance OS create scripts...
  Mon Oct 26 04:07:14 2009 * starting instance...
  node1# gnt-instance add -t drbd -o debootstrap -s 256m -n node1:node2 instance2
  Mon Oct 26 04:11:37 2009 * creating instance disks...
  Mon Oct 26 04:11:40 2009 adding instance instance2 to cluster config
  Mon Oct 26 04:11:41 2009  - INFO: Waiting for instance instance2 to sync disks.
  Mon Oct 26 04:11:41 2009  - INFO: - device disk/0: 35.40% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:42 2009  - INFO: - device disk/0: 58.50% done, 1 estimated seconds remaining
  Mon Oct 26 04:11:43 2009  - INFO: - device disk/0: 86.20% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 92.40% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: - device disk/0: 97.00% done, 0 estimated seconds remaining
  Mon Oct 26 04:11:44 2009  - INFO: Instance instance2's disks are in sync.
  Mon Oct 26 04:11:44 2009 creating os for instance instance2 on node node1
  Mon Oct 26 04:11:44 2009 * running the instance OS create scripts...
  Mon Oct 26 04:11:57 2009 * starting instance...
  node1#

The above shows one instance created via an iallocator script, and one
created with manual node assignment. The other three instances were
created similarly, and now it's time to check them::

  node1# gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node3        running   128M
  instance5 xen-pvm    debootstrap node2        running   128M

Accessing instances
+++++++++++++++++++

Accessing an instance's console is easy::

  node1# gnt-instance console instance2
  [    0.000000] Bootdata ok (command line is root=/dev/sda1 ro)
  [    0.000000] Linux version 2.6…
  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000]  Xen: 0000000000000000 - 0000000008800000 (usable)
  [13138176.018071] Built 1 zonelists.  Total pages: 34816
  [13138176.018074] Kernel command line: root=/dev/sda1 ro
  [13138176.018694] Initializing CPU#0

  Checking file systems...fsck 1.41.3 (12-Oct-2008)
  done.
  Setting kernel variables (/etc/sysctl.conf)...done.
  Mounting local filesystems...done.
  Activating swapfile swap...done.
  Setting up networking....
  Configuring network interfaces...done.
  Setting console screen modes and fonts.
  INIT: Entering runlevel: 2
  Starting enhanced syslogd: rsyslogd.
  Starting periodic command scheduler: crond.

  Debian GNU/Linux 5.0 instance2 tty1

  instance2 login:

At this point you can log in to the instance and configure its network.
After doing this on all instances, we can check their connectivity::

  node1# fping instance{1..5}
  instance1 is alive
  instance2 is alive
  instance3 is alive
  instance4 is alive
  instance5 is alive
  node1#

Removal
+++++++

Removing unwanted instances is also easy::

  node1# gnt-instance remove instance5
  This will remove the volumes of the instance instance5 (including
  mirrors), thus removing all the data of the instance. Continue?
  y/[n]/?: y
  node1#


Recovering from hardware failures
---------------------------------

Recovering from node failure
++++++++++++++++++++++++++++

We are now left with four instances. Assume that at this point, node3,
which has one primary and one secondary instance, crashes::

  node1# gnt-node info node3
  Node name: node3
    primary ip: 198.51.100.1
    secondary ip: 192.0.2.3
    master candidate: True
    drained: False
    offline: False
    primary for instances:
      - instance4
    secondary for instances:
      - instance1
  node1# fping node3
  node3 is unreachable

At this point, the primary instance of that node (instance4) is down,
but the secondary instance (instance1) is not affected, except that it
has lost its disk redundancy::

  node1# fping instance{1,4}
  instance1 is alive
  instance4 is unreachable
  node1#

If we try to check the status of instance4 via the instance info
command, it fails because it tries to contact node3 which is down::

  node1# gnt-instance info instance4
  Failure: command execution error:
  Error checking node node3: Connection failed (113: No route to host)
  node1#

So we need to mark node3 as being *offline*, so that Ganeti won't talk
to it anymore::

  node1# gnt-node modify -O yes -f node3
  Mon Oct 26 04:34:12 2009  - WARNING: Not enough master candidates (desired 10, new value will be 2)
  Mon Oct 26 04:34:15 2009  - WARNING: Communication failure to node node3: Connection failed (113: No route to host)
  Modified node node3
   - offline -> True
   - master_candidate -> auto-demotion due to offline
  node1#

And now we can failover the instance::

  node1# gnt-instance failover instance4
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: y
  Mon Oct 26 04:35:34 2009 * checking disk consistency between source and target
  Failure: command execution error:
  Disk disk/0 is degraded on target node, aborting failover.
  node1# gnt-instance failover --ignore-consistency instance4
  Failover will happen to image instance4. This requires a shutdown of
  the instance. Continue?
  y/[n]/?: y
  Mon Oct 26 04:35:47 2009 * checking disk consistency between source and target
  Mon Oct 26 04:35:47 2009 * shutting down instance on source node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown instance instance4 on node node3. Proceeding anyway. Please make sure node node3 is down. Error details: Node is marked offline
  Mon Oct 26 04:35:47 2009 * deactivating the instance's disks on source node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Mon Oct 26 04:35:47 2009 * activating the instance's disks on target node
  Mon Oct 26 04:35:47 2009  - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Mon Oct 26 04:35:48 2009 * starting the instance on the target node
  node1#

Note that in our first attempt, Ganeti refused to do the failover since
it wasn't sure about the status of the instance's disks. Passing the
``--ignore-consistency`` flag on the second attempt lets the failover
proceed::

  node1# gnt-instance list
  Instance  Hypervisor OS          Primary_node Status  Memory
  instance1 xen-pvm    debootstrap node2        running   128M
  instance2 xen-pvm    debootstrap node1        running   128M
  instance3 xen-pvm    debootstrap node1        running   128M
  instance4 xen-pvm    debootstrap node1        running   128M
  node1#

But at this point, both instance1 and instance4 are without disk
redundancy::

  node1# gnt-instance info instance1
  Instance name: instance1
  UUID: 45173e82-d1fa-417c-8758-7d582ab7eef4
  Serial number: 2
  Creation time: 2009-10-26 04:06:57
  Modification time: 2009-10-26 04:07:14
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node3
    Operating system: debootstrap
    Allocated network port: None
    Hypervisor: xen-pvm
      - root_path: default (/dev/sda1)
      - kernel_args: default (ro)
      - use_bootloader: default (False)
      - bootloader_args: default ()
      - bootloader_path: default ()
      - kernel_path: default (/boot/vmlinuz-2.6-xenU)
      - initrd_path: default ()
    Hardware:
      - VCPUs: 1
      - memory: 128MiB
      - NICs:
        - nic/0: MAC: aa:00:00:78:da:63, IP: None, mode: bridged, link: xen-br0
    Disks:
      - disk/0: drbd8, size 256M
        access mode: rw
        nodeA:       node2, minor=0
        nodeB:       node3, minor=0
        port:        11035
        auth key:    8e950e3cec6854b0181fbc3a6058657701f2d458
        on primary:  /dev/drbd0 (147:0) in sync, status *DEGRADED*
        child devices:
          - child 0: lvm, size 256M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_data (254:0)
          - child 1: lvm, size 128M
            logical_id: xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta
            on primary: /dev/xenvg/22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta (254:1)

The output is similar for instance4. To recover from this, we need to
run the node evacuate command, which will change the current secondary
node to a new one (in this case, we only have two working nodes, so all
instances will end up on nodes one and two)::

  node1# gnt-node evacuate -I hail node3
  Relocate instance(s) 'instance1','instance4' from node
   node3 using iallocator hail?
  y/[n]/?: y
  Mon Oct 26 05:05:39 2009  - INFO: Selected new secondary for instance 'instance1': node1
  Mon Oct 26 05:05:40 2009  - INFO: Selected new secondary for instance 'instance4': node2
  Mon Oct 26 05:05:40 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 05:05:40 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 on node2
  Mon Oct 26 05:05:40 2009  - INFO: Checking volume groups
  Mon Oct 26 05:05:40 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:40 2009  - INFO: Checking disk/0 consistency on node node2
  Mon Oct 26 05:05:40 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:40 2009  - INFO: Adding new local storage on node1 for disk/0
  Mon Oct 26 05:05:41 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:41 2009  - INFO: activating a new drbd on node1 for disk/0
  Mon Oct 26 05:05:42 2009  - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:42 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:42 2009       Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:42 2009  - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:42 2009  - INFO: Updating instance configuration
  Mon Oct 26 05:05:45 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:46 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:46 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 05:05:46 2009  - INFO: - device disk/0: 13.90% done, 7 estimated seconds remaining
  Mon Oct 26 05:05:53 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 05:05:53 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:05:53 2009  - INFO: Remove logical volumes for 0
  Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:05:53 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:05:53 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 05:05:53 2009 STEP 1/6 Check device existence
  Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 on node1
  Mon Oct 26 05:05:53 2009  - INFO: Checking volume groups
  Mon Oct 26 05:05:53 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 05:05:53 2009  - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 05:05:54 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 05:05:54 2009  - INFO: Adding new local storage on node2 for disk/0
  Mon Oct 26 05:05:54 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 05:05:54 2009  - INFO: activating a new drbd on node2 for disk/0
  Mon Oct 26 05:05:55 2009  - INFO: Shutting down drbd for disk/0 on old node
  Mon Oct 26 05:05:55 2009  - WARNING: Failed to shutdown drbd for disk/0 on oldnode: Node is marked offline
  Mon Oct 26 05:05:55 2009       Hint: Please cleanup this device manually as soon as possible
  Mon Oct 26 05:05:55 2009  - INFO: Detaching primary drbds from the network (=> standalone)
  Mon Oct 26 05:05:55 2009  - INFO: Updating instance configuration
  Mon Oct 26 05:05:55 2009  - INFO: Attaching primary drbds to new secondary (standalone => connected)
  Mon Oct 26 05:05:56 2009 STEP 5/6 Sync devices
  Mon Oct 26 05:05:56 2009  - INFO: Waiting for instance instance4 to sync disks.
  Mon Oct 26 05:05:56 2009  - INFO: - device disk/0: 12.40% done, 8 estimated seconds remaining
  Mon Oct 26 05:06:04 2009  - INFO: Instance instance4's disks are in sync.
  Mon Oct 26 05:06:04 2009 STEP 6/6 Removing old storage
  Mon Oct 26 05:06:04 2009  - INFO: Remove logical volumes for 0
  Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
  Mon Oct 26 05:06:04 2009  - WARNING: Can't remove old LV: Node is marked offline
  Mon Oct 26 05:06:04 2009       Hint: remove unused LVs manually
  node1#

And now node3 is completely free of instances and can be repaired::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3      ?     ?      ?     ?     ?     0     0

Re-adding a node to the cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Let's say node3 has been repaired and is now ready to be
reused. Re-adding it is simple::

  node1# gnt-node add --readd node3
  The authenticity of host 'node3 (198.51.100.1)' can't be established.
  RSA key fingerprint is 9f:2e:5a:2e:e0:bd:00:09:e4:5c:32:f2:27:57:7a:f4.
  Are you sure you want to continue connecting (yes/no)? yes
  Mon Oct 26 05:27:39 2009  - INFO: Readding a node, the offline/drained flags were reset
  Mon Oct 26 05:27:39 2009  - INFO: Node will be a master candidate

And is now working again::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G 30.2G     3     1
  node2   1.3T  1.3T  32.0G  1.0G 30.4G     1     3
  node3   1.3T  1.3T  32.0G  1.0G 30.4G     0     0

.. note:: If you have the ganeti-htools package installed, you can
   shuffle the instances around to make better use of the nodes.
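
For example, ``hbal`` can compute and print a proposed sequence of
moves without applying anything. A minimal sketch, assuming ``-L``
connects to the local master daemon with its default settings::

  node1# hbal -L

Adding the ``-X`` option would also execute the proposed moves instead
of only printing them.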

Disk failures
+++++++++++++

A disk failure is simpler than a full node failure. First, a single
disk failure should not cause data loss for any redundant instance;
only the performance of some instances might be reduced due to more
network traffic.

Let's take the cluster status in the above listing and check which
volumes are in use::

  node1# gnt-node volumes -o phys,instance node2
  PhysDev   Instance
  /dev/sdb1 instance4
  /dev/sdb1 instance4
  /dev/sdb1 instance1
  /dev/sdb1 instance1
  /dev/sdb1 instance3
  /dev/sdb1 instance3
  /dev/sdb1 instance2
  /dev/sdb1 instance2
  node1#

You can see that all instances on node2 have logical volumes on
``/dev/sdb1``. Let's simulate a disk failure on that disk::

  node1# ssh node2
  node2# echo offline > /sys/block/sdb/device/state
  node2# vgs
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 750153695232: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    /dev/sdb1: read failed after 0 of 4096 at 0: Input/output error
    Couldn't find device with uuid '954bJA-mNL0-7ydj-sdpW-nc2C-ZrCi-zFp91c'.
    Couldn't find all physical volumes for volume group xenvg.
    Volume group xenvg not found
  node2#

At this point, the node is broken, and if we examine instance2 we get
(simplified output shown)::

  node1# gnt-instance info instance2
  Instance name: instance2
  State: configured to be up, actual state is up
    Nodes:
      - primary: node1
      - secondaries: node2
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status ok
        on secondary: /dev/drbd1 (147:1) in sync, status *DEGRADED* *MISSING DISK*

This instance has only a secondary volume on node2. Let's verify a
primary instance of node2::

  node1# gnt-instance info instance1
  Instance name: instance1
  State: configured to be up, actual state is up
    Nodes:
      - primary: node2
      - secondaries: node1
    Disks:
      - disk/0: drbd8, size 256M
        on primary:   /dev/drbd0 (147:0) in sync, status *DEGRADED* *MISSING DISK*
        on secondary: /dev/drbd3 (147:3) in sync, status ok
  node1# gnt-instance console instance1

  Debian GNU/Linux 5.0 instance1 tty1

  instance1 login: root
  Last login: Tue Oct 27 01:24:09 UTC 2009 on tty1
  instance1:~# date > test
  instance1:~# sync
  instance1:~# cat test
  Tue Oct 27 01:25:20 UTC 2009
  instance1:~# dmesg|tail
  [5439785.235448] NET: Registered protocol family 15
  [5439785.235489] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
  [5439785.235495] All bugs added by David S. Miller <davem@redhat.com>
  [5439785.235517] XENBUS: Device with no driver: device/console/0
  [5439785.236576] kjournald starting.  Commit interval 5 seconds
  [5439785.236588] EXT3-fs: mounted filesystem with ordered data mode.
  [5439785.236625] VFS: Mounted root (ext3 filesystem) readonly.
  [5439785.236663] Freeing unused kernel memory: 172k freed
  [5439787.533779] EXT3 FS on sda1, internal journal
  [5440655.065431] eth0: no IPv6 routers present
  instance1:~#

As you can see, the instance is running fine and doesn't see any disk
issues. It is now time to fix node2 and re-establish redundancy for the
involved instances.

.. note:: For Ganeti 2.0 we need to manually fix the volume group on
   node2 by running ``vgreduce --removemissing xenvg``

::

  node1# gnt-node repair-storage node2 lvm-vg xenvg
  Mon Oct 26 18:14:03 2009 Repairing storage unit 'xenvg' on node2 ...
  node1# ssh node2 vgs
    VG    #PV #LV #SN Attr   VSize   VFree
    xenvg   1   8   0 wz--n- 673.84G 673.84G
  node1#

This has removed the 'bad' disk from the volume group, which is now left
with only one PV. We can now replace the disks for the involved
instances::

  node1# for i in instance{1..4}; do gnt-instance replace-disks -a $i; done
  Mon Oct 26 18:15:38 2009 Replacing disk(s) 0 for instance1
  Mon Oct 26 18:15:38 2009 STEP 1/6 Check device existence
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node1
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 on node2
  Mon Oct 26 18:15:38 2009  - INFO: Checking volume groups
  Mon Oct 26 18:15:38 2009 STEP 2/6 Check peer consistency
  Mon Oct 26 18:15:38 2009  - INFO: Checking disk/0 consistency on node node1
  Mon Oct 26 18:15:39 2009 STEP 3/6 Allocate new storage
  Mon Oct 26 18:15:39 2009  - INFO: Adding storage on node2 for disk/0
  Mon Oct 26 18:15:39 2009 STEP 4/6 Changing drbd configuration
  Mon Oct 26 18:15:39 2009  - INFO: Detaching disk/0 drbd from local storage
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the old LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Renaming the new LVs on the target node
  Mon Oct 26 18:15:40 2009  - INFO: Adding new mirror component on node2
  Mon Oct 26 18:15:41 2009 STEP 5/6 Sync devices
  Mon Oct 26 18:15:41 2009  - INFO: Waiting for instance instance1 to sync disks.
  Mon Oct 26 18:15:41 2009  - INFO: - device disk/0: 12.40% done, 9 estimated seconds remaining
  Mon Oct 26 18:15:50 2009  - INFO: Instance instance1's disks are in sync.
  Mon Oct 26 18:15:50 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:15:50 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:15:52 2009 Replacing disk(s) 0 for instance2
  Mon Oct 26 18:15:52 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:01 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:01 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:02 2009 Replacing disk(s) 0 for instance3
  Mon Oct 26 18:16:02 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:09 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:09 2009  - INFO: Remove logical volumes for disk/0
  Mon Oct 26 18:16:10 2009 Replacing disk(s) 0 for instance4
  Mon Oct 26 18:16:10 2009 STEP 1/6 Check device existence

  Mon Oct 26 18:16:18 2009 STEP 6/6 Removing old storage
  Mon Oct 26 18:16:18 2009  - INFO: Remove logical volumes for disk/0
  node1#

At this point, all instances should be healthy again.

.. note:: Ganeti 2.0 doesn't have the ``-a`` option to replace-disks,
   so for it you have to run the loop twice, once over primary
   instances with argument ``-p`` and once over secondary instances
   with argument ``-s``, but otherwise the operations are similar::

     node1# gnt-instance replace-disks -p instance1

     node1# for i in instance{2..4}; do gnt-instance replace-disks -s $i; done

Common cluster problems
-----------------------

There are a number of small issues that might appear on a cluster and
that can be solved easily once properly identified. For this exercise
we will consider the case of node3, which was broken previously and
re-added to the cluster without reinstallation. Running cluster verify
on the cluster reports::

  node1# gnt-cluster verify
  Mon Oct 26 18:30:08 2009 * Verifying global settings
  Mon Oct 26 18:30:08 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:30:10 2009 * Verifying node status
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 0 is in use
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: unallocated drbd minor 1 is in use
  Mon Oct 26 18:30:10 2009 * Verifying instance status
  Mon Oct 26 18:30:10 2009   - ERROR: instance instance4: instance should not run on node node3
  Mon Oct 26 18:30:10 2009 * Verifying orphan volumes
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009   - ERROR: node node3: volume 22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta is unknown
  Mon Oct 26 18:30:10 2009 * Verifying remaining instances
  Mon Oct 26 18:30:10 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:30:10 2009 * Other Notes
  Mon Oct 26 18:30:10 2009 * Hooks Results
  node1#

Instance status
+++++++++++++++

As you can see, *instance4* has a copy running on node3, because we
forced the failover when node3 failed. This case is dangerous as the
instance will have the same IP and MAC address, wreaking havoc on the
network environment and on anyone who tries to use it.

Ganeti doesn't directly handle this case. It is recommended to log on
to node3 and run::

  node3# xm destroy instance4

Unallocated DRBD minors
+++++++++++++++++++++++

There are still unallocated DRBD minors on node3. Again, these are not
handled by Ganeti directly and need to be cleaned up via DRBD commands::

  node3# drbdsetup /dev/drbd0 down
  node3# drbdsetup /dev/drbd1 down
  node3#
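
You can verify the result via the kernel's DRBD status file; after the
``down`` commands, minors 0 and 1 should show as unconfigured (or
disappear entirely, depending on the DRBD version)::

  node3# cat /proc/drbd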

Orphan volumes
++++++++++++++

At this point, the only remaining problem should be the so-called
*orphan* volumes. These can also appear after an aborted disk
replacement, or a similar situation where Ganeti was not able to
recover automatically. Here you need to remove them manually via LVM
commands::

  node3# lvremove xenvg
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_data" successfully removed
  Do you really want to remove active logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta"? [y/n]: y
    Logical volume "22459cf8-117d-4bea-a1aa-791667d07800.disk0_meta" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_data" successfully removed
  Do you really want to remove active logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta"? [y/n]: y
    Logical volume "1aaf4716-e57f-4101-a8d6-03af5da9dc50.disk0_meta" successfully removed
  node3#

At this point cluster verify shouldn't complain anymore::

  node1# gnt-cluster verify
  Mon Oct 26 18:37:51 2009 * Verifying global settings
  Mon Oct 26 18:37:51 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:37:53 2009 * Verifying node status
  Mon Oct 26 18:37:53 2009 * Verifying instance status
  Mon Oct 26 18:37:53 2009 * Verifying orphan volumes
  Mon Oct 26 18:37:53 2009 * Verifying remaining instances
  Mon Oct 26 18:37:53 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:37:53 2009 * Other Notes
  Mon Oct 26 18:37:53 2009 * Hooks Results
  node1#

N+1 errors
++++++++++

Since redundant instances in Ganeti have a primary/secondary model,
each node needs to set aside enough memory so that, if one of its peer
nodes fails, all the instances that have the failed node as primary and
this node as secondary can be failed over to it. More specifically, if
instance2 has node1 as primary and node2 as secondary (and node1 and
node2 do not have any other instances in this layout), then node2 must
have enough free memory so that if node1 fails, we can failover
instance2 without any other operations (to reduce the downtime window).
Let's increase the memory of the current instances to 4G, and add three
new instances, two on node2:node3 with 8GB of RAM and one on
node1:node2 with 12GB of RAM (numbers chosen so that we run out of
memory)::

  node1# gnt-instance modify -B memory=4G instance1
  Modified instance instance1
   - be/memory -> 4096
  Please don't forget that these parameters take effect only at the next start of the instance.
  node1# gnt-instance modify …

  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance5

  node1# gnt-instance add -t drbd -n node2:node3 -s 512m -B memory=8G -o debootstrap instance6

  node1# gnt-instance add -t drbd -n node1:node2 -s 512m -B memory=12G -o debootstrap instance7
  node1# gnt-instance reboot --all
  The reboot will operate on 7 instances.
  Do you want to continue?
  Affected instances:
    instance1
    instance2
    instance3
    instance4
    instance5
    instance6
    instance7
  y/[n]/?: y
  Submitted jobs 677, 678, 679, 680, 681, 682, 683
  Waiting for job 677 for instance1...
  Waiting for job 678 for instance2...
  Waiting for job 679 for instance3...
  Waiting for job 680 for instance4...
  Waiting for job 681 for instance5...
  Waiting for job 682 for instance6...
  Waiting for job 683 for instance7...
  node1#

We rebooted the instances for the memory changes to take effect. Now
the cluster looks like::

  node1# gnt-node list
  Node  DTotal DFree MTotal MNode MFree Pinst Sinst
  node1   1.3T  1.3T  32.0G  1.0G  6.5G     4     1
  node2   1.3T  1.3T  32.0G  1.0G 10.5G     3     4
  node3   1.3T  1.3T  32.0G  1.0G 30.5G     0     2
  node1# gnt-cluster verify
  Mon Oct 26 18:59:36 2009 * Verifying global settings
  Mon Oct 26 18:59:36 2009 * Gathering data (3 nodes)
  Mon Oct 26 18:59:37 2009 * Verifying node status
  Mon Oct 26 18:59:37 2009 * Verifying instance status
  Mon Oct 26 18:59:37 2009 * Verifying orphan volumes
  Mon Oct 26 18:59:37 2009 * Verifying remaining instances
  Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 18:59:37 2009   - ERROR: node node2: not enough memory to accommodate failovers should peer node node1 fail
  Mon Oct 26 18:59:37 2009 * Other Notes
  Mon Oct 26 18:59:37 2009 * Hooks Results
  node1#

The cluster verify error above shows that if node1 fails, node2 will
not have enough memory to take over all the instances that currently
have node1 as their primary node. To solve this, you have a number of
options:

- try to manually move instances around (but this can become complicated
  for any non-trivial cluster)
- try to reduce memory of some instances to accommodate the available
  node memory
- if you have the ganeti-htools package installed, you can run the
  ``hbal`` tool, which will try to compute an automated cluster
  solution that complies with the N+1 rule (a sketch follows this list)
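
A minimal ``hbal`` sketch for this case, under the same assumptions as
the earlier rebalancing example (``-L`` connects to the local master
daemon); ``-X`` asks it to also execute the moves it computes rather
than only print them::

  node1# hbal -L -X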

Network issues
++++++++++++++

In case a node has problems with the network (usually the secondary
network, as problems with the primary network will render the node
unusable for Ganeti commands), it will show up in cluster verify as::

  node1# gnt-cluster verify
  Mon Oct 26 19:07:19 2009 * Verifying global settings
  Mon Oct 26 19:07:19 2009 * Gathering data (3 nodes)
  Mon Oct 26 19:07:23 2009 * Verifying node status
  Mon Oct 26 19:07:23 2009   - ERROR: node node1: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node2: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node1': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node2': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009   - ERROR: node node3: tcp communication with node 'node3': failure using the secondary interface(s)
  Mon Oct 26 19:07:23 2009 * Verifying instance status
  Mon Oct 26 19:07:23 2009 * Verifying orphan volumes
  Mon Oct 26 19:07:23 2009 * Verifying remaining instances
  Mon Oct 26 19:07:23 2009 * Verifying N+1 Memory redundancy
  Mon Oct 26 19:07:23 2009 * Other Notes
  Mon Oct 26 19:07:23 2009 * Hooks Results
  node1#

This shows that both node1 and node2 have problems contacting node3
over the secondary network, and node3 has problems contacting them.
From this output it can be deduced that, since node1 and node2 can
communicate with each other, node3 is the one having problems, and you
need to investigate its network settings/connection.
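
A quick manual check, assuming the secondary addressing from the start
of this walkthrough (node1 is ``192.0.2.1`` and node3 is
``192.0.2.3``), is to ping node3's secondary address while forcing the
source address with ``-I``::

  node1# ping -c 3 -I 192.0.2.1 192.0.2.3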

Migration problems
++++++++++++++++++

Since live migration can sometimes fail and leave the instance in an
inconsistent state, Ganeti provides a ``--cleanup`` argument to the
migrate command that does:

- check on which node the instance is actually running (has the
  command failed before or after the actual migration?)
- reconfigure the DRBD disks accordingly

It is always safe to run this command as long as the instance has good
data on its primary node (i.e. not showing as degraded). If so, you can
simply run::

  node1# gnt-instance migrate --cleanup instance1
  Instance instance1 will be recovered from a failed migration. Note
  that the migration procedure (including cleanup) is **experimental**
  in this version. This might impact the instance if anything goes
  wrong. Continue?
  y/[n]/?: y
  Mon Oct 26 19:13:49 2009 Migrating instance instance1
  Mon Oct 26 19:13:49 2009 * checking where the instance actually runs (if this hangs, the hypervisor might be in a bad state)
  Mon Oct 26 19:13:49 2009 * instance confirmed to be running on its primary node (node2)
  Mon Oct 26 19:13:49 2009 * switching node node1 to secondary mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:50 2009 * changing into standalone mode
  Mon Oct 26 19:13:50 2009 * changing disks into single-master mode
  Mon Oct 26 19:13:50 2009 * wait until resync is done
  Mon Oct 26 19:13:51 2009 * done
  node1#

In-use disks at instance shutdown
+++++++++++++++++++++++++++++++++

If you see something like the following when trying to shut down or
deactivate disks for an instance::

  node1# gnt-instance shutdown instance1
  Mon Oct 26 19:16:23 2009  - WARNING: Could not shutdown block device disk/0 on node node2: drbd0: can't shutdown drbd device: /dev/drbd0: State change failed: (-12) Device is held open by someone\n

It most likely means something is holding open the underlying DRBD
device. This can be bad if the instance is not running, as it might
mean there was concurrent access from both the node and the instance to
the disks, but not always (e.g. you might simply have had the
partitions activated via ``kpartx``).

To troubleshoot this issue you need to follow standard Linux practices,
and pay attention to the hypervisor being used (a combined check is
sketched after the list):

- check if (in the above example) ``/dev/drbd0`` on node2 is being
  mounted somewhere (``cat /proc/mounts``)
- check if the device is not being used by device mapper itself:
  ``dmsetup ls`` and look for entries of the form ``drbd0pX``, and if so
  remove them with either ``kpartx -d`` or ``dmsetup remove``
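
A minimal sketch combining these checks, reusing the ``/dev/drbd0``
example on node2 from above::

  node2# grep drbd0 /proc/mounts
  node2# dmsetup ls | grep drbd0
  node2# kpartx -d /dev/drbd0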

For Xen, check whether the hypervisor itself is using the disks::

  node1# xenstore-ls /local/domain/0/backend/vbd|grep -e "domain =" -e physical-device
  domain = "instance2"
  physical-device = "93:0"
  domain = "instance3"
  physical-device = "93:1"
  domain = "instance4"
  physical-device = "93:2"
  node1#

You can see in the above output that the node exports three disks, to
three instances. The ``physical-device`` key is in major:minor format in
hexadecimal, and 0x93 represents DRBD's major number. Thus we can see
from the above that instance2 has /dev/drbd0, instance3 /dev/drbd1, and
instance4 /dev/drbd2.
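
As a quick sanity check of the hexadecimal conversion, 0x93 is 147
(DRBD's major number), matching the ``(147:0)`` device numbers shown in
the earlier ``gnt-instance info`` output::

  node1# printf '%d\n' 0x93
  147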

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: