- Feb 18, 2011
-
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 17, 2011
-
-
Iustin Pop authored
Currently, at least one LU (LUClusterSetParams) validates HV parameters incorrectly, against all nodes. It would be possible to fix just this case, but I modified the base functions to filter out non-vm_capable nodes instead, so that all callers are protected. Note: the _CheckOSParams function is never called with the full node list, so modifying it shouldn't be needed. However, I think it's safe to do so (and it shouldn't hurt, as an instance's node should never lack the vm_capable bit). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
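The idea of filtering in the base function, so that every caller is protected, can be sketched as follows. This is a minimal illustration and not the Ganeti code: the `Node` class, `filter_vm_capable` and `check_hv_params` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a node object; only the fields we need.
    name: str
    vm_capable: bool = True


def filter_vm_capable(nodes):
    """Return the names of the nodes that are able to host instances."""
    return [node.name for node in nodes if node.vm_capable]


def check_hv_params(nodes, validate):
    """Validate on vm_capable nodes only and return the names checked.

    Because the filtering happens here, callers that pass the full node
    list are protected without any change on their side.
    """
    checked = filter_vm_capable(nodes)
    for name in checked:
        validate(name)
    return checked


nodes = [Node("node1"), Node("node2"), Node("node3", vm_capable=False)]
seen = []
checked = check_hv_params(nodes, seen.append)
print(checked)
```

Here `node3` is skipped even though the caller passed all three nodes.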
-
Iustin Pop authored
Since we don't have the data per design, UNAVAIL is appropriate here, while NODATA is not. The patch also adds a comment: if we extend the live fields list to contain other data in the future, we need to reevaluate this solution. This should fix issue 143. The listing now shows (node2 == offline, node3 == not vm_capable):

  Node  DTotal    DFree     MTotal    MNode     MFree     Pinst Sinst
  node1 698.6G    630.5G    32.0G     1.0G      30.0G         8     7
  node2 (offline) (offline) (offline) (offline) (offline)     9     4
  node3 (unavail) (unavail) (unavail) (unavail) (unavail)     0     0

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Non-vm_capable nodes most likely don't have a hypervisor and/or storage configured, so the call would fail anyway. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 09, 2011
-
-
Iustin Pop authored
Currently, for both primary and secondary offline nodes, we give the same message:

  - ERROR: instance instance14: instance lives on offline node(s) node3
  - ERROR: instance instance15: instance lives on offline node(s) node3
  - ERROR: instance instance16: instance lives on offline node(s) node3
  - ERROR: instance instance17: instance lives on offline node(s) node3

This is confusing, as an offline primary is in a different category than a secondary. The patch changes the warnings to have different error messages:

  - ERROR: instance instance14: instance has offline secondary node(s) node3
  - ERROR: instance instance15: instance has offline secondary node(s) node3
  - ERROR: instance instance16: instance lives on offline node node3
  - ERROR: instance instance17: instance lives on offline node node3

Thanks to Alexander Schreiber <als@google.com> for reporting this issue. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
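A rough sketch of how the two cases could be told apart when building the messages. The function name and its arguments are assumptions made for illustration; only the message wording is taken from the output above.

```python
def offline_node_messages(instance, primary, secondaries, offline):
    """Build separate messages for an offline primary and offline secondaries.

    Hypothetical helper: 'offline' is the set of offline node names.
    """
    messages = []
    if primary in offline:
        messages.append("instance %s: instance lives on offline node %s" %
                        (instance, primary))
    bad_secondaries = [name for name in secondaries if name in offline]
    if bad_secondaries:
        messages.append("instance %s: instance has offline secondary node(s) %s" %
                        (instance, ", ".join(bad_secondaries)))
    return messages


offline = {"node3"}
for msg in (offline_node_messages("instance14", "node1", ["node3"], offline) +
            offline_node_messages("instance16", "node3", ["node1"], offline)):
    print("  - ERROR: " + msg)
```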
-
Iustin Pop authored
Currently, cluster-verify says:

  - ERROR: instance instance14: couldn't retrieve status for disk/0 on node3: node offline
  - ERROR: instance instance14: instance lives on offline node(s) node3
  - ERROR: instance instance15: couldn't retrieve status for disk/0 on node3: node offline
  - ERROR: instance instance15: instance lives on offline node(s) node3

This is redundant as the “lives on offline node” message should be all we need to understand the cluster situation. The patch fixes this and also corrects a very old idiom. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Stephen Shirley <diamond@google.com>
-
Iustin Pop authored
Currently, cluster verify shows N+1 warnings for offline nodes that have any redundant instances: the memory data we have for those nodes is zero, so any instance triggers the warning. As the comment says, we already list secondary instances on offline nodes, so that warning is enough, and we skip the N+1 one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Stephen Shirley <diamond@google.com>
-
- Feb 08, 2011
-
-
Stephen Shirley authored
The current code gives:

  Failure: prerequisites not met for this operation: error type: wrong_input, error details: Selection filter does not match any instances

Signed-off-by:
Stephen Shirley <diamond@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 04, 2011
-
-
Stephen Shirley authored
This is needed so cluster-merge can add nodes from other clusters. Signed-off-by:
Stephen Shirley <diamond@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 03, 2011
-
-
Iustin Pop authored
Currently, the export timeout is 10 times 20 seconds, but the import timeout is only 30 seconds. I'm raising this to 60 seconds with two goals in mind:

- when debugging manually, this allows for easier synchronisation of the processes
- 60 seconds equals three full 20-second intervals, which I think is better than just one and a half

This change shouldn't make a big difference either way (at most, it may delay the job by half a minute in case of failures). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
In case of failures, the recent daemon output is logged as %r on a list of unicode strings, which results in the (ugly):

  Thu Feb 3 05:13:34 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: [u' DUMP: Date of this level 0 dump: Thu Feb 3 05:13:18 2011', u' DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output', u' DUMP: Label: none', u' DUMP: Writing 10 Kilobyte records', u' DUMP: mapping (Pass I) [regular files]', u' DUMP: mapping (Pass II) [directories]', u' DUMP: estimated 54301 blocks.', u' DUMP: Volume 1 started with block 1 at: Thu Feb 3 05:13:19 2011', u' DUMP: dumping (Pass III) [directories]', u' DUMP: dumping (Pass IV) [regular files]', u'socat: E SSL_write(): Connection reset by peer', u"dd: dd: writing `standard output': Broken pipe", u' DUMP: Broken pipe', u' DUMP: The ENTIRE dump is aborted.'])

This patch joins this list and makes it a non-unicode string, thus resulting in the more readable (and ~10% shorter):

  Thu Feb 3 05:16:04 2011 snapshot/0 failed to send data: Exited with status 1 (recent output: DUMP: Date of this level 0 dump: Thu Feb 3 05:15:58 2011\n DUMP: Dumping /dev/mapper/6369a5f7-1e67-4d0d-a4f0-956b3649c6d7.disk0_data.snap-1 (an unlisted file system) to standard output\n DUMP: Label: none\n DUMP: Writing 10 Kilobyte records\n DUMP: mapping (Pass I) [regular files]\n DUMP: mapping (Pass II) [directories]\n DUMP: estimated 54350 blocks.\n DUMP: Volume 1 started with block 1 at: Thu Feb 3 05:15:59 2011\n DUMP: dumping (Pass III) [directories]\nsocat: E SSL_write(): Connection reset by peer\ndd: dd: writing `standard output': Broken pipe\n DUMP: Broken pipe\n DUMP: The ENTIRE dump is aborted.)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
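The difference between formatting the list with %r and joining it first can be reproduced in a few lines. This is a sketch in current Python, where strings carry no u'' prefix; the variable names are made up, and the way the joined text is escaped when written to the job log is not covered here.

```python
# A shortened sample of the daemon output from the message above.
recent = [
    " DUMP: Label: none",
    "socat: E SSL_write(): Connection reset by peer",
    " DUMP: Broken pipe",
]

# Old behaviour: %r on the list shows brackets, quotes and separators.
as_list = "recent output: %r" % (recent,)

# New behaviour: join the lines into one string before formatting.
as_text = "recent output: %s" % "\n".join(recent)

print(as_list)
print(as_text)
```

The joined form drops the quoting around every element, which is where the saving in length comes from.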
-
Iustin Pop authored
This adds a message and nice handling of ^C, especially useful for ``gnt-job watch``. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The new import/export infrastructure in Ganeti 2.2 and up handles compression differently. It no longer writes compressed files to the destination. Unfortunately changing this behaviour would be non-trivial, so in the meantime setting “compression = none” will hopefully avoid some confusion. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 02, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This function can be used from a SIGHUP handler to reopen log files. Initial, simple unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
It's passed in by most users (daemons, CLI scripts) and for the others (burnin, watcher) it certainly doesn't hurt, especially when using syslog. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
The I/O error will occur while opening the file, not while adding and configuring the handler. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Feb 01, 2011
-
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Stephen Shirley authored
This allows calling _UnlockedLookupNodeGroup() from within AddNodeGroup(). Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jan 31, 2011
-
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This patch adds a new log handler class based on the standard library's BaseRotatingHandler. This new class allows the log file to be re-opened, e.g. upon receiving a SIGHUP signal. The latter will be implemented in forthcoming patches. The patch does not change the behaviour regarding writing to /dev/console. Quite a bit of code had to be changed to unittest the log handlers. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
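A minimal sketch of a re-openable handler built on the standard library's BaseRotatingHandler. The class name `ReopenableFileHandler` and the method `request_reopen` are hypothetical and chosen for this example; they are not claimed to match the names used in the patch.

```python
import logging
import logging.handlers
import os
import tempfile


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler that reopens its file before the next record if asked."""

    def __init__(self, filename):
        logging.handlers.BaseRotatingHandler.__init__(self, filename, "a")
        self._reopen = False

    def request_reopen(self):
        # Only sets a flag, so it is cheap enough to call from a signal handler.
        self._reopen = True

    def shouldRollover(self, record):
        # Called by BaseRotatingHandler.emit() before writing each record.
        return self._reopen

    def doRollover(self):
        # Close the old stream and open the configured path again.
        if self.stream:
            self.stream.close()
            self.stream = None
        self.stream = self._open()
        self._reopen = False


# Demonstration: move the file away (as a log rotation tool would do).
path = os.path.join(tempfile.mkdtemp(), "daemon.log")
handler = ReopenableFileHandler(path)
logger = logging.getLogger("reopen-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("before rotation")
os.rename(path, path + ".1")
handler.request_reopen()
logger.info("after rotation")
handler.close()
```

A SIGHUP handler would then only need to call `request_reopen()` on each such handler; the file is reopened when the next record is written.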
-
- Jan 28, 2011
-
-
Iustin Pop authored
This patch implements recreation of instance disk symlinks when the activate-disks operation is run. Until now, it was not possible to re-create these symlinks without stopping and starting or migrating an instance as the RPC call where this is done was in instance startup and migration. In order to do this, the blockdev_assemble rpc call needs the disk index too, which is added to the protocol. This is a change from 2.3 and makes instance startup incompatible (FYI). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This makes it possible to get the console information via a LUXI query. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This adds checking (in the configuration) for invalid be, nd and nic params. The code is a bit tricky as nd params are at cluster, nodegroup and node level, nicparams are at cluster and nic level, whereas beparams are at cluster and instance level. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
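The different levels at which each parameter type can be set can be captured in a small table, which keeps the checking loop uniform. The following is only a sketch: `LEVELS`, `find_invalid_params` and the parameter names in the example are invented for illustration.

```python
# Levels at which each parameter type may be defined, as described above.
LEVELS = {
    "beparams": ("cluster", "instance"),
    "ndparams": ("cluster", "nodegroup", "node"),
    "nicparams": ("cluster", "nic"),
}


def find_invalid_params(kind, per_level, allowed):
    """Return sorted (level, key) pairs for keys that are not allowed.

    'per_level' maps a level name to the parameter dict set at that level;
    'allowed' is the set of valid keys for this parameter type.
    """
    invalid = []
    for level in LEVELS[kind]:
        for key in per_level.get(level, {}):
            if key not in allowed:
                invalid.append((level, key))
    return sorted(invalid)


# Hypothetical parameter names, used only to exercise the check.
ndparams = {
    "cluster": {"example_param": "a"},
    "nodegroup": {"example_param": "b", "typo_param": "x"},
    "node": {"other_typo": "y"},
}
print(find_invalid_params("ndparams", ndparams, {"example_param"}))
```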
-
Iustin Pop authored
This just adds a 'cluster' local variable for reducing duplication. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Guido Trotter authored
Closes issue: 130 Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Stephen Shirley authored
- Add check in ConfigWriter to prevent last node group from being removed
- Tidy up error message a bit

Signed-off-by:
Stephen Shirley <diamond@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
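The guard against removing the last node group amounts to a size check before deletion. A simplified sketch, with a plain dict standing in for the configuration and hypothetical names throughout:

```python
class LastNodeGroupError(Exception):
    """Raised when the only remaining node group would be removed."""


def remove_node_group(groups, uuid):
    """Remove a group from the 'groups' dict unless it is the last one."""
    if uuid not in groups:
        raise KeyError("Unknown node group '%s'" % uuid)
    if len(groups) == 1:
        raise LastNodeGroupError("Cannot remove group '%s': it is the last"
                                 " node group in the cluster" % uuid)
    del groups[uuid]


groups = {"uuid-a": "default", "uuid-b": "second"}
remove_node_group(groups, "uuid-b")
try:
    remove_node_group(groups, "uuid-a")
except LastNodeGroupError as err:
    refused = str(err)
```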
-
René Nussbaumer authored
If for some reason (e.g. a failed migration) an instance is running on multiple nodes, the output can become inconsistent. To detect that error and report it consistently between runs, we also make the call on the secondary node and check whether the instance is running there. If so, we report the instance as ERROR_wrongnode. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
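The check can be pictured as follows. Only the status name ERROR_wrongnode comes from the message above; the function, its arguments and the "ok" placeholder are assumptions made for this sketch.

```python
def wrongnode_status(primary, running_on):
    """Return "ERROR_wrongnode" if the instance runs anywhere but its primary.

    'running_on' is the set of nodes (primary and secondaries were both
    queried) on which the hypervisor reports the instance as running.
    "ok" is a placeholder for whatever status would otherwise apply.
    """
    if any(node != primary for node in running_on):
        return "ERROR_wrongnode"
    return "ok"


print(wrongnode_status("node1", {"node1"}))
print(wrongnode_status("node1", {"node1", "node2"}))
print(wrongnode_status("node1", {"node2"}))
```

Because the secondary is always queried, the result no longer depends on which node happened to answer first.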
-
- Jan 27, 2011
-
-
Iustin Pop authored
Currently, the validity of the hypervisor parameters is only checked at init/modification time, and not in the cluster verify. This is bad, as it can lead to inconsistent state that is only detected when the next modification (which can be unrelated) is made, leading to unexpected error messages. This patch adds both syntax verification (in masterd) and validity verification on remote nodes. The downside of the patch is that on clusters with many instances which have custom parameters, it will be slow. A possible improvement would be to detect duplicate, identical sets of parameters and collapse these into a single verification, but that is left as a TODO (in case it becomes problematic). An additional change is in utils.ForceDict, where we said 'key', whereas this function is always used with parameter dicts, so I changed it to "Unknown parameter". Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
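The improvement left as a TODO, collapsing identical parameter sets into one verification, could look roughly like this. It is a sketch of the idea only, not part of the patch; the helper names are hypothetical, the parameter names are used purely as sample data, and it assumes all parameter values are hashable.

```python
def group_identical_params(params_by_instance):
    """Map each distinct parameter set to the instances that use it."""
    groups = {}
    for name in sorted(params_by_instance):
        # A sorted tuple of items is hashable and ignores key order.
        key = tuple(sorted(params_by_instance[name].items()))
        groups.setdefault(key, []).append(name)
    return groups


def verify_all(params_by_instance, verify):
    """Call verify() once per distinct parameter set and share the result."""
    results = {}
    for key, names in group_identical_params(params_by_instance).items():
        outcome = verify(dict(key))
        for name in names:
            results[name] = outcome
    return results


calls = []


def fake_verify(params):
    # Stand-in for the (slow) remote verification.
    calls.append(params)
    return "kernel_path" in params


params = {
    "instance1": {"kernel_path": "/boot/vmlinuz", "root_path": "/dev/sda1"},
    "instance2": {"root_path": "/dev/sda1", "kernel_path": "/boot/vmlinuz"},
    "instance3": {"root_path": "/dev/sda1"},
}
results = verify_all(params, fake_verify)
print(len(calls), results)
```

Three instances trigger only two verifications here, since the first two share the same parameters.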
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
René Nussbaumer authored
In cases where the secondary was offline and not evacuated, the watcher kept trying to run activate-disks endlessly. This is useless, as the secondary is offline and therefore cannot respond. This patch skips activation of the disks if the secondary is bad but the instance is up and running. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jan 26, 2011
-
-
Iustin Pop authored
The recent work on multi-VG support has converted LUClusterVerifyDisks into doing serialised calls to each node, as each node can have different VGs. This is suboptimal, especially for big clusters, where this LU is executed by the watcher very often. This patch changes the logic based on the observation that querying a node for its VGs and then requesting a LV list for those VGs is equivalent to simply asking for all LVs, without specifying the VG name(s). So backend.py needs changes to accept an empty VG list, and the LU itself partially reverts to the previous version.

Additionally, we do two other fixes to this LU:

- small improvement in getting the instance list from the config
- MapLVsByNode works for all disk types, hence no need to restrict to the DRBD template, especially as today we can "recreate" disks for plain volumes too (the warning message in gnt-cluster is updated too)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
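The observation behind the change, that listing the LVs of every VG a node has is the same as listing all its LVs, can be shown with a small model. The function below is a hypothetical stand-in for the backend call and works on an in-memory list; the VG and LV names are sample data.

```python
def list_lvs(node_lvs, vg_names=()):
    """Return "vg/lv" names; an empty 'vg_names' means no filtering.

    'node_lvs' is a list of (vg, lv) pairs present on one node.
    """
    if not vg_names:
        return ["%s/%s" % (vg, lv) for (vg, lv) in node_lvs]
    wanted = set(vg_names)
    return ["%s/%s" % (vg, lv) for (vg, lv) in node_lvs if vg in wanted]


node_lvs = [("xenvg", "disk0"), ("othervg", "disk1"), ("xenvg", "disk2")]

# Old approach: first ask the node for its VGs, then list LVs in those VGs.
node_vgs = sorted(set(vg for (vg, _) in node_lvs))
two_step = list_lvs(node_lvs, node_vgs)

# New approach: a single query with an empty VG list.
one_step = list_lvs(node_lvs)
print(one_step)
```

Since the single query needs no per-node VG information, it can be sent to all nodes at once instead of serially.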
-
Iustin Pop authored
Recent multi-VG work already exports the missing LV names as vg/lv, not simply lv. So the query and addition of the VG name in gnt-cluster verify-disks is redundant, and even wrong for non-default-VG instances. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
In some cases (e.g. the hypervisor not running at all), we might want to force disk deactivation, skipping the hypervisor checks. I believe this is not a good thing to do all the time, so this patch adds the force option to allow manual selection of this operation mode. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This is analogous to the existing check for a responsive node daemon. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
At least ganeti-confd was not started. It got started a few minutes later by ganeti-watcher. Also move one pylint disable to the effective line. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also replace hardcoded “xenvg” with constant. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-