- Jun 11, 2012
Iustin Pop authored
Commit 4f580fef added the keymap support, but missed that this directory needs to be ensured/created at hypervisor init time.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 11, 2012
Iustin Pop authored
Copy-paste mismatch :)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit 36c70d4d)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
We already have a ./configure-time variable for this, but it seems to be unused in practice.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 3c4afa2e)
Signed-off-by: Iustin Pop <iustin@google.com>
(trivial patch, let's cherry-pick it)
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The reason why grow-disk was doing:
    $ gnt-instance grow-disk instance3 0 -64
    Unhandled Ganeti error: Invalid format
is that it does its own ParseUnit call and doesn't transform the error into a nicer message.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit c8bde61e)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a partial cherry-pick from 7530364d on master:
Currently, noded requires PUT, even though the semantics of the RPC calls do not match a PUT. We change the code to accept both PUT and POST, with the intention of removing PUT support in a later version.
Additionally, we add a message to the HttpBadRequest exception to make the failure mode clear (not seeing any error message was what made me send this patch…). This was the only description-less use of this exception, by the way.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 7530364d)
What was not cherry-picked is the rpc change (to switch to PUT). The reason I want to backport this to devel-2.5 is that, when upgrading to 2.6, having noded accept both makes for an easier upgrade path.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 5d0566de)
Signed-off-by: Iustin Pop <iustin@google.com>
Yet another cherry-pick (must go deeper!); since we might not make a new release from the devel-2.5 branch, let's add this to stable-2.5.
Reviewed-by: Michael Hanselmann <hansmi@google.com>
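A minimal sketch of the method check described above. `HttpBadRequest` here is a local stand-in, not Ganeti's exception class, and `check_rpc_method` is a hypothetical helper name; the point is only that both PUT and POST pass, while anything else fails with a descriptive message.

```python
class HttpBadRequest(Exception):
    """Stand-in for an HTTP 400 error (not Ganeti's own exception class)."""


# Both methods are accepted for now; PUT support is meant to be removed later.
ALLOWED_RPC_METHODS = frozenset(["PUT", "POST"])


def check_rpc_method(method):
    """Return the normalised method name, or raise with a descriptive message."""
    normalized = method.upper()
    if normalized not in ALLOWED_RPC_METHODS:
        # An explicit message makes the failure mode visible to the client.
        raise HttpBadRequest("Only PUT and POST are supported for RPC,"
                             " got %r" % method)
    return normalized
```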
-
Michael Hanselmann authored
Mention that instances can be passed on the CLI when “--help” is used.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit eb5ac108)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The vgreduce command has changed behaviour since we initially wrote the code (2.02.02 versus 2.02.66, a 4-year delta):
- if there are LVs which will be impacted, it requires --force
- without --force it refuses to proceed, but still returns exit code 0
We handle this by checking whether it printed "Wrote out consistent volume group" (behaviour unchanged), or whether it complains about "--force"; if it didn't complete, we retry the operation.
We also improve the checking of "vgs" a bit, as it used to fail silently without us detecting it.
The new tests for this function should, I believe, cover all the expected variations; at the least, we now have data files with the expected output.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 048eeb2b)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
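The output-based check can be sketched as below. `run` is a hypothetical callable returning a command's combined output, and both the `--removemissing` argument and the retry with `--force` are assumptions of this sketch: the commit only says the operation is retried when it did not complete.

```python
CONSISTENT_MARKER = "Wrote out consistent volume group"


def vgreduce_done(output):
    """True only if vgreduce reports writing a consistent volume group.

    The exit code is not trusted: newer LVM may refuse to proceed
    (asking for --force) while still exiting with status 0.
    """
    return CONSISTENT_MARKER in output


def remove_missing_pvs(run, vg_name):
    """Run vgreduce, retrying once with --force if it refused.

    `run` is a hypothetical callable: it takes an argv list and returns the
    command's combined stdout/stderr as a string.
    """
    # Assumption: the argument list is illustrative, not Ganeti's exact call.
    cmd = ["vgreduce", "--removemissing", vg_name]
    output = run(cmd)
    if vgreduce_done(output):
        return True
    if "--force" in output:
        # Assumption: the retry adds --force (not spelled out in the commit).
        output = run(cmd[:-1] + ["--force", vg_name])
        return vgreduce_done(output)
    return False
```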
-
- May 09, 2012
Iustin Pop authored
In commit 896a03f6 I cleaned up the environment for OS scripts; however, I think that was a bit too extreme: it breaks our own instance-debootstrap hooks because, for example, dpkg (called from the grub script) requires PATH to be set.
Instead of requiring every OS to define a path, let's set a default PATH for the OS scripts, which should cover most common uses. A more specialised PATH can be set, if needed, in the OS scripts.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
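A sketch of the idea, assuming a made-up default value: `DEFAULT_OS_SCRIPT_PATH` and `build_os_env` are illustrative names, and the PATH Ganeti actually uses is not stated in the commit.

```python
# Hypothetical default; the real value is not given in the commit message.
DEFAULT_OS_SCRIPT_PATH = "/sbin:/bin:/usr/sbin:/usr/bin"


def build_os_env(variables):
    """Build a clean environment for an OS script.

    Nothing is inherited from the calling daemon; PATH gets a default so
    tools such as dpkg work, and the explicitly passed variables are added.
    """
    env = {"PATH": DEFAULT_OS_SCRIPT_PATH}
    env.update(variables)
    return env
```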
-
Andrea Spadaccini authored
Move the contents of the PATH environment variable for hooks to constants, and use its value in the code and in the hooks documentation.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fe5ca2bb)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Commit e687ec01 (present in 2.5 since 2.5 beta 3) made consistency fixes across the code base. Unfortunately, this was done without enough checks on the actual meaning of one of the fixes, which means error re-raising in lib/errors.py is broken. The problem is that:
    raise cls, args
is different from:
    raise cls(args)
and our unit tests didn't catch this (this patch updates the tests).
This breakage is usually trivial, like wrong error messages:
    $ gnt-instance remove no-such-instance
    Failure: prerequisites not met for this operation: ("Instance 'no-such-instance' not known", 'unknown_entity')
versus:
    $ gnt-instance remove no-such-instance
    Failure: prerequisites not met for this operation: error type: unknown_entity, error details: Instance 'no-such-instance' not known
or:
    $ gnt-instance add … no-such-instance
    Failure: prerequisites not met for this operation: ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')
versus:
    $ gnt-instance add … no-such-instance
    Failure: prerequisites not met for this operation: error type: resolver_error, error details: The given name (no-such-instance) does not resolve: Name or service not known
But in some cases where we rely on a certain data representation (e.g. HooksAbort), this actually breaks, because we try to iterate over the wrong type:
    File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
      for node, script, out in err.args[0]:
    ValueError: need more than 1 value to unpack
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
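The difference can be reproduced in Python 3, where `raise cls, args` no longer parses. In Python 2, that form with a tuple unpacks the tuple into the constructor, so it behaves like `cls(*args)`. `HooksAbort` below is a stand-in class and `format_hooks_abort` a hypothetical helper mirroring the loop in the traceback.

```python
class HooksAbort(Exception):
    """Stand-in: args[0] is a list of (node, script, output) triples."""


def format_hooks_abort(err):
    # Mirrors the loop from the traceback in the commit message
    return ["%s: %s: %s" % (node, script, out)
            for node, script, out in err.args[0]]


failures = [("node1", "10-check", "exit 1")]
args = (failures,)  # what the re-raising helper receives

# Python 2's "raise cls, args" is equivalent to cls(*args):
good = HooksAbort(*args)
# "raise cls(args)" passes the whole tuple as a single argument:
bad = HooksAbort(args)

print(format_hooks_abort(good))  # ['node1: 10-check: exit 1']
try:
    format_hooks_abort(bad)
except ValueError as exc:
    # Iterating yields the inner list, which cannot unpack into three names
    print("broken:", exc)
```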
-
- May 07, 2012
Iustin Pop authored
Per commit 0304f0ec, newer LVM has extended the lv_attr field. However, that commit was incomplete, as we examine this attribute in another place in the code: as pointed out by user alperhome, the _LVSLINE_REGEX in lib/backend.py also needs fixing. I've used the same change as in the above commit: accept at minimum 6 characters, but allow for more.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
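A sketch of a regex that accepts an attribute field of at least 6 characters but allows more. The `name|size|attr` line format and `parse_lvs_line` are assumptions for illustration; the actual `_LVSLINE_REGEX` in lib/backend.py may look different.

```python
import re

# Hypothetical line format "name|size|attr"; the attr group needs at least
# 6 characters but also accepts the longer strings newer LVM prints.
LVS_LINE_RE = re.compile(r"^\s*([^|\s]+)\|([0-9.]+)\|([^|\s]{6,})\s*$")


def parse_lvs_line(line):
    """Split one (hypothetical) lvs output line into name, size and attr."""
    match = LVS_LINE_RE.match(line)
    if match is None:
        raise ValueError("Invalid line from lvs: %r" % line)
    name, size, attr = match.groups()
    return name, float(size), attr
```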
-
- Apr 11, 2012
Iustin Pop authored
Sorry, didn't catch this before…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 54b010ca)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
-
Dimitris Aragiorgis authored
Commit 3b3b1bca does not entirely fix the bug introduced in commit f396ad8c. It fixes the consistency of config data in permanent storage, but does not ensure the consistency of data held in masterd's runtime memory.
The duplicate-ports bug is still triggered when LUInstanceRemove() invokes _RemoveDisks() and this returns False (in case the call_blockdev_remove RPC fails). The DRBD ports get returned to the pool, but execution is aborted and RemoveInstance() is never invoked.
Since port handling is not done with the TemporaryReservationManager, ensure that ports are released only if the disk-related config data is deleted. In _RemoveDisks(), release ports only if all RPCs succeed. Extend _RemoveDisks() to take the ignore_failures argument passed by _RemoveInstance(), to handle the ports appropriately.
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
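The port-release rule can be sketched as follows; `remove_disks`, the dict-based disks and the `remove_rpc`/`release_port` callables are hypothetical stand-ins for `_RemoveDisks()` and the config calls, not Ganeti's real signatures.

```python
def remove_disks(disks, remove_rpc, release_port, ignore_failures=False):
    """Remove all disks; return True only if every removal RPC succeeded.

    Ports go back to the pool only when the disk config is certain to be
    deleted: either all RPCs succeeded, or the caller ignores failures.
    """
    all_ok = True
    for disk in disks:
        if not remove_rpc(disk):
            all_ok = False
    if all_ok or ignore_failures:
        for disk in disks:
            if disk.get("port") is not None:
                release_port(disk["port"])
    return all_ok
```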
-
Dimitris Aragiorgis authored
Commit f396ad8c returns the TCP port used by a DRBD disk to the TCP/UDP port pool using AddTcpUdpPort(). However, AddTcpUdpPort() writes the config on every invocation, using _WriteConfig(). This causes two problems:
* it causes critical errors to be logged by VerifyConfig() after the DRBD disk removal, until the actual instance removal.
* if the code following AddTcpUdpPort() fails, the port has already been returned to the pool, which causes the port to have duplicates (inconsistent config).
AddTcpUdpPort() is invoked in three cases:
* during InstanceRemove() through _RemoveDisks().
* during InstanceSetParams() in case of disk removal.
* during InstanceSetParams() through _ConvertDrbdToPlain().
This commit fixes the problem by removing the _WriteConfig() call from AddTcpUdpPort(), delegating it to Update() via the TemporaryReservationManager, and ensuring that AddTcpUdpPort() precedes Update().
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
[iustin@google.com: small comments adjustments]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 3b3b1bca)
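A toy model of the fix, assuming a config object with a single write point: `PortPoolConfig` and `remove_drbd_disk` are illustrative names, not Ganeti's ConfigWriter API.

```python
class PortPoolConfig:
    """Toy config: port changes stay in memory until update() persists them."""

    def __init__(self):
        self.free_ports = set()                   # in-memory (runtime) view
        self.persisted_free_ports = frozenset()   # last "written to disk" state

    def add_tcp_udp_port(self, port):
        # Only the in-memory pool changes; no config write happens here.
        self.free_ports.add(port)

    def update(self):
        # Single write point, standing in for _WriteConfig().
        self.persisted_free_ports = frozenset(self.free_ports)


def remove_drbd_disk(cfg, port, finish_removal):
    """Return the port, run the remaining steps, and only then persist."""
    cfg.add_tcp_udp_port(port)
    finish_removal()   # if this raises, nothing has been written yet
    cfg.update()
```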
-
- Mar 30, 2012
Iustin Pop authored
Sorry, didn't catch this before…
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Mar 29, 2012
Dimitris Aragiorgis authored
Commit f396ad8c returns the TCP port used by a DRBD disk to the TCP/UDP port pool using AddTcpUdpPort(). However, AddTcpUdpPort() writes the config on every invocation, using _WriteConfig(). This causes two problems:
* it causes critical errors to be logged by VerifyConfig() after the DRBD disk removal, until the actual instance removal.
* if the code following AddTcpUdpPort() fails, the port has already been returned to the pool, which causes the port to have duplicates (inconsistent config).
AddTcpUdpPort() is invoked in three cases:
* during InstanceRemove() through _RemoveDisks().
* during InstanceSetParams() in case of disk removal.
* during InstanceSetParams() through _ConvertDrbdToPlain().
This commit fixes the problem by removing the _WriteConfig() call from AddTcpUdpPort(), delegating it to Update() via the TemporaryReservationManager, and ensuring that AddTcpUdpPort() precedes Update().
Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
[iustin@google.com: small comments adjustments]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Mar 28, 2012
Bernardo Dal Seno authored
Fixed a typo so that LUOobCommand now acquires the BGL in shared mode, as intended.
Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Mar 23, 2012
René Nussbaumer authored
There are other ways than just the version check to leave the cluster in a broken state. However, they are not trivial to fix in 2.5, so we leave them to 2.6 for a nicer fix.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit e2ea8de1)
-
Iustin Pop authored
LVM version 2.02.93 (or at least some version after .88) has extended the lv_attr field with two more flags; we only care about the first digit, so let's change the "!= 6" check to "< 6". Thanks to Robin H Johnson <robbat2@gentoo.org> for finding this issue.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Mar 22, 2012
Michael Hanselmann authored
This reverts commit 0fa753ba. It turns out there are more queries acquiring locks than we'd like. This patch goes to version 2.6, and a separate patch fixes the immediate issues in LUClusterVerifyConfig.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
-
Michael Hanselmann authored
Instead of acquiring the BGL in exclusive mode (which blocks all other operations), we acquire all locks for groups, nodes and instances in shared mode before verifying the configuration.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
-
- Mar 21, 2012
Guido Trotter authored
This fixes issue 222.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Mar 20, 2012
Michael Hanselmann authored
Short description:
This fixes an issue whereby masterd would become unresponsive on the LUXI socket, leading to client timeouts. While made worse in 2.5, the underlying issue was already present in 2.4.
Longer description:
Until now, all LUXI queries would acquire the BGL (big Ganeti lock) in shared mode. With the exception of OpNodeAdd and OpNodeRemove, this was also the case for all opcodes before version 2.5. In 2.5 we split OpClusterVerify into multiple opcodes, one of which (OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether or not doing so is good is a separate discussion: OpNodeAdd and OpNodeRemove, as of this writing, still require an exclusive BGL. OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove in normal clusters, which is why we only noticed this issue in 2.5.
What would happen is that once OpClusterVerifyConfig tried to acquire its exclusive BGL while it was held in shared mode by other opcodes (e.g. OpInstanceReplaceDisks), the locking code would not grant further shared acquires of the BGL, even when the exclusive acquire is removed from the queue for a short amount of time after a timeout. This is necessary to prevent lock starvation.
In this situation, further LUXI queries requiring the BGL in shared mode, e.g. OpClusterQuery, would block, and the client would eventually time out. Over time, these queries fill the queue of the client request workerpool, and at that point even requests not requiring the BGL stop working.
Once the long-running operation(s) holding the BGL in shared mode finish, OpClusterVerifyConfig gets it in exclusive mode and everything returns to normal. LUXI recovers very soon too.
I'd like to thank Bernardo Dal Seno for his contribution to this bugfix.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
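The starvation-prevention rule behind the pile-up can be sketched with a toy lock. `SharedLockSketch` is hypothetical and non-blocking (it returns False where a real lock would make the caller wait); Ganeti's actual lock implementation is considerably more involved.

```python
import threading


class SharedLockSketch:
    """Toy shared/exclusive lock that favours a waiting exclusive request.

    Non-blocking for easy demonstration: methods return False where a real
    lock would make the caller wait.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._shared_holders = 0
        self._exclusive_held = False
        self._exclusive_waiting = False

    def try_acquire_shared(self):
        with self._mutex:
            # Refuse new shared owners while a writer holds or awaits the
            # lock; otherwise a stream of readers could starve it forever.
            if self._exclusive_held or self._exclusive_waiting:
                return False
            self._shared_holders += 1
            return True

    def release_shared(self):
        with self._mutex:
            self._shared_holders -= 1

    def try_acquire_exclusive(self):
        with self._mutex:
            if self._shared_holders == 0 and not self._exclusive_held:
                self._exclusive_held = True
                self._exclusive_waiting = False
                return True
            self._exclusive_waiting = True   # queue the writer
            return False

    def release_exclusive(self):
        with self._mutex:
            self._exclusive_held = False
```

In the scenario from the commit message, a long-running opcode holds the lock in shared mode, the exclusive request is queued behind it, and every later shared request (such as a query) is refused until the long-running holder releases.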
-
- Mar 19, 2012
Iustin Pop authored
If a specific list of groups has been requested, the code used that list without first transforming it into a (frozen)set, which results in:
    unsupported operand type(s) for &: 'list' and 'frozenset'
The trivial fix is to do that in the 'then' branch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
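A minimal sketch of the fix, with `select_groups` as a hypothetical helper: converting the requested list to a frozenset before using `&` avoids the TypeError quoted above.

```python
def select_groups(all_groups, requested=None):
    """Return the requested groups that exist, as a frozenset."""
    known = frozenset(all_groups)
    if requested is None:
        wanted = known
    else:
        # The fix: convert the caller's list before using set operators;
        # list & frozenset raises TypeError.
        wanted = frozenset(requested)
    return wanted & known
```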
-
- Jan 31, 2012
Michael Hanselmann authored
Just using ht.TListOf as a type check doesn't work correctly: the function must be called with the expected item type. In this specific case, TListOf was always called with the filter as its argument, and the result of that call evaluated to true. Since filters can be quite complex, there's no check for them yet, and therefore just “TList” is used.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
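Simplified stand-ins for the `ht` combinators illustrate the mistake; these are not the real `ganeti.ht` implementations. `TListOf` builds a check, so calling it with a value instead of an item check just returns a function object, which is always truthy.

```python
def TList(val):
    """Check that the value is a list."""
    return isinstance(val, list)


def TString(val):
    """Check that the value is a string."""
    return isinstance(val, str)


def TListOf(item_check):
    """Build a check accepting lists whose items all pass item_check."""
    def check(val):
        return TList(val) and all(item_check(item) for item in val)
    return check


# Correct use: build the check once, then apply it to values.
check_names = TListOf(TString)

# Misuse: passing the value itself returns a function, which is truthy,
# so such a "check" never rejects anything.
misused = TListOf(["=", "name", "web1"])
```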
-
- Jan 26, 2012
Iustin Pop authored
Furthermore, correct the --help display on evacuate.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Jan 25, 2012
Michael Hanselmann authored
This patch attempts to fix a number of issues with “gnt-cluster verify” in the presence of multiple node groups and DRBD8 instances split over nodes in more than one group:
- Look up instances in a group only by their primary node (otherwise split instances would be considered when verifying any of their nodes' groups)
- When gathering additional nodes for LV checks, just compare the groups of the instance's nodes with the currently verified group, instead of comparing against the primary node's group
- Exclude nodes in other groups when calculating N+1 errors and checking logical volumes
Not directly related, but a small error text is also clarified.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
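The first point, group membership decided by the primary node only, can be sketched as follows; `instances_for_group` and the `(primary, secondaries)` tuple layout are assumptions for illustration, not Ganeti's data structures.

```python
def instances_for_group(instances, node_to_group, group_name):
    """Names of instances verified as part of group_name.

    instances maps name -> (primary_node, [secondary_nodes]). Only the
    primary node decides membership, so a DRBD instance split across two
    groups is verified with its primary node's group only.
    """
    return sorted(name for name, (primary, _secondaries) in instances.items()
                  if node_to_group[primary] == group_name)
```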
-
- Jan 20, 2012
Guido Trotter authored
Cleanup just updates the config with the correct location of the instance, or reports that it is down, but never starts it. As such, there's no point in checking for enough free memory. In fact, this check could prevent a perfectly safe cleanup operation if a node is busy.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Jan 06, 2012
Guido Trotter authored
This of course was working for all the rcs, but broke with 1.0 itself. In addition:
- split between running kvm --version and parsing its output
- unittest parsing for various known --help outputs
- updated NEWS file
- happy 2012 wishes
- the hope to finish this patch before it's time to say happy easter :)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
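Separating parsing from running the binary makes the parser testable on canned strings. A sketch, assuming a banner that contains `version X.Y` or `version X.Y.Z`; the regex, `parse_kvm_version` and the sample strings are illustrative, not Ganeti's actual pattern. The micro component is optional, which is what a bare `1.0` needs.

```python
import re

# Illustrative pattern: the micro component is optional, so "1.0" parses
# as well as "0.15.92".
_VERSION_RE = re.compile(r"\bversion (\d+)\.(\d+)(?:\.(\d+))?")


def parse_kvm_version(text):
    """Extract (major, minor, micro) from a version banner; micro defaults to 0."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("Cannot parse version from %r" % text)
    major, minor, micro = match.groups()
    return int(major), int(minor), int(micro or 0)
```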
-
- Dec 21, 2011
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When an opcode is about to be processed, its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. By its nature, that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor).
When a job has been processed, any pending dependencies are re-added to the job workerpool. Before this patch, that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is the reverse order, it can lead to deadlocks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
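One way to avoid the inversion is to gather the dependent jobs under the manager lock, release it, and only then take the queue lock. A sketch with hypothetical names (`DependencyManagerSketch`, `enqueue`); it is not necessarily how the patch restructured `_JobDependencyManager`, and a plain `threading.Lock` stands in for the shared job queue lock.

```python
import threading


class DependencyManagerSketch:
    """Toy dependency tracker that never nests the queue lock inside its own.

    queue_lock and enqueue are hypothetical stand-ins for the job queue lock
    and for adding jobs to the workerpool.
    """

    def __init__(self, queue_lock, enqueue):
        self._lock = threading.Lock()
        self._waiters = {}            # finished-job id -> dependent job ids
        self._queue_lock = queue_lock
        self._enqueue = enqueue

    def register(self, job_id, depends_on):
        with self._lock:
            self._waiters.setdefault(depends_on, set()).add(job_id)

    def notify_finished(self, job_id):
        # Collect dependents while holding only the manager lock ...
        with self._lock:
            dependents = self._waiters.pop(job_id, set())
        # ... and take the queue lock only after releasing it, so this path
        # never acquires "manager lock, then queue lock".
        if dependents:
            with self._queue_lock:
                self._enqueue(sorted(dependents))
```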
-
- Nov 30, 2011
Nikos Skalkotos authored
Fix a bug affecting command line options of "keyval" type. Although escaping commas with \ is supported, it is not applied to the input recursively.
Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
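A sketch of splitting a key=value option string on unescaped commas, assuming `\` escapes only the separator and itself; `unescape_and_split` and `parse_keyval` are hypothetical helpers, and how Ganeti's parser recurses into nested values is not shown here.

```python
def unescape_and_split(text, sep=","):
    """Split text on sep, where a backslash escapes sep or another backslash."""
    fields, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                current.append(ch)          # trailing backslash: keep it
            elif nxt in (sep, "\\"):
                current.append(nxt)         # escaped separator or backslash
            else:
                current.append(ch + nxt)    # not an escape: keep both chars
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_keyval(text):
    """Turn "a=1,b=x\\,y" into {"a": "1", "b": "x,y"}."""
    result = {}
    for field in unescape_and_split(text):
        key, _, value = field.partition("=")
        result[key] = value
    return result
```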
-
- Nov 24, 2011
Michael Hanselmann authored
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1)
-
Michael Hanselmann authored
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes”, the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk.
Ganeti 2.5 makes use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):
    $ gnt-group add foo
    $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
    $ gnt-cluster verify # Fails with KeyError
This patch moves the code modifying node and group objects into “config.ConfigWriter”, to do the complete operation under the config lock and also to avoid relying on side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)
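A sketch of keeping both views in sync under one lock; `ConfigSketch`, `Node` and `Group` are hypothetical stand-ins for `config.ConfigWriter` and Ganeti's objects, and validating every entry before mutating anything is a design choice of this sketch.

```python
import threading


class Node:
    def __init__(self, name, group):
        self.name = name
        self.group = group


class Group:
    def __init__(self, name):
        self.name = name
        self.members = set()   # in-memory duplicate of each node's .group


class ConfigSketch:
    """Hypothetical stand-in for config.ConfigWriter.

    Node and group objects are only modified here, under one lock, so
    node.group and group.members cannot disagree.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.nodes = {}
        self.groups = {}

    def add_group(self, name):
        with self._lock:
            self.groups[name] = Group(name)

    def add_node(self, name, group):
        with self._lock:
            self.nodes[name] = Node(name, group)
            self.groups[group].members.add(name)

    def assign_nodes(self, mods):
        """Apply (node_name, new_group) pairs to both views at once."""
        with self._lock:
            # Validate everything first so a bad entry changes nothing.
            for node_name, new_group in mods:
                if node_name not in self.nodes or new_group not in self.groups:
                    raise KeyError((node_name, new_group))
            for node_name, new_group in mods:
                node = self.nodes[node_name]
                self.groups[node.group].members.discard(node_name)
                self.groups[new_group].members.add(node_name)
                node.group = new_group
```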
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.
If a node was assigned to a new group using “gnt-group assign-nodes”, the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk.
Ganeti 2.5 makes use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):
    $ gnt-group add foo
    $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
    $ gnt-cluster verify # Fails with KeyError
This patch moves the code modifying node and group objects into “config.ConfigWriter”, to do the complete operation under the config lock and also to avoid relying on side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit c50452c3 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 23, 2011
Michael Hanselmann authored
There is a design issue in the iallocator interface which prevents us from doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
When evacuating a node, only an assertion without informative text was used to check whether the necessary node locks had been acquired. On top of that, the list of nodes was evaluated without holding a node group lock, so this was changed as well. Some exception messages are also updated to include “retry the operation”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
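A sketch of the fix, with `instances_on_node` as a hypothetical helper and plain dicts standing in for Ganeti's instance objects: iterating a dict yields only its keys, so the (name, info) pairs must come from `.items()`.

```python
def instances_on_node(all_instances, node_name):
    """Return names of instances whose primary node is node_name.

    all_instances maps instance name -> info, like the dictionary returned
    by ConfigWriter.GetAllInstancesInfo (info simplified to a dict here).
    """
    # Iterating the dict directly would yield key strings only; unpacking
    # "inst1" into two names raises "too many values to unpack".
    return sorted(name for name, info in all_instances.items()
                  if info["primary_node"] == node_name)


instances = {"inst1": {"primary_node": "node1"},
             "inst2": {"primary_node": "node2"}}
print(instances_on_node(instances, "node1"))  # ['inst1']
```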
-