Commits · ce44589764265c9195ebadbb7c4bd844cda72dbe · itminedu / snf-ganeti

Apr 12, 2012

Merge branch 'stable-2.5' into devel-2.5 · ce445897

Michael Hanselmann authored 13 years ago


* stable-2.5:
  Bump version for 2.5.0 final release
  configure.ac: Fix “too many arguments” error
  Fix extra whitespace
  Further fixes concerning drbd port release
  Fix a bug concerning TCP port release
  Fix extra whitespace
  Fix a bug concerning TCP port release

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Apr 11, 2012

Bump version for 2.5.0 final release · c434401a

Michael Hanselmann authored 13 years ago


Also update NEWS file.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.4' into stable-2.5 · 6cd4e775

Michael Hanselmann authored 13 years ago


* devel-2.4:
  Fix extra whitespace
  Further fixes concerning drbd port release
  Fix a bug concerning TCP port release

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

configure.ac: Fix “too many arguments” error · e2e8af73

Michael Hanselmann authored 13 years ago


If GHC_PKG_QUICKCHECK contains multiple values, the test would fail
with “too many arguments”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix extra whitespace · 612f7fd4

Iustin Pop authored 13 years ago


Sorry, didn't catch this before…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 54b010ca)

Signed-off-by: Michael Hanselmann <hansmi@google.com>

Further fixes concerning drbd port release · 42f25b0b

Dimitris Aragiorgis authored 13 years ago


Commit 3b3b1bca does not entirely fix the bug introduced in commit
f396ad8c. It fixes consistency of config data in permanent storage, but
does not ensure consistency in data held in runtime memory of masterd.

The bug of duplicate ports is still triggered when LUInstanceRemove()
invokes _RemoveDisks() and it returns False (in case the
call_blockdev_remove RPC fails). The DRBD ports are returned to the
pool, but execution is aborted and RemoveInstance() is never invoked.

Since port handling is not done via the TemporaryReservationManager,
ensure that ports are released only if the disk-related config data is
deleted.

In _RemoveDisks(), release ports only if all RPCs succeed.

Extend _RemoveDisks() to accept the ignore_failures argument passed by
_RemoveInstance(), so that the ports are handled appropriately.
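As a minimal Python sketch of this rule (not Ganeti's actual _RemoveDisks()), assume each disk is a dict with a "name" and an optional "drbd_port", and that a hypothetical remove_rpc callback stands in for the call_blockdev_remove RPC:

```python
from typing import Callable, List, Set


def remove_disks(disks: List[dict],
                 remove_rpc: Callable[[str], bool],
                 free_ports: Set[int],
                 ignore_failures: bool = False) -> bool:
    """Remove all disks; return DRBD ports to the pool only when safe.

    Hypothetical helper. Returns True if every removal RPC succeeded.
    """
    all_ok = True
    for disk in disks:
        if not remove_rpc(disk["name"]):
            all_ok = False

    # Release ports only if the disk data is really going to be deleted:
    # either every RPC succeeded, or the caller (instance removal) will
    # continue despite failures because ignore_failures is set.
    if all_ok or ignore_failures:
        for disk in disks:
            port = disk.get("drbd_port")
            if port is not None:
                free_ports.add(port)
    return all_ok
```

If any RPC fails and failures are not ignored, the pool is left untouched, so the ports stay owned by the (still existing) instance.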

Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix a bug concerning TCP port release · 2522b7c4

Dimitris Aragiorgis authored 13 years ago


Commit f396ad8c returns the TCP port used by DRBD disk back to the
TCP/UDP port pool using AddTcpUdpPort().

However, AddTcpUdpPort() writes the config on every invocation,
using _WriteConfig(). This causes two problems:

 * VerifyConfig() logs critical errors from the moment the DRBD disk is
   removed until the actual instance removal.
 * if the code following AddTcpUdpPort() fails, the port has already
   been returned to the pool, so the port ends up duplicated
   (inconsistent config).

AddTcpUdpPort() is invoked in three cases:

 * during InstanceRemove() through _RemoveDisks().
 * during InstanceSetParams() in case of disk removal.
 * during InstanceSetParams() through _ConvertDrbdToPlain().

This commit fixes the problem by removing the _WriteConfig() call from
AddTcpUdpPort(), delegating the write to Update() via the
TemporaryReservationManager, and ensuring that AddTcpUdpPort() precedes
Update().
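The write ordering can be sketched as follows. This is a hypothetical, heavily simplified class, not Ganeti's config writer: add_tcpudp_port() mirrors AddTcpUdpPort(), update() mirrors Update(), and the TemporaryReservationManager is not modelled at all.

```python
class SketchConfigWriter:
    """Hypothetical stand-in showing where the config write happens."""

    def __init__(self):
        self.tcpudp_port_pool = set()
        self.writes = 0  # number of simulated config writes

    def _write_config(self):
        # A real implementation would serialize the config to disk here.
        self.writes += 1

    def add_tcpudp_port(self, port):
        # Only the in-memory pool changes; nothing is written yet, so a
        # later failure does not leave a half-applied config on disk.
        self.tcpudp_port_pool.add(port)

    def update(self, target):
        # The caller commits its modified object here; this is the single
        # point where the configuration is written out.
        self._write_config()
```

Because the pool change is only persisted by update(), a caller would invoke add_tcpudp_port() before update() for the same modification, so the freed port and the changed disk data reach the config file together.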

Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
[iustin@google.com: small comment adjustments]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 3b3b1bca)

Mar 30, 2012

Fix extra whitespace · 54b010ca

Iustin Pop authored 13 years ago


Sorry, didn't catch this before…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Mar 29, 2012

Fix a bug concerning TCP port release · 3b3b1bca

Dimitris Aragiorgis authored 13 years ago


Commit f396ad8c returns the TCP port used by DRBD disk back to the
TCP/UDP port pool using AddTcpUdpPort().

However, AddTcpUdpPort() writes the config on every invocation,
using _WriteConfig(). This causes two problems:

 * VerifyConfig() logs critical errors from the moment the DRBD disk is
   removed until the actual instance removal.
 * if the code following AddTcpUdpPort() fails, the port has already
   been returned to the pool, so the port ends up duplicated
   (inconsistent config).

AddTcpUdpPort() is invoked in three cases:

 * during InstanceRemove() through _RemoveDisks().
 * during InstanceSetParams() in case of disk removal.
 * during InstanceSetParams() through _ConvertDrbdToPlain().

This commit fixes the problem by removing the _WriteConfig() call from
AddTcpUdpPort(), delegating the write to Update() via the
TemporaryReservationManager, and ensuring that AddTcpUdpPort() precedes
Update().

Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
[iustin@google.com: small comment adjustments]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Mar 28, 2012

ganeti.initd: Add “status” action · 8e2ed2e8

Michael Hanselmann authored 13 years ago


Eric Rostetter sent a patch adding a “status” action, but unfortunately
his code was apparently specific to Red Hat. I hope this implementation
is more distribution-agnostic; after all “status_of_proc” is part of
LSB. Example output:

$ /etc/init.d/ganeti status
ganeti-noded is not running ... failed!
ganeti-masterd is running.
ganeti-rapi is running.
ganeti-confd is running.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add whitelist for opcodes using BGL · c9c33a28

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'stable-2.5' into devel-2.5 · 1501cd6b

Michael Hanselmann authored 13 years ago


* stable-2.5:
  LUOobCommand: acquire BGL in shared mode

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

LUOobCommand: acquire BGL in shared mode · 6977943c

Bernardo Dal Seno authored 13 years ago


Fixed a typo so that LUOobCommand now acquires the BGL in shared mode, as
intended.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Mar 23, 2012

Fix docstring bug · 566db1f2

Iustin Pop authored 13 years ago


Fix a typo introduced in commit c85b15c1, which breaks epydoc.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Merge branch 'stable-2.5' into devel-2.5 · 744dd57c

Guido Trotter authored 13 years ago


* stable-2.5:
  LUNodeAdd: Verify version in Prereq
  Fix LV status parsing to accept newer LVM
  Bump version for 2.5.0~rc6 release
  Revert "Stop acquiring BGL for LUXI queries"
  LUClusterVerifyConfig: Share BGL, acquire all locks in shared mode
  KVM: don't add -nographic using spice
  Stop acquiring BGL for LUXI queries
  Fix type error in LUInstanceChangeGroup

Conflicts:
	lib/hypervisor/hv_kvm.py
	  - trivial, keep both changes

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

LUNodeAdd: Verify version in Prereq · 2d453213

René Nussbaumer authored 13 years ago


There are other ways to leave the cluster in a broken state besides the
version check. However, they are not trivial to fix in 2.5, so a nicer
fix is left for 2.6.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit e2ea8de1)

LUNodeAdd: Verify version in Prereq · e2ea8de1

René Nussbaumer authored 13 years ago


There are other ways to leave the cluster in a broken state besides the
version check. However, they are not trivial to fix in 2.5, so a nicer
fix is left for 2.6.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix LV status parsing to accept newer LVM · 0304f0ec

Iustin Pop authored 13 years ago


LVM version 2.02.93 (or at least, some version after .88) has extended the
lv_attr field with two more flags; we only care about the first six
characters, so let's change the "!= 6" check to "< 6".
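The relaxed length check can be sketched as a small parser. The function name parse_lv_attr and the returned keys are hypothetical; the character positions assume LVM's documented lv_attr layout (volume type, permissions, allocation policy, fixed minor, state, device open).

```python
def parse_lv_attr(attr: str) -> dict:
    """Interpret the first six characters of an lv_attr string.

    Older LVM prints exactly six characters (e.g. "-wi-ao"); newer
    releases append more flags, which are simply ignored here.
    """
    # "< 6" instead of "!= 6": extra trailing flags are accepted.
    if len(attr) < 6:
        raise ValueError("lv_attr too short: %r" % attr)
    return {
        "virtual": attr[0] == "v",   # volume type
        "writable": attr[1] == "w",  # permissions
        "active": attr[4] == "a",    # state
        "open": attr[5] == "o",      # device open
    }
```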

Thanks to Robin H Johnson <robbat2@gentoo.org> for finding this issue.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Mar 22, 2012

gnt-instance info: Show node group information · 080fbeea

Michael Hanselmann authored 13 years ago


This requires acquiring the node group locks in shared mode.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cmdlib: Factorize checking acquired node group locks · c85b15c1

Michael Hanselmann authored 13 years ago


The “cur_group_uuid” parameter is optional to prepare for using the
factorized code from LUInstanceQueryData.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Bump version for 2.5.0~rc6 release · 18e2d065

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cmdlib: Stop forking in LUClusterQuery · a20e4768

Michael Hanselmann authored 13 years ago


While debugging another issue we realized that LUClusterQuery forks.
This turned out to be the “platform.architecture” function from the
Python library. It uses the “file” command to determine the architecture
of the Python binary.

This patch adds two new functions to the “runtime” module to get this
information once per process instead of doing it every single time
LUClusterQuery is used. Forking is a no-go in a multi-threaded
environment anyway.
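A minimal sketch of the once-per-process idea, assuming hypothetical function names (init_arch_info, get_arch_info) rather than the ones actually added to the runtime module. Since platform.architecture() may run the external "file" command, it is called once, ideally before any threads start:

```python
import platform

_arch_info = None  # cached (bits, machine) tuple


def init_arch_info():
    """Compute architecture details once, e.g. at daemon startup."""
    global _arch_info
    # platform.architecture() may fork to run "file" on the interpreter
    # binary, so do this before worker threads exist.
    _arch_info = (platform.architecture()[0], platform.machine())


def get_arch_info():
    """Return the cached tuple without forking."""
    if _arch_info is None:
        raise RuntimeError("init_arch_info() has not been called")
    return _arch_info
```

Splitting initialization from lookup keeps the potentially forking call out of the request path; every later query only reads the cached tuple.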

A future change will also have to change the terminology in “gnt-cluster
info”: it reports the binary architecture simply as “architecture”, when
it's actually the binaries' architecture. Kernel and userland can be
different.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

locking: Notify only once on release · 70567db0

Michael Hanselmann authored 13 years ago


Don't notify for every released lock in shared mode. The last one is
enough.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

locking: Handle spurious notifications on lock acquire · 8d7d8b57

Michael Hanselmann authored 13 years ago


This was already a TODO since the implementation of lock priorities in
September 2010. Under certain conditions a waiting acquire can be
notified at a time when it can't actually get the lock. In this case it
would try and fail to acquire the lock and then return to the caller
before the timeout ends.

While this is not bad (nothing breaks), it isn't nice either. A separate
patch will prevent unnecessary notifications when shared locks are
released.
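The general pattern for tolerating such early or spurious wakeups is to re-check the condition after every wakeup and keep waiting for the remaining time. The sketch below uses Python's threading.Condition and a hypothetical wait_until() helper; it is not Ganeti's SharedLock code. Condition.wait_for() implements essentially the same loop.

```python
import threading
import time


def wait_until(cond, predicate, timeout):
    """Wait on cond until predicate() is true or timeout seconds elapse.

    Must be called with cond held. A notification does not guarantee that
    predicate() is now true, so after each wakeup we re-check it and, if
    still false, wait for the remaining time instead of returning early.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        cond.wait(remaining)
    return True
```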

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

locking: Fix lock deletion with timeout · 26082b7e

Michael Hanselmann authored 13 years ago


While working on another SharedLock fix I realized timeouts on lock
deletion don't work very well if the timeout actually expires. This
patch fixes the issue and adds a new unittest.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

Move _TimeoutExpired to utils · 6b27f535

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

(cherry picked from commit f8326fca)

Revert "Stop acquiring BGL for LUXI queries" · 6fe4baf0

Michael Hanselmann authored 13 years ago


This reverts commit 0fa753ba.

Turns out there are more queries acquiring locks than we'd like. This
patch goes to version 2.6 and a separate patch fixes the immediate
issues in LUClusterVerifyConfig.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

LUClusterVerifyConfig: Share BGL, acquire all locks in shared mode · a5485ffc

Michael Hanselmann authored 13 years ago


Instead of acquiring the BGL in exclusive mode (which blocks all other
operations), we acquire all locks for groups, nodes and instances in
shared mode before verifying the configuration.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

Mar 21, 2012

KVM: don't add -nographic using spice · 596b2459

Guido Trotter authored 13 years ago


This fixes issue 222.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Mar 20, 2012

Stop acquiring BGL for LUXI queries · 0fa753ba

Michael Hanselmann authored 13 years ago


Short description: This fixes an issue whereby masterd would become
unresponsive on the LUXI socket, leading to client timeouts. While made
worse in 2.5, the underlying issue was already present in 2.4.

Longer description: Until now all LUXI queries would acquire the BGL
(big Ganeti lock) in shared mode. With the exception of OpNodeAdd and
OpNodeRemove, this was also the case for all opcodes before version 2.5.
In 2.5 we split OpClusterVerify into multiple opcodes, one of which
(OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether
or not doing so is good is a separate discussion: OpNodeAdd and
OpNodeRemove, as of this writing, still require an exclusive BGL.
OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove
in normal clusters, which is why we only recognized this issue in 2.5.

What would happen is that once OpClusterVerifyConfig tried to acquire
its exclusive BGL while it was actually held by other opcodes (e.g.
OpInstanceReplaceDisks), the locking code would not grant shared
acquires for the BGL, even when the exclusive acquire is removed from
the queue for a short amount of time after a timeout. This is necessary
to prevent lock starvation.
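The starvation-prevention rule (no new shared holders while an exclusive acquire is queued) can be illustrated with a minimal writer-preferring lock. This is a generic sketch, not Ganeti's SharedLock: it has no priorities and does not model the exclusive waiter being briefly dequeued after a timeout.

```python
import threading


class WriterPreferringLock:
    """Minimal shared/exclusive lock that favours queued exclusive waiters."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    def acquire_shared(self, timeout=None):
        with self._cond:
            # Block new shared holders while an exclusive acquire waits;
            # otherwise a steady stream of readers could starve it.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._waiting_writers == 0,
                timeout)
            if ok:
                self._readers += 1
            return ok

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # only the last reader notifies

    def acquire_exclusive(self, timeout=None):
        with self._cond:
            self._waiting_writers += 1
            try:
                ok = self._cond.wait_for(
                    lambda: not self._writer and self._readers == 0,
                    timeout)
                if ok:
                    self._writer = True
                return ok
            finally:
                self._waiting_writers -= 1
                # If we gave up, blocked readers may now proceed.
                self._cond.notify_all()

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

While a long-running shared holder keeps the exclusive waiter queued, every further acquire_shared() call blocks, which is the behaviour that made the LUXI queries pile up.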

In this situation, further LUXI queries requiring the BGL in shared mode,
e.g. OpClusterQuery, would block and the client would eventually time out.
Over time they fill the client request workerpool's queue, at which
point even requests not requiring the BGL stop working. Once the
long-running operation(s) holding the BGL in shared mode finish,
OpClusterVerifyConfig gets it in exclusive mode and everything returns
to normal. LUXI recovers very soon too.

I'd like to thank Bernardo Dal Seno for his contribution to this bugfix.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

Mar 19, 2012

EPO: Pass the no_remember parameter to preserve state · 3e0ed18c

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix type error in LUInstanceChangeGroup · 666e013f

Iustin Pop authored 13 years ago


If a specific list of groups has been requested, then the code used
that, without transforming it to a (frozen)set first, which results
in:

 unsupported operand type(s) for &: 'list' and 'frozenset'

The trivial fix is to do that conversion in the 'then' branch.
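The error and the fix can be reproduced in isolation. groups_to_check() is a hypothetical helper, not the code in LUInstanceChangeGroup:

```python
def groups_to_check(requested, all_groups):
    """Intersect the requested node groups with the known ones."""
    known = frozenset(all_groups)
    if requested:
        # The fix: convert the list first; list & frozenset is a TypeError.
        wanted = frozenset(requested)
    else:
        wanted = known
    return wanted & known
```

Without the frozenset() call on `requested`, the intersection would evaluate a list & frozenset expression and raise the TypeError quoted above.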

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Feb 23, 2012

Merge branch 'stable-2.5' into devel-2.5 · a0a63e76

Michael Hanselmann authored 13 years ago


* stable-2.5:
  Fix Makefile.am compatibility with automake 1.11.2

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 20, 2012

Fix Makefile.am compatibility with automake 1.11.2 · b8fe7ca6

Iustin Pop authored 13 years ago


Automake 1.11.2 made the following change:

* Long-standing bugs:
  - Automake now warns about more primary/directory invalid combinations,
    such as "doc_LIBRARIES" or "pkglib_PROGRAMS".

Unfortunately, this breaks our Makefile.am (issue 216) exactly because
we were relying on pkglib_SCRIPTS.

This patch works around this by adding a new myexeclibdir variable
(with "exec" in its name so that it is installed at `install-exec` time,
the same as pkglibdir), and switches to that.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Feb 15, 2012

Reconcile Makefile.am and test data files · 1a1e7ab3

Iustin Pop authored 13 years ago


Sorry, forgot this in previous commit.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Workaround changed LVM behaviour · 048eeb2b

Iustin Pop authored 13 years ago


The vgreduce command has changed behaviour from when we initially
wrote the code (2.02.02 versus 2.02.66, 4 years delta):

- if there are LVs which will be impacted, it requires --force
- otherwise refuses to proceed, but it still returns exit code 0

We handle this by looking to see if it returns "Wrote out consistent
volume group" (behaviour unchanged), or if it complains about
"--force"; in the case it didn't complete, we retry the operation.
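The output check can be sketched as follows. The function name and the decision to raise on unrecognised output are assumptions; only the two marker strings come from the description above. Since the exit code cannot be trusted, the decision is based on the text alone:

```python
def vgreduce_needs_retry(output: str) -> bool:
    """Decide from vgreduce's output whether the operation must be retried.

    Returns False if LVM reports a successful write, True if it refused
    because --force would be needed; raises on anything else.
    """
    if "Wrote out consistent volume group" in output:
        return False
    if "--force" in output:
        return True
    raise RuntimeError("unexpected vgreduce output: %r" % output)
```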

We also improve the checking of "vgs" a bit, as it used to fail silently
without us detecting it.

The new tests for this function should cover, I believe, all the expected
variations; at the very least, we now have data files with the expected
output.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Feb 07, 2012

Accept both PUT and POST in noded · 5d0566de

Iustin Pop authored 13 years ago


This is a partial cherry-pick from
7530364d on master:

Currently, noded requires PUT, even though the semantics of the RPC
calls do not match a PUT. We change the code to accept both PUT and POST,
with the intention of removing PUT support in a later version.

Additionally, we add a message to the HttpBadRequest exception to make
clear the failure mode (not seeing any error message was what made me
send this patch…). This was the only description-less use of this
exception, by the way.
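A minimal sketch of the method check, assuming a local HttpBadRequest stand-in and a hypothetical check_rpc_method() helper rather than noded's actual request handler:

```python
ALLOWED_RPC_METHODS = frozenset(["PUT", "POST"])


class HttpBadRequest(Exception):
    """Local stand-in for an HTTP 400 exception (hypothetical here)."""


def check_rpc_method(method: str) -> None:
    """Accept both PUT and POST for RPC calls; reject anything else."""
    if method not in ALLOWED_RPC_METHODS:
        # Always include a message so the failure mode is visible.
        raise HttpBadRequest("Only PUT and POST are supported for RPC"
                             " calls, got %s" % method)
```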

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 7530364d)

What was not cherry-picked is the RPC change (to switch to POST). The
reason I want to backport this to devel-2.5 is that, when upgrading to
2.6, having noded accept both methods makes for an easier upgrade path.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Feb 01, 2012

Merge branch 'stable-2.5' into devel-2.5 · dd9b9d7b

Michael Hanselmann authored 13 years ago


* stable-2.5:
  Fix type check for OpQuery.filter
  Fix explanation of gnt-node evacuate --primaries-only
  Makefile.am: fix permissions for Python scripts on install
  devel/upload: Fix permissions for installed directories
  Fix cluster verification issues on multi-group clusters

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Jan 31, 2012

Fix type check for OpQuery.filter · 545d0362

Michael Hanselmann authored 13 years ago

Just using ht.TListOf as a type check doesn't work correctly. The
function must be called with the expected item type. In this specific
case TListOf was always called with the filter as a value, and the
result of that call evaluated to truth. Since filters can be quite
complex there's no check yet, and therefore just “TList” is used.
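The pitfall is easy to reproduce with combinator-style checks. The definitions below are a hypothetical re-creation loosely modelled on the ht module, not its actual code: TListOf() builds and returns a checker function, so calling it with the value to be checked yields a function object, which is always truthy.

```python
def TList(val):
    """Check that val is a list."""
    return isinstance(val, list)


def TInt(val):
    """Check that val is an int."""
    return isinstance(val, int)


def TListOf(item_check):
    """Build a checker for lists whose items all satisfy item_check."""
    def check(val):
        return TList(val) and all(item_check(item) for item in val)
    return check


# Misuse: passing the value instead of an item type returns a function,
# and bool() of a function is True, so the "check" always passes.
bogus = TListOf("not a list at all")

# Correct use: build the checker first, then apply it to the value.
is_int_list = TListOf(TInt)
```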

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Jan 26, 2012

Fix explanation of gnt-node evacuate --primaries-only · f1dff7ec

Iustin Pop authored 13 years ago


Furthermore, correct the --help display on evacuate.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
