- Nov 21, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Andrea Spadaccini authored
In the last merge I erroneously discarded the changes introduced by commit 2a6de57a "Check the results of master IP RPCs". This commit reintroduces them. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Patch tested and confirmed to work by Andrea Spadaccini <spadaccio@google.com>. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
Michael Hanselmann authored
Otherwise jobs started after an unclean master shutdown will fail as they depend on the RPC client. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Until now, if masterd received a fatal signal, it would start shutting down immediately. In the meantime it would hang while jobs were still being processed, and clients could no longer connect to retrieve a job's status. With this patch masterd checks whether any job is running before shutting down. If there is one, it checks again every five seconds. Once all jobs are finished, it waits another five seconds to give clients a chance to retrieve the jobs' status. After that masterd shuts down cleanly. If a second signal is received the old behaviour is preserved. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
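
The shutdown sequence described in this commit message could look roughly like the sketch below. It is an illustration only, not the code from the patch: `ShutdownCheck`, `has_running_jobs` and the injectable `clock` are hypothetical names, and only the five-second interval comes from the message itself.

```python
import time

CHECK_INTERVAL = 5.0  # seconds; the interval named in the commit message


class ShutdownCheck:
    """Decides whether the daemon may stop (hypothetical helper).

    has_running_jobs: callable returning True while any job is running.
    clock: callable returning the current time in seconds.
    """

    def __init__(self, has_running_jobs, clock=time.monotonic):
        self._has_running_jobs = has_running_jobs
        self._clock = clock
        self._idle_since = None  # time at which no running job was first seen

    def __call__(self):
        """Return True once shutdown may proceed."""
        if self._has_running_jobs():
            # Jobs are still running: reset the grace period and ask to be
            # called again later.
            self._idle_since = None
            return False
        now = self._clock()
        if self._idle_since is None:
            # First check without running jobs: start the grace period so
            # clients can still retrieve the jobs' status.
            self._idle_since = now
            return False
        return now - self._idle_since >= CHECK_INTERVAL
```

A caller would invoke the check once after the first fatal signal and then every `CHECK_INTERVAL` seconds until it returns True. Handling of a second signal (immediate shutdown) is left out of this sketch.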
-
Michael Hanselmann authored
Instead of aborting the main loop as soon as a fatal signal (SIGTERM or SIGINT) is received, additional logic allows waiting for tasks to finish while I/O is still being processed. If no callback function is provided, the old behaviour (shutting down on the first signal) is preserved. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
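
A minimal sketch of a main loop with an optional shutdown callback, assuming a simplified loop that is not the real daemon.Mainloop: `MainLoop`, `request_shutdown`, `shutdown_wait_fn` and `max_iterations` are hypothetical names chosen for illustration.

```python
class MainLoop:
    """Toy main loop (hypothetical, not daemon.Mainloop itself)."""

    def __init__(self):
        self._shutdown_requested = False

    def request_shutdown(self):
        """In a real daemon this would run from the SIGTERM/SIGINT handler."""
        self._shutdown_requested = True

    def run(self, process_io, shutdown_wait_fn=None, max_iterations=1000):
        """Process I/O until shutdown is allowed; return the iteration count."""
        for iteration in range(1, max_iterations + 1):
            process_io()  # I/O keeps being served while waiting for tasks
            if not self._shutdown_requested:
                continue
            # Without a callback the loop stops on the first signal (old
            # behaviour); with one, it stops only once the callback agrees.
            if shutdown_wait_fn is None or shutdown_wait_fn():
                return iteration
        return max_iterations
```

The callback is consulted only after a signal has arrived, so a loop that never receives a signal is unaffected by it.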
-
Michael Hanselmann authored
This is needed in case the scheduler user (daemon.Mainloop in this case) has other timeouts at the same time. Needed for clean master shutdown. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Doing so will prevent job submissions (similar to a drained queue), but won't affect currently running jobs. No further jobs will be executed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also log a message when a fatal signal was received and use dict.items. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 18, 2011
-
-
Iustin Pop authored
While testing with ghc 7.2, I saw that some imports we are using are very old (from the ghc 6.8 era), even though current libraries use different names. We fix this and bump the minimum documented version to ghc 6.12, as I don't have 6.10 to test with anymore (it possibly still works with that version, but better safe; both Ubuntu Lucid and Debian Squeeze ship with 6.12 nowadays). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
* devel-2.5: (24 commits)
  LUInstanceCreate: Release unused node locks
  htools: rework message display construction
  hbal: handle empty node groups
  Document OpNodeMigrate's result for RAPI
  Ensure unused ports return to the free port pool
  Re-wrap a paragraph to eliminate a sphinx warning
  Fix newer pylint's E0611 error in compat.py
  Fail if node/group evacuation can't evacuate instances
  Update init script description
  LUInstanceRename: Compare name with name
  LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
  Update synopsis for “gnt-cluster repair-disk-sizes”
  Move hooks PATH environment variable to constants
  Check the results of master IP RPCs
  Add documentation for the master IP hooks
  Add master IP turnup and turndown hooks
  Add RunLocalHooks decorator
  Generalize HooksMaster
  Update NEWS for 2.5.0~rc4
  Bump version to 2.5.0~rc4
  ...
Conflicts:
  NEWS
  doc/hooks.rst
  lib/backend.py
  lib/cmdlib.py
  lib/constants.py
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
* stable-2.5:
  htools: rework message display construction
  hbal: handle empty node groups
  Document OpNodeMigrate's result for RAPI
  Fail if node/group evacuation can't evacuate instances
  LUInstanceRename: Compare name with name
  LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
  Update NEWS for 2.5.0~rc4
  Bump version to 2.5.0~rc4
  jqueue: Allow zero jobs to be submitted at once
  hail: don't select the primary as new secondary
  hail: add an extra safety check in relocate
  Bump version to 2.5.0~rc3
Conflicts:
  configure.ac: Trivial
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
* devel-2.4:
  Ensure unused ports return to the free port pool
  Re-wrap a paragraph to eliminate a sphinx warning
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 17, 2011
-
-
Agata Murawska authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Agata Murawska authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Agata Murawska authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Agata Murawska authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
We still allow explicit shutdown of confd, but we prevent manual or automatic start-up. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
If confd is disabled, do not automatically restart it. Furthermore, we can't run maintenance actions if it is disabled so log a warning. Note that I haven't completely disabled the NodeMaintenance class with ENABLE_CONFD = False because I think they are at two different levels (e.g. we might have other maintenance actions done even with confd disabled). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Doesn't do anything yet. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, the code in Node.hs is overly strict: once a node's free memory reaches 0, it will refuse to add any instances (offline or not). I think this is a reasonable safeguard (I don't expect nodes to run without at least 1MB of free memory), so rather than change this behaviour we need to restrict the Node generation in the unittest to skip such nodes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
It is not used. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This is different from “Quiesce” in the sense that this function just changes an internal flag and doesn't wait for the queue to be empty. Tasks already being processed continue normally, but no new tasks will be started. New tasks can still be added, but won't be processed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
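
The difference between flipping an "active" flag and quiescing can be sketched with a toy, single-threaded queue. This is an assumption-laden illustration, not the worker pool code touched by the commit: `TaskQueue`, `set_active`, `add_task` and `next_task` are hypothetical names.

```python
import collections


class TaskQueue:
    """Toy single-threaded stand-in for a worker pool (hypothetical)."""

    def __init__(self):
        self._tasks = collections.deque()
        self._active = True

    def set_active(self, active):
        # Only flips an internal flag; unlike a quiesce operation it does
        # not wait for the queue to become empty.
        self._active = active

    def add_task(self, task):
        # Tasks can still be added while the queue is inactive.
        self._tasks.append(task)

    def next_task(self):
        """Return the next task to start, or None if none may start now."""
        if not self._active or not self._tasks:
            return None
        return self._tasks.popleft()
```

Tasks added while the queue is inactive stay queued and are handed out again once the flag is set back to True.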
-
Michael Hanselmann authored
This saves us from returning to the worker code when there is no task to be processed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This is in preparation for a clean(er) shutdown of masterd. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
After iallocator ran we can release any unused node locks. Since they must be in exclusive mode this should improve parallelization during instance creation. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
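
As a rough sketch of the idea, assuming locks are identified by node name: once the allocator has chosen its nodes, every other held node lock can be given back. `release_unused` is a hypothetical helper, not a function from the locking code in the patch.

```python
def release_unused(held_locks, keep):
    """Split held lock names into (kept, released), preserving order.

    held_locks: names of node locks currently held (hypothetical input).
    keep: names of the nodes the allocator actually selected.
    """
    keep = set(keep)
    kept = [name for name in held_locks if name in keep]
    released = [name for name in held_locks if name not in keep]
    return kept, released
```

Because the locks are exclusive, each released name is a node that other jobs can lock again without waiting for the instance creation to finish.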
-
Michael Hanselmann authored
… instead of a variable which needs to be incremented for every step. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 16, 2011
-
-
Iustin Pop authored
While diagnosing some (unrelated) memory usage in htools, I stumbled upon some very bad behaviour in checkData: mapAccum is non-strict, and so is the tuple we use, so the list of lists of messages ends up being very bad space-wise (hundreds of MB of memory for a simulated cluster with thousands of nodes, all with errors). The new, explicit reuse of the old message list has linear memory behaviour. The only downside is that messages are listed in reverse order (which I'll fix on master). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-