From c6d992c539127b37f188c7adbb108aadfa5b240a Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Wed, 18 Apr 2012 18:15:51 +0200
Subject: [PATCH] Add design document for query path splitting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
---
 doc/design-draft.rst           |   1 +
 doc/design-query-splitting.rst | 156 +++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+)
 create mode 100644 doc/design-query-splitting.rst

diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c349092f9..629a386c2 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -14,6 +14,7 @@ Design document drafts
    design-node-state-cache.rst
    design-resource-model.rst
    design-virtual-clusters.rst
+   design-query-splitting.rst

 .. vim: set textwidth=72 :
 .. Local Variables:

diff --git a/doc/design-query-splitting.rst b/doc/design-query-splitting.rst
new file mode 100644
index 000000000..2992f74b4
--- /dev/null
+++ b/doc/design-query-splitting.rst
@@ -0,0 +1,156 @@
+===========================================
+Splitting the query and job execution paths
+===========================================


Introduction
============

Currently, the master daemon has two main roles:

- execute jobs that change the cluster state
- respond to queries

Due to technical details of the implementation, the job execution and
query paths interact with each other; for example, the "masterd hang"
issue we hit late in the 2.5 release cycle was caused by the
interaction between job queries and job execution.

Furthermore, again for technical reasons (Python lacking read-only
variables being one example), we can't share the internal job data
structures between the two paths; instead, the query path reads them
from disk so as not to block job execution on locks.

All of this points to the fact that integrating queries and job
execution in the same (multi-threaded) process creates more problems
than advantages, and hence we should look into separating them.


Proposed design
===============

In Ganeti 2.7, we will introduce a separate, optional daemon to handle
queries (note: whether this is an actual "new" daemon, or whether its
functionality is folded into confd, remains to be seen).

This daemon will expose exactly the same Luxi interface as masterd,
except that job submission will be disabled. If so configured (at
build time), clients will be changed to:

- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS, and all requests
  except REQ_QUERY_* to the masterd socket (generic queries for
  QR_LOCK also stay on masterd)
- redirect all other REQ_QUERY_* requests to the new Luxi socket of
  the new daemon

A sketch of this client-side dispatch is given at the end of this
section.

This new daemon will serve both pure configuration queries (which
confd can already serve) and run-time queries (which currently only
masterd can serve). Since the RPC can be done from any node to any
node, the new daemon can run on all master candidates, not only on the
master node. This means that all gnt-* list options can now be run on
nodes other than the master node. If we implement this as a separate
daemon that talks to confd, then we could actually run this on all
nodes of the cluster (to be decided).

During the 2.7 release, masterd will still respond to queries itself,
but it will log all such queries so that "misbehaving" clients can be
identified.
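
The following is a minimal sketch, in Python, of how a client could
dispatch requests between the two sockets. The socket paths, the
method names and the helper function are illustrative assumptions, not
the actual Ganeti client code::

  # Sketch only: socket paths and method names are assumptions.
  MASTERD_SOCKET = "/var/run/ganeti/socket/masterd"
  QUERYD_SOCKET = "/var/run/ganeti/socket/queryd"

  # Luxi methods of the REQ_QUERY_* family (shortened for the sketch;
  # the real set is larger).
  QUERY_METHODS = frozenset(["Query", "QueryFields"])

  def SocketForRequest(method, args):
    """Choose the Luxi socket for a request.

    Everything except REQ_QUERY_* goes to masterd; generic queries
    for QR_LOCK also stay on masterd, since the locks live there.

    """
    if method in QUERY_METHODS:
      # Assumption: args[0] names the queried resource ("instance",
      # "node", "lock", ...).
      if args and args[0] == "lock":
        return MASTERD_SOCKET
      return QUERYD_SOCKET
    return MASTERD_SOCKET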

Advantages
----------

As far as I can see, this will bring some significant advantages.

First, we remove any interaction between the job execution and the
cluster query state. This means that bugs in the locking code (job
execution) will impact neither queries of the cluster state nor
queries of the job execution itself. Furthermore, we will be able to
use different tuning parameters for job execution (e.g. 25 threads)
versus queries (since queries are transient, we could practically
allow an unlimited number of query threads).

As a result of the above split, we move from the current model, where
shutting down the master daemon practically "breaks" the entire Ganeti
functionality (no job execution, no queries, not even connecting to
the instance console), to a split model:

- if just masterd is stopped, other cluster functionality remains
  available: listing instances, connecting to the console of an
  instance, etc.
- if just "queryd" is stopped, masterd can still process jobs, and one
  can furthermore run queries from other nodes (master candidates)
- only if both are stopped do we end up in the previous state

This will help, for example, in the case where the master node has
crashed and we haven't failed it over yet: querying and investigating
the cluster state will still be possible from other master candidates
(on small clusters, this will mean from all nodes).

A final advantage is that we will at last be able to reduce the
footprint of masterd; unlike the previously discussed splitting out of
individual jobs, which would require duplicating all the base
functionality, this splits out only the queries, a much simpler piece
of code than job execution. This should be a reasonable work effort,
with a much smaller impact in case of failure (we can still run
masterd as before).

Disadvantages
-------------

We might get increased inconsistency during queries, as there will be
a delay between masterd saving an updated configuration and
confd/queryd loading and parsing it. However, this could be
compensated for by the fact that queries will only look at "snapshots"
of the configuration, whereas before they could also see "in-progress"
modifications (due to the non-atomic updates). I think these effects
will cancel each other out; we will have to see how it works in
practice.

Another disadvantage *might* be that we have a more complex setup, due
to the introduction of a new daemon. However, the query path will be
much simpler, and once we remove the query functionality from masterd
we should have a more robust system.

Finally, we have QR_LOCK, which is an internal query related to the
master daemon that uses the same infrastructure as the other queries
(which relate to the cluster state). This is unfortunate, and will
require untangling in order to keep code duplication low.

Long-term plans
===============

If this works well, the plan would be (tentatively) to disable the
query functionality in masterd completely in Ganeti 2.8, in order to
remove the duplication. This might change depending on whether/how we
split out the configuration/locking daemon.

Once we split this out, there is no technical reason why we couldn't
execute any query from any node, barring practical concerns (network
topology, remote nodes, etc.) and security considerations (whether we
want to change the cluster security model). In any case, it should be
possible to do this reliably from all master candidates, for example
with a simple fallback scheme as sketched below.
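
The following minimal sketch shows such a fallback: the client tries
the query daemon on each master candidate in turn. The candidate list,
the port number and the message handling are invented for illustration
only; the actual transport for remote queries is still to be decided::

  import socket

  # Assumptions for the sketch: the candidate list would really come
  # from ssconf or a similar mechanism, and the port is made up.
  MASTER_CANDIDATES = ["node1.example.com", "node2.example.com"]
  QUERYD_PORT = 1820

  def RunQuery(payload):
    """Send a serialized query to the first reachable candidate."""
    for node in MASTER_CANDIDATES:
      try:
        conn = socket.create_connection((node, QUERYD_PORT), timeout=5)
      except socket.error:
        continue  # candidate down or unreachable, try the next one
      try:
        conn.sendall(payload)
        # Simplified: real code would properly frame and decode the
        # response instead of doing a single recv().
        return conn.recv(4096)
      finally:
        conn.close()
    raise RuntimeError("no master candidate answered the query")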

Some implementation details
---------------------------

We will fold this into confd, at least initially, to reduce the
proliferation of daemons. If used properly, Haskell will limit any
overly deep integration between the old "confd" functionality and the
new query one. As an advantage, we'll have a single daemon that
handles all configuration queries.

The redirection of Luxi requests can easily be done based on the
request type, whether we keep both sockets open permanently or open
them on demand.

We don't want masterd to talk to queryd itself (hidden redirection),
since we want to be able to run queries while masterd is down.

During the 2.7 release cycle, we can test all queries against both
masterd and queryd in QA, so that we know the two expose exactly the
same interface and return consistent results.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
--
GitLab