From c6d992c539127b37f188c7adbb108aadfa5b240a Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Wed, 18 Apr 2012 18:15:51 +0200
Subject: [PATCH] Add design document for query path splitting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
---
 doc/design-draft.rst           |   1 +
 doc/design-query-splitting.rst | 156 +++++++++++++++++++++++++++++++++
 2 files changed, 157 insertions(+)
 create mode 100644 doc/design-query-splitting.rst

diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index c349092f9..629a386c2 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -14,6 +14,7 @@ Design document drafts
    design-node-state-cache.rst
    design-resource-model.rst
    design-virtual-clusters.rst
+   design-query-splitting.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-query-splitting.rst b/doc/design-query-splitting.rst
new file mode 100644
index 000000000..2992f74b4
--- /dev/null
+++ b/doc/design-query-splitting.rst
@@ -0,0 +1,156 @@
+===========================================
+Splitting the query and job execution paths
+===========================================
+
+
+Introduction
+============
+
+Currently, the master daemon has two main roles:
+
+- execute jobs that change the cluster state
+- respond to queries
+
+Due to technical details of the implementation, the job execution and
+query paths interact with each other; for example, the "masterd hang"
+issue that we saw late in the 2.5 release cycle was caused by the
+interaction between job queries and job execution.
+
+Furthermore, also because of technical limitations of the
+implementation (Python lacking read-only variables being one example),
+we can't share internal data structures for jobs; instead, in the
+query path, we read them from disk in order to not block job execution
+due to locks.
+
+All this points to the fact that integrating both queries and job
+execution in the same (multi-threaded) process creates more problems
+than advantages, and hence we should look into separating them.
+
+
+Proposed design
+===============
+
+In Ganeti 2.7, we will introduce a separate, optional daemon to handle
+queries (note: whether this is an actual "new" daemon, or its
+functionality is folded into confd, remains to be seen).
+
+This daemon will expose exactly the same Luxi interface as masterd,
+except that job submission will be disabled. If so configured (at
+build time), clients will be changed to:
+
+- keep sending REQ_SUBMIT_JOB, REQ_SUBMIT_MANY_JOBS and all other
+  non-query requests to the masterd socket, and also generic queries
+  for QR_LOCK
+- redirect all other REQ_QUERY_* requests to the Luxi socket of the
+  new daemon (see the sketch after this list)
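+
+As an illustration, a minimal sketch of such client-side routing in
+Python (the socket paths, method names and the way QR_LOCK appears in
+the arguments are assumptions for illustration, not the final
+interface)::
+
+  # Hypothetical sketch: pick the right socket for a Luxi request.
+  MASTERD_SOCKET = "/var/run/ganeti/socket/ganeti-master"
+  QUERY_SOCKET = "/var/run/ganeti/socket/ganeti-query"
+
+  def SocketForRequest(method, args):
+    """Return the Luxi socket path for a given request type."""
+    if not method.startswith("REQ_QUERY"):
+      # Job submission and all other non-query requests stay on masterd
+      return MASTERD_SOCKET
+    if method == "REQ_QUERY" and args and args[0] == "lock":
+      # Generic queries for QR_LOCK also remain with masterd
+      return MASTERD_SOCKET
+    return QUERY_SOCKET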
+
+This new daemon will serve both pure configuration queries (which
+confd can already serve) and run-time queries (which currently only
+masterd can serve). Since the RPCs can be made from any node to any
+node, the new daemon can run on all master candidates, not only on
+the master node. This means that all gnt-* list commands can now be
+run on nodes other than the master node. If we implement this as a
+separate daemon that talks to confd, then we could actually run it on
+all nodes of the cluster (to be decided).
+
+During the 2.7 release, masterd will still respond to queries itself,
+but it will log all such queries so that "misbehaving" clients can be
+identified.
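+
+A minimal sketch of such logging, assuming a hypothetical helper in
+the masterd query path (the names are illustrative only)::
+
+  import logging
+
+  def LogDeprecatedQuery(method, client_address):
+    """Record queries that still arrive at masterd during 2.7."""
+    # Hypothetical helper: these log lines identify "misbehaving"
+    # clients that should switch to the new query daemon.
+    logging.warning("Query %s from %s was sent to masterd; it should"
+                    " go to the query daemon instead", method,
+                    client_address)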
+
+Advantages
+----------
+
+As far as I can see, this will bring some significant advantages.
+
+First, we remove any interaction between the job execution and cluster
+query state. This means that bugs in the locking code (job execution)
+will impact neither queries of the cluster state nor queries of the
+job execution itself. Furthermore, we will be able to tune job
+execution and queries separately (e.g. 25 threads for job execution,
+while queries, being transient, could practically use an unlimited
+number of threads).
+
+As a result of the above split, we move from the current model, where
+shutdown of the master daemon practically "breaks" the entire Ganeti
+functionality (neither job execution nor queries, not even connecting
+to the instance console), to a split model:
+
+- if just masterd is stopped, other cluster functionality remains
+  available: listing instances, connecting to the console of an
+  instance, etc.
+- if just "queryd" is stopped, masterd can still process jobs, and
+  queries can furthermore be run from other nodes (master candidates)
+- only if both are stopped do we end up with the previous behaviour
+
+This will help, for example, in the case where the master node has
+crashed and we haven't failed it over yet: querying and investigating
+the cluster state will still be possible from other master candidates
+(on small clusters, this will mean from all nodes).
+
+A last advantage is that we will finally be able to reduce the
+footprint of masterd; instead of the previously discussed splitting
+out of individual jobs, which would require duplicating all the base
+functionality, this splits out just the queries, a much simpler piece
+of code than job execution. This should be a reasonable work effort,
+with a much smaller impact in case of failure (we can still run
+masterd as before).
+
+Disadvantages
+-------------
+
+We might get increased inconsistency during queries, as there will be
+a delay between masterd saving an updated configuration and
+confd/queryd loading and parsing it. However, this could be
+compensated for by the fact that queries will only look at
+"snapshots" of the configuration, whereas before they could also see
+"in-progress" modifications (due to the non-atomic updates). I think
+these effects will cancel each other out; we will have to see how it
+works in practice.
+
+Another disadvantage *might* be that we have a more complex setup, due
+to the introduction of a new daemon. However, the query path will be
+much simpler, and when we remove the query functionality from masterd
+we should have a more robust system.
+
+Finally, we have QR_LOCK, which is a query internal to the master
+daemon but which uses the same infrastructure as the other (cluster
+state) queries. This is unfortunate, and will require untangling in
+order to keep code duplication low.
+
+Long-term plans
+===============
+
+If this works well, the plan would be (tentatively) to disable the
+query functionality in masterd completely in Ganeti 2.8, in order to
+remove the duplication. This might change depending on whether and
+how we split out the configuration/locking daemon.
+
+Once we split this out, there is no technical reason why we can't
+execute any query from any node, except perhaps practical reasons
+(network topology, remote nodes, etc.) or security reasons (depending
+on whether we want to change the cluster security model). In any
+case, it should be possible to do this in a reliable way from all
+master candidates.
+
+Some implementation details
+---------------------------
+
+We will fold this into confd, at least initially, to reduce the
+proliferation of daemons. Haskell will, if used properly, limit any
+too-deep integration between the old "confd" functionality and the
+new query one. As an advantage, we'll have a single daemon that
+handles configuration queries.
+
+The redirection of Luxi requests can easily be done based on the
+request type, whether we keep both sockets open or open them on
+demand.
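+
+A minimal sketch of the on-demand variant, reusing the hypothetical
+``SocketForRequest`` helper from above (the class, method names and
+wire format are simplifying assumptions, not the actual Luxi
+protocol)::
+
+  import json
+  import socket
+
+  class SplitLuxiClient(object):
+    """Sketch of a client that lazily opens the socket it needs."""
+
+    def __init__(self):
+      self._connections = {}  # socket path -> open socket
+
+    def _GetConnection(self, path):
+      # Open each Unix socket only when the first request needs it
+      if path not in self._connections:
+        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+        sock.connect(path)
+        self._connections[path] = sock
+      return self._connections[path]
+
+    def CallMethod(self, method, args):
+      # Route by request type; the wire format here is simplified
+      conn = self._GetConnection(SocketForRequest(method, args))
+      request = json.dumps({"method": method, "args": args})
+      conn.sendall(request.encode("utf-8"))
+      return json.loads(conn.recv(1 << 20))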
+
+We don't want the masterd to talk to the queryd itself (hidden
+redirection), since we want to be able to run queries while masterd is
+down.
+
+During the 2.7 release cycle, we can test all queries against both
+masterd and queryd in QA, so that we know both expose exactly the
+same interface and return consistent results.
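+
+For example, a QA helper could run each query against both sockets
+and compare the replies (a sketch; the ``SplitLuxiClient`` instances
+and the resource list are illustrative assumptions)::
+
+  def CheckQueryConsistency(masterd_client, queryd_client, resources):
+    """Assert that masterd and queryd return identical query results."""
+    for what in resources:  # e.g. ["node", "instance", "group"]
+      args = [what, [], None]  # resource, fields, filter (simplified)
+      reply_masterd = masterd_client.CallMethod("REQ_QUERY", args)
+      reply_queryd = queryd_client.CallMethod("REQ_QUERY", args)
+      if reply_masterd != reply_queryd:
+        raise AssertionError("masterd and queryd disagree on %s query"
+                             % what)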
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
GitLab