diff --git a/agkyra/docs/lost_conflict_hazard.txt b/agkyra/docs/lost_conflict_hazard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..1955dbe7cd4c88e6fc2bb422394752918393ba45
--- /dev/null
+++ b/agkyra/docs/lost_conflict_hazard.txt
@@ -0,0 +1,75 @@
+Overview
+~~~~~~~~
+
+Synchronizing two copies of the same file is clean when only one of
+them has changed since the last successful synchronization. If both
+of them have changed, there are in general three possibilities:
+
+1. Merge the two changes to obtain the next common version of the file,
+   avoiding or automatically resolving any conflicts. This requires
+   knowledge of the semantics of the changes.
+
+2. Abort the synchronization and either block further changes until
+   conflict resolution is performed externally, or let the versions
+   diverge.
+
+3. Choose one side, the master side, that will force its version onto
+   the other, slave side, so that synchronization is always achieved.
+   The slave's changes, however, can be preserved as a branching remnant
+   of the conflict so that there is no data loss.
+
+The first solution cannot be general enough for arbitrary user files.
+Ultimately, the user is the one who has to make up their mind and
+resolve a conflict. If the user has two computers and is writing a poem,
+there is no way the system can automatically choose between two
+conflicting words in a way that satisfies the user.
+
+Aborting synchronization or letting the versions diverge defeats the
+purpose of synchronization. The user can already copy a file and produce
+diverging versions as they see fit.
+
+Therefore, the only practical option is to force one side's version onto
+the other and archive any conflicting changes separately. The concern
+is that the user must not lose data because of a conflict. Rather, they
+must be able to review the conflicting changes and decide for themselves
+what to do with them.
+
+Synchronize-Update Race
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In synchronizing two files, say A/alpha and B/alpha, there is a race
+between those who introduce new changes to the files and the system that
+keeps the files synchronized. The decision to copy one version onto
+another, for example A/alpha -> B/alpha, is made after examining the
+contents of both versions.
+
+However, by the time the copying takes place, either of the two
+versions could have been updated, making obsolete the states on which
+the decision was based. A straightforward solution is to keep both files
+locked while the synchronization is running.
+
+Locking both files, which may involve remote systems, is not at all
+practical. The synchronization may take time for data inspection and
+transfers, and loss of connectivity may leave files inaccessible.
+This greatly reduces usability.
+
+Lost-Conflict Hazard
+~~~~~~~~~~~~~~~~~~~~
+
+Instead, we choose to be optimistic and not lock anything. We prepare
+the synchronization without yet committing to it, even if it is based
+on potentially stale state. However, at the last moment, just before we
+make the new version of the updated file visible, we check that it is
+indeed the same version we started with and that no changes have been
+made to it while we were preparing to update it.
+
+This last step must perform the check and the committing update
+atomically. Otherwise, concurrent updates to the file will remain unseen
+and will be discarded, when they should have been archived as a
+conflict. Hence, the lost-conflict hazard.
+
+Performing a commit-if-unchanged action at the end still requires
+synchronization, but the action can be compressed into a single point in
+time and onto a single computer. Locking would instead create a critical
+time period across different machines during which all but one would be
+denied access.
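The commit-if-unchanged step described in lost_conflict_hazard.txt could be
sketched roughly as follows. This is only an illustration, not the agkyra
implementation: the VersionedFile class, its version counter, and the
conflicts list are hypothetical names, the file lives in memory, and the
target is assumed to be the slave side so that the incoming version always
wins. The lock is held only for the instant of the check and the commit,
on a single machine, never during preparation or transfer.

```python
import threading


class VersionedFile:
    """Hypothetical in-memory stand-in for one side of the synchronization."""

    def __init__(self, content):
        self._lock = threading.Lock()  # held only for the instant of a commit
        self.content = content
        self.version = 0
        self.conflicts = []  # archived contents that lost a conflict

    def read(self):
        """Return the (version, content) pair a sync decision is based on."""
        with self._lock:
            return self.version, self.content

    def local_update(self, content):
        """A user edit that may race with a running synchronization."""
        with self._lock:
            self.content = content
            self.version += 1

    def commit_if_unchanged(self, expected_version, new_content):
        """Atomically check the version and install new_content.

        Returns True if the file was still at expected_version. Otherwise
        the concurrent content is archived as a conflict before it is
        replaced, so it is never silently discarded.
        """
        with self._lock:
            unchanged = self.version == expected_version
            if not unchanged:
                self.conflicts.append(self.content)
            self.content = new_content
            self.version += 1
            return unchanged
```

If the check and the replacement were two separate steps, a local_update()
landing between them would be overwritten without ever reaching the
conflicts list, which is exactly the lost-conflict hazard.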
diff --git a/agkyra/docs/partial_update_hazard.txt b/agkyra/docs/partial_update_hazard.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4711dc48ddf3ecafb28228d621c20293b6851e1f
--- /dev/null
+++ b/agkyra/docs/partial_update_hazard.txt
@@ -0,0 +1,51 @@
+
+Consider a shell script file with the following segment:
+
+    # Run tests
+    TDIR=./test-data
+    ./test $TDIR
+    rm -rf $TDIR
+
+Consider a small change:
+
+    # Run tests in subdir
+    TDIR=./test-data
+    ./test $TDIR
+    rm -rf $TDIR
+
+Consider the byte strings for the two segments above:
+
+"# Run tests\nTDIR=./test-data\n./test $TDIR\nrm -rf $TDIR\n"
+"# Run tests in subdir\nTDIR=./test-data\n./test $TDIR\nrm -rf $TDIR\n"
+                             ^^
+                             ||
+       Previous page <------    -----> Next page
+                             VV
+                         Memory page
+                          boundary
+
+If the file is only partially updated to the second version, so that
+only the previous page is written, then the resulting segment is:
+
+"XXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n./test $TDIR\nrm -rf $TDIR\n"
+"# Run tests in subdir\nTDIR=.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+
+    # Run tests in subdir
+    TDIR=.
+    ./test $TDIR
+    rm -rf $TDIR
+
+The last command inadvertently removes the PARENT of the directory
+intended for removal. This will result in unintended loss of data.
+
+For this reason, a partial update is not safe to perform.
+But how can it be avoided?
+
+Partial update is a hazard both while reading the source file and while
+writing the target file.
+
+For reading a file, one can wait until nobody has write access and
+then exclude all writers until the file has been read.
+
+For writing a file, one can write a new file and, when it is complete,
+atomically replace the old file with it.
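The write-then-replace approach for the target file can be sketched as
below. This is a minimal illustration under stated assumptions, not code
from this change: atomic_write is a hypothetical helper, the temporary file
is assumed to sit in the same directory (and therefore on the same
filesystem) as the target, and the atomicity relies on the rename
semantics of os.replace, which hold on POSIX systems.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to path so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new content is on disk
        # Single step that makes the complete new file visible.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure the old file is untouched; drop the partial temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this scheme a reader that opens the script either gets the complete
old segment or the complete new one, so the "TDIR=." mixture shown in
partial_update_hazard.txt cannot appear in the target. Note that the
sketch does not preserve the permissions or ownership of the old file.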
diff --git a/agkyra/docs/sketch.txt b/agkyra/docs/sketch.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ab0cd261a781c1662e8f5ef90bf542fd57e8e94b
--- /dev/null
+++ b/agkyra/docs/sketch.txt
@@ -0,0 +1,101 @@
+"""
+    Sketch for a safe asynchronous syncer
+
+    The syncer is built around a database, and facilitates synchronization
+    between filesystem-like archives. Separate clients to each of the
+    archives are responsible for triggering and applying syncer decisions.
+
+    The syncer is based on the following practical concerns:
+
+    - The decision to synchronize a path must have access to both master and
+      slave live states, which cannot be considered always current, because
+      that would require locking both archives. Therefore the syncer must
+      operate asynchronously, optimistically but safely.
+
+    - Due to the asynchronous nature of the process, conflicts cannot be
+      centrally declared, but must be handled by the clients that update
+      each archive.
+
+    - Clients must be able to update files atomically, defeating both the
+      partial update and the lost conflict hazards.
+
+
+    The 'syncer' has access to a database and to two 'clients',
+    the MASTER and the SLAVE. Each client can access a filesystem-like
+    repository called an 'archive'.
+
+    For each path in each archive there is a state in the syncer's
+    database. This state is updated by 'probing' a specific archive
+    path through a client. The client is given the old state registered
+    in the database and, if it detects that the path has changed, it
+    reports a new state back.
+
+    Thus far the syncer maintains state for each archive separately.
+    The next step is to synchronize a path between the archives.
+    The idea is that all archives the syncer is connected to are views
+    of the same file repository and therefore they should all have the
+    same contents.
+
+    To keep track of this one 'true' state of a path, the syncer maintains
+    a special 'SYNC' archive in the database.
+    The state of a path in this archive is the last acknowledged state
+    that was synchronized across the archives.
+
+    All deviations from the state in the SYNC archive are considered
+    new changes and are propagated. When new changes occur in both
+    archives, the MASTER's version gets propagated and the SLAVE's
+    version is marked as conflicted and stashed away.
+
+    The syncer's operation for each path follows a simple state machine:
+
+        probe -> update()
+                           decide()          acknowledge()
+                    |
+                    |          |                   |
+                    V          V +--> PUSHING --+  V
+                 DECIDING     ---|              |---> IDLE
+                                 +--> PULLING --+
+
+                                |                |
+                                +----------------+
+                                     Syncing
+
+    A probe only ends in an update() if it detects a change in state.
+    decide() changes the state of two paths: the path in the archive
+    that was changed goes into PUSHING, while the path in the archive
+    that will receive the change goes into PULLING.
+
+    Along with these transitions, the PUSHING client is told to 'stage'
+    the changed file. Staging a file provides a handle that enables
+    communication between the two clients so that the PULLING client
+    can pull it.
+
+    After the pulling client has committed the changes onto its archive,
+    it calls back with acknowledge(), which puts the new state into the
+    SYNC archive and tells the pushing client to unstage the file.
+
+    So the state machine runs in a loop, from IDLE to DECIDING and back
+    to IDLE. Each such loop has a serial number. This number is used to
+    defend against old or duplicate calls. Each probe records this serial
+    and its update() will not succeed if the serial has changed.
+    Likewise, each pull records the serial and its acknowledge() will
+    have no effect if the serial has changed.
+
+    This mechanism can also defend against failures. An update() will
+    always set the state to DECIDING even if there is a pull going
+    on concurrently. The new sync cycle will race the old one but only
+    one will prevail.
+
+    To ensure that, serials must never be reissued, and the pulling
+    of the changes into an archive must be atomic and only applied if
+    the state of the archive is identical to the SYNC state in the
+    syncer's database.
+
+    Other than that, the database can safely rewind to a previous
+    state (e.g. lose changes not committed to disk) provided that
+    the serials are never reissued.
+
+    The clients must strictly observe the order of serials in the
+    commands for probing, staging, or pulling. A command must not
+    be executed unless the associated serial exceeds every past serial.
+"""
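The serial mechanism in sketch.txt might look roughly like the following
for a single path. Everything here is a hypothetical illustration rather
than the syncer's actual code: the PathState class, its field names, and
the single SYNCING state standing in for the PUSHING/PULLING pair are all
assumptions, as is the choice to issue a new serial inside update(). The
sketch keeps state in memory and leaves out the database, the clients, and
staging.

```python
class PathState:
    """Hypothetical per-path sync state guarded by a serial number."""

    IDLE, DECIDING, SYNCING = "IDLE", "DECIDING", "SYNCING"

    def __init__(self):
        self.state = self.IDLE
        self.serial = 0         # only ever increases, never reissued
        self.sync_state = None  # last acknowledged state (the SYNC archive)
        self.pending = None     # changed state waiting to be propagated

    def probe(self):
        """Record the serial that a probe started with."""
        return self.serial

    def update(self, probe_serial, new_state):
        """Report a change; rejected if a newer cycle began since the probe."""
        if probe_serial != self.serial:
            return False
        self.serial += 1             # start a new sync cycle
        self.state = self.DECIDING   # even if a pull is still in progress
        self.pending = new_state
        return True

    def decide(self):
        """Enter the syncing phase; return the serial the puller must echo."""
        if self.state != self.DECIDING:
            return None
        self.state = self.SYNCING
        return self.serial

    def acknowledge(self, pull_serial):
        """Complete a cycle; has no effect if the serial has moved on."""
        if pull_serial != self.serial or self.state != self.SYNCING:
            return False
        self.sync_state = self.pending
        self.pending = None
        self.state = self.IDLE
        return True
```

In this sketch an update() that arrives while a pull is in progress bumps
the serial, so the older pull's acknowledge() is ignored and only the
newer cycle can reach IDLE, which is the "only one will prevail" behaviour
the text describes.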