Skip to content
Snippets Groups Projects
HACKING.org 13.2 KiB
Newer Older
Qiantan Hong's avatar
Qiantan Hong committed
* Algorithm

Qiantan Hong's avatar
Qiantan Hong committed
Background reading: [[https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type][CRDT]]

Qiantan Hong's avatar
Qiantan Hong committed
This packages implements the Logoot split algorithm
~André, Luc, et al. "Supporting adaptable granularity of changes for massive-scale collaborative editing." 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 2013.~
    
Qiantan Hong's avatar
Qiantan Hong committed
The CRDT-ID blocks are implemented by text property ='crdt-id=. 
A continous range of text with the same ='crdt-id'= property represent a CRDT-ID block. 
The ='crdt-id= is a a cons of =(ID-STRING . END-OF-BLOCK-P)=, 
where =ID-STRING= represent the CRDT-ID of the leftmost character in the block.
If =END-OF-BLOCK-P= is =NIL=, the block is a non-rightmost segment splitted from a larger block,
so insertion at the right of this block shouldn't be merged into the block by sharing the base CRDT-ID and increasing offset.
Qiantan Hong's avatar
Qiantan Hong committed

=ID-STRING= is a unibyte string representing a CRDT-ID (for efficient comparison).
Every two bytes represent a big endian encoded integer.
For base IDs, last two bytes are always representing site ID.
Stored strings are BASE-ID:OFFSETs. So the last two bytes represent offset,
and second last two bytes represent site ID.

* Protocol
Qiantan Hong's avatar
Qiantan Hong committed
  Text-based version
  (it should be easy to migrate to a binary version.  Using text for better debugging for now)
Qiantan Hong's avatar
Qiantan Hong committed

  Every message takes the form =(type . body)=

  - Text Editing
    + insert ::
      body takes the form =(buffer-name crdt-id position-hint content)=
      - =position-hint= is the buffer position where the operation happens at the site
        which generates the operation.  Then we can play the trick that start search
        near this position at other sites to speedup CRDT ID search
      - =content= is the string to be inserted

    + delete ::
      body takes the form =(buffer-name position-hint . crdt-id-list)=
      - =crdt-id-list= is generated from =CRDT--DUMP-IDS= from the deleted text

  - Peer State
    + cursor ::
      body takes the form
           =(buffer-name site-id point-position-hint point-crdt-id mark-position-hint mark-crdt-id)=
      =*-crdt-id= can be either a CRDT ID, or
        - =nil=, which means clear the point/mark
        - =""=, which means =(point-max)=

    + contact ::
      body takes the form
           =(site-id name address port)=
      when name is =nil=, clear the contact for this =site-id=

    + focus ::
      body takes the form =(site-id buffer-name)=

  - Login
    + hello ::
      This message is sent from client to server, when a client connect to the server.
      body takes the form =(client-name &optional response)=

    + leave ::
      This message is sometime sent from client to server to indicate disconnection, 
      if the underlying proxy doesn't handle it properly.
      body takes the form =()=

    + challenge ::
      body takes the form =(salt)=

    + login ::
      It's always sent after server receives a hello message.
      Assigns an ID to the client
      body takes the form =(site-id session-name)=.

  - Initial Synchronization
    + sync ::
      This message is sent from server to client to get it sync to the state on the server.
Qiantan Hong's avatar
Qiantan Hong committed
      Might be used for other optimization in the future.
      One optimization I have in mind is let server try to merge all CRDT item into a single
      one and try to synchronize this state to clients at best effort.
      body takes the form =(buffer-name . crdt-id-list)=
      - =crdt-id-list= is generated from =CRDT--DUMP-IDS=

    + ready ::
      body takes the form =(buffer-name major-mode-symbol)=
      Indicates the end of a batch of synchronization messages
      (which usually contains some =cursor= messages, a =sync= message,
      and some =overlay-*= messages).
      The client should now try to enable =major-mode-symbol= in the
      synchronized buffer.

Qiantan Hong's avatar
Qiantan Hong committed
  - Error Recovery
    Note: when a client side error happens, it just sends a =get= message and
    follow initial synchronization procedure to reinitialize the buffer.

    + error ::
      body takes the form =(buffer-name error-symbol . error-datum)=.
      This message is sent from server to client to notice that some messages from the
      client is not processed due to error =(error-symbol . error-datum)=.
      Normally client should follow initial synchronization procedure to reinitialize the buffer.

  - Buffer Service
    + add ::
      Indicates that the server has started sharing some buffers.
      body takes the form =buffer-name-list=

    + remove ::
      Indicates that the server has stopped sharing some buffers.
      body takes the form =buffer-name-list=

    + get ::
      Request the server to resend =sync= message for a buffer.
      body takes the form =(buffer-name)=

  - Overlay Synchronization
    + overlay-add ::
      body takes the form 
      #+BEGIN_SRC
      (buffer-name site-id logical-clock species
        front-advance rear-advance
        start-position-hint start-crdt-id
        end-position-hint end-crdt-id)
      #+END_SRC

    + overlay-move ::
      body takes the form
      #+BEGIN_SRC
      (buffer-name site-id logical-clock
        start-position-hint start-crdt-id
        end-position-hint end-crdt-id)
      #+END_SRC

    + overlay-put ::
      body takes the form =(buffer-name site-id logical-clock prop value)=

    + overlay-remove ::
      body takes the form =(buffer-name site-id logical-clock)=

  - Remote Command
    + command ::
Qiantan Hong's avatar
Qiantan Hong committed
      body takes the form
      #+BEGIN_SRC
      (buffer-name spawn-site-id
       site-id logical-clock state-list
       command-symbol . args)
      #+END_SRC
       - =spawn-site-id= represents the site where the interactive command is originally invoked
         + It can be different from =site-id= because a remote command can call a remote command!
           This is especially useful when client makes a remote call, 
           but the call on the server request some interactive input,
           and such interactive call are remote-called back into the client.
       - =state-list= is an alist of bindings.
        (except that we use 1 element list for the CDRs, to save a dot in the serialized string)
        (CDRs can also be 2 element list of the form =(crdt-id pos-hint)=)
        Allowed symbols are 
        #+BEGIN_SRC
        point mark mark-active transient-mark-mode last-command-event
        #+END_SRC

    + return ::
Qiantan Hong's avatar
Qiantan Hong committed
      body takes the form =(site-id logical-clock state-list success-p . return-values)=

  - Buffer local variables
    + var :: body takes the form =(buffer-name variable-symbol . args)=
      =args= is passed to the variable receiver =(get variable-symbol 'crdt-variable-receiver)=
      to calculate an updated value.
      The actual format of =args= depends on the variable sender and receiver 
      (which supposed implement some CRDT).

      All peer must make sure they install the same kind of variable sender and receiver
      for =variable-symbol=.
  - Remote Buffer Process
    + process ::
      body takes the form =(buffer-name string)=
      Sent from client to server, request sending =string= 
      to the process buffer associated to =buffer-name=.

    + process-mark ::
      body takes the form =(buffer-name crdt-id position-hint)=.

NOTE: for =overlay-put=, =overlay-move= and =process-mark=, server must also broadcast the message
      *back to the client that generated it*, to ensure consistent global history.
Qiantan Hong's avatar
Qiantan Hong committed
* Emacs as a collaborative operating system

The goal: With a few annotations, developer should be able to make any Emacs application 
collaboration-powered. Emacs should be one of the most powerful collaboration platforms.

How: There're plenty of Emacs applications centered around the buffer and buffer-local-variables.
By implementing synchronization primitives for all components in a buffer,
pretty much everything can be made collaborative.
Synchronize arbitrary buffer-local-variable reasonably is hard, but user annotations can help.

Qiantan Hong's avatar
Qiantan Hong committed
** How to implement collaboration support for a package

~crdt.el~ provides two sets of facilities for adding collaboration support, a command-based one and a state-based one. 
Package hackers are free to combine them to provide desired behavior.

*** Command-based collaboration

This is a simple method to add collaboration support. 
After registering a command with =crdt-register-remote-command=, 
an =:around= advice is added such that when a client invoke this command,
an request is sent to the server instead of running the command locally.

Hackers must make sure that they declare what sets of buffer state the command uses 
to fully preserve user intent.

Although relatively simple, collaboration command implemented using this method
must go through a round trip to the server and will incur latency.

**** Why we need used-state-set annotations

Suppose Alyssa P. Hacker does =(crdt-register-remote-command 'eval-last-sexp)=,
but didn't declare that =eval-last-sexp= uses content of the buffer.
Now the hackers are conspiring in an ~crdt.el~ session. 
Ben Bitdiddle places cursor after =(+ 1 1)= and run =eval-last-sexp=.
However, the moment Ben Bitdiddle's request arrives at the server, 
Cy D. Fect has changed =(+ 1 1)= to =(+ 1 2)= (their message arrives first!).
Now the server does what it sees and return =3=, instead of =2=.

The correct solution is to let the server roll-back to the state when Ben Bitdiddle invoked the command.
It is relatively expensive thus we don't want to do this for every command,
thus we require package hackers to annotate explicitly.

/The above mechanism haven't been implemented yet!/ 
But adding annotations now will help adding it in the future.
To implement this mechanism we need to add lamport timestamp to every messages 
(which may corresponds to mutation of interesting states),
and send a vector clock in =command= messages which depend on buffer content.

*** State-based collaboration

We can also synchronize the underlying state of the packages 
rather than proxying user-level commands.
If there're good CRDT candidates to be used for the state 
(hackers need to understand what concurrency semantics their state need to have!),
then the commands can have real-time effect without needing to be acknowledged from the server.

=crdt-org-sync-overlay-mode= is an example of this approach.

Overall, this method is much more complicated than command-base method. 
Development of the facility is still on-going.

** TODO Task list for ~crdt.el~ facility
Qiantan Hong's avatar
Qiantan Hong committed
   - [X] synchronize buffer text (insert/delete)
   - [X] synchronize overlays
   - [-] synchronize major/minor modes
     + [X] initial synchronization of major modes
     + [ ] toggle minor modes on the fly
Qiantan Hong's avatar
Qiantan Hong committed
     + [X] change major modes on the fly
Qiantan Hong's avatar
Qiantan Hong committed
   - [-] set of synchronization primitives for buffer local variables
     + [-] server dictated
       + [ ] non incremental
       + [X] naive incremental
       + [ ] state-of-the-art level tree diff
Qiantan Hong's avatar
Qiantan Hong committed
     + [ ] a library of CRDTs
Qiantan Hong's avatar
Qiantan Hong committed
   - [-] synchronize text properties (any use case for this?)
     + [X] synchronize when new text is inserted
     + [ ] synchronize when changed
Qiantan Hong's avatar
Qiantan Hong committed
   - [ ] synchronize markers (any use case for this?)
   - [-] remote command
     + [X] basic remote command (only possibly use =(point)=)
Qiantan Hong's avatar
Qiantan Hong committed
     + [X] command that uses region
     + [ ] correctly handle command that uses buffer content
     + [ ] handle arbitrary =interactive= form (firstly, what's the right thing to do?)
   - [-] remote buffer process
Qiantan Hong's avatar
Qiantan Hong committed
     + [X] process mark
     + [X] send to process
     + [ ] make sure "pseudo process" really looks like process 
           (define complete set of advices)

** Notes and examples of CRDTize built-in packages

Search for =;;; Built-in package integrations= in ~crdt.el~

* TODO Cross-editor support

The current plan is to reuse the Emacs implementation as a local server for any other editor, aka Emacs as a service. 
The benefit is that we don't need to reimplement the sophiscated CRDT algorithm in other +uncivilized+ environments. 
We then just need to design a thin protocol that communicate between local Emacs and the other editor.
Since this protocol communicate only locally, the latency should be negligible, 
therefore we use a blocking reader/writer lock based synchronization scheme.

** Bridge protocol

   - Reader/writer lock
     + aquire :: body takes the form =()=
     + release :: body takes the form =()=

   The rest is mostly analogue to the primary protocol for Emacsen, 
   except that CRDT IDs are replaced by explicit integer position (start from 1, as in Emacs).

  - Text Editing
    + insert :: body takes the form =(buffer-name position content)=
    + delete :: body takes the form =(buffer-name position length)=

  - Peer State
    + cursor :: body takes the form =(buffer-name site-id point-position mark-position)=
      =*-position= can be either an integer, or
        - =nil=, which means clear the point/mark

    + contact :: same as primary protocol.

    + focus :: same as primary protocol.

  - Login
    Note that we don't include challenge/response authentication mecahnism.

    + hello :: same as primary protocol.
    + leave :: same as primary protocol.

    + login :: same as primary protocol.

  - Initial Synchronization
    + sync :: body takes the form =(buffer-name content-string)=
    + ready :: same as primary protocol.

  - Buffer Service
    + add :: same as primary protocol.
    + remove :: same as primary protocol.
    + get :: same as primary protocol.