HACKING.org



Algorithm
Background reading: CRDT
This packages implements the Logoot split algorithm
  André, Luc, et al. "Supporting adaptable granularity of changes for massive-scale collaborative editing." 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 2013.
The CRDT-ID blocks are implemented by text property =’crdt-id=.
  A continous range of text with the same =’crdt-id= property represent a CRDT-ID block.
  The =’crdt-id= is a a cons of (ID-STRING . END-OF-BLOCK-P),
  where ID-STRING represent the CRDT-ID of the leftmost character in the block.
  If END-OF-BLOCK-P is NIL, the block is a non-rightmost segment splitted from a larger block,
  so insertion at the right of this block shouldn’t be merged into the block by sharing the base CRDT-ID and increasing offset.
ID-STRING is a unibyte string representing a CRDT-ID (for efficient comparison).
  Every two bytes represent a big endian encoded integer.
  For base IDs, last two bytes are always representing Site ID.
  Stored strings are BASE-ID:OFFSETs. So the last two bytes represent offset,
  and second last two bytes represent Site ID.
Protocol
Text-based version
  (it should be easy to migrate to a binary version.  Using text for better debugging for now)
Note: Starting from v0.3.0, we separate User IDs and Site IDs.
  Site IDs are buffer local and temporarily assigned to users with writable access.
Every message takes the form (type . body)

  Text Editing
    A peer must obtain a site-id before performing the following operations,
    by remote calling crdt-get-write-access. See Remote Function.
    
      insert

        body takes the form (buffer-name user-id crdt-id position-hint content)
        
          
position-hint is the buffer position where the operation happens at the site
            which generates the operation.  Then we can play the trick that start search
            near this position at other sites to speedup CRDT ID search
          
content is the string to be inserted
        
      
      delete

        body takes the form (buffer-name user-id position-hint . crdt-id-list)
        
          
crdt-id-list is generated from CRDT--DUMP-IDS from the deleted text
        
      
      cursor

        body takes the form
        (buffer-name user-id point-position-hint point-crdt-id mark-position-hint mark-crdt-id)
        *-crdt-id can be either a CRDT ID, or
        
          
nil, which means clear the point/mark
          =”“=, which means (point-max)

        
      
    
  
  Contact information
    
      contact

        body takes the form (user-id slot value)
        
          
slot can be one of
            name host service focus
            
          
        
      
      leave

        body takes the form (user-id)
        This message is sometime sent from client to server to indicate disconnection,
          if the underlying proxy doesn’t indicate disconnection properly.
      
    
  
  Login
    
      hello

        This message is sent from client to server, when a client connect to the server.
        body takes the form (protocol-version &optional response)

      challenge

        body takes the form (salt)

      login

        It’s always sent after server receives a hello message.
        Assigns a User ID to the client
        body takes the form (user-id).
    
  
  Initial Synchronization
    
      sync

        This message is sent from server to client to get it sync to the state on the server.
        Might be used for other optimization in the future.
        One optimization I have in mind is let server try to merge all CRDT item into a single
        one and try to synchronize this state to clients at best effort.
        body takes the form (buffer-name . crdt-id-list)
        
          
crdt-id-list is generated from CRDT--DUMP-IDS

        
      
      ready

        body takes the form (buffer-name major-mode-symbol)
        Indicates the end of a batch of synchronization messages
        (which usually contains some cursor messages, a sync message,
        and some overlay-* messages).
        The client should now try to enable major-mode-symbol in the
        synchronized buffer.
    
  
  Error Recovery
    Note: when a client side error happens, it just sends a get message and
    follow initial synchronization procedure to reinitialize the buffer.
    
      error

        body takes the form (buffer-name error-symbol . error-datum).
        This message is sent from server to client to notice that some messages from the
        client is not processed due to error (error-symbol . error-datum).
        Normally client should follow initial synchronization procedure to reinitialize the buffer.
        
          
buffer-name can also be nil, which signifies that it’s a session error.
            The only reasonable thing to do is to disconnect in this scenario.
            Currently, this happens when client/server protocol version doesn’t match.
        
      
    
  
  Buffer Service
    
      add

        Indicates that the server has started sharing some buffers.
        body takes the form buffer-name-list

      remove

        Indicates that the server has stopped sharing some buffers.
        body takes the form buffer-name-list

      get

        Request the server to resend sync message for a buffer.
        body takes the form (buffer-name)

    
  
  Overlay Synchronization
    
      overlay-add

        body takes the form
        (buffer-name user-id logical-clock species
  front-advance rear-advance
  start-position-hint start-crdt-id
  end-position-hint end-crdt-id)
        
      
      overlay-move

        body takes the form
        (buffer-name user-id logical-clock
  start-position-hint start-crdt-id
  end-position-hint end-crdt-id)
        
      
      overlay-put

        body takes the form (buffer-name user-id logical-clock prop value)

      overlay-remove

        body takes the form (buffer-name user-id logical-clock)

    
  
  <<Remote Function>>
    
      fcap

        body takes the form (fcap-symbol nonce in-states out-states . interactive-form)
        This grants a “functional capability” to a peer.
        Nonce is a random number to prevent forging capability.
        
          
in-states is a list of state symbols that the function depends on.
            out-states is a list of state symbols that the function modifies and should be synchronized
            to the caller.
            See Allowed state symbols.
        
      
      funcall

        body takes the form
        (user-id logical-clock spawn-user-id 
  state-list nonce fcap-symbol . args)
        
        
          
spawn-user-id represents the site where the interactive command is originally invoked
            
              It can be different from user-id because a remote function can call a remote function!
                This is especially useful when client makes a remote call,
                but the call on the server request some interactive input,
                and such interactive call are remote-called back into the client.
            
          
          
state-list is an alist of bindings.
            (except that we use 1 element list for the CDRs, to save a dot in the serialized string)
            (CDRs can also be 2 element list of the form (crdt-id pos-hint))
            <<Allowed state symbols>> are
            window window-point buffer buffer-content point
mark mark-active transient-mark-mode last-command-event
            
          
        
      
      return

        body takes the form (user-id logical-clock state-list success-p . return-values)

    
  
  Buffer local variables
    
      var
body takes the form (buffer-name variable-symbol . args)
        args is passed to the variable receiver (get variable-symbol 'crdt-variable-receiver)
        to calculate an updated value.
        The actual format of args depends on the variable sender and receiver
        (which supposed implement some CRDT).
        All peer must make sure they install the same kind of variable sender and receiver
          for variable-symbol.
      
    
  
  Remote Buffer Process
    
      process

        body takes the form (buffer-name string)
        Sent from client to server, request sending string
        to the process buffer associated to buffer-name.
      process-mark

        body takes the form (buffer-name crdt-id position-hint).
    
  

NOTE: for overlay-put, overlay-move and process-mark, server must also broadcast the message
  back to the client that generated it, to ensure consistent global history.
Emacs as a collaborative operating system
The goal: With a few annotations, developer should be able to make any Emacs application
  collaboration-powered. Emacs should be one of the most powerful collaboration platforms.
How: There’re plenty of Emacs applications centered around the buffer and buffer-local-variables.
  By implementing synchronization primitives for all components in a buffer,
  pretty much everything can be made collaborative.
  Synchronize arbitrary buffer-local-variable reasonably is hard, but user annotations can help.
How to implement collaboration support for a package
crdt.el provides two sets of facilities for adding collaboration support, a command-based one and a state-based one.
  Package hackers are free to combine them to provide desired behavior.
Command-based collaboration
This is a simple method to add collaboration support.
  After registering a command with crdt-register-remote-command,
  an :around advice is added such that when a client invoke this command,
  an request is sent to the server instead of running the command locally.
Hackers must make sure that they declare what sets of buffer state the command uses
  to fully preserve user intent.
Although relatively simple, collaboration command implemented using this method
  must go through a round trip to the server and will incur latency.
Why we need used-state-set annotations
Suppose Alyssa P. Hacker does (crdt-register-remote-command 'eval-last-sexp),
  but didn’t declare that eval-last-sexp uses content of the buffer.
  Now the hackers are conspiring in an crdt.el session.
  Ben Bitdiddle places cursor after (+ 1 1) and run eval-last-sexp.
  However, the moment Ben Bitdiddle’s request arrives at the server,
  Cy D. Fect has changed (+ 1 1) to (+ 1 2) (their message arrives first!).
  Now the server does what it sees and return 3, instead of 2.
The correct solution is to let the server roll-back to the state when Ben Bitdiddle invoked the command.
  It is relatively expensive thus we don’t want to do this for every command,
  thus we require package hackers to annotate explicitly.
The above mechanism haven’t been implemented yet!
  But adding annotations now will help adding it in the future.
  To implement this mechanism we need to add lamport timestamp to every messages
  (which may corresponds to mutation of interesting states),
  and send a vector clock in command messages which depend on buffer content.
State-based collaboration
We can also synchronize the underlying state of the packages
  rather than proxying user-level commands.
  If there’re good CRDT candidates to be used for the state
  (hackers need to understand what concurrency semantics their state need to have!),
  then the commands can have real-time effect without needing to be acknowledged from the server.
crdt-org-sync-overlay-mode is an example of this approach.
Overall, this method is much more complicated than command-base method.
  Development of the facility is still on-going.
Task list for crdt.el facility

  [X] synchronize buffer text (insert/delete)
  [X] synchronize overlays
  [-] synchronize major/minor modes
    
      [X] initial synchronization of major modes
      [ ] toggle minor modes on the fly
      [X] change major modes on the fly
    
  
  [-] set of synchronization primitives for buffer local variables
    
      [-] server dictated
        
          [ ] non incremental
          [X] naive incremental
          [ ] state-of-the-art level tree diff
        
      
      [ ] a library of CRDTs
    
  
  [X] synchronize text properties (any use case for this?)
    
      [X] synchronize when new text is inserted
      [X] synchronize when changed
    
  
  [ ] synchronize markers (any use case for this?)
  [-] remote command
    
      [X] basic remote command (only possibly use (point))
      [X] command that uses region
      [ ] correctly handle command that uses buffer content
      [ ] handle arbitrary interactive form (firstly, what’s the right thing to do?)
    
  
  [-] remote buffer process
    
      [X] process mark
      [X] send to process
      [ ] make sure “pseudo process” really looks like process
        (define complete set of advices)
    
  

Notes and examples of CRDTize built-in packages
Search for ;;; Built-in package integrations in crdt.el
Cross-editor support
The current plan is to reuse the Emacs implementation as a local server for any other editor, aka Emacs as a service.
  The benefit is that we don’t need to reimplement the sophiscated CRDT algorithm in other uncivilized environments.
  We then just need to design a thin protocol that communicate between local Emacs and the other editor.
  Since this protocol communicate only locally, the latency should be negligible,
  therefore we use a blocking reader/writer lock based synchronization scheme.
Lock: modes of operations
It turns out that I vastly over-estimated the extensibility of The Other Editors.
  For example, lots of them (including M$ vScoDe and cult 666) doesn’t seem to have anything like pre-command-hook,
  making it impossible to implement a usual bidirectional locking mechanism
  (because we can’t tell those editors to acquire lock from Emacs before running commands that potentially modify the buffer).
Currently I implemneted a hack that by default let The Other Editors hold the lock, but upon receiving
  an acquire from Emacs, let The Other Editors dead loops to hopefully halt command execution until Emacs gives back the lock.
  Emacs thus must give back lock as soon as possible to un-hang The Other Editors.
Q: What if Emacs GCs?
  Q got thrown out of the window.
Bridge protocol

  Reader/writer lock
    
      aquire
body takes the form ()

      release
body takes the form ()

    
  

The rest is mostly analogue to the primary protocol for Emacsen,
  except that CRDT IDs are replaced by explicit integer position (start from 1, as in Emacs).

  Text Editing
    
      insert
body takes the form (buffer-name position content)

      delete
body takes the form (buffer-name position length)

    
  
  Peer State
    
      cursor
body takes the form (buffer-name user-id point-position mark-position)
        *-position can be either an integer, or
        
          
nil, which means clear the point/mark
        
      
      contact
same as primary protocol.
      leave
same as primary protocol.
    
  
  Login
    Note that we don’t include challenge/response authentication mecahnism.
    
      hello
same as primary protocol.
      login
same as primary protocol.
    
  
  Initial Synchronization
    
      sync
body takes the form (buffer-name content-string)

      ready
same as primary protocol.
    
  
  Buffer Service
    
      add
same as primary protocol.
      remove
same as primary protocol.
      get
same as primary protocol.