Fix digest bug of commit f90484e1 / commit c9095f8b

The bug was a digest miscalculation when encrypting, and its cause was a confusion about the order in which different things would get hashed into the digest. The right order is that the raw head fuzz gets hashed first, followed by all the message plaintext. But a subtle problem could either cause the plaintext hashing to never happen at all, or cause the order of hashing to be reversed. The generation of head fuzz was being done in the first call to convert(), but commit c9095f8b changed PadEncoder.encode() so that *both* its call to convert() and its call to digest_gulp() were conditionalized on whether there was some compressed output ready to encode. Remember, the compressor doesn't always emit anything on a given call -- if it can't do enough compression, it buffers stuff up for a while in the hopes that future input will be compressible based on past input. Furthermore, on short input, *no* call to compressor.compress() returns any output, and we have to wait until compressor.flush(), which isn't called until PadEncoder.finish(). This means that either digest_gulp() doesn't get called at all, or, conversely, if we deconditionalize it in PadEncoder.encode(), it gets called *before* the first call to convert() -- because convert() would now first be called in PadEncoder.finish() instead of in PadEncoder.encode() -- thus reversing the order in which head fuzz and plaintext get hashed. The solution is to do what we should have done a long time ago: prepare head fuzz in PadSession.prepare_for_encryption(), so that hashing isn't sensitive to the order of calls to convert(). (A corollary is that decryption should probably work the same way, but let's do one change at a time.) This solution also preserves the fix of the bug that commit c9095f8b fixed -- the ugly and information-leaking extra newline in the base64 output, between head fuzz section and the message section -- by ensuring that we never call base64.encodestring() on empty input. Most of 'make check' still passes after this, except that some tests still fail as before because of zap expectation issues: ERROR: expected 'v', found 'z' ERROR: did not see expected DigestMismatch message digest error FAIL: tampering with message digest causes authentication error ERROR: expected 'a', found 'w' ERROR: did not see expected DigestMismatch head fuzz digest error FAIL: tampering with head fuzz digest causes authentication error I will finish updating the test suite in a subsequent commit. * onetime (PadSession.__init__): New internal var self._head_buffer. (PadSession.prepare_for_encryption): Fill _head_buffer. (PadSession.convert): Don't handle head fuzz here, just prepend any head buffer to the result and empty the head buffer if so. Also, add a blank line to indicate conditional flow better, and remove an obsolete comment. (PadSession._get_id): Remove extra blank line before definition start. (PadSession._consume_fuzz_bytes): Fix test for is_head_fuzz. (PadSession._make_fuzz): Fix doc string, and remove obsolete comment. (SessionEncoder.__init__): Remove _started_encoding flag. (SessionEncoder.encode): Deconditionalize call to digest_gulp.

Fix digest bug of commit f90484e1 / commit c9095f8b
fd83f03c · Karl Fogel · f90484e1 · fd83f03c
Commit fd83f03c authored 8 years ago by Karl Fogel
--- a/onetime
+++ b/onetime
@@ -651,6 +651,12 @@ but it still needs to be a new object due to certain initializations)."""
    self._tail_fuzz_length = None
    self._tail_fuzz_length_source_bytes = None

+    # This buffer holds all encrypted head fuzz bytes that have not
+    # yet been emitted by convert().  That is, when this buffer is not
+    # empty, then it is what convert() needs to emit *before* it emits
+    # anything else -- the first part of the output of the first call.
+    self._head_buffer = ''
+
    # This buffer always holds the latest input, and must always be at
    # least as long as the digest + tail fuzz, so that we can
    # refrain from decrypting them as part of the original plaintext.
@@ -712,6 +718,7 @@ but it still needs to be a new object due to certain initializations)."""
    if self._decrypting:
      raise PadSession.OverPrepared(
        "cannot prepare for both encryption and decryption")
+    self._head_buffer = self._make_inner_header()
    self._encrypting = True

  def prepare_for_decryption(self):
@@ -765,10 +772,7 @@ must be used for all subsequent calls with that instance.
        raise PadSession.PadSessionUninitialized(
          "pad session not yet prepared for either encrypting or decrypting")
      elif not self._begun:
-        if self._encrypting:
-          inner_header_data = self._make_inner_header()
-          result += inner_header_data
-        else: # self._decrypting is set
+        if self._decrypting:
          # TODO (minor): It's possible string might be so short that
          # it doesn't even contain enough information to know the fuzz
          # length yet.  The solution is easy: if we haven't yet begun,
@@ -784,6 +788,7 @@ must be used for all subsequent calls with that instance.
              "Got both a result string and a pad remainder")
          if pad_remaining > 0:
            self._pad_remaining_to_skip = pad_remaining
+
      if self._pad_remaining_to_skip > 0:
        # TODO: Ahh, we're storing _pad_remaining_to_skip in self!
        # TODO: These "pad_remaining_..." vars should be named
@@ -797,7 +802,7 @@ must be used for all subsequent calls with that instance.
          num_bytes_to_consume_now = self._pad_remaining_to_skip
        self._pad_remaining_to_skip = new_pad_remaining
        num_fuzz_bytes_remaining, string = self._consume_fuzz_bytes(
-          num_bytes_to_consume_now, string, is_head_fuzz=True) # fooo?
+          num_bytes_to_consume_now, string, is_head_fuzz=True)
        self._pad_remaining_to_skip += num_fuzz_bytes_remaining

      # Once we've handled any inner headers, buffer for decrypting.
@@ -824,9 +829,11 @@ must be used for all subsequent calls with that instance.
      result += chr(ord(string[i]) ^ ord(pad_str[i]))
    self._length += string_len
    self._begun = True
+    if self._head_buffer:
+      result = self._head_buffer + result
+      self._head_buffer = ''
    return result

-
  def _get_id(self):
    """Get the ID for this session's underlying pad.
    (The ID is just the pad's first 32 bytes in hexadecimal.)"""
@@ -993,7 +1000,7 @@ must be used for all subsequent calls with that instance.
      consumable_len = min(len(string), num_bytes)
      pad_data = self.padfile.read(consumable_len)
      self._length += consumable_len
-      if is_head_fuzz is not None:
+      if is_head_fuzz:
        hash_input = ""
        for i in range(consumable_len):
          hash_input += chr(ord(string[i]) ^ ord(pad_data[i]))
@@ -1006,7 +1013,7 @@ must be used for all subsequent calls with that instance.
  def _make_fuzz(self, num_bytes, is_head_fuzz=False):
    """Return NUM_BYTES worth of encrypted fuzz data, advancing the pad.
 If IS_HEAD_FUZZ is true, then update the session hash with the NUM_BYTES
-f random fuzz data generated herein."""
+of random fuzz data generated herein."""
    try:
      rnd_data = self._randy.rand_bytes(num_bytes)
      pad_data = self.padfile.read(num_bytes)
@@ -1014,8 +1021,6 @@ f random fuzz data generated herein."""
      ret_data = ''
      self._length += num_bytes
      for i in range(num_bytes):
-        # TODO: The problem is that here hasher hashes rnd + pad,
-        # whereas when we consume fuzz, we hash rnd - pad, right?
        ret_data = ret_data + chr(ord(rnd_data[i]) ^ ord(pad_data[i]))
      if is_head_fuzz:
        self.digest_gulp(rnd_data)
@@ -1358,58 +1363,14 @@ class SessionEncoder:
    self.compressor = bz2.BZ2Compressor()
    self.pad_sess.prepare_for_encryption()

-    # Flag indicating whether or not self.pad_sess.convert() has been
-    # called for the first time yet.
-    #
-    # TBD: This is a totally lame boundary violation, and it would be
-    # better to structure the code in such a way that this dependency
-    # is handled more naturally.  Here's the situation:
-    #
-    # The first call to convert() in self.encode() performs some
-    # initializations for the pad session -- initializing the head
-    # fuzz and digest -- while subsequent calls are no-ops if there's
-    # nothing to convert.  But digest_gulp() should only be called
-    # after those initializations, since they affect the digest.
-    #
-    # We *could* call convert() unconditionally, even if there is no
-    # compressed plaintext to convert yet, since, except for the
-    # initializations, that would be a no-op.  But that can lead to
-    # an ugly extra newline in the base64 encoding, like so (trimmed):
-    #
-    #   [...]
-    #   FPdpeW1dOtl9cbZDkNmK2hd5Y7S8UMJVNWzc2r/zwMyzWy9cxsIvbJyRcgm
-    #   bssb/2Mrfl4tMr0OeAQRu5ZlBI99O5b6p931c/sHks032xgFfMu13eep/1S
-    #   3uc=
-    #   p4WTBmiXVjp1pTZY/A2cUBosg2KsBHH8bvb/LAjAnW9csJb3uUpboD/DQOl
-    #   Eh/lUVJHa5a1yVtWHPN7CIY/H4hbJ6C4S4B6ir5gt5V3WX+Id847QnS5T9B
-    #   [...]
-    #
-    # So we don't want to call convert() unless there's actually
-    # compressed plaintext to convert, and therefore we need to know
-    # whether or not convert() has been called for the first time, so
-    # we can decide whether to call digest_gulp() -- of course, once
-    # we start calling digest_gulp(), we always call it thereafter.
-    self._started_encoding = False
-    
  def encode(self, string):
    """Return onetime-encoded data for STRING, or the empty string if none.
    Consume pad as needed."""
    out = ''
    compressed_plaintext = self.compressor.compress(string)
    if compressed_plaintext:
-      self._started_encoding = True
      out = base64.encodestring(self.pad_sess.convert(compressed_plaintext))
-    if self._started_encoding is True:
-      # TODO: Ooooooh, wait, the problem is that, in the test case, we
-      # *never* get here -- all the conversion happens in finish().
-      # The solution is to do the digest preparation in the initial
-      # call to self.pad_sess.prepare_for_encryption(), and just have
-      # the head fuzz ready for the first call to convert() to prepend
-      # to whatever it's converting.  The bookkeeping for that can all
-      # be handled inside the pad session.
-      sys.stderr.write("KFF: digesting '%s'\n" % string)
-      sys.stderr.flush()
-      self.pad_sess.digest_gulp(string)
+    self.pad_sess.digest_gulp(string)
    return out

  def finish(self):