Skip to content
Snippets Groups Projects
Commit fd83f03c authored by Karl Fogel's avatar Karl Fogel
Browse files

Fix digest bug of commit f90484e1 / commit c9095f8b

The bug was a digest miscalculation when encrypting, and its cause was
a confusion about the order in which different things would get hashed
into the digest.

The right order is that the raw head fuzz gets hashed first, followed
by all the message plaintext.  But a subtle problem could either cause
the plaintext hashing to never happen at all, or cause the order of
hashing to be reversed.

The generation of head fuzz was being done in the first call to
convert(), but commit c9095f8b changed PadEncoder.encode() so that
*both* its call to convert() and its call to digest_gulp() were
conditionalized on whether there was some compressed output ready to
encode.  Remember, the compressor doesn't always emit anything on a
given call -- if it can't do enough compression, it buffers stuff up
for a while in the hopes that future input will be compressible based
on past input.

Furthermore, on short input, *no* call to compressor.compress()
returns any output, and we have to wait until compressor.flush(),
which isn't called until PadEncoder.finish().  This means that either
digest_gulp() doesn't get called at all, or, conversely, if we
deconditionalize it in PadEncoder.encode(), it gets called *before*
the first call to convert() -- because convert() would now first be
called in PadEncoder.finish() instead of in PadEncoder.encode() --
thus reversing the order in which head fuzz and plaintext get hashed.

The solution is to do what we should have done a long time ago:
prepare head fuzz in PadSession.prepare_for_encryption(), so that
hashing isn't sensitive to the order of calls to convert().  (A
corollary is that decryption should probably work the same way, but
let's do one change at a time.)

This solution also preserves the fix of the bug that commit c9095f8b
fixed -- the ugly and information-leaking extra newline in the base64
output, between head fuzz section and the message section -- by
ensuring that we never call base64.encodestring() on empty input.

Most of 'make check' still passes after this, except that some tests
still fail as before because of zap expectation issues:

  ERROR: expected 'v', found 'z'
  ERROR: did not see expected DigestMismatch message digest error
  FAIL: tampering with message digest causes authentication error

  ERROR: expected 'a', found 'w'
  ERROR: did not see expected DigestMismatch head fuzz digest error
  FAIL: tampering with head fuzz digest causes authentication error

I will finish updating the test suite in a subsequent commit.

* onetime
  (PadSession.__init__): New internal var self._head_buffer.
  (PadSession.prepare_for_encryption): Fill _head_buffer.
  (PadSession.convert): Don't handle head fuzz here, just prepend any
    head buffer to the result and empty the head buffer if so.  Also,
    add a blank line to indicate conditional flow better, and remove
    an obsolete comment.
  (PadSession._get_id): Remove extra blank line before definition start.
  (PadSession._consume_fuzz_bytes): Fix test for is_head_fuzz.
  (PadSession._make_fuzz): Fix doc string, and remove obsolete comment.
  (SessionEncoder.__init__): Remove _started_encoding flag.
  (SessionEncoder.encode): Deconditionalize call to digest_gulp.
parent f90484e1
No related branches found
No related tags found
No related merge requests found
......@@ -651,6 +651,12 @@ but it still needs to be a new object due to certain initializations)."""
self._tail_fuzz_length = None
self._tail_fuzz_length_source_bytes = None
# This buffer holds all encrypted head fuzz bytes that have not
# yet been emitted by convert(). That is, when this buffer is not
# empty, then it is what convert() needs to emit *before* it emits
# anything else -- the first part of the output of the first call.
self._head_buffer = ''
# This buffer always holds the latest input, and must always be at
# least as long as the digest + tail fuzz, so that we can
# refrain from decrypting them as part of the original plaintext.
......@@ -712,6 +718,7 @@ but it still needs to be a new object due to certain initializations)."""
if self._decrypting:
raise PadSession.OverPrepared(
"cannot prepare for both encryption and decryption")
self._head_buffer = self._make_inner_header()
self._encrypting = True
def prepare_for_decryption(self):
......@@ -765,10 +772,7 @@ must be used for all subsequent calls with that instance.
raise PadSession.PadSessionUninitialized(
"pad session not yet prepared for either encrypting or decrypting")
elif not self._begun:
if self._encrypting:
inner_header_data = self._make_inner_header()
result += inner_header_data
else: # self._decrypting is set
if self._decrypting:
# TODO (minor): It's possible string might be so short that
# it doesn't even contain enough information to know the fuzz
# length yet. The solution is easy: if we haven't yet begun,
......@@ -784,6 +788,7 @@ must be used for all subsequent calls with that instance.
"Got both a result string and a pad remainder")
if pad_remaining > 0:
self._pad_remaining_to_skip = pad_remaining
if self._pad_remaining_to_skip > 0:
# TODO: Ahh, we're storing _pad_remaining_to_skip in self!
# TODO: These "pad_remaining_..." vars should be named
......@@ -797,7 +802,7 @@ must be used for all subsequent calls with that instance.
num_bytes_to_consume_now = self._pad_remaining_to_skip
self._pad_remaining_to_skip = new_pad_remaining
num_fuzz_bytes_remaining, string = self._consume_fuzz_bytes(
num_bytes_to_consume_now, string, is_head_fuzz=True) # fooo?
num_bytes_to_consume_now, string, is_head_fuzz=True)
self._pad_remaining_to_skip += num_fuzz_bytes_remaining
# Once we've handled any inner headers, buffer for decrypting.
......@@ -824,9 +829,11 @@ must be used for all subsequent calls with that instance.
result += chr(ord(string[i]) ^ ord(pad_str[i]))
self._length += string_len
self._begun = True
if self._head_buffer:
result = self._head_buffer + result
self._head_buffer = ''
return result
def _get_id(self):
"""Get the ID for this session's underlying pad.
(The ID is just the pad's first 32 bytes in hexadecimal.)"""
......@@ -993,7 +1000,7 @@ must be used for all subsequent calls with that instance.
consumable_len = min(len(string), num_bytes)
pad_data = self.padfile.read(consumable_len)
self._length += consumable_len
if is_head_fuzz is not None:
if is_head_fuzz:
hash_input = ""
for i in range(consumable_len):
hash_input += chr(ord(string[i]) ^ ord(pad_data[i]))
......@@ -1006,7 +1013,7 @@ must be used for all subsequent calls with that instance.
def _make_fuzz(self, num_bytes, is_head_fuzz=False):
"""Return NUM_BYTES worth of encrypted fuzz data, advancing the pad.
If IS_HEAD_FUZZ is true, then update the session hash with the NUM_BYTES
f random fuzz data generated herein."""
of random fuzz data generated herein."""
try:
rnd_data = self._randy.rand_bytes(num_bytes)
pad_data = self.padfile.read(num_bytes)
......@@ -1014,8 +1021,6 @@ f random fuzz data generated herein."""
ret_data = ''
self._length += num_bytes
for i in range(num_bytes):
# TODO: The problem is that here hasher hashes rnd + pad,
# whereas when we consume fuzz, we hash rnd - pad, right?
ret_data = ret_data + chr(ord(rnd_data[i]) ^ ord(pad_data[i]))
if is_head_fuzz:
self.digest_gulp(rnd_data)
......@@ -1358,58 +1363,14 @@ class SessionEncoder:
self.compressor = bz2.BZ2Compressor()
self.pad_sess.prepare_for_encryption()
# Flag indicating whether or not self.pad_sess.convert() has been
# called for the first time yet.
#
# TBD: This is a totally lame boundary violation, and it would be
# better to structure the code in such a way that this dependency
# is handled more naturally. Here's the situation:
#
# The first call to convert() in self.encode() performs some
# initializations for the pad session -- initializing the head
# fuzz and digest -- while subsequent calls are no-ops if there's
# nothing to convert. But digest_gulp() should only be called
# after those initializations, since they affect the digest.
#
# We *could* call convert() unconditionally, even if there is no
# compressed plaintext to convert yet, since, except for the
# initializations, that would be a no-op. But that can lead to
# an ugly extra newline in the base64 encoding, like so (trimmed):
#
# [...]
# FPdpeW1dOtl9cbZDkNmK2hd5Y7S8UMJVNWzc2r/zwMyzWy9cxsIvbJyRcgm
# bssb/2Mrfl4tMr0OeAQRu5ZlBI99O5b6p931c/sHks032xgFfMu13eep/1S
# 3uc=
# p4WTBmiXVjp1pTZY/A2cUBosg2KsBHH8bvb/LAjAnW9csJb3uUpboD/DQOl
# Eh/lUVJHa5a1yVtWHPN7CIY/H4hbJ6C4S4B6ir5gt5V3WX+Id847QnS5T9B
# [...]
#
# So we don't want to call convert() unless there's actually
# compressed plaintext to convert, and therefore we need to know
# whether or not convert() has been called for the first time, so
# we can decide whether to call digest_gulp() -- of course, once
# we start calling digest_gulp(), we always call it thereafter.
self._started_encoding = False
def encode(self, string):
"""Return onetime-encoded data for STRING, or the empty string if none.
Consume pad as needed."""
out = ''
compressed_plaintext = self.compressor.compress(string)
if compressed_plaintext:
self._started_encoding = True
out = base64.encodestring(self.pad_sess.convert(compressed_plaintext))
if self._started_encoding is True:
# TODO: Ooooooh, wait, the problem is that, in the test case, we
# *never* get here -- all the conversion happens in finish().
# The solution is to do the digest preparation in the initial
# call to self.pad_sess.prepare_for_encryption(), and just have
# the head fuzz ready for the first call to convert() to prepend
# to whatever it's converting. The bookkeeping for that can all
# be handled inside the pad session.
sys.stderr.write("KFF: digesting '%s'\n" % string)
sys.stderr.flush()
self.pad_sess.digest_gulp(string)
self.pad_sess.digest_gulp(string)
return out
def finish(self):
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment