Matt's Crazy Hypothesis
Joel Meulenberg
joelmeulenberg at yahoo.com
Sat Jun 3 11:15:29 CDT 2000
--- Steve Poling <sdpoling at home.com> wrote:
> > > My current crazy hypothesis:
> > > If a text file is x, all encryption schemes can be represented by
> > > g(x), and f(x) is a compression scheme, for some x's there exists
> > > a g(x) such that f(g(x)) < f(x).
> > >
> > > What do you think?
>
> I think I don't do enough mathematics.
>
> If you ever find a g() that satisfies your conjecture, you ought not
> use it for cryptographic purposes.
>
> Consider a Captain Midnight decoder-ring cipher. Its entropy will be
> equivalent to the original plaintext's, since it has only permuted the
> original symbols. Conversely, if g() is cryptographically secure, it
> will make g(x) appear more "random" than x. It will flatten the
> distribution of symbols in the ciphertext.
>
> I believe one can prove entropy(g(x)) > entropy(x). I also believe
> one can prove that entropy(f(x)) > entropy(x).
>
> My experience has been that the compression factor of f() is a
> function of the original entropy of x. As x grows closer to "random,"
> there is less structure in x for f() to exploit. As f() grows more
> efficient, entropy(f(x)) grows as well.
>
> However, since entropy(g(x)) > entropy(x), we should expect encryption
> to degrade the compression factor: size(f(g(x)))/size(x) >
> size(f(x))/size(x).
>
> If size(x) > 0, this implies size(f(g(x))) > size(f(x)).
>
> Now, this doesn't exactly satisfy your conjecture, because I think
> that all the information-theoretic proofs will be true on all sets
> except those of Lebesgue measure zero. Thus, a pathological file and
> f() and g() on the order of weirdness of the Cantor set could sneak
> past Dr. Shannon (who invented info theory).
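Steve's inequality is easy to check empirically. Here's a quick sketch
in Python, using zlib as the compressor f() and, as a stand-in for a
secure g(), XOR with a random keystream (a one-time pad, purely for
illustration; the test string and repeat count are made up):

```python
import os
import zlib

# x: structured plaintext, so the compressor has something to exploit.
x = b"the quick brown fox jumps over the lazy dog. " * 200

# Stand-in for a secure g(): XOR each byte with a random keystream byte.
# (A one-time pad -- illustrative only, not a practical cipher.)
keystream = os.urandom(len(x))
g_x = bytes(p ^ k for p, k in zip(x, keystream))

f_x = zlib.compress(x)      # f(x): compress the plaintext
f_g_x = zlib.compress(g_x)  # f(g(x)): compress the ciphertext

print(len(x), len(f_x), len(f_g_x))
# f(x) shrinks dramatically; f(g(x)) doesn't shrink at all (zlib's
# framing overhead actually makes it slightly larger than x).
```

As predicted, size(f(g(x))) > size(f(x)) by a wide margin: the
keystream flattens the symbol distribution, so zlib finds nothing to
exploit.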
That's kinda what I was thinking (except for that last paragraph : ) ).
Note, though, that Matt did say "for some x's".

Of course, there are contrived examples of f() and g() which satisfy
Matt's conjecture, but I doubt any serious f() and g() would do so for
most x's.
For a (contrived) example, you could have a compression algorithm, f(),
that does a sort of hardcoded Huffman encoding (i.e., it has a table of
predefined strings of symbols that it expects will be "common" and
replaces them with shorter strings, but this mapping table is static).
Now imagine that g() is just a trivial "rot1" (rotate by one letter of
the alphabet) encryption scheme. Ordinarily, a string like "the" would
be pretty common in English text, but now imagine our contrived
compression algorithm doesn't have "the" in its table, but rather has
"uif". Since rot1("the") = "uif", our encryption function, g(), plays
right into our compression function, f(). Now imagine that there's a
whole lot of that going on between f() and g().

Sure, this example is simple and totally contrived, but I suppose it
demonstrates that Matt's conjecture could be true, even for "most x's",
for special pairs of f() and g().
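Just to make the contrived example concrete, here's a tiny Python
sketch (the table entries and the one-byte tokens are invented for
illustration):

```python
# Contrived pair: g() is rot1 over lowercase letters; f() replaces
# strings from a static table with one-byte tokens. The table
# deliberately contains rot1("the") = "uif" rather than "the", so f()
# "prefers" the ciphertext to the plaintext.

def g(text):
    """rot1: rotate each lowercase letter forward by one place."""
    return "".join(
        chr((ord(c) - ord("a") + 1) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

# Static "Huffman-like" table: rot1 of "the" and "fox", not the words
# themselves.
TABLE = {"uif": "\x01", "gpy": "\x02"}

def f(text):
    """Hardcoded substitution compressor: table strings -> short tokens."""
    for pattern, token in TABLE.items():
        text = text.replace(pattern, token)
    return text

x = "the fox and the fox and the fox"
print(len(f(x)), len(f(g(x))))  # -> 31 19
```

f() leaves the plaintext untouched (nothing in its table matches), but
after rot1 every "the" and "fox" becomes a table entry, so
f(g(x)) < f(x): exactly Matt's conjecture, by construction.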
+Joel
More information about the grand-rapids-pm-list mailing list