----- Forwarded message from Matthew Byng-Maddick <mbm colondot.net> -----

Date: Tue, 16 Dec 2003 11:11:12 +0000
From: Matthew Byng-Maddick <mbm colondot.net>
Subject: Development Methodologies for Obfuscated Code
To: "London.pm list" <london.pm london.pm.org>
User-Agent: Mutt/1.4.1i

During the rather interesting london.java/london.pm crossover meet
(interesting for me because I met someone I went to school with who I
hadn't seen in about 3 years), I showed a Simon (of the "*mumble* Dick
Dastardly *mumble* hehehe" variety) the cam.pm snowman/hangman code. I
was showing some of my working files (which I kept), and he said that
he thought it might possibly interest some of you if I explained what
I did and how the code fits together. It probably won't, but I'll post
it anyway.
(The original cam.pm message was:
  http://cam.pm.org/archive/2003-December/001079.html,
 and last year, I actually ran a competition:
  http://cam.pm.org/archive/2002-December/000953.html)

So, like all good programmers, I started out with a problem to solve.
OK, I don't have a problem, my problem is "find an interesting problem
and write some code to solve it". OK. So, I had an idea for something
involving a snowman (well, it's nearly christmas, and last year's effort
was a snowscape, so...). I started off, then by drawing out some ASCII
art (not particularly good, but anyway).

                 _____
                |     |
              __|_____|__
               /       \
              |  o   o  |
               \   o   /
               ,`-----'.
         _\|  '    o    `  |/_
           \/             \/
           |       o       |
           |               |
            \      o      /
              .         ,
     -----------------------------

So, I started figuring out what I could do with this, and hit upon the
idea of doing a hangman-alike. You draw the body, then the head, then
arms (one at a time) then hat, then buttons, and finally face. The
first challenge in all of these kinds of program is how to represent
the data. I've got two problems here, on the one hand, I need to draw
various characters at various points, but I also need to only draw them
after a number of bad guesses. If I can't draw the character (not enough
bad guesses, I must draw a space.

So we have a sparse matrix, with bad guesses on one side, and characters
on the other.

        | - _ | \ / ` ' , . o
      --+--------------------
      0 | *                     (ground)
      1 | *   * * * * * * *     (body)
      2 |   * * * * * *         (head)
      3 |   * * *               (left arm)
      4 |   * *   *             (right arm)
      5 |   * *                 (hat)
      6 |                   *   (buttons)
      7 |                   *   (face)

The asterisks represent a point in the matrix where we need a character.
Counting them, there are 25. This is a useful number, as it allows us to
use the alphabet to encode. Initially, I decided on missing out "o" to
stop myself confusing it with the character I needed. It turned out that
I didn't need to, but that's the way the data stayed.

        | - _ | \ / ` ' , . o
      --+--------------------
      0 | a                     (ground)
      1 | b   c d e f g h i     (body)
      2 |   j k l m n p         (head)
      3 |   q r s               (left arm)
      4 |   t u   v             (right arm)
      5 |   w x                 (hat)
      6 |                   y   (buttons)
      7 |                   z   (face)

So, we now have a table with an encoding, and we can now draw out our
snowman in all his glory.

                 wwwww
                x     x
              wwxjjjjjxww
               m       l
              k  z   z  k
               l   z   m
               hnbbbbbpi
         qsr  g    y    f  uvt
           se             dv
           c       y       c
           c               c
            d      y      e
              i         h
     aaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Doesn't look much like the original. Also note that at this point, no
code has yet been written, we've only dealt with trying to encode our
data (the most interesting problem of the program). We now need our
matrix in a more useful form, so let's re-write it:

     a      0   -
     b-i    1   -|\/`',.
     j-p    2   _|\/`:'
     q-s    3   _|\
     t-v    4   _|/
     w-x    5   _|
     y      6   o
     z      7   o

Note the ":". This is for the missed-out "o". Now we can start writing
some code.

| # data coding
| @a=("-", "-|\\/`',.", "_|\\/`:'", "_|\\", "_|/", "_|", "o", "o");
|
| # x is the character, and l is the level
| # this function returns the character or a space if the level is
| # not sufficiently high.
| sub char {
|     $x=ord(shift());
|     $l=shift();
|
|     $x -= ord("a");
|     $l++;
|
|     # loop over each of the coding strings
|     for(@a) {
|         # if we have had enough bad guesses
|         if($l--) {
|             # if our position is longer than this coding
|             if($x >= ($r=length($_))) {
|                 # skip over this coding
|                 $x -= $r;
|                 # and go to the next
|                 next;
|             }
|             # else return the character from the right
|             # position in the coding
|             return substr($_, $x-$r, 1);
|         }
|         # level isn't high enough, so print a space
|         return " ";
|     }
| }

The first bit is fine, the second will need tightening up. Let's make
it shorter:

| sub c{$x=ord(shift())-97;$l=shift()+1;
| for(@a){if($l--){$x-=$r,next if($x>=($r
| =y///c));return(substr($_,$x-$r,1))}
| return(chr(32))}}

This latter will be easier to pour into our shape later.

Now we're at the stage where we can turn things from our snowman shape
into the right character for that level. I wrote some quick test bits
at this point to prove to myself that it was working.

Next we'll get our word. I'll cut a long story short by saying that it
took several iterations to get this right. We're going to use
/usr/share/dict/words, which has characters other than lower-case letters
in it. The following gets them out:

| $W="/usr/share/dict/words";
|
| sub w {
|     open(W);
|     while(<W>) {
|         $_ = lc;
|         y/a-z//cd;
|
|         $z = $_
|             if(!int(rand(++$i)));
|     }
|     close(W);
| }

This causes a word to be stored in $z which is from a random line in the
file. The reason this works is a probability argument. You can show that
this is equivalent to picking a random line in a file, and the beauty is
that it's one-pass. It's a standard algorithm, though, so I won't dwell
on it. Note, however that trick you forgot about open where it will use
the scalar with the same identifier as the filehandle by default.

Now we can get a word and print a character, so let's actually try and
print out the snowman. In order to do this we need to store the data.
I chose not the most efficient method, but one sufficiently useful that
it would be easy to code. Any string of numbers represents a decimal
number of spaces, "N" for newline, and the lower-case characters are to
be interpreted by the function c() above.

| $d="12wwwwwN11x5xN9wwxjjjjjxwwN10m7lN9k2z3z2kN10l3z3mN".
|    "10hnbbbbbpiN4qsr2g4y4f2uvtN6se13dvN6c7y7cN6c15cN".
|    "7d6y6eN9i9hNaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN";

and then we need something to print it out:

| sub d {
|     $s = shift();
|     $_ = $d;
|
|     s/N/\n/g;
|
|     s/[a-z]/ c( $&, $s ) /eg;
|
|     s/\d+/ chr(32) x $& /eg;
|
|     $t = $_;
| }

So we now have a copy in $t expanded to be our snowman, and running d(0)
through d(7) and printing $t in each case will produce the snowman as it
appears during the game.

Let's now write an easy routine to process guesses, we might as well get
the bits sorted:

| sub g {
|     $g="";
|     while( $g !~ /^[a-z]/ ) {
|         print "Guess:";
|         $g=<STDIN>;
|     }
|     $g=substr($g,0,1);
|     if(! grep {/$g/} @g ) {
|         push(@g,$g);
|     }
| }

So we get our past guesses in @g, and our current guess in $g.

What we haven't yet done at this point is to work out what our good
and bad guesses are, so let's do that. Writing this routine is
complicated, but we have our good old friend the regular expression
to help us (remember that $z has our chosen word):

| sub z {
|     # make a copy of the word, we don't want to overwrite the
|     # original
|     $Z=$z;
|     # re-calculate our level from scratch
|     $L=0;
|     # set up a regex excluded character class
|     $a="[^";
|     # and an empty array for the bad guesses
|     @G=();
|
|     for(@g) {
|         if( !($Z=~/$_/) ) {
|             # this is a bad guess
|             $L++;
|             push(@G,$_);
|         }
|         else {
|             # this is good, so exclude this from our regex
|             $a .= $_;
|         }
|     }
|
|     # close our character class
|     $a .= " ]";
|     # substitute all characters we *haven't* guessed
|     # correctly for underscores.
|     $Z =~ s/$a/_/g;
|
|     if( $Z !~ /_/ ) {
|         # we've found everything
|         $D = 1;
|     }
| }

So we now have the thing we want to display (for the word) in $Z, we
have the number of bad guesses in $L, and the bad guess list in @G. We
also have $D set if we need to finish. So this is our game engine almost
done.

Let's now put some of this together in a "render" function:

| sub r {
|      # do our game calculations
|      z();
|
|      # get the snowman drawn into $t
|      d($L);
|
|      $t =~ s{
|          ^                    # beginning of a line
|          (                    # beginning of $1
|            [^\n]*\n           # a line
|            [^\n]*\n           # another line
|            [^\n]*             # a third line (without a terminator)
|          )                    # end of $1
|          \n
|      }{
|          $1 . "          " . $Z . "\n"
|      }ex
|      # so we've added our word onto the end of line 3 of the snowman.
|      # this doesn't move because we always draw spaces where there is
|      # nothing else to draw.
|
|      # similarly, we're going to add our bad guesses 3 lines from the
|      # bottom:
|      $t =~ s{
|          (                    # beginning of $1
|            \n                 # end of the third from last line
|            [^\n]*\n           # second from last line
|            [^\n]*\n           # last line
|          )                    # end of $1
|          $                    # end anchor
|      }{
|          "          " . "@G" . $1
|      }ex
|
|      # now we just need to clear the screen before doing it
|      $t="\e[He[J" . $t;
| }

We've now got all of the bits of the game-play, except for the main
loop, we can get guesses, we can hide the word, count the good and
bad guesses, draw the relevant picture on the screen.

So the game loop itself looks like:
| # get the word:
| w();
|
| # while we haven't finished ask for guesses and render our screen
| while ( ! $D ) {
|     r();
|     print $t;
|     if( $L==7 || $D==1 ) {
|         # print the word at the end.
|         print "$z\n";
|         # make this loop terminate
|         $D=1;
|     }
|     else {
|         # Keep guessing...
|         g();
|     }
| }

So we now have a programme we can test, and which works when we put
it all together. We're going to need to pour it into our shape, so
we're going to need to compact it somewhat. So at this point there
are a couple of things to remember. We've used single letter
identifiers everywhere because these are easier to pour. What you
may have known (but hopefully won't) is that you can separate the
sigil and identifier parts with whitespace, thus:
| $foo = $
| a;
*WILL* set $foo to the current value of $a, and not cause a parser
error as you might expect. The other problem we have is that one
keyword in perl is a pain for doing shaped programming (because the
spaces matter), which is "sub" in its non-anonymous form. "sub"
always needs a space because of all of the sub names you might
create. So it's possible to re-write things, e.g.

| $w=sub{open(W);while(<W>){$_=lc;y/a-z//cd;($z=$_)
|        if(!int(rand(++$i)))}close(W)}
| $w->();

This form of writing allows you to write that function with no
whitespace, but it can be separated at least at the following points:

| $ w = sub { open ( W ) ; while ( <W> ) { $ _ = lc ; y/a-z//cd ;
|       ( $ z = $ _ ) if ( ! int ( rand ( ++ $ i ) ) ) } close ( W ) }
| & $ w ;

When pouring, we can often insert extra semi-colons etc. It's worth
remembering that perl has no comment that closes before end of line,
and we often want to reserve the end of line for a "sub" keyword, so
we have a number of choices. The most obvious is nicked straight from
M Brouhat, the master of perl obfuscation, which is to use a regex
in null context: ";/foo/;". We can create useless variables etc.

So the final programme, when poured gives:
| #!/usr/bin/perl
|                         @a=("-","-|\\/\`"
|                         ."',.","_|\\/`:'"
|                         ,"_|\\","_|/","_"
|                         ."|","o","o");sub
|                    d{$s=shift;$_=$d;s/N/\n/g;;
|                    s/[a-z]/c($&,$s)/eg;;s/\d+/
|                        chr(32)x$&/eg;$t=$_
|                       }$W="/usr/share/dict"
|                       ."/words";$d="12wwww"
|                       ."wN11x5xN9wwxjjjjjx"
|                        ."wwN10m7lN9k2z3z".
|               ""        ."2kN10l3z3mN10".        ""
|                .(     "hnbbbbbpiN4qsr2g4y4"     ).
|                 ""   ."f2uvtN6se13dvN6c7y7".   ""
|                  .( "cN6c15cN7d6y6eN9i9hN"."" ).
|                   ("a"x29)."N";;$T="A-Za-z";sub
|                   c{$x=ord(shift)-97;$l=shift()
|                   +1;for(@a){if($l--){if($x>=($
|                   r=y///c)){$x-=$r;next;}return
|                   (substr($_,$x-$r,1))}return#x
|                    chr(32)}}$w=sub{open(W);//;
|                     while(<W>){j();($z=#);sub
|                      $_)if(!int(rand(++$i)))
|                       }close(W)};$w->();sub
|                        z{$Z=$z;$a="[^";$L=
| 0;@G=();for(@g){if(!($Z=~/$_/)){$L++;push(@G,$_)}else{$a.=$_}}$a.=
| "\n]";$Z=~s/$a/_/g;($D=1)if($Z!~/_/)}$R=sub{z;d($L);$t=~s/^([^\n]*
| \n[^\n]*\n[^\n]*)\n/$1.(chr(32)x10).($Z)."\n"/ex;$t=~s/(\n[^\n]*\n
| [^\n]*\n)$/(chr(32)x10)."@G".$1/ex;$t="\e[H\e[J".$t};/morembm/;sub
| g{$g="";while($g!~/^[a-z]/){print"Guess:";$g=<STDIN>}$g=substr($g,
| 0,1);if(!grep{/$g/}@g){push(@g,$g)}}while(!$D){$R->#{r;z;$t="s";do
| ();print$t;if($L==7||$D==1){m;r():;;print"$z\n";$D=1}else{g}};;sub
| j{$_=lc;y/a-z//cd;$_}#Improved_Version_1.1_(c)_Copyright_MBM_2003.

Because I didn't spend too long on this, there are things I could have
done better. The data wasted a lot of space, a more sensible encoding
may well get it and d() to be smaller. d() should also have had a s/\s//g
in it, and I shouldn't have bothered with sticking the string together,
something which wasted a lot of space. There are too many bits of filler,
which could have ended up being got rid of:
(the following gives you some idea)
|                    d{$s=shift;$_=$d;s/N/\n/g;;
                                               ^
|                   ("a"x29)."N";;$T="A-Za-z";sub
                                 ^^^^^^^^^^^^^
|                   r=y///c)){$x-=$r;next;}return
                                         ^
|                   (substr($_,$x-$r,1))}return#x
                    ^                  ^       ^^
|                    chr(32)}}$w=sub{open(W);//;
                                             ^^^
|                     while(<W>){j();($z=#);sub
                                         ^^^^^^

So it's possible to write better than that. (See also last year's
snowflake). This one does at least show how it all fits together,
though, and maybe the Deparse output will make sense... :-)

Enjoy

MBM

-- 
Matthew Byng-Maddick         <mbm colondot.net>           http://colondot.net/

----- End forwarded message -----