----- Forwarded message from Matthew Byng-Maddick ----- Date: Tue, 16 Dec 2003 11:11:12 +0000 From: Matthew Byng-Maddick Subject: Development Methodologies for Obfuscated Code To: "London.pm list" User-Agent: Mutt/1.4.1i During the rather interesting london.java/london.pm crossover meet (interesting for me because I met someone I went to school with who I hadn't seen in about 3 years), I showed a Simon (of the "*mumble* Dick Dastardly *mumble* hehehe" variety) the cam.pm snowman/hangman code. I was showing some of my working files (which I kept), and he said that he thought it might possibly interest some of you if I explained what I did and how the code fits together. It probably won't, but I'll post it anyway. (The original cam.pm message was: http://cam.pm.org/archive/2003-December/001079.html, and last year, I actually ran a competition: http://cam.pm.org/archive/2002-December/000953.html) So, like all good programmers, I started out with a problem to solve. OK, I don't have a problem, my problem is "find an interesting problem and write some code to solve it". OK. So, I had an idea for something involving a snowman (well, it's nearly christmas, and last year's effort was a snowscape, so...). I started off, then by drawing out some ASCII art (not particularly good, but anyway). _____ | | __|_____|__ / \ | o o | \ o / ,`-----'. _\| ' o ` |/_ \/ \/ | o | | | \ o / . , ----------------------------- So, I started figuring out what I could do with this, and hit upon the idea of doing a hangman-alike. You draw the body, then the head, then arms (one at a time) then hat, then buttons, and finally face. The first challenge in all of these kinds of program is how to represent the data. I've got two problems here, on the one hand, I need to draw various characters at various points, but I also need to only draw them after a number of bad guesses. If I can't draw the character (not enough bad guesses, I must draw a space. So we have a sparse matrix, with bad guesses on one side, and characters on the other. | - _ | \ / ` ' , . o --+-------------------- 0 | * (ground) 1 | * * * * * * * * (body) 2 | * * * * * * (head) 3 | * * * (left arm) 4 | * * * (right arm) 5 | * * (hat) 6 | * (buttons) 7 | * (face) The asterisks represent a point in the matrix where we need a character. Counting them, there are 25. This is a useful number, as it allows us to use the alphabet to encode. Initially, I decided on missing out "o" to stop myself confusing it with the character I needed. It turned out that I didn't need to, but that's the way the data stayed. | - _ | \ / ` ' , . o --+-------------------- 0 | a (ground) 1 | b c d e f g h i (body) 2 | j k l m n p (head) 3 | q r s (left arm) 4 | t u v (right arm) 5 | w x (hat) 6 | y (buttons) 7 | z (face) So, we now have a table with an encoding, and we can now draw out our snowman in all his glory. wwwww x x wwxjjjjjxww m l k z z k l z m hnbbbbbpi qsr g y f uvt se dv c y c c c d y e i h aaaaaaaaaaaaaaaaaaaaaaaaaaaaa Doesn't look much like the original. Also note that at this point, no code has yet been written, we've only dealt with trying to encode our data (the most interesting problem of the program). We now need our matrix in a more useful form, so let's re-write it: a 0 - b-i 1 -|\/`',. j-p 2 _|\/`:' q-s 3 _|\ t-v 4 _|/ w-x 5 _| y 6 o z 7 o Note the ":". This is for the missed-out "o". Now we can start writing some code. | # data coding | @a=("-", "-|\\/`',.", "_|\\/`:'", "_|\\", "_|/", "_|", "o", "o"); | | # x is the character, and l is the level | # this function returns the character or a space if the level is | # not sufficiently high. | sub char { | $x=ord(shift()); | $l=shift(); | | $x -= ord("a"); | $l++; | | # loop over each of the coding strings | for(@a) { | # if we have had enough bad guesses | if($l--) { | # if our position is longer than this coding | if($x >= ($r=length($_))) { | # skip over this coding | $x -= $r; | # and go to the next | next; | } | # else return the character from the right | # position in the coding | return substr($_, $x-$r, 1); | } | # level isn't high enough, so print a space | return " "; | } | } The first bit is fine, the second will need tightening up. Let's make it shorter: | sub c{$x=ord(shift())-97;$l=shift()+1; | for(@a){if($l--){$x-=$r,next if($x>=($r | =y///c));return(substr($_,$x-$r,1))} | return(chr(32))}} This latter will be easier to pour into our shape later. Now we're at the stage where we can turn things from our snowman shape into the right character for that level. I wrote some quick test bits at this point to prove to myself that it was working. Next we'll get our word. I'll cut a long story short by saying that it took several iterations to get this right. We're going to use /usr/share/dict/words, which has characters other than lower-case letters in it. The following gets them out: | $W="/usr/share/dict/words"; | | sub w { | open(W); | while() { | $_ = lc; | y/a-z//cd; | | $z = $_ | if(!int(rand(++$i))); | } | close(W); | } This causes a word to be stored in $z which is from a random line in the file. The reason this works is a probability argument. You can show that this is equivalent to picking a random line in a file, and the beauty is that it's one-pass. It's a standard algorithm, though, so I won't dwell on it. Note, however that trick you forgot about open where it will use the scalar with the same identifier as the filehandle by default. Now we can get a word and print a character, so let's actually try and print out the snowman. In order to do this we need to store the data. I chose not the most efficient method, but one sufficiently useful that it would be easy to code. Any string of numbers represents a decimal number of spaces, "N" for newline, and the lower-case characters are to be interpreted by the function c() above. | $d="12wwwwwN11x5xN9wwxjjjjjxwwN10m7lN9k2z3z2kN10l3z3mN". | "10hnbbbbbpiN4qsr2g4y4f2uvtN6se13dvN6c7y7cN6c15cN". | "7d6y6eN9i9hNaaaaaaaaaaaaaaaaaaaaaaaaaaaaaN"; and then we need something to print it out: | sub d { | $s = shift(); | $_ = $d; | | s/N/\n/g; | | s/[a-z]/ c( $&, $s ) /eg; | | s/\d+/ chr(32) x $& /eg; | | $t = $_; | } So we now have a copy in $t expanded to be our snowman, and running d(0) through d(7) and printing $t in each case will produce the snowman as it appears during the game. Let's now write an easy routine to process guesses, we might as well get the bits sorted: | sub g { | $g=""; | while( $g !~ /^[a-z]/ ) { | print "Guess:"; | $g=; | } | $g=substr($g,0,1); | if(! grep {/$g/} @g ) { | push(@g,$g); | } | } So we get our past guesses in @g, and our current guess in $g. What we haven't yet done at this point is to work out what our good and bad guesses are, so let's do that. Writing this routine is complicated, but we have our good old friend the regular expression to help us (remember that $z has our chosen word): | sub z { | # make a copy of the word, we don't want to overwrite the | # original | $Z=$z; | # re-calculate our level from scratch | $L=0; | # set up a regex excluded character class | $a="[^"; | # and an empty array for the bad guesses | @G=(); | | for(@g) { | if( !($Z=~/$_/) ) { | # this is a bad guess | $L++; | push(@G,$_); | } | else { | # this is good, so exclude this from our regex | $a .= $_; | } | } | | # close our character class | $a .= " ]"; | # substitute all characters we *haven't* guessed | # correctly for underscores. | $Z =~ s/$a/_/g; | | if( $Z !~ /_/ ) { | # we've found everything | $D = 1; | } | } So we now have the thing we want to display (for the word) in $Z, we have the number of bad guesses in $L, and the bad guess list in @G. We also have $D set if we need to finish. So this is our game engine almost done. Let's now put some of this together in a "render" function: | sub r { | # do our game calculations | z(); | | # get the snowman drawn into $t | d($L); | | $t =~ s{ | ^ # beginning of a line | ( # beginning of $1 | [^\n]*\n # a line | [^\n]*\n # another line | [^\n]* # a third line (without a terminator) | ) # end of $1 | \n | }{ | $1 . " " . $Z . "\n" | }ex | # so we've added our word onto the end of line 3 of the snowman. | # this doesn't move because we always draw spaces where there is | # nothing else to draw. | | # similarly, we're going to add our bad guesses 3 lines from the | # bottom: | $t =~ s{ | ( # beginning of $1 | \n # end of the third from last line | [^\n]*\n # second from last line | [^\n]*\n # last line | ) # end of $1 | $ # end anchor | }{ | " " . "@G" . $1 | }ex | | # now we just need to clear the screen before doing it | $t="\e[He[J" . $t; | } We've now got all of the bits of the game-play, except for the main loop, we can get guesses, we can hide the word, count the good and bad guesses, draw the relevant picture on the screen. So the game loop itself looks like: | # get the word: | w(); | | # while we haven't finished ask for guesses and render our screen | while ( ! $D ) { | r(); | print $t; | if( $L==7 || $D==1 ) { | # print the word at the end. | print "$z\n"; | # make this loop terminate | $D=1; | } | else { | # Keep guessing... | g(); | } | } So we now have a programme we can test, and which works when we put it all together. We're going to need to pour it into our shape, so we're going to need to compact it somewhat. So at this point there are a couple of things to remember. We've used single letter identifiers everywhere because these are easier to pour. What you may have known (but hopefully won't) is that you can separate the sigil and identifier parts with whitespace, thus: | $foo = $ | a; *WILL* set $foo to the current value of $a, and not cause a parser error as you might expect. The other problem we have is that one keyword in perl is a pain for doing shaped programming (because the spaces matter), which is "sub" in its non-anonymous form. "sub" always needs a space because of all of the sub names you might create. So it's possible to re-write things, e.g. | $w=sub{open(W);while(){$_=lc;y/a-z//cd;($z=$_) | if(!int(rand(++$i)))}close(W)} | $w->(); This form of writing allows you to write that function with no whitespace, but it can be separated at least at the following points: | $ w = sub { open ( W ) ; while ( ) { $ _ = lc ; y/a-z//cd ; | ( $ z = $ _ ) if ( ! int ( rand ( ++ $ i ) ) ) } close ( W ) } | & $ w ; When pouring, we can often insert extra semi-colons etc. It's worth remembering that perl has no comment that closes before end of line, and we often want to reserve the end of line for a "sub" keyword, so we have a number of choices. The most obvious is nicked straight from M Brouhat, the master of perl obfuscation, which is to use a regex in null context: ";/foo/;". We can create useless variables etc. So the final programme, when poured gives: | #!/usr/bin/perl | @a=("-","-|\\/\`" | ."',.","_|\\/`:'" | ,"_|\\","_|/","_" | ."|","o","o");sub | d{$s=shift;$_=$d;s/N/\n/g;; | s/[a-z]/c($&,$s)/eg;;s/\d+/ | chr(32)x$&/eg;$t=$_ | }$W="/usr/share/dict" | ."/words";$d="12wwww" | ."wN11x5xN9wwxjjjjjx" | ."wwN10m7lN9k2z3z". | "" ."2kN10l3z3mN10". "" | .( "hnbbbbbpiN4qsr2g4y4" ). | "" ."f2uvtN6se13dvN6c7y7". "" | .( "cN6c15cN7d6y6eN9i9hN"."" ). | ("a"x29)."N";;$T="A-Za-z";sub | c{$x=ord(shift)-97;$l=shift() | +1;for(@a){if($l--){if($x>=($ | r=y///c)){$x-=$r;next;}return | (substr($_,$x-$r,1))}return#x | chr(32)}}$w=sub{open(W);//; | while(){j();($z=#);sub | $_)if(!int(rand(++$i))) | }close(W)};$w->();sub | z{$Z=$z;$a="[^";$L= | 0;@G=();for(@g){if(!($Z=~/$_/)){$L++;push(@G,$_)}else{$a.=$_}}$a.= | "\n]";$Z=~s/$a/_/g;($D=1)if($Z!~/_/)}$R=sub{z;d($L);$t=~s/^([^\n]* | \n[^\n]*\n[^\n]*)\n/$1.(chr(32)x10).($Z)."\n"/ex;$t=~s/(\n[^\n]*\n | [^\n]*\n)$/(chr(32)x10)."@G".$1/ex;$t="\e[H\e[J".$t};/morembm/;sub | g{$g="";while($g!~/^[a-z]/){print"Guess:";$g=}$g=substr($g, | 0,1);if(!grep{/$g/}@g){push(@g,$g)}}while(!$D){$R->#{r;z;$t="s";do | ();print$t;if($L==7||$D==1){m;r():;;print"$z\n";$D=1}else{g}};;sub | j{$_=lc;y/a-z//cd;$_}#Improved_Version_1.1_(c)_Copyright_MBM_2003. Because I didn't spend too long on this, there are things I could have done better. The data wasted a lot of space, a more sensible encoding may well get it and d() to be smaller. d() should also have had a s/\s//g in it, and I shouldn't have bothered with sticking the string together, something which wasted a lot of space. There are too many bits of filler, which could have ended up being got rid of: (the following gives you some idea) | d{$s=shift;$_=$d;s/N/\n/g;; ^ | ("a"x29)."N";;$T="A-Za-z";sub ^^^^^^^^^^^^^ | r=y///c)){$x-=$r;next;}return ^ | (substr($_,$x-$r,1))}return#x ^ ^ ^^ | chr(32)}}$w=sub{open(W);//; ^^^ | while(){j();($z=#);sub ^^^^^^ So it's possible to write better than that. (See also last year's snowflake). This one does at least show how it all fits together, though, and maybe the Deparse output will make sense... :-) Enjoy MBM -- Matthew Byng-Maddick http://colondot.net/ ----- End forwarded message -----