   1 =head1 NAME
   2
   3 perlfilter - Source Filters
   4
   5
   6 =head1 DESCRIPTION
   7
   8 This article is about a little-known feature of Perl called
   9 I<source filters>. Source filters alter the program text of a module
  10 before Perl sees it, much as a C preprocessor alters the source text of
  11 a C program before the compiler sees it. This article tells you more
  12 about what source filters are, how they work, and how to write your
  13 own.
  14
  15 The original purpose of source filters was to let you encrypt your
  16 program source to prevent casual piracy. This isn't all they can do, as
  17 you'll soon learn. But first, the basics.
  18
  19 =head1 CONCEPTS
  20
  21 Before the Perl interpreter can execute a Perl script, it must first
  22 read it from a file into memory for parsing and compilation. (Even
  23 scripts specified on the command line with the C<-e> option are stored in
  24 a temporary file for the parser to process.) If that script itself
  25 includes other scripts with a C<use> or C<require> statement, then each
  26 of those scripts will have to be read from their respective files as
  27 well.
  28
  29 Now think of each logical connection between the Perl parser and an
  30 individual file as a I<source stream>. A source stream is created when
  31 the Perl parser opens a file, it continues to exist as the source code
  32 is read into memory, and it is destroyed when Perl is finished parsing
  33 the file. If the parser encounters a C<require> or C<use> statement in
  34 a source stream, a new and distinct stream is created just for that
  35 file.
  36
  37 The diagram below represents a single source stream, with the flow of
  38 source from a Perl script file on the left into the Perl parser on the
  39 right. This is how Perl normally operates.
  40
  41     file -------> parser
  42
  43 There are two important points to remember:
  44
  45 =over 5
  46
  47 =item 1.
  48
  49 Although there can be any number of source streams in existence at any
  50 given time, only one will be active.
  51
  52 =item 2.
  53
  54 Every source stream is associated with only one file.
  55
  56 =back
  57
  58 A source filter is a special kind of Perl module that intercepts and
  59 modifies a source stream before it reaches the parser. A source filter
  60 changes our diagram like this:
  61
  62     file ----> filter ----> parser
  63
  64 If that doesn't make much sense, consider the analogy of a command
  65 pipeline. Say you have a shell script stored in the compressed file
  66 I<trial.gz>. The simple pipeline command below runs the script without
  67 needing to create a temporary file to hold the uncompressed file.
  68
  69     gunzip -c trial.gz | sh
  70
  71 In this case, the data flow from the pipeline can be represented as follows:
  72
  73     trial.gz ----> gunzip ----> sh
  74
  75 With source filters, you can store the text of your script compressed and use a source filter to uncompress it for Perl's parser:
  76
  77      compressed           gunzip
  78     Perl program ---> source filter ---> parser
  79
  80 =head1 USING FILTERS
  81
  82 So how do you use a source filter in a Perl script? Above, I said that
  83 a source filter is just a special kind of module. Like all Perl
  84 modules, a source filter is invoked with a use statement.
  85
  86 Say you want to pass your Perl source through the C preprocessor before
  87 execution. Older Perls offered a C<-P> command line option for this
  88 (removed in Perl 5.12), but the source filters distribution comes with a C
  89 preprocessor filter module called C<Filter::cpp>. Let's use that instead.
  90
  91 Below is an example program, C<cpp_test>, which makes use of this filter.
  92 Line numbers have been added to allow specific lines to be referenced
  93 easily.
  94
  95     1: use Filter::cpp ;
  96     2: #define TRUE 1
  97     3: $a = TRUE ;
  98     4: print "a = $a\n" ;
  99
 100 When you execute this script, Perl creates a source stream for the
 101 file. Before the parser processes any of the lines from the file, the
 102 source stream looks like this:
 103
 104     cpp_test ---------> parser
 105
 106 Line 1, C<use Filter::cpp>, includes and installs the C<cpp> filter
 107 module. All source filters work this way. The use statement is compiled
 108 and executed at compile time, before any more of the file is read, and
 109 it attaches the cpp filter to the source stream behind the scenes. Now
 110 the data flow looks like this:
 111
 112     cpp_test ----> cpp filter ----> parser
 113
 114 As the parser reads the second and subsequent lines from the source
 115 stream, it feeds those lines through the C<cpp> source filter before
 116 processing them. The C<cpp> filter simply passes each line through the
 117 real C preprocessor. The output from the C preprocessor is then
 118 inserted back into the source stream by the filter.
 119
 120                   .-> cpp --.
 121                   |         |
 122                   |         |
 123                   |       <-'
 124    cpp_test ----> cpp filter ----> parser
 125
 126 The parser then sees the following code:
 127
 128     use Filter::cpp ;
 129     $a = 1 ;
 130     print "a = $a\n" ;
 131
 132 Let's consider what happens when the filtered code includes another
 133 module with use:
 134
 135     1: use Filter::cpp ;
 136     2: #define TRUE 1
 137     3: use Fred ;
 138     4: $a = TRUE ;
 139     5: print "a = $a\n" ;
 140
 141 The C<cpp> filter does not apply to the text of the Fred module, only
 142 to the text of the file that used it (C<cpp_test>). Although the use
 143 statement on line 3 will pass through the cpp filter, the module that
 144 gets included (C<Fred>) will not. The source streams look like this
 145 after line 3 has been parsed and before line 4 is parsed:
 146
 147     cpp_test ---> cpp filter ---> parser (INACTIVE)
 148
 149     Fred.pm ----> parser
 150
 151 As you can see, a new stream has been created for reading the source
 152 from C<Fred.pm>. This stream will remain active until all of C<Fred.pm>
 153 has been parsed. The source stream for C<cpp_test> will still exist,
 154 but is inactive. Once the parser has finished reading Fred.pm, the
 155 source stream associated with it will be destroyed. The source stream
 156 for C<cpp_test> then becomes active again and the parser reads line 4
 157 and subsequent lines from C<cpp_test>.
 158
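The per-stream behaviour described above can be checked with a small experiment. The following is only a sketch: C<FooToBar> is a hypothetical filter invented for this illustration (it rewrites "foo" to "bar"), and the test files are written to a scratch directory and run with a second copy of perl so that the filter never touches the driver script itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Write $text to a file called $name inside the scratch directory.
sub write_file {
    my ($name, $text) = @_;
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
}

# FooToBar is a hypothetical filter: it rewrites "foo" to "bar".
write_file('FooToBar.pm', <<'EOM');
package FooToBar;
use Filter::Util::Call;
sub import {
    filter_add(sub {
        my $status = filter_read();
        s/foo/bar/g if $status > 0;
        return $status;
    });
}
1;
EOM

# Fred.pm is read through its own source stream, which has no filter.
write_file('Fred.pm', <<'EOM');
package Fred;
sub word { return "foo" }
1;
EOM

# Only the text of main.pl passes through FooToBar.
write_file('main.pl', <<'EOM');
use FooToBar;
use Fred;
print "main: foo\n";
print "Fred: ", Fred::word(), "\n";
EOM

open my $run, '-|', $^X, "-I$dir", "$dir/main.pl"
    or die "Cannot run perl: $!";
my @output = <$run>;
close $run;
print @output;    # main: bar, then Fred: foo
```

The "foo" in C<main.pl> is rewritten because that file's stream carries the filter, while the "foo" returned by C<Fred::word> is untouched because C<Fred.pm> was read through a separate stream.
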
 159 You can use more than one source filter on a single file. Similarly,
 160 you can reuse the same filter in as many files as you like.
 161
 162 For example, if you have a uuencoded and compressed source file, it is
 163 possible to stack a uudecode filter and an uncompression filter like
 164 this:
 165
 166     use Filter::uudecode ; use Filter::uncompress ;
 167     M'XL(".H<US4''V9I;F%L')Q;>7/;1I;_>_I3=&E=%:F*I"T?22Q/
 168     M6]9*<IQCO*XFT"0[PL%%'Y+IG?WN^ZYN-$'J.[.JE$,20/?K=_[>
 169     ...
 170
 171 Once the first line has been processed, the flow will look like this:
 172
 173     file ---> uudecode ---> uncompress ---> parser
 174                filter         filter
 175
 176 Data flows through filters in the same order they appear in the source
 177 file. The uudecode filter appeared before the uncompress filter, so the
 178 source file will be uudecoded before it's uncompressed.
 179
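The ordering rule can be observed directly with two toy filters. This is a sketch under stated assumptions: C<First> and C<Second> are hypothetical modules made up for the demonstration (they are not part of the source filters distribution), and each performs a single substitution so that the order in which they run is visible in the output.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Write $text to a file called $name inside the scratch directory.
sub write_file {
    my ($name, $text) = @_;
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
}

# Generate two hypothetical filters:
# First rewrites foo to bar, Second rewrites bar to baz.
for my $spec (['First', 'foo', 'bar'], ['Second', 'bar', 'baz']) {
    my ($pkg, $from, $to) = @$spec;
    write_file("$pkg.pm", <<"EOM");
package $pkg;
use Filter::Util::Call;
sub import {
    filter_add(sub {
        my \$status = filter_read();
        s/$from/$to/g if \$status > 0;
        return \$status;
    });
}
1;
EOM
}

write_file('stacked.pl',  <<'EOM');
use First;
use Second;
print "foo\n";
EOM

write_file('reversed.pl', <<'EOM');
use Second;
use First;
print "foo\n";
EOM

# Run one of the scripts with a separate perl and capture what it prints.
sub run_script {
    my ($name) = @_;
    open my $run, '-|', $^X, "-I$dir", "$dir/$name"
        or die "Cannot run perl: $!";
    my $output = do { local $/; <$run> };
    close $run;
    return $output;
}

my $stacked  = run_script('stacked.pl');    # foo -> bar -> baz
my $reversed = run_script('reversed.pl');   # Second sees no "bar"; First gives bar
print "First then Second: $stacked";
print "Second then First: $reversed";
```

With C<First> installed before C<Second>, the text passes through both substitutions in turn. With the order reversed, C<Second> runs while the text still says "foo", so only C<First> has any effect.
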
 180 =head1 WRITING A SOURCE FILTER
 181
 182 There are three ways to write your own source filter. You can write it
 183 in C, use an external program as a filter, or write the filter in Perl.
 184 I won't cover the first two in any great detail, so I'll get them out
 185 of the way first. Writing the filter in Perl is most convenient, so
 186 I'll devote the most space to it.
 187
 188 =head1 WRITING A SOURCE FILTER IN C
 189
 190 The first of the three available techniques is to write the filter
 191 completely in C. The external module you create interfaces directly
 192 with the source filter hooks provided by Perl.
 193
 194 The advantage of this technique is that you have complete control over
 195 the implementation of your filter. The big disadvantage is the
 196 increased complexity required to write the filter - not only do you
 197 need to understand the source filter hooks, but you also need a
 198 reasonable knowledge of Perl guts. One of the few times it is worth
 199 going to this trouble is when writing a source scrambler. The
 200 C<decrypt> filter (which unscrambles the source before Perl parses it)
 201 included with the source filter distribution is an example of a C
 202 source filter (see Decryption Filters, below).
 203
 204
 205 =over 5
 206
 207 =item B<Decryption Filters>
 208
 209 All decryption filters work on the principle of "security through
 210 obscurity." Regardless of how well you write a decryption filter and
 211 how strong your encryption algorithm, anyone determined enough can
 212 retrieve the original source code. The reason is quite simple - once
 213 the decryption filter has decrypted the source back to its original
 214 form, fragments of it will be stored in the computer's memory as Perl
 215 parses it. The source might only be in memory for a short period of
 216 time, but anyone possessing a debugger, skill, and lots of patience can
 217 eventually reconstruct your program.
 218
 219 That said, there are a number of steps that can be taken to make life
 220 difficult for the potential cracker. The most important: Write your
 221 decryption filter in C and statically link the decryption module into
 222 the Perl binary. For further tips to make life difficult for the
 223 potential cracker, see the file I<decrypt.pm> in the source filters
 224 module.
 225
 226 =back
 227
 228 =head1 CREATING A SOURCE FILTER AS A SEPARATE EXECUTABLE
 229
 230 An alternative to writing the filter in C is to create a separate
 231 executable in the language of your choice. The separate executable
 232 reads from standard input, does whatever processing is necessary, and
 233 writes the filtered data to standard output. C<Filter::cpp> is an
 234 example of a source filter implemented as a separate executable - the
 235 executable is the C preprocessor bundled with your C compiler.
 236
 237 The source filter distribution includes two modules that simplify this
 238 task: C<Filter::exec> and C<Filter::sh>. Both allow you to run any
 239 external executable. Both use a coprocess to control the flow of data
 240 into and out of the external executable. (For details on coprocesses,
 241 see Stevens, W.R. "Advanced Programming in the UNIX Environment."
 242 Addison-Wesley, ISBN 0-201-56317-7, pages 441-445.) The difference
 243 between them is that C<Filter::exec> spawns the external command
 244 directly, while C<Filter::sh> spawns a shell to execute the external
 245 command. (Unix uses the Bourne shell; NT uses the cmd shell.) Spawning
 246 a shell allows you to make use of the shell metacharacters and
 247 redirection facilities.
 248
 249 Here is an example script that uses C<Filter::sh>:
 250
 251     use Filter::sh 'tr XYZ PQR' ;
 252     $a = 1 ;
 253     print "XYZ a = $a\n" ;
 254
 255 The output you'll get when the script is executed:
 256
 257     PQR a = 1
 258
 259 Writing a source filter as a separate executable works fine, but a
 260 small performance penalty is incurred. For example, if you execute the
 261 small example above, a separate subprocess will be created to run the
 262 Unix C<tr> command. Each use of the filter requires its own subprocess.
 263 If creating subprocesses is expensive on your system, you might want to
 264 consider one of the other options for creating source filters.
 265
 266 =head1 WRITING A SOURCE FILTER IN PERL
 267
 268 The easiest and most portable option available for creating your own
 269 source filter is to write it completely in Perl. To distinguish this
 270 from the previous two techniques, I'll call it a Perl source filter.
 271
 272 To help understand how to write a Perl source filter we need an example
 273 to study. Here is a complete source filter that performs rot13
 274 decoding. (Rot13 is a very simple encryption scheme used in Usenet
 275 postings to hide the contents of offensive posts. It moves every letter
 276 forward thirteen places, so that A becomes N, B becomes O, and Z
 277 becomes M.)
 278
 279
 280    package Rot13 ;
 281
 282    use Filter::Util::Call ;
 283
 284    sub import {
 285       my ($type) = @_ ;
 286       my ($ref) = [] ;
 287       filter_add(bless $ref) ;
 288    }
 289
 290    sub filter {
 291       my ($self) = @_ ;
 292       my ($status) ;
 293
 294       tr/n-za-mN-ZA-M/a-zA-Z/
 295          if ($status = filter_read()) > 0 ;
 296       $status ;
 297    }
 298
 299    1;
 300
 301 All Perl source filters are implemented as Perl classes and have the
 302 same basic structure as the example above.
 303
 304 First, we include the C<Filter::Util::Call> module, which exports a
 305 number of functions into your filter's namespace. The filter shown
 306 above uses two of these functions, C<filter_add()> and
 307 C<filter_read()>.
 308
 309 Next, we create the filter object and associate it with the source
 310 stream by defining the C<import> function. If you know Perl well
 311 enough, you know that C<import> is called automatically every time a
 312 module is included with a use statement. This makes C<import> the ideal
 313 place to both create and install a filter object.
 314
 315 In the example filter, the object (C<$ref>) is blessed just like any
 316 other Perl object. Our example uses an anonymous array, but this isn't
 317 a requirement. Because this example doesn't need to store any context
 318 information, we could have used a scalar or hash reference just as
 319 well. The next section demonstrates context data.
 320
 321 The association between the filter object and the source stream is made
 322 with the C<filter_add()> function. This takes a filter object as a
 323 parameter (C<$ref> in this case) and installs it in the source stream.
 324
 325 Finally, there is the code that actually does the filtering. For this
 326 type of Perl source filter, all the filtering is done in a method
 327 called C<filter()>. (It is also possible to write a Perl source filter
 328 using a closure. See the C<Filter::Util::Call> manual page for more
 329 details.) It's called every time the Perl parser needs another line of
 330 source to process. The C<filter()> method, in turn, reads lines from
 331 the source stream using the C<filter_read()> function.
 332
 333 If a line was available from the source stream, C<filter_read()>
 334 returns a status value greater than zero and appends the line to C<$_>.
 335 A status value of zero indicates end-of-file, and a value less than zero
 336 means an error. The filter function itself is expected to return its status in
 337 the same way, and put the filtered line it wants written to the source
 338 stream in C<$_>. The use of C<$_> accounts for the brevity of most Perl
 339 source filters.
 340
 341 In order to make use of the rot13 filter we need some way of encoding
 342 the source file in rot13 format. The script below, C<mkrot13>, does
 343 just that.
 344
 345     die "usage mkrot13 filename\n" unless @ARGV ;
 346     my $in = $ARGV[0] ;
 347     my $out = "$in.tmp" ;
 348     open(IN, "<$in") or die "Cannot open file $in: $!\n";
 349     open(OUT, ">$out") or die "Cannot open file $out: $!\n";
 350
 351     print OUT "use Rot13;\n" ;
 352     while (<IN>) {
 353        tr/a-zA-Z/n-za-mN-ZA-M/ ;
 354        print OUT ;
 355     }
 356
 357     close IN;
 358     close OUT;
 359     unlink $in;
 360     rename $out, $in;
 361
 362 If we encrypt this with C<mkrot13>:
 363
 364     print "hello fred\n" ;
 365
 366 the result will be this:
 367
 368     use Rot13;
 369     cevag "uryyb serq\a" ;
 370
 371 Running it produces this output:
 372
 373     hello fred
 374
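The two C<tr> tables used by C<mkrot13> and by the C<Rot13> filter undo one another, which is why the encoded file runs normally. A minimal sketch of that round trip, using plain subroutines rather than a real filter (the names C<rot13_encode> and C<rot13_decode> are made up for this illustration):

```perl
use strict;
use warnings;

# Same translation table as the mkrot13 script.
sub rot13_encode {
    my ($text) = @_;
    $text =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $text;
}

# Same translation table as the Rot13 filter.
sub rot13_decode {
    my ($text) = @_;
    $text =~ tr/n-za-mN-ZA-M/a-zA-Z/;
    return $text;
}

my $plain   = 'print "hello fred\n" ;';
my $encoded = rot13_encode($plain);
my $decoded = rot13_decode($encoded);

print "$encoded\n";    # cevag "uryyb serq\a" ;
print "$decoded\n";    # print "hello fred\n" ;
```

Only letters are translated, so quotes, backslashes, and punctuation survive unchanged. That is why the C<\n> escape in the sample line shows up as C<\a> in the encoded text.
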
 375 =head1 USING CONTEXT: THE DEBUG FILTER
 376
 377 The rot13 example was trivial. Here's another demonstration
 378 that shows off a few more features.
 379
 380 Say you wanted to include a lot of debugging code in your Perl script
 381 during development, but you didn't want it available in the released
 382 product. Source filters offer a solution. In order to keep the example
 383 simple, let's say you wanted the debugging output to be controlled by
 384 an environment variable, C<DEBUG>. Debugging code is enabled if the
 385 variable exists, otherwise it is disabled.
 386
 387 Two special marker lines will bracket debugging code, like this:
 388
 389     ## DEBUG_BEGIN
 390     if ($year > 1999) {
 391        warn "Debug: millennium bug in year $year\n" ;
 392     }
 393     ## DEBUG_END
 394
 395 The filter ensures that Perl parses the code between the
 396 C<DEBUG_BEGIN> and C<DEBUG_END> markers only when the C<DEBUG>
 397 environment variable exists. That means that when C<DEBUG> does exist, the code above
 398 should be passed through the filter unchanged. The marker lines can
 399 also be passed through as-is, because the Perl parser will see them as
 400 comment lines. When C<DEBUG> isn't set, we need a way to disable the
 401 debug code. A simple way to achieve that is to convert the lines
 402 between the two markers into comments:
 403
 404     ## DEBUG_BEGIN
 405     #if ($year > 1999) {
 406     #     warn "Debug: millennium bug in year $year\n" ;
 407     #}
 408     ## DEBUG_END
 409
 410 Here is the complete Debug filter:
 411
 412     package Debug;
 413
 414     use strict;
 415     use warnings;
 416     use Filter::Util::Call ;
 417
 418     use constant TRUE => 1 ;
 419     use constant FALSE => 0 ;
 420
 421     sub import {
 422        my ($type) = @_ ;
 423        my (%context) = (
 424          Enabled => defined $ENV{DEBUG},
 425          InTraceBlock => FALSE,
 426          Filename => (caller)[1],
 427          LineNo => 0,
 428          LastBegin => 0,
 429        ) ;
 430        filter_add(bless \%context) ;
 431     }
 432
 433     sub Die {
 434        my ($self) = shift ;
 435        my ($message) = shift ;
 436        my ($line_no) = shift || $self->{LastBegin} ;
 437        die "$message at $self->{Filename} line $line_no.\n"
 438     }
 439
 440     sub filter {
 441        my ($self) = @_ ;
 442        my ($status) ;
 443        $status = filter_read() ;
 444        ++ $self->{LineNo} ;
 445
 446        # deal with EOF/error first
 447        if ($status <= 0) {
 448            $self->Die("DEBUG_BEGIN has no DEBUG_END")
 449                if $self->{InTraceBlock} ;
 450            return $status ;
 451        }
 452
 453        if ($self->{InTraceBlock}) {
 454           if (/^\s*##\s*DEBUG_BEGIN/ ) {
 455               $self->Die("Nested DEBUG_BEGIN", $self->{LineNo})
 456           } elsif (/^\s*##\s*DEBUG_END/) {
 457               $self->{InTraceBlock} = FALSE ;
 458           }
 459
 460           # comment out the debug lines when the filter is disabled
 461           s/^/#/ if ! $self->{Enabled} ;
 462        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
 463           $self->{InTraceBlock} = TRUE ;
 464           $self->{LastBegin} = $self->{LineNo} ;
 465        } elsif ( /^\s*##\s*DEBUG_END/ ) {
 466           $self->Die("DEBUG_END has no DEBUG_BEGIN", $self->{LineNo});
 467        }
 468        return $status ;
 469     }
 470
 471     1 ;
 472
 473 The big difference between this filter and the previous example is the
 474 use of context data in the filter object. The filter object is based on
 475 a hash reference, and is used to keep various pieces of context
 476 information between calls to the filter function. All but two of the
 477 hash fields are used for error reporting. The first of those two,
 478 Enabled, is used by the filter to determine whether the debugging code
 479 should be given to the Perl parser. The second, InTraceBlock, is true
 480 when the filter has encountered a C<DEBUG_BEGIN> line, but has not yet
 481 encountered the following C<DEBUG_END> line.
 482
 483 If you ignore all the error checking that most of the code does, the
 484 essence of the filter is as follows:
 485
 486     sub filter {
 487        my ($self) = @_ ;
 488        my ($status) ;
 489        $status = filter_read() ;
 490
 491        # deal with EOF/error first
 492        return $status if $status <= 0 ;
 493        if ($self->{InTraceBlock}) {
 494           if (/^\s*##\s*DEBUG_END/) {
 495              $self->{InTraceBlock} = FALSE
 496           }
 497
 498           # comment out debug lines when the filter is disabled
 499           s/^/#/ if ! $self->{Enabled} ;
 500        } elsif ( /^\s*##\s*DEBUG_BEGIN/ ) {
 501           $self->{InTraceBlock} = TRUE ;
 502        }
 503        return $status ;
 504     }
 505
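To see what this logic does to a block of text without installing a filter, the same state machine can be run over an ordinary list of lines. This is a sketch only: C<apply_debug_rule> is a hypothetical helper written for this illustration, and its C<$enabled> argument stands in for the C<Enabled> field of the filter object.

```perl
use strict;
use warnings;

# Apply the Debug filter's comment-out rule to a list of lines.
# $enabled plays the role of the Enabled field in the filter object.
sub apply_debug_rule {
    my ($enabled, @lines) = @_;
    my $in_trace_block = 0;
    for (@lines) {
        if ($in_trace_block) {
            $in_trace_block = 0 if /^\s*##\s*DEBUG_END/;
            # comment out the line when debugging is disabled
            s/^/#/ unless $enabled;
        }
        elsif (/^\s*##\s*DEBUG_BEGIN/) {
            $in_trace_block = 1;
        }
    }
    return @lines;
}

my @source = (
    "## DEBUG_BEGIN\n",
    "warn \"in debug block\\n\" ;\n",
    "## DEBUG_END\n",
    "print \"always runs\\n\" ;\n",
);

print "DEBUG unset:\n", apply_debug_rule(0, @source);
print "DEBUG set:\n",   apply_debug_rule(1, @source);
```

Note that, as in the filter code, the C<DEBUG_END> marker line itself also receives an extra leading C<#> when debugging is disabled. That is harmless because the line is already a comment.
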
 506 Be warned: just as the C preprocessor doesn't know C, the Debug filter
 507 doesn't know Perl. It can be fooled quite easily:
 508
 509     print <<EOM;
 510     ##DEBUG_BEGIN
 511     EOM
 512
 513 Such things aside, you can see that a lot can be achieved with a modest
 514 amount of code. I<[Note that Tuomas' toy VRML parser on p. 17 had the
 515 same difficulty parsing VRML strings that look like comments. -Jon]>
 516
 517 =head1 CONCLUSION
 518
 519 You now have a better understanding of what a source filter is, and you
 520 might even have a possible use for one. If you feel like playing with
 521 source filters but need a bit of inspiration, here are some extra
 522 features you could add to the Debug filter.
 523
 524 First, an easy one. Rather than having debugging code that is
 525 all-or-nothing, it would be much more useful to be able to control
 526 which specific blocks of debugging code get included. Try extending the
 527 syntax for debug blocks to allow each to be identified. The contents of
 528 the C<DEBUG> environment variable can then be used to control which
 529 blocks get included.
 530
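One possible shape for that extension is sketched below. Everything in it is an assumption made for illustration: the marker syntax C<## DEBUG_BEGIN name>, the idea that C<DEBUG> holds a comma-separated list of block names, and the helper name C<apply_named_debug_rule>. It works on a plain list of lines so that the selection rule can be tried out before it is wired into a real filter.

```perl
use strict;
use warnings;

# Hypothetical extended syntax: "## DEBUG_BEGIN name" ... "## DEBUG_END".
# $wanted is what the DEBUG environment variable might hold, e.g. "io,parser".
sub apply_named_debug_rule {
    my ($wanted, @lines) = @_;
    my %enabled = map { $_ => 1 }
                  grep { length } split /\s*,\s*/, ($wanted // '');
    my $current;    # name of the block we are inside, if any
    for (@lines) {
        if (defined $current) {
            my $keep = $enabled{$current};
            undef $current if /^\s*##\s*DEBUG_END/;
            # comment out lines of blocks that were not asked for
            s/^/#/ unless $keep;
        }
        elsif (/^\s*##\s*DEBUG_BEGIN\s+(\w+)/) {
            $current = $1;
        }
    }
    return @lines;
}

my @source = (
    "## DEBUG_BEGIN io\n",
    "warn 'io';\n",
    "## DEBUG_END\n",
    "## DEBUG_BEGIN parser\n",
    "warn 'parser';\n",
    "## DEBUG_END\n",
);

# Keep only the "io" block; the "parser" block is commented out.
print apply_named_debug_rule('io', @source);
```

In a real filter the block name would live in the filter object alongside C<InTraceBlock>, and the set of enabled names would be computed once in C<import>.
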
 531 Once you can identify individual blocks, try allowing them to be
 532 nested. That isn't difficult either.
 533
 534 Here is an interesting idea that doesn't involve the Debug filter.
 535 Currently Perl subroutines have fairly limited support for formal
 536 parameter lists. You can specify the number of parameters and their
 537 type, but you still have to manually take them out of the C<@_> array
 538 yourself. Write a source filter that allows you to have a named
 539 parameter list. Such a filter would turn this:
 540
 541     sub MySub ($first, $second, @rest) { ... }
 542
 543 into this:
 544
 545     sub MySub($$@) {
 546        my ($first) = shift ;
 547        my ($second) = shift ;
 548        my (@rest) = @_ ;
 549        ...
 550     }
 551
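As a starting point, here is a sketch of the rewriting step on its own. It is deliberately naive and rests on several assumptions: C<expand_named_params> is a hypothetical helper, it only recognises a header that sits on one line and ends with the opening brace, and it assumes any array or hash parameter comes last. A real filter would call something like it on each line it reads.

```perl
use strict;
use warnings;

# Rewrite one "sub NAME (params) {" header line into a prototype plus
# "my" declarations. Lines that do not match are returned unchanged.
sub expand_named_params {
    my ($line) = @_;
    my $param = qr/[\$\@%]\w+/;
    return $line
        unless $line =~ /^(\s*)sub\s+(\w+)\s*\(\s*($param(?:\s*,\s*$param)*)\s*\)\s*\{\s*$/;
    my ($indent, $name, $list) = ($1, $2, $3);

    my @params = split /\s*,\s*/, $list;
    # The prototype is just the sigil of each parameter, in order.
    my $proto = join '', map { substr($_, 0, 1) } @params;

    my @decls;
    for my $p (@params) {
        # Scalars are shifted off @_; an array or hash takes what is left.
        my $source = $p =~ /^\$/ ? 'shift' : '@_';
        push @decls, "$indent   my ($p) = $source ;";
    }
    return join("\n", "${indent}sub $name($proto) {", @decls) . "\n";
}

print expand_named_params('sub MySub ($first, $second, @rest) {' . "\n");
```

Like the Debug filter, this knows nothing about Perl syntax beyond the one pattern it matches, so it can be fooled by the same kinds of input (a matching line inside a here-document, for example).
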
 552 Finally, if you feel like a real challenge, have a go at writing a
 553 full-blown Perl macro preprocessor as a source filter. Borrow the
 554 useful features from the C preprocessor and any other macro processors
 555 you know. The tricky bit will be choosing how much knowledge of Perl's
 556 syntax you want your filter to have.
 557
 558 =head1 REQUIREMENTS
 559
 560 The Source Filters distribution is available on CPAN, in
 561
 562     CPAN/modules/by-module/Filter
 563
 564 =head1 AUTHOR
 565
 566 Paul Marquess E<lt>Paul.Marquess@btinternet.comE<gt>
 567
 568 =head1 COPYRIGHTS
 569
 570 This article originally appeared in The Perl Journal #11, and is
 571 copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and
 572 The Perl Journal.  This document may be distributed under the same terms
 573 as Perl itself.