2 # $Id: Compare.pm 30 2008-06-29 14:04:29Z dave $
7 Array::Compare - Perl extension for comparing arrays.
13 my $comp1 = Array::Compare->new;
15 $comp->Skip({3 => 1, 4 => 1});
19 my $comp2 = Array::Compare->new(Sep => '|',
22 Skip => {3 => 1, 4 => 1});
27 $comp1->compare(\@arr1, \@arr2);
28 $comp2->compare(\@arr1, \@arr2);
32 If you have two arrays and you want to know if they are the same or
33 different, then Array::Compare will be useful to you.
35 All comparisons are carried out via a comparator object. In the
36 simplest usage, you can create and use a comparator object like
42 my $comp = Array::Compare->new;
44 if ($comp->compare(\@arr1, \@arr2)) {
45 print "Arrays are the same\n";
47 print "Arrays are different\n";
50 Notice that you pass references to the two arrays to the comparison
53 Internally the comparator compares the two arrays by using C<join>
54 to turn both arrays into strings and comparing the strings using
55 C<eq>. In the joined strings, the elements of the original arrays
56 are separated with the C<^G> character. This can cause problems if
57 your array data contains C<^G> characters as it is possible that
58 two different arrays can be converted to the same string.
60 To avoid this, it is possible to override the default separator
61 character, either by passing and alternative to the C<new> function
63 my $comp = Array::Compare->new(Sep => '|');
65 or by changing the seperator for an existing comparator object
69 In general you should choose a separator character that won't appear
72 You can also control whether or not whitespace within the elements of
73 the arrays should be considered significant when making the comparison.
74 The default is that all whitespace is significant. The alternative is
75 for all consecutive white space characters to be converted to a single
76 space for the pruposes of the comparison. Again, this can be turned on
77 when creating a comparator object:
79 my $comp = Array::Compare->new(WhiteSpace => 0);
81 or by altering an existing object:
85 You can also control whether or not the case of the data is significant
86 in the comparison. The default is that the case of data is taken into
87 account. This can be changed in the standard ways when creating a new
90 my $comp = Array::Compare->new(Case => 0);
92 or by altering an existing object:
96 In addition to the simple comparison described above (which returns true
97 if the arrays are the same and false if they're different) there is also
98 a full comparison which returns a list containing the indexes of elements
99 which differ between the two arrays. If the arrays are the same it returns
100 an empty list. In scalar context the full comparison returns the length of
101 this list (i.e. the number of elements that differ). You can access the full
102 comparision in two ways. Firstly, there is a C<DefFull> attribute. If this
103 is C<true> then a full comparison if carried out whenever the C<compare>
106 my $comp = Array::Compare->new(DefFull => 1);
107 $comp->compare(\@arr1, \@arr2); # Full comparison
110 $comp->compare(\@arr1, \@arr2); # Simple comparison
113 $comp->compare(\@arr1, \@arr2); # Full comparison again
116 Secondly, you can access the full comparison method directly
118 $comp->full_compare(\@arr1, \@arr2);
120 For symmetry, there is also a direct method to use to call the simple
123 $comp->simple_compare(\@arr1, \@arr2);
125 The final complication is the ability to skip elements in the comparison.
126 If you know that two arrays will always differ in a particular element
127 but want to compare the arrays I<ignoring> this element, you can do it
128 with Array::Compare without taking array slices. To do this, a
129 comparator object has an optional attribute called C<Skip> which is a
130 reference to a hash. The keys in this hash are the indexes of the array
131 elements and the values should be any true value for elements that should
134 For example, if you want to compare two arrays, ignoring the values in
135 elements two and four, you can do something like this:
137 my %skip = (2 => 1, 4 => 1);
138 my @a = (0, 1, 2, 3, 4, 5);
139 my @b = (0, 1, X, 3, X, 5);
141 my $comp = Array::Compare->new(Skip => \%skip);
143 $comp->compare(\@a, \@b);
145 This should return I<true>, as we are explicitly ignoring the columns
148 Of course, having created a comparator object with no skip hash, it is
149 possible to add one later:
151 $comp->Skip({1 => 1, 2 => 1});
155 my %skip = (1 => 1, 2 => 2);
158 To reset the comparator so that no longer skips elements, set the skip
159 hash to an empty hash.
163 You can also check to see if one array is a permutation of another, i.e.
164 they contain the same elements but in a different order.
166 if ($comp->perm(\@a, \@b) {
167 print "Arrays are perms\n";
169 print "Nope. Arrays are completely different\n";
172 In this case the values of C<WhiteSpace> and C<Case> are still used,
173 but C<Skip> is ignored for, hopefully, obvious reasons.
179 package Array::Compare;
184 our ($VERSION, $AUTOLOAD);
190 my %_defaults = (Sep => '^G',
196 =head2 new [ %OPTIONS ]
198 Constructs a new comparison object.
200 Takes an optional hash containing various options that control how
201 comparisons are carried out. Any omitted options take useful defaults.
207 This is the value that is used to separate fields when the array is joined
208 into a string. It should be a value which doesn't appear in your data.
213 Flag that indicates whether or not whitespace is significant in the
214 comparison. If this value is true then all multiple whitespace characters
215 are changed into a single space before the comparison takes place. Default
216 is 1 (whitespace is significant).
220 Flag that indicates whther or not the case of the data should be significant
221 in the comparison. Default is 1 (case is significant).
225 a reference to a hash which contains the numbers of any columns that should
226 be skipped in the comparison. Default is an empty hash (all columns are
231 Flag which indicates whether the default comparison is simple (just returns
232 true if the arrays are the same or false if they're not) or full (returns an
233 array containing the indexes of the columns that differ). Default is 0 (simple
243 my $self = {%_defaults, @_};
251 # Utility function to check the arguments to any of the comparison
252 # function. Ensures that there are two arguments and that they are
257 croak('Must compare two arrays.') unless @_ == 2;
258 croak('Argument 1 is not an array') unless ref($_[0]) eq 'ARRAY';
259 croak('Argument 2 is not an array') unless ref($_[1]) eq 'ARRAY';
264 =head2 compare_len \@ARR1, \@ARR2
266 Very simple comparison. Just checks the lengths of the arrays are
274 $self->_check_args(@_);
276 return @{$_[0]} == @{$_[1]};
279 =head2 compare \@ARR1, \@ARR2
281 Compare the values in two arrays and return a data indicating whether
282 the arrays are the same. The exact return values differ depending on
283 the comparison method used. See the descriptions of L<simple_compare>
284 and L<full_compare> for details.
286 Uses the value of DefFull to determine which comparison routine
294 if ($self->DefFull) {
295 return $self->full_compare(@_);
297 return $self->simple_compare(@_);
301 =head2 simple_compare \@ARR1, \@ARR2
303 Compare the values in two arrays and return a flag indicating whether or
304 not the arrays are the same.
306 Returns true if the arrays are the same or false if they differ.
308 Uses the values of 'Sep', 'WhiteSpace' and 'Skip' to influence
316 $self->_check_args(@_);
318 my ($row1, $row2) = @_;
320 # No point in continuing if the number of elements is different.
321 return unless $self->compare_len(@_);
323 # @check contains the indexes into the two arrays, i.e. the numbers
324 # from 0 to one less than the number of elements.
325 my @check = 0 .. $#$row1;
327 my ($pkg, $caller) = (caller(1))[0, 3];
328 my $perm = $caller eq __PACKAGE__ . "::perm";
330 # Filter @check so it only contains indexes that should be compared.
331 # N.B. Makes no sense to do this if we are called from 'perm'.
333 @check = grep {!(exists $self->Skip->{$_}
334 && $self->Skip->{$_}) } @check
335 if keys %{$self->Skip};
338 # Build two strings by taking array slices containing only the columns
339 # that we shouldn't skip and joining those array slices using the Sep
340 # character. Hopefully we can then just do a string comparison.
341 # Note: this makes the function liable to errors if your arrays
342 # contain the separator character.
343 my $str1 = join($self->Sep, @{$row1}[@check]);
344 my $str2 = join($self->Sep, @{$row2}[@check]);
346 # If whitespace isn't significant, collapse it
347 unless ($self->WhiteSpace) {
352 # If case isn't significant, change to lower case
353 unless ($self->Case) {
358 return $str1 eq $str2;
361 =head2 full_compare \@ARR1, \@ARR2
363 Do a full comparison between two arrays.
365 Checks each individual column. In scalar context returns the number
366 of columns that differ (zero if the arrays are the same). In list
367 context returns an list containing the indexes of the columns that
368 differ (an empty list if the arrays are the same).
370 Uses the values of 'Sep' and 'WhiteSpace' to influence the comparison.
372 B<Note:> If the two arrays are of different lengths then this method
373 just returns the indexes of the elements that appear in one array but
374 not the other (i.e. the indexes from the longer array that are beyond
375 the end of the shorter array). This might be a little
383 $self->_check_args(@_);
385 my ($row1, $row2) = @_;
387 # No point in continuing if the number of elements is different.
388 # Because of the expected return value from this function we can't
389 # just say 'the arrays are different'. We need to do some work to
390 # calculate a meaningful return value.
391 # If we've been called in array context we return a list containing
392 # the number of the columns that appear in the longer list and aren't
393 # in the shorter list. If we've been called in scalar context we
394 # return the difference in the lengths of the two lists.
395 unless ($self->compare_len(@_)) {
398 if ($#{$row1} > $#{$row2}) {
399 ($max, $min) = ($#{$row1}, $#{$row2} + 1);
401 ($max, $min) = ($#{$row2}, $#{$row1} + 1);
403 return ($min .. $max);
405 return abs(@{$row1} - @{$row2});
409 my ($arr1, $arr2) = @_;
413 foreach (0 .. $#{$arr1}) {
414 next if keys %{$self->Skip} && $self->Skip->{$_};
416 my ($val1, $val2) = ($arr1->[$_], $arr2->[$_]);
417 unless ($self->WhiteSpace) {
422 unless ($self->Case) {
427 push @diffs, $_ unless $val1 eq $val2;
430 return wantarray ? @diffs : scalar @diffs;
433 =head2 perm \@ARR1, \@ARR2
435 Check to see if one array is a permutation of the other (i.e. contains
436 the same set of elements, but in a different order).
438 We do this by sorting the arrays and passing references to the assorted
439 versions to simple_compare. There are also some small changes to
440 simple_compare as it should ignore the Skip hash if we are called from
448 return $self->simple_compare([sort @{$_[0]}], [sort @{$_[1]}]);
452 # Attempt to be clever with object attributes.
453 # Each object attribute is always accessed using an access method.
454 # None of these access methods exist in the object code.
455 # If an unknown method is called then the AUTOLOAD method is called
456 # in its place with the same parameters and the variable $AUTOLOAD
457 # set to the name of the unknown method.
459 # In this function we work out which method has been called and
460 # simulate it by returning the correct attribute value (and setting
461 # it to a new value if the method was passed a new value to use).
463 # We're also a little cleverer than that as we create a new method on
464 # the fly so that the next time we call the missing method it has
465 # magically sprung into existance, thereby avoiding the overhead of
466 # calling AUTOLOAD more than once for each method called.
470 my ($self, $val) = @_;
471 my ($name) = $AUTOLOAD =~ m/.*::(\w*)/;
473 *{$AUTOLOAD} = sub { return @_ > 1 ?
474 $_[0]->{$name} = $_[1] :
477 return defined $val ? $self->{$name} = $val : $self->{$name};
481 # One (small) downside of the AUTOLOAD trick, is that we need to
482 # explicitly define a DESTROY method to prevent Perl from passing
483 # those calls to AUTOLOAD. In this case we don't need to do anything.
492 Dave Cross <dave@mag-sol.com>
498 =head1 COPYRIGHT AND LICENSE
500 Copyright (C) 2000-2005, Magnum Solutions Ltd. All Rights Reserved.
502 This script is free software; you can redistribute it and/or modify it
503 under the same terms as Perl itself.