Writing simple example programs to solve example problems in a book helps you learn a language in the small. Yet writing real programs requires more than learning the syntax of a language, or its design principles, or even how to find and use its libraries.
Practical programming requires you to manage code: to organize it, to know that it works, to make it robust in the face of errors of logic or intent, and to do all of this in a concise, clear, and maintainable fashion. Fortunately, modern Perl provides many tools and techniques to write real programs--from testing to the organization of your source code.
Testing is the process of writing and running automated verifications that your software behaves as intended, in whole or in part. At its heart, this is an automation of a process you've performed countless times already: write a bit of code, run it, and see if it works. The difference is in the automation. Rather than relying on humans to perform each manual check perfectly every time, let the computer handle the repetition.
Perl 5 provides great tools to help you write good and useful automated tests.
Perl testing begins with the core module Test::More and its ok() function. ok() takes two parameters, a boolean value and a string describing the purpose of the test:
ok( 1, 'the number one should be true' );
ok( 0, '... and the number zero should not' );
ok( '', 'the empty string should be false' );
ok( '!', '... and a non-empty string should not' );
Ultimately, any condition you can test for in your program should become a binary value. Does the code work as I intended? A complex program may have thousands of these individual conditions. In general, the smaller the granularity the better. The purpose of writing individual assertions is to isolate individual features to understand what doesn't work as you intended and what ceases to work after you make changes in the future.
This snippet isn't a complete test script, however. Test::More and related modules require the use of a test plan, which represents the number of individual tests you plan to run:
use Test::More tests => 4;
ok( 1, 'the number one should be true' );
ok( 0, '... and the number zero should not' );
ok( '', 'the empty string should be false' );
ok( '!', '... and a non-empty string should not' );
The tests argument to Test::More sets the test plan for the program. This gives the test an additional assertion: if fewer or more than four tests ran, something went wrong. That assertion is unlikely to be useful in this simple scenario, but it can catch bugs in code that seems too simple to have errors (footnote: As a rule, any code you brag about being too simple to contain errors will contain errors at the least opportune moment.).
The resulting program is now a full-fledged Perl 5 program which produces the output:
1..4
ok 1 - the number one should be true
not ok 2 - ... and the number zero should not
# Failed test '... and the number zero should not'
# at truth_values.t line 4.
not ok 3 - the empty string should be false
# Failed test 'the empty string should be false'
# at truth_values.t line 5.
ok 4 - ... and a non-empty string should not
# Looks like you failed 2 tests of 4.
This format adheres to a standard of test output called TAP, the Test Anything Protocol (http://testanything.org/). As part of this protocol, failed tests produce diagnostic messages. This is a tremendous aid to debugging.
The output of a test file containing multiple assertions (especially multiple failed assertions) can be verbose. In most cases, you want to know either that everything passed or that x, y, and z failed. The core module Test::Harness interprets TAP and displays only the most pertinent information. It also provides a program called prove which takes the hard work out of the process:
$ prove truth_values.t
truth_values.t .. 1/4
# Failed test '... and the number zero should not'
# at truth_values.t line 4.
# Failed test 'the empty string should be false'
# at truth_values.t line 5.
# Looks like you failed 2 tests of 4.
truth_values.t .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/4 subtests
Test Summary Report
-------------------
truth_values.t (Wstat: 512 Tests: 4 Failed: 2)
Failed tests: 2-3
That's a lot of output to display what is already obvious: the second and third tests fail because zero and the empty string evaluate to false. It's easy to fix those failures by inverting the sense of each condition with boolean coercion (Boolean Coercion):
ok( ! 0, '... and the number zero should not' );
ok( ! '', 'the empty string should be false' );
With those two changes, prove now displays:
$ prove truth_values.t
truth_values.t .. ok
All tests successful.
Even though the heart of all automated testing is the boolean condition "is this true or false?", reducing everything to that boolean condition is tedious and offers few diagnostic possibilities. Test::More provides several other convenient functions to ensure that your code behaves as you intend.
The is() function compares two values using the eq operator. If the values are equal, the test passes. Otherwise, the test fails and provides a relevant diagnostic message:
is( 4, 2 + 2, 'addition should hold steady across the universe' );
is( 'pancake', 100, 'pancakes should have a delicious numeric value' );
As you might expect, the first test passes and the second fails:
t/is_tests.t .. 1/2
# Failed test 'pancakes should have a delicious numeric value'
# at t/is_tests.t line 8.
# got: 'pancake'
# expected: '100'
# Looks like you failed 1 test of 2.
Where ok() only provides the line number of the failing test, is() displays the mismatched values.
is() applies implicit scalar context to its values. This means, for example, that you can check the number of elements in an array without explicitly evaluating the array in scalar context:
my @cousins = qw( Rick Kristen Alex Kaycee Eric Corey );
is( @cousins, 6, 'I should have only six cousins' );
... though some people prefer to write scalar @cousins for the sake of clarity.
Test::More provides a corresponding isnt() function which passes if the provided values are not equal (according to the ne operator). Otherwise, it behaves the same way as is() with respect to scalar context and comparison types.
Both is() and isnt() apply string comparisons with the Perl 5 operators eq and ne. This almost always does the right thing, but for complex values such as objects with overloading (Overloading) or dualvars (Dualvars), you may prefer explicit comparison testing. The cmp_ok() function allows you to specify your own comparison operator:
cmp_ok( 100, '<=', $cur_balance, 'I should have at least $100' );
cmp_ok( $monkey, '==', $ape, 'Simian numifications should agree' );
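To make the difference between string and numeric comparison concrete, here is a minimal, self-contained test script that assumes only the core Test::More module; $cur_balance and its value are hypothetical. As strings, '1.0' and '1' differ, but as numbers they are equal, so cmp_ok() with an explicit numeric operator passes where is() would report a mismatch:

```perl
use strict;
use warnings;
use Test::More tests => 3;

my $got = '1.0';

# isnt() compares with ne: as strings, '1.0' and '1' differ.
# (is( $got, 1 ) would fail for the same reason.)
isnt( $got, '1', "'1.0' and '1' are different strings" );

# cmp_ok() takes: got, operator, expected, description
cmp_ok( $got, '==', 1, "'1.0' and 1 are the same number" );

my $cur_balance = 150;    # hypothetical value for illustration
cmp_ok( $cur_balance, '>=', 100, 'I should have at least $100' );
```

Note the argument order: the operator sits between the two values, so each call reads like the comparison it performs.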
Classes and objects provide their own interesting ways to interact with tests. Test that a class or object extends another class (Inheritance) with isa_ok():
my $chimpzilla = RobotMonkey->new();
isa_ok( $chimpzilla, 'Robot' );
isa_ok( $chimpzilla, 'Monkey' );
isa_ok() provides its own diagnostic message on failure.
can_ok() verifies that a class or object can perform the requested method (or methods):
can_ok( $chimpzilla, 'eat_banana' );
can_ok( $chimpzilla, 'transform', 'destroy_tokyo' );
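The isa_ok() and can_ok() calls above assume a RobotMonkey class already exists. As a sketch, the following script defines hypothetical Robot, Monkey, and RobotMonkey classes inline (the method bodies are placeholders) so that the same four assertions can run on their own:

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical classes, defined inline so the script is self-contained
package Robot;
sub new       { my $class = shift; return bless {}, $class }
sub transform { return 'transformed' }

package Monkey;
sub eat_banana { return 'nom' }

package RobotMonkey;
our @ISA = ( 'Robot', 'Monkey' );    # inherits from both classes
sub destroy_tokyo { return 'rampage' }

package main;

my $chimpzilla = RobotMonkey->new();    # Robot::new, found through @ISA
isa_ok( $chimpzilla, 'Robot' );
isa_ok( $chimpzilla, 'Monkey' );
can_ok( $chimpzilla, 'eat_banana' );                   # inherited from Monkey
can_ok( $chimpzilla, 'transform', 'destroy_tokyo' );   # inherited and local
```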
The is_deeply() function compares two references to ensure that their contents are equal:
use Clone;
my $numbers = [ 4, 8, 15, 16, 23, 42 ];
my $clonenums = Clone::clone( $numbers );
is_deeply( $numbers, $clonenums,
'Clone::clone() should produce identical structures' );
If the comparison fails, Test::More will do its best to provide a reasonable diagnostic indicating the position of the first inequality between the structures. See the CPAN modules Test::Differences and Test::Deep for more configurable tests.
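The following sketch shows why is_deeply() exists; it avoids the CPAN Clone module by building two structures by hand, and the data is purely illustrative. Two separately constructed hashes with the same nested contents pass is_deeply(), yet as references they are distinct, so a plain string comparison of the references themselves sees them as different:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# two separately built structures with identical nested contents
my %expected = ( name => 'Bender', parts => [ 'arm', 'leg' ] );
my %got      = ( name => 'Bender', parts => [ 'arm', 'leg' ] );

# is_deeply() walks both structures and compares their contents
is_deeply( \%got, \%expected, 'nested structures should match' );

# isnt() compares the stringified references (their addresses), not contents
isnt( \%got, \%expected, 'the references point to different hashes' );
```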
Test::More has several more test functions, but these are the most useful.
CPAN's infrastructure and ecosystem expect distributions to include a t/ directory containing one or more test files named with the .t suffix. By default, when you build a distribution with Module::Build or ExtUtils::MakeMaker, the testing step runs all of the t/*.t files, summarizes their output, and succeeds or fails on the results of the test suite as a whole. There are no concrete guidelines on how to manage the contents of individual .t files, though two strategies are popular:
Each .t file should correspond to a .pm file.
Each .t file should correspond to a feature.
The important considerations are maintainability of the test files, as larger files are more difficult to maintain than smaller files, and the granularity of the test suite. A hybrid approach is the most flexible; one test can verify that all of your modules compile, while other tests verify that each module behaves as intended.
It's often useful to run tests only for a specific feature under development. If you're adding the ability to breathe fire to your RobotMonkey, you may want only to run the t/breathe_fire.t test file. When you have the feature working to your satisfaction, run the entire test suite to verify that local changes have no unintended global effects.
Test::More relies on a testing backend known as Test::Builder. The latter module manages the test plan and coordinates the test output into TAP. This design allows multiple test modules to share the same Test::Builder backend. Consequently, the CPAN has hundreds of test modules available, and they can all work together in the same program.
Test::Exception provides functions to ensure that your code throws (and does not throw) exceptions appropriately.
Test::MockObject and Test::MockModule allow you to test difficult interfaces by mocking (emulating but producing different results).
Test::WWW::Mechanize allows you to test live web applications.
Test::Database provides functions to test the use and abuse of databases.
Test::Class offers an alternate mechanism for organizing test suites. It allows you to create classes in which specific methods group tests. You can inherit from test classes just as your code classes inherit from each other. This is an excellent way to reduce duplication in test suites. See the Test::Class series written by Curtis Poe at http://www.modernperlbooks.com/mt/2009/03/organizing-test-suites-with-testclass.html.
Test::Differences tests strings and data structures for equality and displays any differences in its diagnostics.
Test::Deep tests the equivalence of nested data structures (Nested Data Structures).
Devel::Cover analyzes the execution of your test suite to report on the amount of your code your tests actually exercise. In general, the more coverage the better; although 100% coverage is not always possible, 95% is far better than 80%.
The Perl QA project (http://qa.perl.org/) is a primary source of test modules as well as wisdom and practical experience making testing in Perl easy and effective.
Perl 5 produces optional warnings for many confusing, unclear, and ambiguous situations. Even though you should almost always enable warnings unconditionally, some circumstances make it prudent to disable specific warnings, and Perl supports this.
Use the warn builtin to emit a warning:
warn 'Something went wrong!';
warn prints a list of values to the STDERR filehandle (Input and Output). Perl will append the filename and line number on which the warn call occurred unless the last element of the list ends in a newline.
The core Carp module offers other mechanisms to produce warnings. Its carp() function reports a warning from the perspective of the calling code. For example, you could check the arity of a function (Arity) with:
use Carp;
sub only_two_arguments
{
my ($lop, $rop) = @_;
Carp::carp( 'Too many arguments provided' ) if @_ > 2;
...
}
... and anyone who reads the error message will receive the filename and line number of the calling code, not only_two_arguments(). Similarly, Carp's cluck() produces an entire backtrace of all function calls up to the current function.
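To see the difference between warn and carp() side by side, here is a small sketch; the Arity::Checker package and its two functions are hypothetical and stand in for a separate module. A __WARN__ handler collects both messages, and each call records its own line number with __LINE__ so the reported locations can be compared:

```perl
use strict;
use warnings;

# Hypothetical package, standing in for a separate module
package Arity::Checker;
use Carp;

sub with_warn { warn 'Too many arguments provided' if @_ > 2; return }
sub with_carp { Carp::carp( 'Too many arguments provided' ) if @_ > 2; return }

package main;

my @caught;
local $SIG{__WARN__} = sub { push @caught, shift };

my $warn_call = __LINE__; Arity::Checker::with_warn( 1, 2, 3 );
my $carp_call = __LINE__; Arity::Checker::with_carp( 1, 2, 3 );

# The first message names a line inside with_warn();
# the second names the line of the calling code.
print for @caught;
```

The function lives in its own package on purpose: Carp skips stack frames from the same package when it looks for "the caller", so the effect shows most clearly across a module boundary.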
To track down weird warnings or exceptions throughout your system, enable Carp's verbose mode throughout the entire program:
$ perl -MCarp=verbose my_prog.pl
This changes all carp() (and croak(); see Reporting Errors) calls to include a backtrace. When you organize your code into modules (Modules), use Carp instead of warn or die to save debugging time.
Lexical encapsulation of warnings is as important as lexical encapsulation of variables. Older code may use the -w command-line argument to enable warnings throughout the program, even if other code has not specifically attempted to suppress warnings. It's all or nothing. If you have the wherewithal to eliminate warnings and potential warnings throughout the entire codebase, this can be useful.
The modern approach is to use the warnings pragma (footnote: ...or an equivalent such as use Modern::Perl;.), which indicates the intent of the author of the code that normal operation should not produce warnings.
All of -w, -W, and -X affect the value of the global variable $^W. Code written before the warnings pragma (Perl 5.6.0 in spring 2000) may localize $^W to suppress certain warnings within a given scope. New code should use the pragma instead.
To disable selected warnings within a scope, use no warnings with an argument list naming the categories to disable. Omitting the argument list disables all warnings within that scope.
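As a minimal sketch of lexical scoping, the following script triggers the same uninitialized-value warning twice; the variable names are illustrative. Only the first concatenation warns, because the second sits inside a block that disables just the uninitialized category:

```perl
use strict;
use warnings;

my @caught;
local $SIG{__WARN__} = sub { push @caught, shift };

my $name;    # deliberately left undefined

my $loud = "Hello, $name";    # warns: Use of uninitialized value

{
    no warnings 'uninitialized';    # only this category, only this block
    my $quiet = "Hello, $name";     # no warning here
}

print scalar(@caught), " warning(s)\n";
```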
perldoc perllexwarn lists all of the warnings categories your version of Perl 5 understands with the warnings pragma. Most of them represent truly interesting conditions which Perl may find in your program. A few may be unhelpful in specific conditions. For example, the recursion warning will occur if Perl detects that a function has called itself more than a hundred times. If you are confident in your ability to write recursion-ending conditions, you may disable this warning within the scope of the recursion (though tail calls may be better; Tail Calls).
If you're generating code (Code Generation) or locally redefining symbols, you may wish to disable the redefine warnings.
Some experienced Perl hackers disable the uninitialized value warnings in string-processing code which concatenates values from many sources. Careful initialization of variables can avoid the need to disable the warning, but local style and concision may render this warning moot.
If your project considers warnings as onerous as errors, you can make them lexically fatal. To promote all warnings into exceptions:
use warnings FATAL => 'all';
You may also make specific categories of warnings fatal, such as the use of deprecated constructs:
use warnings FATAL => 'deprecated';
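Because FATAL warnings are exceptions, you can catch them with eval like any other exception. This sketch (variable names illustrative) makes only the uninitialized category fatal, and only inside the eval block:

```perl
use strict;
use warnings;

my $result = eval {
    use warnings FATAL => 'uninitialized';    # lexically scoped to this block
    my $name;
    my $greeting = "Hello, $name";            # this warning now dies
    1;
};

print $result ? "no exception\n" : "caught: $@";
```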
Just as you can catch exceptions, so you can catch warnings. The %SIG variable (footnote: See perldoc perlvar.) holds handlers for all sorts of signals Perl or your operating system might throw. It also includes two slots for signal handlers for Perl 5 exceptions and warnings. To catch a warning, install an anonymous function into $SIG{__WARN__}:
{
my $warning;
local $SIG{__WARN__} = sub { $warning .= shift };
# do something risky
...
say "Caught warning:\n$warning" if $warning;
}
Within the warning handler, the first argument is the warning's message. Admittedly, this technique is less useful than disabling warnings lexically, but it can be put to good use in test modules such as Test::Warnings from the CPAN, where the actual text of the warning is important.
With the use of the warnings::register pragma you can even create your own lexical warnings so that users of your code can enable and disable lexical warnings as appropriate. This is easy to accomplish; from a module, use the warnings::register pragma:
package Scary::Monkey;
use warnings::register;
1;
This will create a new warnings category named after the package Scary::Monkey. Enable these warnings with use warnings 'Scary::Monkey' and disable them with no warnings 'Scary::Monkey'.
Use warnings::enabled() to test if the calling lexical scope has the given warning category enabled. Use warnings::warnif() to produce a warning only if warnings are in effect. For example, to produce a warning in the deprecated category:
package Scary::Monkey;
use warnings::register;

sub import
{
    my $class = shift;    # import() always receives the package name first

    warnings::warnif( 'deprecated',
        'empty imports from ' . __PACKAGE__ . ' are now deprecated' )
        unless @_;
}

1;
See perldoc perllexwarn for more details.
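To show both sides of a registered category in one file, here is a sketch with a hypothetical screech() function added to Scary::Monkey. Calling warnings::warnif() with only a message uses the category named after the current package and checks whether the caller's lexical scope has enabled it, so the first call below warns and the second stays silent:

```perl
use strict;
use warnings;

# screech() is hypothetical, defined inline for illustration
package Scary::Monkey;
use warnings::register;    # registers a category named 'Scary::Monkey'

sub screech
{
    # warns only if the *caller's* scope enables the 'Scary::Monkey' category
    warnings::warnif( 'screech() is noisy' );
    return 'EEEK';
}

package main;

my @caught;
local $SIG{__WARN__} = sub { push @caught, shift };

{
    use warnings 'Scary::Monkey';    # caller opts in: this call warns
    Scary::Monkey::screech();
}

{
    no warnings 'Scary::Monkey';     # caller opts out: this call is silent
    Scary::Monkey::screech();
}

print scalar(@caught), " warning caught\n";
```

The package must appear before the code that names its category, because use warnings 'Scary::Monkey' fails at compile time if the category is not yet registered.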
Most programs deal with the outside world in some fashion, and much of that interaction takes place with files: reading them, writing them, manipulating them in some other fashion. Perl's early history as a language for system administration and text processing has produced a language very well suited for file manipulation.
The primary mechanism of interacting with the world outside of a program is through a filehandle. Filehandles represent the state of some channel of input or output, such as the standard input or output of a program, a file from or to which to read or write, and the position in a given file. Every Perl 5 program has three standard filehandles available, STDIN (the input to the program), STDOUT (the output from the program), and STDERR (the error output from the program).
By default, everything you print or say goes to STDOUT, while errors and warnings and everything you warn() goes to STDERR. This separation of output allows you to redirect useful output and errors to two different places: an output file and error logs, for example.
Besides the standard filehandles, you can open your own filehandles with the open builtin. To open a file for reading:
open my $fh, '<', $filename
    or die "Cannot read '$filename': $!\n";
The first operand is a lexical which will hold the opened filehandle. The second operand is the file mode, which determines the type of the filehandle operation. The final operand is the name of the file. If the open fails, the die clause will throw an exception, with the contents of $! giving the reason why the open failed.
Symbols | Explanation |
---|---|
< | Open for reading. |
> | Open for writing, clobbering existing contents if the file exists and creating a new file otherwise. |
>> | Open for writing, appending to any existing contents and creating a new file otherwise. |
+< | Open for reading and writing. |
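The three most common modes in the table can be exercised against a temporary file. This sketch uses the core File::Temp module only to obtain a throwaway filename; the file contents are illustrative:

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw( tempfile );

# a throwaway file for the demonstration, removed when the program exits
my ( undef, $filename ) = tempfile( UNLINK => 1 );

open my $out, '>', $filename;        # clobber (or create) the file
print $out "first line\n";
close $out;

open my $append, '>>', $filename;    # keep existing contents, add to the end
print $append "second line\n";
close $append;

open my $in, '<', $filename;         # read it back
my @lines = <$in>;
close $in;

print @lines;
```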
Besides files, you can open filehandles to scalars:
use autodie; # The autodie Pragma
my $captured_output;
open my $fh, '>', \$captured_output;
do_something_awesome( $fh );
Such filehandles support all of the existing file modes.
You may encounter older code which uses the two-argument form of open():
open my $fh, "> $some_file"
or die "Cannot write to '$some_file': $!\n";
The lack of clean separation between the intended file mode and the name of the file allows the possibility of unintentional behaviors (footnote: When you read that phrase, train yourself to think "I wonder if that might produce security problems?") when interpolating untrusted input into the second operand. You can safely replace the two-argument form of open with the three-argument form in every case without any loss of feature.
perldoc perlopentut offers far more details about more exotic uses of open, including its ability to launch and control other processes, as well as the use of sysopen for finer-grained control over input and output. perldoc perlfaq5 includes working code for many common IO tasks.
Given a filehandle opened for input, read from it with the readline operator, also written as <>. The most common idiom is to read a line at a time in a while() loop:
use autodie;
open my $fh, '<', 'some_file';
while (<$fh>)
{
chomp;
say "Read a line '$_'";
}
In scalar context, readline returns the next line of the file each time you call it, until it reaches the end of the file (eof()). After that, each call returns undef. Perl implicitly rewrites this while idiom to check the definedness of the variable used for iteration, as if you had written while (defined($_ = <$fh>)), so that only the end-of-file condition ends the loop, even for a line such as a final 0 without a newline, which would otherwise evaluate to false.
Every line read from readline includes the character or characters which mark the end of a line. In most cases, this is a platform-specific sequence consisting of a newline (\n), a carriage return (\r), or a combination of the two (\r\n). Use chomp to remove your platform's specific newline sequence.
Putting it all together, the cleanest way to read from files in Perl 5 is:
use autodie;
open my $fh, '<', $filename;
while (my $line = <$fh>)
{
chomp $line;
...
}
If you're reading binary data rather than text, use binmode on the filehandle before reading from or writing to it. This builtin tells Perl to treat all of the filehandle's data as pure data. Perl will not modify it in any fashion, as it might for platform portability. Although Unix-like platforms may not need binmode in this case, portable programs use it anyway (Unicode and Strings).
Given a filehandle open for output, you may print or say to it:
use autodie;
open my $out_fh, '>', 'output_file.txt';
print $out_fh "Here's a line of text\n";
say $out_fh "... and here's another";
Note the lack of comma between the filehandle and the subsequent operand.
You may write an entire list of values to print or say, in which case Perl 5 uses the magic global $, as the separator between list values. print also appends any value of $\ after its final argument; say ignores $\ and always appends a newline instead.
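Here is a small sketch of both variables at work. It prints to an in-memory filehandle so the result can be inspected, and it localizes $, and $\ so the change cannot leak into the rest of the program; the nut names are illustrative:

```perl
use strict;
use warnings;

my $captured = '';
open my $out, '>', \$captured or die "Cannot open in-memory file: $!";

{
    local $, = ', ';     # separator between list elements
    local $\ = "!\n";    # appended after each print
    print $out 'pecan', 'walnut', 'cashew';
}
close $out;

print $captured;    # pecan, walnut, cashew!
```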
When you've finished working with a file, you may close it explicitly or allow its filehandle to go out of scope, in which case Perl will close it for you. The benefit of calling close explicitly is that you can check for, and recover from, specific errors, such as running out of space on a storage device or a broken network connection.
As usual, autodie handles these checks for you:
use autodie;
open my $fh, '>', $file;
...
close $fh;
For every line read, Perl 5 increments the value of the variable $., which serves as a line counter.
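As a brief sketch, this script numbers lines with $. while reading from an in-memory filehandle, which keeps the example free of files on disk; the data is illustrative:

```perl
use strict;
use warnings;

my $data = "apple\nbanana\ncherry\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @numbered;
while ( my $line = <$fh> )
{
    chomp $line;
    push @numbered, "$.: $line";    # $. counts lines read from the last-read handle
}
close $fh;

print "$_\n" for @numbered;
```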
readline uses the current contents of $/ as the line-ending sequence. This variable defaults to a newline (\n); on platforms whose text files use a different line ending, such as Windows, Perl's default IO layer translates that ending to \n as it reads. In truth, the word line is a misnomer. You can set $/ to contain any sequence of characters (footnote: ... but never a regular expression, because Perl 5 does not support that.). This is useful for highly-structured data in which you want to read a record at a time.
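For example, this sketch reads records separated by a hypothetical "%%\n" marker instead of lines. It localizes $/ inside a block so other reads in the program keep their normal behavior, and chomp removes whatever $/ currently holds:

```perl
use strict;
use warnings;

my $data = "name: Bender\n%%\nname: Fry\n%%\nname: Leela\n%%\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "%%\n";    # read one record, not one line, at a time
    while ( my $record = <$fh> )
    {
        chomp $record;    # chomp removes the current $/, here "%%\n"
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
```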
By default, Perl uses buffered output, where it performs IO only when it has enough data to exceed a threshold. This allows Perl to batch up expensive IO operations instead of always writing very small amounts of data. Yet sometimes you want to send data as soon as you have it without waiting for that buffering--especially if you're writing a command-line filter connected to other programs or a line-oriented network service.
The $| variable controls buffering on the currently active output filehandle. When set to a non-zero value, Perl will flush the output after each write to the filehandle. When set to a zero value, Perl will use its default buffering strategy.
In lieu of the global variable, use the autoflush() method on a lexical filehandle. Be sure to load FileHandle first; versions of Perl before 5.14 cannot call methods on lexical filehandles otherwise:
use autodie;
use FileHandle;
open my $fh, '>', 'pecan.log';
$fh->autoflush( 1 );
...
Once you have loaded FileHandle, you may also use its input_line_number() and input_record_separator() methods instead of $. and $/ respectively. See perldoc FileHandle and perldoc IO::Handle for more information.
You may also manipulate directories and file paths with Perl 5. Working with directories is similar to working with files, except that you cannot write to directories (footnote: Instead, you save and move and rename and remove files.). Open a directory handle with opendir:
use autodie;
opendir my $dirh, '/home/monkeytamer/tasks/';
The readdir builtin reads from a directory. As with readline, you may iterate over the contents of directories one at a time or you may assign them to a list in one swoop:
# iteration
while (my $file = readdir $dirh)
{
...
}
# flattening into a list
my @files = readdir $otherdirh;
Starting with Perl 5.12, readdir in a while loop sets $_, just as readline does in a while loop:
use 5.012;
use autodie;
opendir my $dirh, 'tasks/circus/';
while (readdir $dirh)
{
next if /^\./;
say "Found a task $_!";
}
The curious regular expression in this example skips so-called hidden files on Unix and Unix-like systems, where a leading dot prevents them from appearing in directory listings by default. It also skips two special files returned from every readdir invocation, specifically . and .., which represent the current directory and the parent directory, respectively.
The names returned from readdir are relative to the directory itself. In other words, if the tasks/ directory contains three files named eat, drink, and be_monkey, readdir will return eat, drink, and be_monkey and not tasks/eat, tasks/drink, and tasks/be_monkey. In contrast, an absolute path is a path fully qualified to its filesystem.
Close a directory handle by letting it go out of scope or with the closedir builtin.
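The following sketch puts opendir, readdir, and closedir together. It creates a temporary directory (via the core File::Temp module) holding the three example files plus a hidden one, skips dot entries, and then joins each bare name back onto the directory with the core File::Spec module to get a usable path:

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw( tempdir );
use File::Spec;

# a scratch directory with a few empty files (one hidden), removed at exit
my $tasks = tempdir( CLEANUP => 1 );
for my $name (qw( eat drink be_monkey .hidden ))
{
    open my $fh, '>', File::Spec->catfile( $tasks, $name );
    close $fh;
}

opendir my $dirh, $tasks;
my @entries = readdir $dirh;
closedir $dirh;

my @visible = grep { !/^\./ } @entries;    # skip ., .., and hidden files
my @names   = sort @visible;

# readdir returns bare names; prepend the directory to get usable paths
my @paths = map { File::Spec->catfile( $tasks, $_ ) } @names;

print "$_\n" for @names;
```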
Perl 5 offers a Unixy view of the world, or at least your filesystem. Even if you aren't using a Unix-like platform, Perl will interpret Unix-style paths appropriately for your operating system and filesystem. In other words, if you're using Microsoft Windows, you can use the path C:/My Documents/Robots/Bender/ just as easily as you can use the path C:\My Documents\Robots\Caprica Six\.
Even so, manipulating file paths in a safe and cross-platform manner means avoiding string interpolation and concatenation. The core File::Spec module family provides abstractions to allow you to manipulate file paths in safe and portable fashions. Unfortunately, it's not always easy to understand or to use correctly.
The Path::Class distribution on the CPAN provides a nicer interface around File::Spec. Use the dir() function to create an object representing a directory and the file() function to create an object representing a file:
use Path::Class;
my $meals = dir( 'tasks', 'cooking' );
my $file = file( 'tasks', 'health', 'exoskeleton_research.txt' );
... and you can get file objects from directories:
my $lunch = $meals->file( 'veggie_calzone.txt' );
... and vice versa:
my $robots_dir = $robot_list->dir();
You can even open filehandles to directories and files:
my $dir_fh = $dir->open();
my $robots_fh = $robot_list->open( 'r' ) or die "Open failed: $!";
Both Path::Class::Dir and Path::Class::File offer further useful behaviors.
Besides reading and writing files, you can also manipulate them as you would directly from a command line or a file manager. The -X file test operators can give you information about the attributes of files and directories on your system. For example, to test that a file exists:
say 'Present!' if -e $filename;
The -e operator has a single operand, the name of a file or a file or directory handle. If the file exists, the expression will evaluate to a true value. perldoc -f -X lists all other file tests; the most popular are:
-f, which returns a true value if its operand is a plain file
-d, which returns a true value if its operand is a directory
-r, which returns a true value if the file permissions of its operand permit reading by the current user
-z, which returns a true value if its operand is an empty (zero-size) file
As of Perl 5.10.1, you may look up the documentation for any of these operators with perldoc -f -r, for example.
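This sketch applies the file tests above to a scratch directory containing one empty and one non-empty file; the directory comes from the core File::Temp module and the filenames are illustrative:

```perl
use strict;
use warnings;
use autodie;
use File::Temp qw( tempdir );
use File::Spec;

my $dir   = tempdir( CLEANUP => 1 );
my $empty = File::Spec->catfile( $dir, 'empty.txt' );
my $full  = File::Spec->catfile( $dir, 'full.txt' );

open my $fh, '>', $empty;
close $fh;

open $fh, '>', $full;
print $fh "bananas\n";
close $fh;

my %report = (
    dir_is_dir    => ( -d $dir   ? 1 : 0 ),
    empty_exists  => ( -e $empty ? 1 : 0 ),
    empty_is_zero => ( -z $empty ? 1 : 0 ),    # -z: file has zero size
    full_is_zero  => ( -z $full  ? 1 : 0 ),
    full_is_file  => ( -f $full  ? 1 : 0 ),
    full_readable => ( -r $full  ? 1 : 0 ),
);

print "$_: $report{$_}\n" for sort keys %report;
```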
The rename builtin can rename a file or move it between directories. It takes two operands, the old name of the file and the new name:
use autodie;
rename 'death_star.txt', 'carbon_sink.txt';
# or if you're stylish:
rename 'death_star.txt' => 'carbon_sink.txt';
There's no core builtin to copy a file, but the core File::Copy module provides both copy() and move() functions. Use unlink to remove one or more files. These functions and builtins all return true values on success and set $! on error.
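Here is a sketch of these operations in a scratch directory; the filenames are illustrative. autodie checks rename and unlink, but it does not wrap File::Copy's functions, so their return values are checked by hand:

```perl
use strict;
use warnings;
use autodie;
use File::Copy qw( copy move );
use File::Temp qw( tempdir );
use File::Spec;

my $dir         = tempdir( CLEANUP => 1 );
my $death_star  = File::Spec->catfile( $dir, 'death_star.txt' );
my $carbon_sink = File::Spec->catfile( $dir, 'carbon_sink.txt' );
my $backup      = File::Spec->catfile( $dir, 'backup.txt' );
my $archive     = File::Spec->catfile( $dir, 'archive.txt' );

open my $fh, '>', $death_star;
print $fh "fully operational\n";
close $fh;

rename $death_star => $carbon_sink;    # autodie checks this

copy( $carbon_sink, $backup ) or die "Copy failed: $!";
move( $backup, $archive )     or die "Move failed: $!";

unlink $carbon_sink;                   # autodie checks this too

opendir my $dirh, $dir;
my @entries = readdir $dirh;
closedir $dirh;

my @real = grep { !/^\.\.?$/ } @entries;    # drop . and ..
my @left = sort @real;
print "@left\n";    # only archive.txt remains
```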
Finally, Perl allows you to change its notion of the current directory. By default, this is the directory from which you launched the program. The core Cwd module allows you to determine this. The builtin chdir attempts to change the current working directory. This can be useful for manipulating files with relative (not absolute) paths.
A module is a package contained in its own file and loadable with use or require. A module must be valid Perl 5 code. It must end with an expression which evaluates to a true value so that the Perl 5 parser knows it has loaded and compiled the module successfully.
There are no other requirements, only strong conventions.
Packages correspond to files on disk in that when you load a module with use or require's bareword form, Perl splits the package name on double-colons (::) and turns the components of the package name into a file path. Thus:
use StrangeMonkey;
... causes Perl to search for a file named StrangeMonkey.pm in every directory in @INC, in order, until it finds one or exhausts the list. As well:
use StrangeMonkey::Persistence;
... causes Perl to search for a file named Persistence.pm in every directory named StrangeMonkey/ present in every directory in @INC, and so on. Finally:
use StrangeMonkey::UI::Mobile;
... causes Perl to search for a relative file path of StrangeMonkey/UI/Mobile.pm in every directory in @INC. There is no technical requirement that the file at that location contain any package declaration, let alone a package declaration of StrangeMonkey::UI::Mobile. Maintenance concerns highly recommend that convention, however.
When you load a module with the use builtin, Perl loads it from disk, then calls its import() method, passing any arguments you provided. This occurs at compilation time:
use strict; # calls strict->import()
use CGI ':standard'; # calls CGI->import( ':standard' )
use feature qw( say switch ); # calls feature->import( qw( say switch ) )
You do not have to provide an import() method, and you may use it to do anything you wish, but the standard API expectation is that it takes a list of arguments of symbols (usually functions) to make available in the calling namespace. This is not a strong requirement; pragmas (Pragmas) such as strict use arguments to change their behavior instead of exporting symbols.
The no builtin calls a module's unimport() method, if it exists, passing any arguments. While it's possible to remove exported symbols, it's more common to disable specific features of pragmas and other modules which introduce new behaviors through import():
use strict;
# no symbolic references, variable declaration required, no barewords
...
{
no strict 'refs';
# symbolic references allowed
# variable declaration still required; barewords prohibited
}
Like use and import(), no calls unimport() during compilation time. Effectively:
use Module::Name qw( list of arguments );
... is the same as:
BEGIN
{
require 'Module/Name.pm';
Module::Name->import( qw( list of arguments ) );
}
Similarly:
no Module::Name qw( list of arguments );
... is the same as:
BEGIN
{
require 'Module/Name.pm';
Module::Name->unimport( qw( list of arguments ) );
}
... including the require of the module.
You may call import() and unimport() directly, though it makes little sense to unimport a pragma outside of a BEGIN block, as they often have compilation-time effects.
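To watch use and no call import() and unimport() at compile time, here is a sketch with a hypothetical Greeter module defined inline. Setting its entry in %INC tells require that Greeter.pm is already loaded, so no file on disk is needed; each method simply records the arguments it receives:

```perl
use strict;
use warnings;

# A hypothetical module defined inline; the %INC entry makes
# require treat Greeter.pm as already loaded.
BEGIN
{
    package Greeter;
    our @import_args;
    sub import   { my $class = shift; push @import_args, [ import   => @_ ] }
    sub unimport { my $class = shift; push @import_args, [ unimport => @_ ] }
    $INC{'Greeter.pm'} = __FILE__;
}

use Greeter qw( hello world );    # Greeter->import( qw( hello world ) )
no  Greeter qw( goodbye );        # Greeter->unimport( qw( goodbye ) )

# both calls already happened at compile time, before this line runs
print join( ' ', @$_ ), "\n" for @Greeter::import_args;
```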
Perl 5's use and require are case-sensitive, even if the underlying filesystem is not. While Perl knows the difference between strict and Strict, your combination of operating system and file system may not. If you were to write use Strict;, Perl would not find Strict.pm on a case-sensitive filesystem. With a case-insensitive filesystem, Perl will happily load strict.pm under the name Strict.pm, but will then try to call Strict->import(). Nothing will happen, because strict.pm declares a package named strict.
Portable programs are strict about case even if they don't have to be.
A module can make certain global symbols available to other packages through a process known as exporting. This is the flip side of passing arguments to import() through a use statement.
The standard way of exporting functions or variables to other modules is through the core module Exporter. Exporter relies on the presence of package global variables, @EXPORT_OK and @EXPORT in particular, which contain a list of symbols to export when requested.
Consider a StrangeMonkey::Utilities module which provides several standalone functions usable throughout the system:
package StrangeMonkey::Utilities;
use Exporter 'import';
our @EXPORT_OK = qw( round_number translate screech );
...
1;
Any other code now can use this module and, optionally, import any or all of the three exported functions (footnote: ... though using the module in any code is sufficient to allow any other code to invoke its functions by their fully-qualified names.). You may also export variables:
push @EXPORT_OK, qw( $spider $saki $squirrel );
You can export symbols by default by listing them in @EXPORT instead of @EXPORT_OK:
our @EXPORT = qw( monkey_dance monkey_sleep );
... so that any use StrangeMonkey::Utilities; will import both functions. Be aware that specifying symbols to import will not import default symbols. You can also load a module without importing any symbols by providing an explicit empty list:
# make the module available, but import() nothing
use StrangeMonkey::Utilities ();
Regardless of any import lists, you can always call functions in another package with their fully-qualified names:
StrangeMonkey::Utilities::screech();
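This sketch defines a cut-down, hypothetical StrangeMonkey::Utilities inside a BEGIN block (and records it in %INC so that use does not search for a .pm file) to show that an explicit import list brings in only the named symbols, skips the defaults in @EXPORT, and leaves fully-qualified calls available regardless:

```perl
use strict;
use warnings;

BEGIN
{
    # A cut-down StrangeMonkey::Utilities defined inline for this sketch.
    package StrangeMonkey::Utilities;
    use Exporter 'import';                 # borrow Exporter's import()

    our @EXPORT_OK = qw( round_number screech );
    our @EXPORT    = qw( monkey_dance );

    sub round_number { return sprintf '%.0f', shift }
    sub screech      { return 'EEEEK!' }
    sub monkey_dance { return 'dancing' }

    # mark the module as loaded so use does not look for a file
    $INC{'StrangeMonkey/Utilities.pm'} = __FILE__;
}

# an explicit import list skips the default @EXPORT symbols
use StrangeMonkey::Utilities qw( screech );

print 'screech: ', screech(), "\n";                                    # EEEEK!
print 'rounded: ', StrangeMonkey::Utilities::round_number( 2.6 ), "\n"; # 3
print 'monkey_dance imported: ', ( defined &monkey_dance ? 'yes' : 'no' ), "\n";
```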
Perl 5 does not require you to use modules, packages, or namespaces. You may put all of your code in a single .pl file, or in multiple .pl files you require as necessary. You have the flexibility to manage your code in the most appropriate way, given your development style, the formality and risk and reward of the project, your experience, and your comfort with Perl 5 deployment.
Even so, a project with more than a couple of hundred lines of code receives multiple benefits from module organization:
Even if you do not use an object-oriented approach, modeling every distinct entity or responsibility in your system with its own module keeps related code together and separate code separate.
A distribution is a collection of one or more modules (Modules) which forms a single redistributable, testable, and installable unit. Effectively it's a collection of modules and metadata.
The easiest way to manage software configuration, building, distribution, testing, and installation even within your organization is to create distributions compatible with the CPAN. The conventions of the CPAN--how to package a distribution, how to resolve its dependencies, where to install software, how to verify that it works, how to display documentation, how to manage a repository--have all arisen from the rough consensus of thousands of contributors working on tens of thousands of projects.
In particular, the copious amount of testing and reporting and dependency checking achieved by CPAN developers exceeds the available information and quality of work in any other language community. A distribution built to CPAN standards can be tested on several versions of Perl 5 on several different hardware platforms within a few hours of its uploading--all without human intervention.
You may choose never to release any of your code as public CPAN distributions, but you can reuse existing CPAN tools and designs wherever possible. The combination of intelligent defaults and customizability is likely to meet your specific needs.
A distribution obviously includes one or more modules. It also includes several other files and directories:
Additionally, a well-formed distribution must contain a unique name and single version number (often taken from its primary module). Any well-formed distribution you download from the public CPAN should conform to these standards--and the CPANTS service evaluates the kwalitee (footnote: Quality is difficult to measure with heuristics. Kwalitee is the machine measurable relative of quality.) of all CPAN distributions and recommends packaging improvements.
The Perl 5 core includes several tools to manage distributions--not just installing them from the CPAN, but developing and managing your own:
CPAN.pm is the official CPAN client. While by default it installs distributions from the public CPAN, you can point it to your own repository instead of or in addition to the public repository.
CPANPLUS is an alternate CPAN client with a different design approach. It does some things better than CPAN.pm, but they are largely equivalent at this point. Use whichever you prefer.
Module::Build is a pure-Perl tool suite for configuring, building, installing, and testing distributions. It works with the Build.PL file mentioned earlier.
ExtUtils::MakeMaker is an older, legacy tool which Module::Build intends to replace. It is still in wide use, though it is in maintenance mode and receives only the most critical bug fixes. It works with the Makefile.PL file mentioned earlier.
Test::More (Testing) is the basic and most widely used module for writing automated tests for Perl software.
Test::Harness and prove (Running Tests) are the tools used to run tests and to interpret and report their results.
In addition, several non-core CPAN modules make your life easier as a developer:
App::cpanminus is a new utility which provides almost configuration-free use of the public CPAN. It fulfills 90% of your needs to find and install modules.
App::perlbrew helps you to manage multiple installations of Perl 5. This is useful when you want a newer version than the system Perl, or to isolate distributions you've installed for one application from distributions you've installed for another.
CPAN::Mini and the minicpan command allow you to create your own (private) mirror of the public CPAN. You can inject your own distributions into this repository and manage which versions of the public modules are available in your organization.
Dist::Zilla is a toolkit for managing distributions by automating away common tasks. While it can use either Module::Build or ExtUtils::MakeMaker, it can replace your use of them directly.
Test::Reporter allows you to report the results of running the automated test suites of distributions you install, giving their authors more data on any failures.
The process of designing a distribution could fill a book (see Sam Tregar's Writing Perl Modules for CPAN), but a few design principles will help you. Start with a utility such as Module::Starter or Dist::Zilla from the CPAN. The initial cost of learning the configuration and rules may seem like a steep investment, but the benefit of having everything set up the right way (and in the case of Dist::Zilla, never going out of date) relieves you of much tedious bookkeeping.
Then consider several rules.
Perl 5 provides a special package which is the ancestor of all other packages in a very object-oriented way. The UNIVERSAL package provides a few methods available for all other classes and objects.
The isa() method takes a string containing the name of a class or the name of a built-in type. You can call it as a class method or an instance method on an object. It returns true if the class or object is or derives from the named class, or if the object itself is a blessed reference to the given type.
Given an object $pepper, a hash reference blessed into the Monkey class (which inherits from the Mammal class):
say $pepper->isa( 'Monkey' );  # prints 1
say $pepper->isa( 'Mammal' );  # prints 1
say $pepper->isa( 'HASH' );    # prints 1
say Monkey->isa( 'Mammal' );   # prints 1
say $pepper->isa( 'Dolphin' ); # prints an empty line (false)
say $pepper->isa( 'ARRAY' );   # prints an empty line (false)
say Monkey->isa( 'HASH' );     # prints an empty line (false)
Perl 5's core types are SCALAR, ARRAY, HASH, Regexp, IO, and CODE.
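A self-contained sketch of the Monkey and Mammal hierarchy described above (both classes are minimal stand-ins defined in the same file) shows isa() consulting both the inheritance chain and the underlying reference type:

```perl
use strict;
use warnings;

package Mammal;
sub new { my $class = shift; return bless {}, $class }

package Monkey;
use parent -norequire, 'Mammal';    # Monkey inherits new() from Mammal

package main;

my $pepper = Monkey->new();         # a hash reference blessed into Monkey

for my $type (qw( Monkey Mammal HASH Dolphin ARRAY ))
{
    printf "%-7s %s\n", $type, $pepper->isa( $type ) ? 'yes' : 'no';
}
# yes for Monkey, Mammal, and HASH; no for Dolphin and ARRAY
```

Only an object (a blessed reference) can match a reference type such as HASH; calling isa() on the class name Monkey checks the inheritance chain alone.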
You can override isa() in your own classes. This can be useful when working with mock objects (see Test::MockObject and Test::MockModule on the CPAN, for example) or with code that does not use roles (Roles).
The can() method takes a string containing the name of a method. It returns a reference to the function which implements that method, if it exists. Otherwise, it returns false. You may call this on a class, an object, or the name of a package. In the latter case, it returns a reference to a function, not a method.
Given a class named SpiderMonkey with a method named screech, you can get a reference to the method with:
if (my $meth = SpiderMonkey->can( 'screech' )) { ... }

if (my $meth = $sm->can( 'screech' ))
{
    $sm->$meth();
}
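A complete, runnable version of that check, using a hypothetical SpiderMonkey class with a single screech() method, might look like this:

```perl
use strict;
use warnings;

package SpiderMonkey;
sub new     { my $class = shift; return bless {}, $class }
sub screech { return 'Screech!' }

package main;

my $sm = SpiderMonkey->new();

# can() returns a code reference (a true value) when the method exists
if (my $meth = $sm->can( 'screech' ))
{
    print 'screech: ', $sm->$meth(), "\n";    # screech: Screech!
}

# ... and a false value when it does not
print 'juggle: ', ( SpiderMonkey->can( 'juggle' ) ? 'yes' : 'no' ), "\n";
```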
Given a plugin-style architecture, you can test to see if a package implements a specific function in a similar way. The UNIVERSAL::require module adds a require() method to the UNIVERSAL namespace to invert the sense of the require builtin:
# a useful CPAN module
use UNIVERSAL::require;

die $@ unless $module->require();

if (my $register = $module->can( 'register' ))
{
    $register->();
}
... though in larger programs, use Module::Pluggable to handle this busy work for you.
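If you would rather not depend on UNIVERSAL::require, the builtin require can load a module named in a string once you turn the package name into a relative file path. This sketch swaps in that core-only technique. The core List::Util stands in for a real plugin, its sum() stands in for a method such as register(), and No::Such::Plugin is a hypothetical name that should fail to load:

```perl
use strict;
use warnings;

for my $module (qw( List::Util No::Such::Plugin ))
{
    # require with a string expects a path: Some::Module -> Some/Module.pm
    ( my $file = "$module.pm" ) =~ s{::}{/}g;

    unless (eval { require $file; 1 })
    {
        print "skipping $module: could not load it\n";
        next;
    }

    # can() on a package name returns a function reference (or false)
    if (my $sum = $module->can( 'sum' ))
    {
        print "$module provides sum(): ", $sum->( 1, 2, 3 ), "\n";   # 6
    }
}
```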
You can (and should) override can() in your own code if you use AUTOLOAD() (Drawbacks of AUTOLOAD).
The VERSION() method is available to all packages, classes, and objects. It returns the value of the $VERSION variable for the appropriate package or class. It takes a version number as an optional parameter. If you provide this version number, the method will throw an exception if the queried $VERSION is not equal to or greater than the parameter.
Given a HowlerMonkey module of version 1.23:
say HowlerMonkey->VERSION(); # prints 1.23
say $hm->VERSION(); # prints 1.23
say $hm->VERSION( 0.0 ); # prints 1.23
say $hm->VERSION( 1.23 ); # prints 1.23
say $hm->VERSION( 2.0 ); # throws exception
You can override VERSION() in your own code, but there's little reason to do so.
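A minimal sketch with a hypothetical HowlerMonkey package shows VERSION() returning the version string and acting as a minimum-version check; catching the exception with eval is an illustration choice, not a requirement:

```perl
use strict;
use warnings;

package HowlerMonkey;
our $VERSION = '1.23';
sub new { my $class = shift; return bless {}, $class }

package main;

my $hm = HowlerMonkey->new();
print 'version: ', HowlerMonkey->VERSION(), "\n";      # version: 1.23
print 'at least 1.0: ', $hm->VERSION( '1.0' ), "\n";   # 1.23, no exception

# asking for a newer version than 1.23 throws an exception
my $new_enough = eval { $hm->VERSION( '2.0' ); 1 };
print 'too old? ', ( $new_enough ? "no\n" : "yes: $@" );
```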
The DOES() method is new in Perl 5.10.0. It exists to support the use of roles (Roles) in programs. Pass it an invocant and the name of a role, and the method will return true if the appropriate class somehow does that role--whether through inheritance, delegation, composition, role application, or any other mechanism.
The default implementation of DOES() falls back to isa(), because inheritance is one mechanism by which a class may do a role. Given a Capuchin:
say Capuchin->DOES( 'Monkey' );       # prints 1
say $cappy->DOES( 'Monkey' );         # prints 1
say Capuchin->DOES( 'Invertebrate' ); # prints an empty line (false)
You can (and should) override DOES() in your own code if you manually provide a role or other allomorphic behavior.
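Here is one way such an override might look, assuming a hypothetical role named Climber that Capuchin provides by some mechanism other than inheritance. Any other role question falls through to the default DOES(), which consults isa():

```perl
use strict;
use warnings;

package Monkey;
sub new { my $class = shift; return bless {}, $class }

package Capuchin;
use parent -norequire, 'Monkey';

# roles provided without inheritance; Climber is purely illustrative
my %provides = ( Climber => 1 );

sub DOES
{
    my ($self, $role) = @_;
    return 1 if $provides{$role};
    return $self->SUPER::DOES( $role );    # default behavior: isa()
}

package main;

my $cappy = Capuchin->new();
for my $role (qw( Monkey Climber Invertebrate ))
{
    printf "%-12s %s\n", $role, $cappy->DOES( $role ) ? 'yes' : 'no';
}
# yes for Monkey and Climber; no for Invertebrate
```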
It's tempting to store other methods in UNIVERSAL to make them available to all other classes and objects in Perl 5. Avoid this temptation; this global behavior can have subtle side effects because it is unconstrained.
With that said, occasional abuse of UNIVERSAL for debugging purposes and to fix improper default behavior may be excusable. For example, Joshua ben Jore's UNIVERSAL::ref distribution makes the nearly-useless ref() operator usable. The UNIVERSAL::can and UNIVERSAL::isa distributions can help you debug anti-polymorphism bugs (Method-Function Equivalence), while Perl::Critic can detect those (footnote: ... and many, many others.) problems.
Outside of very carefully controlled code and very specific, very pragmatic situations, there's no reason to put code in UNIVERSAL directly. There are almost always much better design alternatives.
Improving as a programmer requires you to search for better abstractions. The less code you have to write, the better. The more general your solutions, the better. When you can delete code and add features, you've achieved something great.
Novice programmers write more code than they need to write, partly from unfamiliarity with their languages, libraries, and idioms, but also due to inexperience creating and maintaining good abstractions. They start by writing long lists of procedural code, then discover functions, then parameters, then objects, and--perhaps--higher-order functions and closures.
Writing programs to write programs for you--metaprogramming or code generation--offers greater possibilities for abstraction. This can be as clear as exploiting higher-order programming capabilities or a rat hole down which you find yourself confused and frightened. The techniques are powerful and useful. For example, they form the basis of Moose (Moose).
Using AUTOLOAD (AUTOLOAD) to handle missing functions and methods demonstrates this technique in a constrained form; Perl 5's function and method dispatch system allows you to customize what happens when normal lookup fails.
The simplest code generation technique is to build a string containing a snippet of valid Perl and compile it with the string eval operator. Unlike the exception-catching block eval operator, string eval compiles the contents of the string within the current scope, including the current package and lexical bindings.
A common use for this technique is providing a fallback if you can't (or don't want to) load an optional dependency:
eval { require Monkey::Tracer }
    or eval 'sub Monkey::Tracer::log {}';
If Monkey::Tracer is not available, its log() function will exist, but will do nothing.
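A runnable version of the fallback, assuming Monkey::Tracer is not installed (it is a hypothetical module), shows the stub being compiled and then called harmlessly:

```perl
use strict;
use warnings;

# Monkey::Tracer is hypothetical; if it cannot be loaded, compile a
# do-nothing log() into its namespace with string eval instead
eval { require Monkey::Tracer }
    or eval 'sub Monkey::Tracer::log {}';

Monkey::Tracer::log( 'swinging from a vine' );    # a harmless no-op
print 'log() defined: ', ( defined &Monkey::Tracer::log ? 'yes' : 'no' ), "\n";
```

Callers can use Monkey::Tracer::log() unconditionally, without checking whether the optional dependency is present.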
This simple example is deceptive. You must handle quoting issues to include variables within code you compile with eval. Interpolating some variables but not others adds more complexity:
sub generate_accessors
{
    my ($methname, $attrname) = @_;

    eval <<"END_ACCESSOR";
    sub get_$methname
    {
        my \$self = shift;
        return \$self->{$attrname};
    }

    sub set_$methname
    {
        my (\$self, \$value) = \@_;
        \$self->{$attrname} = \$value;
    }
END_ACCESSOR

    # string eval does not die on compilation errors; check $@ instead
    die $@ if $@;
}
Woe to those who forget a backslash! Good luck convincing your syntax highlighter what's happening! Worse yet, each invocation of string eval builds a new data structure representing the entire code. Compiling code isn't free, either--cheaper than performing IO, perhaps, but not free.
Even so, this technique is simple and reasonably easy to understand.
While building accessors and mutators with eval is straightforward, closures (Closures) allow you to add parameters to code that Perl compiles only once, without requiring additional evaluation:
sub generate_accessors
{
    my $attrname = shift;

    my $getter = sub
    {
        my $self = shift;
        return $self->{$attrname};
    };

    my $setter = sub
    {
        my ($self, $value) = @_;
        $self->{$attrname} = $value;
    };

    return $getter, $setter;
}
This code avoids unpleasant quoting issues and runs more quickly, as there's only one compilation stage, no matter how many accessors you create. It even uses less memory by sharing the compiled code between all instances of the closure. All that differs is the binding to the $attrname lexical. In a long-running process, or with a lot of accessors, this technique can be very useful.
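To confirm that each closure keeps its own $attrname binding, this sketch generates two pairs of accessors and calls them as methods on a plain blessed hash. Monkey is a hypothetical class name with no methods of its own; a code reference stored in a scalar can be invoked as $object->$coderef():

```perl
use strict;
use warnings;

sub generate_accessors
{
    my $attrname = shift;
    my $getter   = sub { my $self = shift; return $self->{$attrname} };
    my $setter   = sub { my ($self, $value) = @_; $self->{$attrname} = $value };
    return $getter, $setter;
}

# one compiled body per anonymous sub, bound to different $attrname values
my ($get_name, $set_name) = generate_accessors( 'name' );
my ($get_age,  $set_age)  = generate_accessors( 'age' );

my $monkey = bless {}, 'Monkey';

$monkey->$set_name( 'Pepper' );
$monkey->$set_age( 4 );

print $monkey->$get_name(), ' is ', $monkey->$get_age(), "\n";   # Pepper is 4
```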
Installing into symbol tables is reasonably easy, if ugly:
{
    my ($getter, $setter) = generate_accessors( 'homecourt' );
    no strict 'refs';
    *{ 'get_homecourt' } = $getter;
    *{ 'set_homecourt' } = $setter;
}
The odd syntax of an asterisk (footnote: Think of it as a typeglob sigil, where a typeglob is Perl jargon for an entry in the symbol table.) dereferencing a string refers to a symbol in the current symbol table, which is the place in the current namespace that contains globally-accessible symbols such as package globals, functions, and methods. Assigning a reference to a symbol table entry installs or replaces the appropriate entry. To promote an anonymous function to a method, assign that function reference to the appropriate entry in the symbol table.
This operation refers to a symbol with a string, not a literal variable name, so it's a symbolic reference and it's necessary to disable strict reference checking for the operation. Many programs have a subtle bug in similar code, as they assign and generate in a single line:
{
    no strict 'refs';
    *{ $methname } = sub {
        # subtle bug: strict refs
        # are disabled in here too
    };
}
This example disables strictures for the outer block as well as the inner block, the body of the function itself. Only the assignment violates strict reference checking, so disable strictures for that operation alone.
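A sketch of the safer pattern, assuming a hypothetical Monkey class with name and age attributes: generate the closures first, while strictures are fully in force, and disable strict 'refs' only for the remainder of the loop body, which contains nothing but the two glob assignments:

```perl
use strict;
use warnings;

package Monkey;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

sub generate_accessors
{
    my $attrname = shift;
    my $getter   = sub { my $self = shift; return $self->{$attrname} };
    my $setter   = sub { my ($self, $value) = @_; $self->{$attrname} = $value };
    return $getter, $setter;
}

for my $attr (qw( name age ))
{
    # the closures were compiled inside generate_accessors, under strict
    my ($getter, $setter) = generate_accessors( $attr );

    # relax strict refs only for the symbolic glob assignments
    no strict 'refs';
    *{ "get_$attr" } = $getter;    # installs Monkey::get_name, Monkey::get_age
    *{ "set_$attr" } = $setter;
}

package main;

my $pepper = Monkey->new( name => 'Pepper', age => 4 );
$pepper->set_age( 5 );
print $pepper->get_name(), ' is now ', $pepper->get_age(), "\n";   # Pepper is now 5
```

The unqualified names in *{ "get_$attr" } resolve in the package where that code runs, which is Monkey here.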
If the name of the method is a string literal in your source code, rather than the contents of a variable, you can assign to the relevant symbol directly rather than through a symbolic reference:
{
    no warnings 'once';
    (*get_homecourt, *set_homecourt) = generate_accessors( 'homecourt' );
}
Assigning directly to the glob does not violate strictures, but mentioning each glob only once does produce a "used only once" warning unless you explicitly suppress it within the scope.
Unlike code written explicitly as code, code generated through string eval gets compiled at runtime. Where you might expect a normal function to be available throughout the lifetime of your program, a generated function might not be available when you expect it.
Force Perl to run code--to generate other code--during the compilation stage by wrapping it in a BEGIN block. When the Perl 5 parser encounters a block labeled BEGIN, it parses the entire block. Provided it contains no syntax errors, the block will run immediately. When it finishes, parsing will continue as if there were no interruption.
In practical terms, the difference between writing:
sub get_age { ... }
sub set_age { ... }
sub get_name { ... }
sub set_name { ... }
sub get_weight { ... }
sub set_weight { ... }
... and:
sub make_accessors { ... }
BEGIN
{
    for my $accessor (qw( age name weight ))
    {
        my ($get, $set) = make_accessors( $accessor );
        no strict 'refs';
        *{ 'get_' . $accessor } = $get;
        *{ 'set_' . $accessor } = $set;
    }
}
... is primarily one of maintainability.
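To see that generated functions really exist while Perl is still compiling the rest of the file, this sketch (with hypothetical screech and howl functions) installs them from a BEGIN block and then calls them without parentheses, which strict subs permits only for functions the parser already knows about:

```perl
use strict;
use warnings;

# Returns a closure that produces a loud version of $sound.
sub make_sound
{
    my $sound = shift;
    return sub { return uc( $sound ) . '!' };
}

BEGIN
{
    for my $name (qw( screech howl ))
    {
        my $code = make_sound( $name );
        no strict 'refs';
        *{ "main::$name" } = $code;    # installs main::screech and main::howl
    }
}

# The BEGIN block has already run by the time Perl compiles these lines,
# so screech and howl parse as function calls rather than barewords.
my $loud   = screech;
my $louder = howl;
print "$loud $louder\n";    # SCREECH! HOWL!
```

Without the BEGIN, the loop would run only at runtime, and strict subs would reject the bare screech and howl during compilation.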
Within a module, any code outside of functions executes when you use it, because of the implicit BEGIN Perl adds around the require and import (Importing). Any code outside of a function but inside the module will execute before the import() call occurs. If you require the module, there is no implicit BEGIN block; code outside of functions runs when the require statement itself executes at runtime, right after Perl finishes parsing the module.
Also beware of the interaction between lexical declaration (the association of a name with a scope) and lexical assignment. The former happens during compilation, while the latter occurs at the point of execution. This code has a subtle bug:
use UNIVERSAL::require;

# buggy; do not use
my $wanted_package = 'Monkey::Jetpack';

BEGIN
{
    $wanted_package->require();
    $wanted_package->import();
}
... because the BEGIN block will execute before the assignment of the string value to $wanted_package occurs. The result will be an exception from attempting to invoke the require() method on the undefined value.
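One fix is to perform the assignment inside the BEGIN block as well. So that this sketch runs without CPAN modules, it assumes the core List::Util in place of the hypothetical Monkey::Jetpack and swaps UNIVERSAL::require for the builtin require with a file path:

```perl
use strict;
use warnings;

BEGIN
{
    # declaration and assignment both happen inside BEGIN, so the value
    # is present when require and import() run during compilation
    my $wanted_package = 'List::Util';    # stands in for Monkey::Jetpack

    ( my $file = "$wanted_package.pm" ) =~ s{::}{/}g;
    require $file;
    $wanted_package->import( 'max' );
}

# max() was imported during compilation, so Perl knows about it here
print 'max: ', max( 3, 9, 4 ), "\n";    # max: 9
```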
Unlike installing function references to populate namespaces and to create methods, there's no simple default way to create classes in Perl 5. Fortunately, a mature and powerful distribution is available from the CPAN to do just this. Class::MOP is the library which makes Moose (Moose) possible. It provides a meta object protocol--a mechanism for creating and manipulating an object system in terms of itself.
Rather than writing your own fragile string eval code or trying to poke into symbol tables manually, you can manipulate the entities and abstractions of your program with objects and methods.
To create a class:
use Class::MOP;
my $class = Class::MOP::Class->create( 'Monkey::Wrench' );
You can add attributes and methods to this class when you create it:
use Class::MOP;

my $class = Class::MOP::Class->create(
    'Monkey::Wrench' =>
    (
        attributes =>
        [
            Class::MOP::Attribute->new( '$material' ),
            Class::MOP::Attribute->new( '$color' ),
        ],
        methods =>
        {
            tighten => sub { ... },
            loosen  => sub { ... },
        },
    ),
);
... or add them to the metaclass (the object which represents that class) after you've created it:
$class->add_attribute( Class::MOP::Attribute->new( '$experience' ) );
$class->add_method( bash_zombie => sub { ... } );
... and you can inspect the metaclass:
my @attrs = $class->get_all_attributes();
my @meths = $class->get_all_methods();
You can similarly create, manipulate, and introspect attributes and methods with Class::MOP::Attribute and Class::MOP::Method.
Perl 5 is not a pervasively object oriented language. Its core data types (scalars, arrays, and hashes) are not objects with methods you can overload. Even so, you can control the behavior of your own classes and objects, especially when they undergo coercion or evaluation in various contexts. This is overloading.
Overloading can be subtle but powerful. An interesting example is overloading how an object behaves in boolean context, especially if you use something like the Null Object pattern (http://www.c2.com/cgi/wiki?NullObject). In boolean context, an object will be true... but not if you overload boolification.
You can overload what the object does for almost every operation: stringification, numification, boolification, iteration, invocation, array access, hash access, arithmetic operations, comparison operations, smart match, bitwise operations, and even assignment.
The most useful are often the most common: stringification, numification, and boolification. The overload pragma allows you to associate a function with an operation you can overload. Here's a class which overloads boolean evaluation:
package Null;
use overload 'bool' => sub { 0 };
In all boolean contexts, every instance of this class will evaluate to false.
The arguments to the overload pragma are pairs where the key describes the type of overload and the value is a function reference to call in place of Perl's default behavior for that object.
It's easy to add a stringification:
package Null;
use overload
    'bool' => sub { 0 },
    '""'   => sub { '(null)' };
Overriding numification is more complex, because arithmetic operators tend to be binary ops (Arity). Given two operands both with overloaded methods for addition, which takes precedence? The answer needs to be consistent, easy to explain, and understandable by people who haven't read the source code of the implementation.
perldoc overload attempts to explain this in the sections labeled Calling Conventions for Binary Operations and MAGIC AUTOGENERATION, but the easiest solution is to overload numification and tell overload to use the provided overloads as fallbacks where possible:
package Null;
use overload
    'bool'   => sub { 0 },
    '""'     => sub { '(null)' },
    '0+'     => sub { 0 },
    fallback => 1;
Setting fallback to a true value lets Perl use any other defined overloads to compose the requested operation, if possible. If that's not possible, Perl will act as if there were no overloads in effect. This is often what you want.
Without fallback, Perl may still try to compose a missing operation from the overloads you have provided, but if it cannot, it throws an exception instead of reverting to its default behavior.
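Putting the pieces together, this sketch of the Null class (with an added constructor, which the snippets above omit) exercises each overload and shows fallback composing addition from numification:

```perl
use strict;
use warnings;

package Null;
use overload
    'bool'   => sub { 0 },
    '""'     => sub { '(null)' },
    '0+'     => sub { 0 },
    fallback => 1;

sub new { my $class = shift; return bless {}, $class }

package main;

my $null = Null->new();

print 'boolean: ', ( $null ? 'true' : 'false' ), "\n";   # boolean: false
print "string:  $null\n";                                 # string:  (null)

# no '+' overload exists; fallback lets Perl numify via '0+' instead
my $sum = $null + 5;
print "sum:     $sum\n";                                  # sum:     5
```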
Subclasses inherit overloadings from their ancestors. They may override this behavior in one of two ways. If the parent class uses overloading as shown, with function references provided directly, a child class must override the parent's overloaded behavior by using overload directly.
Parent classes can allow their descendants more flexibility by specifying the name of a method to call to implement the overloading, rather than hard-coding a function reference:
package Null;
use overload
    'bool'   => 'get_bool',
    '""'     => 'get_string',
    '0+'     => 'get_num',
    fallback => 1;
Child classes do not have to use overload themselves; they can merely override the appropriate get_* methods. This is often more flexible.
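To see a child class change overloaded behavior without touching overload at all, this sketch assumes a hypothetical Null::Quiet subclass that overrides only get_string():

```perl
use strict;
use warnings;

package Null;
use overload
    'bool'   => 'get_bool',
    '""'     => 'get_string',
    '0+'     => 'get_num',
    fallback => 1;

sub new        { my $class = shift; return bless {}, $class }
sub get_bool   { return 0 }
sub get_string { return '(null)' }
sub get_num    { return 0 }

# Overloads named by method are looked up through normal method
# dispatch, so overriding get_string() changes stringification.
package Null::Quiet;
use parent -norequire, 'Null';
sub get_string { return '' }

package main;

my $loud  = Null->new();
my $quiet = Null::Quiet->new();
print "[$loud] [$quiet]\n";    # [(null)] []
```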
Overloading may seem like a tempting tool to use to produce symbolic shortcuts for new operations. The IO::All CPAN distribution pushes this idea to its limit to produce clever ideas for concise and composable code. Yet for every brilliant API refined through the appropriate use of overloading, a dozen more messes congeal. Sometimes the best code eschews cleverness in favor of simple and straightforward design.
Overriding addition, multiplication, and even concatenation on a Matrix class makes sense only because the existing notation for those operations is pervasive. A new problem domain without that established notation is a poor candidate for overloading, as is a problem domain where you have to squint to make Perl's existing operators match a different notation.
Damian Conway's Perl Best Practices suggests that the other useful use of overloading is to prevent the accidental abuse of objects. For example, overloading numification to croak() for objects which have no reasonable single numeric representation can help you find real bugs in real programs. Overloading in Perl 5 is relatively rare, but this suggestion can improve the reliability and safety of programs.
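Following that suggestion, a hypothetical Monkey class might overload numification to croak() while keeping stringification and boolification usable. The explicit bool overload matters here: without it, Perl could derive truthiness from the croaking numification. This is a minimal sketch, not a recommendation for every class:

```perl
use strict;
use warnings;

package Monkey;
use Carp 'croak';
use overload
    '0+'     => sub { croak( 'A Monkey has no numeric value' ) },
    '""'     => sub { 'Monkey(' . $_[0]{name} . ')' },
    'bool'   => sub { 1 },
    fallback => 1;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

package main;

my $monkey = Monkey->new( name => 'Pepper' );
print "$monkey\n";    # Monkey(Pepper)

# arithmetic falls back to numification, which croaks
my $numified = eval { my $total = $monkey + 1; 1 };
print 'numification: ', ( $numified ? "allowed\n" : "refused: $@" );
```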
Perl gives you tools with which to write programs securely. These tools are no substitute for careful thought and planning, but they reward caution and understanding and can help you avoid subtle mistakes.
A feature called taint mode or taint adds a small amount of metadata to all data which comes from sources outside of your program. Any data derived from tainted data is also tainted. You may use tainted data within your program, but if you use it to affect the outside world--if you use it insecurely--Perl will throw a fatal exception.
perldoc perlsec explains taint mode in copious detail among other security guidelines.
To enable taint mode, launch your program with the -T flag. You can use this flag on the #! line of a program only if you make the program executable and do not launch it with perl; if you run it as perl mytaintedappl.pl and neglect the -T flag, Perl will exit with an exception. By the time Perl encounters the flag on the #! line, it's missed its opportunity to taint the environment data which makes up %ENV, for example.
Taint can come from two places: file input and the program's operating environment. The former is anything you read from a file or collect from users in the case of web or network programming. The latter is more subtle. This includes any command-line arguments, environment variables, and data from system calls. Even operations such as reading from a directory handle (opened with opendir()) produce tainted data.
The tainted() function from the core module Scalar::Util returns true if its argument is tainted:
die "Oh no!" if Scalar::Util::tainted( $some_suspicious_value );
To remove taint, you must extract known-good portions of the data with a regular expression capture. The captured data will be untainted. If your user input consists of a US telephone number, you can untaint it with:
die "Number still tainted!"
    unless $tainted_number =~ /(\(\d{3}\) \d{3}-\d{4})/;
my $safe_number = $1;
The more specific your pattern is about what you allow, the more secure your program can be. The opposite approach of denying specific items or forms runs the risk of overlooking something harmful. Where security is concerned, it is better to disallow something safe but unexpected than to allow something harmful which appears safe. Even so, nothing prevents you from writing a capture for the entire contents of a variable--but in that case, why use taint?
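A minimal untainting helper along these lines might look like the following. The function name untaint_phone is hypothetical, and the pattern is anchored with \A and \z so that the whole value, not just part of it, must match the allowed form. Run it as perl -T untaint.pl '(555) 555-0123' to see the raw argument reported as tainted; without -T, tainted() returns false for both values:

```perl
use strict;
use warnings;
use Scalar::Util 'tainted';

# Return an untainted copy of a US telephone number such as
# "(555) 555-0123", or die if the input does not match that form.
sub untaint_phone
{
    my $input = shift;

    # anchor both ends so the whole value must match, then capture it
    die "Invalid telephone number\n"
        unless $input =~ /\A(\(\d{3}\) \d{3}-\d{4})\z/;

    # captured text is untainted (unless use re 'taint' is in effect)
    return $1;
}

# command-line arguments are tainted under -T; the default is a literal
my $raw  = @ARGV ? $ARGV[0] : '(555) 555-0123';
my $safe = untaint_phone( $raw );

printf "raw tainted:  %s\n", tainted( $raw )  ? 'yes' : 'no';
printf "safe tainted: %s\n", tainted( $safe ) ? 'yes' : 'no';
print "$safe\n";
```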
One source of taint is the superglobal %ENV, which represents environment variables for the system. This data is tainted because forces outside of the program's control can manipulate values there. Any environment variable which modifies how Perl or the shell finds files and directories is an attack vector. A taint-sensitive program should delete several keys from %ENV and set $ENV{PATH} to a specific and well-secured path:
delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) };
$ENV{PATH} = '/path/to/app/binaries/';
If you do not set $ENV{PATH} appropriately, you will receive messages about its insecurity.
For similar reasons, @INC does not contain the current working directory under taint mode. Perl will also ignore the PERL5LIB and PERLLIB environment variables. Use the lib pragma or the -I flag to perl if you need to add library directories to the program.
Taint mode is all or nothing. It's either on or off. This sometimes leads people to use permissive patterns to untaint data, and gives the illusion of security. Review untainting carefully.
Unfortunately, not all modules handle tainted data appropriately. This is a bug which CPAN authors should take seriously. If you have to make legacy code taint-safe, consider the use of the -t flag, which enables taint mode but reduces taint violations from exceptions to warnings. This is not a substitute for full taint mode, but it allows you to secure existing programs without the all or nothing approach of -T.