**Introduction**
This page discusses both subroutines and references. They're on the same page because references are often passed into and out of subroutines.
**References**
In Perl, you can pass only one kind of argument to a subroutine: a scalar. To pass any other kind of argument, you need to convert it to a scalar. You do that by passing a **reference** to it. A reference to anything is a scalar. If you're a C programmer you can think of a reference as a pointer (sort of).
The following table discusses the referencing and de-referencing of variables. Note that in the case of lists and hashes, you reference and dereference the list or hash as a whole, not individual elements (at least not for the purposes of this discussion).
^ Variable ^ Instantiating the scalar ^ Instantiating a reference to it ^ Referencing it ^ Dereferencing it ^ Accessing an element ^
| $scalar | $scalar = "steve"; | $ref = \"steve"; | $ref = \$scalar | $$ref or ${$ref} | N/A |
| @list | @list = ("steve", "fred"); | $ref = ["steve", "fred"]; | $ref = \@list | @{$ref} | ${$ref}[3] $ref->[3] |
| %hash | %hash = ("name" => "steve", "job" => "Troubleshooter"); | $ref = {"name" => "steve", "job" => "Troubleshooter"}; | $ref = \%hash | %{$ref} | ${$ref}{"president"} $ref->{"president"} |
| FILE | | | $ref = \*FILE | {$ref} or scalar <$ref> | |
These principles are demonstrated in the source code below. Note the following anomaly:
*A variable with a % sign won't interpolate when placed in doublequotes. Variables with @ and $ will. Perl interpolates only scalars and arrays inside doublequoted strings, so a hash must be printed outside the quotes.
sub doscalar
{
my($scalar) = "This is the scalar";
my($ref) = \$scalar;
print "${$ref}\n"; # Prints "This is the scalar".
}
sub dolist
{
my(@list) = ("Element 0", "Element 1", "Element 2");
my($ref) = \@list;
print "@{$ref}\n"; # Prints "Element 0 Element 1 Element 2".
print "${$ref}[1]\n"; # Prints "Element 1".
}
sub dohash
{
my(%hash) = ("president"=>"Clinton",
"vice president" => "Gore",
"intern" => "Lewinsky");
my($ref) = \%hash;
# NOTE: Can't put %{$ref} inside doublequotes!!! Doesn't work!!!
# Prints "internLewinskyvice presidentGorepresidentClinton".
# NOTE: Hash elements might print in any order!
print %{$ref}; print "\n";
# NOTE: OK to put ${$ref}{} in doublequotes.
# NOTE: Prints "Gore".
print "${$ref}{'vice president'}\n";
}
&doscalar;
&dolist;
&dohash;
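If you do want a hash's contents inside a string, one possible workaround is sketched below. It is only an illustration, not part of the original code: the variable names (''@pairs'', ''$line'', ''$inline'') are made up for this example, and the keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

my %hash = ("president" => "Clinton", "vice president" => "Gore");
my $ref  = \%hash;

# "%{$ref}" would print literally, so build the text explicitly.
# Sorting the keys gives a predictable order.
my @pairs = map { "$_=$ref->{$_}" } sort keys %{$ref};
my $line  = join(", ", @pairs);
print "$line\n";     # Prints "president=Clinton, vice president=Gore".

# An array dereference does interpolate, so @{[ ... ]} can wrap any list.
my $inline = "Officers: @{[ sort values %{$ref} ]}";
print "$inline\n";   # Prints "Officers: Clinton Gore".
```

The ''@{[ ... ]}'' trick builds an anonymous array and dereferences it on the spot, which is why it interpolates where ''%{$ref}'' does not.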
**Subroutines: A Discussion**
Subroutines are the basic computer science methodology to divide tasks into subtasks. They take zero or more scalar arguments as input (and possibly output), and they return a list of zero or more scalars as a return value. Note that the scalar arguments and/or return values can be references to lists, hashes, or any other type of complex data, so the possibilities are limitless.
In computer science, there are two methods of passing arguments to a subroutine:
*By value
*By reference
When passing by value, the language makes a copy of the argument, and all access inside the subroutine is to that copy. Therefore, changes made inside the subroutine do not affect the calling routine. Such arguments **cannot** be used as output from the subroutine; the preferred method of outputting from a subroutine is via the return value. Unfortunately, the Perl language doesn't support passing by value. Instead, the programmer must explicitly make the copy inside the subroutine.
In general, I believe it's best to use arguments as input-only.
When passing by reference, the language makes the argument's exact variable available inside the subroutine, so any changes the subroutine makes to the argument affect the argument's value in the calling procedure (after the subroutine call, of course). This tends to reduce encapsulation, as there's no way of telling in the calling routine that the called routine changed it. Passing by reference harkens back to the days of global values, and in general creates less robust code.
All arguments in Perl are passed by reference! If the programmer wishes to make a copy of the argument to simulate passing by value (and I believe in most cases he should), he must explicitly make the copy in the subroutine and not otherwise access the original arguments.
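The difference can be seen in a minimal sketch. The subroutine names ''addOneInPlace'' and ''addOneToCopy'' are hypothetical, invented only for this illustration:

```perl
use strict;
use warnings;

# Modifies the caller's variable, because $_[0] is an alias to it.
sub addOneInPlace
   {
   $_[0] = $_[0] + 1;
   }

# Copies the argument first, so the caller's variable is untouched.
sub addOneToCopy
   {
   my($number) = @_;
   $number = $number + 1;
   return($number);
   }

my $count = 5;
addOneInPlace($count);
print "$count\n";           # Prints "6".

my $result = addOneToCopy($count);
print "$count $result\n";   # Prints "6 7".
```

The second form simulates passing by value: the only way out of the subroutine is the return value.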
----
**NOTE: Modern Perl versions (5.003 and newer) enable you to do function prototyping somewhat similar to C. Doing so lessens the chance for weird runtime errors. Because this page was created before Perl prototyping was common, much of its code is old school. This will change as time goes on.**
----
----
**Danger! Warning! Peligro! Achtung! Watch it!**
As you would probably imagine, subroutine order matters when prototyping. A subroutine call must call a subroutine defined previously. The danger lies in the fact that if you do not, you get a non-obvious runtime error, not a compile error.
[[http://www.troubleshooters.com/codecorn/littperl/perlfuncorder.htm|SUBROUTINE ORDER MATTERS IN PROTOTYPING]]
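One way to sidestep the ordering problem is a forward declaration, which gives Perl the prototype before any call is compiled. The following is a sketch only; ''fullName'' is a hypothetical subroutine made up for the example:

```perl
use strict;
use warnings;

# Forward declaration: tells Perl the prototype before any call is compiled.
sub fullName($$);

my $name = fullName("Bill", "Brown");
print "$name\n";   # Prints "Bill Brown".

# The definition can now come after the call.
sub fullName($$)
   {
   my($first, $last) = @_;
   return("$first $last");
   }
```

Grouping such declarations at the top of the file lets the subroutine bodies be arranged in whatever order reads best.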
----
**Bare Bones Subroutine Syntax**
|Old school, no prototyping||
|Calling the subroutine|Constructing the subroutine|
|''&mysub();''|''sub mysub''\\ '' {''\\ '' }''|
Note that in the above the ampersand (&) is used before the subroutine call, and that no parentheses are used in the function definition.
|Prototyping, no arguments||
|Calling the subroutine|Constructing the subroutine|
|''mysub();''|''sub mysub()''\\ '' {''\\ '' }''|
The preceding is prototyped. Note that there is no ampersand before the function. Note also that the function definition has parentheses, but because there are no args expected those parens are empty. Contrast that with the following, which expects two scalars. Experiment and note that Perl gripes when your prototype and call don't match.
|Prototyping, two string arguments||
|Calling the subroutine|Constructing the subroutine|
|''mysub($filename, $title);''|''sub mysub($$)''\\ '' {''\\ '' }''|
**Returning a Scalar**
Use the return statement.
|Calling the subroutine|Constructing the subroutine|
|''my($name) = &getName();''\\ ''print "$name\n";''\\ ''# Prints "Bill Clinton"''|''sub getName''\\ '' {''\\ '' return("Bill Clinton");''\\ '' }''|
NOTE: In C++ there are cases where the calling code can "reach into" the function via the returned pointer or reference. This is apparently not true of passed back scalars. Check out this code:
$GlobalName = "Clinton";
sub getGlobalName
{
return($GlobalName);
}
print "Before: " . &getGlobalName() . "\n";
$ref = \&getGlobalName();
$$ref = "Gore";
print "After: " . &getGlobalName() . "\n";
#All print statements printed "Clinton"
I have been unable to hack into a subroutine via its scalar return. If you know of a way it can be done, please [[http://www.troubleshooters.com/email_steve_litt.htm|let me know]], as this would be a horrid violation of encapsulation.
**Returning a List**
|Calling the subroutine|Constructing the subroutine|
|''my($first, $last) = &getFnameLname();''\\ ''print "$last, $first\n";''\\ ''# Prints "Clinton, Bill"''|''sub getFnameLname''\\ '' {''\\ '' return("Bill", "Clinton");''\\ '' }''|
**Returning a Hash**
|Calling the subroutine|Constructing the subroutine|
|''my(%officers) = &getOfficers();''\\ ''print $officers{"vice president"};''\\ ''# prints Al Gore''|''sub getOfficers''\\ '' {''\\ '' return("president"=>"Bill Clinton",''\\ '' "vice president"=>"Al Gore",''\\ '' "intern"=>"Monica Lewinsky"''\\ '' );''\\ '' }''|
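Because a reference is a scalar, a subroutine can also hand back a reference to a hash instead of the flattened key/value list. This is a sketch under that assumption; ''getOfficersRef'' and ''$howMany'' are hypothetical names, not part of the original examples:

```perl
use strict;
use warnings;

sub getOfficersRef
   {
   my(%officers) = ("president"      => "Bill Clinton",
                    "vice president" => "Al Gore");
   return(\%officers);   # a single scalar: a reference to the hash
   }

my $officers = getOfficersRef();
print $officers->{"vice president"} . "\n";   # Prints "Al Gore".

my $howMany = scalar(keys %{$officers});
print "$howMany\n";                           # Prints "2".
```

The caller receives one scalar and dereferences it as needed, rather than copying every key and value into a new hash.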
**Subroutine With Scalar Input/Output Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to refer to output or input/output arguments as $_[] or @_ throughout the function, to let the reader know they are output arguments.
Below is how to change the value of an argument outside the function.
|Calling the subroutine|Constructing the subroutine|
|''my($mm, $dd, $yyyy) = ("12", "10", "1998");''\\ ''print "Before: $mm/$dd/$yyyy\n";''\\ ''**&firstOfNextMonth($mm, $dd, $yyyy);**''\\ ''print "After : $mm/$dd/$yyyy\n";''\\ ''# Second print will print 01/01/1999''|''sub firstOfNextMonth''\\ '' {''\\ '' $_[1] = "01";''\\ '' $_[0] = $_[0] + 1;''\\ '' if($_[0] > 12)''\\ '' {''\\ '' $_[0] = "01";''\\ '' $_[2]++;''\\ '' }''\\ '' }''|
By the way, the above is an excellent example of the advantages of a loosely typed language. Note the implicit conversions between string and integer.
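If you want the call itself to show that arguments may be changed, one alternative is to pass explicit references. This is only a sketch of that idea: ''firstOfNextMonthRef'' is a hypothetical variant of the subroutine above, and the ''sprintf'' call is an added assumption to keep the month at two digits.

```perl
use strict;
use warnings;

sub firstOfNextMonthRef
   {
   my($mmRef, $ddRef, $yyyyRef) = @_;
   ${$ddRef} = "01";
   ${$mmRef} = sprintf("%02d", ${$mmRef} + 1);
   if(${$mmRef} > 12)
      {
      ${$mmRef} = "01";
      ${$yyyyRef}++;
      }
   }

my($mm, $dd, $yyyy) = ("12", "10", "1998");
# The backslashes tell the reader these variables may be modified.
firstOfNextMonthRef(\$mm, \$dd, \$yyyy);
print "$mm/$dd/$yyyy\n";   # Prints "01/01/1999".
```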
**Subroutine With Scalar Input-Only Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.
Below is how to print changed values without changing the arguments outside the function's scope.
|Calling the subroutine|Constructing the subroutine|
|''my($mm, $dd, $yyyy) = ("12", "10", "1998");''\\ ''print "Before: $mm/$dd/$yyyy\n";''\\ ''**&printFirstOfNextMonth($mm, $dd, $yyyy);**''\\ ''print "After : $mm/$dd/$yyyy\n";''\\ ''# Before and after will print 12/10/1998.''\\ ''# Inside will print 01/01/1999''|''sub printFirstOfNextMonth''\\ '' {''\\ '' my($mm, $dd, $yyyy) = @_;''\\ '' $dd = "01";''\\ '' $mm = $mm + 1;''\\ '' if($mm > 12)''\\ '' {''\\ '' $mm = "01";''\\ '' $yyyy++;''\\ '' }''\\ '' print "Inside: $mm/$dd/$yyyy\n";''\\ '' }''|
**Subroutine With List Input/Output Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to refer to output or input/output arguments as $_[] or @_ throughout the function, to let the reader know they are output arguments.
If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.
Here's an example of a listcat() function which appends the second list to the first. From that point forward the caller will see the new value of the first argument:
|Calling the subroutine|Constructing the subroutine|
|''my(@languages) = ("C","C++","Delphi");''\\ ''my(@newlanguages) = ("Java","Perl");''\\ ''print "Before: @languages\n";''\\ ''&listcat(\@languages, \@newlanguages);''\\ ''print "After : @languages\n";''\\ ''# Before prints "C C++ Delphi"''\\ ''# After prints "C C++ Delphi Java Perl"''|''sub listcat''\\ '' {''\\ '' # Purpose of @append is only to''\\ '' # self-document input-only status''\\ '' my(@append) = @{$_[1]};''\\ '' my($temp);''\\ '' foreach $temp (@append)''\\ '' {''\\ '' # note direct usage of arg0''\\ '' push(@{$_[0]}, $temp); ''\\ '' }''\\ '' }''|
**Subroutine With List Input-Only Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.
If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.
Here's an example of an improved listcat() function which appends the second list to a copy of the first, without affecting the first outside the subroutine. Instead, it returns the combined list.
|Calling the subroutine|Constructing the subroutine|
|''my(@languages) = ("C","C++","Delphi");''\\ ''my(@newlanguages) = ("Java","PERL");''\\ ''print "Before: @languages\n";''\\ ''print "Inside: ";''\\ ''print &listcat(\@languages,\@newlanguages);''\\ ''print "\n";''\\ ''print "After : @languages\n";''\\ ''# Before and after prints "C C++ Delphi"''\\ ''# Inside prints "CC++DelphiJavaPERL"''|''sub listcat''\\ '' {''\\ '' # Local copies keep both''\\ '' # arguments input-only''\\ '' my(@original) = @{$_[0]};''\\ '' my(@append) = @{$_[1]};''\\ '' my($temp);''\\ '' foreach $temp (@append)''\\ '' {''\\ '' push(@original, $temp); # push onto the local copy''\\ '' }''\\ '' return(@original);''\\ '' }''|
**Use parentheses with the shift command!**
The following fails, because Perl reads ''@{shift}'' as the array ''@shift'' rather than as a call to shift:
sub handleArray
{
my(@localArray) = @{shift};
my($element);
foreach $element (@localArray) {print $element . "\n";}
}
&handleArray(\@globalArray);
But once you place the shift command in parens, everything's fine:
sub handleArray
{
my(@localArray) = @{(shift)};
my($element);
foreach $element (@localArray) {print $element . "\n";}
}
&handleArray(\@globalArray);
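Parentheses are not the only way to make Perl treat ''shift'' as a function call inside the braces. The sketch below compares a few spellings that should behave the same; the subroutine names are hypothetical and exist only for this comparison:

```perl
use strict;
use warnings;

# Each subroutine dereferences its first argument and returns the element count.
sub countWithParens { my(@local) = @{(shift)}; return(scalar(@local)); }
sub countWithCall   { my(@local) = @{shift()}; return(scalar(@local)); }
sub countWithPlus   { my(@local) = @{+shift};  return(scalar(@local)); }
sub countWithIndex  { my(@local) = @{$_[0]};   return(scalar(@local)); }

my @globalArray = ("one", "two", "three");
print countWithParens(\@globalArray) . "\n";   # Prints "3".
print countWithCall(\@globalArray)   . "\n";   # Prints "3".
print countWithPlus(\@globalArray)   . "\n";   # Prints "3".
print countWithIndex(\@globalArray)  . "\n";   # Prints "3".
```

The last form, ''@{$_[0]}'', leaves @_ untouched, which matters if later code still needs the full argument list.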
**Using prototypes**
Be careful prototyping with lists:
sub printList(@$) {print @{(shift)}; print shift; print "\n";};
printList(\@globalArray);
The preceding gives some runtime warnings. But the call is missing an arg -- it shouldn't run at all. Instead, use \@ for the list in the prototype, and pass just the list in the call, as follows:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray);
Now it gives you a "not enough arguments" error, and ends with a compile error, which is what you want. Place an additional scalar in the call so the call matches the prototype, and it runs perfectly:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray, "Hello World");
Remember, using an unbackslashed @ in the prototype defeats the purpose of prototyping. Precede the @ with a backslash. Note that this is also true for passed hashes (%). Unless you have a very good reason to do otherwise, precede all @ and % with backslashes in the prototype.
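The same approach for a hash might look like the following sketch. The subroutine ''countKeys'' and its ''$label'' argument are hypothetical, chosen only to show a ''\%'' prototype next to a plain scalar:

```perl
use strict;
use warnings;

# \% in the prototype makes Perl pass a reference to the caller's hash.
sub countKeys(\%$)
   {
   my($hashRef, $label) = @_;
   my $count = scalar(keys %{$hashRef});
   return("$label: $count");
   }

my %globals = ("currentdir" => "/corporate/data", "programver" => "5.21");
my $report = countKeys(%globals, "Keys");   # note: no backslash in the call
print "$report\n";                          # Prints "Keys: 2".
```

Inside the subroutine the first argument arrives as a hash reference, even though the caller wrote a plain ''%globals''.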
**Subroutine With Hash Input/Output Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to refer to output or input/output arguments as $_[] or @_ throughout the function, to let the reader know they are output arguments.
If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.
Here's an example of a setGlobals() function which takes an existing %globals passed in as a reference argument and sets the proper elements. From that point forward the caller will see the new value of the elements:
|Calling the subroutine|Constructing the subroutine|
|''%globals; ''\\ ''&setGlobals(\%globals);''\\ ''&printGlobals(\%globals);''|''sub setGlobals''\\ '' {''\\ '' ${$_[0]}{"currentdir"} = "/corporate/data";''\\ '' ${$_[0]}{"programdir"} = "/corporate/bin";''\\ '' ${$_[0]}{"programver"} = "5.21";''\\ '' ${$_[0]}{"accesslevel"} = "root";''\\ '' }''|
**Subroutine With Hash Input-Only Arguments**
Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine makes to the elements of @_, such as $_[0] or $_[1], is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.
For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.
If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.
Here's an example of a printGlobals() function which copies the hash passed in as a reference argument and prints its elements, without affecting the original hash outside the subroutine.
|Calling the subroutine|Constructing the subroutine|
|''%globals;''\\ ''# ...''\\ ''# set globals''\\ ''# ...''\\ ''# now print globals''\\ ''&printGlobals(\%globals);''|''sub printGlobals''\\ '' {''\\ '' # copy of argument precludes extra-scope change''\\ '' my(%globals) = %{$_[0]};''\\ '' print "Current Dir: $globals{'currentdir'}\n";''\\ '' print "Program Dir: $globals{'programdir'}\n";''\\ '' print "Version : $globals{'programver'}\n";''\\ '' print "Accesslevel: $globals{'accesslevel'}\n";''\\ '' }''|
**Dereferencing in Place: The **''**->**''** Operator**
By FAR the easiest way to handle references, especially when they're being passed into and out of subroutines, is the ''->'' operator. This operator works the same as it does in C. It means "element so and so of the dereferenced reference". This is ABSOLUTELY vital when using objects, because most Perl objects are references to a hash. Nest a few of those, and without the ''->'' operator you're dead meat. The ''->'' operator also enables you to easily modify arguments in place, which is vital in typical OOP applications.
One typical usage is an object containing a list of hashes. The list of hashes could easily represent a data table, with array elements being rows (records) and hash elements being columns (fields). Here's how it's easily done in Perl:
#!/usr/bin/perl -w
use strict;
package Me;
sub new
{
my($type) = $_[0];
my($self) = {};
$self->{'name'} = 'Bill Brown';
### Make a reference to an empty array of jobs
$self->{'jobs'} = [];
### Now make each element of array referenced by
### $self->{'jobs'} a REFERENCE to a hash!
$self->{'jobs'}->[0]={'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'};
$self->{'jobs'}->[1]={'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'};
$self->{'jobs'}->[2]={'ystart'=>'1999','yend'=>'2001','desc'=>'Software Developer'};
bless($self, $type);
return($self);
}
### showResume is coded to show off the -> operator. In real
### life you'd probably use a foreach loop, but the following
### while(1) loop better demonstrates nested -> operators.
sub showResume
{
my($self)=$_[0];
print "Resume of " . $self->{'name'} . "\n\n";
print "Start\tEnd\tDescription\n";
my $ss = 0;
# Loop through array referenced by $self->{'jobs'},
# and for each subscript, print the value corresponding
# to the hash key. In other words, print every field of
# every record of the jobs array
while (1)
{
last unless defined $self->{'jobs'}->[$ss];
print "$self->{'jobs'}->[$ss]->{'ystart'}\t";
print "$self->{'jobs'}->[$ss]->{'yend'}\t";
print "$self->{'jobs'}->[$ss]->{'desc'}\n";
$ss++;
}
}
package main;
my $me = Me->new();
$me->showResume();
print "\nFirst job was $me->{'jobs'}->[0]->{'desc'}.\n";
I think you'll agree that the reference nesting in the preceding code would have been extremely hard to understand without the in-place dereferencing provided by the ''->'' operator. The following is the resulting output:
[slitt@mydesk slitt]$ ./test.pl
Resume of Bill Brown
Start End Description
1998 1999 Bus driver
1999 1999 Bus mechanic
1999 2001 Software Developer
First job was Bus driver.
[slitt@mydesk slitt]$
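For comparison, here is a sketch of the foreach style mentioned in the comments above showResume. To keep it short it uses a plain, unblessed hash reference rather than the Me object, and it collects the lines in a hypothetical ''@lines'' array before printing:

```perl
use strict;
use warnings;

# Same shape of data as the Me object: a hash holding a list of hashes.
my $me = {
   'name' => 'Bill Brown',
   'jobs' => [
      {'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'},
      {'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'},
   ],
};

my @lines;
# Each $job is a reference to one record (one hash) of the jobs array.
foreach my $job (@{$me->{'jobs'}})
   {
   push(@lines, "$job->{'ystart'}\t$job->{'yend'}\t$job->{'desc'}");
   }
print join("\n", @lines) . "\n";
```

The loop variable removes one level of nesting, so each field needs only a single ''->''.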
[[ http://www.troubleshooters.com/codecorn/littperl/perlsub.htm#References | Copied From ]]