PerlNET: An Introduction

Perl has been around for a while and is one of the most popular open source languages among system administrators, Web developers and the research community. Meanwhile, Microsoft's .NET technology, which comprises a framework and a set of tools for creating sophisticated applications, was recently released. Is there any connection between these two different worlds? Yes, there is! Perl is now a .NET language. This is the first of a two-part series that introduces and explores the tools and technologies giving Perl and .NET a new dimension.

PerlNET, a technology that is part of ActiveState's Perl Development Kit, gives .NET access to the thousands of Perl modules written over the years. It also opens up the feature-rich .NET Framework to Perl.

Every once in a while a language comes along that gains the hype of the programming literati. In recent years, Java and now C# have stolen many of the headlines. However, for more than a decade, there has been a programmer's diamond in the rough: Perl. After Larry Wall released this language in 1987, it quickly became the language of choice for system administrators. With the growth of the Internet and, in particular, dynamic content, it has also been the language of choice for Common Gateway Interface (CGI) applications. Now, with the introduction of .NET from Microsoft and PerlNET from ActiveState, Perl has become one of the standard .NET languages. Perl is an open source language; PerlNET, however, is a commercially licensed technology. In this and the second article of this series, I'll explore this interesting transformation of Perl into a .NET language.

Let's begin by demonstrating the features of Perl that make it such a gem. Two of these features are regular expressions and associative arrays (hashes). Through a series of examples, I'll illustrate these and other features (you can download the source code for this article from the link in the References column). A hash is a data structure that allows a string to be used as an index. This data structure spares the programmer from writing a table search by hand. A regular expression allows a programmer to perform pattern matching. This technique is the cornerstone of data validation and other user interface queries.

This Camel Gets You a Long Way!
(See: What Does a Camel Have to Do with Perl?)
Perl is a weakly typed language, although it supports three different variable types: scalars, arrays and hashes. A scalar variable stores any single value, whether it is a number, a string, a paragraph or an entire file. A scalar variable begins with a $ symbol. An array stores a collection of scalars. An array variable begins with an @ symbol. To select an array element, you use the [ ] notation. A hash stores a collection of key-value pairs and begins with a % symbol. A value is selected from a hash by using the { } subscript notation. Listing 1 summarizes these simple ideas. Anything from the # to the end of a line is a comment. On a lighter note, what would you call a variable that is declared as %brown?

Perl comes with a plethora of built-in functionality. In Listing 1, I used the print function and the sort function. The \n sequence represents the newline character; it is only interpreted that way inside a double-quoted string.
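As a minimal sketch of the three variable types (this is not the article's Listing 1; the variable names and values here are invented for illustration), the following shows a scalar, an array and a hash, along with print and sort:

```perl
use strict;
use warnings;

# Scalar: holds a single value (a number or a string)
my $name = "camel";

# Array: an ordered collection of scalars, indexed with [ ]
my @humps = (2, 1, 3);

# Hash: key-value pairs, subscripted with { }
my %legs = (camel => 4, bird => 2);

print "name: $name\n";
print "first element: $humps[0]\n";
print "camel legs: $legs{camel}\n";

# sort returns a new list; the original array is left unchanged
my @sorted = sort { $a <=> $b } @humps;
print "sorted: @sorted\n";
```

Note that a single element of an array or hash is itself a scalar, which is why $humps[0] and $legs{camel} are written with a $ rather than @ or %.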

As you learn Perl, you will become increasingly amazed at the economy of expression in the language. For example, you can print lines typed at the keyboard with:

   while ($line = <STDIN>) {   # $line gets input
      print $line;             # print it
   }

The phrase <STDIN> allows Perl to read a line at a time from the keyboard. Since this is a loop, each line is read and tucked away in the scalar $line. If you just say:

   while (<STDIN>) {   # $_ gets input
      print;           # print $_
   }

Here, by default, the special Perl variable $_ holds the line that was just read, and print with no arguments prints $_. Although the { } braces are required in the loops above, the statement-modifier form of while lets you eliminate them:

   print while(<STDIN>);

You can just as easily read from files, but first you have to create your own file handle:

   open(MYFILE, "< text");     # open for input
   print while(<MYFILE>);

You can also create your own output file:

   open(MYFILE, "< text");     # open for input
   open(OUTPUT, "> out");      # open for output
   print OUTPUT while(<MYFILE>);
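The snippets above ignore failures such as a missing input file. A slightly more defensive sketch, assuming a hypothetical helper named copy_lines and scratch files created only for the demonstration, might use the three-argument form of open and check each call:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Copy a text file line by line; returns the number of lines copied.
# copy_lines is a hypothetical helper, not part of the article's code.
sub copy_lines {
    my ($from, $to) = @_;
    open(my $in,  '<', $from) or die "cannot open $from: $!";
    open(my $out, '>', $to)   or die "cannot create $to: $!";
    my $count = 0;
    while (my $line = <$in>) {
        print $out $line;
        $count++;
    }
    close($in);
    close($out) or die "cannot close $to: $!";
    return $count;
}

# Demonstration in a scratch directory that is removed on exit
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/text") or die "cannot create sample file: $!";
print $fh "first line\nsecond line\n";
close($fh);

my $copied = copy_lines("$dir/text", "$dir/out");
print "copied $copied lines\n";
```

Lexical file handles such as $in close automatically when they go out of scope, but closing the output handle explicitly lets the program notice a failed write.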

OK. That's enough file manipulation. Now let's see what's so special about hashes. Hashes have utility in a surprisingly large number of applications. For example, let's try to count the occurrences of each word in a document. For simplicity, consider a word to be anything surrounded by white space. See Listing 2 for the solution to this problem.

A little explanation is necessary here. There is a Perl subroutine on lines 1 through 4 that is called on line 15. The while loop reads lines from the standard input and converts them to lower case. The split function splits $_ on white space and returns an array consisting of the words on this line. The foreach on line 9 loops over each word on the line and executes the following statement for each word:

   $counts{$word}++;

This statement simply adds one to the value for each word encountered in the input. The loop on line 15 is a bit tricky.

   sort bysize keys(%counts)

This expression sorts the keys using a special sort subroutine shown on lines 1 through 4. This subroutine compares two keys by their values in the hash, so the words come out ordered by count. If you run that Perl code on all the text in this article through this paragraph, you would find the last few lines of output to be:

   of   24
   a   41
   the   49
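The word-counting technique can be sketched as follows. This is not the article's Listing 2, so the line numbers quoted in the text do not apply to it; the function name count_words, the sample input and the inline sort block (used here in place of a named sort subroutine) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Count whitespace-separated words, ignoring case.
# Takes a list of lines and returns a hash of word => count.
sub count_words {
    my @lines = @_;
    my %counts;
    foreach my $line (@lines) {
        foreach my $word (split ' ', lc $line) {
            $counts{$word}++;    # a missing key starts at zero
        }
    }
    return %counts;
}

my %counts = count_words("the cat and the hat\n", "The end\n");

# Sort keys by ascending count, breaking ties alphabetically
my @ordered = sort { $counts{$a} <=> $counts{$b} or $a cmp $b } keys %counts;

foreach my $word (@ordered) {
    print "$word\t$counts{$word}\n";
}
```

Because the sort is ascending, the most frequent words appear at the end of the output, as in the sample above.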

Let's leave hashes and take a look at Perl's regular expression capability. Suppose you ask the user of your program to enter lines consisting of a name and a number and nothing else. Let's further suppose that there can be spaces or tabs between the name and the number. This is a perfect example of where a regular expression can be used, as shown in Listing 3.

In Perl, regular expressions are enclosed within a pair of slashes. In Listing 3, the regular expression is compared to $_. In order to understand the regular expression, let's separate it into its component parts:

   ^        begins with
   [A-Z]    upper case character
   [a-z]+   one or more lower case characters
   [ \t]+   one or more blanks and tabs
   \d+      one or more digits
   $        ends with

Taken all together, this pattern matches a string if the string begins with an upper case character followed by one or more lower case characters followed by one or more spaces and/or tabs and ends with one or more digits.

The parentheses do not play any role in deciding whether there is a match but, if there is one, the portion of the match enclosed within each set of parentheses is remembered and stored in the special variables $1, $2, and so on.
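Putting the components together gives a pattern along these lines. This is a sketch rather than the article's Listing 3: the placement of the capturing parentheses around the name and the number, the function name parse_entry and the sample inputs are all assumptions.

```perl
use strict;
use warnings;

# Return (name, number) if the line has the expected shape,
# or an empty list otherwise. parse_entry is a hypothetical name.
sub parse_entry {
    my ($line) = @_;
    chomp $line;
    if ($line =~ /^([A-Z][a-z]+)[ \t]+(\d+)$/) {
        return ($1, $2);    # text captured by the two sets of parentheses
    }
    return ();
}

foreach my $input ("Mike \t 42\n", "mike 42\n", "Mike 42 extra\n") {
    my @fields = parse_entry($input);
    if (@fields) {
        print "name=$fields[0] number=$fields[1]\n";
    } else {
        print "no match\n";
    }
}
```

Only the first input matches. The second fails because it does not begin with an upper case character, and the third fails because the $ anchor does not allow anything after the digits.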

Lots of system administrators love Perl because it has all the tools necessary to produce reports about resource use, such as users and files. The next example demonstrates a program that receives an integer representing a certain number of days. The program produces a listing of all the files that have been modified within that many days, as shown in Listing 4.

The first two lines are for error checking. The special array @ARGV contains the list of arguments from the command line. Each array has a variable $#arrayname associated with it that gives the subscript of the last element. In this case, if exactly one argument is provided, the highest subscript will be zero. Thus, if $#ARGV is not zero, the program terminates. The program also terminates if the opendir call fails. Otherwise, line 3 uses the readdir function to read the names of all the entries in the current directory. The loop on line 4 goes through these entries, using the -f file inquiry operator to keep only plain files and skip directories. The -M operator returns the age of the file, in days, since it was last modified. Finally, if this age is less than the number supplied on the command line, the program prints the name of the file.
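The same approach can be sketched as a function. This is not the article's Listing 4: the name recent_files is hypothetical, and instead of reading @ARGV and the current directory, the demonstration builds a scratch directory and backdates one file with utime so that the result is predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the names of plain files in $dir modified within the last $days days.
sub recent_files {
    my ($dir, $days) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @names = readdir($dh);
    closedir($dh);

    my @recent;
    foreach my $name (sort @names) {
        my $path = "$dir/$name";
        next unless -f $path;    # keep plain files only
        my $age = -M $path;      # age in days (may be fractional)
        push @recent, $name if $age < $days;
    }
    return @recent;
}

# Scratch directory holding one fresh file, one old file and a subdirectory
my $dir = tempdir(CLEANUP => 1);
foreach my $name ('new.txt', 'old.txt') {
    open(my $fh, '>', "$dir/$name") or die "cannot create $name: $!";
    close($fh);
}
mkdir "$dir/subdir" or die "cannot create subdir: $!";

# Backdate old.txt by ten days
my $ten_days_ago = time - 10 * 24 * 60 * 60;
utime($ten_days_ago, $ten_days_ago, "$dir/old.txt") or die "utime failed: $!";

my @recent = recent_files($dir, 5);
print "modified within 5 days: @recent\n";
```

The age reported by -M is measured from the time the script started, so a file created while the script is running has a slightly negative age and still counts as recent.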

As you have seen, Perl uses a lot of built-in functions. It's also very simple to write your own function. I actually used one earlier. Perl functions have an elegant behavior when it comes to argument passing. You can pass a variable number of arguments to any programmer-written function. All of the arguments are collected into the special array @_. Listing 5 demonstrates this feature. It's also worth mentioning that all Perl functions return a value, the last expression evaluated in the function. If you don't want to use the returned value, you can just ignore it.

On line 3, the my operator declares $total as a lexical variable, private to the function, so that it does not collide with a variable of the same name elsewhere in the program. The function compute_mean receives all of its parameters in the special array @_. The foreach on lines 4-7 sums them together. Line 8 uses the array @_ in a scalar context, which forces Perl to treat @_ as a number, that is, the size of the array. Note that the keyword return on line 8 is not necessary.
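A function in the spirit of compute_mean might look like the sketch below. It is not the article's Listing 5, so the line numbers above do not apply to it, and the guard against an empty argument list is an addition made here to avoid dividing by zero.

```perl
use strict;
use warnings;

# Average of however many numbers are passed in.
sub compute_mean {
    my $total = 0;                  # lexical: private to this call
    foreach my $value (@_) {
        $total += $value;
    }
    # In numeric (scalar) context @_ yields the number of arguments
    return @_ ? $total / @_ : 0;
}

my $mean3 = compute_mean(10, 20, 30);
my $mean4 = compute_mean(1, 2, 3, 4);
print "$mean3\n";    # prints 20
print "$mean4\n";    # prints 2.5
```

The same function handles three arguments or four without any change to its definition, because every call simply delivers its arguments in @_.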

The variable-length argument lists of Perl functions are not always a blessing. For example, if you wanted to send several arrays to a function and return an array containing the sum of the elements in each array, all of the arrays would be flattened into the single list @_, and there would be no way to determine where each array began and ended. To solve problems like this, you need to know about references.

A reference is a scalar containing the address of another variable. In short, a reference is a pointer. Use the \ operator to take the address of a variable and either the $, @ or % operator to de-reference it, depending on what it is referring to. Consider the following code snippet:

   @data = (80, 50);   # create an array
   $ref = \@data;      # take its address
   print "@data\n";    # print array directly
   print "@$ref\n";    # ...indirectly through $ref
   print "$data[0]\n"; # print 0th element directly
   print "$$ref[0]\n"; # print 0th element indirectly

Using the simple concepts above, you can now write a Perl subroutine that returns the sums of individual arrays sent to it, as shown in Listing 6.

On line 18, three references are sent to the sums function. Each time the loop on line 4 is executed, $value will hold one of these array references. On line 7, this value must be dereferenced in order to get at the actual values in the array being referenced. On line 11, the push function is used to push the sum (for the array being processed) to the @answers array. This underscores the fact that all Perl arrays are dynamic and can grow or shrink to meet programming demands. Finally, line 13 forces the evaluation of the @answers array so it can be returned. In Perl, the result of the last evaluated expression becomes the return value of the subroutine.
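The technique can be sketched as follows. This is not the article's Listing 6, so the line numbers above do not apply to it; the sample arrays and the variable names other than sums and @answers are assumptions.

```perl
use strict;
use warnings;

# Take any number of array references; return one sum per array.
sub sums {
    my @answers;
    foreach my $ref (@_) {
        my $sum = 0;
        $sum += $_ foreach @$ref;    # dereference to reach the elements
        push @answers, $sum;         # @answers grows as needed
    }
    return @answers;
}

my @first  = (1, 2, 3);
my @second = (10, 20);
my @third  = (5);

# Passing references keeps the three arrays distinct inside @_
my @totals = sums(\@first, \@second, \@third);
print "@totals\n";    # prints "6 30 5"
```

Had the arrays been passed directly, as in sums(@first, @second, @third), the function would have received six separate numbers rather than three array references.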

So far, I have shown some of Perl's power: its economy of expression, hashes, regular expressions, file inquiry operators and built-in subroutines. Now I'll move on to explore Perl as a .NET language.
