like working in high-level languages like Lisp and Ruby, but finding one that produces the small native standalone applications that I sometimes need to build is not easy. The two languages I have found useful in the past are Scheme, a portable Lisp dialect that is often used in computer science classes, and Haskell. However, in the past year I discovered the Gambit-C Scheme system, which I have used ever since to build my small native applications. When combined with an effective Emacs-based development environment, Gambit-C Scheme provides a high-level dynamic language with good runtime and memory performance, as well as good deployment options.
In this article, I provide a quick introduction to Scheme and cover the issues involved with using it to build native applications on Linux and OS X (there is a Windows installer for Gambit-C and my instructions should also get you going if you have mingw installed). I conclude with two simple application examples: extracting information from text files and writing a mini web service that can support REST clients. Each application is self-contained in a single executable file (click here to download the source code).
This article is for developers who already know some Scheme and want to use it to create small, efficient standalone executables, as well as for those developers who don’t yet know the language. The developers with Scheme knowledge can skim the tutorial information and concentrate on the details for building executable programs. The Scheme newbies will find enough introductory information to be able to understand the example programs.
A Crash Course in Gambit-C
Before following along with the rest of the article, you first need to install Gambit-C. If you are using OS X or Windows, use the corresponding Gambit-C installers. Linux has no installer, but you can easily install Gambit-C from source:
tar xvfz gambc-v4_4_4-devel.tgzcd gambc-v4_4_4-devel./configuremakesudo make install
Now, you are ready for an introduction to Gambit-C Scheme, which covers just enough Scheme so that you can understand the two examples towards the end. For more in-depth coverage, see Sidebar 1. Scheme Learning Resources.
The Interpreter (gsi) and Compiler (gsc)
The examples use two command-line utilities: the Gambit Scheme interpreter (gsi) and the Gambit Scheme compiler (gsc). You can use the define Scheme special form to create both global variables and functions. Here are some examples in an interactive gsi session where I define a variable x, define a function double, and show how to recover from a runtime error:
$ gsiGambit v4.4.4 > (define x 123)> x123> (define (double x) (+ x x))> (double 4)8> (double 3.14159)6.28318> (double "cat")*** ERROR IN (console)@8.1 -- (Argument 1) NUMBER expected(+ "cat" "cat")1> >
The prompt 1> indicates that gsi recognized a runtime error and is now in a nested REPL. I recovered from this error by typing Ctrl-D, which brought back the usual > prompt.
While you can experiment with Scheme and play with the examples in this article using the Gambit-C Scheme interpreter gsi, I recommend using Emacs with the Gambit-C Emacs support.
The next example shows how to use if statements and illustrates that Scheme is a free-format language?you could put an entire program on one very long line!
> (if (equal? x 123) (display "OK ") (display "what? "))OK> (if (equal? x 1567) (display "OK ") (display "what? "))what?>
Notice that it makes no difference whether you put the if statement all on one line or use three lines to make it more readable. If statements require two arguments and usually are called with three arguments. The first is a Boolean expression. If this expression has a true value at runtime, then the program executes the second argument (an expression). If the program evaluates the first argument to false and you supply a third argument, then the program evaluates the third argument. If you want multiple statements to be evaluated in an if expression, you can use a let block to group multiple expressions into a single expression (that is nested).
Next, let’s look at using the let statement to define two local variables and print their sum:
> (let ((i 1) (j 2)) (display (+ i j)) (newline))3>
Here, I and j are local variables that are defined only inside of the let statement. Notice the two statements after the two variables are defined. Let statements can contain any number of statements; it is not unusual to see nested let special forms and nested functions.
The order in which I and j are defined in the previous example is undefined. If you want to both know the order in which variables are defined and use early variables to define variables declared later in the let special form, then use let*:
> (let* ((x 1) (y (+ x 100))) (display (+ x y)) (newline) (set! y (+ y 1)) (display (+ x y)) (newline))102103>
The above example uses set! to change the value of the local variable j. By convention in Scheme, functions that end with a ! character modify the state of their first argument.
Lists and Vectors
The program examples for this article use both list and vector data structures. A list is optimized, so performing any of the following three operations is computationally inexpensive:
- Adding an element at the beginning of the list
- Deleting an element in the middle of a list
- Adding an element anywhere in a list
Internally, using linked lists makes these operations inexpensive as well. However, if you want to find the element number N of a list, you need to start at the beginning and then follow N-1 links. Alternatively, vectors are much more efficient than lists for accessing random elements in a sequence (A vector is like an array in other programming languages). However, vectors don’t have a good property that makes it faster to insert and delete elements in a list.
Scheme has a special syntax for declaring list and vector literals. Here are some examples:
> (define list8 '(1 2 3 "cat" 'dog 6 7 8)) ; define a list> (car list8) ; car returns the first item in a list1> (cdr list8) ; cdr returns all but the first element of a list(2 3 "cat" 'dog 6 7 8)> (list-ref list8 3) ; list-ref inefficiently finds a list element at the specified index"cat"> (define vector5 '#(1 2 3 "cat" 'dog)) ; define a vector> (vector-ref vector5 2) ; get the element at index 2 (note that indices are zero based)3> (vector-set! vector5 2 2222) ; change the value of the element at index 2> vector5#(1 2 2222 "cat" 'dog)> (define vec (make-vector 10)) ; create another vector with 10 elements> vec#(0 0 0 0 0 0 0 0 0 0)> (vector-set! vec 4 '(1 "a test" 2)) ; an element in a vector can be any Lisp value> vec#(0 0 0 0 (1 "a test" 2) 0 0 0 0 0)>
Scheme supports closures. A closure captures the environment as it is when a function is defined. Consider this code example that references a “free variable”:
> (define (next-value) (set! count (+ count 1)) count)> (next-value)*** ERROR IN next-value, (console)@84.22 -- Unbound variable: count1> > (define count 0)> (next-value)1> (define count 0)> (next-value)1> (next-value)2> (next-value)3>
In the definition of next-value, the variable count is a free variable. The first time I called (next-value) no value was available for count because it had not yet been defined.
As with most modern programming languages, Scheme functions can call themselves recursively. Here is the famous factorial function that returns a value of 1 if its argument is less than 2. Otherwise, it returns the product of its argument times the value of calling itself with one fewer than its argument’s value:
> (define (fact n) (if (< n 2) 1 (* n (fact (- n 1)))))> (fact 5)120>
It is normal to use recursion for looping/iteration in Scheme.
Macros for Iterating
Most Scheme implementations include macro definitions for two convenient ways to iterate:
- Over a range of numbers
- Over the elements in a list
Here is an example:
> (dotimes (i 4) (display i) (newline))0123> (dolist (x '("cat" "dog" 3.14159)) (display x) (newline))catdog3.14159>
Gambit-C does not supply implementations of these macros, but the sample program proper-names.scm that I discuss later provides implementation examples for both. (This source file is included in the downloadable source code for this article.)
The topic of defining macros is outside the scope of this article.
Defining Anonymous Functions
The final lesson in this introduction is demonstrating how to define anonymous functions using lambda:
> (define myfunction (lambda (x) (+ x x x)))> (myfunction 1)3> (myfunction 10)30> (define func2 myfunction) ; function values are "first class" and can be stored in variables> (func2 2.5)7.5>
Scheme uses #t and #f to represent Boolean true and false values:
> (equal? 1 1)#t> (equal? 1 "cat")#f>
The “Hello Command Line” Example
The previous section explained how to define functions in Scheme. This section demonstrates how to access command-line arguments and generate a test standalone executable. The following code snippet uses the function print-arg-list, which contains at least two surprises if this is your first time using Scheme:
- The function calls itself recursively (as did the function fact you saw in the last section) to implement a loop over a list of argument values.
- It uses another function that is defined inside its definition.
(define (print-arg-list arg-list) (define (print-arg arg) ; define the inner utility function print-arg (display "Next command-line-argument: ") (display (car arg-list)) (newline)) ;; this is the start of the function print-arg-list: (if (pair? arg-list) ; check that arg-list is a list with at least one argument (let () (print-arg (car arg-list)) (print-arg-list (cdr arg-list))))) ; make a recursive call
The only reason that I split out the separate inner function print-arg was to show you an example of nested functions. Nesting function definitions like this is a good technique when a small utility function is used by only one other function. The inner function is not visible outside the scope of the outer function. Using a nested function makes sense when it is used more than once in an outer function definition.
When I am interactively coding, I write and debug small functions as I write them to catch mistakes earlier. In the case of nested functions, I usually write and debug them before copying them into the definition of another function that uses them.
You can test the function print-arg-list using gsi:
> (print-arg-list '("-i" "input.dat" "-o" "output.dat" 1 2 3))Next command-line-argument: -iNext command-line-argument: input.datNext command-line-argument: -oNext command-line-argument: output.datNext command-line-argument: 1Next command-line-argument: 2Next command-line-argument: 3>
You have seen that top-level Scheme expressions typed in gsi are evaluated immediately:
> (display "Hello ")Hello>
If you put top-level Scheme expressions in a Scheme source file and compile it to a native application, then these top-level expressions are also immediately executed when the application runs.
The gsc compiler generates a separate C file ending with _.c. The generated .c files contain a main C function. For the “Hello Command Line Arguments” test program, I use the print-arg-list function and the following expression:
(let* ((args (command-line)) (in-file (member "-i" args)) (out-file (member "-o" args)) (arg-hash-table (make-table))) ;; let's start by printing the command line arguments: (print-arg-list args) ;; check to see if an argument is "-help": (if (member "-help" args) (print-help)) ;; now print out input and output file names, if they are supplied: (if (and in-file (> (length in-file) 1)) (let () (display "Input file: ") (display (cadr in-file)) (newline))) (if (and out-file (> (length out-file) 1)) (let () (display "Output file: ") (display (cadr out-file)) (newline))))
This expression uses the built-in Gambit-C specific function command-line, which returns command-line arguments as a list of strings. It also uses the Scheme function member, which takes two arguments: an expression and a list of expressions. The expression searches the list for an element equal to the first argument.
Here are some examples of member:
> (define test-list '(1 2 3 4 5 "cat" 3.14159 100 101))> (member 3 test-list)(3 4 5 "cat" 3.14159 100 101)> (member "cat" test-list)("cat" 3.14159 100 101)> (member "not in list" test-list)#f>
Note that member returns a sub-list starting with the matching list element. The function length returns the number of elements in a list:
> (length test-list)9
An example program you will develop later requires you to process command arguments.
Creating an Executable File
When creating standalone executables, be aware that the Gambit-C gsc compiler compiles Scheme code to C; the generated C code is compiled and linked against the Gambit-C libraries. The location of these files is different for OS X, Linux, and Windows. When I started writing this article, using version 4.4.4 of Gambit-C, compiling and linking Scheme code into a native executable took several steps and was platform dependent because you had to know the location of the include files and library files for Gambit-C. With the release of Version 4.5.0, the compiler has a new option, -exe, that removes the platform dependencies and allows you to compile to C, compile the generated C code, and link it all with one command.
The code directory hello-command-line-args in the source code download contains both the source file hello-command-line.scm and a makefile. Here is a listing of the makefile:
clean: rm -f *~ rm -f */*~ rm -f #* rm -f */#* rm -f *.c rm -f hello-command-lineapp: gsc -exe hello-command-line.scm
The make target clean removes all files from the working directory except for the Scheme source file and the makefile. The make target app builds a standalone executable with no other dependencies. Here is an attempt at running the hello command-line executable:
markw$ ./hello -helpNext command-line-argument: ./helloNext command-line-argument: -helpHello Command Line (native) command line arguments: -help -- to print help message -i -- to define the input file name -o -- to specify the output file namemarkw$ ./hello "this is a test" 1 2 3Next command-line-argument: ./helloNext command-line-argument: this is a testNext command-line-argument: 1Next command-line-argument: 2Next command-line-argument: 3markw$ ./hello -i input.dat -o output.datNext command-line-argument: ./helloNext command-line-argument: -iNext command-line-argument: input.datNext command-line-argument: -oNext command-line-argument: output.datInput file: input.datInput file: output.dat
The next section demonstrates another example that processes input text files.
Find Names in Input Text
I use Gambit-C Scheme for the development of a commercial product that extracts semantic information from plain text. The example in this section is a much-simplified version that extracts only people’s names (and does so with a simple algorithm so the example will be easy to understand). The source code download for this article contains the file proper-names.scm, which contains the complete example.
The example uses lists, vectors, and hash tables. I’ll show you the basics of using hash tables while going through the code.
I start by defining hash tables for first names and last names. In the following code snippet, notice that I use * characters at the beginnings and endings of global variables. Doing so makes it easier to recognize global variables in code. I will use the function init-hashes (defined in the file people-data-tables.scm) to fill these two hash tables:
(define *first-name-hash* #f)(define *last-name-hash* #f)
The Scheme function for getting a value from a hash table is table-ref, which takes two required arguments and an optional third argument. The first argument is a hash table, the second argument is the key value that you are looking up in the hash table, and the third optional argument is the desired return value if the key is not in the hash table. The file people-data-tables.scm, which was automatically generated from a linguistic database that I use, calls the table-set! function to add key/value pairs to a hash table. This generated file contains only about 5000 names. I generated it requesting only frequently used names.
In addition to hash tables containing first and last names, I use a list containing commonly used name prefixes:
(define *name-prefix-list* '("Mr" "Mrs" "Ms" "Gen" "General" "Maj" "Major" "Doctor" "Vice" "President" "Lt" "Premier" "Senator" "Congressman" "Prince" "King" "Representative" "Sen" "St" "Dr"))
The main function in this example is find-human-names, which takes a vector of strings as an argument and returns a list of strings that are the names found in the input vector. If the input argument is a string, then it is tokenized into a vector of individual words. The example handles names containing four words, three words, two words, and one word. It starts looking for longer names first, and when it finds one, it skips forward to the end of the name before looking for another name.
Here is the beginning of the function definition with only the code for finding two word names:
(define (find-human-names word-vector) (if (not *first-name-hash*) ; load the first and last name hash table data the first time (let () ; this function is called (set! *first-name-hash* (make-table size: 10000)) (set! *last-name-hash* (make-table size: 1000)) (init-hashes *first-name-hash* *last-name-hash*))) ;; if word-vector is a string, then tokenize it: (if (string? word-vector) (set! word-vector (list->vector (string-tokenize word-vector)))) ;; loop over each word in word-vector, checking to see if it is part of a name: (let ((ret '()) (ret3 '()) (x '()) (len (vector-length word-vector)) (word #f) (found-at-index #f)) (dotimes (i len) (set! found-at-index #f) (set! word (vector-ref word-vector i)) ;; process 2 word names: (if (and (not found-at-index) (< i (- len 1))) (if (or (and (member word *name-prefix-list*) ; see is word is a prefix like "Mr" (table-ref *last-name-hash* (vector-ref word-vector (+ i 1)))) ;
is word at index+1 a last name? (and (table-ref *first-name-hash* word) ; is word a first name? (table-ref *last-name-hash* (vector-ref word-vector (+ i 1))))) ;
is word at index+1 a last name? (let () ; yes, we found a name, so store the indices of the beginning
and ending of the name: (set! ret (cons (list i (+ i 2)) ret)) (set! i (+ i 1)) (set! found-at-index #t))))
The indices for new names are added to the beginning of the list ret (I will reverse the order later). Getting strings for names found in the text (without any duplicates) only takes about 10 lines of code. The code will be easier to understand after you review example results from calling this function. Here is what the contents of the list ret looks like for this example:
> (load "people-data-tables.scm")"/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/people-data-tables.scm"> (load "proper-names.scm")"/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/proper-names.scm"> (find-human-names '#("President" "Bush" "went" "to" "San" "Diego" "to" "meet" "Ms" "." "Jones" "and"
"Gen" "." "Pervez" "Musharraf" "." "President" "Bush" "played" "golf" "."))("President Bush" "Ms. Jones" "Gen. Pervez Musharraf")>
Notice that the duplicate name “President Bush” is removed. After the input word vector has been scanned for four-, three-, two-, and one-word names, the list ret that contains starting and ending indices looks like this:
((17 19) (12 16) (8 11) (0 2))
For debugging, you can use the Scheme trace function to “instrument” any other function so its arguments and return value are printed each time the instrumented function is called:
> (trace find-human-names)> (find-human-names '#("President" "Bush" "went" "to" "San" "Diego" "to" "meet" "Ms" "." "Jones" "and"
"Gen" "." "Pervez" "Musharraf" "." "President" "Bush" "played" "golf" "."))| > (find-human-names '#("President" "Bush" "went" "to" "San" "Diego" "to" "m...| ("President Bush" "Ms. Jones" "Gen. Pervez Musharraf")("President Bush" "Ms. Jones" "Gen. Pervez Musharraf")> (define (fact n) (if (< n 2) 1 (* n (fact (- n 1)))))> (fact 5)120> (trace fact)> (fact 5)| > (fact 5)| | > (fact 4)| | | > (fact 3)| | | | > (fact 2)| | | | | > (fact 1)| | | | | 1| | | | 2| | | 6| | 24| 120120>
The code uses the input word vector (the list of starting/ending indices is listed below) and the function map. A pair of short examples in the following code demonstrate two ways to use map.
> (define a-list '(1 2 3 4 5 6 7))> (map (lambda (n) (+ n 1)) a-list)(2 3 4 5 6 7 8)> (let ((sum 0)) (map (lambda (n) (set! sum (+ sum n))) a-list) sum)28>
The first argument to map can be a function or a lambda expression. I first use map in the expected way for functional programming: map creates a new list in which each element is the value of applying the first functional argument to the corresponding element of the input list. My second use of map is to discard the list that map returns as its value and to use the lambda expression to calculate a sum of values in the input list using side effects (changing the value of the locally bound variable sum).
|Author’s Note: This example is contrived to show you the map function. The appropriate function to use here is for-each, which does not construct a return list.|
The use of map in the name finder program is like the second example that you have just seen. The value of ret3 is an empty list that will hold the strings returned as the value for this function:
(map (lambda (index-pair) (let ((str "")) (dolist (s (vector->list (subvector word-vector (car index-pair) (cadr index-pair)))) (if (equal? s ".") (set! str (string-append str s)) (if (equal? str "") (set! str s) (set! str (string-append str " " s))))) (if (not (member str ret3)) (set! ret3 (cons str ret3))))) (reverse ret)) (reverse ret3)))
As you now know, the function map takes two arguments: a function and a list. This example does not use the returned value from map (which is unusual); it just uses the side effects of the anonymous lambda function filling the list ret3. This anonymous lambda function captures the value from the outer variable word-vector. Subvector returns part of the vector that is passed as its first argument. In this case, I pull out the strings for an entire name from the input word vector. Vector->list is a Scheme utility function that converts a vector to a list.
Here is a shorter example to show you how subvector, string-append, and vector->list work:
> (define v '#("the" "dog" "ran" "down" "the" "street"))> (subvector v 1 3)#("dog" "ran")> (string-append "123" "456")"123456"> (string-append "123" "456" "cat")"123456cat"> (vector->list v)("the" "dog" "ran" "down" "the" "street")>
I use the dolist macro to loop through this list of words in a name and use the function string-append to concatenate them. After a string containing an entire name is constructed, I add it to the beginning of the list ret3. The list ret3 (in reversed order) is the return value for the function find-human-names.
The next section uses this name-finding example in an implementation of a simple web application.
Implementing a Web Service as a Standalone Executable
Using the function find-human-names from the previous section, you can implement a web service that returns a list of names when called with input text. First, you will need a web server. Marc Feeley (the author of Gambit-C) includes a web server example with the source code distribution of Gambit-C. The web server in this section is a simpler version (implemented in about 50 lines of Scheme code) that Marc posted on the comp.lang.scheme newsgroup. Adding calls to the name finder code and displaying an HTML input form requires only a few lines of extra code. The web server implementation is in the file web.scm, which is mostly Marc Feeley’s code with the following added snippets:
(define (http-get connection document) (print port: connection "HTTP/1.0 200 OK " "Content-Type: text/html; charset=ISO-8859-1 " "Connection: close " " " ( ( (
"Enter text that contains peoples names") (
The function http-get is called whenever a new HTTP GET request is received. The input argument document is a string that is the URL-encoded request. I use the function string-tokenize (defined in the file proper-names.scm) to split up the request. This is not a complete or proper way to URL decode data, but it works for this example. The function string-tokenize returns a list of strings, so I use Scheme's list->vector function to convert the list to a vector. I then call the function find-human-names, which is defined in the file proper-names.scm.
You can run the web app using this code and access it using the URL http://localhost:8000:
> (load "people-data-tables.scm")"/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/people-data-tables.scm"> (load "proper-names.scm")"/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/proper-names.scm"> (load "web.scm")
The next section shows the makefile you need to create standalone, self-contained executables for both a command-line name finder tool and the web application.
Makefile for Building the Command Line and Web Application Examples
The source code for both the command-line name finder tool and web application examples is located in the directory name-finder. You can use the following makefile to build the examples:
clean: rm -f *~ rm -f */*~ rm -f #* rm -f */#* rm -f *.c rm -f out.txt rm -f web rm -f testappapp: gsc -exe proper-names.scm people-data-tables.scm testapp.scmwebapp: gsc -exe proper-names.scm people-data-tables.scm web.scm
The following listing shows the build process and running both examples (command lines in bold font):
markw$ make app webappgsc -exe proper-names.scm people-data-tables.scm testapp.scmproper-names.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/proper-names.c:people-data-tables.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/people-data-tables.c:testapp.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/testapp.c:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/testapp_.c:gsc -exe proper-names.scm people-data-tables.scm web.scmproper-names.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/proper-names.c:people-data-tables.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/people-data-tables.c:web.scm:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/web.c:/Volumes/Data/ideas/DevX/Gambit-C apps/code/name-finder/web_.c:markws-macbook:name-finder markw$ ./testapp -i test.txt -o out.txtmarkws-macbook:name-finder markw$ cat out.txt President ObamaPresident Clintonmarkws-macbook:name-finder markw$ ./web ; start the web service
You can easily modify this example to act as a REST-style web service as well.
So, what would be a good use of small, self-contained executables for a web application and/or REST-style web service? A very good use case would be writing applications that run in a user's web browser but with a local server. If you implement a "locally hosted" application using Ruby on Rails or server-side Java, for example, the end user or your program has to install multiple files on the system. The resulting web application will take up more memory and run slower than a Gambit-C application. I have benchmarked simple Gambit-C web apps that run with HTTP response times about 4 or 5 times less than a Rails app with the same functionality.
The Benefits of Gambit-C Scheme
I hope that you have enjoyed this quick introduction to Scheme and building standalone executable programs using Gambit-C. For research or experimental coding, the combination of Gambit-C Scheme with Emacs and Slime is one of my favorite development environments because of the fast and efficient style of interactive development it provides. As demonstrated here, a further benefit is the ability to package applications as small, efficient standalone applications.
In addition to writing command-line utilities, I am replacing Java code that I use to implement Hadoop Map/Reduce processes with Scheme versions that have better functionality. While the runtime performance of Java is very good, the higher level and more expressive nature of Scheme makes using Gambit-C Scheme a better choice for my project.
As a final note, Gambit-C Creator Marc Feeley does a great job of maintaining the language, and he also quickly answers questions posed on the Gambit mailing list and the comp.lang.scheme Google Group (or usenet group). He also made useful comments and corrections on a draft of this article and provided sample code.