A Java Developer’s Guide to Ruby

As a Java developer, why should you learn Ruby? Because Ruby’s versatility and flexibility complement Java well, and you will be a more effective and efficient developer if you use both languages. In fact, I use Java, Ruby, and Common Lisp for all my development, and Ruby has become a core part of my work life. Specifically, the following reasons make Ruby compelling for Java developers:

  • As a scripting language, Ruby is an effective tool for small projects. When I need to write utilities for data conversion and text processing quickly, I almost always use Ruby.
  • Ruby is a dynamic and terse language.
  • Using Ruby will often offer a different perspective on problem solving.
  • JRuby is still a work-in-progress but I believe it eventually will provide an excellent Ruby deployment platform using the Java VM. Currently, IntelliJ, NetBeans, and Eclipse all provide excellent Ruby support.
  • As the cost of software maintenance is roughly proportional to the number of lines of code, and Ruby programs are short and concise, they tend to be easier to read, understand, and maintain.
  • The Ruby on Rails web development framework is great for small and medium-sized database-backed web applications. You need to know Ruby if you want to use Ruby on Rails.

To demonstrate why Ruby is a good fit for Java developers, this article introduces the language features that will make you more efficient (see Table 1. Ruby and Java Feature Comparison) and then shows short program examples in both languages.

What You Need
To follow along with the rest of the article, you need to install external Ruby libraries. The RubyGems library system makes this easy. Download it from RubyForge and follow the installation instructions for your operating system. (If you already have Ruby set up, you can verify that your setup includes RubyGems, as many Ruby install packages do, by typing gem in a command shell.) Having a central repository for libraries and a standard tool like RubyGems will save you a lot of time: you no longer have to search for each library you need, install it by hand, and repeat that setup for every project.

Use the following commands to install the required gems:

 gem query --remote              # if you want to see all available remotely installable gems
 sudo gem install activerecord
 sudo gem install mysql          # if you want to use MySQL
 sudo gem install postgres-pr    # optional: install "pure ruby" PostgreSQL interface
 sudo gem install postgres       # optional: install native PostgreSQL interface
 sudo gem install ferret         # a search library like Lucene (same API)
 sudo gem install stemmer        # a word stemming library for demonstrating extending a class
 gem query                       # to show gems locally installed
 gem specification activerecord  # info on gem (ActiveRecord in this example)

Under Mac OS X and Linux, you will need to run the gem installs using sudo; if you are a Windows user, remove “sudo” from the previous commands.

This article also assumes that you will open a Ruby irb shell as follows and keep it open while you’re reading:

 markw$ irb
 >> s = "a b c"
 => "a b c"
 >>

The example programs and code snippets are short enough to copy and paste into an irb interactive session.

Ruby String Handling
The Ruby String class provides a large set of string-processing methods that are more flexible than Java’s string handling capabilities. This section shows a useful subset of Ruby’s string processing. This code snippet shows how to combine strings, take them apart with slices, and then search for substrings (the examples to follow use the # character to make the rest of a line a program comment):

 require 'pp'  # use the "pretty print" library. Defines the function 'pp'
 # define some strings to use in our examples:
 s1 = "The dog chased the cat down the street"
 s2 = "quickly"
 puts s1
 puts s1[0..6]   # a substring slice up to and including the character at index==6
 puts s1[0...6]  # a substring slice up to (but not including) the character at index==6
 puts "He is a #{s2} dog #{1 + 6} days a week."  # expressions inside #{} are inserted into a double-quoted string
 puts "   test  ".strip  # create a copy of the string with leading and trailing white space removed
 puts s1 + ' ' + s2  # string literals can also be formed with single quotes
 puts s2 * 4
 puts s1.index("chased")  # find index (zero based) of a substring
 s1[4..6] = 'giant lizard'  # replace a substring (/dog/ -> /giant lizard/)
 puts s1
 s2 = s2 << " now"  # the << operator, which also works for arrays and other collections, appends to the end
 puts s2
 puts "All String class methods:"
 pp s1.methods  # the method "methods" returns all methods for any object

The output would be:

 The dog chased the cat down the street
 The dog
 The do
 He is a quickly dog 7 days a week.
 test
 The dog chased the cat down the street quickly
 quicklyquicklyquicklyquickly
 8
 The giant lizard chased the cat down the street
 quickly now
 All String class methods:
 ["send", "%", "index", "collect", "[]=", "inspect", ......]    # most methods not shown for brevity--try this in irb

The << operator in the above example is really a method call. When evaluating expressions, Ruby translates infix operators into method calls. For example, the << operator in the following code appends the value of the expression on its right side to the string on its left side, and the + operator is handled the same way:

 >> "123" << "456"
 => "123456"
 >> "123".<<("456")
 => "123456"
 >> 1 + 2
 => 3
 >> 1.+(2)
 => 3

In the above example, the forms .<<("456") and .+(2) are standard method calls; the infix forms are just a more readable way to write them.

Many classes use the << operator to add objects to a class-specific collection. For example, you will later see how the Ferret search library (a Ruby gem you have installed) defines the << operator to add documents to an index.
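Because operators are ordinary methods, your own classes can define them too. The following is a minimal sketch of that idea; the class name WordList and its behavior are made up for illustration and are not part of Ruby or of Ferret:

```ruby
# WordList is a hypothetical class used only to illustrate defining <<.
class WordList
  attr_reader :words

  def initialize
    @words = []
  end

  # Define << so callers can append with operator syntax.
  # Returning self allows calls to be chained: list << "a" << "b"
  def <<(word)
    @words << word.downcase
    self
  end
end

list = WordList.new
list << "Ruby" << "Java"   # infix form
list.<<("Lisp")            # explicit method-call form, same method
puts list.words.inspect
```

Returning self from << is a design choice that mirrors how Array and String behave, so chained appends read naturally.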

Modifying an Existing Class
The key to Ruby's versatility is the ability to extend all its classes by adding methods and data. I frequently extend core Ruby classes in my application, not in the original class source code. This likely seems strange to Java or even C++ developers, but this technique lets you keep resources for a project in one place and enables many developers to add application-specific functionality without "bloating" the original class. As a Java programmer, think how the limitations of Java constrain you: if you want to add functionality and data to an existing class, you must subclass.

The following listing shows how to add the method stem to the String class:

 begin
   puts "The trips will be longer in the future".downcase.stem # stem is undefined at this point
 rescue
   puts "Error: #{$!}"   # $! holds the most recent exception
 end
 require "rubygems"
 require 'stemmer'   # older RubyGems releases used: require_gem 'stemmer'
 class String # you will extend the String class
   include Stemmable # add methods and data defined in module Stemmable
 end
 puts "The trips will be longer in the future".downcase.stem

You will also find it useful to add methods and perhaps new class instance variables to existing classes in your application.
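If you do not have the stemmer gem installed, you can try the same technique with no external libraries. This is only a sketch: the method name word_count is my own invention for the example, not something Ruby provides.

```ruby
# Reopen the built-in String class and add a method to it.
# word_count is a hypothetical name chosen for this illustration.
class String
  def word_count
    split.length   # split with no arguments splits on whitespace
  end
end

puts "The dog chased the cat".word_count
puts "".word_count
```

Every string in the program, including ones created before the class was reopened, now responds to word_count.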

The next section looks at "duck typing," another example of the extreme flexibility that Ruby offers.

Ruby Duck Typing
In Java, you can call a method on an object only if that method is defined (with public, package, etc. visibility) in the object's class hierarchy. Suppose that you have a collection of objects and you want to iterate over each element in the collection, calling one or more methods. In Java, the objects would need to be part of the same class hierarchy or implement interfaces defining the methods that you want to call.

As you have probably already guessed, Ruby is much more flexible. Specific data types and classes are not required in Ruby's runtime method-calling scheme. Suppose you call method foo on an object obj, and then call method bar on the resulting object of this first method call as follows (the example shows two equivalent calls; when there are no method arguments, you can leave off the ()):

 obj.foo.bar
 obj.foo().bar()

The result of calling obj.foo will be some object, and whatever the class of this new object is, you would attempt to call method bar on it.

As another example, suppose you want to call the method name on each object in a collection. One element in this collection happens to be an instance of class MyClass2, which does not have a method name defined. You will get a runtime error when you first try applying method name to this object. You can fix this by dynamically adding the method as follows:

 class MyClass2
   def name
     "MyClass2: #{self}"   # Ruby uses self where Java uses this
   end
 end

Developers who are used to a strongly type checked language like Java likely will expect this "unsafe" flexibility to make their programs less reliable because the compiler or interpreter is not statically checking all type uses. However, any program bugs due to runtime type checking will be found quickly in testing, so there is no decrease in software reliability. Yet you get the benefits of a more flexible language: shorter programs and shorter development time.
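To make the collection scenario concrete, here is a small sketch. The three classes are hypothetical and share no superclass or interface; the only thing that matters is whether each object responds to name:

```ruby
# Two unrelated, hypothetical classes: no shared superclass or interface.
class Customer
  def name
    "Customer: Alice"
  end
end

class Server
  def name
    "Server: web01"
  end
end

class Anonymous   # deliberately has no name method
end

items = [Customer.new, Server.new, Anonymous.new]

# Call name on anything that responds to it; respond_to? guards the rest.
names = items.map do |item|
  item.respond_to?(:name) ? item.name : "(no name)"
end

puts names
```

Checking respond_to? is one option when you cannot add the missing method; in Java the equivalent would need a common interface or reflection.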

Dealing with Missing Methods
Still skeptical about duck typing? Hang on, because now you are going to see another Ruby trick: how to handle missing methods for any Ruby class, starting with this simple example that applies two methods to a string object, one that is defined (length) and one that is undefined (foobar):

 markw$ irb
 >> s = "this is a string"
 => "this is a string"
 >> s.length
 => 16
 >> s.foobar
 NoMethodError: undefined method `foobar' for "this is a string":String
         from (irb):3

You'll see an error thrown for the undefined method. So "patch" the String class by writing your own method_missing method:

 >> class String
 >>   def method_missing(method_name, *arguments)
 >>     puts "Missing #{method_name} (#{arguments.join(', ')})"
 >>   end
 >> end
 => nil
 >> s.foobar
 Missing foobar ()
 => nil
 >> s.foobar(1, "cat")
 Missing foobar (1, cat)
 => nil
 >>

Whenever the Ruby runtime system cannot find a method for an object, it calls the object's method_missing method; the inherited default simply raises a NoMethodError exception. This example overrode the inherited method with one that does not raise an error and instead prints the name and arguments of the method call. Now, redefine this method again, this time checking to see if the method name (after converting it to a string with to_s) is equal to foobar:

 >> class String
 >>   def method_missing(method_name, *arguments)
 >>     if method_name.to_s=='foobar'
 >>       arguments.join.reverse  # return a value: the arguments joined into one string, reversed
 >>     else
 ?>       raise NoMethodError, "You need to define #{method_name}"
 >>     end
 >>   end
 >> end
 => nil
 >> s.foobar(1, "cat")
 => "tac1"
 >> s.foobar_it(1, "cat")
 NoMethodError: You need to define foobar_it
         from (irb):38:in `method_missing'
         from (irb):43
         from :0
 >>

If the method name is equal to foobar, this example calculates a return value. Otherwise, it throws an error.
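Patching String is fine for experimenting in irb, but in an application you would more likely confine method_missing to a class of your own. The following sketch shows one way to do that; the Settings class is hypothetical, and it assumes a Ruby version that supports respond_to_missing? (1.9.2 or later):

```ruby
# Settings is a hypothetical class for illustration only.
class Settings
  def initialize(values)
    @values = values   # a Hash with symbol keys
  end

  # Answer calls such as settings.host by looking up the :host key.
  def method_missing(method_name, *arguments)
    if @values.key?(method_name) && arguments.empty?
      @values[method_name]
    else
      super   # fall back to the default behavior: raise NoMethodError
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(method_name, include_private = false)
    @values.key?(method_name) || super
  end
end

settings = Settings.new(:host => "localhost", :port => 3306)
puts settings.host
puts settings.port
puts settings.respond_to?(:host)

begin
  settings.password   # not a known key, so the default error is raised
rescue NoMethodError => e
  puts "NoMethodError raised for #{e.name}"
end
```

Calling super for unknown names preserves the normal error, so typos still fail loudly instead of silently returning nil.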

Ruby Code Blocks
Ruby uses code blocks as an additional way to iterate over data. These blocks offer more flexibility and power than the limited iteration functionality built into the Java language. An earlier example used the stemmer gem to find the word stems of a string containing English words. The following example uses the String split method to tokenize a string using the space character as a word delimiter and then passes a code block to the each method of the resulting array. The { and } characters mark the beginning and end of a code block (you also can use do and end). Block parameters are listed between two | characters:

 puts "longs trips study studying banking".split(' ')
 "longs trips study studying banking".split(' ').each {|token| puts "#{token} : #{token.stem}"}

This code snippet produces the following:

 longs
 trips
 study
 studying
 banking
 longs : long
 trips : trip
 study : studi
 studying : studi
 banking : bank

You can see another good use of code blocks in the following example, which uses the Array collect method. The collect method passes each array element, in turn, to a code block:

 require 'pp'
 pp ["the", "cat", "ran", "away"].collect {|x| x.upcase}
 pp ["the", "cat", "ran", "away"].collect {|x| x.upcase}.join(' ')

In this example, the code block assumes that the elements are strings and calls the upcase method on each element. The collect method returns the collected results in a new array. The second line also uses the method join to combine all the resulting array elements into a string, separating the elements with the space character. This is the output:

 ["THE", "CAT", "RAN", "AWAY"]
 "THE CAT RAN AWAY"
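The collect method is only one of several standard iterators that take a code block. As a quick sketch using the same word list, here are three others from Ruby's standard collection methods:

```ruby
words = ["the", "cat", "ran", "away"]

# select keeps the elements for which the block returns a true value.
long_words = words.select { |w| w.length > 3 }

# inject threads an accumulator through the block; here it sums lengths.
total_chars = words.inject(0) { |sum, w| sum + w.length }

# each_with_index passes two block parameters: the element and its position.
words.each_with_index { |w, i| puts "#{i}: #{w}" }

puts long_words.inspect
puts total_chars
```

In each case the iteration logic lives in the library method and the block supplies only the per-element work, which is where much of Ruby's brevity comes from.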

Writing Methods That Use Code Blocks
You can use the yield keyword to call a code block passed to a method. The following example uses the method block_given? to call yield only if a code block is supplied. The value that yield returns, which is the value of the last expression in the block, is then printed:

 def cb_test name
   puts "Code block test: argument: #{name}"
   s = yield(name) if block_given?
   puts "After executing an optional code block, =#{s}"
 end

This example calls function cb_test, first without a code block and then with one:

 >> puts cb_test("Mark")
 Code block test: argument: Mark
 After executing an optional code block, =
 nil
 => nil
 >> puts cb_test("Mark") {|x| x + x}
 Code block test: argument: Mark
 After executing an optional code block, =MarkMark
 nil
 => nil
 >>

The string value Mark is passed as an argument to yield, and inside the code block the local variable x is assigned the value Mark. The return value from the code block is MarkMark.
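A method can also yield more than once and pass more than one value to the block. The helper below is a sketch written for this article (each_pair_of is not a standard Ruby method); it yields consecutive pairs of elements and collects whatever the block returns:

```ruby
# each_pair_of is a hypothetical helper written for this illustration.
# It yields consecutive pairs of elements to the caller's block and
# returns an array of whatever the block returned.
def each_pair_of(items)
  return [] unless block_given?
  results = []
  (0...items.length - 1).each do |i|
    results << yield(items[i], items[i + 1])
  end
  results
end

sums = each_pair_of([1, 2, 3, 4]) { |a, b| a + b }
puts sums.inspect

joined = each_pair_of(%w[a b c]) { |a, b| "#{a}#{b}" }
puts joined.inspect
```

The same method works for numbers and strings because the block, not the method, decides what to do with each pair.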

Ruby Regular Expressions
Ruby has built-in support for handling regular expressions using the class Regexp. Java's java.util.regex APIs offer similar functionality but regular expression support in Ruby definitely has a more native feel to it. You can create a regular expression object by either directly using a method call like Regexp.new("[a-e]og") or enclosing a regular expression between slash characters like /[a-e]og/. You can find good tutorials on both regular expressions and on Ruby's regular expression support on the web; this simple example shows only using the =~ operator:

 >> "the dog ran" =~ /[a-e]og/
 => 4
 >> "the zebra ran" =~ /[a-e]og/
 => nil
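Beyond =~, a few other regular expression methods come up constantly in text-processing scripts. The sketch below uses a made-up log line as input; match, scan, and gsub are standard String methods:

```ruby
# A made-up log line used only as sample input.
line = "2007-03-15 ERROR disk full on /dev/sda1"

# =~ returns the index of the first match, or nil when nothing matches.
position = line =~ /ERROR/

# match returns a MatchData object; parentheses create capture groups.
m = line.match(/(\d{4})-(\d{2})-(\d{2})/)
year, month, day = m[1], m[2], m[3]

# scan returns every match; gsub replaces every match.
numbers = line.scan(/\d+/)
masked  = line.gsub(/\d/, "#")

puts position
puts "#{year} / #{month} / #{day}"
puts numbers.inspect
puts masked
```

Note that the captured groups are strings; call to_i on them if you need numbers.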

Ruby Network Programming
Ruby has a great standard library for network programming as well. Please see my previous DevX article on this subject. I frequently use Ruby for collecting data from the Internet, parsing it, and then storing it in XML or a database.

Ruby Document Indexing and Search Using the Ferret Library
By now, you have installed the Ruby gem called ferret. Ferret is the fastest indexing and search library based on Java Lucene (even faster than the Common Lisp version, Montezuma). One interesting fact about the Ferret library is that during development the author David Balmain eventually wrote most of it in C with a Ruby wrapper. The lesson is that if you start to use Ruby and have performance problems, you can always recode the time-critical parts in C or C++. Ferret defines a few classes that you will use in your own applications once you adopt Ruby:

  • Document represents anything that you want to search for: a local file, a web URL, or (as you will see in the next section) text data in a relational database.
  • Field represents data elements stored in a document. Fields can be indexed or non-indexed. Typically, I use a single indexed (and thereby searchable) text field and then several "meta data" fields that are not indexed. Original file paths, web URLs, etc. can be stored in non-indexed fields.
  • Index represents the disk files that store an index.
  • Query provides APIs for search.

Indexing and Searching Microsoft Word Documents
The following is the Ruby class I use for reading Microsoft Word documents and extracting the plain text, which is an example of using external programs in Ruby:

 class ReadWordDoc
   attr_reader :text
   def initialize file_path
     @text = `antiword #{file_path}`   # back quotes to run external program
   end
 end

The "trick" here is that I use the open source antiword utility to actually process Word document files. You can run any external program and capture its output to a string by wrapping the external command in back quotes. Try the following under Linux or OS X (for Windows try `dir`):

 puts `ls -l`

This example prints the result of executing the external ls (Unix list directory) command.
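When a script depends on an external program, it is usually worth checking whether the command succeeded. Here is a small sketch that uses echo as a stand-in for a real command; $? and %x{} are standard Ruby features:

```ruby
# Capture the output of an external command as a string.
output = `echo hello`

# $? holds the status of the last child process that finished.
status = $?

puts output.strip
puts "succeeded: #{status.success?}"

# %x{} is an alternative syntax for the same thing.
lines = %x{echo one && echo two}.split("\n").map { |l| l.strip }
puts lines.inspect
```

The captured string includes the trailing newline that the command printed, which is why the example calls strip before using it.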

The following Ruby script enters a Word document into an index (plain text files are easier; try that as an exercise):

 require 'rubygems'
 require 'ferret'
 include Ferret
 include Ferret::Document
 require 'read_word_doc' # read_word_doc.rb defines class ReadWordDoc
 index = Index::Index.new(:path => './my_index_dir')  # any path to a directory
 doc_path = 'test.doc'                      # path to a Microsoft Word document
 doc_text = ReadWordDoc.new(doc_path).text  # get the plain text from the Word file
 doc = Document.new
 doc << Field.new("doc_path", doc_path, Field::Store::YES, Field::Index::NO)
 doc << Field.new("text", doc_text, Field::Store::YES, Field::Index::TOKENIZED)
 index << doc
 index.search_each('text:"Ruby"') do |doc, score|  # a test search
   puts "result: #{index[doc]['doc_path']} : #{score}"    # print doc_path meta data
   puts "Original text: #{index[doc]['text']}"            # print original text
 end
 index.close  # close the index when you are done with it

Notice how short this example is. In 24 lines (including the class to use antiword for extracting text from Word documents), you have seen an example that extracts text from Word, creates an index, performs a search, and then closes the index when you are done with it. Using Ruby enabled you to get complex tasks done with very few lines of code. Had you coded this example in Java using the very good Lucene library (which I've done!), the Java program would be much longer. Shorter programs are also easier and less expensive to maintain.

This example uses Word documents, but OpenOffice.org documents are also simple to read. With about 30 lines of pure Ruby code, you can unzip a document and extract the text from the content.xml element in the unzipped XML data stream. (XML processing is simple in Ruby, but it is beyond the scope of this article.)

Ruby Complements Java
The cost of software development and maintenance is usually the largest expense in a company's IT budget, much larger than the cost of servers, Internet connectivity, etc. The use of Ruby can greatly reduce the cost of building and maintaining systems, mostly because programs tend to be a lot shorter. (For me, the time spent per line of code is similar for most programming languages I use.)

OK, so when should you use Java? I have used the Java platform for building systems for my consulting customers for over 10 years, and I certainly will continue using Java. A well-built, Java-based web application will run forever, or at least until servers fail or have to be rebooted for hardware maintenance. My confidence comes from seeing systems run unattended for months on end with no problems. My advice is to continue using Java on the server side for large systems and to start using Ruby for small utility programs. For my work, I view Java and Ruby as complementary, and not as competitors. Use the best tool for each task.
