Move Over Rails. Here Comes Merb.

About three years ago, I spent a few hours building an RSS news aggregator for Ruby news. I wrote it in Java because my one leased server at the time was all set up to host multiple JSP web applications. I deployed the finished product as RubyPlanet.net. Around the same time, I discovered Ruby on Rails (RoR) and have since become an enthusiastic RoR user. For new web projects these days, I typically use Rails (unless I need the speed and scalability of some part of the J2EE stack). But the introduction of the MVC framework Merb may change that.

Merb is potentially much faster than Rails, and with the performance improvements to Ruby (C-Ruby, JRuby, etc.), the combination of Merb and Ruby may well provide adequate performance for most of my new projects in the future. On the eve of the version 1.0 release, this article explains how Merb can deliver a 2-4X speed increase over Rails and why that makes it better for certain applications.

Merb is clearly patterned after Rails, so this article concentrates more on how Merb is different from Rails and on which types of applications are better suited for Merb than Rails. While using Merb is more difficult than using Rails, Merb is touted as a “hacker’s framework” and there is a lot of truth in that. With a small kernel (about 6,000 lines of code), it is easier to get into the Merb code base.

The web application I use as an example is a Ruby + Merb replacement for the original Java version of RubyPlanet.net that I wrote three years ago (see Figure 1 for an overview). The new Ruby + Merb version took the same amount of time to write, but it required less code and has more functionality. Less code makes it simpler to maintain and modify.

Thread Safe? Beyond the Speed Boost
Merb has more to offer than just runtime performance. For example, its small core of functionality with plugins and “slices” (complete mini-apps that install in an existing Merb project) does not, in principle, have cross dependencies. Because Merb was designed to be modular, you can include just the plugins and “slices” your application needs, keeping your systems simpler.

Perhaps more importantly, Merb is thread safe and has built-in support for running background tasks as threads. That means deployed Merb applications are likely to require much less memory because you can use work threads instead of extra processes to serve up page and web service requests. Without this feature, you risk duplicating application and framework libraries (if shared libraries are not used) as well as application data when you run many identical processes on a server.

For some types of web applications, using a cluster of Mongrels and aggressive page and page fragment caching provides very good performance. For applications with highly customized page content, a single process with in-memory data caching is better. Because Merb is thread safe, it enables you to build web applications that can better take advantage of in-memory data caching.

Figure 1. Layout of the Merb Demo App: Here is the rough layout of the Merb version of RubyPlanet.net.

If you want to use in-memory caching with RoR instead, you can run a cluster of Mongrels, each with a copy of the cache. However, that eats up more server memory. Alternatively, you can use Memcached to cache common data between multiple Mongrels, but that incurs some communications overhead. For some applications, it is simply more efficient to cache data in-memory and count on multiple threads to support concurrent users.

Because Merb is thread safe and supports running background tasks as threads, I was able to run the entire Merb version of RubyPlanet.net in a single process and use an in-memory cache for active feeds. A single Merb process can process many page requests per second (more on that later).

Figure 1 shows the rough layout of the Merb version of RubyPlanet.net.

Installing Merb
The first decision that you need to make before installing Merb is whether you want to use stable or “edge” versions. While I encourage you to eventually contribute to Merb, in which case you would install nightly developer builds, this article assumes that you are starting with stable builds and using C-Ruby 1.8.6 (I had some difficulties with Merb 1.0RC4 and JRuby but these may be fixed soon).

Begin by typing the following to install a stable version of Merb as a collection of gems:

gem install merb

You will also need the simple-rss and atom gems installed:

gem install simple-rss
gem install atom

The simple-rss gem handles both RSS and Atom, but for Atom it returns only the first link element, ignoring the rest. For blogging services like blogger.com, you need all links to get the one with the attribute rel='alternate'.
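To make the link problem concrete, here is a minimal sketch that picks the rel='alternate' link out of an Atom entry using REXML, which ships with Ruby. The sample entry, the example.com URLs, and the helper name alternate_link are made up for illustration; this is not a description of how the atom gem works internally.

```ruby
require 'rexml/document'

# Hypothetical Atom entry with several <link> elements, similar in shape
# to what a blogging service might emit.
ENTRY_XML = <<-XML
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <link rel="replies" href="http://example.com/post/1/comments"/>
  <link rel="self" href="http://example.com/feeds/1"/>
  <link rel="alternate" href="http://example.com/post/1.html"/>
</entry>
XML

# Returns the href of the first child link whose rel attribute is
# "alternate", or nil if the entry has none.
def alternate_link(entry_xml)
  entry = REXML::Document.new(entry_xml).root
  link = entry.elements.to_a.find { |el|
    el.name == 'link' && el.attributes['rel'] == 'alternate'
  }
  link && link.attributes['href']
end

puts alternate_link(ENTRY_XML)
```

A parser that stops at the first link element would return the "replies" URL here, which is why the full list of links matters.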

The Merb core developer team intended Merb to be modular and highly customizable, but for new users they also wanted to provide a ready-to-use experience out of the box. This standard Merb install uses DataMapper instead of ActiveRecord (which you can easily use with a change to the config/init.rb file) and installs and uses the jQuery JavaScript library. Erb is the default template engine (also set in the config/init.rb file).

Your First Merb Web Application: Filtered Ruby News
The demo Merb app you will develop is for Ruby feeds, but you can repurpose it to use feeds for any topic that interests you. You will start by creating a data model for storing RSS feeds. When you installed the Merb gem (which also installed the “default Merb stack”), it placed several command-line utilities on your PATH. You use one of these utilities (merb-gen) to generate your Merb application (see Listing 1).

By default, merb-gen app generates an application using the DataMapper object relational mapper (ORM) framework, which is similar to ActiveRecord. I am a huge fan of ActiveRecord, but DataMapper has a more abstract interface for backend data sources (databases, RSS/Atom, flat file, etc.) and it’s thread safe (more on DataMapper later).

Normally, the first thing that you need to do after generating a new Merb application is to edit the config/init.rb file. In this case, you will use the defaults set by the application generator as is, but going through the generated file (see Listing 2) is still worthwhile. I have added comments to explain what is in this file (you will not see my comments when you run the application generator and edit the config/init.rb file).

You need only two model classes, FeedSource and Feed, and I initially generated a model for this application with a string attribute for a blog post URL:

merb-gen resource feed_source url:string
merb-gen resource feed url:string

Later, you will refactor this application to keep instances of the Feed class in memory. I eventually made the following manual edits to the FeedSource class to add the additional class attribute feed_type:

class FeedSource
  include DataMapper::Resource

  property :id, Serial
  property :url, String
  property :feed_type, String # 'rss' or 'atom'
end

I also made these manual edits to the Feed class to add the additional attributes title and posted_date:

class Feed
  include DataMapper::Resource

  property :id, Serial
  property :url, String
  property :title, String
  property :posted_date, Date
end

By default, Merb defines a user model class for you when you generate a new resource. After editing the model file(s), in development mode you can wipe out any existing database and recreate the development database using:

rake db:automigrate

Running this rake task created the two sqlite3 database tables for the FeedSource and Feed model classes.

While working on this application, I kept editing the app/models/feed.rb file. Here is the app/models/feed.rb file for the first version of the demo web application (remember to run rake db:automigrate after you change the model files during development):

class Feed
  include DataMapper::Resource

  property :id, Serial
  property :url, String
  property :title, String
  property :content, String
  property :feed_name, String
  property :posted_date, Date
end

The feed_sources table contains seed RSS feed URLs for spidering. I used the following commands to add a row to this table for development and testing:

mark$ sqlite3 dev.db
sqlite> insert into feed_sources ('url', 'feed_type') values ('http://markwatson.com/blog/atom.xml', 'atom');

Next, start Merb in its interactive (irb-like) mode:

mark$ merb -i
irb(main):001:0> FeedSource.find_by_sql("select * from feed_sources").collect {|f| f.url}
 ~ select * from feed_sources
=> ["http://markwatson.com/blog/atom.xml"]
irb(main):002:0> FeedSource.all
 ~ SELECT "id", "url" FROM "feed_sources" ORDER BY "id"
=> [#]
irb(main):003:0> FeedSource.all.collect {|f| f.url}
 ~ SELECT "id", "url" FROM "feed_sources" ORDER BY "id"
=> ["http://markwatson.com/blog/atom.xml"]
irb(main):004:0>

Note that DataMapper is printing the generated SQL.

Removing Unused Code from the Application
Usually when you use merb-gen resource …, you will use the generated model file, the controller, and the view files. However, this demo application is a single-page web site that does not need an admin interface, so there is quite a lot of cleanup that you can do. Figure 2 shows the application with the unused files, and Figure 3 shows the application after I removed these files.


Figure 2. The Application with the Unused Files: There is quite a lot of clean up that you can do on the demo application.
 
Figure 3. The Application After I Removed the Unused Files: You will notice the demo application has many fewer unused files than new Merb applications that you generate.

Because Merb is itself a small and modular code base, I am motivated to clean up all the unused bits of any application built on Merb. When you download the demo web application for this article, you will notice it has many fewer unused files than new Merb applications that you generate.

The next section walks through the initial implementation of a prototype for the demo application.

Using DataMapper and a Database for Rendered Feed Data
To read all of the seed RSS URLs and fetch the RSS feed XML for each seed RSS URL, I created a model class FeedDataManager that reads the feed_sources database table using the DataMapper-generated class FeedSource. You can look at the source file app/models/feed_data_manager.rb for the full implementation, but the following few code snippets show how you fetch and process RSS data:

# needed if the application has not already loaded these libraries
require 'open-uri'
require 'simple-rss'

feed_sources = FeedSource.all.collect {|f| f.url}
feed_sources.each {|source|
  puts "\n* * processing #{source}"
  begin
    rss = SimpleRSS.parse(open(source))
    rss.items.each {|item|
      puts "  link: #{item.link}  title: #{item.title}  feed title: #{rss.feed.title}"
      # item.content # all content text
      # save data to database
      feed = Feed.new
      feed.url = item.link
      feed.title = item.title
      feed.content = item.content
      feed.feed_name = rss.feed.title
      feed.posted_date = Time.now
      feed.save!
      puts "* saved #{feed}"
    }
  rescue
    # report the failure and move on to the next feed source
    print "Error: #{$!}"
  end
}

Running this periodically in the background is the ideal implementation, and it is relatively easy. Merb has built-in support for running background tasks from controllers, but I wanted to do something different: start a work thread as soon as all application code has loaded. To do this, I edited the config/init.rb file, adding a statement to the block that runs after the application loads:

Merb::BootLoader.after_app_loads do
  # This will get executed after your app's classes have been loaded.
  DataMapper.setup(:default, "sqlite3://#{Dir.pwd}/dev.db")
  # Start up the background work thread:
  Thread.new { FeedDataManager.new }
end

The initialize method for class FeedDataManager never returns. Rather, it initializes instance data and enters a long loop that fetches feeds, and then the thread sleeps for five hours before re-fetching the feeds.
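The loop-then-sleep pattern described above can be sketched roughly as follows. This is not the article's FeedDataManager: the class name PeriodicWorker, its run method, and the max_cycles argument (added only so the loop can be stopped in a demonstration) are assumptions made for illustration.

```ruby
# A generic worker that repeats a job, sleeping between cycles.
class PeriodicWorker
  attr_reader :cycles

  def initialize(interval_seconds, &job)
    @interval = interval_seconds
    @job = job
    @cycles = 0
  end

  # Loops forever when max_cycles is nil; otherwise stops after that
  # many cycles.
  def run(max_cycles = nil)
    loop do
      begin
        @job.call
      rescue StandardError => e
        # One failed cycle should not kill the worker thread.
        warn "worker error: #{e.message}"
      end
      @cycles += 1
      break if max_cycles && @cycles >= max_cycles
      sleep(@interval)
    end
  end
end

# Start the worker on its own thread so the caller is not blocked.
fetched = []
worker = PeriodicWorker.new(0.01) { fetched << Time.now }
thread = Thread.new { worker.run(3) }
thread.join
puts "completed #{worker.cycles} cycles"
```

In a setup like the one in this article, the interval would be 3600 * 5 seconds and max_cycles would be left as nil so that the thread keeps running for the life of the process.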

Now you need a controller to get the feed data for a view (note that I have removed the unnecessary controller methods that merb-gen resource feeds … created):

class Feeds < Application
  # provides :xml, :yaml, :js ## do not need to handle web service clients
  def index
    @feeds = Feed.all
    display @feeds  ## make available to the view
  end
end

My initial edits for the app/views/feeds/index.html.erb were:

Latest Ruby Blog Entries

<% @feeds.each {|feed| %>

<%= feed.title %>

link <%= feed.content[0..70]%>

<% } %>

Finally, before running the first implementation, I wanted to set the method index in the controller class Feeds to be the default route for the web application in config/router.rb:

    match('/').to(:controller => 'feeds', :action =>'index')

That is pretty much bare bones, but you now have a complete web application (that you will improve) for fetching feeds and displaying them.

One thing that surprised me when I first ran this prototype was its speed. It was fast without caching and despite database fetches for the data it would display. I ran the ApacheBench utility (ab) and discovered that this prototype provided 44 page renders per second on my MacBook:

ab -n 5 http://localhost:4000/

The next section explains how you can change the design of the demo application to incorporate in-memory storage.

In-Memory Storage for Feed and Search Index
The first thing that I noticed when running the first implementation of the application was how quickly the background RSS feed fetch operations ran. As a result, I decided to simplify the application to read the remote feeds only on startup and then to re-read the feeds every five hours while the application was running. This allowed me to get rid of one database table by changing the Feed class definition so that it no longer includes DataMapper::Resource. The class Feed then became a plain old Ruby class with no database persistence. I also removed some of the class attributes as follows:

class Feed
  attr_accessor :url, :title, :feed_name, :snippet
end

You can extend the initial implementation of the demo app in two ways:

  1. Keeping feeds in an array in-memory
  2. Creating an in-memory inverted word index to support fast search functionality that will let users filter their feeds

The following sections explain how I accomplished these.

Implementing the Background Processing Functionality
You already saw the new implementation of the Feed class as a plain old Ruby class (no persistence). You can look at the source code in the file app/models/feed_data_manager.rb for details, but a few code snippets are worth looking at here to explain how it all works. The snippets below use only the simple-rss gem, so the code is a simpler example than the final version, which also supports using the atom gem, uses class variables to hold Feed objects, and provides an inverted word index (for search filtering) in memory:

@@feeds = []
@@search_index = {}

The private method do_background_work runs in an infinite loop. Each time through the loop, it fetches the Ruby blog feeds, creates Feed objects, sorts the feed objects by date and time, and builds a search index. It builds the array of feed objects and the search index as local (temporary) variables first and then uses them to overwrite the class variables. Only the private method do_background_work modifies these class variables; the controller has read-only access.

I rewrote the code for fetching and saving blog feeds so it uses the new Feed class and does not save any blog entries that do not contain at least one of the words Ruby, Rails, or Merb (see Listing 3). This enabled the app to process general technology blogs and grab only the articles that I am interested in:
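The keyword test described above can be sketched as follows. The constant KEYWORDS, the helper relevant?, and the sample titles are illustrative assumptions and are not taken from the article's listing.

```ruby
# Words that mark a blog entry as relevant (list taken from the text above).
KEYWORDS = %w[ruby rails merb]

# True when the text contains at least one keyword as a whole word,
# ignoring case.
def relevant?(text)
  words = text.to_s.downcase.scan(/[a-z]+/)
  KEYWORDS.any? { |keyword| words.include?(keyword) }
end

entries = [
  "Deploying a Merb application with Thin",
  "Ten tips for faster Java builds",
  "What's new in Rails 2.2"
]
kept = entries.select { |title| relevant?(title) }
puts kept
```

Matching on whole words rather than substrings is a design choice in this sketch: it keeps a title about "Grails" from being mistaken for one about Rails.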

After creating a local array of Feed instances, I sorted them in order of latest date and time first and then overwrote the class variables to which the controller has read-only access:

feeds.sort! {|a,b| b.published <=> a.published}
feeds.each {|feed|
  # build in-memory search index:
  feed.content.downcase.scan( /[a-z_][a-z_\d]*/ ).each {|token|
    feed_indices = search_index[token] || []
    feed_indices << count if !feed_indices.index(count)
    search_index[token] = feed_indices
  }
  count += 1
}
## not thread safe:
@@feeds = feeds
@@search_index = search_index
sleep(3600*5)  # 5 hours
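The "not thread safe" comment concerns the two separate assignments: a request thread could, in principle, read the new feed array together with the old index. One possible way to close that gap, sketched here under my own assumptions, is to keep both values in a single snapshot object and swap it under a Mutex. The names FeedStore, Snapshot, replace, and snapshot are hypothetical and do not appear in the article's code.

```ruby
# Holds the feed list and its search index together so readers never see
# a new feed array paired with an old index.
class FeedStore
  Snapshot = Struct.new(:feeds, :search_index)

  def initialize
    @lock = Mutex.new
    @snapshot = Snapshot.new([], {})
  end

  # Called by the background thread after building fresh local data.
  def replace(feeds, search_index)
    fresh = Snapshot.new(feeds.freeze, search_index.freeze)
    @lock.synchronize { @snapshot = fresh }
  end

  # Called by request threads; returns a consistent pair.
  def snapshot
    @lock.synchronize { @snapshot }
  end
end

store = FeedStore.new
store.replace(["post A", "post B"], { "ruby" => [0, 1] })
current = store.snapshot
puts current.feeds.size
puts current.search_index["ruby"].inspect
```

A reader takes one snapshot per request and uses only that object, so a swap that happens mid-request cannot mix old and new data.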

The class method FeedDataManager.feeds simply returns all of the feeds if no filter search string is provided. When the user enters a filter string, the web application converts this string to lower case and tokenizes it. It returns only feeds that contain at least one of the filter search tokens. Note that the filter terms are "logical or," not "logical and":

def FeedDataManager.feeds filter=''
  return @@feeds if filter==''
  ids = []
  filter.downcase.split.each {|token|
    ii = @@search_index[token]
    if ii
      ii.each {|x| ids << x if !ids.index(x)}
    end
  }
  ids.sort! # want IDs in ascending order so posts are in time order
  ids.collect {|index| @@feeds[index]}
end
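To see the difference between "logical or" and "logical and" on a small data set, here is a self-contained sketch of an inverted word index. The helper names build_index, search_any, and search_all and the three sample strings are assumptions for illustration; only the "or" behavior corresponds to what the demo application does.

```ruby
# Builds a map from each lowercase token to the positions of the
# documents that contain it.
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each_with_index do |text, position|
    text.downcase.scan(/[a-z_][a-z_\d]*/).uniq.each do |token|
      index[token] << position
    end
  end
  index
end

# "Logical or": positions of documents containing at least one token.
def search_any(index, query)
  query.downcase.split.map { |token| index.fetch(token, []) }.flatten.uniq.sort
end

# "Logical and": positions of documents containing every token.
def search_all(index, query)
  lists = query.downcase.split.map { |token| index.fetch(token, []) }
  lists.empty? ? [] : lists.inject { |common, list| common & list }.sort
end

docs = ["Merb and Ruby threads", "Rails caching tips", "Ruby on Rails and Merb"]
index = build_index(docs)
puts search_any(index, "merb caching").inspect  # => [0, 1, 2]
puts search_all(index, "ruby rails").inspect    # => [2]
```

With "or" semantics, adding filter terms can only widen the result set, which suits a page where readers want to follow several topics at once.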

Implementing Feed Controller and View
I edited the app/views/layout/application.html.erb global page template to add the graphics from my existing RubyPlanet.net web site. The template file app/views/feeds/index.html.erb is simple:

Filter terms:

Latest Ruby Blog Entries

<% @feeds.each {|feed| %>

<%= feed.title %>

<%= feed.feed_name %> - <%= feed.s_published %>

Feed <%= feed.snippet%>

<% } %>

Note that it uses two variables set in the controller: @filter and @feeds. The controller itself is very simple because it relies on the FeedDataManager model class to do the real work:

class Feeds < Application
  def index
    @filter = params['sfilter'] || ""
    @feeds = FeedDataManager.feeds(@filter)
    display @feeds
  end
end

With these implementations, I extended the demo app to keep feeds in an array in-memory and enable fast search functionality for the user.

I started with a conventional application that used a database to store the data it would render. The following realizations led me to refactor the application using in-memory storage:

  • The application data easily fit in a small amount of memory.
  • It took very little time to fetch the feed data from blog web sites.

To increase the page-rendering performance, I used in-memory storage instead of accessing a database. I also could have eliminated the feed_sources database table and used a simple YAML configuration file for the seed blog URLs. However, I wanted this demo application to show a simple use of DataMapper.
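For comparison, the YAML alternative mentioned above might look something like the following sketch. The file layout, the second (example.com) feed URL, and the helper name load_feed_sources are hypothetical; in a real application the text would come from a file such as config/feed_sources.yml rather than from an inline string.

```ruby
require 'yaml'

# Hypothetical contents of a seed file; each item mirrors the url and
# feed_type columns of the feed_sources table.
SEED_YAML = <<-YAML
- url: http://markwatson.com/blog/atom.xml
  feed_type: atom
- url: http://example.com/ruby/rss.xml
  feed_type: rss
YAML

# Parses the seed list into an array of hashes, keeping only entries
# that have a URL and a recognized feed type.
def load_feed_sources(yaml_text)
  sources = YAML.safe_load(yaml_text) || []
  sources.select { |source|
    source['url'] && %w[rss atom].include?(source['feed_type'])
  }
end

load_feed_sources(SEED_YAML).each do |source|
  puts "#{source['feed_type']}: #{source['url']}"
end
```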

So, how did the refactored implementation do on performance? It serves about 120 page requests per second on my MacBook with no filtering (rendering about 100 blog posts on the page for this benchmark). With filtering, the number of rendered pages per second drops to 50 to 60, depending on the number of filter terms. The previous version's benchmark rendered 44 page views per second with about 30 blog entries on the page. So the second version of the demo web application runs much faster than the first version when results are normalized for the number of blog articles rendered.
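The normalization mentioned above is simple arithmetic, worked through here as a short script. It uses the approximate figures quoted in the text (about 30 and about 100 entries per page), so the resulting ratio is a rough estimate rather than a measured value.

```ruby
# Throughput figures quoted in the text (development mode, no filtering).
first_version  = { :pages_per_second => 44,  :entries_per_page => 30 }
second_version = { :pages_per_second => 120, :entries_per_page => 100 }

# Normalize by multiplying the page rate by the entries on each page.
def entries_per_second(run)
  run[:pages_per_second] * run[:entries_per_page]
end

first  = entries_per_second(first_version)   # 44 * 30
second = entries_per_second(second_version)  # 120 * 100
puts "first version:  #{first} entries/second"
puts "second version: #{second} entries/second"
puts "speedup: #{(second.to_f / first).round(1)}x"
```

By this measure the in-memory version renders roughly nine times as many blog entries per second as the database-backed prototype.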

I ran all of these benchmarks in "development mode," so there is real overhead for handling things such as templates. When I ran the final version on my MacBook using merb -e production, I got about 180 page renders per second. That's fast!

Which Apps Are Best for Merb?
Am I going to completely give up using Rails in favor of Merb? Absolutely not. I currently see Merb as best suited for either small compute-intensive web applications or for creating web services from Ruby applications.

The techniques I discussed in this article (note that I did not cover using page and fragment caching, testing frameworks, and "slices") reflect my own use of Merb to create small, targeted web services and web applications. I believe that Merb is an especially good fit for writing web services; notice that when you create a new resource with merb-gen, the generated controller contains a commented out line that you can uncomment to enable rendering XML, YAML, and JSON:

class Feeds < Application
  # provides :xml, :yaml, :js

When you use a provides method call in your controller, you get an automated conversion of Ruby model objects to data formats, which are also useful for implementing web services.

I am still experimenting with Merb to understand which other kinds of projects are a good fit for it and which are best implemented in Rails. I encourage you to do the same.
