Move Over Rails. Here Comes Merb.

Merb is potentially much faster than Rails, and the combination of Merb and Ruby can provide the performance most web apps require. Find out how Merb can deliver a 2-4X speed increase over Rails and why that makes it better for certain applications.


In-Memory Storage for Feed and Search Index
The first thing I noticed when running the first implementation of the application was how quickly the background RSS feed fetch operations ran. As a result, I decided to simplify the application to read the remote feeds only on startup and then re-read them every five hours while the application was running. This let me get rid of one database table by changing the Feed class definition so that it no longer includes behavior from the DataMapper module. The Feed class then became a plain old Ruby class with no database persistence. I also removed some of the class slots, as follows:

class Feed
  attr_accessor :url, :title, :feed_name, :snippet
end
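For comparison, the earlier database-backed version of the class might have looked roughly like the sketch below. This is a reconstruction rather than code from the demo app; the property names simply mirror the accessors above, and the DataMapper::Resource mixin and dm-core require reflect the Merb-era DataMapper API:

# Reconstructed sketch of a DataMapper-backed Feed class (assumed, not the
# app's original code). Property names mirror the accessors shown above.
require 'rubygems'
require 'dm-core'

class Feed
  include DataMapper::Resource   # the persistence mixin the plain class drops

  property :id,        Serial
  property :url,       String, :length => 255
  property :title,     String, :length => 255
  property :feed_name, String
  property :snippet,   Text
end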

You can extend the initial implementation of the demo app in two ways:

  1. Keeping feeds in an array in-memory
  2. Creating an in-memory inverted word index to support fast search functionality that will let users filter their feeds



The following sections explain how I accomplished both.

Implementing the Background Processing Functionality
You already saw the new implementation of the Feed class as a plain old Ruby class (no persistence). You can look at the source code in the file app/models/feed_data_manager.rb for details, but a few code snippets are worth looking at here to explain how it all works. The snippets below use only the simple-rss gem, so the code is a simpler example than the final version, which also supports using the atom gem, uses class variables to hold Feed objects, and provides an inverted word index (for search filtering) in memory:

@@feeds = []
@@search_index = {}

The private method do_background_work runs in an infinite loop. Each time through the loop, it fetches the Ruby blog feeds, creates Feed objects, sorts the feed objects by date and time, and builds a search index. The method builds the array of feed objects and the search index as local (temporary) variables and then uses them to overwrite the static class variables. Only do_background_work modifies these class variables; the controller has read-only access.
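The loop has to be kicked off somewhere. A minimal sketch, assuming it is started in a plain Ruby Thread when the model class loads (the Thread.new call and the placeholder loop body are assumptions, not the app's actual code), might look like this:

# Sketch only: the article does not show how the loop is launched.
class FeedDataManager
  @@feeds = []
  @@search_index = {}

  def FeedDataManager.do_background_work
    loop {
      # ... fetch the feeds, rebuild the array and the search index,
      # then overwrite @@feeds and @@search_index (shown below) ...
      sleep(3600 * 5)   # wait five hours before re-reading the feeds
    }
  end

  # start the background fetch as soon as the class is loaded (assumed)
  Thread.new { FeedDataManager.do_background_work }
end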

I rewrote the code for fetching and saving blog feeds so that it uses the new Feed class and skips any blog entries that do not contain at least one of the words Ruby, Rails, or Merb (see Listing 3). This lets the app process general technology blogs and grab only the articles that I am interested in.
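Listing 3 is not reproduced here, but the fetch-and-filter step might look roughly like the following sketch, which uses only the simple-rss gem as in the snippets above. The FEED_URLS-style hash argument, the keyword regular expression, and the error handling are illustrative assumptions rather than the app's actual code:

# A sketch of the fetch-and-filter step (not Listing 3 itself).
require 'rubygems'
require 'simple-rss'
require 'open-uri'

KEYWORDS = /ruby|rails|merb/i   # assumed keyword test

def fetch_matching_feeds(feed_urls)        # feed_urls: hash of feed name => URL
  feeds = []
  feed_urls.each {|feed_name, url|
    begin
      rss = SimpleRSS.parse(open(url))
      rss.items.each {|item|
        text = "#{item.title} #{item.description}"
        next unless text =~ KEYWORDS       # keep only Ruby/Rails/Merb entries
        feed = Feed.new
        feed.url       = item.link
        feed.title     = item.title
        feed.feed_name = feed_name
        feed.snippet   = item.description.to_s[0, 300]
        feeds << feed
      }
    rescue => err
      puts "Error reading #{url}: #{err}"  # a bad feed should not stop the loop
    end
  }
  feeds
end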

After creating a local array of Feed instances, I sorted them in order of latest date and time first and then overwrote the static class variables to which the controller has read-only access:

count = 0          # position of each feed in the sorted feeds array
search_index = {}  # maps each word token to the positions of the feeds containing it

feeds.sort! {|a,b| b.published <=> a.published}
feeds.each {|feed|
  # build in-memory search index:
  feed.content.downcase.scan( /[a-z_][a-z_\d]*/ ).each {|token|
    feed_indices = search_index[token] || []
    feed_indices << count if !feed_indices.index(count)
    search_index[token] = feed_indices
  }
  count += 1
}
## not thread safe:
@@feeds = feeds
@@search_index = search_index
sleep(3600*5) # 5 hours

The static Feed method feeds simply returns all of the feeds if no filter search string is provided. When the user enters a filter string, the web application converts this string to lower case and tokenizes it. It returns only feeds that contain at least one of the filter search tokens. Note that the filter terms are "logical or," not "logical and":

def FeedDataManager.feeds filter=''
  return @@feeds if filter==''
  ids = []
  filter.downcase.split.each {|token|
    ii = @@search_index[token]
    if ii
      ii.each {|x| ids << x if !ids.index(x)}
    end
  }
  ids.sort!  # want IDs in ascending order so posts stay in time order
  ids.collect {|index| @@feeds[index]}
end
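As a usage example (the filter string here is hypothetical):

all_posts  = FeedDataManager.feeds                      # no filter: return every feed
some_posts = FeedDataManager.feeds('merb datamapper')   # feeds mentioning either word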

Implementing Feed Controller and View
I edited the global page template app/views/layout/application.html.erb to add the graphics from my existing RubyPlanet.net web site. The template app/views/feeds/index.html.erb is simple:

<div class="right"> <form> Filter terms: <input type="text" name="sfilter" value="<%=@filter%>"/> </form> </div> <h1>Latest Ruby Blog Entries</h1> <% @feeds.each {|feed| %> <h4 style="color: #c55"> <%= feed.title %></h4> <h4> <%= feed.feed_name %> - <%= feed.s_published %></h4> <p><a href="<%=feed.url%>" target="new">Feed</a> <%= feed.snippet%></p> <% } %>

Note that it uses two variables set in the controller: @filter and @feeds. The controller itself is very simple because it relies on the FeedDataManager model class to do the real work:

class Feeds < Application
  def index
    @filter = params['sfilter'] || ""
    @feeds = FeedDataManager.feeds(@filter)
    display @feeds
  end
end

With these implementations, I extended the demo app to keep feeds in an array in-memory and enable fast search functionality for the user.

I started with a conventional application that used a database to store the data it would render. The following realizations led me to refactor the application using in-memory storage:

  • The application data easily fit in a small amount of memory.
  • It took very little time to fetch the feed data from blog web sites.

To increase page-rendering performance, I used in-memory storage instead of accessing a database. I also could have eliminated the feed_sources database table and used a simple YAML configuration file for the seed blog URLs, but I wanted this demo application to show a simple use of DataMapper.
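That YAML alternative would take only a few lines; the file name and sample entries below are made up for illustration, not taken from the demo app:

# config/feed_sources.yml (assumed name) might map feed names to URLs:
#
#   Some Ruby Blog: http://example.com/ruby_blog/feed.rss
#   Another Blog:   http://example.com/another/rss.xml
#
require 'yaml'

FEED_URLS = YAML.load_file('config/feed_sources.yml')   # hash of feed name => URL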

So, how did the refactored implementation do on performance? It serves about 120 page requests per second on my MacBook with no filtering (rendering about 100 blog posts on the page for this benchmark). With filtering, the number of rendered pages per second drops to 50 to 60, depending on the number of filter terms. The previous version's benchmark rendered 44 page views per second with about 30 blog entries on the page. So the second version of the demo web application runs much faster than the first version when results are normalized for the number of blog articles rendered.

I ran all of these benchmarks in "development mode," so there was real overhead for handling things such as templates. When I ran the final version on my MacBook using merb -e production, I got about 180 page renders per second. That's fast!


