Using DataMapper and a Database for Rendered Feed Data
To read the seed RSS URLs and fetch the feed XML for each one, I created a model class, FeedDataManager, that reads the
feed_sources database table through the DataMapper-generated class FeedSource. You can look at the source file
app/models/feed_data_manager.rb for the full implementation, but the following snippet shows how to fetch and process the RSS data:
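require 'open-uri'    # lets open() read from a URL (on Ruby 3.0 and later, call URI.open instead)
require 'simple-rss'  # provides SimpleRSS.parse

# Collect the seed feed URLs from the feed_sources table
feed_sources = FeedSource.all.collect {|f| f.url}
feed_sources.each {|source|
  puts "\n* * processing #{source}"
  begin
    rss = SimpleRSS.parse(open(source))
    rss.items.each {|item|
      puts " link: #{item.link} title: #{item.title} feed title: #{rss.feed.title}"
      # item.content holds the full content text
      # save data to database
      feed = Feed.new
      feed.url = item.link
      feed.title = item.title
      feed.content = item.content
      feed.feed_name = rss.feed.title
      feed.posted_date = Time.now  # time of the fetch, not the item's own date
      feed.save!
      puts "* saved #{feed}"
    }
  rescue => e
    # Report the failure and continue with the next source
    puts "Error: #{e}"
  end
}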
Running this periodically in the background, which is the ideal implementation, is relatively easy. Merb has built-in support for running background tasks from controllers, but I wanted to do something different: start a work thread as soon as all application code has loaded. To do this, I edited the config/init.rb file by adding a statement to the block that runs after the application loads:
Merb::BootLoader.after_app_loads do
  # This will get executed after your app's classes have been loaded.
  DataMapper.setup(:default, "sqlite3://#{Dir.pwd}/dev.db")
  # Start up the background work thread:
  Thread.new { FeedDataManager.new }
end
The initialize method of class FeedDataManager never returns. Instead, it initializes instance data and enters a long-running loop: it fetches the feeds, the thread sleeps for five hours, and then it fetches the feeds again.
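The fetch-then-sleep loop described above can be sketched roughly as follows. This is not the code from app/models/feed_data_manager.rb: the class name PeriodicFeedWorker, its run method, and the fetcher, sleeper, and max_cycles parameters are all hypothetical, and the sketch uses modern Ruby keyword arguments. The injectable sleeper and the cycle limit exist only so the loop can be exercised without waiting five hours.

```ruby
# Hypothetical sketch of a fetch-then-sleep worker; not the article's FeedDataManager.
class PeriodicFeedWorker
  FIVE_HOURS = 5 * 60 * 60 # seconds

  # fetcher:    any object responding to #call that fetches and saves all feeds
  # sleeper:    callable that pauses the thread; defaults to Kernel#sleep
  # max_cycles: nil means loop forever, as the article's initialize method does
  def initialize(fetcher, interval: FIVE_HOURS, sleeper: Kernel.method(:sleep), max_cycles: nil)
    @fetcher = fetcher
    @interval = interval
    @sleeper = sleeper
    @max_cycles = max_cycles
  end

  # Runs the loop and returns the number of completed fetch cycles.
  def run
    cycles = 0
    loop do
      begin
        @fetcher.call
      rescue StandardError => e
        # One failed refresh should not kill the background thread.
        warn "feed refresh failed: #{e.message}"
      end
      cycles += 1
      break if @max_cycles && cycles >= @max_cycles
      @sleeper.call(@interval)
    end
    cycles
  end
end
```

Under these assumptions, the background thread would be started with something like Thread.new { PeriodicFeedWorker.new(fetcher).run }, where fetcher wraps the feed-processing code shown earlier. Keeping the loop in a separate run method, rather than in initialize, means that constructing the object does not block.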
Now you need a controller to get the feed data for a view (note that I have removed the unnecessary controller methods that merb-gen resource feeds ... created):
class Feeds < Application
  # provides :xml, :yaml, :js ## do not need to handle web service clients
  def index
    @feeds = Feed.all
    display @feeds ## make available to the view
  end
end
My initial version of app/views/feeds/index.html.erb was:
<h1>Latest Ruby Blog Entries</h1>
<% @feeds.each {|feed| %>
  <h4> <%= feed.title %> </h4>
  <a href="<%= feed.url %>" target="new">link</a> <%= feed.content[0..70] %>
  <br/><br/>
<% } %>
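To see what this template produces without starting Merb, you can render a similar template with Ruby's standard ERB library. The following is a minimal sketch under several assumptions: FeedRow is a hypothetical stand-in for the DataMapper Feed model, render_feeds is a made-up helper, and the feeds are passed as a local variable rather than as the @feeds instance variable that the controller sets. The sketch also calls to_s on the content so that an item with no content renders an empty excerpt instead of raising an error. Note that the range 0..70 is inclusive, so the excerpt is at most 71 characters.

```ruby
require 'erb'

# Hypothetical stand-in for the Feed model: only the fields the view reads.
FeedRow = Struct.new(:title, :url, :content)

# Same markup as the view above, but reading a local variable named feeds.
FEED_TEMPLATE = <<~'ERB_SRC'
  <h1>Latest Ruby Blog Entries</h1>
  <% feeds.each do |feed| %>
  <h4><%= feed.title %></h4>
  <a href="<%= feed.url %>" target="new">link</a> <%= feed.content.to_s[0..70] %>
  <br/><br/>
  <% end %>
ERB_SRC

# Renders the template to an HTML string for the given feed rows.
def render_feeds(feeds)
  ERB.new(FEED_TEMPLATE).result_with_hash(feeds: feeds)
end

rows = [FeedRow.new('Example post', 'http://example.com/post', 'Some body text')]
puts render_feeds(rows)
```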
Finally, before running the first implementation, I made the index method of the Feeds controller the default route for the web application in config/router.rb:
match('/').to(:controller => 'feeds', :action => 'index')
The result is bare bones, but you now have a complete web application for fetching and displaying feeds, which you will improve later.
One thing that surprised me when I first ran this prototype was its speed. It was fast even without caching, and even though every page render fetched its data from the database. I ran the ApacheBench (ab) utility and discovered that this prototype provided 44 page renders per second on my MacBook:
ab -n 5 http://localhost:4000/
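The -n option sets the number of requests that ab issues, and the throughput figure is the number of completed requests divided by the elapsed time. As a rough illustration of that arithmetic, and not a replacement for ab, here is a small Ruby sketch. The helper name requests_per_second is hypothetical, it issues requests sequentially, and the commented usage assumes the application is listening on http://localhost:4000/ as in the command above.

```ruby
# Hypothetical helper: runs the given block count times and returns
# completed calls divided by elapsed wall-clock seconds.
def requests_per_second(count, &request)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times { request.call }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  count / elapsed
end

# Example usage against the running application (left commented out so
# that loading this file does not need a server):
#
#   require 'net/http'
#   require 'uri'
#   uri = URI('http://localhost:4000/')
#   rate = requests_per_second(100) { Net::HTTP.get_response(uri) }
#   puts format('%.1f requests/second', rate)
```

A sample of only five requests is small, so a larger count should give a steadier estimate.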
The next section explains how you can change the design of the demo application to incorporate in-memory storage.