

Using SharePoint Portal Server to Index Your Custom Application : Page 3

Although you can plumb the depths of SharePoint's search engine, creating custom protocol handlers and IFILTERs to handle your site's custom search needs, for most sites, you can follow a much simpler process to provide customized searches.

The Sample Application
The downloadable code accompanying this article contains a sample Web application that demonstrates the concepts discussed here and that you can use for practice. The sample application contains three pages: default.aspx, crawler.aspx, and detail.aspx. Default.aspx is the home page of the application; it includes a link to crawler.aspx that has no text between the opening and closing tags. The link generally looks something like:

   <a href="./crawler.aspx"></a>

Crawler.aspx is the search start page. It contains a loop that adds 100 links to detail.aspx, creating a link with a different query string value for each loop iteration. The core code for Crawler.aspx is:

   for (int looper = 0; looper < 100; looper++)
   {
      HtmlAnchor ctlAnchor = new HtmlAnchor();
      LiteralControl ctlBr = new LiteralControl();
      ctlAnchor.HRef = "detail.aspx?id=" + looper;
      ctlAnchor.InnerText = "Detail " + looper;
      ctlBr.Text = "<BR/>";
      // Add the link and a line break to the page (target container assumed)
      Controls.Add(ctlAnchor);
      Controls.Add(ctlBr);
   }

This causes a page to appear as shown in Figure 1.

Figure 1. Link Page: The file crawler.aspx contains a list of links for SharePoint to index.

When the loop shown above completes, crawler.aspx contains a list of links for the gatherer to follow. The content behind those links is simple: for each request, detail.aspx displays a short message in its response that includes the query string value. Figure 2 and Figure 3 show the content that the gatherer will index when it follows the first two links.
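
The article does not list the code behind detail.aspx, so the following is only a minimal sketch of how such a page might build its message, assuming it simply echoes the id query string value. The class name DetailMessage, the method BuildMessage, and the message wording are hypothetical and are not taken from the sample download. The value is HTML-encoded because it comes straight from the request.

```csharp
using System;
using System.Net;

// Hypothetical helper; the sample's actual detail.aspx code is not shown in the article.
public static class DetailMessage
{
    // Builds the short message that the page would write into its response.
    public static string BuildMessage(string id)
    {
        // A missing query string value is reported rather than treated as an error.
        if (string.IsNullOrEmpty(id))
        {
            return "No detail ID was supplied.";
        }

        // Encode the value so that markup in the query string is not rendered.
        return "This is the detail page for ID " + WebUtility.HtmlEncode(id) + ".";
    }
}
```

In the page itself, a call such as DetailMessage.BuildMessage(Request.QueryString["id"]) could be made from Page_Load and the result assigned to a label or literal control (the control is likewise an assumption). Because each id produces different text, the gatherer sees each link as a distinct document.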

Figure 2. Crawling Links: The figure shows the content that detail.aspx returns when the gatherer crawls it using a query string ID of 0.
Figure 3. Varying Parameters to Obtain Different Content: The figure shows the content that detail.aspx returns when the gatherer crawls it using a query string ID of 1.

You can install the sample application for this article in a Web site and have SharePoint index it to see the process in action. Note, however, that you cannot install the application in an IIS virtual server that has been extended with SharePoint Portal Server or Windows SharePoint Services, because indexing does not work properly for applications installed in a SharePoint extended virtual server.

Differentiating Search User Agents from Humans
When a page is being indexed, you generally want to strip it of as much extraneous information as possible, including items that human visitors rely on, such as navigation. To get the best results, it's therefore often best to differentiate between a search engine and a user. In other words, when the Web request originates with a user, you want to render the page normally, but when the Web request comes from a search engine, you want to suppress the rendering of standard items such as headers, footers, menus, related items, announcements, etc., so that they're not indexed with the page.

You do this by evaluating the UserAgent property of the Request object. Each HTTP request includes a user-agent header that identifies the type of browser making the request. SharePoint includes "SPRobot" in the user agent string when it issues a request. So, if you find "SPRobot" in the user agent string, you then know that the request is coming from the SharePoint indexing engine, and you can suppress unnecessary items.
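
The check described above can be sketched as a small helper. This is an illustration under stated assumptions, not code from the sample download: the class name CrawlerDetector and the method IsSharePointCrawler are hypothetical, and only the "SPRobot" token comes from the article. The comparison ignores case as a precaution.

```csharp
using System;

// Hypothetical helper for illustration; only the "SPRobot" token comes from the article.
public static class CrawlerDetector
{
    private const string SharePointToken = "SPRobot";

    // Returns true when the user-agent string contains the SharePoint robot token.
    public static bool IsSharePointCrawler(string userAgent)
    {
        // Request.UserAgent is null when the client sends no user-agent header.
        if (string.IsNullOrEmpty(userAgent))
        {
            return false;
        }

        // Case-insensitive match, in case the token's casing varies.
        return userAgent.IndexOf(SharePointToken, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

A page could call CrawlerDetector.IsSharePointCrawler(Request.UserAgent) in Page_Load and set the Visible property of its header, footer, and menu controls to false when the result is true. Keeping the test in one place means every page suppresses the same items for the gatherer.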
