Building Search Interface Using Apache Solr in .NET

A quick, granular and accurate search interface is of prime importance for most web applications today. Many have traditional search interfaces where the scope of search is restricted to specific fields, which limits their ability to return relevant results. Most commercial web sites, however, require an advanced search interface that indexes the content of documents, builds complex queries from multiple criteria and returns as many relevant results as possible. In this paper, we introduce the Apache Solr search engine, which, most importantly, provides content search. We then explain how to construct queries involving multiple search criteria using Solr and how to integrate it with an application to build a quicker, more accurate and more refined search interface.

Apache Solr is an open source enterprise search server, based on the Lucene Java search library, with XML/HTTP and JSON APIs. It runs in a Java servlet container, such as Tomcat. The input to Solr is a document with optional metadata. We put documents into it (called indexing) via XML, JSON, CSV or binary over HTTP. We query it via HTTP GET and receive an XML, JSON, CSV or binary result. It is architected to deliver very fast search operations across a wide variety of data.

Features

  • Advanced full-text search – Users can search for one or more words or phrases in the content of documents, specific fields, or combinations of one or more fields, thus providing results that match user’s interests.
  • Faceted Search – Users can narrow down the search results further by applying filters on fields (numeric fields, date fields, unique fields) when they wish to drill down, thus providing categorized search.
  • Sort – Users can prioritize the search results based on the value of a field, for example a count field.
  • Pagination – User can display the search results in pages of fixed size.
  • Hit-Term Highlighting – Provides highlighting of the search keyword in the document.
  • It is optimized for high-volume web traffic.
  • It supports rich Document Parsing and Indexing (PDF, Word, HTML, etc.)
  • Admin UI – It has a very simple and user-friendly interface for designing and executing queries over the data.
  • Caching – It caches the results of filter queries, thus delivering faster search operations.

Architecture

Diagram

The above block diagram shows the sequence of actions for uploading documents to Solr and executing Search queries as per specific search criteria to get the relevant matches.

Building Search Interface for Your Web Application:

We assume that Solr is configured and running. You will need to know the Solr endpoint.

For more information on installing, configuring and running Solr, refer to the Apache Solr documentation.

We propose a generic search interface that can be implemented to search any application-specific entity that is indexed by Solr. The Search method accepts the SearchParameters and returns a SearchResult of generic type T.

    public interface ISearch<T>
    {
        // T is the entity type that each search result is mapped to.
        SearchResult<T> Search(SearchParameters parameters);
    }

Let us see what the SearchParameters look like and how they are constructed.

    public class SearchParameters
    {
        public const int DefaultPageSize = 4;

        public SearchParameters()
        {
            SearchFor = new Dictionary<string, string>();
            Exclude = new Dictionary<string, string>();
            SortBy = new List<SortQuery>();
            FilterBy = new List<FilterQuery>();
            PageSize = DefaultPageSize;
            PageIndex = 1;
        }

        public string FreeSearch { get; set; }
        public int PageIndex { get; set; }
        public int PageSize { get; set; }
        public IDictionary<string, string> SearchFor { get; set; }
        public IDictionary<string, string> Exclude { get; set; }
        public IList<SortQuery> SortBy { get; set; }
        public IList<FilterQuery> FilterBy { get; set; }
    }

  • SearchFor: We add the advanced full-text parameters to this dictionary in the following pattern:

    Key (name of field)    Value (word/phrase)
    Title                  Azure, "cloud computing"
    tags                   Azure, "cloud computing"

  • Exclude: These are the parameters to be excluded in the advanced full-text search, added to a dictionary in the same pattern:

    Key (name of field)    Value (word/phrase)
    Title                  business
    tags                   business

  • SortBy: This is a list of SortQuery items. FieldName maps to the field on which the sorting needs to be done. order indicates the sort order (Ascending/Descending).

    public class SortQuery
    {
        public string FieldName { get; set; }
        public SortOrder order { get; set; }

        // Nested so that it can be referenced as SortQuery.SortOrder
        // without clashing with SolrNet's own SortOrder class.
        public enum SortOrder
        {
            Ascending,
            Descending,
        }
    }

    FieldName       order         Description
    Like_integer    Descending    Sorts the search results in descending order of the number of Likes

  • FilterBy: This is a list of FilterQuery items.

    public class FilterQuery
    {
        public string FieldName { get; set; }
        public string LowerLimit { get; set; }
        public string UpperLimit { get; set; }
        public string Value { get; set; }
        public string DataType { get; set; }
    }

    Table 1: FilterQuery properties

    Property      Description
    FieldName     The field on which filtering needs to be done
    Value         The value of the filter field
    DataType      If filtering values are restricted to a particular range, the datatype of the filter field
    LowerLimit    If filtering values are restricted to a particular range, the lower limit of the filter field
    UpperLimit    If filtering values are restricted to a particular range, the upper limit of the filter field

  • PageSize: This is used for pagination to specify the number of search results per page.

  • PageIndex: This is used for pagination to specify which page of the result set to return, starting at 1. It is converted into the offset from which Solr returns results: offset = (PageIndex - 1) * PageSize.
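
The relationship between PageIndex, PageSize and the offset sent to Solr can be sketched in a few lines. This is a minimal illustration only; the class name PagingExample and the method ToStart are hypothetical and not part of the article's code or of SolrNet.

```csharp
using System;

public static class PagingExample
{
    // Converts a 1-based page index and a page size into the zero-based
    // offset ("start") from which Solr should return results.
    public static int ToStart(int pageIndex, int pageSize)
    {
        if (pageIndex < 1) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (pageIndex - 1) * pageSize;
    }

    public static void Main()
    {
        const int pageSize = 4; // same value as SearchParameters.DefaultPageSize
        for (int page = 1; page <= 3; page++)
        {
            // rows stays equal to the page size; only start moves.
            Console.WriteLine("page {0}: start={1}, rows={2}",
                page, ToStart(page, pageSize), pageSize);
        }
    }
}
```

With a page size of 4, page 1 starts at offset 0, page 2 at offset 4 and page 3 at offset 8.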

Implementing search interface:

SolrNet is a free, open source client library that can be integrated with your .NET web application to build queries programmatically and execute them against Solr.

Creating Solr search result entity

We first create a class with properties that map to the fields returned in the Solr search result. To identify the fields, we can run a search query in the Solr Admin UI or ask the Solr administrator.

For example:
To get search results for the keyword "twitter", we enter the keyword in the "Query String" textbox of the Solr Admin UI and hit the Search button.

The browser then shows the search query as a URL, such as http://endpoint/solr/select?q=twitter&fq=&start=0&rows=10.

The response will be XML with nodes for each search result. Users can identify the fields returned from this response and create the properties accordingly.
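
As a rough sketch of how the returned fields can be identified, the snippet below lists the field names and XML types found in a response. The sample XML is hand-written for illustration in the general shape of Solr's XML output, with made-up values; the class and method names are hypothetical. In practice the XML would come from the query URL above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ResponseFieldsExample
{
    // Hypothetical, hand-written sample response; the values are invented.
    public const string SampleResponse = @"<response>
  <result name=""response"" numFound=""1"" start=""0"">
    <doc>
      <str name=""company_id_text"">C-001</str>
      <int name=""product_count_integer"">12</int>
      <str name=""title_text"">Twitter client</str>
      <date name=""created_on_datetime"">2012-05-01T00:00:00Z</date>
      <bool name=""downloadable_boolean"">true</bool>
    </doc>
  </result>
</response>";

    // Returns "fieldName (xmlType)" for every distinct field
    // found directly under the <doc> elements.
    public static IList<string> ListFields(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("doc")
            .Elements()
            .Select(e => (string)e.Attribute("name") + " (" + e.Name.LocalName + ")")
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        foreach (var field in ListFields(SampleResponse))
            Console.WriteLine(field);
    }
}
```

Each listed name is a candidate for a mapped property, and the XML element type (str, int, date, bool) suggests the matching .NET type.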

The SolrNet.Attributes namespace contains attributes that map fields from the Solr search result to an entity. These attributes can be used to augment an existing application entity or to create a parallel entity.

Example:

    using System;
    using SolrNet.Attributes;

    public class Product
    {
        // Unique key of the document (the "primary key")
        [SolrUniqueKey("company_id_text")]
        public string CompanyId { get; set; }

        [SolrField("product_count_integer")]
        public int ProductCount { get; set; }

        [SolrField("title_text")]
        public string Title { get; set; }

        [SolrField("created_on_datetime")]
        public DateTime CreatedOn { get; set; }

        [SolrField("downloadable_boolean")]
        public bool Downloadable { get; set; }
    }

A SolrUniqueKey uniquely identifies the document. In database terms, it is the primary key. Map a property with SolrUniqueKey if its field is unique for each document; otherwise use SolrField. The field name passed to the attribute must exactly match the field name in the Solr output XML.

Initialization:

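    using SolrNet;
    using SolrNet.Impl; // SolrConnection
    // ServiceLocator; the namespace depends on the SolrNet version in use
    // (newer versions use CommonServiceLocator instead).
    using Microsoft.Practices.ServiceLocation;

    public class SearchProduct : ISearch<Product>
    {
        static ISolrReadOnlyOperations<Product> solr;
        static SolrConnection connection;

        static SearchProduct()
        {
            // "solrendpoint" stands for the URL of your Solr instance.
            connection = new SolrConnection("solrendpoint");
            Startup.Init<Product>(connection);
            solr = ServiceLocator.Current.GetInstance<ISolrReadOnlyOperations<Product>>();
        }
    }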

The above code snippet initializes the SolrConnection. We also obtain an ISolrReadOnlyOperations<Product> instance that we will use to run the Solr queries. Here Product refers to the type of search result to be fetched.

Building Queries:

    // QueryOptions lives in the SolrNet.Commands.Parameters namespace.
    public SearchResult<Product> Search(SearchParameters parameters)
    {
        int? start = null;
        int? rows = null;
        if (parameters.PageIndex > 0)
        {
            // Convert the 1-based page index into a zero-based offset.
            start = (parameters.PageIndex - 1) * parameters.PageSize;
            rows = parameters.PageSize;
        }

        var matchingProducts = solr.Query(BuildQuery(parameters), new QueryOptions
        {
            FilterQueries = BuildFilterQueries(parameters),
            Rows = rows,
            Start = start,
            OrderBy = GetSelectedSort(parameters),
        });

        return new SearchResult<Product>(matchingProducts)
        {
            TotalResults = matchingProducts.NumFound,
        };
    }

The Query() method has the following prototype:

    SolrQueryResults<T> Query(ISolrQuery query, QueryOptions options);

  • query = the advanced full-text search query
  • options = the filter, sort and pagination options
  • Returns SolrQueryResults of type T. In our case T = Product.

In the above code:

  • FilterQueries = the filter queries to be executed on the search results obtained after applying the full-text search query.
  • OrderBy = the sort query to be executed on the search results obtained after filtering.
  • Rows = for pagination, specifies the number of search results to be returned.
  • Start = for pagination, specifies the offset in the Solr response from where the results will be fetched.

Build advanced full-text search query:

    public ISolrQuery BuildQuery(SearchParameters parameters)
    {
        // A free-text search takes precedence over the field criteria.
        if (!string.IsNullOrEmpty(parameters.FreeSearch))
            return new SolrQuery(parameters.FreeSearch);

        AbstractSolrQuery searchquery = null;

        List<ISolrQuery> solrQuery = new List<ISolrQuery>();
        List<ISolrQuery> solrNotQuery = new List<ISolrQuery>();

        // Criteria to search for, combined with OR
        foreach (var searchType in parameters.SearchFor)
        {
            solrQuery.Add(new SolrQuery(string.Format("{0}:{1}", searchType.Key, searchType.Value)));
        }

        if (solrQuery.Count > 0)
            searchquery = new SolrMultipleCriteriaQuery(solrQuery, SolrMultipleCriteriaQuery.Operator.OR);

        // Criteria to exclude, combined with OR and subtracted from the query
        foreach (var excludeType in parameters.Exclude)
        {
            solrNotQuery.Add(new SolrQuery(string.Format("{0}:{1}", excludeType.Key, excludeType.Value)));
        }

        if (solrNotQuery.Count > 0)
        {
            searchquery = (searchquery ?? SolrQuery.All) - new SolrMultipleCriteriaQuery(solrNotQuery, SolrMultipleCriteriaQuery.Operator.OR);
        }

        return searchquery ?? SolrQuery.All;
    }

  • new SolrQuery("fieldname:value") = This class creates a SolrQuery for a full-text search of the given word/phrases in the field fieldname.
  • new SolrMultipleCriteriaQuery(solrQuery, Operator (AND/OR etc.)) = This class applies the respective boolean operator to a list of Solr queries.
  • SolrQuery.All = This returns all the documents in Solr without applying any search query.
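
To make the combination of included and excluded criteria concrete, here is a hand-written sketch of the Lucene query syntax that such criteria correspond to. It does not use SolrNet, and the exact string SolrNet serializes may differ; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxExample
{
    // Joins field:value criteria with OR and wraps them in parentheses.
    static string AnyOf(IEnumerable<KeyValuePair<string, string>> criteria)
    {
        return "(" + string.Join(" OR ", criteria.Select(c => c.Key + ":" + c.Value)) + ")";
    }

    // Composes "match any of searchFor, but none of exclude".
    public static string Compose(
        IList<KeyValuePair<string, string>> searchFor,
        IList<KeyValuePair<string, string>> exclude)
    {
        // *:* matches every document, like SolrQuery.All.
        string positive = searchFor.Count > 0 ? AnyOf(searchFor) : "*:*";
        return exclude.Count > 0 ? positive + " -" + AnyOf(exclude) : positive;
    }

    public static void Main()
    {
        var searchFor = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Title", "Azure"),
            new KeyValuePair<string, string>("tags", "\"cloud computing\""),
        };
        var exclude = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("Title", "business"),
        };

        // Prints: (Title:Azure OR tags:"cloud computing") -(Title:business)
        Console.WriteLine(Compose(searchFor, exclude));
    }
}
```

When no criteria are given at all, the sketch falls back to *:*, mirroring the final return statement of BuildQuery.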

Build Filter Queries:

    public ICollection<ISolrQuery> BuildFilterQueries(SearchParameters parameters)
    {
        List<ISolrQuery> filter = new List<ISolrQuery>();

        foreach (var filterBy in parameters.FilterBy)
        {
            if (!String.IsNullOrEmpty(filterBy.DataType) && filterBy.DataType.Equals(Constants.DATE_DATATYPE))
            {
                // Date filter: build a range query
                DateTime upperlim = Convert.ToDateTime(filterBy.UpperLimit);
                DateTime lowerlim = Convert.ToDateTime(filterBy.LowerLimit);
                if (upperlim.Equals(lowerlim))
                {
                    // Same day on both ends: widen the range to cover one day
                    upperlim = upperlim.AddDays(1);
                }
                filter.Add(new SolrQueryByRange<DateTime>(filterBy.FieldName, lowerlim, upperlim));
            }
            else
            {
                string[] filterValues;
                if (filterBy.Value.Contains(";"))
                {
                    // Several values separated by ';' are combined with OR
                    filterValues = filterBy.Value.Split(';');
                    List<ISolrQuery> filterForProduct = new List<ISolrQuery>();
                    foreach (string filterVal in filterValues)
                    {
                        filterForProduct.Add(new SolrQueryByField(filterBy.FieldName, filterVal) { Quoted = false });
                    }
                    filter.Add(new SolrMultipleCriteriaQuery(filterForProduct, SolrMultipleCriteriaQuery.Operator.OR));
                }
                else
                {
                    filter.Add(new SolrQueryByField(filterBy.FieldName, filterBy.Value));
                }
            }
        }

        return filter;
    }

There are 2 types of filters:

  • SolrQueryByField:

    new SolrQueryByField(filterBy.FieldName, filterVal) { Quoted = false }

This accepts the field name and the filter value. Quoted = false indicates that character escaping is disabled.

  • SolrQueryByRange:

    new SolrQueryByRange<DateTime>(filterBy.FieldName, lowerlim, upperlim)

We use SolrQueryByRange when the filter values fall within a range. It needs the datatype of the field (DateTime in our case), the name of the field to filter on, and the lower and upper limits of the range.
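
For reference, a date range filter corresponds to Solr's bracketed range syntax with UTC timestamps. The snippet below is a hand-written sketch of that syntax and does not use SolrNet; the class and method names are hypothetical, and the field name is taken from the Product example.

```csharp
using System;
using System.Globalization;

public static class DateRangeExample
{
    // Formats a DateTime the way Solr expects dates:
    // ISO 8601, in UTC, with a trailing 'Z'.
    static string ToSolrDate(DateTime value)
    {
        return value.ToUniversalTime()
            .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
    }

    // Builds an inclusive range filter such as field:[lower TO upper].
    public static string Range(string field, DateTime lower, DateTime upper)
    {
        // Same widening as in BuildFilterQueries: identical limits
        // are expanded to cover one full day.
        if (upper == lower) upper = upper.AddDays(1);
        return string.Format("{0}:[{1} TO {2}]", field, ToSolrDate(lower), ToSolrDate(upper));
    }

    public static void Main()
    {
        var day = new DateTime(2012, 5, 1, 0, 0, 0, DateTimeKind.Utc);

        // Prints: created_on_datetime:[2012-05-01T00:00:00Z TO 2012-05-02T00:00:00Z]
        Console.WriteLine(Range("created_on_datetime", day, day));
    }
}
```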

Build Sort Queries:

    private ICollection<SortOrder> GetSelectedSort(SearchParameters parameters)
    {
        // SortOrder here is SolrNet's class; SortQuery.SortOrder is our own enum.
        List<SortOrder> sortQueries = new List<SortOrder>();
        foreach (var sortBy in parameters.SortBy)
        {
            if (sortBy.order.Equals(SortQuery.SortOrder.Ascending))
                sortQueries.Add(new SortOrder(sortBy.FieldName, Order.ASC));
            else
                sortQueries.Add(new SortOrder(sortBy.FieldName, Order.DESC));
        }
        return sortQueries;
    }

SolrNet's SortOrder class (not to be confused with the SortOrder enum defined earlier) accepts the name of the field to sort on along with the sort order (ascending/descending).

Relevance Score

By default, Solr orders the search results by a relevance score that is calculated to determine how relevant a given document is to a user's query. The more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. The results are therefore returned in order of relevance unless a sort parameter is explicitly given in the search query.
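
The idea behind the score can be illustrated with a deliberately simplified term-frequency/inverse-document-frequency calculation. This is a sketch of the principle only, not Lucene's actual formula, which includes further factors such as length normalization and depends on the similarity configured for the Solr version in use. All names and the tiny corpus below are made up for illustration.

```csharp
using System;
using System.Linq;

public static class TfIdfExample
{
    // Simplified score: tf * ln(N / df), where
    //   tf = occurrences of the term in the document,
    //   df = number of documents containing the term,
    //   N  = number of documents in the collection.
    public static double Score(string term, string[] doc, string[][] corpus)
    {
        int tf = doc.Count(w => w == term);
        int df = corpus.Count(d => d.Contains(term));
        if (tf == 0 || df == 0) return 0.0;
        return tf * Math.Log((double)corpus.Length / df);
    }

    public static void Main()
    {
        string[][] corpus =
        {
            "azure cloud azure".Split(' '),
            "cloud computing".Split(' '),
            "business cloud".Split(' '),
        };

        for (int i = 0; i < corpus.Length; i++)
        {
            Console.WriteLine("doc {0}: azure={1:F3}, cloud={2:F3}",
                i, Score("azure", corpus[i], corpus), Score("cloud", corpus[i], corpus));
        }
    }
}
```

In this toy corpus, "azure" appears in only one document and therefore scores highly there, while "cloud" appears in every document and contributes nothing to telling them apart.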

Conclusion

This paper has provided a detailed description of the search options available with Apache Solr and will hopefully serve as a thorough guide for deciding on the search parameters, constructing queries and building an advanced search interface.

About the Author

Amruta Morajkar has more than four years of IT experience with .NET, Windows Azure, C#, WCF, Entity Framework, XML, SQL Server, and Asp.Net MVC, etc. She has also worked on niche technologies such as SOA and Cloud Computing and is currently working with Infosys Limited, Pune, India.
