Building Search Interface Using Apache Solr in .NET

Explore how the Apache Solr search engine provides content search, learn how to construct queries involving multiple search criteria with Solr, and integrate it with your application to build a faster, more accurate and more refined search interface.



A quick, granular and accurate search interface is of prime importance for most web applications today. Many applications have traditional search interfaces in which the scope of the search is restricted to specific fields, which limits their ability to return relevant results. Most commercial web sites, however, need an advanced search interface that indexes the content of documents, builds complex queries from multiple criteria and returns as many relevant results as possible. In this paper, we introduce the Apache Solr search engine, which, most importantly, provides content search; explain how to construct queries involving multiple search criteria with Solr; and show how to integrate it with an application to build a faster, more accurate and more refined search interface.

Apache Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs. It runs in a Java servlet container such as Tomcat. The input to Solr is a document with optional metadata. We add documents to it (this is called indexing) by sending XML, JSON, CSV or binary data over HTTP, and we query it with HTTP GET requests that return XML, JSON, CSV or binary results. It is designed to deliver very fast search across a wide variety of data.
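As a rough illustration of indexing over HTTP, the sketch below builds a Solr XML update message with System.Xml.Linq. The field names and values are hypothetical (borrowed from the Product example later in this article), and the update URL in the comment assumes a default local Solr installation; the point is only to show the shape of the message that travels over the wire. It is written as a small C# top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Builds a Solr XML <add><doc>...</doc></add> message for one document.
static XElement BuildAddMessage(IDictionary<string, string> fields) =>
    new XElement("add",
        new XElement("doc",
            fields.Select(f => new XElement("field",
                new XAttribute("name", f.Key),
                f.Value))));   // XElement escapes the text content for us

// Hypothetical document; field names follow the Product example in this article.
var doc = new Dictionary<string, string>
{
    ["company_id_text"] = "C-1001",
    ["title_text"] = "Azure and cloud computing",
    ["product_count_integer"] = "42",
    ["created_on_datetime"] = "2012-05-01T00:00:00Z",
};

XElement addMessage = BuildAddMessage(doc);

// This XML would be POSTed with Content-Type: text/xml to an update URL such as
// the hypothetical http://localhost:8983/solr/update?commit=true
Console.WriteLine(addMessage);
```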

Features

  • Advanced full-text search - Users can search for one or more words or phrases in the content of documents, in specific fields, or in combinations of fields, so the results match the user's interests.
  • Faceted search - Users who want to drill down can narrow the search results further by applying filters on fields (numeric, date or other fields with a set of distinct values). This provides categorized search.
  • Sort - Users can order the search results by the values of one or more fields.
  • Pagination - Users can display the search results in pages of a fixed size.
  • Hit-term highlighting - The search keywords are highlighted in the returned results.
  • It is optimized for high-volume web traffic.
  • It supports rich document parsing and indexing (PDF, Word, HTML, etc.).
  • Admin UI - It has a simple, user-friendly web interface for designing and executing queries over the data.
  • Caching - It caches the results of filter queries, which makes repeated searches faster.
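Most of these features are driven by plain HTTP request parameters on Solr's select handler. The sketch below assembles such a request; q, fq, facet, facet.field, hl, hl.fl, sort, start and rows are standard Solr parameters, while the endpoint and the field names are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endpoint; point this at your own Solr instance.
const string SelectUrl = "http://localhost:8983/solr/select";

// A list of pairs rather than a dictionary, because Solr allows repeated
// parameters (several fq or facet.field values, for example).
var query = new List<(string Key, string Value)>
{
    ("q", "title_text:azure"),               // full-text search
    ("fq", "downloadable_boolean:true"),     // filter query (cached by Solr)
    ("facet", "true"),                       // faceted search ...
    ("facet.field", "downloadable_boolean"), // ... with counts per value of this field
    ("hl", "true"),                          // hit-term highlighting ...
    ("hl.fl", "title_text"),                 // ... in this field
    ("sort", "product_count_integer desc"),  // sort by a field value
    ("start", "0"),                          // pagination offset
    ("rows", "10"),                          // page size
};

// Percent-encode keys and values (':' becomes %3A, ' ' becomes %20).
string url = SelectUrl + "?" + string.Join("&",
    query.Select(p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

Console.WriteLine(url);
```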

Architecture

[Figure: block diagram of uploading documents to Solr and executing search queries]

The block diagram above shows the sequence of actions for uploading documents to Solr and for executing search queries that match specific criteria to get the relevant results.

Building a Search Interface for Your Web Application:

We assume that Solr is already configured and running, and that you know the URL of its endpoint.



For more information on installing, configuring and running Solr, refer to the Apache Solr documentation.

We propose a generic search interface that can be implemented to search any application-specific entity indexed by Solr. Its Search method accepts a SearchParameters object and returns a SearchResult<T>, where T is the entity type.



public interface ISearch<T>
{
    SearchResult<T> Search(SearchParameters parameters);
}

 

Let us see what SearchParameters looks like and how it is constructed.

using System.Collections.Generic;

public class SearchParameters
{
    public const int DefaultPageSize = 4;

    public SearchParameters()
    {
        SearchFor = new Dictionary<string, string>();
        Exclude = new Dictionary<string, string>();
        SortBy = new List<SortQuery>();
        FilterBy = new List<FilterQuery>();
        PageSize = DefaultPageSize;
        PageIndex = 1;   // page numbers are 1-based
    }

    // Raw Solr query; when set, BuildQuery uses it instead of SearchFor/Exclude.
    public string FreeSearch { get; set; }
    public int PageIndex { get; set; }
    public int PageSize { get; set; }
    public IDictionary<string, string> SearchFor { get; set; }
    public IDictionary<string, string> Exclude { get; set; }
    public IList<SortQuery> SortBy { get; set; }
    public IList<FilterQuery> FilterBy { get; set; }
}
  • SearchFor: the advanced full-text search terms, added to this dictionary with the field name as the key and the words or phrases to search for as the value:

Key (field name) | Value (words/phrases)
Title | Azure, "cloud computing"
tags | Azure, "cloud computing"

  • Exclude: words or phrases that must not appear in the given fields, added to a dictionary in the same way:

Key (field name) | Value (words/phrases)
Title | business
tags | business

  • SortBy: a list of SortQuery items. FieldName is the field to sort on, and order is the sort direction (Ascending or Descending).

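public class SortQuery
{
    public string FieldName { get; set; }
    public SortOrder order { get; set; }

    // Nested so that it does not clash with SolrNet's own SortOrder class,
    // which is used later when building the sort queries.
    public enum SortOrder
    {
        Ascending,
        Descending
    }
}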

FieldName | order | Description
Like_integer | Descending | Sorts the search results in descending order of the number of likes
  • FilterBy: a list of FilterQuery items.

public class FilterQuery
{
    public string FieldName { get; set; }
    public string LowerLimit { get; set; }
    public string UpperLimit { get; set; }
    public string Value { get; set; }
    public string DataType { get; set; }
}
  • FieldName = the field on which filtering is done
  • Value = the value of the filter field. Several values can be separated with semicolons (for example, a;b), in which case a document matches if it has any of them
  • DataType = if the filter values are restricted to a range, the data type of the filter field
  • LowerLimit = if the filter values are restricted to a range, the lower limit of the filter field
  • UpperLimit = if the filter values are restricted to a range, the upper limit of the filter field

[Table 1]

  • PageSize: the number of search results per page.

  • PageIndex: the 1-based number of the page to return. The Search method converts it into an offset in the result set, (PageIndex - 1) * PageSize, and Solr returns results starting from that offset.
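The PageIndex/PageSize arithmetic is easy to get off by one, so here it is on its own. ToSolrPaging and PageCount are hypothetical helper names; the start formula is the one the Search method below uses, and PageCount is an extra helper you might want for a pager control.

```csharp
using System;

// Hypothetical helper: converts a 1-based page index into Solr's zero-based
// "start" offset and the "rows" count, as the Search method does.
static (int Start, int Rows) ToSolrPaging(int pageIndex, int pageSize)
{
    if (pageIndex < 1) throw new ArgumentOutOfRangeException(nameof(pageIndex), "PageIndex is 1-based.");
    if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize), "PageSize must be positive.");
    return ((pageIndex - 1) * pageSize, pageSize);
}

// Hypothetical helper: pages needed to show numFound results (rounds up).
static int PageCount(int numFound, int pageSize) => (numFound + pageSize - 1) / pageSize;

// Page 3 with the default page size of 4 skips the first 8 results.
var (start, rows) = ToSolrPaging(pageIndex: 3, pageSize: 4);
Console.WriteLine($"start={start}&rows={rows}");          // start=8&rows=4
Console.WriteLine(PageCount(numFound: 10, pageSize: 4));  // 3
```

Unlike the Search method, which simply skips paging when PageIndex is not positive, this sketch rejects such values; which behaviour you prefer is a design choice.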

Implementing the search interface:

SolrNet is a free, open source .NET client library for Solr. It can be integrated with your .NET web application to build queries programmatically and execute them against Solr.

Creating the Solr search result entity

We first create a class whose properties map to the fields returned in the Solr search results. To identify those fields, run a search query in the Solr Admin UI or ask your Solr administrator.

For example:
To get search results for the keyword "twitter", enter the keyword in the "Query String" text box of the Solr Admin UI and click the Search button.

The browser then shows the query URL that was executed, for example http://endpoint/solr/select?q=twitter&fq=&start=0&rows=10.

The response is XML, with one node per search result. You can identify the returned fields from this response and create the properties accordingly.
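To see which fields to map, you can also read them straight out of the XML response. The sketch below parses a trimmed, hypothetical response for q=twitter with System.Xml.Linq. In Solr's XML response format each field of a result doc is an element named after its type (str, int, date, bool, arr and so on) whose name attribute is the field name; the field names and values here are made up.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// A trimmed, hypothetical response for q=twitter; real responses contain your own fields.
const string responseXml = @"<response>
  <lst name='responseHeader'><int name='status'>0</int><int name='QTime'>1</int></lst>
  <result name='response' numFound='1' start='0'>
    <doc>
      <str name='company_id_text'>C-1001</str>
      <int name='product_count_integer'>42</int>
      <str name='title_text'>Twitter integration</str>
      <date name='created_on_datetime'>2012-05-01T00:00:00Z</date>
      <bool name='downloadable_boolean'>true</bool>
    </doc>
  </result>
</response>";

XElement result = XDocument.Parse(responseXml).Root.Element("result");
int numFound = (int)result.Attribute("numFound");   // total matches, not just this page

// Element name = Solr type, "name" attribute = field name for [SolrField("...")].
var fields = result.Elements("doc").First().Elements()
    .Select(e => (Name: (string)e.Attribute("name"), SolrType: e.Name.LocalName))
    .ToList();

Console.WriteLine($"numFound = {numFound}");
foreach (var f in fields)
    Console.WriteLine($"{f.Name} ({f.SolrType})");
```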

The SolrNet.Attributes namespace contains attributes that map fields in the Solr search results to entity properties. These attributes can be added to an existing application entity, or you can create a parallel entity just for search.

Example:

using System;
using SolrNet.Attributes;

public class Product
{
    [SolrUniqueKey("company_id_text")]
    public string CompanyId { get; set; }

    [SolrField("product_count_integer")]
    public int ProductCount { get; set; }

    [SolrField("title_text")]
    public string Title { get; set; }

    [SolrField("created_on_datetime")]
    public DateTime CreatedOn { get; set; }

    [SolrField("downloadable_boolean")]
    public bool Downloadable { get; set; }
}

SolrUniqueKey marks the field that uniquely identifies a document; in database terms, it is the primary key. Map a property with SolrUniqueKey only if the field is unique for each document; otherwise use SolrField. The name passed to the attribute must exactly match the field name in the Solr response XML; the C# property name itself can be anything.

Initialization:

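using SolrNet;
using Microsoft.Practices.ServiceLocation; // ServiceLocator (CommonServiceLocator package)

public class SearchProduct : ISearch<Product>
{
    static ISolrReadOnlyOperations<Product> solr;
    static SolrConnection connection;

    // A static constructor runs only once, which matters because
    // Startup.Init<T> should be called only once per document type.
    static SearchProduct()
    {
        // Replace with your Solr endpoint, e.g. "http://localhost:8983/solr"
        connection = new SolrConnection("solrendpoint");
        Startup.Init<Product>(connection);
        solr = ServiceLocator.Current.GetInstance<ISolrReadOnlyOperations<Product>>();
    }

    // Search(SearchParameters) and its helper methods are shown below.
}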

The code above initializes the SolrConnection and obtains an ISolrReadOnlyOperations<Product> instance, which we will use to execute Solr queries. Here Product is the type of the search results to be fetched.

Building Queries:

public SearchResult<Product> Search(SearchParameters parameters)
{
    // Convert the 1-based page index into Solr's start offset and row count.
    int? start = null;
    int? rows = null;
    if (parameters.PageIndex > 0)
    {
        start = (parameters.PageIndex - 1) * parameters.PageSize;
        rows = parameters.PageSize;
    }

    var matchingProducts = solr.Query(BuildQuery(parameters), new QueryOptions
    {
        FilterQueries = BuildFilterQueries(parameters),
        Rows = rows,
        Start = start,
        OrderBy = GetSelectedSort(parameters),
    });

    // NumFound is the total number of matches, not just the rows in this page.
    return new SearchResult<Product>(matchingProducts)
    {
        TotalResults = matchingProducts.NumFound,
    };
}

The Query() method has the following signature:

SolrQueryResults<T> Query(ISolrQuery query, QueryOptions options);
  • query = the advanced full-text search query.
  • options = the filter, sort and pagination options.
  • It returns SolrQueryResults<T>; in our case T = Product.

In the code above, the QueryOptions properties are:
  • FilterQueries = the filter queries applied to the results of the full-text search query.
  • OrderBy = the sort order applied to the filtered results.
  • Rows = for pagination, the number of search results to return.
  • Start = for pagination, the offset in the Solr result set from which results are returned.

Build advanced full-text search query:

public ISolrQuery BuildQuery(SearchParameters parameters)
{
    // A free-text query, if supplied, is passed to Solr as-is.
    if (!string.IsNullOrEmpty(parameters.FreeSearch))
        return new SolrQuery(parameters.FreeSearch);

    AbstractSolrQuery searchquery = null;
    List<SolrQuery> solrQuery = new List<SolrQuery>();
    List<SolrQuery> solrNotQuery = new List<SolrQuery>();

    // field:(words/phrases) - the parentheses make the field apply to every
    // term in the value, not just the first one.
    foreach (var searchType in parameters.SearchFor)
    {
        solrQuery.Add(new SolrQuery(string.Format("{0}:({1})", searchType.Key, searchType.Value)));
    }

    // Match documents that satisfy any of the SearchFor criteria.
    if (solrQuery.Count > 0)
        searchquery = new SolrMultipleCriteriaQuery(solrQuery, SolrMultipleCriteriaQuery.Operator.OR);

    foreach (var excludeType in parameters.Exclude)
    {
        solrNotQuery.Add(new SolrQuery(string.Format("{0}:({1})", excludeType.Key, excludeType.Value)));
    }

    // Remove documents that match any of the Exclude criteria.
    if (solrNotQuery.Count > 0)
    {
        searchquery = (searchquery ?? SolrQuery.All) - new SolrMultipleCriteriaQuery(solrNotQuery, SolrMultipleCriteriaQuery.Operator.OR);
    }

    // No criteria at all: match every document.
    return searchquery ?? SolrQuery.All;
}
  • new SolrQuery("fieldname:value") = creates a query for a full-text search of the given words/phrases in the field fieldname.
  • new SolrMultipleCriteriaQuery(queries, operator) = combines a list of Solr queries with the given Boolean operator (AND, OR, etc.).
  • SolrQuery.All = matches all documents in Solr (*:*), without applying any search criteria.
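To picture the query that BuildQuery expresses, here is a rough string-building equivalent using the SearchFor and Exclude values from the tables above. BuildQueryString is a hypothetical helper written for illustration, not SolrNet code: SolrNet's own serialization of SolrMultipleCriteriaQuery and the minus operator may differ in parentheses and spacing, but the logic is the same. Any SearchFor clause may match, and any Exclude clause removes the document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a Lucene query string equivalent in meaning to BuildQuery.
static string BuildQueryString(IDictionary<string, string> searchFor, IDictionary<string, string> exclude)
{
    // field:(words/phrases) so that the field name applies to every term in the value.
    var include = searchFor.Select(p => p.Key + ":(" + p.Value + ")").ToList();
    var excluded = exclude.Select(p => p.Key + ":(" + p.Value + ")").ToList();

    // Any SearchFor clause may match (OR); with no clauses, start from all documents.
    string query = include.Count > 0 ? "(" + string.Join(" OR ", include) + ")" : "*:*";

    // Documents matching any Exclude clause are removed.
    if (excluded.Count > 0)
        query += " AND NOT (" + string.Join(" OR ", excluded) + ")";
    return query;
}

var searchTerms = new Dictionary<string, string>
{
    ["Title"] = "Azure, \"cloud computing\"",
    ["tags"] = "Azure, \"cloud computing\"",
};
var excludeTerms = new Dictionary<string, string> { ["Title"] = "business", ["tags"] = "business" };

string q = BuildQueryString(searchTerms, excludeTerms);
Console.WriteLine(q);
```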

Build Filter Queries:

public ICollection<ISolrQuery> BuildFilterQueries(SearchParameters parameters)
{
    List<ISolrQuery> filter = new List<ISolrQuery>();

    foreach (var filterBy in parameters.FilterBy)
    {
        // Constants.DATE_DATATYPE is an application-defined constant naming the date type.
        if (!String.IsNullOrEmpty(filterBy.DataType) && filterBy.DataType.Equals(Constants.DATE_DATATYPE))
        {
            DateTime upperlim = Convert.ToDateTime(filterBy.UpperLimit);
            DateTime lowerlim = Convert.ToDateTime(filterBy.LowerLimit);

            // A single-day filter: widen the range to cover the whole day.
            if (upperlim.Equals(lowerlim))
            {
                upperlim = upperlim.AddDays(1);
            }
            filter.Add(new SolrQueryByRange<DateTime>(filterBy.FieldName, lowerlim, upperlim));
        }
        else if (filterBy.Value.Contains(";"))
        {
            // Several values separated by ';' - match any of them.
            List<SolrQueryByField> filterForProduct = new List<SolrQueryByField>();
            foreach (string filterVal in filterBy.Value.Split(';'))
            {
                filterForProduct.Add(new SolrQueryByField(filterBy.FieldName, filterVal) { Quoted = false });
            }
            filter.Add(new SolrMultipleCriteriaQuery(filterForProduct, SolrMultipleCriteriaQuery.Operator.OR));
        }
        else
        {
            filter.Add(new SolrQueryByField(filterBy.FieldName, filterBy.Value));
        }
    }

    return filter;
}

Two types of filter query are used above:

  • SolrQueryByField:

new SolrQueryByField(filterBy.FieldName, filterVal) { Quoted = false }

This takes the field name and the filter value. Setting Quoted = false sends the value as-is, without quoting or escaping its special characters.

  • SolrQueryByRange:

new SolrQueryByRange<DateTime>(filterBy.FieldName, lowerlim, upperlim)

We use SolrQueryByRange when the filter values fall within a range. Its generic type argument is the data type of the field (DateTime in our case), and its constructor takes the field name followed by the lower and upper limits.
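For reference, this is roughly what these filters look like as raw fq parameters. The sketch is a hand-written approximation, not SolrNet's serializer, and all helper names are hypothetical: Escape backslash-escapes characters that are special in Lucene query syntax (what an escaped field value needs), MultiValueFilter mimics the unescaped, semicolon-separated OR case, and DateRangeFilter writes an inclusive range with dates in the UTC yyyy-MM-ddTHH:mm:ssZ form that Solr expects.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Backslash-escapes characters with a meaning in Lucene/Solr query syntax,
// plus whitespace, so the value is matched literally.
static string Escape(string value)
{
    const string special = "\\+-!():^[]\"{}~*?|&;/";
    var sb = new StringBuilder();
    foreach (char c in value)
    {
        if (special.IndexOf(c) >= 0 || char.IsWhiteSpace(c))
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.ToString();
}

// Like an escaped SolrQueryByField: field:value
static string FieldFilter(string field, string value) => field + ":" + Escape(value);

// Like several SolrQueryByField { Quoted = false } joined with OR: no escaping.
static string MultiValueFilter(string field, string values) =>
    "(" + string.Join(" OR ", values.Split(';').Select(v => field + ":" + v)) + ")";

// Solr dates are ISO 8601 in UTC, e.g. 2012-05-01T00:00:00Z.
static string ToSolrDate(DateTime d) =>
    d.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

// Like SolrQueryByRange<DateTime>: an inclusive range, field:[from TO to]
static string DateRangeFilter(string field, DateTime from, DateTime to) =>
    field + ":[" + ToSolrDate(from) + " TO " + ToSolrDate(to) + "]";

var day = new DateTime(2012, 5, 1, 0, 0, 0, DateTimeKind.Utc);
Console.WriteLine(FieldFilter("title_text", "C# (beginner)"));
Console.WriteLine(MultiValueFilter("tags", "azure;cloud"));
Console.WriteLine(DateRangeFilter("created_on_datetime", day, day.AddDays(1)));
```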

Build Sort Queries:

// SortOrder and Order here are SolrNet types, not SortQuery.SortOrder.
private ICollection<SortOrder> GetSelectedSort(SearchParameters parameters)
{
    List<SortOrder> sortQueries = new List<SortOrder>();
    foreach (var sortBy in parameters.SortBy)
    {
        Order direction = sortBy.order == SortQuery.SortOrder.Ascending ? Order.ASC : Order.DESC;
        sortQueries.Add(new SortOrder(sortBy.FieldName, direction));
    }
    return sortQueries;
}

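SolrNet's SortOrder class takes the name of the field to sort on and the sort direction (Order.ASC or Order.DESC). When several sort orders are supplied, they are applied in sequence, so later fields only break ties.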
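On the wire, these sort orders end up in Solr's sort parameter as a comma-separated list of "field direction" clauses. The hypothetical BuildSortParam helper below shows that mapping for the Like_integer example from the SortBy table, plus an assumed title_text tie-breaker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns (field, ascending?) pairs into Solr's sort parameter.
// Clauses are applied left to right; later fields only break ties.
static string BuildSortParam(IEnumerable<(string Field, bool Ascending)> sorts) =>
    string.Join(",", sorts.Select(s => s.Field + " " + (s.Ascending ? "asc" : "desc")));

string sort = BuildSortParam(new[] { ("Like_integer", false), ("title_text", true) });
Console.WriteLine(sort); // Like_integer desc,title_text asc
```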

Relevance Score

By default, Solr orders search results by a relevance score that measures how well each document matches the user's query. Broadly, the more often a query term appears in a document, and the rarer that term is across all the documents in the collection, the higher the document scores. Results therefore come back in order of relevance unless you explicitly specify a sort parameter in the query.
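The intuition above is the classic tf-idf idea. The sketch below scores documents with a deliberately simplified tf-idf, term count multiplied by log(N / document frequency); it is only an illustration, since Lucene's actual scoring adds length normalisation and other factors, and recent Solr versions use BM25 by default. Tokenize and Score are hypothetical helpers, and the three-document corpus is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static string[] Tokenize(string text) =>
    text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Simplified tf-idf: how often the term occurs in this document (tf), weighted by
// how rare the term is across the collection (log of N / df).
static double Score(string term, string[] doc, IReadOnlyList<string[]> corpus)
{
    int tf = doc.Count(w => w == term);
    int df = corpus.Count(d => d.Contains(term));
    if (tf == 0 || df == 0) return 0;
    return tf * Math.Log((double)corpus.Count / df);
}

var docs = new List<string[]>
{
    Tokenize("solr search server search"),
    Tokenize("lucene search library"),
    Tokenize("azure cloud computing"),
};

// Score every document for the query term "search"; higher means more relevant.
double[] scores = docs.Select(d => Score("search", d, docs)).ToArray();
for (int i = 0; i < scores.Length; i++)
    Console.WriteLine($"doc {i}: {scores[i]:F3}");
```

Note that in this simplified form a term that occurs in every document gets log(1) = 0, so very common words contribute nothing to the ranking.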

Conclusion

This paper has described the search options available with Apache Solr and will, we hope, serve as a guide to choosing search parameters, constructing queries and building an advanced search interface.


About the Author

Amruta Morajkar has more than four years of IT experience with .NET, Windows Azure, C#, WCF, Entity Framework, XML, SQL Server and ASP.NET MVC. She has also worked on niche technologies such as SOA and cloud computing, and is currently working with Infosys Limited, Pune, India.



   