Big Data Clusters for a Content Repository: Page 2

A Big Data content repository can help manage the flood of digital information in your enterprise. Learn how to build one with open source technologies.


Cluster Setups for Big Data Content Repository

Figure 2 shows a possible topology of the hardware and software components of the content repository in a production environment. The setup requires three clusters, each with multiple nodes. The first, shown in Figure 2 as Cluster-1, is a cluster of Hadoop nodes. It needs one Hadoop master node (Node-1) and one or more slave nodes. The HDFS Name-Node and MapReduce Job-Tracker services run on the master node, and the HDFS Data-Node and MapReduce Task-Tracker services run on every slave. Similarly, the HBase master service runs on Node-1 and the HBase region services run on the remaining nodes. Following the recommendation in the Lily repository documentation, the Lily Repository services can run on all the HBase region servers. Any of the nodes in the cluster can be configured as ZooKeeper servers, but the recommendation is to use an odd number of them (1, 3, 5 and so on), because ZooKeeper needs a majority of its servers to be running and an even-sized ensemble tolerates no more failures than the next smaller odd-sized one.
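The role assignment for Cluster-1 can be sketched as a small planning function. This is only an illustration under stated assumptions, not a deployment tool: the `Node` class, the `plan_hadoop_cluster` function and the choice to place ZooKeeper on the first few nodes are hypothetical, and the service names are plain labels taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine in Cluster-1 and the services planned for it (hypothetical type)."""
    name: str
    services: list[str] = field(default_factory=list)


def plan_hadoop_cluster(node_count: int, zookeeper_count: int = 3) -> list[Node]:
    """Return a service plan for Cluster-1: one master plus (node_count - 1) slaves."""
    if node_count < 2:
        raise ValueError("need one master node and at least one slave node")
    if zookeeper_count < 1 or zookeeper_count % 2 == 0:
        raise ValueError("ZooKeeper server count should be odd (1, 3, 5, ...)")
    if zookeeper_count > node_count:
        raise ValueError("more ZooKeeper servers requested than nodes available")

    # Node-1 is the master for HDFS, MapReduce and HBase.
    nodes = [Node("Node-1", ["HDFS Name-Node", "MapReduce Job-Tracker", "HBase Master"])]

    # Every other node is a slave and also hosts the Lily Repository service.
    for i in range(2, node_count + 1):
        nodes.append(Node(f"Node-{i}", [
            "HDFS Data-Node",
            "MapReduce Task-Tracker",
            "HBase Region Server",
            "Lily Repository",
        ]))

    # Assumption: ZooKeeper goes on the first `zookeeper_count` nodes.
    # The article allows any nodes, so this placement is arbitrary.
    for node in nodes[:zookeeper_count]:
        node.services.append("ZooKeeper")
    return nodes


if __name__ == "__main__":
    for node in plan_hadoop_cluster(node_count=4, zookeeper_count=3):
        print(node.name, "->", ", ".join(node.services))
```

Rejecting an even ZooKeeper count up front keeps the plan in line with the odd-number recommendation. A real deployment would place the ZooKeeper servers according to its own hardware and failure-domain constraints.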

Figure 2: Topology of the Content Repository in Production



The second cluster, shown as Cluster-2, is for Solr; it needs one Solr master, with the rest of its nodes set up as slaves. The third cluster hosts the web service application along with the document and search controllers. The number of nodes in this cluster depends on the workload and on the number of concurrent users it must support. In addition, load balancers, firewalls, web proxy servers and routers may be needed if the clusters are in separate networks or if Internet connectivity is required. The installation and configuration details of the individual software components are beyond the scope of this article; refer to the installation guides on the individual product web sites.
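To illustrate the master/slave split in Cluster-2, the following sketch routes requests the way a Solr master/slave replication setup is commonly used: index updates go to the master, and queries are spread across the slaves. The article does not prescribe this routing, so treat it as an assumption; the `SolrTopology` class and the host names are hypothetical and no Solr API is called.

```python
from itertools import cycle


class SolrTopology:
    """Hypothetical description of Cluster-2: one Solr master and its slaves."""

    def __init__(self, master: str, slaves: list[str]):
        if not slaves:
            raise ValueError("expected at least one Solr slave")
        self.master = master
        self.slaves = list(slaves)
        # Endless round-robin iterator over the slave hosts.
        self._next_slave = cycle(self.slaves)

    def update_target(self) -> str:
        """Assumption: all index updates are sent to the master."""
        return self.master

    def query_target(self) -> str:
        """Assumption: queries rotate across the slaves in order."""
        return next(self._next_slave)


if __name__ == "__main__":
    # Host names below are placeholders, not values from the article.
    topology = SolrTopology("solr-master:8983", ["solr-slave-1:8983", "solr-slave-2:8983"])
    print("index via:", topology.update_target())
    for _ in range(3):
        print("query via:", topology.query_target())
```

In production this routing would normally be done by the load balancers mentioned above rather than in application code. The sketch only shows which role each kind of node plays.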


