Open Source Zone

Exploring Apache Shark

July 11, 2014

Apache Shark is a distributed query engine developed by the open source community. It is mainly used for Hadoop data and provides enhanced performance and high-end analytical

To Get Developer Adoption Today You Have To Build a Community

June 23, 2014

It is increasingly difficult to get developers’ attention and keep it, simply because there are so many technologies, products and vendors out there competing to get developers on board —

Getting Started with Apache Spark

June 11, 2014

Apache Spark is a high-performance, general-purpose engine for processing large-scale data. It is an open source framework used for cluster computing. The aim of this framework is

Exploring Various Hadoop Installation Modes

April 25, 2014

Overview Apache Hadoop can be installed in different modes, depending on requirements. These modes are configured during installation, and by default Hadoop is installed in Standalone mode. The

Create an Apache Hadoop MapReduce Job Using Spring

April 16, 2014

Spring is a widely used framework in enterprise application development and includes components such as Spring ORM, Spring JDBC and more, to support various features. Spring for Apache Hadoop

Using Advanced Hadoop MapReduce Features

March 31, 2014

Basic MapReduce programming explains the workflow details, but it does not cover the actual working details inside the MapReduce programming framework. This article will explain the data movement through

Introduction to Hadoop Streaming

March 18, 2014

Introduction Hadoop Streaming is a generic API that allows writing Mappers and Reducers in any language, but the basic concept remains the same. Mappers and Reducers receive their input and

Getting Started with Apache HBase

January 30, 2014

Overview

Managing Large Volumes of Data with Apache Cassandra NoSQL

January 23, 2014

Overview Apache Cassandra is one of the most popular and scalable open source NoSQL databases. Cassandra is an ideal database for managing a large volume of unstructured, semi-structured and structured