Open Source Zone

Exploring Apache Shark

Apache Shark is a distributed query engine developed by the open source community. This query engine is mainly used for Hadoop data and it provides enhanced performance and high-end analytical

Getting Started with Apache Spark

Apache Spark is a high-performance, general-purpose engine used to process large-scale data. It is an open source framework used for cluster computing. The aim of this framework is

Exploring Various Hadoop Installation Modes

Apache Hadoop can be installed in different modes, depending on requirements. The mode is configured during installation; by default, Hadoop is installed in Standalone mode. The

Create an Apache Hadoop MapReduce Job Using Spring

Spring is a widely used framework in enterprise application development and includes components such as Spring ORM, Spring JDBC and more, to support various features. Spring for Apache Hadoop

Using Advanced Hadoop MapReduce Features

Basic MapReduce programming explains the workflow, but it does not cover the internal workings of the MapReduce programming framework. This article will explain the data movement through

Introduction to Hadoop Streaming

Hadoop Streaming is a generic API that allows writing Mappers and Reducers in any language, but the basic concept remains the same. Mappers and Reducers receive their input and