Google and Cloudera have partnered on a project that will bring Google’s Cloud Dataflow programming model to Apache’s Spark data processing engine. Dataflow arose out of Google’s own internal big data processing efforts, and it uses Google’s Compute Engine, Cloud Storage and BigQuery cloud computing services. Spark is an Apache project for very fast big data processing.
The two companies have released a “runner” that connects Dataflow to Spark. However, enterprises should note that the tool is still an alpha release and is not ready for production deployment.