The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, the MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code to the nodes so that each node processes, in parallel, the data it already holds.
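To make the model concrete, here is a minimal sketch of the canonical word-count job, along the lines of the example in the Hadoop MapReduce tutorial; the input and output HDFS paths are supplied as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on each block, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sums the counts for each word after the shuffle.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation per node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because map tasks are scheduled where the blocks live, the bulk of the data never crosses the network before the shuffle; only the packaged code and the intermediate (word, count) pairs do.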
The product includes an open-source distribution of Apache Hadoop. Support from Cloudera was announced in January 2012. [4] The Oracle NoSQL Database, Oracle Data Integrator with an adapter for Hadoop, Oracle Loader for Hadoop, an open-source distribution of R, Oracle Linux, and the Oracle Java HotSpot Virtual Machine were also mentioned in the announcement.
Hadoop now encompasses multiple subprojects in addition to the base core, MapReduce, and the HDFS distributed filesystem. These additional subprojects provide enhanced application processing capabilities on top of the base Hadoop implementation and currently include Avro, Pig, HBase, ZooKeeper, Hive, and Chukwa.
Sahara is a component for provisioning Hadoop clusters quickly and easily. Users specify a handful of parameters, such as the Hadoop version number, the cluster topology type, and the node flavor (which defines disk space, CPU, and RAM settings). Once all of the parameters are provided, Sahara deploys the cluster in a few minutes.
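Purely as an illustration of the inputs involved, the sketch below collects those parameters in one place; the class and field names are hypothetical stand-ins and do not correspond to Sahara's actual API or request schema.

// Hypothetical holder for the parameters a user hands to Sahara.
// Names and example values are illustrative only, not Sahara's REST schema.
public class ClusterRequest {
    String hadoopVersion = "2.7.1";          // Hadoop version number
    String topology = "1 master, N workers"; // cluster topology type
    String nodeFlavor = "m1.large";          // flavor: disk space, CPU, RAM
    int workerCount = 4;                     // worker nodes to provision

    public static void main(String[] args) {
        ClusterRequest r = new ClusterRequest();
        System.out.printf("hadoop=%s topology=%s flavor=%s workers=%d%n",
                r.hadoopVersion, r.topology, r.nodeFlavor, r.workerCount);
    }
}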
Burningwave Core: Java library to build frameworks. Cascading: abstraction layer for Apache Hadoop and Apache Flink, used to create and execute complex data processing workflows on a Hadoop cluster from any JVM-based language. Codename One: open-source cross-platform framework for building mobile applications in Java.
Stanbol: Software components for semantic content management; Stratos: Platform-as-a-Service (PaaS) framework; Tajo: relational data warehousing system that uses the Hadoop file system (HDFS) as its distributed storage; Tiles: templating framework built to simplify the development of web application user interfaces.
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
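A minimal sketch of both halves of that description, assuming the standard Apache Avro Java library: the schema is plain JSON, and the record serializes to a compact binary buffer that carries no embedded field names.

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Data types are defined in JSON: a record with two fields.
        String schemaJson = "{\"type\":\"record\",\"name\":\"User\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        // Build a record conforming to the schema.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        user.put("age", 36);

        // Serialize to Avro's compact binary format.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();
        System.out.println("Serialized " + out.size() + " bytes");
    }
}

Because the binary form contains no field names or type tags, a reader needs the same (or a compatible) schema to decode it; this is what keeps the encoding compact.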
Cascading is a software abstraction layer for Apache Hadoop and Apache Flink. Cascading is used to create and execute complex data processing workflows on a Hadoop cluster using any JVM-based language (Java, JRuby, Clojure, etc.), hiding the underlying complexity of MapReduce jobs. It is open source and available under the Apache License.
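As a sketch of what that looks like in practice, the word-count flow below is modeled on Cascading's introductory examples and assumes Cascading 2.x running on Hadoop; the HDFS paths are illustrative.

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class WordCountFlow {
  public static void main(String[] args) {
    // Source and sink taps bind the flow to HDFS paths (illustrative paths).
    Tap docTap = new Hfs(new TextLine(new Fields("line")), "hdfs:/input/docs");
    Tap wcTap = new Hfs(new TextDelimited(true, "\t"), "hdfs:/output/wc");

    // Split each line into words, group by word, and count each group.
    Pipe pipe = new Pipe("wordcount");
    pipe = new Each(pipe, new Fields("line"),
        new RegexSplitGenerator(new Fields("word"), "\\s+"));
    pipe = new GroupBy(pipe, new Fields("word"));
    pipe = new Every(pipe, new Count(new Fields("count")));

    FlowDef flowDef = FlowDef.flowDef()
        .setName("wc")
        .addSource(pipe, docTap)
        .addTailSink(pipe, wcTap);

    // The connector plans the pipe assembly into MapReduce jobs.
    Properties props = new Properties();
    AppProps.setApplicationJarClass(props, WordCountFlow.class);
    Flow flow = new HadoopFlowConnector(props).connect(flowDef);
    flow.complete();
  }
}

The developer composes taps and pipes; HadoopFlowConnector plans that assembly into one or more MapReduce jobs, which is precisely the complexity the abstraction hides.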