Hadoop

Hadoop is a framework with a set of tools for distributing and processing data across clusters.

  • Scalable
  • Flexible
  • Fault-tolerant
  • Intelligent

The main tools are HDFS (Hadoop Distributed File System), MapReduce, and YARN.
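
To give a feel for the MapReduce programming model, here is a minimal sketch in plain Python. It does not use Hadoop's API and runs in a single process. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the three logical steps (map, group by key, reduce) that a cluster would run in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))
```

Each map call needs only its own record and each reduce call needs only the values for its own key, which is why the work can be split across many machines.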

Installing Hadoop

===== Hadoop Technology Stack =====

See more at http://incubator.apache.org/

==== Data Access ====

Hive, Pig

==== Data Storage ====

HBase, Cassandra

==== Visualization ====

HCatalog, Lucene, Hama, Crunch

==== Data Serialization ====

Avro, Thrift

==== Data Intelligence ====

Drill, Mahout

==== Data Integration ====

Sqoop, Flume, Chukwa

==== Management, Monitoring ====

Ambari, ZooKeeper, Oozie

==== More ====

HDT, Knox, Spark

===== Use Cases =====

  * The New York Times wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, at a cost of about $240.
