Hadoop
Hadoop is a framework that provides a set of tools to distribute and process data across clusters.
- Scalable
- Flexible
- Fault-tolerant
- Intelligent
The main tools are HDFS (Hadoop Distributed File System), MapReduce, and YARN.
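To give a feel for the MapReduce model named above, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

The three steps are kept as separate functions because that separation is what lets a framework distribute the work: each mapper only sees its own lines, and each reducer only sees the values for its own keys.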
Installing Hadoop
===== Hadoop Technology stack =====

See more at http://incubator.apache.org/

==== Data Access ====

Hive, Pig

==== Data Storage ====

HBase, Cassandra

==== Visualization ====

HCatalog, Lucene, Hama, Crunch

==== Data serialization ====

Avro, Thrift

==== Data Intelligence ====

Drill, Mahout

==== Data Integration ====

Sqoop, Flume, Chukwa

==== Management, Monitoring ====

Ambari, ZooKeeper, Oozie

==== More ====

HDT, Knox, Spark

===== Use-cases =====

  * The New York Times wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240!