====== Hadoop ====== Hadoop is framework that has set of tools to distrebute and proccess data over clasters. * Scalable * Flexible * Fault-tolerant * Intelligent Main tools are [[HDFS]] (HaDoop File System), [[MapReduce]] and [[YARN]] ===== Installing Hadoop ===== Single node [[Cluser]] * Standalon mode - all hadoop components run under single [[JVM]] * Pesodo Destributed - each deamon runs under seperated [[JVM]] * Fully Destributed - each deamon runs under seperated maching see [[Install Hadoop eco system (single mode)]] ===== Hadoop Technology stack ===== see more at http://incubator.apache.org/ ==== Data Access ==== [[Hive]], [[Pig]] ==== Data Storage ==== [[Hbase]], [[Cassandra]] ==== Vizualization ==== [[HCatalog]], [[Lucene]], [[Hama]], [[Crunch]] ==== Data serialization ==== [[Avro]], [[Thrift]] ==== Data Integration ==== [[Drill]], [[Mahout]] ==== Data Integration ==== [[Sqoop]], [[Flume]], [[Chukwa]] ==== Managment, Monitoring ==== [[Ambari]], [[Zookeeper]], [[Oozie]] ==== More ==== [[HDT]], [[Konx]], [[Spark]] ===== Use-cases ===== * New-York Times - Want to convert 4 TB of articales to PDF. thay did it with AWS less then 24 hours and it cost them about $240!