Differences

This shows you the differences between two versions of the page.

--- learn:bigdata:hadoop [2014/08/07 12:23] – yehuda
+++ learn:bigdata:hadoop [2022/01/03 16:03] (current) – external edit 127.0.0.1
@@ Line 2: / Line 2: @@
 Hadoop is a framework with a set of tools to distribute and process data across clusters.
+  * Scalable
+  * Flexible
+  * Fault-tolerant
+  * Intelligent
-Main tools are [[HDFS]] (HaDoop File System) and [[MapReduce]]
+Main tools are [[HDFS]] (Hadoop Distributed File System), [[MapReduce]] and [[YARN]]
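The MapReduce programming model named above can be illustrated with a minimal word-count sketch. This is plain single-process Python, not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and on a real cluster the framework would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data over clusters"]))
```

The three phases are kept as separate functions because each map call only sees one record and each reduce call only sees one key, which is what would let a framework distribute them.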
+===== Installing Hadoop =====
+Single-node [[Cluser|cluster]]
+  * Standalone mode - all Hadoop components run in a single [[JVM]]
+  * Pseudo-distributed - each daemon runs in a separate [[JVM]]
+  * Fully distributed - each daemon runs on a separate machine
+see [[Install Hadoop eco system (single mode)]]
+===== Hadoop Technology stack =====
+see more at http://incubator.apache.org/
+==== Data Access ====
+[[Hive]], [[Pig]]
+==== Data Storage ====
+[[Hbase]], [[Cassandra]]
+==== Visualization ====
+[[HCatalog]], [[Lucene]], [[Hama]], [[Crunch]]
+==== Data serialization ====
+[[Avro]], [[Thrift]]
+==== Data Intelligence ====
+[[Drill]], [[Mahout]]
+==== Data Integration ====
+[[Sqoop]], [[Flume]], [[Chukwa]]
+==== Management, Monitoring ====
+[[Ambari]], [[Zookeeper]], [[Oozie]]
+==== More ====
+[[HDT]], [[Konx|Knox]], [[Spark]]
+===== Use-cases =====
+  * New York Times - wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240!
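To get a feel for how a bill of about $240 can come out of a one-day cluster job, here is a small arithmetic sketch. The instance count and hourly price are assumptions chosen for illustration, not figures from this page, and `cluster_cost` is a hypothetical helper.

```python
def cluster_cost(instances, hours, price_per_instance_hour):
    """Total cost of renting `instances` machines for `hours` hours each."""
    return instances * hours * price_per_instance_hour


# Assumed inputs (not stated on this page): 100 rented instances,
# a full 24 hours of runtime, and $0.10 per instance-hour.
cost = cluster_cost(instances=100, hours=24, price_per_instance_hour=0.10)
print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions the estimate comes to roughly $240, which shows why renting many machines for a short time can be cheap compared with owning them.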

learn/bigdata/hadoop.1407414223.txt.gz · Last modified: 2022/01/03 16:03 (external edit)