Differences

This shows you the differences between two versions of the page.

learn:bigdata:hadoop [2014/08/07 12:23] yehuda
learn:bigdata:hadoop [2022/01/03 16:03] (current) - external edit 127.0.0.1
Line 2: Line 2:
 Hadoop is a framework with a set of tools to distribute and process data over clusters.
  
 +  * Scalable
 +  * Flexible
 +  * Fault-tolerant
 +  * Intelligent
  
-Main tools are [[HDFS]] (Hadoop Distributed File System) and [[MapReduce]]
+Main tools are [[HDFS]] (Hadoop Distributed File System), [[MapReduce]] and [[YARN]]
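
To give a feel for the MapReduce programming model named above, here is a minimal in-memory sketch of a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and everything runs in one process, whereas a real job would spread the map and reduce tasks over the cluster and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, the step a framework
    performs between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Each phase only sees its own slice of the data (one line in the mapper, one key in the reducer), which is what allows the work to be split across many machines.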
  
 +===== Installing Hadoop =====
 +Single-node [[Cluster]]
 +
 +  * Standalone mode - all Hadoop components run in a single [[JVM]]
 +  * Pseudo-distributed mode - each daemon runs in a separate [[JVM]]
 +  * Fully distributed mode - each daemon runs on a separate machine
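
As a rough illustration of what separates the modes above in practice, the sketch below generates the two small configuration files that a pseudo-distributed setup typically overrides. It assumes Hadoop 2.x or later property names (`fs.defaultFS`, `dfs.replication`) and the conventional `hdfs://localhost:9000` address; the helper `build_site_xml` is a hypothetical name, and the values should be checked against the installation guide for the Hadoop version in use.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed values for a pseudo-distributed, single-machine setup:
# the file system points at a local HDFS daemon, and blocks are kept
# in one copy because there is only one DataNode.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
HDFS_SITE = {"dfs.replication": "1"}

if __name__ == "__main__":
    print(build_site_xml(CORE_SITE))  # content for core-site.xml
    print(build_site_xml(HDFS_SITE))  # content for hdfs-site.xml
```

In standalone mode neither override is needed, since Hadoop then runs as a single Java process against the local file system.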
 +
 +
 +see [[Install Hadoop eco system (single mode)]]
 +
 +===== Hadoop Technology stack =====
 +see more at http://incubator.apache.org/
 +==== Data Access ====
 +[[Hive]], [[Pig]]
 +
 +==== Data Storage ====
 +[[Hbase]], [[Cassandra]]
 +
 +==== Visualization ====
 +[[HCatalog]], [[Lucene]], [[Hama]], [[Crunch]]
 +
 +==== Data serialization ====
 +[[Avro]], [[Thrift]]
 +
 +==== Data Intelligence ====
 +[[Drill]], [[Mahout]]
 +
 +==== Data Integration ====
 +[[Sqoop]], [[Flume]], [[Chukwa]]
 +
 +==== Management, Monitoring ====
 +[[Ambari]], [[Zookeeper]], [[Oozie]]
 +
 +==== More ====
 +[[HDT]], [[Knox]], [[Spark]]
 +===== Use-cases =====
 +  * New York Times - wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240.
learn/bigdata/hadoop.1407414223.txt.gz · Last modified: (external edit)