Page ''learn:bigdata:hadoop'': current revision 2022/01/03 16:03 (external edit); previous revision 2014/08/07 12:23 by yehuda.
Hadoop is a framework with a set of tools to distribute and process data across clusters. It is:
  * Scalable
  * Flexible
  * Fault-tolerant
  * Intelligent
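Much of Hadoop's fault tolerance comes from how its file system distributes data: HDFS cuts each file into large fixed-size blocks and stores several copies (replicas) of every block on different nodes. The sketch below is a simplified, hypothetical model of that idea, not HDFS code. The 128 MB block size and the replication factor of 3 are the usual defaults of ''dfs.blocksize'' and ''dfs.replication'' in Hadoop 2.x and later; the round-robin placement and the helper names (''split_into_blocks'', ''place_replicas'', ''surviving_blocks'') are assumptions for illustration, since real HDFS uses a rack-aware placement policy.

```python
# Simplified model of how HDFS spreads a file over a cluster (illustration only).
BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize (128 MB) in Hadoop 2.x and later
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of `file_size` bytes is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block only holds what is left over; it is not padded to full size.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Put each block on `replication` distinct nodes, round-robin.

    Hypothetical policy: real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Blocks that still have at least one replica on a live node."""
    return {block for block, replicas in placement.items()
            if any(node not in failed for node in replicas)}


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 128 + 128 + 44 MB
    placement = place_replicas(len(sizes), ["node1", "node2", "node3", "node4"])
    print(len(sizes), "blocks:", placement)
    print("readable after node1 and node2 fail:",
          sorted(surviving_blocks(placement, {"node1", "node2"})))
```

Because every block has three replicas on distinct nodes, any two nodes can fail without a block becoming unreadable; losing a third node can make some blocks unavailable.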
The main tools are [[HDFS]] (Hadoop Distributed File System), [[MapReduce]] and [[YARN]].
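MapReduce processes data in two phases: a map step turns each input record into key/value pairs, the framework sorts and groups those pairs by key (the shuffle), and a reduce step combines all values that share a key. The word count below is a local sketch written in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated ''key<TAB>value'' lines. ''run_locally'' is a hypothetical stand-in that replaces the cluster with an in-memory ''sorted()'' call; it is not part of any Hadoop API.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit 'word<TAB>1' for every word, like a Streaming mapper."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: input is sorted by key, so equal words are adjacent; sum their counts."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Hypothetical local stand-in for a job: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["Hadoop stores data", "Hadoop processes data"]):
        print(record)
```

For the two sample lines this prints ''data 2'', ''hadoop 2'', ''processes 1'' and ''stores 1'' (tab-separated, one per line). On a real cluster the mapper and reducer would be separate scripts reading ''sys.stdin'', passed to the Hadoop Streaming jar through its ''-mapper'' and ''-reducer'' options; Hadoop performs the sort between them, which is why the reducer can rely on equal keys arriving next to each other.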
===== Installing Hadoop =====
Hadoop can run on a single-node [[Cluser|cluster]] (standalone or pseudo-distributed mode) or across several machines (fully distributed mode):

  * Standalone mode: all Hadoop components run in a single [[JVM]]
  * Pseudo-distributed mode: each daemon runs in a separate [[JVM]]
  * Fully distributed mode: each daemon runs on a separate machine
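The mode is selected mostly through configuration files. Following the Apache single-node setup guide, standalone mode needs no overrides (the default file system is the local one, ''file:///''), while pseudo-distributed mode sets ''fs.defaultFS'' to ''hdfs://localhost:9000'' in ''core-site.xml'' and ''dfs.replication'' to 1 in ''hdfs-site.xml''. The sketch below generates those files. The fully distributed values (the hypothetical ''namenode_host'' parameter, port 9000, replication 3) are assumptions for illustration, and a real cluster needs further settings such as YARN and worker configuration.

```python
import xml.etree.ElementTree as ET


def site_properties(mode: str, namenode_host: str = "localhost") -> dict[str, dict[str, str]]:
    """Return the property overrides per config file for a Hadoop run mode (simplified)."""
    if mode == "standalone":
        # Nothing to override: the default fs.defaultFS is the local file system (file:///).
        return {}
    if mode == "pseudo-distributed":
        # Values from the Apache single-node setup guide: HDFS on localhost, one replica.
        return {
            "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
            "hdfs-site.xml": {"dfs.replication": "1"},
        }
    if mode == "fully-distributed":
        # Assumed values: namenode_host and port 9000 are illustrative; 3 is the default replication.
        return {
            "core-site.xml": {"fs.defaultFS": f"hdfs://{namenode_host}:9000"},
            "hdfs-site.xml": {"dfs.replication": "3"},
        }
    raise ValueError(f"unknown mode: {mode}")


def to_hadoop_xml(properties: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> file layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in site_properties("pseudo-distributed").items():
        print(f"--- {filename}")
        print(to_hadoop_xml(props))
```

Setting replication to 1 in pseudo-distributed mode makes sense because there is only one DataNode to hold each block.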

See [[Install Hadoop eco system (single mode)]].

===== Hadoop Technology stack =====
See more at http://
==== Data Access ====
[[Hive]], [[Pig]]

==== Data Storage ====
[[HBase]], [[Cassandra]]

==== Visualization ====
[[HCatalog]],

==== Data serialization ====
[[Avro]], [[Thrift]]

==== Data Intelligence ====
[[Drill]], [[Mahout]]

==== Data Integration ====
[[Sqoop]], [[Flume]], [[Chukwa]]

==== Management, Monitoring ====
[[Ambari]], [[ZooKeeper]],

==== More ====
[[HDT]], [[Konx|Knox]], [[Spark]]
===== Use-cases =====
  * The New York Times wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240.