Table of Contents
Hadoop
Hadoop is framework that has set of tools to distrebute and proccess data over clasters.
- Scalable
- Flexible
- Fault-tolerant
- Intelligent
Main tools are HDFS (HaDoop File System), MapReduce and YARN
Installing Hadoop
Single node Cluser
Hadoop Technology stack
see more at http://incubator.apache.org/
Data Access
Data Storage
Vizualization
Data serialization
Data Integration
Data Integration
Managment, Monitoring
More
Use-cases
- New-York Times - Want to convert 4 TB of articales to PDF. thay did it with AWS less then 24 hours and it cost them about $240!