Hadoop

Hadoop is a framework that provides a set of tools to distribute and process data across clusters.

  • Scalable
  • Flexible
  • Fault-tolerant
  • Intelligent

The main tools are HDFS (Hadoop Distributed File System) for storage, MapReduce for parallel processing, and YARN for cluster resource management.
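
As a rough illustration of the MapReduce model, here is a minimal single-process sketch in Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical and chosen only for this example. On a real cluster the map and reduce steps would run in parallel on many nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of text documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input: two small "documents" standing in for input splits.
counts = word_count(["big data big clusters", "data over clusters"])
print(counts)
```

Each document stands in for one input split. Because the map step handles splits independently and the reduce step handles keys independently, the same logic can be spread over many machines.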

Hadoop Technology stack

Data Access

Data Storage

Visualization

Data serialization

Data Integration

Management, Monitoring

More

Use-cases

  • The New York Times wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, at a cost of about $240.