Hadoop is a framework that provides a set of tools to distribute and process data across clusters.

  • Scalable
  • Flexible
  • Fault-tolerant
  • Intelligent

The main tools are HDFS (Hadoop Distributed File System), MapReduce, and YARN.
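To make the MapReduce idea concrete, here is a minimal sketch of a word count written in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. It imitates, in a single process, the three steps that the framework would spread over a cluster: map, shuffle (group by key), and reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers would run as separate tasks on different nodes, reading their input from HDFS, and the grouping step would move data over the network. Here everything runs in memory, so only the data flow is shown.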

Installing Hadoop

Single-node cluster

  • Standalone mode - all Hadoop components run in a single JVM
  • Pseudo-distributed mode - each daemon runs in a separate JVM on the same machine
  • Fully distributed mode - the daemons run on separate machines

See Install Hadoop ecosystem (single mode)

Hadoop Technology stack

Data Access

Data Storage


Data Serialization

Data Integration

Management, Monitoring



  • New York Times - wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240!
learn/bigdata/hadoop.txt · Last modified: 2022/01/03 16:03 by