Hadoop

Hadoop is a framework that provides a set of tools for storing, distributing, and processing large data sets across clusters of machines. Its main characteristics are:

  • Scalable
  • Flexible
  • Fault-tolerant
  • Intelligent
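The following minimal Python sketch makes "distribute" and "fault-tolerant" concrete. It shows how HDFS splits a file into fixed-size blocks and stores several copies of each block on different nodes. It assumes the common Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The helpers `plan_file` and `readable_after_failure` are hypothetical. The round-robin placement is also a simplification: the real NameNode uses a rack-aware placement policy.

```python
import math

# Assumed defaults (Hadoop 2.x/3.x): dfs.blocksize = 128 MB, dfs.replication = 3.
# A real cluster may override both in hdfs-site.xml.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def plan_file(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Hypothetical helper: split a file into blocks and assign each block to
    `replication` distinct nodes in round-robin order (a simplification of the
    rack-aware placement the NameNode actually performs)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)  # the last block may be partial
    placement = {}
    for block in range(n_blocks):
        # (block + r) % len(nodes) yields `replication` distinct nodes per block
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_nodes):
    """The file stays readable while every block has at least one replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    plan = plan_file(300, ["node1", "node2", "node3", "node4"])
    for block, replicas in plan.items():
        print(block, replicas)
    print("readable without node1 and node2:", readable_after_failure(plan, {"node1", "node2"}))
```

For a 300 MB file on four nodes, this yields three blocks: 128 MB, 128 MB, and a final 44 MB block. The file stays readable even if any two nodes fail, because every block still has a replica on a live node.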

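The main components are HDFS (Hadoop Distributed File System) for storage, MapReduce for processing, and YARN (Yet Another Resource Negotiator) for resource management and job scheduling.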
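You can illustrate the MapReduce programming model without a cluster. The sketch below is a plain-Python, single-process simulation of the classic word-count job. A map step emits `(word, 1)` pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. This is not Hadoop's Java `Mapper`/`Reducer` API, and all function names are hypothetical. On a real cluster, YARN would schedule many map and reduce tasks in parallel, ideally close to the HDFS blocks that hold the input.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record (a line of text)
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all values by key, as the framework does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values collected for a single key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases; here they run sequentially in one process
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

`word_count(["Hadoop stores data", "Hadoop processes data"])` returns `{"data": 2, "hadoop": 2, "processes": 1, "stores": 1}`.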

Installing Hadoop

Single-node cluster

  • Standalone mode - all Hadoop components run in a single JVM
  • Pseudo-distributed mode - each daemon runs in a separate JVM on the same machine
  • Fully distributed mode - each daemon runs on a separate machine
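In pseudo-distributed mode, the daemons find each other through XML configuration files. The sketch below uses Python's standard `xml.etree.ElementTree` to produce the two minimal settings from Apache's single-node setup guide. The first is `fs.defaultFS` set to `hdfs://localhost:9000` in `core-site.xml`. The second is `dfs.replication` set to `1` in `hdfs-site.xml`, because a single DataNode cannot hold more than one replica of a block. The helper `hadoop_config` is hypothetical. Its output would go into the corresponding files under `etc/hadoop/`, and your Hadoop version may need further settings.

```python
import xml.etree.ElementTree as ET


def hadoop_config(properties: dict[str, str]) -> str:
    # Hypothetical helper: a Hadoop *-site.xml file is a <configuration> root holding
    # <property> entries, each with a <name> and a <value>.
    # (Real files usually also start with an XML declaration, omitted here.)
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Settings from the single-node setup guide: where the NameNode listens,
# and a replication factor of 1 because there is only one DataNode.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
HDFS_SITE = {"dfs.replication": "1"}

if __name__ == "__main__":
    print("core-site.xml:", hadoop_config(CORE_SITE))
    print("hdfs-site.xml:", hadoop_config(HDFS_SITE))
```

`hadoop_config(CORE_SITE)` returns `<configuration><property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property></configuration>`.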

See Install Hadoop eco system (single mode).

Hadoop technology stack

Data Access

Data Storage

Visualization

Data serialization

Data Integration

Management, Monitoring

More

Use cases

  • The New York Times - wanted to convert 4 TB of articles to PDF. They did it on AWS in less than 24 hours, and it cost them about $240!
learn/bigdata/hadoop.txt · Last modified: 2022/01/03 16:03 by 127.0.0.1