BigData

Tech

  • Presto - Distributed SQL query engine for big data (github) - Apache License 2.0
  • Apache Kudu - An addition to the Apache Hadoop ecosystem. Kudu completes Hadoop's storage layer to enable fast analytics on fast (rapidly changing) data. Engineered to take advantage of next-generation hardware and in-memory processing, Kudu lowers query latency significantly for Apache Impala (incubating) and Apache Spark (initially, with other execution engines to come).
  • Alluxio (formerly Tachyon) - enables any application to interact with any data from any storage system at memory speed.

Logs & Visual

D Other

Graph Databases

Terminology

  • RDF - Resource Description Framework. source
  • OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and data integrity in multi-access environments; their effectiveness is measured in transactions per second. An OLTP database holds detailed, current data, and its schema is typically an entity model (usually 3NF). source
  • OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data stored in multi-dimensional schemas (usually a star schema). source
  • OLTP vs OLAP - We can divide IT systems into transactional (OLTP) and analytical (OLAP).
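
The OLTP/OLAP split above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the orders table, its columns, and the helper names place_order and revenue_by_region are made up for this example, and a real OLAP system would normally use a separate store with a star schema rather than the same table.

```python
import sqlite3


def build_demo_db():
    # In-memory database with a single hypothetical "orders" table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL,"
        " region TEXT NOT NULL,"
        " amount REAL NOT NULL)"
    )
    return conn


def place_order(conn, customer, region, amount):
    # OLTP-style work: a short write transaction that touches one row.
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region(conn):
    # OLAP-style work: a read-only query that scans and aggregates many rows.
    return conn.execute(
        "SELECT region, COUNT(*), SUM(amount)"
        " FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


conn = build_demo_db()
place_order(conn, "alice", "EU", 10.0)
place_order(conn, "bob", "EU", 5.0)
place_order(conn, "carol", "US", 7.5)
report = revenue_by_region(conn)
print(report)
```

The write path is measured by how many such small transactions complete per second; the report query is measured by how long one answer takes.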

Technology

  • HGraphDB - HBase as a TinkerPop Graph Database
  • TinkerPop - Graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
  • Giraph - Iterative graph processing system built for high scalability
  • s2graph - Graph database designed to handle transactional graph processing at scale. Its REST API allows you to store, manage, and query relational information using edge and vertex representations in a fully asynchronous and non-blocking manner.
  • GraphFrames - package for Apache Spark which provides DataFrame-based Graphs (Tutorial)
  • Spark GraphX - component in Spark for graphs and graph-parallel computation (Tutorial)
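
The same OLTP/OLAP distinction shows up in graph systems: a graph database answers local questions about one vertex, while a graph analytic system computes over the whole graph. The sketch below uses plain Python only and does not use the API of any tool listed above; the TinyGraph class and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class TinyGraph:
    """Hypothetical undirected graph stored as an adjacency map."""

    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, a, b):
        # Store the edge in both directions (undirected graph).
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, v):
        # OLTP-style query: a local lookup starting from one vertex.
        return sorted(self.adj.get(v, ()))

    def connected_components(self):
        # OLAP-style query: visit every vertex with breadth-first search.
        seen = set()
        components = []
        for start in sorted(self.adj):
            if start in seen:
                continue
            seen.add(start)
            queue = deque([start])
            component = []
            while queue:
                v = queue.popleft()
                component.append(v)
                for n in sorted(self.adj[v]):
                    if n not in seen:
                        seen.add(n)
                        queue.append(n)
            components.append(sorted(component))
        return components


g = TinyGraph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.add_edge("x", "y")
print(g.neighbors("b"))
print(g.connected_components())
```

Frameworks such as Giraph or GraphX distribute the whole-graph kind of computation across a cluster; this single-process version only shows the shape of the two workloads.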

Other

Hadoop, Spark, Storm, Samza, Spark Streaming, Kafka, Flume, MapReduce, Scalding, HBase, MongoDB, Cassandra, Elasticsearch, Solr, Spark MLlib, Algebird, Spark GraphX

NiFi, Apex

http://sigmajs.org/ - JavaScript library for graph visualization

Tech KB

kb/bigdata.txt · Last modified: 2017/11/26 16:24 by yehuda