Table of Contents
BigData
Performance
Tools
Ingestion
Workflow
MDM
- DataHub: A Generalized Metadata Search & Discovery Tool (formerly WhereHows)
OLAP & OLTP
- Druid
- Kylin
Fast Databases
See also: in-memory databases
- Druid is a high-performance, column-oriented, distributed data store.
- Presto - an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes.
- VoltDB - in-memory operational database with real-time analytics and real-time decisioning, available in several editions: Enterprise, Pro, AWS, and Community.
Premium Databases
Tec
- Redix - http://redux.js.org/
- Clustrix - http://www.clustrix.com/
- Aerospike - http://www.aerospike.com/
- Apache Kudu - An addition to the Apache Hadoop ecosystem that completes Hadoop's storage layer to enable fast analytics on fast (rapidly changing) data. Engineered to take advantage of next-generation hardware and in-memory processing, Kudu significantly lowers query latency for Apache Impala (incubating) and Apache Spark (initially, with other execution engines to come).
- Alluxio (formerly Tachyon) - enables any application to interact with any data from any storage system at memory speed.
- Vespa - The open big data serving engine: store, search, rank and organize big data in real time, at user serving time.
Logs & Visual
- logstash - https://www.elastic.co/products/logstash
- graylog - https://www.graylog.org/
D Other
Graph Databases
Terminology
- RDF - Resource Description Framework Source
- OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and data integrity in multi-access environments; their effectiveness is measured in transactions per second. An OLTP database holds detailed, current data, and its schema is typically the entity model (usually 3NF). source
- OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations, so effectiveness is measured by response time. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data stored in multi-dimensional schemas (usually a star schema). source
- OLTP vs OLAP - We can divide IT systems into transactional (OLTP) and analytical (OLAP).
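The OLTP/OLAP split above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely for convenience; the `orders` table and the helper names `create_schema`, `place_order` and `revenue_by_region` are hypothetical and chosen only for this example. A real OLAP system would normally run on a separate, aggregated store rather than on the transactional table itself.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table used only for this illustration.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )


def place_order(conn, customer, region, amount):
    """OLTP-style work: one short write transaction touching a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region(conn):
    """OLAP-style work: a read-only aggregation that scans many rows."""
    return conn.execute(
        "SELECT region, SUM(amount), COUNT(*) "
        "FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    place_order(db, "alice", "EU", 10.0)
    place_order(db, "bob", "US", 20.0)
    for row in revenue_by_region(db):
        print(row)
```

The write path is judged by how many such transactions complete per second, while the aggregation is judged by how quickly a single complex query returns.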
Technology
- HGraphDB - HBase as a TinkerPop Graph Database
- TinkerPop - Graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
- Giraph - Iterative graph processing system built for high scalability
- s2graph - graph database designed to handle transactional graph processing at scale. Its REST API allows you to store, manage and query relational information using edge and vertex representations in a fully asynchronous and non-blocking manner
- GraphFrames - package for Apache Spark which provides DataFrame-based Graphs (Tutorial)
- Spark GraphX - component in Spark for graphs and graph-parallel computation (Tutorial)
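The difference between graph OLTP and graph OLAP mentioned in the TinkerPop entry can be sketched in plain Python. This is not the API of TinkerPop, Giraph, or GraphX; the function names and the in-memory adjacency structure are assumptions made only for illustration. The OLTP-style call touches one vertex, while the OLAP-style call iterates over the whole graph until it converges, in the spirit of the iterative processing those systems perform at scale.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (vertex, vertex) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighbors(adj, vertex):
    """OLTP-style query: a local lookup around a single vertex."""
    return sorted(adj.get(vertex, ()))


def connected_components(adj):
    """OLAP-style computation: iterate over every vertex until stable.

    Each vertex repeatedly adopts the smallest label among itself and its
    neighbors; vertices that end with the same label are in one component.
    """
    labels = {v: v for v in adj}
    changed = True
    while changed:
        changed = False
        new_labels = {}
        for v in adj:
            best = min([labels[v]] + [labels[n] for n in adj[v]])
            new_labels[v] = best
            if best != labels[v]:
                changed = True
        labels = new_labels
    return labels


if __name__ == "__main__":
    graph = build_adjacency([("a", "b"), ("b", "c"), ("x", "y")])
    print(neighbors(graph, "b"))
    print(connected_components(graph))
```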
Other
Hadoop, Spark, Storm, Samza, Spark Streaming, Kafka, Flume, MapReduce, Scalding, HBase, MongoDB, Cassandra, Elasticsearch, Solr, Spark MLlib, Algebird, Spark GraphX
NiFi, Apex
http://sigmajs.org/ - Visual graph JS library