Differences

This shows you the differences between two versions of the page.

kb:bigdata [2018/08/30 12:21] – [BigData] yehuda
kb:bigdata [2022/01/03 16:03] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== BigData ======
-| [[.:BigData:YARN|YARN]] | [[.:BigData:Sqoop]] | [[.BigData:Spark]] | [[.:BigData:Knox]] |
 +| [[.:BigData:YARN|YARN]] | [[.:BigData:Sqoop]] | [[.BigData:Spark]] | [[.:BigData:Knox]] | [[.:BigData:Hortonworks|Hortonworks]] |
 +| [[.:BigData:File types]] | [[.:BigData:NiFi]] | [[.:BigData:Eco-system]] |
  
 [[https://www.slideshare.net/mattlieber/parquet-and-impala-overview-external|Parquet and Impala overview presentation]]
Line 7: Line 7:
 [[https://www.dremio.com/|Dremio Israeli startup]]
  
 +[[https://delta.io/|Delta Lake]] [[https://www.youtube.com/watch?v=zx9rFKnk4hU|Delta Lake on YouTube]]
  
-===== Preformance =====
 +===== Performance =====
   * [[http://crazyadmins.com/tune-hadoop-cluster-to-get-maximum-performance-part-1/|Hadoop]]
 +  * [[https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_hos_tuning.html|Cloudera tune]]
 +
 +==== Tools ====
 +
 +  * [[https://unraveldata.com/|Unravel]]
 +  * [[https://github.com/linkedin/dr-elephant|Dr. Elephant]]
 +  * [[https://www.pepperdata.com/| Dr. Elephant Enterprise]]
 +
 +===== Ingestion =====
 +
 +  * [[https://gobblin.apache.org/|Apache Gobblin]]
 +    * [[https://www.youtube.com/watch?v=BQ7aONetKl4|YouTube: Stream and Batch Data Integration at LinkedIn scale using Apache Gobblin]]
 +    * [[https://engineering.linkedin.com/blog/2021/data-integration-library|LinkedIn blog: data-integration-library]]
 +    * [[https://gobblin.readthedocs.io/en/latest/miscellaneous/Exactly-Once-Support/#achieving-exactly-once-delivery-with-commitstepstore|Gobblin Exactly-Once-Support (readthedocs.io)]]
 +    * [[https://www.youtube.com/watch?v=fHFNZlWCpKA|YouTube: Gobblin as an ETL framework / Ivan Akhlestin (Rambler&Co), in Russian]]
 +    * [[https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service|Gobblin as a Service]]
 +    * [[https://gobblin.apache.org/docs/user-guide/Gobblin-CLI/|user-guide Gobblin-CLI]]
 +
 +
 +===== Workflow =====
 +
 +  * [[https://azkaban.github.io|Azkaban]]
 +
 +===== MDM =====
 +  * DataHub: a generalized metadata search and discovery tool (formerly WhereHows)
 +    * [[https://github.com/linkedin/datahub|LinkedIn DataHub (formerly WhereHows)]]
 +    * [[https://engineering.linkedin.com/wherehows|LinkedIn WhereHows]]
 +
 +===== OLAP & OLTP =====
 +  * Druid
 +  * Kylin
 +  * [[https://www.slideshare.net/argonauts007/kylin-and-druid-presentation|Kylin and Druid presentation]]
 +
 ===== Fast Databases =====
 See also: [[https://en.wikipedia.org/wiki/List_of_in-memory_databases|list of in-memory databases]]
Line 84: Line 118:
 ===== Other url =====
   * [[https://streever.atlassian.net/wiki/spaces/HADOOP/pages/9961474/Hive+JDBC+Extended+Connection+URL+Examples| Hadoop]]
 +  * [[https://cdap.io/|CDAP]]
kb/bigdata.1535631682.txt.gz · Last modified: (external edit)