kb:bigdata, revisions compared: 2018/11/18 11:55 (yehuda) and 2022/01/03 16:03 (current)
====== BigData ======
| [[.:BigData:YARN|YARN]] | [[.:BigData:Sqoop]] | [[.BigData:Spark]] | [[.:BigData:Knox]] | [[.:BigData:Hortonworks|Hortonworks]] |
| [[.:BigData:File types]] | [[.:BigData:NiFi]] | [[.:BigData:Eco-system]] |

[[https://www.slideshare.net/mattlieber/parquet-and-impala-overview-external|Parquet and Impala overview (external presentation)]]
[[https://www.dremio.com/|Dremio, Israeli startup]]

[[https://delta.io/|Delta Lake]] [[https://www.youtube.com/watch?v=zx9rFKnk4hU|Delta Lake YouTube]]

===== Performance =====
  * [[http://crazyadmins.com/tune-hadoop-cluster-to-get-maximum-performance-part-1/|Hadoop]]
  * [[https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_hos_tuning.html|Cloudera tune]]

==== Tools ====

  * [[https://unraveldata.com/|Unravel]]
  * [[https://github.com/linkedin/dr-elephant|Dr. Elephant]]
  * [[https://www.pepperdata.com/|Dr. Elephant Enterprise]]

===== Ingestion =====

  * [[https://gobblin.apache.org/|Apache Gobblin]]
  * [[https://www.youtube.com/watch?v=BQ7aONetKl4|YouTube: Stream and Batch Data Integration at LinkedIn scale using Apache Gobblin]]
  * [[https://engineering.linkedin.com/blog/2021/data-integration-library|LinkedIn blog: data-integration-library]]
  * [[https://gobblin.readthedocs.io/en/latest/miscellaneous/Exactly-Once-Support/#achieving-exactly-once-delivery-with-commitstepstore|Gobblin Exactly-Once-Support readthedocs.io]]
  * [[https://www.youtube.com/watch?v=fHFNZlWCpKA|YouTube: Gobblin as an ETL framework / Ivan Akhlestin (Rambler&Co)]]
  * [[https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service|Gobblin as a Service]]
  * [[https://gobblin.apache.org/docs/user-guide/Gobblin-CLI/|user-guide Gobblin-CLI]]

===== Workflow =====

  * [[https://azkaban.github.io|Azkaban]]

===== MDM =====
  * DataHub: A Generalized Metadata Search & Discovery Tool (ex WhereHows)
  * [[https://github.com/linkedin/datahub|LinkedIn DataHub (ex WhereHows)]]
  * [[https://engineering.linkedin.com/wherehows|LinkedIn WhereHows]]

===== OLAP & OLTP =====
  * Druid
  * Kylin
  * [[https://www.slideshare.net/argonauts007/kylin-and-druid-presentation|Kylin and Druid presentation]]

===== Fast Databases =====
===== Other URLs =====
  * [[https://streever.atlassian.net/wiki/spaces/HADOOP/pages/9961474/Hive+JDBC+Extended+Connection+URL+Examples|Hadoop]]
  * [[https://cdap.io/|CDAP]]