Differences

This shows you the differences between two versions of the page.

kb:bigdata:file_types [2018/11/21 14:05] – created yehuda
kb:bigdata:file_types [2022/01/03 16:03] (current) – external edit 127.0.0.1
Line 2:
  
 [[https://www.youtube.com/watch?v=_0Wpwj_gvzg|Spark + Parquet In Depth]]
 +[[https://www.youtube.com/watch?v=2vOfh064uUM|File Format Benchmark Avro JSON ORC and Parquet]]
 +[[https://www.youtube.com/watch?v=aIcxFIyL6xo|Berlin buzzwords18: Owen O'Malley – Fast Access To Your Complex Data - Avro, JSON, ORC, and Parquet]]
  
 ===== Parquet =====
-  Unordered List Item
 + 
 +  * Efficient column selection (only the requested columns are read)
 +  * Columnar format
 +  * Binary format
 +  * Encoded & compressed
 +  * Supports schema evolution
 +Limitations:
 +  * Pushdown filters don't work on String / Binary columns ([[https://www.youtube.com/watch?v=_0Wpwj_gvzg|source]])
 +  * Write speed trade-off: writes are slower in exchange for faster reads
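The benefit of a columnar layout for column selection can be illustrated with a toy sketch in plain Python. This is not how Parquet is implemented (real files add encoding, compression and metadata); the record fields and the ''to_columns'' helper are hypothetical names chosen for the example.

```python
# Toy illustration only: real Parquet adds encoding, compression and metadata.
rows = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 20.0},
    {"id": 3, "name": "c", "price": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is touched even though only one field is needed.
total_from_rows = sum(record["price"] for record in rows)

# Columnar layout: only the "price" column is read; "id" and "name" are skipped.
total_from_columns = sum(columns["price"])
```

Both totals are the same; the difference is how much data has to be touched to compute them.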
 + 
 +Workaround(s):
 +  * Immutability
 +    * Write using partitioning
 +    * Combine with a database (e.g. Cassandra), and after a while spill the data out to Parquet files
 +    * Write in append mode, which adds new files with their own embedded schema
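The partitioning and append workarounds can be sketched with the standard library alone. This is a minimal sketch under loud assumptions: JSON files stand in for Parquet files, the ''key=value'' directory naming is the common Hive-style convention, and ''append_partitioned'' and ''read_partition'' are hypothetical helpers, not part of any Parquet or Spark API.

```python
import json
import tempfile
import uuid
from pathlib import Path


def append_partitioned(root, records, partition_key):
    """Write records into key=value directories, one new file per call."""
    groups = {}
    for record in records:
        groups.setdefault(record[partition_key], []).append(record)
    written = []
    for value, group in groups.items():
        part_dir = Path(root) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Existing files are never modified; each append adds a new file.
        path = part_dir / f"part-{uuid.uuid4().hex}.json"
        path.write_text(json.dumps(group))
        written.append(path)
    return written


def read_partition(root, partition_key, value):
    """Read only the files under one partition directory."""
    part_dir = Path(root) / f"{partition_key}={value}"
    records = []
    for path in sorted(part_dir.glob("part-*.json")):
        records.extend(json.loads(path.read_text()))
    return records


root = tempfile.mkdtemp()
append_partitioned(
    root,
    [{"day": "2018-11-21", "v": 1}, {"day": "2018-11-22", "v": 2}],
    "day",
)
append_partitioned(root, [{"day": "2018-11-21", "v": 3}], "day")
day_records = read_partition(root, "day", "2018-11-21")
```

New data lands in new files, so nothing already written has to change, and a reader that filters on the partition key only opens one directory.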
 + 
 +=== vs ORC ===
 +  * Indexed
 +  * Doesn't handle nested data
 ===== ORC =====
 +
 +  * Nested Data 
 +  * Columnar format
 +  * Predicate pushdown (min/max statistics + Bloom filters)
 +  * ACID support / cannot add 
 +  * Suggested for streaming ([[https://www.youtube.com/watch?v=NZLrJmjoXw8|source]])
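The idea behind predicate pushdown with min/max statistics and Bloom filters can be sketched in plain Python. This is a simplified illustration, not ORC's actual on-disk index: the ''TinyBloom'' class, its sizes, and the ''build_stripe'' / ''find_equal'' helpers are hypothetical and exist only for this example.

```python
import hashlib


class TinyBloom:
    """Very small Bloom filter; size and hash count are arbitrary choices."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_stripe(values):
    """Store the values together with their min/max and a Bloom filter."""
    bloom = TinyBloom()
    for value in values:
        bloom.add(value)
    return {"min": min(values), "max": max(values), "bloom": bloom, "values": values}


def find_equal(stripes, target):
    """Return matching values and the number of stripes actually scanned."""
    hits, scanned = [], 0
    for stripe in stripes:
        if target < stripe["min"] or target > stripe["max"]:
            continue  # skipped using min/max statistics
        if not stripe["bloom"].might_contain(target):
            continue  # skipped using the Bloom filter
        scanned += 1
        hits.extend(v for v in stripe["values"] if v == target)
    return hits, scanned


stripes = [
    build_stripe([1, 5, 9]),
    build_stripe([20, 25, 30]),
    build_stripe([40, 45, 50]),
]
hits, scanned = find_equal(stripes, 25)
```

Here the lookup for ''25'' scans a single stripe because the other two are ruled out by their min/max ranges; the Bloom filter helps for values that fall inside a range but are not actually present.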
 +
 +
 +
 ===== Avro =====
  
kb/bigdata/file_types.1542809148.txt.gz · Last modified: (external edit)