Differences

kb:bigdata:file_types [2018/11/21 14:23] yehuda
kb:bigdata:file_types [2022/01/03 16:03] (current) – external edit 127.0.0.1
Line 2: Line 2:
  
 [[https://www.youtube.com/watch?v=_0Wpwj_gvzg|Spark + Parquet In Depth]]
 +[[https://www.youtube.com/watch?v=2vOfh064uUM|File Format Benchmark Avro JSON ORC and Parquet]]
 +[[https://www.youtube.com/watch?v=aIcxFIyL6xo|Berlin buzzwords18: Owen O'Malley – Fast Access To Your Complex Data - Avro, JSON, ORC, and Parquet]]
  
 ===== Parquet =====
 +
  
   * Better column selection
Line 9: Line 12:
   * Binary format
   * Encoded & compressed
-  *
+  * Supports schema evolution - Format supports
 +Limitations:
 +  * Pushdown filters don't work on String / Binary columns ([[https://www.youtube.com/watch?v=_0Wpwj_gvzg|source]])
 +  * Write speed tradeoff: writes are slower in exchange for faster reads
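
To make the pushdown bullet concrete: Parquet keeps min/max statistics per row group, so a reader can skip groups that cannot contain a match. The sketch below is a toy model in plain Python, not Parquet or Spark code; `RowGroup` and `scan_equal` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration)."""
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def scan_equal(row_groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return (matching values, number of row groups actually read)."""
    matches: List[int] = []
    groups_read = 0
    for rg in row_groups:
        # Pushdown: skip the group when its statistics prove no row can match.
        if target < rg.min_value or target > rg.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in rg.values if v == target)
    return matches, groups_read


row_groups = [RowGroup([1, 2, 3]), RowGroup([10, 11, 12]), RowGroup([20, 21, 22])]
matches, groups_read = scan_equal(row_groups, 11)
print(matches, groups_read)
```

Only one of the three groups is read here. When a filter cannot be pushed down (as the linked talk reports for String / Binary columns), every group has to be read and filtered afterwards.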
  
 +Workaround(s):
 +  * Immutability
 +    * Write using partitioning
 +    * Combine with a database (e.g. Cassandra) - after a while, spill the data out to Parquet files
 +    * Write in append mode, which adds an embedded schema
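
A minimal sketch of the partitioning and append ideas above, assuming a Hive-style `key=value` directory layout. It uses JSON-lines files as a stand-in for Parquet files so that it runs with the standard library only; `append_batch` and `read_partition` are hypothetical helpers. In PySpark the comparable call would be along the lines of `df.write.partitionBy("day").mode("append").parquet(path)`.

```python
import json
import tempfile
import uuid
from pathlib import Path


def append_batch(base: Path, records: list, key: str) -> None:
    """Append records as new files, one directory per partition value."""
    by_value = {}
    for rec in records:
        by_value.setdefault(rec[key], []).append(rec)
    for value, recs in by_value.items():
        part_dir = base / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # A fresh file name per batch: existing files are never rewritten.
        out = part_dir / f"part-{uuid.uuid4().hex}.jsonl"
        out.write_text("\n".join(json.dumps(r) for r in recs))


def read_partition(base: Path, key: str, value: str) -> list:
    """Read only the files under one partition directory (partition pruning)."""
    rows = []
    for f in sorted((base / f"{key}={value}").glob("part-*.jsonl")):
        rows.extend(json.loads(line) for line in f.read_text().splitlines())
    return rows


base = Path(tempfile.mkdtemp())
append_batch(base, [{"day": "2018-11-20", "n": 1}, {"day": "2018-11-21", "n": 2}], "day")
append_batch(base, [{"day": "2018-11-21", "n": 3}], "day")

rows = read_partition(base, "day", "2018-11-21")
print(sorted(r["n"] for r in rows))
```

Each append adds files next to the existing ones rather than modifying them, which is how immutable files can still accept new data.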
 +
 +=== vs ORC ===
 +  * indexed
 +  * doesn't handle nested data
 +  * 
  
 ===== ORC =====
kb/bigdata/file_types.1542810191.txt.gz · Last modified: (external edit)