Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
kb:bigdata:file_types [2018/11/21 14:05] – created yehuda | kb:bigdata:file_types [2022/01/03 16:03] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
[[https:// | [[https:// | ||
+ | [[https:// | ||
+ | [[https:// | ||
===== Parquet ===== | ===== Parquet ===== | ||
- | | + | |
+ | |||
+ | | ||
+ | * Columnar format | ||
+ | * Binary format | ||
+ | * Encoded & Compressed | ||
+ | * Supports schema evolution - Format supports | ||
+ | Limitations: | ||
+ | * Pushdown filters don't work on String / Binary ([[https:// | ||
+ | * Write speed tradeoff | ||
+ | |||
+ | Workaround(s): | ||
+ | * Immutability | ||
+ | * Write using partitioning | ||
+ | * Combine with a database (e.g. Cassandra) - after a while, spill the data out to Parquet files | ||
+ | * Write mode append, which adds an embedded schema | ||
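
The "write using partitioning" workaround can be pictured with a small sketch. It is plain Python with no Parquet library involved, and it only models the Hive-style `key=value` directory layout that partitioned writers (for example Spark's `partitionBy`) typically produce. The function names, the `events` root and the sample records are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_path(root, record, keys):
    """Build a Hive-style directory path such as root/year=2018/month=11."""
    path = PurePosixPath(root)
    for key in keys:
        path = path / f"{key}={record[key]}"
    return path


def group_by_partition(root, records, keys):
    """Group records by the directory they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[str(partition_path(root, record, keys))].append(record)
    return dict(groups)


def prune(partitions, key, value):
    """Keep only directories that have a key=value path component."""
    wanted = f"{key}={value}"
    return [p for p in partitions if wanted in PurePosixPath(p).parts]


# Hypothetical sample data: each record lands in exactly one directory.
events = [
    {"year": 2018, "month": 11, "user": "a"},
    {"year": 2018, "month": 12, "user": "b"},
    {"year": 2019, "month": 1, "user": "c"},
]
layout = group_by_partition("events", events, ["year", "month"])

# A reader filtering on year=2018 only needs to open two of the three directories.
selected = prune(sorted(layout), "year", 2018)
```

Because each append only creates new files under the matching directory, existing files stay immutable, which is what makes the slower write path tolerable.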
+ | |||
+ | === vs ORC === | ||
+ | * indexed | ||
+ | * doesn't handle nested data | ||
+ | * | ||
===== ORC ===== | ===== ORC ===== | ||
+ | |||
+ | * Nested Data | ||
+ | * Columnar format | ||
+ | * Predicate pushdown (Min max + bloomfilters) | ||
+ | * ACID support / cannot add | ||
+ | * suggested for streaming ([[https:// | ||
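
The "Min max + bloomfilters" item can be illustrated with a minimal sketch of how such statistics let a reader skip data. This is not ORC's actual implementation or API: the `BloomFilter` and `Stripe` classes, the hash scheme and the sizes are assumptions made for the example, and only the Python standard library is used.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


class Stripe:
    """Hypothetical chunk of one column, with the statistics kept beside it."""

    def __init__(self, values):
        self.values = list(values)
        self.min = min(self.values)
        self.max = max(self.values)
        self.bloom = BloomFilter()
        for v in self.values:
            self.bloom.add(v)


def scan_equals(stripes, target):
    """Return matching values and how many stripes actually had to be read."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        if target < stripe.min or target > stripe.max:
            continue  # min/max statistics rule this stripe out
        if not stripe.bloom.might_contain(target):
            continue  # Bloom filter rules this stripe out
        stripes_read += 1
        matches.extend(v for v in stripe.values if v == target)
    return matches, stripes_read


stripes = [Stripe(range(0, 100)), Stripe(range(100, 200)), Stripe(range(200, 300))]
matches, stripes_read = scan_equals(stripes, 150)
```

Min/max statistics work best when the data is sorted or clustered on the filtered column; the Bloom filter helps with equality lookups when value ranges overlap between stripes.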
+ | |||
+ | |||
+ | |||
===== Avro ===== | ===== Avro ===== | ||