Differences
This shows you the differences between two versions of the page.
kb:bigdata:file_types [2018/11/21 14:23] – yehuda | kb:bigdata:file_types [2022/01/03 16:03] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
[[https:// | [[https:// | ||
+ | [[https:// | ||
+ | [[https:// | ||
===== Parquet ===== | ===== Parquet ===== | ||
+ | |||
* Better column selection | * Better column selection | ||
Line 9: | Line 12: | ||
* Binary format | * Binary format | ||
* Encoded & Compressed | * Encoded & Compressed | ||
- | * | + | * Supports schema evolution - Format supports |
+ | Limitations: |
+ | * Pushdown filters don't work on String / Binary columns ([[https:// |
+ | * Write speed tradeoff | ||
+ | Workaround(s): |
+ | * Immutability | ||
+ | * Write using partitioning | ||
+ | * Combine with a database (e.g. Cassandra) - after a while, spill the data out to Parquet files |
+ | * Write in append mode, which adds the embedded schema |
+ | |||
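The "write using partitioning" workaround can be sketched with the standard library alone. This is a minimal illustration, not the page author's setup: it assumes the Hive-style `column=value` directory layout that engines such as Spark produce for partitioned Parquet output. The helper names (`partition_path`, `prune`), the root path `/data/events` and the sample records are all hypothetical. The point is that a filter on a string column used as a partition key can be answered from directory names, without relying on pushdown filters inside the files.

```python
from pathlib import PurePosixPath


def partition_path(root, record, partition_cols):
    """Build the Hive-style directory for one record: root/col=value/..."""
    parts = [f"{col}={record[col]}" for col in partition_cols]
    return PurePosixPath(root, *parts)


def prune(paths, **wanted):
    """Keep only directories whose key=value segments match every filter."""
    kept = []
    for path in paths:
        # Parse the "col=value" path segments back into a dict.
        segments = dict(p.split("=", 1) for p in path.parts if "=" in p)
        if all(segments.get(k) == str(v) for k, v in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical sample data; "country" is a string column.
records = [
    {"country": "IL", "year": 2018, "value": 1},
    {"country": "US", "year": 2018, "value": 2},
    {"country": "IL", "year": 2019, "value": 3},
]

# Distinct partition directories that a partitioned write would create.
dirs = sorted({partition_path("/data/events", r, ["country", "year"]) for r in records})

# A filter on the string partition column only needs the directory names.
il_dirs = prune(dirs, country="IL")
for d in il_dirs:
    print(d)
```

Only the directories returned by `prune` would have to be opened by a reader, so the choice of partition columns should follow the filters that queries actually use.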
+ | === vs ORC === | ||
+ | * indexed | ||
+ | * doesn't handle nested data |
+ | * | ||
===== ORC ===== | ===== ORC ===== |