kb:bigdata:file_types: current revision [2022/01/03 16:03] (external edit, 127.0.0.1); previous revision [2018/11/21 14:23] by yehuda
[[https://
[[https://
[[https://
===== Parquet =====

  * Efficient column selection (only the requested columns are read)
…
  * Binary format
  * Encoded & compressed
  * Supports schema evolution - Format supports …
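A minimal sketch of the first and last properties above, assuming a toy in-memory model rather than the real Parquet format: each "file" is a dict of column lists, so selecting columns touches only the requested lists, and files written with different schemas can be read against the union of their columns, with missing columns filled with nulls (similar in spirit to Spark's `mergeSchema` Parquet read option). All function names here are hypothetical.

```python
from typing import Any, Dict, List

# Toy model (hypothetical, not the Parquet API): a "file" maps column name -> values.
# Real Parquet files also split columns into row groups and pages.
ColumnarFile = Dict[str, List[Any]]


def to_columnar(rows: List[Dict[str, Any]]) -> ColumnarFile:
    """Pivot row records (all with the same keys) into one list per column."""
    names = list(rows[0].keys())
    return {name: [row[name] for row in rows] for name in names}


def select_columns(data: ColumnarFile, wanted: List[str]) -> ColumnarFile:
    """Column selection: only the requested column lists are touched."""
    return {name: data[name] for name in wanted}


def merge_schemas(files: List[ColumnarFile]) -> List[str]:
    """Union of column names across files, in first-seen order."""
    merged: List[str] = []
    for data in files:
        for name in data:
            if name not in merged:
                merged.append(name)
    return merged


def read_with_schema(data: ColumnarFile, schema: List[str]) -> ColumnarFile:
    """Read a file against a merged schema; columns it lacks come back as None."""
    num_rows = len(next(iter(data.values()))) if data else 0
    return {name: data.get(name, [None] * num_rows) for name in schema}


old = to_columnar([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
new = to_columnar([{"id": 3, "name": "c", "score": 9.5}])
print(select_columns(old, ["id"]))                    # {'id': [1, 2]}
print(read_with_schema(old, merge_schemas([old, new])))
# {'id': [1, 2], 'name': ['a', 'b'], 'score': [None, None]}
```

The older file never has to be rewritten when a column is added; the reader reconciles schemas at read time.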
Limitations:
  * Pushdown filters don't work on String / Binary columns ([[https://
  * Write speed trade-off (writing is slower than with row-oriented formats)
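Pushdown filters rely on per-row-group statistics: a reader can skip a row group whose min/max range cannot satisfy the predicate. The toy sketch below (hypothetical names, one integer column, in memory only) models that idea; `use_stats=False` stands in for the case noted above, where pushdown is not applied to a column and every group has to be read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy row group: one column's values plus min/max stats kept at write time."""
    values: List[int]
    min_value: int
    max_value: int


def write_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into row groups and record min/max statistics for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups: List[RowGroup], threshold: int,
                      use_stats: bool = True) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`; return matches and the number of groups read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Pushdown: if even the max is <= threshold, no row in this group can match.
        if use_stats and group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


groups = write_row_groups(list(range(10)), group_size=3)
print(scan_greater_than(groups, 5))                   # ([6, 7, 8, 9], 2)
print(scan_greater_than(groups, 5, use_stats=False))  # ([6, 7, 8, 9], 4)
```

The result is the same either way; only the amount of data read changes, which is why losing pushdown on a column hurts performance rather than correctness.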
Workaround(s):
  * Immutability
  * Write using partitioning
  * Combine with a database (e.g. Cassandra): write there first, then after a while spill the data out to Parquet files
  * Write in append mode (each appended file carries its own embedded schema)
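The partitioning and append-mode workarounds can be sketched with a toy layout, assuming Hive-style `key=value` directory names (the layout Spark's `partitionBy` writes). Each append adds new files instead of rewriting existing ones, and a filter on the partition column is answered by picking a directory. The function names are hypothetical and nothing is written to disk.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
# Hypothetical layout model: directory path -> list of files -> rows in each file.
Layout = Dict[str, List[List[Row]]]


def partition_dir(base: str, column: str, value: Any) -> str:
    """Hive-style directory name, e.g. table/date=2021-01-01."""
    return f"{base}/{column}={value}"


def append_partitioned(layout: Layout, rows: List[Row], column: str,
                       base: str = "table") -> None:
    """Append mode: each call adds one new file per touched partition."""
    batch: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # The partition value is encoded in the path, so it is dropped from the row.
        data = {k: v for k, v in row.items() if k != column}
        batch[partition_dir(base, column, row[column])].append(data)
    for directory, file_rows in batch.items():
        # Existing files are never rewritten (they are treated as immutable).
        layout.setdefault(directory, []).append(file_rows)


def read_partition(layout: Layout, column: str, value: Any,
                   base: str = "table") -> List[Row]:
    """Partition pruning: a filter on the partition column selects one directory."""
    files = layout.get(partition_dir(base, column, value), [])
    return [row for file_rows in files for row in file_rows]


layout: Layout = {}
append_partitioned(layout, [{"date": "2021-01-01", "id": 1},
                            {"date": "2021-01-02", "id": 2}], "date")
append_partitioned(layout, [{"date": "2021-01-01", "id": 3}], "date")
print(sorted(layout))  # ['table/date=2021-01-01', 'table/date=2021-01-02']
print(read_partition(layout, "date", "2021-01-01"))  # [{'id': 1}, {'id': 3}]
```

Keeping each write small and append-only sidesteps rewriting large immutable files, at the cost of accumulating many small files per partition.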

=== vs ORC ===
  * Indexed
  * Doesn't handle nested data

===== ORC =====