Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| kb:bigdata:file_types [2018/11/21 14:05] – created yehuda | kb:bigdata:file_types [2022/01/03 16:03] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| [[https:// | [[https:// | ||
| + | [[https:// | ||
| + | [[https:// | ||
| ===== Parquet ===== | ===== Parquet ===== | ||
| - | | + | |
| + | |||
| + | | ||
| + | * Columnar format | ||
| + | * Binary format | ||
| + | * Encoded & Compressed | ||
| + | * Supports schema evolution - Format supports | ||
| + | Limitations: | ||
| + | * Pushdown filters don't work on String / Binary columns ([[https:// | ||
| + | * Write speed tradeoff: slower writes in exchange for faster reads | ||
| + | |||
| + | Workaround(s): | ||
| + | * Immutability | ||
| + | * Write using partitioning | ||
| + | * Combine with a database (e.g. Cassandra) and, after a while, spill the data out to Parquet files | ||
| + | * Write in append mode, which adds an embedded schema | ||
| + | |||
| + | === vs ORC === | ||
| + | * indexed | ||
| + | * doesn't handle nested data | ||
| + | * | ||
| ===== ORC ===== | ===== ORC ===== | ||
| + | |||
| + | * Nested Data | ||
| + | * Columnar format | ||
| + | * Predicate pushdown (min/max statistics + Bloom filters) | ||
| + | * ACID support / cannot add | ||
| + | * Suggested for streaming ([[https:// | ||
| + | |||
| + | |||
| + | |||
| ===== Avro ===== | ===== Avro ===== | ||