File types

Parquet

  • Efficient column selection (a query reads only the columns it needs)
  • Columnar format
  • Binary format
  • Encoded & Compressed
  • Supports schema evolution
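
The column selection and encoding points above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Parquet's actual on-disk layout; the helper names (`to_columnar`, `select`, `dictionary_encode`) and the sample records are made up for illustration.

```python
from __future__ import annotations

from typing import Any

# Toy row-oriented data: every record keeps all of its fields together.
rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": "FR", "amount": 12.5},
    {"id": 3, "country": "DE", "amount": 7.25},
]


def to_columnar(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row records into one list per column (hypothetical helper)."""
    columns: dict[str, list[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def select(columns: dict[str, list[Any]], names: list[str]) -> dict[str, list[Any]]:
    """Return only the requested columns; the others are never touched."""
    return {name: columns[name] for name in names}


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Replace repeated values with small integer codes (toy dictionary encoding)."""
    dictionary: list[Any] = []
    codes: dict[Any, int] = {}
    indices: list[int] = []
    for value in values:
        if value not in codes:
            codes[value] = len(dictionary)
            dictionary.append(value)
        indices.append(codes[value])
    return dictionary, indices


columns = to_columnar(rows)
amounts_only = select(columns, ["amount"])
country_dict, country_codes = dictionary_encode(columns["country"])
```

Because each column is stored contiguously, a query such as "sum of amount" only has to read the `amount` list, and a low-cardinality column like `country` shrinks well once repeated values become small integer codes.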

Limitations:

  • Pushdown filters don't work on String / Binary columns (source)
  • Write speed tradeoff (slower writes in exchange for faster reads)

Workaround(s):

  • Immutability
    • Write using partitioning
    • Combine with a database (e.g. Cassandra), then after a while spill the data out to Parquet files
    • Write in append mode, which adds new files with their own embedded schema
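
The partitioning workaround can be sketched as follows. This is a minimal illustration of the `key=value` directory convention used for partitioned datasets, assuming new data is appended as new files under the matching partition directory instead of rewriting existing files. The function names and sample events are hypothetical, and no Parquet files are actually written.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_dir(base, record, partition_keys):
    """Build a key=value directory path for one record (hypothetical helper)."""
    path = PurePosixPath(base)
    for key in partition_keys:
        path = path / f"{key}={record[key]}"
    return path


def group_by_partition(base, records, partition_keys):
    """Group records by the partition directory they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[str(partition_dir(base, record, partition_keys))].append(record)
    return dict(groups)


events = [
    {"year": 2021, "month": 12, "user": "a"},
    {"year": 2022, "month": 1, "user": "b"},
    {"year": 2022, "month": 1, "user": "c"},
]

# Each group would become one new immutable file inside its directory.
groups = group_by_partition("events", events, ["year", "month"])
```

A later batch for January 2022 would only add another file under `events/year=2022/month=1`, leaving every other partition untouched.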

vs ORC

  • indexed
  • doesn't handle nested data

ORC

  • Nested Data
  • Columnar format
  • Predicate pushdown (min/max statistics + Bloom filters)
  • ACID support / cannot add
  • Suggested for streaming (source)
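
How min/max statistics and Bloom filters let a reader skip data can be shown with a toy model. This is a sketch in plain Python, not ORC's real index format; the `BloomFilter` class, the `build_stripe` and `find_equal` helpers, and the filter sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python integer

    def _positions(self, value):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_stripe(values):
    """Store a chunk of values together with its min/max and Bloom filter."""
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    return {"min": min(values), "max": max(values), "bloom": bloom,
            "values": list(values)}


def find_equal(stripes, target):
    """Return matching values and how many stripes had to be scanned."""
    hits, scanned = [], 0
    for stripe in stripes:
        if target < stripe["min"] or target > stripe["max"]:
            continue  # pruned by min/max statistics
        if not stripe["bloom"].might_contain(target):
            continue  # pruned by the Bloom filter
        scanned += 1
        hits.extend(v for v in stripe["values"] if v == target)
    return hits, scanned


stripes = [build_stripe([1, 5, 9]), build_stripe([20, 25, 30]),
           build_stripe([40, 41, 50])]
hits, scanned = find_equal(stripes, 25)
```

The min/max check handles range pruning cheaply, while the Bloom filter helps with equality lookups for values that fall inside a chunk's range but are not actually present.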

Avro

kb/bigdata/file_types.txt · Last modified: 2022/01/03 16:03 by 127.0.0.1