Migrating Apache Flume Flows to Apache NiFi: Kafka Source to Apache Parquet on HDFS
Article 3 - This article
Article 2 - https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html
Article 1 - https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html
Source Code: https://github.com/tspannhw/flume-to-nifi
This is one possible simple, fast replacement for "Flafka". I can read any or all Kafka topics, route and transform the records with SQL, and store them in Apache ORC, Apache Avro, Apache Parquet, Apache Kudu, Apache HBase, JSON, CSV, XML, or compressed files of many types in S3, Apache HDFS, file systems, or anywhere else you want to stream this data in real time, all through a fast, easy-to-use web UI. Everything you liked doing in Flume is now easier, with more source and sink options.
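To make the shape of the flow concrete, here is a minimal standalone sketch of the same idea in Python: consume JSON records from a Kafka topic and write them out as a Parquet file. This is not NiFi code; kafka-python and pyarrow stand in for ConsumeKafka and the Parquet writer, and the topic name, broker address, and batch size are assumptions for illustration.

```python
# Minimal sketch: Kafka topic -> Parquet file (assumed names throughout).
import json

from kafka import KafkaConsumer   # pip install kafka-python
import pyarrow as pa              # pip install pyarrow
import pyarrow.parquet as pq

consumer = KafkaConsumer(
    "sensor-topic",                           # hypothetical topic
    bootstrap_servers="kafkabroker:9092",     # hypothetical broker
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Gather one small batch of JSON records, then flush it as Parquet.
records = []
for message in consumer:
    records.append(message.value)
    if len(records) >= 1000:
        break

table = pa.Table.from_pylist(records)
pq.write_table(table, "sensors.parquet", compression="snappy")
```

In the NiFi flow, the batching, schema handling, offsets, and retries are handled by the framework instead of hand-written loops like this.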
[Consume Kafka And Store to Apache Parquet](https://1.bp.blogspot.com/-rrzR-xonOAg/XZvQp8wsUDI/AAAAAAAAYew/7LUVLuqb5hY1pkAPV9l8pU_vPOvHf640gCLcBGAsYHQ/s1600/createParquetTable.png)
Kafka to Kudu, ORC, AVRO and Parquet
With Apache NiFi 1.10, I can send those Parquet files anywhere, not only HDFS.
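As a hedged sketch of that flexibility, the same Parquet output can land in S3 instead of HDFS; the bucket name, region, and records below are placeholders.

```python
# Sketch: write a Parquet file to S3 instead of HDFS (placeholder names).
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

table = pa.Table.from_pylist([
    {"sensor_id": "rpi4-1", "temperature": 22.4},  # hypothetical records
    {"sensor_id": "rpi4-2", "temperature": 23.1},
])

s3 = fs.S3FileSystem(region="us-east-1")  # assumes AWS credentials in the environment
pq.write_table(table, "my-bucket/sensors/sensors.parquet", filesystem=s3)
```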
JSON (or CSV or AVRO or ...) and Parquet Out
In Apache NiFi 1.10, Parquet has a dedicated record reader and record set writer, so any record-oriented processor can emit Parquet.
Or I can use the PutParquet processor.
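As a rough analogue of that record conversion (a JSON reader feeding a Parquet writer), here is a small Python sketch; the file names and fields are placeholders for illustration.

```python
# Sketch: newline-delimited JSON in -> Snappy-compressed Parquet out.
import json

import pyarrow as pa
import pyarrow.parquet as pq

with open("events.json") as f:            # hypothetical NDJSON input
    rows = [json.loads(line) for line in f]

pq.write_table(pa.Table.from_pylist(rows), "events.parquet",
               compression="snappy")
```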
Create A Parquet Table and Query It
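As a hedged sketch of this step, the external table can also be created and queried from Python with PyHive; the host, database, column names, and HDFS location are assumptions for illustration, so match them to the directory your flow actually writes to.

```python
# Sketch: create an external Hive table over the Parquet directory, then query it.
# Host, database, schema, and location are all assumed placeholder values.
from pyhive import hive  # pip install 'pyhive[hive]'

conn = hive.connect(host="hiveserver", port=10000, database="default")
cursor = conn.cursor()

cursor.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sensors (
        sensor_id STRING,
        temperature DOUBLE,
        event_ts BIGINT
    )
    STORED AS PARQUET
    LOCATION '/data/sensors'
""")

cursor.execute("SELECT sensor_id, AVG(temperature) FROM sensors GROUP BY sensor_id")
for row in cursor.fetchall():
    print(row)
```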