FLaNK-Store
Real-Time Retail Grocery Store with Apache MiNiFi, Apache NiFi, Apache Kafka, Apache Flink, Apache Kudu, Apache Ozone, Apache Iceberg, HTML, JQuery, DataTables.
Source: https://github.com/tspannhw/FLaNK-Store
In today's example, I need to ingest grocery items for some analytics, so let's read these via a secured REST API. To follow along, you will need to sign up for your own free key to see this interesting data. Who doesn't want to ingest bananas with NiFi? Maybe I just really need to know about eggs and sugar. A friend of mine, Brent, wanted me to build this during lockdown to determine the current price of a basket of household goods. I built it, and it sat on the shelf because most of the APIs I wanted were not available. Thanks to the innovative retail work at Kroger, we can get the data at speed with their API. Let's explore. This is the first in the series; I will cover storage to Ozone (on S3), Iceberg, and Kudu. I will also cover in-store data collection, updating shelf pricing in real time (Raspberry Pi with E-Ink), and more retail use cases.
We are ingesting data from Kroger
You will need to sign up to get your credentials to access this cool data.
- https://developer.kroger.com/reference/#operation/productGet
- https://developer.kroger.com/documentation
- https://developer.kroger.com/documentation/public/getting-started/quick-start
- https://developer.kroger.com/documentation/api-products/public/products/overview
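As a rough sketch of what the secured REST calls look like outside of NiFi, the Python below builds (but does not send) the two requests involved: an OAuth2 client-credentials token request and a product search. The endpoint URLs, the `product.compact` scope, and the `filter.term` / `filter.limit` parameters are my reading of the Kroger documentation linked above, so treat them as assumptions and verify them there. The function names are hypothetical helpers, not part of any Kroger SDK.

```python
import base64
import urllib.parse
import urllib.request

# Assumed endpoints, based on the public Kroger developer docs linked above.
# Verify them against the current documentation before use.
TOKEN_URL = "https://api.kroger.com/v1/connect/oauth2/token"
PRODUCTS_URL = "https://api.kroger.com/v1/products"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (do not send) an OAuth2 client-credentials token request."""
    # The client id and secret travel as an HTTP Basic credential.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "product.compact"}
    ).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_product_request(token: str, term: str, limit: int = 5) -> urllib.request.Request:
    """Build (do not send) a product search for one basket item."""
    query = urllib.parse.urlencode({"filter.term": term, "filter.limit": limit})
    return urllib.request.Request(
        f"{PRODUCTS_URL}?{query}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
```

To actually call the API, pass each request to `urllib.request.urlopen` with your own credentials. In the NiFi flow the same two steps are handled by processors rather than hand-written code.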
We picked a basket of items to watch from the stores.
Some of my basket items: Organic Banana, Velveeta Original Cheese, Imperial Cane Sugar, Oreo Team USA Chocolate Sandwich Cookies, Kroger® Fat Free Skim Milk, Land O Lakes® Salted Butter Sticks, Kroger® 1 lb. Lean Ground Beef Chuck Roll 80/20, Stouffer's® Macaroni & Cheese Frozen Meal, and Eggo Homestyle Frozen Waffles.
Apache NiFi Flow Walk Through
We have a few interesting flows for working with Retail data. The first one is ingesting product information from Kroger.
Kafka Topics Shown in Streams Messaging Manager
Apache Kafka Item Image
Querying Kafka Topic for Items
NiFi Calcite SQL to Transform and Enrich Item Price Stream
SELECT brandname,category,countryorigin,'${date}' as itemdate,displayimage,
images,item,itemId,itemdescription,itemheatsen,itemsize,longdescrption,
msrp,originstore,cast(COALESCE(price, 0.00) as float) as price,
productid,tmpind,tpr,ts,upc,uuid,updatedate
FROM FLOWFILE
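For readers who do not use NiFi, here is a minimal Python sketch of what the query above does to a single record: it stamps an `itemdate` value (in the flow this comes from the `${date}` expression, presumably a flow file attribute) and replaces a missing price with 0.0 before casting it to a float. The function name `enrich_item` is hypothetical, and unlike the SQL, which selects an explicit column list, this sketch simply keeps every field it is given.

```python
def enrich_item(record: dict, item_date: str) -> dict:
    """Mirror the Calcite query for one record (a sketch, not the NiFi flow)."""
    enriched = dict(record)  # copy so the input record is left untouched
    enriched["itemdate"] = item_date  # same role as '${date}' as itemdate
    price = record.get("price")
    # COALESCE(price, 0.00) followed by a cast to float
    enriched["price"] = float(price) if price is not None else 0.0
    return enriched
```

Calling `enrich_item({"item": "Organic Banana", "price": None}, "2023-05-01")` returns a copy with `price` set to `0.0` and the new `itemdate` field, which keeps downstream consumers from having to handle null prices.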
Flink SQL to Browse Data
An example of querying Apache Kafka topics with Apache Flink SQL via a Schema Registry catalog.
select brandname, item, itemdescription, itemsize,
price, category,
updatedate, longdescrption, displayimage
from `sr1`.`default_database`.`item`
I wish to make this data available for Jupyter notebooks and also HTML pages. I will add some notebooks and Cloudera Data Visualization.
The easiest way to do this is to create a materialized view in SQL Stream Builder to make my query results available as JSON over REST.
Once I see the results of my query, I am good to go. Let's build an HTML view of this data.
Let's make sure our materialized view is loaded in the raw REST feed.
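Since the goal is also to use this data from Jupyter notebooks, here is a small Python sketch of consuming the feed. It assumes the materialized view returns a JSON array of row objects with the column names from the Flink SQL query (`item`, `price`, `updatedate`), and that `updatedate` sorts correctly as a string; both are assumptions, so check them against your own feed. The function names and the sample rows are made up for illustration and are not real Kroger prices.

```python
import json


def latest_prices(feed_json: str) -> dict:
    """Map each item name to the price on its most recent row (by updatedate)."""
    latest = {}
    for row in json.loads(feed_json):
        seen = latest.get(row["item"])
        # Keep the row with the greatest updatedate per item.
        if seen is None or row["updatedate"] >= seen["updatedate"]:
            latest[row["item"]] = row
    return {item: float(row["price"]) for item, row in latest.items()}


def basket_total(prices: dict) -> float:
    """Sum the latest price of every item in the basket, rounded to cents."""
    return round(sum(prices.values()), 2)


# Made-up sample rows standing in for the REST response body.
sample_feed = json.dumps([
    {"item": "Organic Banana", "price": 0.25, "updatedate": "2023-05-01"},
    {"item": "Organic Banana", "price": 0.29, "updatedate": "2023-05-02"},
    {"item": "Imperial Cane Sugar", "price": 3.49, "updatedate": "2023-05-02"},
])
print(basket_total(latest_prices(sample_feed)))
```

In a notebook you would replace `sample_feed` with the body fetched from your own materialized view URL. This gets back to the original goal of tracking the current price of a basket of household goods.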
Let's use DataTables and jQuery to build a dynamic HTML table view.
All of the code is available on GitHub for you to use with your own data or your own basket to load from Kroger.
The final results here:
ChatGPT gave me some good ideas.
Using the Kroger REST API for groceries with Apache NiFi is a great idea for several reasons. Let's explore the benefits of combining these two technologies:
Data Integration: Apache NiFi is a powerful data integration tool that enables the seamless flow and transformation of data between various systems. By integrating the Kroger REST API with NiFi, you can easily fetch, process, and distribute grocery-related data from Kroger's services to your desired destinations.
Real-Time Data: The Kroger REST API provides real-time access to a wide range of grocery-related information, including product details, prices, availability, and more. By leveraging NiFi's capabilities, you can constantly monitor and retrieve the latest data from Kroger's API, ensuring that you have up-to-date information at all times.
Scalability: NiFi is designed to handle high volumes of data and can scale horizontally to accommodate increased workloads. This scalability makes it well-suited for processing large quantities of grocery data fetched from the Kroger API. You can configure NiFi to handle parallel processing, data partitioning, and load balancing, ensuring efficient data flow even during peak times.
Data Transformation and Enrichment: NiFi provides a wide range of processors and functions that facilitate data transformation and enrichment. You can use NiFi's processors to extract specific data from the Kroger API responses, apply transformations, perform calculations, and enrich the data with additional information. This capability allows you to tailor the data from the Kroger API to suit your specific requirements.
Data Quality and Reliability: NiFi offers extensive data quality monitoring and control capabilities. You can implement data validation rules, perform data cleansing operations, and apply data governance practices to ensure the accuracy and reliability of the grocery data received from the Kroger API. NiFi also provides features like provenance tracking, error handling, and data lineage, which help you maintain data integrity throughout the data flow.
Integration with Other Systems: NiFi supports integration with a wide range of systems and platforms, including databases, data lakes, messaging systems, and cloud services. By combining the Kroger REST API with NiFi, you can seamlessly integrate the fetched grocery data with your existing data infrastructure, enabling further analysis, reporting, and integration with downstream systems.
Workflow Orchestration: NiFi allows you to design complex data workflows through its intuitive graphical user interface. You can create workflows that fetch data from the Kroger API, apply various transformations, perform validations, and route the data to different destinations based on predefined rules. This workflow orchestration capability simplifies the data integration process and provides better control over data flow.
By utilizing the Kroger REST API with Apache NiFi, you can leverage the strengths of both technologies to build a robust and scalable grocery data integration pipeline. This combination empowers you to access real-time grocery data, apply transformations, ensure data quality, and integrate with other systems effectively.
References
- https://developer.kroger.com/documentation/api-products/public/products/tutorial
- https://github.com/tspannhw/retail-dynamic-shelf-pricing
- https://github.com/tspannhw/FLaNK-AllTheStreams/tree/main/schemas
- https://developer.kroger.com/documentation/partner/getting-started/apis
- https://documenter.getpostman.com/view/4833726/TVReeWJm
- https://github.com/tspannhw/FLaNK-AllTheStreams
- https://medium.com/@tspann/building-a-real-time-data-pipeline-a-comprehensive-tutorial-on-minifi-nifi-kafka-and-flink-ee03ee6722cb