Ingesting All The Weather Data With Apache NiFi
Step By Step NiFi Flow
- GenerateFlowFile - trigger the flow on a schedule that matches when NOAA updates its weather data
- InvokeHTTP - download the ZIP of all current weather observations
- CompressContent - decompress the ZIP
- UnpackContent - extract the files from the ZIP
- RouteOnAttribute (optional) - keep only the files for airports (${filename:startsWith('K')}).
- QueryRecord (optional) - XMLReader to JsonRecordSetWriter. Query: SELECT * FROM FLOWFILE WHERE NOT location LIKE '%Unknown%'. This removes locations that are not identified.
- Send the records somewhere for storage. This could be PutKudu, PutORC, PutHDFS, PutHiveStreaming, PutHBaseRecord, PutDatabaseRecord, PublishKafkaRecord_2_* or others.
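If you want to sanity-check the logic outside of NiFi, the unpack, route, and query steps above can be roughly sketched in plain Python. This is only an illustration, not what NiFi runs: the function names are made up for this example, the download step is left out, every value stays a string (no type inference like the record reader does), and dropping records that have no location at all is an assumption about how the SQL filter treats missing values.

```python
import io
import json
import zipfile
import xml.etree.ElementTree as ET


def unpack_zip(zip_bytes):
    """Yield (filename, content) for each file in the archive, like UnpackContent."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            yield name.rsplit("/", 1)[-1], archive.read(name)


def is_airport(filename):
    """Same test as the RouteOnAttribute rule ${filename:startsWith('K')}."""
    return filename.startswith("K")


def xml_to_record(xml_bytes):
    """Turn one observation XML document into a dict, one key per child element."""
    root = ET.fromstring(xml_bytes)
    record = {}
    for child in root:
        if len(child):  # nested element such as <image>
            record[child.tag] = {g.tag: (g.text or "").strip() for g in child}
        else:
            record[child.tag] = (child.text or "").strip()
    return record


def has_known_location(record):
    """Approximates WHERE NOT location LIKE '%Unknown%'.

    Assumption: records without a location are dropped as well.
    """
    location = record.get("location")
    return location is not None and "Unknown" not in location


def process_archive(zip_bytes):
    """Run unpack -> airport filter -> XML to dict -> location filter."""
    records = []
    for name, content in unpack_zip(zip_bytes):
        if not is_airport(name):
            continue
        record = xml_to_record(content)
        if has_known_location(record):
            records.append(record)
    return records


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2)
```

With a downloaded copy of the ZIP on disk, something like `to_json(process_archive(open("all_xml.zip", "rb").read()))` should give you an array of records you can compare against what the flow produces.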
URL For All US Data
invokehttp.request.url
https://w1.weather.gov/xml/current_obs/all_xml.zip
Example Record As Converted JSON
[ {
"credit" : "NOAA's National Weather Service",
"credit_URL" : "http://weather.gov/",
"image" : {
"url" : "http://weather.gov/images/xml_logo.gif",
"title" : "NOAA's National Weather Service",
"link" : "http://weather.gov"
},
"suggested_pickup" : "15 minutes after the hour",
"suggested_pickup_period" : 60,
"location" : "Stanley Municipal Airport, ND",
"station_id" : "K08D",
"latitude" : 48.3008,
"longitude" : -102.4064,
"observation_time" : "Last Updated on Jul 10 2020, 9:55 am CDT",
"observation_time_rfc822" : "Fri, 10 Jul 2020 09:55:00 -0500",
"weather" : "Fair",
"temperature_string" : "66.0 F (19.0 C)",
"temp_f" : 66.0,
"temp_c" : 19.0,
"relative_humidity" : 83,
"wind_string" : "South at 6.9 MPH (6 KT)",
"wind_dir" : "South",
"wind_degrees" : 180,
"wind_mph" : 6.9,
"wind_kt" : 6,
"pressure_in" : 30.03,
"dewpoint_string" : "60.8 F (16.0 C)",
"dewpoint_f" : 60.8,
"dewpoint_c" : 16.0,
"visibility_mi" : 10.0,
"icon_url_base" : "http://forecast.weather.gov/images/wtf/small/",
"two_day_history_url" : "http://www.weather.gov/data/obhistory/K08D.html",
"icon_url_name" : "skc.png",
"ob_url" : "http://www.weather.gov/data/METAR/K08D.1.txt",
"disclaimer_url" : "http://weather.gov/disclaimer.html",
"copyright_url" : "http://weather.gov/disclaimer.html",
"privacy_policy_url" : "http://weather.gov/notice.html"
} ]
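Two fields in this record are handy for scheduling: observation_time_rfc822 is machine-readable, and suggested_pickup tells you when to come back. Here is a small Python sketch of how a consumer might use them. The helper names are hypothetical, the record is trimmed to a few fields, and the 15-minute offset is hard-coded from the suggested_pickup text rather than parsed out of it.

```python
import json
from datetime import timedelta
from email.utils import parsedate_to_datetime

# A trimmed copy of the example record, keeping only the fields used here.
record_json = """
[ {
  "station_id" : "K08D",
  "suggested_pickup" : "15 minutes after the hour",
  "suggested_pickup_period" : 60,
  "observation_time_rfc822" : "Fri, 10 Jul 2020 09:55:00 -0500",
  "temp_f" : 66.0,
  "temp_c" : 19.0
} ]
"""


def observed_at(record):
    """Parse the RFC 822 timestamp into a timezone-aware datetime."""
    return parsedate_to_datetime(record["observation_time_rfc822"])


def next_pickup(observed, minute=15):
    """Return the first time strictly after `observed` that is `minute` past an hour.

    Assumption: minute=15 mirrors "15 minutes after the hour".
    """
    candidate = observed.replace(minute=minute, second=0, microsecond=0)
    if candidate <= observed:
        candidate += timedelta(hours=1)
    return candidate


record = json.loads(record_json)[0]
observed = observed_at(record)
print(observed.isoformat())               # 2020-07-10T09:55:00-05:00
print(next_pickup(observed).isoformat())  # 2020-07-10T10:15:00-05:00
```

The same hourly rhythm is what the GenerateFlowFile schedule is meant to match, so a cron-style schedule that fires at 15 minutes past each hour would be one reasonable choice.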
Source Code
https://github.com/tspannhw/ClouderaFlowManagementWorkshop/tree/main/flows
Resources
- https://www.datainmotion.dev/2020/05/cloudera-flow-management-101-lets-build.html
- https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html
- https://www.datainmotion.dev/2020/01/analyzing-wood-burning-stoves-with_23.html
- https://www.datainmotion.dev/2020/01/cloudera-edge2ai-minifi-java-agent-with.html
- https://community.cloudera.com/t5/Community-Articles/Tracking-Air-Quality-with-HDP-and-HDF-Part-1-Apache-NiFi/ta-p/248265
- https://community.cloudera.com/t5/Community-Articles/Part-2-IoT-Augmenting-GPS-Data-with-Weather/ta-p/245685