01-April-2024
FLaNK / KNIFe AI Weekly
Tim Spann @PaaSDev
https://www.youtube.com/@FLaNK-Stack
https://www.threads.net/@tspannhw
https://medium.com/@tspann/subscribe
https://www.cloudera.com/campaign/apache-nifi-for-dummies.html
https://ossinsight.io/analyze/tspannhw
COOL CHARITY by KIDS!
CODE + COMMUNITY
Please join my meetup groups: NJ, NYC, Philly, or virtual.
http://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
https://www.meetup.com/futureofdata-philadelphia/
*This is Issue #131*
https://github.com/tspannhw/FLiPStackWeekly
https://www.cloudera.com/solutions/dim-developer.html
New Releases
Apache Hive 4.0.0
https://hub.docker.com/r/apache/hive
Articles
Meetup Report
https://medium.com/@tspann/march-2024-meetup-report-61e82b00cf57
Real-Time Irish Transit Analytics
https://medium.com/@tspann/real-time-irish-transit-analytics-ea76164c9595
Adding Generative AI Results to SQL Streams
https://medium.com/@tspann/adding-generative-ai-results-to-sql-streams-513e1fd2a6af
Image Processing with Custom Python and Apache NiFi 2.0
https://medium.com/@tspann/image-processing-with-custom-python-and-nifi-2-0-06eadc62c03c
Cloudera + GenAI + NVIDIA NIM Microservices
https://menews247.com/cloudera-to-enhance-genai-with-nvidia-nim-microservices/
https://blog.cloudera.com/data-architecture-and-strategy-in-the-ai-era/
https://blog.cloudera.com/clouderas-rhel-volution-powering-the-cloud-with-red-hat/
https://drive.google.com/file/d/11lCJAB272ruBa7AAVwYxaN2E2xooWizG/view
https://pypi.org/project/streaming-jupyter-integrations/
https://thenewstack.io/how-nvidia-gpu-acceleration-supercharged-milvus-vector-database/
NiFi 2.0 Python
https://medium.com/@sudeep.singh99/a-beginners-guide-to-nifi-2-0-custom-python-processor-ac6d8c7bda7b
Make sure you are on the right macOS version for new Java
https://blogs.oracle.com/java/post/java-on-macos-14-4
https://www.datanami.com/2024/03/22/zilliz-unveils-game-changing-features-for-vector-search
https://towardsdatascience.com/automated-detection-of-data-quality-issues-54a3cb283a91
https://mlops.community/7-methods-to-secure-llm-apps-from-prompt-injections-and-jailbreaks/
https://www.startdataengineering.com/post/change-data-capture-using-debezium-kafka-and-pg/
https://nvidianews.nvidia.com/news/nvidia-blackwell-platform-arrives-to-power-a-new-era-of-computing
https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b
https://www.uber.com/en-GB/blog/balancing-hdfs-datanodes-in-the-uber-datalake/
https://techcrunch.com/2024/03/31/why-aws-google-and-oracle-are-backing-the-valkey-redis-fork/
Videos
Meetup Talk NYC
https://youtu.be/u8XNNEPEnKQ?si=VWe6n8OKOF7qk6Fl
Irish Rail Preview
https://youtu.be/EIpH7RPO2Yo
TCF Pro 2024
https://www.youtube.com/watch?v=tLbdrOxg5Rs
Streaming Traffic Cameras
https://www.youtube.com/watch?v=85ECRGJBEQU&ab_channel=DatainMotion-HowToBeaStreamingEngineer
NiFi 101
https://www.youtube.com/watch?v=8cZJ9CyLYyI&t=3114s
March 11, 2024 Princeton 23 Orchard Event
https://www.slideshare.net/slideshows/2024-build-generative-ai-for-nonprofits/266748822
March 15, 2024 Trenton TCF
https://www.slideshare.net/slideshows/tcfpro24-building-realtime-generative-ai-pipelines/266807785
Events
April 2, 2024: XtremeJ 2024. Virtual.
https://xtremej.dev/2023/schedule/
April 8-11, 2024: NLIT Summit. Seattle.
https://www.fbcinc.com/e/nlit/default.aspx
April 11, 2024: Conf42 LLM. Virtual.
https://www.conf42.com/llms2024
April 12, 2024: AI Max Conference. 23 Orchard, Princeton.
https://www.startupgrind.com/events/details/startup-grind-princeton-presents-startup-grind-hosts-ai-max-summit/
April 2024: AI Meetup NJ
https://www.meetup.com/nj-gai/
EMEA | APAC: April 24, 2024 9:30 AM CEST | 1:00 PM IST
AMER EVENT: Apr 25, 2024 9:00 AM PDT | 12:00 PM EDT
Register Now: http://spr.ly/6047Z3AjN
May 8-9, 2024: Data Summit 2024. Boston, MA.
https://www.dbta.com/DataSummit/2024/default.aspx
https://www.dbta.com/DataSummit/2024/Timothy-Spann.aspx
May 21, 2024: Gen AI and Beyond with NiFi 2.0. Virtual.
June 12, 2024: Budapest Data + ML Forum. Virtual.
https://budapestdata.hu/2024/en/
Cloudera Events
https://www.cloudera.com/about/events.html
More Events:
https://www.linkedin.com/pulse/schedule-2024-tim-spann--y4coe
Code
- https://github.com/tspannhw/FLaNK-python-processors
- https://github.com/kevinbtalbert/CML_AMP-to-Airgapped
- https://github.com/cloudera/CML_AMP_Deploy-Mistral7B-CML-Native-Model
- https://github.com/kadjoudi/Fraud-Prevention-With-Cloudera-SSB
Models
- https://github.com/lichao-sun/mora
- https://github.com/NousResearch/Hermes-Function-Calling
- https://huggingface.co/alpindale/Mistral-7B-v0.2-hf/tree/main
Tools
- https://developers.redhat.com/articles/2024/03/13/kafka-tiered-storage-deep-dive
- https://huggingface.co/adept/fuyu-8b
- https://llava.hliu.cc/
- https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_self_rag.ipynb
- https://github.com/mapr-demos/gess
- https://github.com/TracecatHQ/tracecat
- https://github.com/datadreamer-dev/DataDreamer
- https://huggingface.co/databricks/dbrx-instruct
- https://github.com/openai/tiktoken
- https://github.com/ora0600/genai-with-confluent
- https://github.com/speedb-io/speedb
- https://github.com/crate/crate
- https://gpt4all.io/index.html
- https://pypi.org/project/imposm.parser/
- https://github.com/easyjailbreak/easyjailbreak
- https://github.com/databricks/megablocks
- https://github.com/milvus-io/bootcamp/blob/master/bootcamp/RAG/zilliz_pipeline_rag.ipynb
- https://github.com/DoMusic/Hybrid-Net
- https://www.reddit.com/r/LocalLLaMA/comments/1bmvtyb/new_user_beginning_guide_from_total_noob_to/
- https://github.com/run-llama/llama_index/tree/main/llama-index-packs/llama-index-packs-raft-dataset
- https://github.com/FoundationVision/GLEE
- https://github.com/zenml-io/zenml
- https://github.com/lapce/lapdev
- https://github.com/enisdenjo/graphql-ws
- https://github.com/jasonppy/VoiceCraft
- https://github.com/h2oai/enterprise-h2ogpte
- https://github.com/sbcshop/Pitalk_4G_HAT_Software
- https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/docs/rlhf.qmd
- https://www.modular.com/blog/the-next-big-step-in-mojo-open-source
New
Vector DB built on ClickHouse
https://github.com/myscale/myscaledb
Cool Tool - LLM Synthetic Data Generators
https://github.com/geraldyong/OpenAI_Synthetic/tree/main
https://github.com/quentinlintz/synthetic-data-generator
https://medium.com/@n-demia/how-to-prepare-test-data-via-openai-api-in-postman-7e378dde1f53
https://github.com/datadreamer-dev/DataDreamer
https://huggingface.co/collections/rbiswasfc/synthetic-data-generation-65ee68e821ddaff47073ed02
Flink Connectors (scroll down)
https://flink.apache.org/downloads/
Avro
Can't handle numbers longer than 19 decimal digits
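A quick sketch of where a 19-digit ceiling can come from, assuming the note refers to Avro's `long` type, which the Avro specification defines as a 64-bit signed integer. The helper names `fits_avro_long` and `suggest_avro_type` are hypothetical and only for illustration; this uses plain Python and no Avro library.

```python
# Assumption: the limit in question is Avro's `long`, a 64-bit signed integer.
AVRO_LONG_MIN = -(2**63)
AVRO_LONG_MAX = 2**63 - 1  # 9223372036854775807, which has 19 digits


def fits_avro_long(value: int) -> bool:
    """Return True if value is representable as a 64-bit signed integer."""
    return AVRO_LONG_MIN <= value <= AVRO_LONG_MAX


def suggest_avro_type(value: int) -> str:
    """Hypothetical helper: name an Avro type that could hold the value."""
    if fits_avro_long(value):
        return "long"
    # Larger values need something else, such as the `decimal` logical type
    # (stored as bytes or fixed) or a string. Which one is a design choice.
    return "decimal"


if __name__ == "__main__":
    for n in (10**18, AVRO_LONG_MAX, AVRO_LONG_MAX + 1, 10**19):
        print(n, len(str(n)), suggest_avro_type(n))
```

Note that not every 19-digit number fits: anything above 9223372036854775807 already overflows a `long`, so checking the range is safer than counting digits.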
Throwback Article
https://docs.cloudera.com/csp-ce/latest/ce-overview/topics/csp-ce-overview.html
Discount
Discounted access to Data Summit 2024
https://secure.infotoday.com/RegForms/DataSummit/?Priority=24SPKR
© 2020-2024 Tim Spann