AWS open source newsletter, #140

Ricardo Sueiras - Jan 9 '23 - - Dev Community

January 9th, 2023 - Instalment #140

Welcome

Happy New Year and welcome to the first AWS open source newsletter of 2023, edition #140. If you have not already checked it out, I put together a short retrospective summary of 2022 in the post, AWS open source newsletter - 2022 in review. There are some interesting facts and figures in there. I am also taking time to collect feedback from readers to help shape where this newsletter goes in 2023. What do you want me to cover? What should I change? What is working well for you? I am interested in all your feedback so that I can make sure that you continue to enjoy (I hope) reading this newsletter.

With several weeks having passed since the last newsletter, it is no surprise that we have a bumper selection of great new projects for you this week. Kicking things off, we have "cloudwatch-dashboard-builder", a nice GUI to help you build your CloudWatch dashboards; "sfn-workflow-studio-sync", a great developer productivity tool for those building Step Functions; "renate", a Python library for automatic model re-training; "graph-explorer", a React-based web application to visualise graph data; "aws-mainframe-modernization-carddemo", a sample app that lets you go retro with mainframes; "csharp-code-converter-for-postgres", which helps you modernise your .NET applications by moving to open source databases; "aws-terraform-dev-container", a must check out project if you use Terraform; and many more projects. If you have a project you want me to feature in this newsletter, please get in touch.

We also feature content covering a very wide selection of your favourite open source technologies, including MySQL, PostgreSQL, Next.js, AWS ParallelCluster, MariaDB, Amazon EKS, Kubernetes, ArgoCD, AWS Distro for OpenTelemetry, Prometheus, DAMON, Crossplane, Apache Spark, Apache Kafka, Apache Flink, Apache Pinot, Apache Superset, Apache NiFi, Delta Lake, OpenShift, AWS Copilot, RabbitMQ, Apache Airflow, Rust, Terraform, Amazon EMR, and many more.

Finally, check out the Videos and Events section. This week I feature the best of re:Invent and a few other videos that you might have missed that are worth checking out, and we cover events coming up in the next few weeks. A reminder that if you have an open source event you want us to include, then drop me a line via comments or social media.

Celebrating open source contributors

The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.

So thank you to the following open source heroes: Ivica Kolenkaš, Lars Jacobsson, Harinder Seera, Danilo Poccia, Michael Larabel, Paul Vixie, Philipp Sacha, Faizal Khan, Saitkat Banerjee, Mayur Deqaikar, Abbey Fuller, Vikram Sethi, Nima Kaviani, Scott Rigney, Kiran Matty, Sanjay Rao, Abhijit Kshirsagar, Arunkumar Selvam, Gary Stafford, Noritaka Sekiyama, Kyle Duong, and Sandeep Adwankar

Latest open source projects

The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.

Tools

cloudwatch-dashboard-builder

cloudwatch-dashboard-builder is the latest tool from AWS Community Builder Harinder Seera that provides a graphical user interface (Windows only, but maybe someone out there can build a Linux GUI from the source) that helps you build CloudWatch Dashboards from CloudWatch metrics. Check out the repo for plenty of screenshots and a short video of how this works. Harinder has also thoughtfully put together a three part blog post that dives deep into how to use this tool, so make sure you read AWS CloudWatch Dashboard Builder - Tool For SRE, Performance Engineers and DevOps.

screenshot of cloudwatch-dashboard builder

sfn-workflow-studio-sync

sfn-workflow-studio-sync is another great tool from AWS Community Builder Lars Jacobsson that enables real-time sync between Step Functions Workflow Studio and your local machine. This extension uses the File System Access API to give Chrome temporary access to a single file on your filesystem, so make sure you check your browser compatibility (link in the repo).

demo of sfn-workflow-studio-sync

renate

renate is an open-source Python library for automatic model re-training. The library implements continual learning algorithms to train deep neural networks incrementally when new data becomes available. Applications of machine learning require updating models as new batches of data become available. Repeatedly re-training deep neural network models from scratch is costly, and fine-tuning them with only the new data leads to a phenomenon called “catastrophic forgetting”. This means that the model will have good performance on the most recent data, but the performance will degrade on the older data. Renate provides algorithms that alleviate the problem of catastrophic forgetting and help to automate the re-training process. With Renate, users can run small scale continual learning experiments on their local machine or run large continual learning jobs using Amazon SageMaker. Renate also supports state-of-the-art hyperparameter tuning out-of-the-box, thanks to its integration with Syne Tune.

example graph of improvements
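To make the idea of catastrophic forgetting a little more concrete, here is a minimal toy sketch in plain Python. It does not use Renate and does not reflect its API; every name in it (mse, train, old_batch, new_batch) is made up for illustration. It trains a one-parameter model on an earlier batch of data, then compares fine-tuning on the new batch only against a simple rehearsal approach that mixes the old batch back in.

```python
# Toy illustration only: a one-parameter model y = w * x trained with
# full-batch gradient descent. None of these names come from Renate.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.05, steps=200):
    """Run gradient descent on the MSE and return the updated weight."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
old_batch = [(x, 2.0 * x) for x in xs]    # earlier data follows y = 2x
new_batch = [(x, -1.0 * x) for x in xs]   # newer data follows y = -x

# Initial training on the earlier batch.
w_initial = train(0.0, old_batch)

# Option 1: fine-tune on the new batch only.
w_finetuned = train(w_initial, new_batch)

# Option 2: rehearsal, i.e. keep the old batch in memory and train on both.
w_replay = train(w_initial, old_batch + new_batch)

print("error on old data after fine-tuning only:", mse(w_finetuned, old_batch))
print("error on old data after rehearsal:       ", mse(w_replay, old_batch))
print("error on new data after fine-tuning only:", mse(w_finetuned, new_batch))
print("error on new data after rehearsal:       ", mse(w_replay, new_batch))
```

With a single parameter the two batches directly conflict, so rehearsal can only settle on a compromise between them. The point of the sketch is just that fine-tuning on new data alone lets the error on older data grow, while keeping some older data in the training mix limits that growth. Real continual learning methods, such as those Renate implements, work with far larger models and more sophisticated strategies.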

graph-explorer

graph-explorer is an open source tool that provides a React-based web application that can be deployed as a Docker image to visualise graph data. You can connect to Amazon Neptune or another graph database that provides an Apache TinkerPop Gremlin or SPARQL 1.1 endpoint. You can search the data quickly using faceted search filters and interactively explore connections around nodes and edges. You can also customise the graph layout, colours, icons, and default properties to display for nodes and edges, and save images for future use.

demo of graph-explorer screenshot

To get started, build the graph-explorer Docker image and run it on a local machine or on AWS via an EC2 instance or perhaps your favourite container orchestrator.

csharp-code-converter-for-postgres

csharp-code-converter-for-postgres is a useful tool for those looking to move to open source databases. It provides a Visual Studio extension that helps you migrate Microsoft C# ADO.NET code that connects to Microsoft SQL Server so that it works with a PostgreSQL database.

aws-terraform-dev-container

aws-terraform-dev-container is a VSCode Dev Container with tools to help you build and manage AWS infrastructure with Terraform. Check out the documentation and sample screenshots. If you are using Terraform, then this might be a great way to ensure consistency within your teams.

screenshot of aws terraform dev container

Demos, Samples, Solutions and Workshops

lambda-rust-and-cdk

lambda-rust-and-cdk is a proof of concept project from my colleague Danilo that uses AWS CDK to provision two AWS Lambda functions developed in Rust.

aws-mainframe-modernization-carddemo

aws-mainframe-modernization-carddemo is a Mainframe application designed and developed to test and showcase AWS and partner technology for mainframe migration and modernisation use-cases such as discovery, migration, modernisation, performance test, augmentation, service enablement, service extraction, test creation, test harness, etc. Check out the supporting blog post, Introducing Open Source AWS CardDemo for Mainframe Modernization where Sanjay Rao, Abhijit Kshirsagar, and Arunkumar Selvam describe the functions, the technical components and the structure of the CardDemo application before showing how to install the application on a mainframe.

sample architecture for carddemo mainframe

AWS and Community blog posts

Apache Airflow

In 2022 I did a number of talks around how to orchestrate hybrid workflows, using a combination of Apache Airflow and AWS services such as AWS ECS Anywhere. I first met Ivica Kolenkaš during PyCon Italia, and he got in touch to say that he has put together this post, Solving data governance with Airflow and AWS ECS Anywhere, which I cannot recommend highly enough. Much like the original idea behind my talk, this dives deep into one aspect of this (Governance/Compliance) and provides a rich, hands on example of how you can use the power of AWS ECS Anywhere and Apache Airflow as the orchestrator to solve this tricky problem. Very nice.

architecture of airflow and governance blog post

Delta Lake

Delta Lake is an open-source project that helps implement modern data lake architectures commonly built on Amazon S3 or other cloud storage. In the post, Introducing native Delta Lake table support with AWS Glue crawlers, Noritaka Sekiyama, Kyle Duong, and Sandeep Adwankar collaborate to demonstrate how AWS Glue crawlers work with native Delta Lake tables and describe typical use cases for querying native Delta Lake tables. [hands on]

sample screenshot of delta lake querying

Apache Kafka

In Analyze real-time streaming data in Amazon MSK with Amazon Athena Scott Rigney and Kiran Matty look at the recently released Athena connector for Amazon MSK. With it, you can run interactive queries on data held in Kafka topics running in MSK or self-managed Apache Kafka. [hands on]

example screenshot showing athena source of kafka

MQTT

MQTT is a lightweight, publish-subscribe, machine-to-machine network protocol for message queuing. In the post, Use AWS IoT Core MQTT broker with standard MQTT libraries, Philipp Sacha shares how you can use standard MQTT libraries for different languages like Python, Node.js, or Java to interact with the AWS IoT Core message broker. [hands on]

DAMON

Have you heard about DAMON? I have shared details about this project that originated from Amazon in earlier newsletters (#114), but what does it do? DAMON is a data access monitoring framework subsystem for the Linux kernel. In the post, Amazon Reflects On The Great Year For DAMON In The Linux Kernel Michael Larabel provides a quick post for those yearning to know more about this project.

DNS

Our own VP of Security Paul Vixie (and DNS contributor/legend) gave a talk this autumn explaining why open source supply chain security is such a big issue. Check out the post which dives deep into this talk, Why Bad Bugs in DNS (And Other Open Source Code) Just Won’t Go Away.

Cloud native related posts

architecture of multi architecture pipeline

architecture of blue green solution for deployments using argocd

architecture of sample application

Open Source database related posts

architecture of solution to generate excel sheets from postgres db

architecture using apache shardingsphere-proxy

Other posts and quick reads

apache nifi flow example

Quick updates

Apache Airflow

You can now create Apache Airflow version 2.4 environments on Amazon Managed Workflows for Apache Airflow (MWAA) with Python 3.10 support. Amazon MWAA is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud. With Apache Airflow 2.4 on Amazon MWAA, customers can enjoy the same scalability, availability, security, and ease of management that Amazon MWAA offers with the improvements of Apache Airflow 2.4. Further, with Airflow 2.4 MWAA has increased the Python version to 3.10, providing support for newer Python libraries, features, and improvements.

AWS Copilot

AWS Copilot (v1.24) now supports Amazon Elastic Container Service (Amazon ECS) Service Connect, a capability that simplifies building and operating resilient distributed applications. Service Connect enables you to have easier network setup and seamless communication between services deployed across multiple Amazon ECS clusters and virtual private clouds (VPCs). Service Connect enables you to add a layer of resilience to your Amazon ECS service communication and get traffic insights with no changes to your application code.

With the new AWS Copilot release (v1.24), customers can now enable Amazon ECS Service Connect in their service manifest file by setting connect: true under the network section. AWS Copilot enables Amazon ECS Service Connect by default for new applications and services, like load balanced web service and backend web service patterns.

MySQL

Amazon Aurora MySQL Version 3 (Compatible with MySQL 8.0) now offers support for Backtrack. Backtrack allows you to move your database to a prior point in time without needing to restore from a backup, and it completes within seconds, even for large databases.

Redis

Amazon ElastiCache for Redis now supports updates to encryption in transit on existing cluster resources. You can change the TLS configuration of your Redis clusters without re-building or re-provisioning them or impacting application availability. When enabling encryption in transit, your overall solution can remain connected to Redis clusters.

RabbitMQ

Amazon MQ now provides support for RabbitMQ version 3.9.24, which includes several fixes to the previously supported version, RabbitMQ 3.9.20.

OpenShift

Red Hat OpenShift Service on AWS (ROSA) now provides an AWS Management Console experience that simplifies the process for satisfying the AWS account prerequisites for provisioning and operating ROSA clusters. The new AWS ROSA console page automatically checks whether ROSA prerequisites are met and provides automated configuration and step-by-step guidance where manual configuration is required.

Before a cluster administrator can provision ROSA clusters, their AWS account must have ROSA enabled, adequate service quotas, and the Elastic Load Balancing service-linked role. The new AWS ROSA console page automatically checks if these ROSA prerequisites are met, and provides automation and active step-by-step guidance for meeting the requirements. The new AWS ROSA console page helps you provision clusters faster.

Linux

AWS License Manager announces support for cross-region, cross-account tracking of commercial Linux subscriptions you run on AWS. This includes subscriptions purchased as part of EC2 subscription-included AMIs, on the AWS Marketplace, or brought to AWS via Red Hat Cloud Access Program. You can track subscription usage by number of instances for Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Ubuntu Pro distributions in the Linux subscriptions tab of the AWS License Manager console.

Once data is discovered and aggregated, you will have insight into all your instances using commercial Linux subscriptions. To view historical usage patterns and set alarms when key thresholds are met for a particular subscription type, you can view the data as Amazon CloudWatch dashboards in the AWS License Manager console and set Amazon CloudWatch alarms from within the AWS License Manager console for monitoring. For data analysis, you can view and export the list of EC2 instances by commercial Linux subscription type (or across all subscriptions types) with instance attributes such as subscription type, AMI id, instance ID, account ID, region, usage operation, and product code.

AWS ParallelCluster

AWS ParallelCluster 3.4 is now generally available and introduces the ability to create HPC clusters that can access and aggregate compute capacity across multiple AWS Availability Zones (AZ) in a Region. Other important features in this release include:

  • Amazon VPC configuration for AWS Lambda functions used by ParallelCluster for managing your clusters.
  • Custom AWS IAM prefixes for ParallelCluster IAM roles and policies to enable permission boundaries.
  • Secure file-system mounting for Amazon EFS to enable in-transit encryption and IAM authorizations.
  • Ability to set up custom actions for cluster updates.

Delta Lake

You can now query Delta Lake tables seamlessly in Amazon Athena, giving you the benefit of increased operational efficiency, improved query performance and reduced cost. Delta Lake is an open-source table format that helps implement modern data lake architectures commonly built on Amazon S3. Prior to this launch, reading Delta Lake tables in Athena required a complex process of generating and managing additional metadata files. Now you can use Athena to query Delta Lake tables directly without this additional effort.

Athena enables interactive analytics and dashboard reporting for Delta Lake-formatted data lakes and now your Delta Lake table updates are available for analysis in Athena as soon as they are completed. Athena uses metadata contained in Delta Lake files to optimise your queries, so you reduce your data scan costs and get up to 40% performance improvement in your Athena queries. Athena makes it easier for you to create and manage Delta Lake tables in AWS Glue Data Catalog by using simple DDL statements such as CREATE EXTERNAL TABLE and DESCRIBE TABLE, which are consistent with other table types supported in Athena. You can also use AWS Glue Crawler to discover Delta Lake table schemas and manage schema updates in Glue Data Catalog for Delta Lake files, making newly cataloged data available for analysis in Athena seamlessly.

Videos of the week

Exploring Popular Open-source Stream Processing Technologies

Gary Stafford has been busy over the holidays and put together this new video based on the two-part blog post: "Exploring Popular Open-source Stream Processing Technologies: A brief demonstration of Apache Spark Structured Streaming, Apache Kafka Streams, Apache Flink, and Apache Pinot with Apache Superset".

Part One

Part Two

AWS IoT and open source

AWS Community Hero Faizal Khan runs this extended video covering all things open source and AWS IoT. Well worth watching, and covers many of the familiar things such as FreeRTOS, AWS IoT for Greengrass, as well as looking at new things you may not be familiar with. Original link is over at Open Source and AWS IoT

re:Invent roundup

It feels like only yesterday, but re:Invent is now but a memory. Luckily, thanks to the fact that many of the sessions were recorded, you can check out some of the best open source related sessions. Here are my top three, but I will feature others in coming editions of the newsletter.

When security, safety, and urgency all matter: Handling Log4Shell (BOA204)

On December 9, 2021, there was a report of a potential remote code execution issue in the widely used open-source Apache logging library Log4j. This issue allowed a user to use Java Naming and Directory Interface (JNDI) and LDAP endpoints to execute arbitrary code on a system. Over the next 10 days, 5 additional common vulnerabilities and exposures affecting Log4j were made public. This event is now referred to as Log4Shell. In this session, Abbey Fuller shares our response to Log4Shell, from initial notification to hot patch, fleet scanning, and customer communications.

.NET open source on AWS (OPN207)

.NET is the third most-used programming stack by AWS customers and is used by a dedicated community of over 10 million developers worldwide. Increasingly, the non-enterprise .NET open-source ecosystem has been neglected and often termed “third party,” which has been detrimental to the growth of the community. AWS is trying to change that. Saitkat Banerjee and Mayur Deqaikar share how AWS is making an impact in the .NET open-source community, and discover some of the long-term bets that AWS is investing in today. Walk away equipped with an understanding of the possibilities for .NET open source on AWS.

Secure and multi-tenant infrastructure as code with Crossplane & Argo (OPN309)

Secure and multi-tenant automation of application and infrastructure rollout on Kubernetes can be complicated, particularly at enterprise scale. Adobe, AWS, and Deutsche Kreditbank collaborated on a solution that has been able to address this challenge by using a combination of GitOps, Crossplane Compositions, Open Policy Agent, and Argo CD. Vikram Sethi and Nima Kaviani talk about the collaborative effort that yielded a solution for a scalable, secure, and multi-tenant infrastructure-as-code deployment strategy working at enterprise scale. Hear about the technical complexities, observations, solutions, lessons learned, and best practices uncovered as part of this effort.

Build on Open Source

For those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. Expect plenty of code, demos and hopefully laughs. We have put together a playlist so that you can easily access all (eight) of the episodes of the Build on Open Source show. Build on Open Source playlist

Events for your diary

If you are planning any events in 2023, either virtual, in person, or hybrid, get in touch as I would love to share details of your event with readers.

FOSDEM
Feb 4-5th, 2023 in Brussels

FOSDEM is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels, and this year it takes place on 4 and 5 February 2023. It is a must attend event for all open source fans, so check out the details and register via this link.

State of Open Con 23
Feb 7-8th, 2023 in London

OpenUK will be hosting a two-day conference for more than 1,000 people in Central London, “State of Open Con 23”, in association with IEEE, the headline sponsor. Check out more info and sign up here.

Everything Open
March 14-15th in Melbourne, Australia

A new event for the fine folks in Australia. Everything Open is running for the first time, and the organisers (Linux Australia) have decided to run this event to provide a space for a cross-section of the open technologies communities to come together in person. Check out the event details here. The CFP is currently open, so why not take a look and submit something if you can.

OpenSearch
Every other Tuesday, 3pm GMT

This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome and they cover and welcome talks on topics including: search, logging, log analytics, and data visualisation.

Sign up to the next session, OpenSearch Community Meeting

Stay in touch with open source at AWS

I hope this summary has been useful. Remember to check out the Open Source homepage, and keep up to date with all our activity in open source by following us on @AWSOpen.
