Exploratory data analysis is essential for data science or analytics.
Let's take the heavy lifting out of data preprocessing to speed up the 'data understanding' and 'data preparation' steps, so that you can fix any data anomalies and clean your data.
Now that you know a little bit more about generative AI from the last two instalments, let's get hands-on with Amazon CodeWhisperer in VS Code with Python to explore a crime dataset.
Learning Objectives
In this lesson you will use a real-world dataset to:
- Import Python Libraries
- Load a dataset
- Explore the dataset
- Use descriptive statistics to summarize the dataset
- Check for missing data
- Get comfortable using Amazon CodeWhisperer
- Learn about the new generative features that integrate with Jupyter notebooks announced by AWS on 10 May 2023
Dataset
This open-source Quarterly crime recorded dataset was provided by the NSW Bureau of Crime Statistics and Research via data.nsw.gov.au.
The metadata lists 6 columns:
- Offences - Text
- Year 2018 - Integer
- Year 2019 - Integer
- Year 2020 - Integer
- Year 2021 - Integer
- Year 2022 - Integer
How to use Amazon CodeWhisperer as your AI coding companion?
Here are some tips:
- In VS Code open a new Python file
- Start typing a few words, e.g. import...
- Or write a comment, e.g. # Import python libraries...
- Accept the code suggestion by pressing the Tab key, then press Enter on your keyboard
Amazon CodeWhisperer generates code suggestions from the words that you type, and it also reads your comments to predict what you want to achieve.
Tutorial 1: Exploratory Data Analysis in Python with Amazon CodeWhisperer
Amazon CodeWhisperer helps you, as a data scientist or data analyst, to prepare a logical flow for your analysis. It also helps you debug, and it offers code suggestions by using the natural language processing of its underlying large language model to infer what you might be thinking.
Step 1: Ensure you are connected to your Amazon Builder ID in VS Code.
Step 2: Start typing a few words or a comment in Python
Step 3: Import popular Python libraries and load the dataset
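The import-and-load step can be sketched as follows. This is a minimal illustration rather than the exact suggestion CodeWhisperer produced: the CSV text below is a hypothetical stand-in for the downloaded file, and its values are placeholders, not real figures from the NSW dataset.

```python
# Import python libraries
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV file.
# The column names follow the metadata above; the numbers are placeholders.
sample_csv = io.StringIO(
    "Offences,Year 2018,Year 2019,Year 2020,Year 2021,Year 2022\n"
    "Offence A,100,110,120,130,140\n"
    "Offence B,200,190,180,170,160\n"
    "Offence C,300,300,310,320,330\n"
)

# Load the dataset. With the real file, pass its path to pd.read_csv instead.
df = pd.read_csv(sample_csv)
print(df.columns.tolist())
```

With the real dataset saved in your working directory, the only change would be to replace sample_csv with the name of your file.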
Step 4: Inspect the first 5 rows of the dataframe. As you type in comments, Amazon CodeWhisperer will try to predict what you would like to achieve next and will suggest code to review the first 5 rows of your data.
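A suggestion for this step would typically rely on the pandas head() method, which returns the first 5 rows by default. Here is a small sketch, assuming a dataframe named df built from hypothetical placeholder rows:

```python
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": [f"Offence {letter}" for letter in "ABCDEF"],
    "Year 2018": [10, 20, 30, 40, 50, 60],
    "Year 2022": [12, 18, 33, 41, 55, 58],
})

# Inspect the first 5 rows of the dataframe
first_rows = df.head()
print(first_rows)
```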
Step 5: You might want to understand the shape of your dataframe such as how many columns and rows are in your dataset.
In the comments, start typing the word 'shape' and Amazon CodeWhisperer will try to complete and predict what you are trying to achieve.
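The shape attribute of a pandas dataframe is a (rows, columns) tuple. A minimal sketch, again using hypothetical placeholder rows in place of the real file:

```python
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C"],
    "Year 2018": [100, 200, 300],
    "Year 2019": [110, 190, 300],
    "Year 2020": [120, 180, 310],
    "Year 2021": [130, 170, 320],
    "Year 2022": [140, 160, 330],
})

# Check the shape of the dataframe
rows, cols = df.shape
print(f"{rows} rows and {cols} columns")
```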
Step 6: Summarize the dataset and provide descriptive statistics. Just start typing the word 'describe'. Amazon CodeWhisperer takes the previous lines of code into account and will suggest code to summarize the data.
Step 7: To view overall information about the data, start typing the word 'info'.
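For reference, describe() reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column. The sketch below assumes the yearly columns were loaded as numbers and uses hypothetical placeholder rows:

```python
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C"],
    "Year 2018": [100, 200, 300],
    "Year 2022": [140, 160, 330],
})

# Summarize the dataset with descriptive statistics
summary = df.describe()
print(summary)
```

By default the text column Offences is left out of the summary, because describe() only covers numeric columns when a dataframe mixes numeric and text data.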
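The info() method prints the column names, the non-null counts and the data types in one compact report. A short sketch with hypothetical placeholder rows:

```python
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C"],
    "Year 2018": [100, 200, 300],
    "Year 2022": [140, 160, 330],
})

# View overall information about the dataframe
df.info()
```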
Step 8: You might want to check for any missing values. Start typing the word 'missing' in the comments and Amazon CodeWhisperer will try to generate the code you need.
On the next line of code, Amazon CodeWhisperer will try to guess your next step and suggest a check for duplicate values. You can press Tab and Enter to accept the code suggestion, or you may reject the recommendation.
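Both checks can be sketched in a few lines of pandas. The sample rows below are hypothetical and deliberately contain one missing value and one repeated row, so that each check has something to report:

```python
import pandas as pd

# Hypothetical placeholder rows with one missing value and one repeated row.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence B", "Offence C"],
    "Year 2018": [100, 200, 200, 300],
    "Year 2019": [110, 190, 190, None],
})

# Check for missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Check for duplicate rows
duplicate_count = df.duplicated().sum()
print(duplicate_count)
```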
Step 9: You may also want to understand and visualize the data with histograms for different variables. If you start typing the word 'hist' in the comments, Amazon CodeWhisperer will quickly predict what you might be thinking and provide a code suggestion. It's very smart! Its large language model draws on your code comments and on what you are typing.
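A histogram suggestion might look something like the sketch below. It assumes matplotlib is installed and that the yearly columns are numeric; the rows are hypothetical placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view plots with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C", "Offence D"],
    "Year 2018": [100, 200, 300, 250],
    "Year 2022": [140, 160, 330, 240],
})

# Plot a histogram for each numeric column
axes = df.hist(figsize=(8, 4))
plt.tight_layout()
```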
Step 10: As you tab to the next line of code, Amazon CodeWhisperer is already trying to guess that you want to write the code for building a box plot.
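One way to draw such a box plot with pandas is sketched here. The column selection, the title and the placeholder rows are illustrative assumptions rather than the exact code that was suggested.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view plots with plt.show()
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C", "Offence D"],
    "Year 2018": [100, 200, 300, 250],
    "Year 2022": [140, 160, 330, 240],
})

# Draw a box plot of the yearly columns to compare their spread
year_columns = ["Year 2018", "Year 2022"]
ax = df[year_columns].plot(kind="box", title="Spread of recorded incidents per year")
```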
Step 11: On a new line, you may type in the Python comment 'draw a bar chart'. Amazon CodeWhisperer will quickly suggest code for the chart.
Outlining this quick exploratory analysis, including the workflow and thought process, took about 10 minutes.
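As an illustration, a bar chart of a single year by offence could be written as below. The choice of the Year 2022 column, the axis label and the placeholder rows are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view plots with plt.show()
import pandas as pd

# Hypothetical placeholder rows, not real figures from the NSW dataset.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B", "Offence C"],
    "Year 2022": [140, 160, 330],
})

# Draw a bar chart of recorded incidents per offence for one year
ax = df.plot(kind="bar", x="Offences", y="Year 2022", legend=False)
ax.set_ylabel("Recorded incidents")
```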
Tutorial 2: Exploratory data analysis of NSW Crime for 2018 to 2022 in Jupyter Notebook using Amazon CodeWhisperer
(Note: Please read the new AWS Machine Learning Blog post, linked at the end of this blog, for the generative AI announcements for Jupyter made on 10 May 2023.)
Step 1: Open an instance of Jupyter Notebook in Python 3 and ensure you have saved the dataset in your directory.
Step 2: You may quickly apply the Amazon CodeWhisperer code suggestions into the Jupyter Notebook so that you can see the input code and the output.
Step 3: Run the Jupyter Notebook one line of code at a time so you can inspect the output.
Step 4: Check the descriptive statistics of your dataframe
Step 5: Check the dimensions of your dataframe with shape. There are 62 records and 6 columns or features.
Step 6: Check for any missing values in the dataframe. There are no missing values in the dataframe.
Step 7: Check details of the dataframe, including the data types and whether there are any empty or null values. All the columns in the dataframe have the object data type.
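If the yearly columns are stored as text rather than numbers, they can be converted before calculating statistics. The sketch below assumes the counts were read as strings containing thousands separators. That is only one possible cause, and it is not confirmed for this dataset. The rows are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical rows where the counts were read as text, not real NSW figures.
df = pd.DataFrame({
    "Offences": ["Offence A", "Offence B"],
    "Year 2018": ["1,200", "350"],
    "Year 2019": ["1,150", "400"],
})

# Check the data types before conversion
print(df.dtypes)

# Convert each yearly column from text to numbers
year_columns = [col for col in df.columns if col.startswith("Year")]
for col in year_columns:
    df[col] = pd.to_numeric(df[col].str.replace(",", "", regex=False))

# Check the data types after conversion
print(df.dtypes)
```

If thousands separators do turn out to be the cause, passing thousands="," to pd.read_csv when loading the file is another option.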
Step 8: Check for any duplicate values. There were no duplicate records in the dataframe.
Conclusion
In 15 minutes, you have learnt how to use Amazon CodeWhisperer as your ML-powered coding companion by accepting or rejecting the code it predicts in real time from the words and comments you type. You have also completed an exploratory analysis in a Jupyter notebook in 10 minutes, working through the different stages of data exploration for either data analytics or data science.
Resources
News Flash - Hot off the press on 10 May 2023! 🌍
I am pleased to share a news flash from AWS, hot off the press. Please read the latest announcement of new Jupyter contributions by AWS to democratize generative AI and scale ML workloads.
New features for generative AI include:
- Introducing two generative AI extensions for Jupyter
- Jupyter AI, an open-source project to bring generative AI to Jupyter notebooks
- Amazon CodeWhisperer Jupyter extension to build, train and deploy ML models
- Notebooks scheduling
- SageMaker open-source distribution
Please read the announcement authored by Brian Granger, published on the AWS Machine Learning Blog on 10 May 2023.
This Month - AWS She Builds Global Mentorship application closing 31 May
AWS She Builds Application
Apply to participate in the free AWS She Builds Mentorship Program - APJ, EMEA or US. Learn cloud computing ☁️
If you aspire to be a woman in tech or are interested in learning more about cloud computing, you have until 31 May 2023 to submit your application. The early bird gets the worm! You may apply at this link
Next Month - AWS re:Inforce 2023 on June 13-14
Are you ready for AWS re:Inforce this year? Join the AWS Community in person for two days of learning and networking with your peers as you further your knowledge of security, innovation and the latest announcements from the keynotes.
I hope you will be able to join AWS CISO CJ Moses for his keynote and the AWS Security community next month. You can register at this link. There are leadership sessions, partner sessions and expert sessions at the 300 and 400 levels, which are focused, interactive learning sessions.
You may watch the keynote from AWS re:Inforce 2022 and also follow the hashtag #AWSreInforce on Twitter.
Until the next lesson, happy learning! 😀
Reinventing your Customer's Business with Generative AI on AWS
Hello World | AI Coding Companion
Dr. Werner Vogels, Amazon VP & CTO, sits down with CodeWhisperer GM Doug Seven and Sr. Principal Engineer Sandeep Pokkunuri to discuss large language models.