Are you ready to take the AWS AI Practitioner Certification Exam?
Below is the list of all the material and prep notes that helped me pass the exam.
Hope it will be helpful to you.
KEY NOTES: PLEASE READ FIRST
Below are my personal notes from the course I took on this topic.
This is not an exhaustive list, and the topics are not necessarily in a specific sequence.
My approach was to write everything down while learning and then go through it again; if I couldn't recall a given topic, I would go deeper to understand it further.
Hope it helps you in your prep, in whatever way it can, and most importantly: WISH YOU GOOD LUCK!
In case the indentation is not showing correctly, you can go to this link - or message me and I can send it to you.
What are Transformers in Artificial Intelligence? -> aws.amazon.com/what-is/transformers-in-artificial-intelligence/
What are Foundation Models? -> aws.amazon.com/what-is/foundation-models/
What is Artificial Intelligence (AI)? -> aws.amazon.com/what-is/artificial-intelligence/
What is Machine Learning? -> aws.amazon.com/what-is/machine-learning/
What is Deep Learning? -> aws.amazon.com/what-is/deep-learning/
What is Generative AI? -> aws.amazon.com/what-is/generative-ai/
What's the Difference Between Supervised and Unsupervised Learning? -> aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/
Machine Learning Concepts -> docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-concepts.html
AWS AI Use Case Explorer -> aws.amazon.com/machine-learning/ai-use-cases/?use-cases
What is Amazon SageMaker? -> docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
AWS Services - Machine Learning (ML) and Artificial Intelligence (AI) -> docs.aws.amazon.com/whitepapers/latest/aws-overview/machine-learning.html
AWS Deploy Serverless ML -> aws.amazon.com/blogs/machine-learning/deploy-a-serverless-ml-inference-endpoint-of-large-language-models-using-fastapi-aws-lambda-and-aws-cdk/
AWS Sagemaker - API Gateway - AWS Lambda -> aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
Inference parameters -> docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
Amazon Bedrock or Amazon SageMaker? -> docs.aws.amazon.com/decision-guides/latest/bedrock-or-sagemaker/bedrock-or-sagemaker.html
Choosing a generative AI service -> docs.aws.amazon.com/decision-guides/latest/generative-ai-on-aws-how-to-choose/guide.html
AWS Bedrock Agents -> aws.amazon.com/bedrock/agents/
What is RAG? (Retrieval-Augmented Generation AI Explained) -> aws.amazon.com/what-is/retrieval-augmented-generation/
How CloudTrail works -> docs.aws.amazon.com/awscloudtrail/latest/userguide/how-cloudtrail-works.html
Amazon Bedrock - Using a VPC -> docs.aws.amazon.com/bedrock/latest/userguide/usingVPC.html
Use AWS PrivateLink to set up private access to Amazon Bedrock -> aws.amazon.com/blogs/machine-learning/use-aws-privatelink-to-set-up-private-access-to-amazon-bedrock/
Known data -> Features -> Algorithm -> Output
Adjustments (the model is adjusted during training until the output matches expectations)
Inference (the trained model generates predictions on new data)
ML models can be trained on various types of data.
Structured data on RDS, S3 or Redshift
S3 is primary source of training data
Semi-structured = DynamoDB & DocumentDB
Unstructured data - tokenization
Timeseries - sequential data
Model Training - Algorithm
Inference - 2 options
Real time
Low latency, high throughput, persistent endpoint
Batch Transform
Offline
Large datasets
Infrequent use
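To make the two options concrete, here is a minimal boto3 sketch of each; the endpoint name, model name, and S3 paths are placeholders, not anything from the course.

```python
# Minimal boto3 sketch of the two SageMaker inference options.
# The endpoint name, model name, and S3 URIs are hypothetical placeholders.
import boto3

# Real-time inference: a persistent, low-latency endpoint you invoke per request.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-realtime-endpoint",      # placeholder
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",                   # one record to score
)
print(response["Body"].read())

# Batch transform: offline scoring of a large dataset in S3, no persistent endpoint.
sm = boto3.client("sagemaker")
sm.create_transform_job(
    TransformJobName="my-batch-scoring-job",  # placeholder
    ModelName="my-registered-model",          # placeholder
    TransformInput={
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/input/"}},
        "ContentType": "text/csv",
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/output/"},
    TransformResources={"InstanceType": "ml.m5.large", "InstanceCount": 1},
)
```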
ML Types
Supervised Learning
Amazon SageMaker Ground Truth -> Amazon Mechanical Turk
Unsupervised Learning
Reinforcement Learning
Reward - AWS DeepRacer
Overfitting
Model does well on training data but not outside it
Underfitting
Model cannot capture meaningful relationships in the data. It performs poorly on both the training data and new inputs
Bias and fairness
Diversity of training data
Feature importance
Fairness constraints
Deep Learning
Neural Networks
Input Layer -> Hidden Layers -> Output Layer
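A tiny NumPy sketch of that input -> hidden -> output flow, with made-up layer sizes and random weights, just to show the idea:

```python
# Tiny NumPy sketch of a feed-forward pass: input layer -> hidden layer -> output layer.
# Layer sizes and random weights are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                      # input layer: 4 features

W1 = rng.random((4, 8))                # weights: input -> hidden (8 neurons)
hidden = np.maximum(0, x @ W1)         # ReLU activation in the hidden layer

W2 = rng.random((8, 2))                # weights: hidden -> output (2 classes)
logits = hidden @ W2
probs = np.exp(logits) / np.exp(logits).sum()   # softmax -> class probabilities
print(probs)
```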
Machine Learning vs Deep Learning
Consider alternatives when
Costs outweigh the benefits
Models cannot meet the interpretability requirements
Systems must be deterministic rather than probabilistic
ML Models are probabilistic
Supervised learning -
Classification
Binary - Diabetic or not diabetic
MultiClass
Regression
Simple Linear regression
Multiple Linear regression
Logistic regression
Unsupervised Learning
Clustering
Define features
Similarity function
Number of clusters
Anomaly detection
Data points that diverge
Amazon Rekognition
Facial comparison and analysis
Text detection
Object detection and labelling
Content moderation
Can detect explicit or inappropriate content in images and videos
Amazon Textract
Extract text from scanned documents
Amazon Comprehend
Extracts key phrases, entities, and sentiment.
A key use case is detecting PII data (see the Comprehend sketch after this services list)
Amazon Lex
Conversational voice and text
Amazon Transcribe
Converts speech to text
Amazon Polly
Converts Text to speech
Amazon Kendra
Intelligent document search
Amazon Personalize
Personalized product recommendations
Amazon Translate
Translates between 75 languages
Amazon Forecast
Predicts future points in time-series data
Amazon Fraud Detector
Detects fraud and fraudulent activities
Amazon Bedrock
Amazon SageMaker
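To make these managed AI services concrete, here is a minimal boto3 sketch calling Amazon Comprehend for sentiment and PII detection; the sample text is made up.

```python
# Minimal boto3 sketch calling Amazon Comprehend (sample text is made up).
import boto3

comprehend = boto3.client("comprehend")
text = "John Doe's email is john@example.com and he loved the product."

# Sentiment analysis: POSITIVE / NEGATIVE / NEUTRAL / MIXED plus confidence scores.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])

# PII detection: returns entity types (NAME, EMAIL, ...) with character offsets.
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
for entity in pii["Entities"]:
    print(entity["Type"], text[entity["BeginOffset"]:entity["EndOffset"]])
```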
ML Pipeline
Identify Business Goal -> Frame ML Problem -> Collect Data -> Pre-process Data -> Engineer Features -> Train, Tune, Evaluate -> Deploy -> Monitor
Collect Data
AWS Glue -
Cloud optimized ETL service
Contains its own data catalog
Built in transformations
AWS Glue DataBrew
Point and click data transformation
200+ transformations
Amazon SageMaker Ground Truth
Uses ML to label your training data
Can automatically label data
Amazon SageMaker Canvas
Import, prepare, transform, visualize, and analyze data
Amazon SageMaker Feature Store
Processes raw data into features by using a processing workflow
Amazon SageMaker Experiments
Visual interface to track, compare, and evaluate experiments
Amazon SageMaker automatic model tuning
Deploy
Batch inference
Real-time inference
Self-managed
Hosted
Amazon SageMaker Inference
Batch Transform
Offline inference
Large datasets
Asynchronous
Long processing times
Large payloads
Serverless
Intermittent traffic
Periods of no traffic
Real-time
Live predictions
Sustained traffic
Low latency
Consistent
Monitor the model
Configure alerts to notify and initiate actions if any drift
data drift / concept drift
Amazon SageMaker Model Monitor
MLOps
Amazon SageMaker Model Building Pipelines
Repository Options
AWS CodeCommit
Amazon SageMaker Feature Store
Amazon SageMaker Model Registry
3rd party repository
Orchestration options
Amazon SageMaker Pipelines
Amazon Managed Workflows for Apache Airflow (MWAA)
AWS Step Functions
Accuracy = (True Positives + True Negatives) / Total
Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
F1 = (2 × Precision × Recall) / (Precision + Recall)
False Positive Rate FPR = False Positives / (True Negatives + False Positives)
True Negative Rate = True Negatives / (True Negatives + False Positives)
Area Under Curve - AUC
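A plain-Python sketch of these classification metrics, computed from made-up confusion-matrix counts:

```python
# Classification metrics from confusion-matrix counts (the counts are made-up examples).
tp, fp, tn, fn = 80, 10, 95, 15

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)                      # a.k.a. true positive rate / sensitivity
f1        = 2 * precision * recall / (precision + recall)
fpr       = fp / (tn + fp)                      # false positive rate
tnr       = tn / (tn + fp)                      # true negative rate / specificity

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"FPR={fpr:.3f} TNR={tnr:.3f}")
```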
Regression Model Errors
Mean Squared Error
Root mean squared error
Mean absolute error
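And a small NumPy sketch of the regression error metrics, with made-up true values and predictions:

```python
# Regression error metrics on made-up true values and predictions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mse  = np.mean((y_true - y_pred) ** 2)          # Mean Squared Error
rmse = np.sqrt(mse)                             # Root Mean Squared Error
mae  = np.mean(np.abs(y_true - y_pred))         # Mean Absolute Error

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```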
Review these materials to learn more about the topics covered in this exam domain:
1 A Framework to Mitigate Bias and Improve Outcomes in the New Age of AI
2 What Are Transformers in Artificial Intelligence?
3 What Is Overfitting?
4 What Are Large Language Models (LLMs)?
5 Responsible Use of Machine Learning
6 Easily Add Intelligence to Your Applications
7 What Is MLOps?
8 Amazon SageMaker MLOps: From Idea to Production in Six Steps
9 Machine Learning Lens
Domain 2::
AI - ML - DL - GAI
Model
In-context learning
Prompts, prompt tuning, prompt engineering
Every NLP model has a tokenizer, which converts text into token IDs.
Vector - ordered list of numbers.
Able to encode relationships and capture associations between related items
Embeddings
Numerical vectorized representations of tokens that capture their semantic meaning
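A toy NumPy sketch of the embeddings idea (the 3-dimensional vectors are invented; real embedding models output hundreds or thousands of dimensions):

```python
# Toy sketch of embeddings as vectors: similar meanings -> similar vectors.
import numpy as np

embeddings = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "puppy": np.array([0.85, 0.15, 0.05]),
    "car": np.array([0.05, 0.9, 0.4]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))  # close to 1 (similar)
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # much lower (dissimilar)
```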
Self-attention
LLMs
Deep learning foundation models
Transformers
Unimodal or multimodal
Multimodal use cases
Multimodal tasks
Diffusion Models
Forward Diffusion
Reverse Diffusion
Stable Diffusion
Does not operate in the pixel space of the image; it works in a reduced-dimension (latent) space
SageMaker + Amazon Q Developer
Amazon Nimble Studio and Amazon Sumerian
Gen AI Architectures
Generative Adversarial Networks GANs
Variational autoencoders VAE
Transformers
AI Project lifecycle
Identify use case
Experiment and select
Adapt, align and augment
Evaluate
Deploy and integrate
Monitor
Interpretability
Intrinsic analysis
Post hoc analysis
Traditional ML outputs are deterministic - the same input produces the same output
Gen AI outputs are non-deterministic - the same prompt can produce different outputs
Gen AI Performance metrics
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
Bilingual Evaluation Understudy (BLEU)
Transfer learning
SageMaker JumpStart
Review these materials to learn more about the topics covered in this exam domain:
1 Selecting the Right Foundation Model for Your Startup
2 Generative Adversarial Networks Applications and its Benefits
3 The Complete Guide to Generative AI Architecture
4 PartyRock.aws
5 Monitoring Generative AI Applications Using Amazon Bedrock and Amazon CloudWatch Integration
6 What Is a GAN?
7 AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI
Considerations
Architecture
Complexity
Availability
Compatibility
Explainability
Interpretability
Inference
It is the process of generating an output from an input that you provided to the model.
Input = Prompt and inference parameters
Randomness and Diversity
Temperature (lower value = more high-probability, predictable outputs; higher value = more low-probability, creative outputs)
Top K (lower value = smaller pool of candidate tokens)
Top P
Length
Response Length
Penalties
Stop sequences
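A minimal boto3 sketch of passing these inference parameters to a Bedrock model through the Converse API; the model ID and prompt are just examples, and Top K (where a model supports it) goes through additionalModelRequestFields rather than inferenceConfig:

```python
# Minimal boto3 sketch of Bedrock inference parameters using the Converse API.
# The model ID and prompt are examples; Top K support is model-specific.
import boto3

bedrock = boto3.client("bedrock-runtime")
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize what RAG is in two sentences."}]}],
    inferenceConfig={
        "temperature": 0.2,          # lower = more deterministic, high-probability outputs
        "topP": 0.9,                 # nucleus sampling cutoff
        "maxTokens": 200,            # response length limit
        "stopSequences": ["###"],    # stop generating when this string appears
    },
    # Top K is not part of inferenceConfig; models that support it take it here:
    additionalModelRequestFields={"top_k": 50},
)
print(response["output"]["message"]["content"][0]["text"])
```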
Prompt
A specific set of inputs to guide LLMs to generate an appropriate output or completion
RAG - Retrieval-Augmented Generation
Prompt enrichment: retrieve external data and append it to your prompt (see the retrieval sketch after the vector database services below)
Vector Database
Collection of data stored as mathematical representations
AWS Services for Vector search databases
Amazon OpenSearch Service
Amazon OpenSearch Serverless
Amazon Aurora PostgreSQL
Amazon RDS PostgreSQL
Amazon Aurora
Amazon Neptune
Amazon DocumentDB [with MongoDB compatibility]
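A concept-level sketch of the RAG flow (embed the query, retrieve the most similar chunks, append them to the prompt); the tiny in-memory index stands in for one of the vector stores above, and embed() is only a placeholder for a real embedding model:

```python
# Concept-level RAG sketch: embed the query, retrieve the most similar chunks,
# then enrich the prompt with them. The in-memory "index" stands in for a real
# vector store, and embed() is a stand-in for an embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(16)

documents = [
    "Amazon Bedrock is a managed service for foundation models.",
    "Amazon SageMaker lets you build, train, and deploy custom ML models.",
    "Amazon S3 is object storage and a common source of training data.",
]
index = [(doc, embed(doc)) for doc in documents]          # the "vector database"

query = "Which service hosts foundation models?"
q_vec = embed(query)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieve the top-2 most similar chunks and append them to the prompt.
top_chunks = [doc for doc, vec in sorted(index, key=lambda p: cosine(q_vec, p[1]), reverse=True)[:2]]
augmented_prompt = "Context:\n" + "\n".join(top_chunks) + f"\n\nQuestion: {query}"
print(augmented_prompt)
```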
Amazon Bedrock Agents
Orchestrate prompt completion workflows
Prompt
Zero shot prompting
Few shot prompting
Prompt Template
Chain-of-thought prompting
Prompt tuning
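A few illustrative prompt strings for these styles (the task and examples are made up):

```python
# Illustrative prompt strings for the prompting styles above.

zero_shot = "Classify the sentiment of this review as Positive or Negative:\n'The battery died after a day.'"

few_shot = (
    "Classify the sentiment of each review.\n"
    "Review: 'Great sound quality.' -> Positive\n"        # example 1
    "Review: 'Stopped working in a week.' -> Negative\n"  # example 2
    "Review: 'The battery died after a day.' ->"          # model completes this one
)

chain_of_thought = (
    "A store sold 12 items at $3 each and 5 items at $7 each. "
    "What is the total revenue? Think step by step before giving the final answer."
)
```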
Latent space
The encoded knowledge of language in LLMs or the stored patterns of data that capture relationships and reconstruct the language from the patterns when prompted
Statistical database
Prompt Engineering risks and limitations
Exposure
Prompt Injection
Jailbreaking
Hijacking
Poisoning
Training process for foundation models
Pretraining - Self supervised learning
Fine-tuning - supervised learning (watch out for catastrophic forgetting)
Continuous pre-training
Fine-tuning techniques
Parameter-efficient fine-tuning (PEFT)
Low-Rank Adaptation (LoRA)
Representation fine-tuning (ReFT)
Multitask fine-tuning
Domain adaptation fine-tuning
Reinforcement learning from human feedback (RLHF)
Data preparation for fine-tuning
Prepare your training data
Select prompts
Calculate loss
Update weights
Define evaluation steps
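A hypothetical example of what labelled training data can look like as prompt/completion pairs in JSON Lines format; check the specific model or service documentation for the exact schema it expects:

```python
# Hypothetical supervised fine-tuning records in JSON Lines format (one JSON object
# per line, prompt/completion pairs). Field names vary by model/service.
import json

records = [
    {"prompt": "Summarize: Our Q3 revenue grew 12% year over year...", "completion": "Q3 revenue rose 12% YoY."},
    {"prompt": "Summarize: The new feature reduced support tickets by 30%...", "completion": "Support tickets dropped 30% after launch."},
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```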
Data preparation AWS Services
Amazon SageMaker Canvas
Open-source frameworks
Amazon SageMaker Studio - integration with EMR, can use JupyterLab
AWS Glue
Amazon SageMaker Feature Store
Amazon SageMaker Clarify - use if you have bias in your data
Amazon SageMaker Ground Truth - manage data labelling
Model performance
One option to reduce inference latency is to decrease the size of the LLM, but this might also decrease its performance
Gen AI Performance Metrics
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
Automatic summarization tasks
Machine translation software
Bilingual Evaluation Understudy (BLEU)
Used for translation tasks (see the ROUGE/BLEU sketch after this metrics list)
General Language Understanding Evaluation (GLUE)
Compare against benchmarks set by the experts
Assess model generalization across multiple tasks
Holistic Evaluation of Language Models (HELM)
Help improve model transparency
Massive Multitask Language Understanding (MMLU)
Evaluates knowledge and problem-solving capabilities of the model
Tested against history, mathematics, law, computer science, and more
Beyond the Imitation Game Benchmark (BIG-bench)
Focuses on tasks that are beyond the capabilities of the current language models
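A small sketch of computing ROUGE and BLEU with the open-source Hugging Face evaluate library (not an AWS service; the candidate and reference texts are made up):

```python
# Sketch of computing ROUGE and BLEU with the Hugging Face `evaluate` library
# (pip install evaluate rouge_score). Candidate and reference texts are made up.
import evaluate

candidate = ["the cat sat on the mat"]
reference = ["the cat is sitting on the mat"]

rouge = evaluate.load("rouge")                      # recall-oriented; summarization
print(rouge.compute(predictions=candidate, references=reference))

bleu = evaluate.load("bleu")                        # precision-oriented; translation
print(bleu.compute(predictions=candidate, references=[reference]))
```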
AWS Services for model evaluation
Amazon SageMaker JumpStart
Amazon SageMaker Clarify
Review these materials to learn more about the topics covered in this exam domain:
1 What Are Foundation Models?
2 Inference Parameters
3 Knowledge Bases for Amazon Bedrock
4 Agents for Amazon Bedrock
5 Amazon OpenSearch Service's Vector Database Capabilities Explained
6 The Role of Vector Datastores in Generative AI Applications
7 Vector Engine for Amazon OpenSearch Serverless
8 What Is Prompt Engineering?
9 Domain-Adaptation Fine-Tuning of Foundation Models in Amazon SageMaker JumpStart on Financial Data
10 Metric: bleu
11 Metric: rouge
12 ReFT: Representation Fine-Tuning for Language Models
Responsible AI
Fairness
Explainability
Robustness
Privacy and security
Governance
Transparency
Effects of bias and variance
Demographic disparities
Inaccuracy
Overfitting
Underfitting
User Trust
Responsible datasets
Inclusivity
Diversity
Balanced datasets
Privacy protection
Consent and transparency
Regular audits
Responsible practices
Environmental considerations
Sustainability
Transparency
Accountability
Stakeholder engagement
AWS service for this
Amazon SageMaker Clarify
Detect bias
Explainability
SageMaker Processing jobs
SageMaker pre-training bias analysis
Class imbalance
Label imbalance
Demographic disparity
Difference in positive proportions
Specificity difference
Recall difference
Accuracy difference
Treatment equality
Gen AI Risks
Hallucinations
Intellectual Property
Bias
Toxicity
Data privacy
Guardrails for Amazon Bedrock
Hate
Insults
Sexual
Violence
Denied topics
Model transparency
Interpretability - Deep analysis
Explainability - black box analysis
AI Service Card
Amazon SageMaker Model Cards
SageMaker provides
Feature attributions - SHAP Values
Partial dependence plots
Amazon Augmented AI (A2I) - send data to human reviewers to review random predictions.
Use your own reviewers or use Amazon Mechanical Turk
Review these materials to learn more about the topics covered in this exam domain:
1 Responsible AI in the Generative Era
2 Transform Responsible AI from Theory into Practice
3 Tools and Resources to Build AI Responsibly
4 What Is RLHF?
5 Responsible AI Best Practices: Promoting Responsible and Trustworthy AI Systems
IAM Identity Center
Workforce users, Workforce identities
Logging with CloudTrail
Captures API calls and related events
Integrated with SageMaker
Amazon SageMaker Role Manager
Preconfigured permissions for 12 activities
Encryption at rest
Amazon SageMaker
Data is encrypted by default on ML storage volumes
Notebook instances, SageMaker jobs, and endpoints
AWS Key Management Service - KMS
Amazon Macie
Identifies and alerts you to sensitive data
Remove PII during ingestion
AI System Vulnerabilities
Training Data
Input Data
Output Data
Models
Inversion
Theft
LLMs
Prompt Injection
Amazon SageMaker Model Monitor
Capture data
Create a baseline
Define data quality monitoring jobs
Evaluate statistics
Amazon SageMaker Model Registry
Amazon SageMaker Model Cards
Amazon SageMaker ML Lineage Tracking
Amazon SageMaker Feature Store
Amazon SageMaker Model Dashboard
Emerging AI compliance standards
ISO 42001 and ISO 23894
EU Artificial Intelligence Act
NIST AI Risk Management Framework (RMF)
AI Risk Management
Probability of occurrence
Severity of occurrence
Algorithmic Accountability Act
Transparency and explainability
Monitor for Bias
AWS Audit Manager
Audits AWS usage to assess compliance
Choose a framework
Gen AI
Custom frameworks
Collect evidence and add to audit report
Guardrails for Amazon Bedrock
Apply guardrails to any foundation model and agents for Amazon Bedrock
Configure harmful content filtering
Define and disallow denied topics
PII data
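A hedged boto3 sketch of setting up a guardrail with content filters, a denied topic, and a PII rule; the field names are how I remember the create_guardrail call, so verify them against the current API reference, and the guardrail name and topic are made up:

```python
# Hedged boto3 sketch of Guardrails for Amazon Bedrock: content filters, a denied
# topic, and PII handling. Verify field names/values against the current
# create_guardrail API reference; the guardrail name and topic are made up.
import boto3

bedrock = boto3.client("bedrock")
bedrock.create_guardrail(
    name="customer-support-guardrail",                        # placeholder name
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    topicPolicyConfig={
        "topicsConfig": [
            {"name": "InvestmentAdvice",
             "definition": "Providing personalized financial or investment advice.",
             "type": "DENY"}
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
```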
AWS Config
Continuously monitors and records configurations
AWS Config rules
Conformance packs
Operational best practices for AI and ML
Security best practices for Amazon SageMaker
Amazon Inspector
Works at application level
Performs automated security assessments on your applications
AWS Trusted Advisor
Provides guidance to help you
Reduce cost
Increase performance
Improve security
Data Governance
Curation
Discovery and understanding
Protection
Define roles
Data steward
Data owner
IT Roles
AWS Glue DataBrew for data governance
Data profiling
Data Lineage
AWS Glue Data Catalog
AWS Glue Data Quality
Curation
Data Quality Management
Data Integration
Data Management
Protection
Data Security
Data Compliance
Data Lifecycle management
Review these materials to learn more about the topics covered in this exam domain:
1 Shared Responsibility Model
2 Securing Generative AI: Applying Relevant Security Controls
3 AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI
4 AWS Compliance
5 Customer Compliance Center
6 NIST Artificial Intelligence Risk Management Framework
7 ISO 42001: A New Foundational Global Standard to Advance Responsible AI
8 The EU Artificial Intelligence Act
9 Learn How to Assess the Risk of AI Systems
10 What Is Data Governance?
11 Data Governance in the Age of Generative AI
How to Choose a Machine Learning Algorithm? (serokell.io)