INTRODUCTION TO PYTHON FOR DATA ANALYSIS
Python is a high-level, interpreted programming language known for its readability and simplicity. Python was created by Guido van Rossum and first released in 1991.
It emphasizes code clarity, which makes it a good choice for both beginners and experienced developers.
Python is used in various fields such as data analysis, data science, web development, and automation.
WHY PYTHON FOR DATA ANALYSIS
- Rich libraries: Python has a variety of libraries designed specifically for data analysis, such as pandas for data manipulation and analysis, NumPy for numerical computing and handling arrays, Matplotlib and seaborn for data visualization, SciPy for scientific and technical computing, and scikit-learn for machine learning.
- Community support: Python has a large and active community, so you can easily find resources, tutorials, and forums where you can get help.
- Integration: Python integrates easily with other languages and technologies, making it suitable for complex workflows.
- Flexibility: Python handles many kinds of tasks well, from exploratory data analysis to creating visualizations and building machine learning models.
DATA TYPES
Python has several built-in data types, which can be categorized as follows:
- Numeric types
int - integer values (1, 5, 7)
float - floating-point numbers, that is, numbers with decimals (2.4, 3.142)
complex - complex numbers with a real part and an imaginary part, written with j (4 + 2j)
- Sequence types
str - string, a sequence of characters ("hello world")
list - mutable ordered sequence of elements ([1, 2, 3], ["dog", "cats", "cows"])
tuple - immutable ordered sequence of elements ((1, 2, 3), ("dog", "cats", "cows"))
range - an immutable sequence of numbers (range(0, 10) represents 0 through 9)
- Mapping type
dict - dictionary, a collection of key-value pairs ({"name": "Alice", "age": 30})
- Set types
set - unordered collection of unique elements ({1, 2, 3}, {"dog", "cats"})
- Boolean type
bool - represents True and False
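A quick way to check these types is Python's built-in type() function. The sketch below uses example values taken from the list above (the variable names values, names, animals and animals_tuple are just for illustration) and also shows the practical difference between a mutable list and an immutable tuple.

```python
# Example values for each built-in type listed above
values = [7, 3.142, 4 + 2j, "hello world", [1, 2, 3], (1, 2, 3),
          range(0, 10), {"name": "Alice", "age": 30}, {1, 2, 3}, True]

# type() returns the type of a value; __name__ gives its short name
names = [type(value).__name__ for value in values]
print(names)

# Lists are mutable: an element can be changed in place
animals = ["dog", "cats", "cows"]
animals[0] = "horse"
print(animals)  # ['horse', 'cats', 'cows']

# Tuples are immutable: trying to change an element raises a TypeError
animals_tuple = ("dog", "cats", "cows")
try:
    animals_tuple[0] = "horse"
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Sets keep only unique elements, so the duplicate 2 is dropped
print({1, 2, 2, 3})  # {1, 2, 3}
```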
GETTING STARTED
To get started with Python, first install the necessary tools: Python itself, a notebook environment, and, optionally, virtual environments that keep each project's packages separate.
Anaconda is a popular distribution for data analysis because it comes with many libraries pre-installed. After installing Anaconda, launch Jupyter Notebook, which you will use to run your Python code.
Once everything is installed, start with a simple program to familiarize yourself with Python's syntax:
# printing hello world
print("hello world!")
The output is:
hello world!
Remember that hello world! is a string, and a string must be enclosed in quotation marks ("").
The following code shows the difference between printing a string and printing the value stored in a variable:
#Printing fruits such as oranges bananas apples
fruits = ("oranges", "bananas", "apples")
print(fruits)
print("fruits")
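This code produces two outputs:
('oranges', 'bananas', 'apples')
fruits
The first output comes from fruits written without quotation marks, so Python treats it as a variable name and prints the data stored in it. Because the values were written inside parentheses, fruits is a tuple, and Python displays it with its parentheses and quotes.
The second output is the literal string fruits, because the word is enclosed in quotation marks ("").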
COMMENTS
Comments are useful when writing code because they explain why a piece of code was written.
In Python, a comment starts with the hash (#) sign and can be placed anywhere in the code, either on its own line or at the end of a line of code.
# this code adds 2+2 and gives the output
# this is an illustration of how comments work
a = 2+2
print(a)
The output
4
Comments are not executed when the code runs, so they do not affect how the code behaves.
Comments do not appear in the output.
Comments can be written in any language that the readers of the code understand.
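As a small illustration of where comments can go, the sketch below places one comment on its own line, one at the end of a line of code, and one in front of a print() call to "comment it out". The variable names price, tax and total are made up for the example.

```python
# A full-line comment: compute the total price of an item including tax
price = 5
tax = 2  # an end-of-line comment: the tax is added to the price below
# print("this line is commented out, so it never runs")
total = price + tax
print(total)  # 7
```

Only 7 is printed; the commented-out print() call is ignored by Python.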
VARIABLES
Variables are containers used to store data values, such as numbers or text.
Rules for naming variables:
- Variable names are case-sensitive: Name is not the same as name.
- Keywords (reserved words such as if, for, and while) cannot be used as variable names.
- Variable names cannot contain spaces; use an underscore instead (for example, first_name).
- Variable names cannot start with a number.
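The naming rules can be checked from Python itself. This sketch uses the standard-library keyword module and the string method str.isidentifier(); the example names are illustrative only. Note that isidentifier() does not check for keywords ("for".isidentifier() is True), which is why both checks are shown.

```python
import keyword

# Names are case-sensitive: name and Name are two different variables
name = "alice"
Name = "Bob"
print(name, Name)  # alice Bob

# Use an underscore instead of a space in multi-word names
first_name = "Alice"

# keyword.iskeyword() reports whether a word is a reserved keyword
print(keyword.iskeyword("for"))     # True
print(keyword.iskeyword("fruits"))  # False

# str.isidentifier() checks whether a string is a syntactically valid name
print("first_name".isidentifier())  # True
print("first name".isidentifier())  # False (contains a space)
print("2nd_place".isidentifier())   # False (starts with a digit)
```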
ARITHMETIC OPERATORS
These operators are used to perform mathematical calculations:
addition (+)
subtraction (-)
multiplication (*)
division (/) divides two numbers and always returns a float
floor division (//) divides two numbers and rounds the result down to the nearest whole number
modulus (%) returns the remainder after division
exponentiation (**) raises the first number to the power of the second number
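Here is a short sketch of each arithmetic operator applied to two example numbers, 7 and 2. The last lines show that floor division rounds down rather than to the nearest whole number: 19 / 10 is 1.9, but 19 // 10 is 1, and -7 // 2 is -4.

```python
a = 7
b = 2

print(a + b)   # 9
print(a - b)   # 5
print(a * b)   # 14
print(a / b)   # 3.5 (division always returns a float)
print(a // b)  # 3 (floor division rounds down)
print(a % b)   # 1 (remainder of 7 divided by 2)
print(a ** b)  # 49 (7 to the power of 2)

# Floor division rounds down, not to the nearest whole number
print(19 // 10)  # 1, not 2
print(-7 // 2)   # -4
print(8 / 2)     # 4.0, still a float even though the division is exact
```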
COMPARISON OPERATORS
== equal to
!= not equal to
< less than
<= less than or equal to
> greater than
>= greater than or equal to
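Every comparison produces a bool, either True or False. A minimal sketch with an example value x = 10:

```python
x = 10

print(x == 10)  # True
print(x != 10)  # False
print(x < 5)    # False
print(x <= 10)  # True
print(x > 5)    # True
print(x >= 11)  # False
```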
LOGICAL OPERATORS
and returns True if both statements are true
or returns True if at least one of the statements is true
not reverses the result of a single statement, returning False if it is true and True if it is false
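The sketch below combines the comparison operators with and, or and not, again using an example value x = 10. In Python these operators are written in lowercase.

```python
x = 10

print(x > 5 and x < 20)  # True: both conditions are true
print(x > 5 and x > 20)  # False: the second condition is false
print(x < 5 or x == 10)  # True: at least one condition is true
print(not x > 5)         # False: not reverses the True result of x > 5
```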
CONTROL STRUCTURES
Control structures determine the flow of your program based on various conditions. They help you make decisions, repeat actions, and manage the order in which your code executes.
We will have a look into the following control structures:
1. CONDITIONAL STATEMENTS
if statement - executes a block of code only if the specified condition is true.
If the condition is false, the block is skipped and nothing is printed.
# example of an if statement
x = 10
if x > 5:  # changing > to < makes the condition false, so nothing is printed
    print("The value is greater than 5")
output:
The value is greater than 5
if-else statement - the if block works as in the previous example, and the else block runs as an alternative whenever the if condition is false.
Exactly one of the two blocks always runs, because a condition is either true or false, never both.
# we will still use the code above but with a different comparison operator
x = 10
if x < 5:
    print("the value is less than 5")
else:
    print("the value is greater than 5")
output:
the value is greater than 5
if-elif-else statement - extends the if-else statement so that it can check multiple conditions instead of just one.
# red- stop, yellow-get ready, green-go
# automated traffic light
colour = "green"
if colour == "red":
    print("stop")
elif colour == "yellow":
    print("get ready")
elif colour == "green":
    print("go")
else:
    print("invalid traffic code")
output:
go
The conditions are checked from top to bottom, and only the block under the first true condition runs; if no condition is true, the else block runs.
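To see that only the first true condition is used, consider an example where two conditions are true at once. In this sketch both x > 5 and x > 8 hold for x = 10, yet only the first branch runs; the variable message is just a placeholder for the result.

```python
x = 10

# Both conditions are true for x = 10, but only the first matching branch runs
if x > 5:
    message = "greater than 5"
elif x > 8:
    message = "greater than 8"
else:
    message = "5 or less"

print(message)  # greater than 5
```

Because of this, more specific conditions usually belong before more general ones.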
2. LOOPS
for loop - a for loop iterates over each item in a sequence, such as a list, tuple, or string.
numbers = [1, 2, 3, 4, 5, 6]
for num in numbers:
    print(num)
output:
1
2
3
4
5
6
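The same loop works on the other sequence types mentioned above. This sketch iterates over a tuple, over a string (one character at a time), and over a range while keeping a running total; the names fruits, letters and total are illustrative.

```python
# Iterating over a tuple
fruits = ("oranges", "bananas", "apples")
for fruit in fruits:
    print(fruit)

# Iterating over a string visits one character at a time
letters = []
for letter in "data":
    letters.append(letter)
print(letters)  # ['d', 'a', 't', 'a']

# Iterating over a range of numbers and accumulating a total
total = 0
for num in range(1, 7):
    total += num
print(total)  # 21
```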
while loop - a while loop executes a block of code repeatedly as long as a certain condition is true.
day = 1
while day <= 7:
    print(day)
    day += 1
output:
1
2
3
4
5
6
7
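A while loop is most useful when the number of repetitions is not known in advance. As a sketch, the example below counts how many times 100 can be halved before it drops below 1 (n and steps are illustrative names). The body must change the variable in the condition; otherwise the condition never becomes false and the loop runs forever.

```python
# Count how many times 100 can be halved before it drops below 1
n = 100
steps = 0
while n >= 1:
    n = n / 2   # update n so the condition eventually becomes false
    steps += 1

print(steps)  # 7
print(n)      # 0.78125
```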
3. LOOP CONTROL STATEMENTS
BREAK - the break statement exits the loop immediately when a condition is met; any remaining iterations are skipped.
for num in range(10):
    if num == 5:
        break
    print(num)
output:
0
1
2
3
4
CONTINUE - the continue statement skips the rest of the current iteration when a condition is met and moves on to the next iteration.
for num in range(10):
    if num == 5:
        continue
    print(num)
output:
0
1
2
3
4
6
7
8
9
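In data analysis the two statements are often used together. The sketch below works on hypothetical sensor readings, where None marks a missing value and -1 is an assumed end-of-data marker: continue skips the missing values, and break stops the loop at the marker, so the 30 after it is never added.

```python
# Hypothetical readings: None marks a missing value, -1 marks the end of valid data
readings = [12, None, 15, 9, None, 20, -1, 30]

total = 0
for reading in readings:
    if reading is None:
        continue  # skip missing values and move on to the next reading
    if reading == -1:
        break     # stop processing entirely once the end marker is reached
    total += reading

print(total)  # 56
```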
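4. NESTED CONTROL STRUCTURES
Control structures can be placed inside one another. In the example below, a for loop runs inside an if statement.
num = 10
if num > 5:
    for i in range(5):
        print(f"{num} is greater than 5: {i}")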
output:
10 is greater than 5: 0
10 is greater than 5: 1
10 is greater than 5: 2
10 is greater than 5: 3
10 is greater than 5: 4
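Nesting also works the other way around: an if-else statement can sit inside a for loop so that a decision is made for every item. This sketch labels the numbers 1 to 5 as even or odd, using the modulus operator to test divisibility by 2; the list name labels is illustrative.

```python
# An if-else statement nested inside a for loop: label each number as even or odd
labels = []
for num in range(1, 6):
    if num % 2 == 0:
        labels.append(f"{num} is even")
    else:
        labels.append(f"{num} is odd")

for label in labels:
    print(label)
```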
This is just an overview of Python for data analysis, but it forms a solid foundation for your Python journey in data analysis.
In a future article, we will look at the different libraries used in data analysis.