PYTHON 101: INTRODUCTION TO PYTHON FOR DATA ANALYTICS
Python is a versatile and powerful programming language, widely used in data analytics due to its simplicity and the vast ecosystem of libraries tailored for data processing. In this guide, we'll cover the essentials you need to get started with Python for data analytics, including variables, data types, control structures, functions, and an introduction to NumPy, a fundamental library for numerical computing.
INTRODUCTION TO PYTHON FOR DATA ANALYTICS CONCEPTS
Variables
Variables are containers for storing data values. In Python, you don’t need to declare the type of a variable explicitly, as it is inferred based on the value you assign.
CODE:
age = 30
name = "John"
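Because the type follows the value, the same name can later be bound to a value of a different type. A minimal sketch (the names `value`, `first_type`, and `second_type` are only illustrative):

```python
value = 30
first_type = type(value)    # type inferred from the integer 30

value = "thirty"
second_type = type(value)   # the same name now refers to a string

print(first_type, second_type)  # <class 'int'> <class 'str'>
```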
Data Types
Python has various built-in data types:
Integers: Whole numbers (10, -5)
Floats: Decimal numbers (3.14, -2.5)
Strings: Text data ("Hello", "123")
Booleans: True or False values (True, False)
Lists: Ordered, mutable collections of items ([1, 2, 3])
Dictionaries: Key-value pairs ({"name": "John", "age": 30})
CODE:
x = 10
print(type(x)) # The Output is <class 'int'>
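The same check works for each of the built-in types listed above. The sketch below reuses the sample values from the list; the names `examples` and `type_names` are illustrative.

```python
# One sample value per built-in type from the list above
examples = [10, 3.14, "Hello", True, [1, 2, 3], {"name": "John", "age": 30}]

type_names = []
for item in examples:
    type_names.append(type(item).__name__)

print(type_names)  # ['int', 'float', 'str', 'bool', 'list', 'dict']
```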
Lists vs. Tuples
Lists are mutable, meaning you can modify their elements after creation.
CODE:
my_list = [1, 2, 3]
my_list[0] = 4
print(my_list) # The Output is [4, 2, 3]
Tuples are immutable, meaning once they are created, their values cannot be changed.
CODE:
my_tuple = (1, 2, 3)
# my_tuple[0] = 4  # Uncommenting this line raises a TypeError
print(my_tuple) # The Output is (1, 2, 3)
Running my_tuple[0] = 4 raises the following error:
TypeError                                 Traceback (most recent call last)
Cell In[12], line 2
      1 my_tuple = (1, 2, 3)
----> 2 my_tuple[0] = 4
      3 print(my_tuple) # The Output is (1, 2, 3)
TypeError: 'tuple' object does not support item assignment
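If you want to see the error without stopping your script, one option is to catch it. This is a sketch rather than a recommended pattern: it also shows that the usual way to "change" a tuple is to build a new one. The names `message` and `new_tuple` are illustrative.

```python
my_tuple = (1, 2, 3)

try:
    my_tuple[0] = 4              # tuples do not allow item assignment
except TypeError as error:
    message = str(error)
    print(message)               # 'tuple' object does not support item assignment

# Build a new tuple instead of modifying the old one
new_tuple = (4,) + my_tuple[1:]
print(new_tuple)                 # (4, 2, 3)
print(my_tuple)                  # (1, 2, 3), the original is unchanged
```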
Comparison Operators
Comparison operators allow you to compare values:
==: Equal to
!=: Not equal to
>: Greater than
<: Less than
CODE:
x = 5
y = 10
print(x > y) # The Output Is False
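For completeness, here is a small sketch that applies all four operators to the same pair of values. The list `results` is only there to collect the answers.

```python
x = 5
y = 10

# Each comparison evaluates to a Boolean
results = [x == y, x != y, x > y, x < y]
print(results)  # [False, True, False, True]
```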
Logical Operators
Logical operators are used to combine conditional statements:
and: True if both conditions are true
or: True if at least one condition is true
not: Reverses the result (True becomes False)
CODE:
x = 5
y = 10
print(x < 10 and y > 5) # The Output Is True
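The other two operators behave as described above. In this sketch the threshold 20 is an arbitrary value chosen so that one condition fails.

```python
x = 5
y = 10

both = x < 10 and y > 20    # second condition is false, so the result is False
either = x < 10 or y > 20   # first condition is true, so the result is True
negated = not (x < 10)      # x < 10 is True, so not reverses it to False

print(both, either, negated)  # False True False
```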
Membership Operators
Membership operators check if an item is present in a sequence (list, tuple, string):
in: True if the item is found
not in: True if the item is not found
CODE:
my_list = [1, 2, 3]
print(3 in my_list) # The Output Is True
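The same operators work on tuples and strings. For strings, `in` checks for a substring. A short sketch with illustrative variable names:

```python
my_list = [1, 2, 3]

missing = 5 not in my_list          # 5 is absent from the list
in_tuple = 2 in (1, 2, 3)           # membership works on tuples too
in_string = "ell" in "Hello"        # substring check on a string

print(missing, in_tuple, in_string)  # True True True
```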
If-Else Statements
Conditional statements allow decision-making:
CODE:
x = 5
if x > 5:
    print("x is greater than 5")
else:
    print("x is less than or equal to 5")
# The Output Is x is less than or equal to 5
For Loops
Loops allow you to iterate over sequences:
CODE:
for i in range(5):
    print(i) # The Output Is 0, 1, 2, 3, 4 (one number per line)
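A loop can also walk directly over the items of a list. The sketch below totals a list of made-up values; `prices` and `total` are illustrative names.

```python
prices = [10, 20, 30]   # made-up sample data

total = 0
for price in prices:
    total += price      # add each item to the running total

print(total)  # 60
```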
Functions
Functions enable code reuse. You define a function using the def keyword:
CODE:
def greet(name):
    return f"Hello, {name}!"
print(greet("John")) # The Output Is Hello, John!
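To illustrate reuse, the hypothetical `average` function below is defined once and then called on two different lists.

```python
def average(values):
    # Sum the items and divide by how many there are
    return sum(values) / len(values)

first = average([1, 2, 3])
second = average([10, 20])

print(first)   # 2.0
print(second)  # 15.0
```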
NUMPY
Python alone is powerful, but for large-scale data analytics and mathematical operations, NumPy is essential. NumPy introduces a high-performance, multi-dimensional array object known as ndarray, which is much more efficient for numerical computations than Python's built-in lists.
NumPy Arrays vs. Python Lists
- Lists: Flexible, can store mixed data types, but are slower for numerical operations.
CODE:
my_list_1 = [11, 21, 31, 41]
- NumPy Arrays: Homogeneous (all elements are of the same type) and optimized for performance.
CODE:
import numpy as np
my_array = np.array([10, 20, 30, 40])
NumPy arrays are faster and more compact for numerical work because all elements share one type and sit in a single contiguous block of memory. A Python list instead holds references to separately allocated objects. The contiguous layout lets NumPy run operations such as matrix multiplication and element-wise arithmetic in optimized compiled code rather than item by item in the Python interpreter.
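One visible consequence of homogeneity is that NumPy converts mixed numeric input to a single common type. A small sketch with illustrative names:

```python
import numpy as np

mixed_list = [1, 2.5, 3]             # a list keeps each item's own type
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)                    # ['int', 'float', 'int']

mixed_array = np.array(mixed_list)   # NumPy picks one type for all elements
print(mixed_array.dtype)             # float64
```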
Creating NumPy Arrays
You can create arrays in NumPy using various functions.
CODE:
import numpy as np
# Creating a simple array
arr = np.array([1, 2, 3, 4])
# Creating an array of zeros
zeros = np.zeros(5)
# Creating an array with a range of values
range_arr = np.arange(1, 10, 2)
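Printing the results makes the behavior of these functions easier to see. The sketch below converts the arrays with `tolist()` only to get plain Python output. Note that `np.zeros` produces floats by default and that the stop value of `np.arange` is excluded.

```python
import numpy as np

zeros = np.zeros(5)
range_arr = np.arange(1, 10, 2)   # start at 1, stop before 10, step 2

print(zeros.tolist())      # [0.0, 0.0, 0.0, 0.0, 0.0]
print(zeros.dtype)         # float64
print(range_arr.tolist())  # [1, 3, 5, 7, 9]
```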
Operations with NumPy Arrays
NumPy allows you to perform element-wise operations on arrays, which is not as straightforward with Python lists.
CODE:
arr = np.array([1, 2, 3, 4])
arr2 = arr * 2 # Element-wise multiplication
print(arr2) # The Output Is [2 4 6 8]
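To see why this is less straightforward with lists, compare what `* 2` does in each case. With a list it repeats the list, so doubling each item needs a loop or a list comprehension. The variable names below are illustrative.

```python
import numpy as np

my_list = [1, 2, 3, 4]
repeated = my_list * 2                      # repeats the list, no arithmetic
doubled = [item * 2 for item in my_list]    # doubling needs an explicit loop

arr = np.array(my_list)
arr_doubled = arr * 2                       # element-wise multiplication

print(repeated)              # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled)               # [2, 4, 6, 8]
print(arr_doubled.tolist())  # [2, 4, 6, 8]
```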
Memory Efficiency in NumPy
NumPy arrays usually consume less memory than lists holding the same numbers because every element has the same data type and is stored compactly. A Python list stores a reference to each item, and each item is a full Python object with its own overhead, while a NumPy array stores the raw values directly in contiguous memory, which makes operations faster and more memory-efficient.
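A rough way to compare the two is sketched below. The list figure is only an estimate: it adds the size of the list's reference table to the size of each integer object, and the exact numbers depend on your Python version and platform, so no list output is shown. The `int64` dtype is chosen here as an assumption to make the array size predictable.

```python
import sys
import numpy as np

count = 1000
my_list = list(range(count))
arr = np.arange(count, dtype=np.int64)

# Estimate: the list's own storage plus every integer object it refers to
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(item) for item in my_list)

print(arr.itemsize)  # 8 bytes per int64 element
print(arr.nbytes)    # 8000 bytes of raw data
print(list_bytes > arr.nbytes)
```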
Converting Data Types in NumPy
NumPy makes it easy to convert data types for numerical computations.
CODE:
arr = np.array([1.0, 2.0, 3.0])
arr_int = arr.astype(int) # Convert array to integers
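One detail to keep in mind is that converting floats to integers with `astype(int)` drops the fractional part rather than rounding. If rounding is what you want, one option is to round first, for example with `np.rint`, as this sketch with made-up values shows.

```python
import numpy as np

arr = np.array([1.7, -1.7, 2.5])

truncated = arr.astype(int)           # fractional part is discarded
rounded = np.rint(arr).astype(int)    # round to nearest first (ties go to even)

print(truncated.tolist())  # [1, -1, 2]
print(rounded.tolist())    # [2, -2, 2]
```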
Functions in Data Analytics Scripts
In addition to Python's built-in functions, you will often define custom functions for specific tasks like data cleaning, analysis, and transformation. When working with data analytics, functions help modularize your code and make it reusable across different datasets.
CODE:
def normalize_data(data):
    max_value = np.max(data)
    min_value = np.min(data)
    return (data - min_value) / (max_value - min_value)
# Usage with NumPy array
data = np.array([10, 20, 30, 40, 50])
normalized_data = normalize_data(data)
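Min-max normalization like this divides by zero when every value is the same. The sketch below is a hypothetical variant, `normalize_data_safe`, that returns zeros in that case. Returning zeros is an assumption made for illustration, and other choices may suit your data better.

```python
import numpy as np

def normalize_data_safe(data):
    """Min-max scale to [0, 1]; returns zeros for constant input (illustrative choice)."""
    data = np.asarray(data, dtype=float)
    min_value = np.min(data)
    value_range = np.max(data) - min_value
    if value_range == 0:
        # All values are equal, so there is no range to scale by
        return np.zeros_like(data)
    return (data - min_value) / value_range

scaled = normalize_data_safe([10, 20, 30, 40, 50])
constant = normalize_data_safe([7, 7, 7])

print(scaled.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(constant.tolist())  # [0.0, 0.0, 0.0]
```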
Final Thoughts
Python, combined with libraries like NumPy, provides a solid foundation for data analytics. Understanding key concepts such as variables, data types, loops, and functions, alongside NumPy’s efficient array manipulation, prepares you to handle large datasets with ease. As you progress, you’ll unlock more sophisticated tools in Python’s data analytics ecosystem, including Pandas for data manipulation and Matplotlib for visualization.