PYTHON 101: INTRODUCTION TO PYTHON FOR DATA ANALYTICS

John Wakaba - Oct 6 - - Dev Community

Python is a versatile and powerful programming language, widely used in data analytics due to its simplicity and the vast ecosystem of libraries tailored for data processing. In this guide, we'll cover the essentials you need to get started with Python for data analytics, including variables, data types, control structures, functions, and an introduction to NumPy, a fundamental library for numerical computing.

INTRODUCTION TO PYTHON FOR DATA ANALYTICS CONCEPTS

Variables

Variables are names that refer to data values. In Python, you don’t need to declare the type of a variable explicitly; the type is determined at runtime from the value you assign.

CODE:

age = 30
name = "John"
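
As a small sketch of what "inferred" means in practice, the snippet below rebinds one name to values of two different types. The name `value` is made up for illustration and does not come from the article.

```python
# Illustrative only: `value` is a hypothetical name.
value = 30
first_type = type(value).__name__    # the name currently refers to an int

value = "thirty"                     # rebinding the same name to a string is allowed
second_type = type(value).__name__   # now it refers to a str

print(first_type, second_type)
```

No declaration was needed in either case; the type belongs to the value, not to the name.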

Data Types

Python has various built-in data types:

Integers: Whole numbers (10, -5)

Floats: Decimal numbers (3.14, -2.5)

Strings: Text data ("Hello", "123")

Booleans: True or False values (True, False)

Lists: Ordered, mutable collections of items ([1, 2, 3])

Dictionaries: Key-value pairs ({"name": "John", "age": 30})

CODE:

x = 10
print(type(x)) # The Output is <class 'int'>
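
To tie the list above together, here is a minimal sketch that checks the type of one sample value per category. The sample values are the ones used in the list; the variable names are assumptions for illustration.

```python
# One sample value for each built-in type mentioned above.
samples = [10, 3.14, "Hello", True, [1, 2, 3], {"name": "John", "age": 30}]

# type(...).__name__ gives the short type name as a string.
type_names = [type(item).__name__ for item in samples]

for item, type_name in zip(samples, type_names):
    print(f"{item!r} is of type {type_name}")
```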

Lists vs. Tuples

Lists are mutable, meaning you can modify their elements after creation.

CODE:

my_list = [1, 2, 3]
my_list[0] = 4
print(my_list) # The Output is [4, 2, 3]

Tuples are immutable, meaning once they are created, their values cannot be changed.

CODE:

my_tuple = (1, 2, 3)
# my_tuple[0] = 4 
print(my_tuple) # The Output is  (1, 2, 3)

# my_tuple[0] = 4 would raise the following error
TypeError                                 Traceback (most recent call last)
Cell In[12], line 2
      1 my_tuple = (1, 2, 3)
----> 2 my_tuple[0] = 4 
      3 print(my_tuple) # The Output is  (1, 2, 3)

TypeError: 'tuple' object does not support item assignment
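
If you want a script to keep running after such an error, one option is to catch it. The sketch below is one possible way to do that, and it also shows a common workaround: building a new tuple instead of changing the existing one. The names `message` and `new_tuple` are illustrative.

```python
my_tuple = (1, 2, 3)

try:
    my_tuple[0] = 4              # tuples do not allow item assignment
except TypeError as error:
    message = str(error)
    print(f"Could not modify the tuple: {message}")

# Workaround: create a new tuple from the pieces you want to keep.
new_tuple = (4,) + my_tuple[1:]
print(new_tuple)
```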

Comparison Operators

Comparison operators allow you to compare values:

==: Equal to

!=: Not equal to

>: Greater than

<: Less than

CODE:

x = 5
y = 10
print(x > y) # The Output Is False
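
For completeness, the following sketch evaluates all four operators listed above on the same pair of values. The `results` dictionary is just a convenient way to print them together. Python also offers `>=` and `<=` for "greater than or equal to" and "less than or equal to".

```python
x = 5
y = 10

# Each comparison produces a Boolean value.
results = {
    "x == y": x == y,
    "x != y": x != y,
    "x > y": x > y,
    "x < y": x < y,
}

for expression, outcome in results.items():
    print(f"{expression}: {outcome}")
```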

Logical Operators

Logical operators are used to combine conditional statements:

and: True if both conditions are true

or: True if at least one condition is true

not: Reverses the result (True becomes False)

CODE:

x = 5
y = 10
print(x < 10 and y > 5) # The Output Is True
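
The example above only uses `and`. A short sketch covering all three operators, with illustrative variable names, might look like this:

```python
x = 5
y = 10

both = x < 10 and y > 5     # True and True
either = x > 10 or y > 5    # False or True
negated = not (x > y)       # not False

print(both, either, negated)
```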

Membership Operators

Membership operators check if an item is present in a sequence (list, tuple, string):

in: True if the item is found

not in: True if the item is not found

CODE:

my_list = [1, 2, 3]
print(3 in my_list)  # The Output Is True
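
The same operators work on strings and tuples as well. As a side note, they also work on dictionaries (which are not sequences), where they test the keys rather than the values. The sketch below uses made-up variable names to show each case.

```python
in_string = "data" in "data analytics"   # substring check on a string
not_in_tuple = 4 not in (1, 2, 3)        # 4 is absent from the tuple

person = {"name": "John", "age": 30}
key_found = "name" in person             # dictionaries are searched by key
value_found = "John" in person           # values are not searched

print(in_string, not_in_tuple, key_found, value_found)
```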

If-Else Statements

Conditional statements allow decision-making:

CODE:

x = 5
if x > 5:
    print("x is greater than 5")
else:
    print("x is less than or equal to 5") # The Output Is x is less than or equal to 5
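
When there are more than two cases, `elif` lets you chain extra conditions. This is a minimal sketch, assuming `x` is 5 as in the earlier examples; the `message` variable is only there to hold the result.

```python
x = 5

if x > 5:
    message = "x is greater than 5"
elif x == 5:                 # checked only if the first condition was False
    message = "x is exactly 5"
else:
    message = "x is less than 5"

print(message)
```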

For Loops

Loops allow you to iterate over sequences:

CODE:

for i in range(5):
    print(i) # Prints 0, 1, 2, 3, 4 on separate lines
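
Loops are not limited to `range`. The sketch below iterates over a list directly and uses the built-in `enumerate` to get a running position; the sample names are made up for illustration. It also shows a loop that accumulates a total.

```python
names = ["Ann", "Ben", "Cara"]   # made-up sample data

lines = []
for position, name in enumerate(names, start=1):
    lines.append(f"{position}. {name}")
print("\n".join(lines))

# Accumulating a result across iterations.
total = 0
for i in range(5):
    total += i
print(total)
```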

Functions

Functions enable code reuse. You define a function using the def keyword.

CODE:

def greet(name):
    return f"Hello, {name}!"
print(greet("John")) # The Output Is Hello, John!
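
Functions can also take default parameter values. The variation below is a sketch, not part of the original example: the `greeting` parameter is an assumption added to show how a default works.

```python
def greet(name, greeting="Hello"):
    """Return a greeting; `greeting` is a hypothetical optional parameter."""
    return f"{greeting}, {name}!"

default_message = greet("John")                       # uses the default
custom_message = greet("John", greeting="Welcome")    # overrides it

print(default_message)
print(custom_message)
```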

NUMPY

Python alone is powerful, but for large-scale data analytics and mathematical operations, NumPy is essential. NumPy introduces a high-performance, multi-dimensional array object known as ndarray, which is much more efficient for numerical computations than Python's built-in lists.

NumPy Arrays vs. Python Lists

  • Lists: Flexible, can store mixed data types, but are slower for numerical operations.

CODE:

my_list_1 = [11, 21, 31, 41]
  • NumPy Arrays: Homogeneous (all elements are of the same type) and optimized for performance.

CODE:

import numpy as np
my_array = np.array([10, 20, 30, 40])

NumPy arrays are faster and more efficient for numerical work because they use contiguous memory. A Python list stores references to elements that each live as an independent object in memory, whereas a NumPy array stores values of a single type in one block of memory, which makes operations like matrix multiplication and element-wise arithmetic faster.
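
One visible consequence of arrays being homogeneous is that NumPy picks a single type for all elements. The sketch below, with illustrative variable names, builds an array from a list that mixes integers and a float.

```python
import numpy as np

mixed_list = [1, 2.5, 3]              # a list keeps each element's own type
list_types = [type(v).__name__ for v in mixed_list]

mixed_array = np.array(mixed_list)    # the array converts everything to one type

print(list_types)
print(mixed_array.dtype)
print(mixed_array)
```

Here the integers are converted to floating point so that every element shares one `dtype`.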

Creating NumPy Arrays

You can create arrays in NumPy using various functions.

CODE:

import numpy as np
# Creating a simple array
arr = np.array([1, 2, 3, 4])
# Creating an array of zeros
zeros = np.zeros(5)
# Creating an array with a range of values
range_arr = np.arange(1, 10, 2) # Start at 1, stop before 10, step by 2
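
To see what these functions produce, the sketch below prints the results and adds two other creation functions, `np.ones` and `np.linspace`, which the article does not cover. The variable names `ones` and `evenly_spaced` are illustrative.

```python
import numpy as np

zeros = np.zeros(5)                        # five floating-point zeros
range_arr = np.arange(1, 10, 2)            # 1, 3, 5, 7, 9
ones = np.ones((2, 3))                     # 2 rows by 3 columns of ones
evenly_spaced = np.linspace(0.0, 1.0, 5)   # 5 points from 0 to 1, inclusive

print(zeros)
print(range_arr)
print(ones.shape)
print(evenly_spaced)
```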

Operations with NumPy Arrays

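NumPy allows you to perform element-wise operations on arrays directly. With a Python list, the same result needs a loop or list comprehension, and multiplying a list by 2 repeats the list instead of doubling its values.
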
CODE:

arr = np.array([1, 2, 3, 4])
arr2 = arr * 2  # Element-wise multiplication
print(arr2) # The Output Is [2 4 6 8]
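
To show the contrast with lists, here is a short side-by-side sketch. The variable names are made up for illustration.

```python
import numpy as np

values = [1, 2, 3, 4]

repeated = values * 2                     # list: repeats the elements
doubled_list = [v * 2 for v in values]    # list: needs a comprehension
doubled_array = np.array(values) * 2      # array: multiplies each element

# Arithmetic between two arrays of the same shape is also element-wise.
summed = np.array(values) + np.array([10, 20, 30, 40])

print(repeated)
print(doubled_list)
print(doubled_array)
print(summed)
```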

Memory Efficiency in NumPy

NumPy arrays consume less memory compared to lists because arrays store elements of the same data type, allowing for more compact storage. For instance, a Python list stores references to each item, while a NumPy array stores data directly in contiguous memory locations, making operations faster and more memory-efficient.
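
A rough way to see the difference is to compare byte counts. The sketch below is an approximation under stated assumptions: it adds the size of the list object to the size of each integer it refers to, and it fixes the array type to 64-bit integers. Exact list figures vary by Python version and platform.

```python
import sys
import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# The list holds references; each int is a separate Python object.
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(v) for v in numbers_list)

# The array's data buffer holds raw 8-byte values.
array_bytes = numbers_array.nbytes

print(f"list: about {list_bytes} bytes, array data: {array_bytes} bytes")
```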

Converting Data Types in NumPy

NumPy makes it easy to convert data types for numerical computations.

CODE:

arr = np.array([1.0, 2.0, 3.0])
arr_int = arr.astype(int)  # Convert array to integers
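
One detail worth knowing is that converting floats to integers with `astype(int)` drops the fractional part rather than rounding. The sketch below uses made-up values to show this, along with rounding first as an alternative. `astype` returns a new array and leaves the original unchanged.

```python
import numpy as np

arr = np.array([1.7, -2.7, 3.2])

truncated = arr.astype(int)             # fractional part is dropped
rounded = np.round(arr).astype(int)     # round first, then convert
as_float32 = arr.astype(np.float32)     # smaller floating-point type

print(truncated)
print(rounded)
print(as_float32.dtype, arr.dtype)
```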

Functions in Data Analytics Scripts

In addition to Python's built-in functions, you will often define custom functions for specific tasks like data cleaning, analysis, and transformation. When working with data analytics, functions help modularize your code and make it reusable across different datasets.

CODE:

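def normalize_data(data):
    """Scale values to the range [0, 1] (min-max normalization)."""
    max_value = np.max(data)
    min_value = np.min(data)
    # Assumes max_value != min_value; otherwise this divides by zero
    return (data - min_value) / (max_value - min_value)

# Usage with NumPy array
data = np.array([10, 20, 30, 40, 50])
normalized_data = normalize_data(data)
print(normalized_data) # The Output Is [0.   0.25 0.5  0.75 1.  ]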
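
If every value in the data is the same, the maximum equals the minimum and the division above has a zero denominator. One possible way to guard against that is sketched below. The function name `normalize_data_safe` and the choice to return zeros in that case are assumptions for illustration, not the article's method.

```python
import numpy as np

def normalize_data_safe(data):
    """Min-max scale to [0, 1]; return zeros when all values are equal (assumed convention)."""
    data = np.asarray(data, dtype=float)   # accept lists as well as arrays
    min_value = data.min()
    value_range = data.max() - min_value
    if value_range == 0:
        return np.zeros_like(data)
    return (data - min_value) / value_range

normalized = normalize_data_safe([10, 20, 30, 40, 50])
constant = normalize_data_safe([7, 7, 7])

print(normalized)
print(constant)
```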

Final Thoughts

Python, combined with libraries like NumPy, provides a solid foundation for data analytics. Understanding key concepts such as variables, data types, loops, and functions, alongside NumPy’s efficient array manipulation, prepares you to handle large datasets with ease. As you progress, you’ll unlock more sophisticated tools in Python’s data analytics ecosystem, including Pandas for data manipulation and Matplotlib for visualization.
