Understanding Your Data: The Essentials of Exploratory Data Analysis
#exploratory data analysis #data science #statistics

Tony Ndereva - Aug 17 - Dev Community

Exploratory Data Analysis (EDA) is the process of investigating a dataset, through visualization and summary statistics, to better understand its variables and the relationships between them.

EDA enables data scientists to spot anomalies, get an overall picture of the dataset through summaries and visualizations, and avoid making inappropriate assumptions. Additionally, variables identified during EDA can be used later in machine learning to build predictive models.

Types of EDA
There are four types of Exploratory Data Analysis:

1. Univariate non-graphical: This method uses summary statistics, such as the mean, median, and standard deviation, to describe a single variable (univariate) in a dataset.

2. Univariate graphical: This method uses graphical tools such as stem-and-leaf plots and box plots to visualize a single variable, making the data easier for the data scientist to understand.

3. Multivariate non-graphical: This method uses statistical techniques such as correlation, covariance, and regression to identify the relationships between different variables in a dataset. For example, the relationship between housing and inflation can be quantified using correlation.

4. Multivariate graphical: This method uses graphical tools such as scatter plots and regression lines to visualize the relationships between variables, which helps the data scientist identify and understand them.

Tools in EDA
Apart from a good grasp of statistics, programming languages such as Python and R come in handy in exploratory data analysis.
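As a small illustration of how Python can help, the sketch below builds a text-based stem-and-leaf plot, one of the univariate graphical methods listed under Types of EDA. It uses only the standard library. The function name `stem_and_leaf` and the sample ages are hypothetical, and the sketch assumes non-negative integers, with the tens as the stem and the units as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return a text stem-and-leaf plot for non-negative integers.

    The stem is the value without its last digit and the leaf is the last digit.
    """
    if not values:
        return ""

    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)

    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical ages, invented for illustration.
    ages = [23, 25, 31, 34, 34, 47, 52, 58, 58, 59]
    print(stem_and_leaf(ages))
```

Each row shows one stem followed by its sorted leaves, so the length of a row gives a rough picture of how many values fall in that range.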

. . .