Weather Data Analysis and Forecasting for Different Cities in Kenya
Introduction
In this article, I’ll walk you through analyzing weather patterns with Python. From identifying temperature trends to visualizing rainfall, this step-by-step guide is aimed at anyone interested in applying data science techniques to weather analysis. Along the way, I’ll cover the code, the data manipulation, and the visualizations used to draw practical insights.
In Kenya, weather plays a critical role in many sectors, particularly agriculture, tourism, and outdoor activities. Farmers, businesses, and event planners need accurate weather information to make decisions. However, weather patterns can vary significantly across regions, and current forecasting systems may not always provide localised insights.
The objective of this project is to collect real-time weather data from the OpenWeatherMap API and Weather API for different regions across Kenya. This data will be stored in a database and analysed using Python to uncover insights into:
- Temperature trends
- Rainfall patterns
- Humidity and wind conditions
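The collection code is not shown in this article. As a minimal sketch, assuming the OpenWeatherMap current-weather endpoint is queried with `units=metric` (so temperatures arrive in Celsius), each JSON response could be flattened into one row. The helper `flatten_owm_response`, the output keys not used elsewhere in the article (`datetime`, `country`, `latitude`, `longitude`, `visibility`, `weather_description`), and the sample payload values are all hypothetical; in practice the dictionary would be the parsed body of the HTTP response. Weather API returns a differently shaped payload and would need its own mapping.

```python
from datetime import datetime, timezone


def flatten_owm_response(payload: dict) -> dict:
    """Flatten one OpenWeatherMap current-weather payload into a flat row.

    Hypothetical helper: keys mirror the column names used later in this
    article (temperature_celsius, humidity_pct, ...) plus a few assumed ones.
    """
    weather = payload["weather"][0]  # the API returns a list; take the primary condition
    return {
        "datetime": datetime.fromtimestamp(payload["dt"], tz=timezone.utc),
        "city": payload["name"],
        "country": payload["sys"]["country"],
        "latitude": payload["coord"]["lat"],
        "longitude": payload["coord"]["lon"],
        "temperature_celsius": payload["main"]["temp"],  # Celsius when units=metric
        "humidity_pct": payload["main"]["humidity"],
        "pressure_hpa": payload["main"]["pressure"],
        "wind_speed_ms": payload["wind"]["speed"],
        "visibility": payload.get("visibility"),
        # The 'rain' block is omitted when no rain was recorded, so default to 0.0
        "rain": payload.get("rain", {}).get("1h", 0.0),
        "clouds": payload["clouds"]["all"],
        "weather_condition": weather["main"],
        "weather_description": weather["description"],
    }


# Illustrative payload with made-up values, shaped like an API response
sample = {
    "coord": {"lon": 36.82, "lat": -1.28},
    "weather": [{"main": "Clouds", "description": "scattered clouds"}],
    "main": {"temp": 21.5, "pressure": 1017, "humidity": 64},
    "visibility": 10000,
    "wind": {"speed": 3.6},
    "clouds": {"all": 40},
    "dt": 1700000000,
    "sys": {"country": "KE"},
    "name": "Nairobi",
}

row = flatten_owm_response(sample)
print(row["city"], row["temperature_celsius"], row["rain"])
```

Flattening each response into a dictionary like this keeps the database insert simple: a list of such rows maps directly onto a table or a pandas DataFrame.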
In this project, I analyze a dataset containing weather information for various cities in Kenya. The dataset includes over 3,000 rows of weather observations covering temperature, humidity, pressure, wind speed, visibility, and rainfall, among other factors. Using these insights, we aim to provide accurate, region-specific weather forecasts that can aid decision-making in weather-sensitive sectors such as agriculture, tourism, and event management.
Dataset Overview
The dataset is structured around the following columns:
- Datetime - Timestamp indicating when the weather was recorded.
- City and Country - Location of the weather observation.
- Latitude and Longitude - Geographical coordinates of the location.
- Temperature (Celsius) - The temperature recorded.
- Humidity (%) - The percentage of humidity in the air.
- Pressure (hPa) - The atmospheric pressure in hectopascals.
- Wind Speed (m/s) - The speed of the wind at the time.
- Rain (mm) - The amount of rainfall measured in millimeters.
- Clouds (%) - The percentage of cloud coverage.
- Weather Condition and Weather Description - General and detailed descriptions of the weather (e.g., 'Clouds', 'Scattered Clouds').
This is how the data is structured in the database.
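The article does not say which database engine was used. As a hedged sketch, assuming SQLite (through Python's built-in `sqlite3` module) and a hypothetical table name `weather_observations`, rows can be written with pandas' `DataFrame.to_sql` and read back with `pd.read_sql_query`, parsing the timestamp column on the way in. Only a subset of columns is shown, and the values are made up.

```python
import sqlite3

import pandas as pd

# Two made-up observations using a subset of the assumed column names
rows = [
    {"datetime": "2024-04-01 09:00:00", "city": "City A",
     "temperature_celsius": 29.1, "humidity_pct": 78, "rain": 0.0},
    {"datetime": "2024-04-01 09:00:00", "city": "City B",
     "temperature_celsius": 17.4, "humidity_pct": 88, "rain": 1.5},
]

# In-memory database for the example; a file path would persist the data
conn = sqlite3.connect(":memory:")
pd.DataFrame(rows).to_sql("weather_observations", conn, index=False, if_exists="append")

# Read the table back, restoring the timestamp column as datetime64
df1 = pd.read_sql_query(
    "SELECT * FROM weather_observations",
    conn,
    parse_dates=["datetime"],
)
conn.close()
print(df1.dtypes)
```

Parsing `datetime` at load time is what later makes `.dt.month` and `resample` available without an extra conversion step.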
Exploratory Data Analysis
The first step in the analysis involved basic exploration of the data.
- Data dimensions: the dataset contains over 3,000 rows and 14 columns.
- Null values: there is minimal missing data, so the dataset needed little cleaning before further analysis.
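Both checks take a line each in pandas. A small sketch on a made-up stand-in frame; in the article the same calls would run on `df1`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df1 with one missing rain value
df = pd.DataFrame({
    "temperature_celsius": [24.3, 18.9, 29.4],
    "humidity_pct": [60, 82, 75],
    "rain": [0.0, np.nan, 1.2],
})

print(df.shape)                  # (rows, columns)
missing = df.isnull().sum()      # number of missing values per column
print(missing[missing > 0])      # show only the columns that have gaps
print(f"{df.isnull().mean().mean():.1%} of all cells are missing")
```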
print(df1[['temperature_celsius', 'humidity_pct', 'pressure_hpa', 'wind_speed_ms', 'rain', 'clouds']].describe())
The code above computes summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numerical columns, giving insight into the range, central value, and spread of temperature, humidity, pressure, wind speed, rainfall, and cloud cover.
Visualising Key Weather Features
To gain a clearer understanding of the weather features, we plotted various distributions:
Temperature Distribution
import matplotlib.pyplot as plt
import seaborn as sns
sns.displot(df1['temperature_celsius'], bins=50, kde=True)  # histogram with a KDE curve overlaid
plt.title('Temperature Distribution')
plt.xlabel('Temperature (Celsius)')
This distribution reveals the general spread of temperatures across the cities. The KDE (kernel density estimate) curve gives a smooth estimate of the probability density of temperature.
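To make the KDE description concrete, here is a minimal NumPy sketch of a Gaussian kernel density estimate: each observation contributes a small Gaussian bump, and the bumps are averaged into one smooth curve. It uses Scott's rule for the bandwidth, which is the default in `scipy.stats.gaussian_kde`; seaborn's own bandwidth and grid choices may differ, and the temperatures below are synthetic, not the article's data.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb: kernel width shrinks slowly as the sample grows
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per observation, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    # Averaging the bumps keeps the total area equal to 1
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
temps = rng.normal(loc=22, scale=4, size=500)  # synthetic temperatures in Celsius
grid = np.linspace(5, 40, 701)
density = gaussian_kde_1d(temps, grid)

# Riemann-sum check that the estimate behaves like a probability density
area = density.sum() * (grid[1] - grid[0])
print(f"area under curve ≈ {area:.3f}")
```

A wider bandwidth gives a smoother but blurrier curve; a narrower one follows individual readings more closely, which is why the KDE line can look quite different from a 50-bin histogram of the same data.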
Rainfall Distribution
sns.displot(df1['rain'], bins=50, kde=True)
plt.title('Rainfall Distribution')
plt.xlabel('Rainfall (mm/h)')
This plot shows how rainfall amounts are distributed across the observations from the Kenyan cities in the dataset.
Humidity, Pressure and Wind Speed
Similar distribution plots were produced for humidity (%), pressure (hPa), and wind speed (m/s); each shows how these parameters vary across the dataset.
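The exact code for these three plots is not shown. As a sketch, swapping the per-variable `sns.displot` calls for plain matplotlib subplots, one hypothetical helper can draw all three histograms side by side. The column names match those used in the article; the data frame below is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_distributions(df, columns, bins=50):
    """Draw one histogram per column side by side and return the figure.

    `columns` maps a column name to the axis label to display.
    """
    fig, axes = plt.subplots(1, len(columns), figsize=(5 * len(columns), 4), squeeze=False)
    for ax, (col, label) in zip(axes[0], columns.items()):
        ax.hist(df[col].dropna(), bins=bins, color="skyblue", edgecolor="white")
        ax.set_title(f"{label} Distribution")
        ax.set_xlabel(label)
        ax.set_ylabel("Count")
    fig.tight_layout()
    return fig


# Synthetic stand-in for df1
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "humidity_pct": rng.uniform(30, 100, 300),
    "pressure_hpa": rng.normal(1013, 4, 300),
    "wind_speed_ms": rng.gamma(2.0, 1.5, 300),
})

fig = plot_distributions(df, {
    "humidity_pct": "Humidity (%)",
    "pressure_hpa": "Pressure (hPa)",
    "wind_speed_ms": "Wind Speed (m/s)",
})
```

Putting the panels on one figure makes it easier to compare shapes, for example a narrow pressure distribution next to a right-skewed wind speed distribution.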
Weather Condition Analysis
Weather conditions (e.g., 'Clouds', 'Rain') were counted and visualized using a pie chart to show their proportional distribution:
condition_counts = df1['weather_condition'].value_counts()
plt.figure(figsize=(8,8))
plt.pie(condition_counts, labels=condition_counts.index, autopct='%1.1f%%', pctdistance=1.1, labeldistance=0.6, startangle=140)
plt.title('Distribution of Weather Conditions')
plt.axis('equal')
plt.show()
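The percentages drawn by `autopct` are each condition's share of all observations. A small sketch with made-up labels that computes those shares directly with `value_counts(normalize=True)`, plus a hypothetical `lump_small` helper that folds rare conditions into an 'Other' slice so their labels do not crowd the pie.

```python
import pandas as pd

# Made-up condition labels standing in for df1['weather_condition']
conditions = pd.Series(["Clouds", "Clouds", "Rain", "Clear", "Clouds", "Rain", "Clouds", "Clear"])

# Percentage share of each condition, as autopct would display it
shares = conditions.value_counts(normalize=True).mul(100).round(1)
print(shares)


def lump_small(counts, min_share=0.05):
    """Merge categories below `min_share` of the total into a single 'Other' entry."""
    share = counts / counts.sum()
    small = share < min_share
    if not small.any():
        return counts
    lumped = counts[~small].copy()
    lumped["Other"] = counts[small].sum()
    return lumped


counts = pd.Series({"Clouds": 90, "Rain": 7, "Mist": 2, "Haze": 1})
print(lump_small(counts))
```

The lumped counts can be passed to `plt.pie` in place of `condition_counts`.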
City-wise Rainfall
One of the key analyses was total rainfall by city:
rainfall_by_city = df1.groupby('city')['rain'].sum().sort_values()
plt.figure(figsize=(12,12))
rainfall_by_city.plot(kind='barh', color='skyblue')
plt.title('Total Rainfall by City')
plt.xlabel('Total Rainfall (mm)')
plt.ylabel('City')
plt.tight_layout()
plt.show()
This bar plot highlights which cities received the most rain over the observed period, with a few cities standing out with far more rainfall than the rest.
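Summed totals also depend on how many observations were collected for each city, so a city sampled more often can look wetter simply because it has more rows. A hedged sketch that places the total next to the observation count and the mean per observation, using made-up data for two placeholder cities:

```python
import pandas as pd

# Made-up readings: City A has twice as many observations as City B
obs = pd.DataFrame({
    "city": ["City A", "City A", "City A", "City A", "City B", "City B"],
    "rain": [2.0, 0.0, 1.0, 1.0, 3.0, 0.0],
})

summary = (
    obs.groupby("city")["rain"]
    .agg(total="sum", n_obs="count", mean_per_obs="mean")  # 'count' ignores missing values
    .sort_values("mean_per_obs")
)
print(summary)
```

Here City A has the larger total but City B has the higher rainfall per observation, so ranking by totals alone could be misleading when sampling is uneven.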
Average Monthly Temperature
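avg_temp_by_month = df1.groupby(df1['datetime'].dt.month)['temperature_celsius'].mean()  # assumes a datetime64 column named 'datetime' (hypothetical name)
avg_temp_by_month.plot(kind='line')
plt.title('Average Monthly Temperature')
plt.xlabel('Month')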
The line chart revealed temperature fluctuations across different months, showing seasonal changes.
Average Monthly Rainfall
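monthly_rain = df1.groupby(df1['datetime'].dt.month)['rain'].mean()  # assumes a datetime64 column named 'datetime' (hypothetical name)
monthly_rain.plot(kind='line')
plt.title('Average Monthly Rainfall')
plt.xlabel('Month')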
Similarly, rainfall was analyzed to observe how it varied month-to-month.
We also visualized average monthly temperature and rainfall as heatmaps, which make seasonal patterns easier to read at a glance.
Here are the heatmaps for average monthly temperature and rainfall:
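The heatmap code is not included in the article. As a sketch, assuming the heatmaps put cities on one axis and calendar months on the other, and that the timestamp column is named `datetime` (an assumption), `pivot_table` builds the grid of monthly means. The readings below are made up.

```python
import pandas as pd

# Made-up readings for two placeholder cities
obs = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-01-07", "2024-02-14"]),
    "city": ["City A", "City A", "City A", "City B", "City B"],
    "temperature_celsius": [25.0, 27.0, 26.0, 18.0, 20.0],
})
obs["month"] = obs["datetime"].dt.month

# Rows = cities, columns = calendar months, cells = mean temperature
temp_grid = obs.pivot_table(
    index="city", columns="month", values="temperature_celsius", aggfunc="mean"
)
print(temp_grid)
```

The same call with `values='rain'` gives the rainfall grid, and either table can be passed to seaborn, for example `sns.heatmap(temp_grid, annot=True, fmt='.1f', cmap='coolwarm')`.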
Correlation Between Weather Variables
Next, I calculated the correlation matrix between key weather variables:
correlation_matrix = df1[['temperature_celsius', 'humidity_pct', 'pressure_hpa', 'wind_speed_ms', 'rain', 'clouds']].corr()
correlation_matrix
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Between Weather Variables')
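This correlation heatmap allowed us to identify relationships between the variables. For example, temperature and humidity were negatively correlated, which is expected: warmer air can hold more water vapour, so for the same amount of moisture, relative humidity falls as temperature rises.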
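`DataFrame.corr()` uses the Pearson coefficient by default: the covariance of two variables divided by the product of their standard deviations, giving a value between -1 and 1. A short sketch that computes it by hand on made-up readings and checks it against pandas:

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their individual spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())


# Made-up readings in which humidity falls as temperature rises
temp = [18.0, 21.0, 24.0, 27.0, 30.0]
humidity = [85.0, 78.0, 70.0, 66.0, 55.0]

r = pearson_r(temp, humidity)
r_pandas = pd.DataFrame({"t": temp, "h": humidity}).corr().loc["t", "h"]
print(round(r, 3), round(r_pandas, 3))
```

Pearson only measures linear association; `df1.corr(method='spearman')` is an option when a relationship is monotonic but curved.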
Case Study: City-Specific Trends
I also focused on individual cities, such as Mombasa and Nyeri, to explore their unique weather patterns:
Mombasa Temperature Trends
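# assumes a datetime64 column named 'datetime' (hypothetical name) and the city stored as 'Mombasa'
monthly_avg_temp_msa = df1[df1['city'] == 'Mombasa'].set_index('datetime')['temperature_celsius'].resample('MS').mean()
plt.plot(monthly_avg_temp_msa)
plt.title('Temperature Trends in Mombasa Over Time')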
Mombasa showed noticeable variation in temperature across the year.
Nyeri Rainfall Trends
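# assumes a datetime64 column named 'datetime' (hypothetical name) and the city stored as 'Nyeri'
monthly_avg_rain_nyr = df1[df1['city'] == 'Nyeri'].set_index('datetime')['rain'].resample('MS').mean()
plt.plot(monthly_avg_rain_nyr)
plt.title('Rainfall Trends in Nyeri Over Time')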
The rainfall data for Nyeri displayed a clear seasonal pattern, with rainfall peaking during certain months.
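Peak months can also be read off programmatically rather than by eye. A sketch with a hypothetical helper `monthly_city_series`, assuming a datetime column named `datetime` and using made-up readings: it resamples one city's values to calendar months (`'MS'` labels each month by its first day), and `idxmax` returns the month with the highest average.

```python
import pandas as pd


def monthly_city_series(df, city, column, how="mean"):
    """Aggregate one city's readings of `column` to calendar months (month-start labels)."""
    city_rows = df[df["city"] == city].set_index("datetime").sort_index()
    return city_rows[column].resample("MS").agg(how)


# Made-up readings for two placeholder cities
obs = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2024-03-03", "2024-03-18", "2024-04-09", "2024-04-22", "2024-05-15", "2024-04-10",
    ]),
    "city": ["City A"] * 5 + ["City B"],
    "rain": [1.0, 3.0, 6.0, 8.0, 2.0, 9.0],
})

rain_a = monthly_city_series(obs, "City A", "rain")
peak = rain_a.idxmax()  # timestamp of the wettest month
print(peak.strftime("%B %Y"), rain_a.max())
```

Passing `how="sum"` gives monthly totals instead of averages, and the same helper works for any city and variable in the dataset.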
Conclusion
This analysis provides an overview of the weather conditions in major Kenyan cities, highlighting temperature, rainfall, and other key weather variables. Using visualizations such as histograms, line charts, pie charts, and heatmaps, we were able to extract meaningful insights from the data. Further analysis could compare these trends with historical weather patterns or explore predictive modeling to forecast future weather.
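As a hedged starting point for predictive modeling, a common first step is a naive persistence baseline: predict that the next period will look like the current one and measure the mean absolute error (MAE). Any forecasting model worth adopting should beat this number. The helper name and the monthly values below are hypothetical.

```python
import pandas as pd


def persistence_mae(series, horizon=1):
    """MAE of a naive forecast that repeats the value observed `horizon` steps earlier."""
    s = pd.Series(series, dtype=float).dropna()
    forecast = s.shift(horizon)              # the earlier observation serves as the prediction
    errors = (s - forecast).abs().dropna()   # the first `horizon` points have no forecast
    return float(errors.mean())


# Made-up monthly mean temperatures, in time order
monthly_temps = [24.1, 24.8, 25.3, 24.0, 22.9, 21.7]
print(f"Persistence MAE: {persistence_mae(monthly_temps):.2f} °C")
```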
You can find the Jupyter Notebook with the full code for this analysis in my GitHub repository.