Agricultural Product Classification Using Machine Learning

🌾 Introduction
Agriculture plays a vital role in feeding the global population, making it essential to improve productivity and efficiency in this sector. One of the key challenges in modern agriculture is the accurate classification of agricultural products, which directly impacts quality control, pricing, and supply chain management. Traditional methods of classification often rely on manual inspection, which can be time-consuming and prone to errors.

With the rise of data-driven technologies, Machine Learning (ML) offers a powerful alternative to automate and enhance agricultural product classification. By analyzing dimensional and shape factors, ML algorithms can accurately classify various agricultural products, reducing manual labor and increasing efficiency. This project explores how machine learning techniques can be applied to classify agricultural products using real-world data, leading to smarter agricultural practices and better decision-making.

📊 Project Overview
This project focuses on classifying agricultural products based on dimensional and shape factors using machine learning. The goal is to develop an accurate and efficient classification system by exploring data preprocessing techniques, visualizations, and multiple machine learning algorithms.

📁 Dataset
Note: The data was taken from the following spreadsheet:

https://docs.google.com/spreadsheets/d/1sSndhD8mKBppcsMw8TiKUR4fzxtTlabK/edit?usp=sharing&ouid=116447350445311431259&rtpof=true&sd=true

Entries: 13,611
Columns: 17
Data Types: 14 float64, 2 int64, 1 object
The dataset contains detailed measurements of agricultural products, including area, perimeter, major axis length, and shape factors. Preprocessing steps included:

Removing 68 duplicate entries.
Encoding the categorical ‘Class’ column as one-hot indicator columns (Class_BO, Class_CA, and so on).
Confirming that the dataset contains no missing values.
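
The preprocessing steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration on a tiny stand-in table, not the project's actual notebook code: the column values and the two-letter class labels are assumptions made for the example, and the real dataset has 17 columns.

```python
import pandas as pd

# Tiny stand-in for the real data; values and class labels are illustrative only.
df = pd.DataFrame({
    "Area": [28395, 28395, 30008, 41487, 114004],
    "Perimeter": [610.291, 610.291, 645.884, 815.9, 1279.356],
    "Class": ["SE", "SE", "SI", "BO", "CA"],
})

# 1. Remove exact duplicate rows.
n_before = len(df)
df = df.drop_duplicates().reset_index(drop=True)
n_removed = n_before - len(df)

# 2. Confirm there are no missing values.
n_missing = int(df.isnull().sum().sum())

# 3. One-hot encode the categorical 'Class' column.
df = pd.get_dummies(df, columns=["Class"], dtype=int)

print(f"duplicates removed: {n_removed}, missing values: {n_missing}")
print(df.columns.tolist())
```

On the full dataset, the same `drop_duplicates` call would be expected to remove the 68 duplicate entries mentioned above.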

📊 Data Preprocessing
Data Cleaning: Removed duplicates and encoded categorical data.
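Feature Scaling: Applied standard scaling (zero mean, unit variance) so that distance- and margin-based models such as KNN and SVC are not dominated by large-valued features.
Data Splitting: Divided the dataset into training (80%) and testing (20%) sets.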

📈 Data Visualization
Matplotlib and Seaborn were used for:

Pair Plots: To visualize feature relationships.
Histograms & Distribution Plots: To understand feature distributions.
Scatter Plots: To observe trends and correlations.
Heatmap: To display feature correlation.
Box Plots: To examine feature distribution across classes.
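
As a rough illustration of three of the plot types above (correlation heatmap, histogram, and per-class box plot), here is a small sketch. It uses plain Matplotlib and pandas to keep dependencies minimal, whereas the project itself also used Seaborn; the `demo` DataFrame, its column names, and the class labels A/B/C are synthetic assumptions, not the project's data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical columns and classes).
rng = np.random.default_rng(0)
n = 200
area = rng.normal(50_000, 10_000, n)
demo = pd.DataFrame({
    "Area": area,
    "Perimeter": 4 * np.sqrt(area) + rng.normal(0, 20, n),
    "ShapeFactor": rng.uniform(0.5, 1.0, n),
    "Class": rng.choice(["A", "B", "C"], n),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Heatmap: pairwise correlation between the numeric features.
corr = demo.drop(columns="Class").corr()
im = axes[0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[0].set_xticks(range(len(corr)))
axes[0].set_xticklabels(corr.columns, rotation=45)
axes[0].set_yticks(range(len(corr)))
axes[0].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[0])
axes[0].set_title("Feature correlation")

# Histogram: distribution of a single feature.
axes[1].hist(demo["Area"], bins=20)
axes[1].set_title("Area distribution")

# Box plot: one feature's distribution within each class.
labels, groups = zip(*[(name, g["Area"].to_numpy())
                       for name, g in demo.groupby("Class")])
axes[2].boxplot(groups)
axes[2].set_xticks(range(1, len(labels) + 1))
axes[2].set_xticklabels(labels)
axes[2].set_title("Area by class")

fig.tight_layout()
fig.savefig("feature_overview.png")
```

With Seaborn available, `seaborn.heatmap`, `seaborn.histplot`, `seaborn.boxplot`, and `seaborn.pairplot` cover the same plot types with less setup.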

🤖 Machine Learning Models Used
Random Forest Classifier

Accuracy: 91.47%
Precision: 92.65%
Recall: 91.59%
F1-Score: 92.12%

K-Nearest Neighbors (KNN) (Best Performer)

Accuracy: 91.95%
Precision: 92.62%
Recall: 92.20%
F1-Score: 92.41%

Support Vector Classifier (SVC)

Accuracy: 87.93%
Precision: 92.62%
Recall: 92.20%
F1-Score: 92.41%
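
As a quick sanity check on how the metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall figures reported above for Random Forest and KNN. It assumes the reported F1 values were derived this way; the helper name `f1_from` is introduced here purely for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in this section.
reported = {
    "Random Forest": (92.65, 91.59),
    "KNN": (92.62, 92.20),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_from(p, r):.2f}%")
# Random Forest: F1 = 92.12%
# KNN: F1 = 92.41%
```

Both recomputed values agree with the F1-scores listed for those two models.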

⚖️ Model Comparison
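KNN achieved the highest accuracy (91.95%), recall, and F1-score, making it the most suitable model for classifying agricultural products in this project.
Random Forest followed closely at 91.47% accuracy, with marginally higher precision (92.65% vs. 92.62%), while SVC showed the lowest accuracy (87.93%).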

🤖 Model Training & Evaluation

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

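# Feature and target split (df is the preprocessed DataFrame with 'Class' one-hot encoded)
class_cols = ['Class_BO', 'Class_CA', 'Class_DE', 'Class_HO', 'Class_SE', 'Class_SI']
X = df.drop(class_cols, axis=1)
# SVC requires a 1-D target, so collapse the one-hot columns into one label per row.
# Assumption: a row with no flag set belongs to a category dropped during encoding.
onehot = df[class_cols].astype(int)
y = onehot.idxmax(axis=1).where(onehot.any(axis=1), 'Class_dropped')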

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))

# KNN
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))

# SVC
svc = SVC(kernel='linear')
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
print("SVC Accuracy:", accuracy_score(y_test, y_pred_svc))
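
The snippet above prints accuracy only, while the results section also reports precision, recall, and F1-score. A minimal, self-contained sketch of computing all four plus a confusion matrix is shown below. It runs on synthetic data from `make_classification` rather than the project's dataset, and the choice of `average="macro"` is an assumption, since the averaging scheme behind the reported figures is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class stand-in for the real features and labels.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

model = KNeighborsClassifier().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# Macro averaging weights every class equally (assumed, not confirmed by the source).
metrics = {
    "accuracy": accuracy_score(y_te, y_hat),
    "precision": precision_score(y_te, y_hat, average="macro", zero_division=0),
    "recall": recall_score(y_te, y_hat, average="macro", zero_division=0),
    "f1": f1_score(y_te, y_hat, average="macro", zero_division=0),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_hat)
print(cm)
```

The same calls can be applied to `y_test` and each of `y_pred_rf`, `y_pred_knn`, and `y_pred_svc` from the training code to fill in a per-model metrics table.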

✅ Conclusion
This project demonstrates how machine learning can be applied to classify agricultural products based on shape and dimensional factors. The KNN model proved to be the most effective, with the highest accuracy (91.95%) and F1-score (92.41%) of the three models evaluated.

🛠️ Tools & Technologies Used
Python
Jupyter Notebook
Pandas & NumPy: Data preprocessing
Matplotlib & Seaborn: Data visualization
Scikit-learn: Machine learning algorithms and evaluation