Sentiment Analysis on IMDB Movie Reviews Using BERT

Seenivasa Ramadurai - Feb 25 - Dev Community

In the ever-evolving world of Natural Language Processing (NLP), sentiment analysis remains a crucial task. Today, we'll dive into a powerful approach to sentiment analysis using BERT (Bidirectional Encoder Representations from Transformers) on the IMDB movie reviews dataset. This blog will guide you through the process of building a sentiment analysis model that can classify movie reviews as positive or negative.

The Dataset

We'll be using the IMDB dataset, which contains 50,000 movie reviews split evenly between positive and negative sentiments. This dataset is widely used in the NLP community and provides a great starting point for sentiment analysis tasks.

Setting Up the Environment

Before we begin, make sure you have the necessary libraries installed:

pip install pandas datasets scikit-learn transformers torch tensorflow matplotlib seaborn
pip install --upgrade tensorflow transformers

Loading and Preprocessing the Data

First, let's load the IMDB dataset using the Hugging Face datasets library:

from datasets import load_dataset
import pandas as pd

# Load IMDB dataset
dataset = load_dataset('imdb')

# Convert to pandas DataFrame
train_dataframe = pd.DataFrame(dataset['train'])
test_dataframe = pd.DataFrame(dataset['test'])

# Display basic info
print(train_dataframe.info())
print(train_dataframe['label'].value_counts(normalize=True))

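import matplotlib.pyplot as plt
import seaborn as sns

# Plot how many reviews fall under each label (0 = negative, 1 = positive)
plt.figure(figsize=(8, 6))
sns.countplot(x='label', data=train_dataframe)
plt.title('Distribution of Sentiment Labels')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()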

Preprocessing with BERT Tokenizer

Next, we'll preprocess the text data using BERT's tokenizer:

from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def preprocess_data(texts, labels, max_length=256):
    encoded = tokenizer.batch_encode_plus(
        texts,
        add_special_tokens=True,
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    return {
        'input_ids': encoded['input_ids'],
        'attention_mask': encoded['attention_mask'],
        'labels': torch.tensor(labels)
    }

# Preprocess training and testing data
train_data = preprocess_data(train_dataframe['text'].tolist(), train_dataframe['label'].tolist())
test_data = preprocess_data(test_dataframe['text'].tolist(), test_dataframe['label'].tolist())
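To make the padding='max_length' and truncation=True settings more concrete, here is a rough pure-Python sketch of what happens to a single sequence. It is not the tokenizer's actual implementation: the helper name pad_or_truncate and the toy token IDs are made up for illustration, and only the special-token IDs (101, 102, 0) correspond to bert-base-uncased.

```python
# Illustrative sketch only: a pure-Python stand-in for what
# padding='max_length' and truncation=True do to one sequence.
# The helper name and the toy token IDs are assumptions for this example.

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in bert-base-uncased


def pad_or_truncate(token_ids, max_length):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    # Reserve two slots for [CLS] and [SEP], then cut off any overflow.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)

    # Right-pad with [PAD]; the mask marks padding positions with 0.
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


# A short sequence gets padded, a long one gets truncated.
short_ids, short_mask = pad_or_truncate([2023, 3185], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1010)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With max_length=256, as used above, every review ends up as exactly 256 IDs, and the attention mask tells BERT which positions are real tokens and which are padding.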

Setting Up the Model

We'll use the BertForSequenceClassification model from Hugging Face:

from transformers import BertForSequenceClassification
from torch.optim import AdamW  # the AdamW in transformers is deprecated; use PyTorch's
from torch.utils.data import DataLoader, TensorDataset

# Initialize model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)

# Create DataLoader
train_dataset = TensorDataset(train_data['input_ids'], train_data['attention_mask'], train_data['labels'])
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

Training the Model

Now, let's train our model:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

num_epochs = 3
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}/{num_epochs} completed.")

# Save the model
torch.save(model.state_dict(), 'bert_sentiment_v1_model.pth')
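The training loop above only reports when an epoch finishes. If you also want to see whether the loss is going down, one option is a small running-mean helper like the sketch below. The class name RunningMean and the loss values are hypothetical and are not part of the original notebook.

```python
class RunningMean:
    """Accumulates a weighted running mean, e.g. of per-batch losses."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # Weight each batch by its size so a smaller final batch
        # does not count as much as a full one.
        self.total += value * n
        self.count += n

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical (loss, batch_size) pairs standing in for real batch losses.
epoch_loss = RunningMean()
for batch_loss, batch_size in [(0.70, 32), (0.50, 32), (0.30, 16)]:
    epoch_loss.update(batch_loss, n=batch_size)

print(f"mean loss: {epoch_loss.mean:.4f}")
```

Inside the real loop you would call epoch_loss.update(loss.item(), n=labels.size(0)) after optimizer.step() and print epoch_loss.mean at the end of each epoch.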

Training Environment Details

I trained the model on Google Colab with the runtime configured to use a GPU. Even with GPU acceleration, training took approximately 49 minutes to complete.
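To put that number in perspective, here is a quick back-of-the-envelope calculation of how many optimizer steps the run involves. It assumes the standard 25,000-review IMDB training split and treats the whole 49 minutes as training time, so the per-step figure is only a rough estimate.

```python
import math

train_examples = 25_000   # IMDB training split (half of the 50,000 reviews)
batch_size = 32           # as set in the DataLoader
num_epochs = 3
total_minutes = 49        # approximate wall-clock time reported above

# The last batch of each epoch is smaller, hence the ceiling.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
seconds_per_step = total_minutes * 60 / total_steps

print(steps_per_epoch, total_steps, round(seconds_per_step, 2))
```

That works out to roughly 1.25 seconds per batch of 32 reviews, which gives you a baseline if you experiment with a different batch size or sequence length.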


Evaluating the Model

After training, let's evaluate our model's performance:

from sklearn.metrics import accuracy_score, classification_report

model.eval()
test_dataset = TensorDataset(test_data['input_ids'], test_data['attention_mask'], test_data['labels'])
test_loader = DataLoader(test_dataset, batch_size=32)

all_preds = []
all_labels = []

with torch.no_grad():
    for batch in test_loader:
        input_ids, attention_mask, labels = [b.to(device) for b in batch]
        outputs = model(input_ids, attention_mask=attention_mask)
        preds = torch.argmax(outputs.logits, dim=1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

accuracy = accuracy_score(all_labels, all_preds)
print(f"Accuracy: {accuracy}")
print(classification_report(all_labels, all_preds))

Interpreting the Model's Performance

Let's break down the results of our sentiment analysis model:

Overall Accuracy

The model achieved an impressive overall accuracy of 92.128%. This means that out of all the movie reviews in the test set, our model correctly classified 92.128% of them as either positive or negative.
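As a quick sanity check, the percentage can be converted back into review counts. This sketch assumes the full 25,000-review test set (12,500 positive and 12,500 negative) was evaluated.

```python
test_examples = 25_000   # 12,500 positive + 12,500 negative test reviews
accuracy = 0.92128       # accuracy reported above

correct = round(accuracy * test_examples)
incorrect = test_examples - correct

print(correct, incorrect)
```

In other words, the model got 23,032 reviews right and 1,968 wrong.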

Class-specific Metrics

Negative Reviews (Class 0)

Precision: 0.93

Recall: 0.91

F1-score: 0.92

Positive Reviews (Class 1)

Precision: 0.91

Recall: 0.93

F1-score: 0.92

Interpretation

Balanced Performance: The model shows consistent performance across both positive and negative reviews, with identical F1-scores of 0.92 for both classes. This indicates that the model is well-balanced and doesn't favor one sentiment over the other.

Precision:

For negative reviews, the precision of 0.93 means that when the model predicts a review is negative, it's correct 93% of the time.

For positive reviews, the precision of 0.91 indicates that when the model predicts a review is positive, it's correct 91% of the time.

Recall:

For negative reviews, the recall of 0.91 means the model correctly identifies 91% of all actual negative reviews.

For positive reviews, the recall of 0.93 shows the model correctly identifies 93% of all actual positive reviews.

F1-Score:

The F1-score of 0.92 for both classes represents a strong balance between precision and recall, indicating robust overall performance.

Support:

The test set contains an equal number of positive and negative reviews (12,500 each), ensuring a balanced evaluation.
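If you want to see where these numbers come from, the sketch below computes precision, recall, and F1 by hand. The function name binary_metrics and the eight toy labels are made up for illustration and are not the model's actual predictions. The last lines check that the reported (rounded) precision and recall are consistent with the reported F1-score.

```python
def binary_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for the class treated as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels (0 = negative, 1 = positive), purely illustrative.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

neg_metrics = binary_metrics(y_true, y_pred, positive=0)
pos_metrics = binary_metrics(y_true, y_pred, positive=1)
print("class 0:", [round(m, 2) for m in neg_metrics])
print("class 1:", [round(m, 2) for m in pos_metrics])

# Consistency check on the reported values: F1 is the harmonic mean
# of precision and recall, so 0.93 and 0.91 should give about 0.92.
reported_f1 = 2 * 0.93 * 0.91 / (0.93 + 0.91)
print(round(reported_f1, 2))
```

Because F1 is symmetric in precision and recall, the same value comes out for class 1, where the two numbers are swapped.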

In conclusion, this model demonstrates excellent and balanced performance in sentiment analysis of movie reviews. Its high accuracy and consistent metrics across both positive and negative sentiments make it a reliable tool for classifying movie review sentiments.

Making Predictions

Finally, let's create a function to predict sentiment for new reviews:

def predict_sentiment(text):
    encoded = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=256,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt'
    )
    input_ids = encoded['input_ids'].to(device)
    attention_mask = encoded['attention_mask'].to(device)

    model.eval()  # make sure dropout is disabled at inference time
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask)
        pred = torch.argmax(outputs.logits, dim=1)
    return "Positive" if pred.item() == 1 else "Negative"

# Example usage: test with new sentences
print(predict_sentiment("I love this movie!"))
print(predict_sentiment("This movie was terrible."))
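The function above returns only a label. If you also want a rough confidence score, you can turn the two logits into probabilities with a softmax. The sketch below does this in plain Python. The helper names and the logit values are hypothetical, and a softmax probability is not necessarily a well-calibrated confidence.

```python
import math

LABELS = ["Negative", "Positive"]  # index 0 / index 1, as in the article


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return the winning label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one review: [negative score, positive score].
label, confidence = label_with_confidence([-1.0, 1.0])
print(label, round(confidence, 3))
```

In predict_sentiment, calling torch.softmax(outputs.logits, dim=1) on the real logits gives the same kind of probabilities.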

Here is my Google Colab notebook if you want to copy and run the code:

https://colab.research.google.com/drive/138HZtdJib-aOoldL4pvl3rLwx8_YsQ1D?usp=drive_link

Conclusion

In this blog, we've walked through the process of building a sentiment analysis model using BERT on the IMDB movie reviews dataset. This powerful approach leverages the pre-trained BERT model and fine-tunes it for our specific task, resulting in high accuracy in sentiment classification.

By following these steps, you can create your own sentiment analysis model and apply it to various text classification tasks. Remember that you can further improve the model by experimenting with different hyperparameters, using more advanced BERT variants, or incorporating additional features into your analysis.

Happy coding and sentiment analyzing!

Thanks
Sreeni Ramadorai
