⚡ I Made a JavaScript Library that Leaves Profanity Speechless! 🤬

Best Codes - Feb 8 - - Dev Community

How I Built a Profanity Blocking JavaScript Library

Introduction

As developers, we often come across situations where we need to filter and sanitize text to block or remove profanity. To tackle this problem, I decided to create a JavaScript library called bc-ProfanityBlock. In this article, I will walk you through the steps I took to build this library and explain how it can be used to effectively block profanity in your applications.

Step 1: Defining the Problem

The first step in building any library is to clearly define the problem it solves. Here, the goal was a library that can detect and handle profanity in text. Profanity shows up in several forms: plain words, spelling variations, and words disguised with evasion characters (for example, "@" in place of "a"). The library needed to detect and sanitize all of these efficiently.

Step 2: Research and Planning

Before diving into the implementation, I conducted thorough research on existing profanity blocking techniques and libraries. Among these were Profanity (https://github.com/2Toad/Profanity) and bad-words (https://www.npmjs.com/package/bad-words), both fantastic libraries! This helped me understand the different approaches and challenges involved. Based on my research, I decided to use a combination of encoded bad words and evasion pattern detection to build an effective solution.

Step 3: Designing the Architecture

Next, I designed the architecture of the library. I created a ContentFilterBadWord class that would encapsulate all the necessary methods and properties for filtering and cleaning text. The class would have functions for decoding encoded bad words, normalizing text with evasion patterns, checking if text contains bad words, and cleaning text by replacing or removing bad words.

Step 4: Implementing the Functionality

With the architecture in place, I started implementing the functionality of the library. I created methods to decode base64 encoded bad words, normalize text with evasion patterns, and check if text contains bad words. I also added options to match bad words as whole words and detect evasion characters and separators. Lastly, I implemented functions to clean text by replacing or removing bad words.

Step 5: Testing and Optimization

Once the functionality was implemented, I conducted extensive testing to ensure the library was working as expected. I created test cases with different scenarios, including common bad words, variations, and evasion techniques. I also tested the library's performance with large volumes of text. Based on the test results, I made optimizations to improve the speed and accuracy of the library. There are still some very minor bugs that I am working on.


Now, let's take a look at the code

Step 1: Define a class (ContentFilterBadWord)

In JavaScript, a class is a blueprint or template for creating objects that share similar properties and behaviors. It provides a way to define the structure and behavior of an object.

To create a class in JavaScript, you can use the class keyword followed by the name of the class. Here's an example:

class Person {
  constructor(name, age) {
    this.name = name;
    this.age = age;
  }

  greet() {
    console.log(`Hello, my name is ${this.name} and I'm ${this.age} years old.`);
  }
}

In the above example, we define a Person class with a constructor and a greet method. The constructor is a special method that gets called when a new object is created from the class. It is used to initialize the object's properties.

To create an instance of a class, you can use the new keyword followed by the class name with parentheses. Here's an example:

const person1 = new Person("John", 25);
const person2 = new Person("Jane", 30);

person1.greet(); // Output: Hello, my name is John and I'm 25 years old.
person2.greet(); // Output: Hello, my name is Jane and I'm 30 years old.

In the above example, we create two instances of the Person class and call the greet method on each instance.

Using classes in JavaScript allows you to create reusable and organized code by encapsulating related properties and behaviors within a single class.

In this case, we define a class called ContentFilterBadWord. The constructor holds our bad word list (Base64 encoded, so the words are not readable at a glance in the source) and our evasion patterns. The encoded entries shown here are placeholders. The methods in the following steps all go inside this class. See below:

class ContentFilterBadWord {
  constructor() {
    // Base64 encoded bad words
    this.encodedCussWords = [
      "AAAAAA",
      "BBBBBB",
      "CCCCCC",
    ];

    this.evasionPatterns = [
      { pattern: /4/gi, replacement: "a" },
      { pattern: /\$/gi, replacement: "s" },
      { pattern: /5/gi, replacement: "s" },
      { pattern: /0/gi, replacement: "o" },
      { pattern: /1/gi, replacement: "i" },
      { pattern: /!/gi, replacement: "i" },
      { pattern: /@/gi, replacement: "a" },
    ];
  }
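As a rough sketch of how such an encoded list could be produced, the snippet below encodes a plain-text word list with the standard btoa function and checks that atob reverses it. The words and the variable names (plainWords, encodedWords, decodedWords) are assumptions for illustration, not the library's actual list.

```javascript
// Hypothetical plain-text list; the real library ships its own encoded list.
const plainWords = ["darn", "heck"];

// btoa encodes a string of Latin-1 characters to Base64.
const encodedWords = plainWords.map((word) => btoa(word));

// atob reverses the encoding, which is what decodeBase64 relies on.
const decodedWords = encodedWords.map((encoded) => atob(encoded));

console.log(encodedWords); // [ 'ZGFybg==', 'aGVjaw==' ]
console.log(decodedWords); // [ 'darn', 'heck' ]
```

Keep in mind that Base64 only hides the words from casual reading. Anyone can decode it, so it is obfuscation rather than protection.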

Step 2: decodeBase64

This one is pretty simple. It wraps the built-in atob function, which decodes a Base64-encoded string.

  decodeBase64(encodedString) {
    return atob(encodedString);
  }
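One caveat: atob returns a string in which each character stands for a single byte, so a word with characters outside plain ASCII that was UTF-8 encoded before Base64 comes back garbled. A possible workaround is sketched below using the standard TextEncoder and TextDecoder. The helper names encodeUtf8Base64 and decodeUtf8Base64 are hypothetical and not part of the library.

```javascript
// Hypothetical helper: UTF-8 encode the text, then Base64 encode the bytes.
function encodeUtf8Base64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Hypothetical helper: Base64 decode to bytes, then read the bytes as UTF-8.
function decodeUtf8Base64(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = encodeUtf8Base64("naïve");
console.log(decodeUtf8Base64(encoded)); // naïve
console.log(atob(encoded) === "naïve"); // false (plain atob garbles the "ï")
```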

Step 3: normalizeText

This one is also pretty simple. normalizeText takes a text parameter, iterates over the evasionPatterns array, and replaces every match of each pattern with its replacement, so that disguised spellings are turned back into plain letters.

  normalizeText(text) {
    // Apply evasion patterns to normalize text
    this.evasionPatterns.forEach(({ pattern, replacement }) => {
      text = text.replace(pattern, replacement);
    });
    return text;
  }
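To see what this normalization does in practice, here is a small standalone sketch that reuses the evasion patterns from the constructor outside of the class. The sample inputs are made up for illustration.

```javascript
// Same patterns as in the constructor above, copied into a standalone array.
const evasionPatterns = [
  { pattern: /4/gi, replacement: "a" },
  { pattern: /\$/gi, replacement: "s" },
  { pattern: /5/gi, replacement: "s" },
  { pattern: /0/gi, replacement: "o" },
  { pattern: /1/gi, replacement: "i" },
  { pattern: /!/gi, replacement: "i" },
  { pattern: /@/gi, replacement: "a" },
];

// Standalone version of normalizeText: apply each pattern in order.
function normalizeText(text) {
  return evasionPatterns.reduce(
    (result, { pattern, replacement }) => result.replace(pattern, replacement),
    text
  );
}

console.log(normalizeText("b4d w0rd$")); // bad words
console.log(normalizeText("call me at 10:45!")); // call me at io:asi
```

The second call shows the trade-off: every digit and symbol in the list is rewritten, including ones that were never meant as a disguise. That is harmless when the result is only used for detection, but it matters if the normalized text is shown to users.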

Step 4: containsBadWords

The function containsBadWords accepts four parameters:

  • text: The string to be checked for bad words.
  • matchWord (default false): A boolean indicating whether to match only whole words.
  • detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
  • detectEvasionSeperators (default true): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.

The function begins by decoding an array of base64-encoded bad words (this.encodedCussWords) to their original form for comparison.

If detectEvasionCharacters is true, the function applies a series of patterns (defined in this.evasionPatterns) to replace evasion characters in text with their normal counterparts.

If detectEvasionSeperators is true, the function removes common separators (like hyphens, underscores, and periods) from the text. It then goes further to remove spaces between the letters of each bad word within the text, to catch cases where spaces are used to evade detection.
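The space-removal step can be tried in isolation. The sketch below uses a hypothetical helper, collapseSpacedWord, with "darn" standing in for a listed word. It assumes the word contains no regular expression special characters.

```javascript
// Hypothetical helper: find the word with optional whitespace between its
// letters and rewrite each match without the whitespace.
function collapseSpacedWord(text, word) {
  // "darn" becomes the pattern d\s*a\s*r\s*n
  const spacedPattern = new RegExp(word.split("").join("\\s*"), "gi");
  return text.replace(spacedPattern, (match) => match.replace(/\s/g, ""));
}

console.log(collapseSpacedWord("oh d a r n it", "darn")); // oh darn it
console.log(collapseSpacedWord("bad arnold", "darn")); // badarnold
```

The second call is a false positive: the pattern also matches letters that span two unrelated words. This is one reason whole-word matching is offered as a separate option.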

After normalization, the function logs the normalized text to the console.

Finally, it uses the Array.prototype.some method to check if any bad words are present in the normalized text. It does this by creating a regular expression for each bad word. If matchWord is true, it ensures that only whole words are matched by using word boundaries (\b). Otherwise, it matches the bad word as a substring anywhere in the text. The function returns true if any bad word is detected, and false otherwise.

  containsBadWords(
    text,
    matchWord = false,
    detectEvasionCharacters = true,
    detectEvasionSeperators = true
  ) {
    // Decode bad words for comparison
    const cussWords = this.encodedCussWords.map((encodedWord) =>
      this.decodeBase64(encodedWord)
    );

    // Normalize text to catch evasion attempts
    let normalizedText = text;
    if (detectEvasionCharacters) {
      // Apply evasion patterns to normalize text
      this.evasionPatterns.forEach(({ pattern, replacement }) => {
        normalizedText = normalizedText.replace(pattern, replacement);
      });
    }

    if (detectEvasionSeperators) {
      // Remove common separators between letters
      normalizedText = normalizedText.replace(/[-_.]/g, "");
      // Remove spaces between letters only for bad words
      cussWords.forEach((cussWord) => {
        // Create a dynamic regular expression that matches the bad word with any spaces between the letters
        let wordRegex = new RegExp(cussWord.split("").join("\\s*"), "gi");
        // Replace the matched substring with the bad word without spaces
        normalizedText = normalizedText.replace(wordRegex, (match) => {
          return match.replace(/\s/g, "");
        });
      });
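    }

    // Remainder of the method, reconstructed from the description above
    // (the published source may differ in detail)
    console.log(normalizedText);

    // Return true if any bad word appears in the normalized text
    return cussWords.some((cussWord) => {
      const wordRegex = matchWord
        ? new RegExp(`\\b${cussWord}\\b`, "i") // whole words only
        : new RegExp(cussWord, "i"); // substring anywhere
      return wordRegex.test(normalizedText);
    });
  }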
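The difference between whole-word and substring matching is easiest to see with a harmless word that contains a listed one. Below is a minimal sketch, assuming "darn" is on the list. matchesWord and escapeRegExp are hypothetical helpers. Escaping the word before building the pattern is an extra precaution that the method described above does not take.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper mirroring the matchWord option.
function matchesWord(text, word, matchWord = false) {
  const source = escapeRegExp(word);
  const pattern = matchWord
    ? new RegExp(`\\b${source}\\b`, "i") // whole words only
    : new RegExp(source, "i"); // substring anywhere
  return pattern.test(text);
}

console.log(matchesWord("a darning needle", "darn")); // true (substring)
console.log(matchesWord("a darning needle", "darn", true)); // false
console.log(matchesWord("Well, DARN!", "darn", true)); // true
```

Substring matching catches more disguised variants but flags innocent words. Whole-word matching is stricter and misses words glued to other text.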

Step 5: cleanText

Here is a breakdown of the function cleanText:

Parameters:

  • text: The text to be cleaned.
  • method (default "replace"): The method to use for cleaning the text. Can be either "replace" or "remove".
  • detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
  • detectEvasionSeparators (default true): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.

Function Body:

  1. Initialization:

    • The function creates a new variable cleanedText and assigns it the value of the input text.
  2. Evasion Character Detection (if enabled):

    • If detectEvasionCharacters is true, the function calls normalizeText (Step 3) to replace evasion characters in cleanedText with their normal counterparts.
  3. Evasion Separator Detection (if enabled):

    • If detectEvasionSeparators is true:
      • The function removes common separators (like hyphens, underscores, and periods) from cleanedText using the regular expression /[-_.]/g.
      • It iterates over the array of Base64-encoded bad words (this.encodedCussWords).
      • For each encoded word:
        • It decodes the word using this.decodeBase64 (Step 2).
        • It creates a regular expression object wordRegex for the decoded word, with the flags g (global) and i (case-insensitive).
        • Based on the method value:
          • If method is "replace", the function replaces every non-whitespace character of each match in cleanedText with an asterisk, using a callback function.
          • If method is "remove", the function replaces all occurrences of the bad word in cleanedText with an empty string.
  4. Return:

    • The function returns the cleaned text cleanedText.

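Overall, this function takes text as input and cleans it by removing or replacing bad words, optionally normalizing evasion characters and separators first. As written, the bad-word replacement sits inside the detectEvasionSeparators branch, so it is skipped when that option is false, and the returned text is the normalized text rather than the original.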

  cleanText(
    text,
    method = "replace",
    detectEvasionCharacters = true,
    detectEvasionSeparators = true
  ) {
    let cleanedText = text;

    if (detectEvasionCharacters) {
      cleanedText = this.normalizeText(cleanedText);
    }

    if (detectEvasionSeparators) {
      cleanedText = cleanedText.replace(/[-_.]/g, "");
      this.encodedCussWords.forEach((encodedWord) => {
        const cussWord = this.decodeBase64(encodedWord);

        const wordRegex = new RegExp(cussWord, "gi");

        if (method === "replace") {
          cleanedText = cleanedText.replace(wordRegex, (match) => {
            return match.replace(/\S/g, "*");
          });
        } else if (method === "remove") {
          cleanedText = cleanedText.replace(wordRegex, "");
        }
      });
    }

    return cleanedText;
  }
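The two cleaning methods can be compared side by side with a small standalone sketch. cleanWord is a hypothetical helper that handles a single word, with "darn" standing in for a listed word. It uses the same callback as the method above.

```javascript
// Hypothetical helper: mask or remove one word, case-insensitively.
function cleanWord(text, word, method = "replace") {
  const wordRegex = new RegExp(word, "gi");
  if (method === "replace") {
    // Swap every non-whitespace character of the match for an asterisk.
    return text.replace(wordRegex, (match) => match.replace(/\S/g, "*"));
  }
  if (method === "remove") {
    return text.replace(wordRegex, "");
  }
  return text; // unknown method: leave the text untouched
}

console.log(cleanWord("Well, DARN it. Darn!", "darn")); // Well, **** it. ****!
console.log(cleanWord("Well, darn it.", "darn", "remove")); // Well,  it.
```

Note that "remove" leaves the surrounding spaces in place, so the second result contains a double space. Collapsing repeated whitespace afterwards is one way to tidy that up.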

That's it as far as code!

How to use...

For the usage docs, see https://github.com/The-Best-Codes/bc-ProfanityBlock.

Conclusion

In this article, I shared the process of building the bc-ProfanityBlock JavaScript library for blocking profanity in text. By combining encoded bad words and evasion pattern detection, the library provides an efficient and effective solution for filtering and sanitizing content. Whether you are building a social media platform, chat application, or any other system where content moderation is important, this library can be a valuable addition to your toolkit.

You can find the complete source code and documentation for the bc-ProfanityBlock library, including the ContentFilterBadWord class, on GitHub. I hope this article has been informative and encourages you to explore the world of content moderation in your applications.

If you have any questions or feedback, please feel free to reach out to me via email at best-codes@proton.me.
Happy coding!


Some content in this article is generated by the BestCodes AI.
Article by Best_codes.
