DevOps teams are all too familiar with the frustration of finding a bug in their code that could have been caught earlier. Or worse, they have had to deal with the consequences of a security vulnerability that slipped through the cracks. It's no surprise, then, that tools like Semgrep are a developer's best friend.
Semgrep is often described as the future of static analysis. It is open source, has a growing community of users, and offers more than 2,500 rules in the Semgrep Registry. In this article, we'll explore the basics of Semgrep, how to run rules and set up optimal SAST scanning, and even how to write your own rules to catch those pesky bugs and security vulnerabilities.
An introduction to Semgrep
Semgrep is a popular open-source static analysis tool that helps identify security vulnerabilities in source code before they ship. Its roots go back to sgrep, a tool developed at Facebook in 2009 for internal use, and today it is maintained by Semgrep, Inc. (formerly r2c). It has since become widely used among software developers and security professionals. Semgrep's main selling point is its ease of use and the flexibility of writing custom rules to detect specific security issues.
This tool is handy for software developers performing static analysis in their development workflow. It can quickly identify potential security issues and prevent security breaches and related problems. As an open-source tool, Semgrep has a growing community of contributors who help maintain and improve the tool, ensuring it stays up-to-date with the latest developments in the field.
Running rules with Semgrep
Semgrep rules are written in a simple, declarative YAML format that specifies which code patterns to look for and what to report (a message, a severity, and optionally an automatic fix) when a pattern is found. They can detect security vulnerabilities, code smells, and style violations.
[Figure: The number of Semgrep Registry rules]
Semgrep rules can be stored in various ways, including in YAML files, your code repository, or Semgrep's rule registry. They are categorized by the type of issue they detect, and you can filter them by language, file type, and other attributes.
For example, the following Semgrep rule detects the use of insecure cryptographic functions:
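rules:
  # The language is an assumption; adjust it to match your codebase.
  - id: insecure-cryptography
    languages: [python]
    severity: WARNING
    message: Detects use of insecure cryptographic functions
    pattern-either:
      - pattern: MD5(...)
      - pattern: SHA1(...)
      - pattern: DES(...)
      - pattern: RC4(...)
      - pattern: PBKDF1(...)
      - pattern: cbc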
Types of Semgrep rules
Semgrep Existing Rules: These are the default rules that are included in the Semgrep rule registry. The Semgrep development team creates and maintains these rules covering various potential security vulnerabilities and coding errors. Developers can run these rules as-is or customize them to fit their specific codebase better.
Local Rules (Ephemeral and YAML-defined): Local rules are custom rules that developers can create to scan their codebase for specific issues. There are two types of local rules: ephemeral and YAML-defined.
Ephemeral rules are one-off rules that are passed directly on the command line, using the -e (--pattern) flag together with --lang.
YAML-defined rules are defined in a YAML file and can be reused across multiple scans. You can customize them to scan for specific issues in a codebase, making them a powerful tool for catching potential problems early in development.
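As a rough illustration of the two kinds of local rules, the sketch below builds the corresponding command lines in Python. The -e, --lang, and --config flags are Semgrep CLI options; the helper names, the pattern, and the paths are hypothetical and only there for the example.

```python
import shlex


def ephemeral_rule_command(pattern: str, lang: str, target: str) -> list[str]:
    """One-off rule: the pattern is passed directly on the command line."""
    return ["semgrep", "-e", pattern, "--lang", lang, target]


def yaml_rule_command(rule_file: str, target: str) -> list[str]:
    """Reusable rule: the pattern lives in a YAML file passed via --config."""
    return ["semgrep", "--config", rule_file, target]


# Hypothetical pattern and paths, for illustration only.
print(shlex.join(ephemeral_rule_command("hashlib.md5(...)", "python", "src/")))
print(shlex.join(yaml_rule_command("rules/insecure-crypto.yaml", "src/")))
```

The ephemeral form is convenient for quick exploration, while the YAML form is the one you would normally commit to the repository and reuse in CI.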
Setting up Semgrep Rules for Optimal SAST Scanning
Semgrep rules are designed to identify specific patterns of code that are potentially vulnerable to security issues. They work by matching patterns against the parsed syntax tree of the code, with regular expressions available for cases where plain text matching is enough.
For example, let's say you have a web application that takes user input and uses it to construct a SQL query. This is a common source of SQL injection vulnerabilities when the input is not properly sanitized or parameterized. With Semgrep, you can create a rule that scans your code for this vulnerability by looking for code that constructs a SQL query from user input.
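To make the vulnerability concrete, here is a minimal sketch of the kind of code such a rule is meant to catch, next to a parameterized alternative. It uses Python's built-in sqlite3 module; the table, function names, and payload are made up for illustration.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the driver binds the value, so it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # the injected OR clause returns every row
print(len(find_user_safe(conn, payload)))    # no user has this literal name
```

A SAST rule for SQL injection is essentially looking for the first shape and nudging you toward the second.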
Here's an example Semgrep rule that would identify this type of vulnerability:
This example rule, named "SQL Injection," is designed to flag potential SQL injection vulnerabilities in Python code. Its logic is described as a pair of functions, and it works as follows:
- The check_query function takes a parsed SQL query tree as input and checks whether the query contains a "SELECT" statement.
- The match function is the entry point for scanning. It takes a syntax tree and a filename as input.
- It first checks whether the filename ends with ".py", indicating a Python source file.
- If it is a Python file, the function collects all SQL queries in the code by looking for nodes with a "DML" token type.
- Then, for each collected query, it calls check_query to see whether the query contains a "SELECT" statement.
- If a "SELECT" statement is found, match returns True, indicating that the code has a potential SQL injection vulnerability.
- The rule is then added to a RulesDict instance, which runs the rule against a given code snippet.
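The steps above can be sketched in plain Python as follows. This is an approximation under stated assumptions, not Semgrep rule syntax (Semgrep rules themselves are YAML): it uses the standard ast module and string literals in place of a parsed SQL tree, collect_queries is a hypothetical helper, and the RulesDict step is left out because its API is not shown here.

```python
import ast

DML_KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE")


def check_query(query: str) -> bool:
    """Return True if the query text contains a SELECT statement."""
    return "SELECT" in query.upper()


def collect_queries(tree: ast.AST) -> list[str]:
    """Collect string literals that start with a DML keyword."""
    queries = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.lstrip().upper().startswith(DML_KEYWORDS):
                queries.append(node.value)
    return queries


def match(tree: ast.AST, filename: str) -> bool:
    """Flag a Python file if any SQL string in it contains a SELECT."""
    if not filename.endswith(".py"):
        return False
    return any(check_query(q) for q in collect_queries(tree))


source = 'query = "SELECT * FROM users WHERE id = " + user_id\n'
print(match(ast.parse(source), "app.py"))  # True
print(match(ast.parse(source), "app.js"))  # False: not a Python file
```

Note that this logic flags every SELECT it finds, including safely parameterized ones, so it is a coarse heuristic rather than a precise injection check.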
Semgrep's Rule Board
Semgrep's Rule Board gives developers access to the large library of existing rules in the Semgrep Registry for scanning their code for potential vulnerabilities. To use a registry ruleset from the command line, pass its identifier to the --config flag, and Semgrep will download and run those rules during the scan.
For example, to scan Django code for potential security issues with the registry's django ruleset, run:
$ semgrep --config p/django <your_code_directory>
This downloads the django ruleset from the Semgrep Registry and applies it to the scan.
The Rule Board also allows developers to create and share their own rulesets, making it a collaborative platform for improving code security across the development community. Once a custom ruleset is created, it can be referenced in the same way as the existing rulesets.
Writing your own Semgrep rules
Writing your own Semgrep rules can be a powerful way to customize your SAST scanning process and target specific issues in your code. To get started, you'll need to have some familiarity with the Semgrep syntax and be able to identify the types of problems you want to scan for.
To write your own Semgrep rules, start by creating a new rule file: a YAML file (for example, rules.yaml) with a top-level rules list, where each entry has an id, a message, a severity, the target languages, and one or more patterns. The Semgrep Playground on semgrep.dev is a convenient place to draft and test a rule before saving it to a file.
Once you've made your rule file, you can start writing rules that target specific issues in your code. For example, you might create a rule that scans for SQL injection vulnerabilities by looking for instances where user input is concatenated directly into a SQL query.
To write this type of rule, you would use Semgrep's pattern syntax to describe the vulnerable code. Metavariables such as $QUERY or $INPUT stand for arbitrary expressions, and the ellipsis operator (...) matches any sequence of arguments or statements.
A rule of this kind typically targets the functions that execute SQL, such as the execute methods exposed by SQLAlchemy connections and Django database cursors, matches string concatenation in the query argument, and uses a metavariable to capture the user input being concatenated.
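As a hedged sketch of what such a custom rule could look like, the snippet below holds a small Semgrep rule as YAML text and writes it to a rule file. The rule id, message, file name, and the two patterns are illustrative assumptions rather than a rule taken from the Semgrep Registry; Python is used here only to write the file.

```python
import tempfile
from pathlib import Path

# Illustrative rule: flags execute() calls whose query is built with + or %.
# The id, message, and patterns are assumptions made for this example.
RULE_YAML = """\
rules:
  - id: sql-query-string-concat
    languages: [python]
    severity: ERROR
    message: User input is concatenated into a SQL query; use query parameters instead.
    pattern-either:
      - pattern: $DB.execute("..." + $INPUT, ...)
      - pattern: $DB.execute("..." % $INPUT, ...)
"""


def write_rule(directory: Path) -> Path:
    """Write the rule to sql-concat.yaml inside the given directory."""
    rule_path = directory / "sql-concat.yaml"
    rule_path.write_text(RULE_YAML, encoding="utf-8")
    return rule_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_rule(Path(tmp))
    # The rule could then be run with: semgrep --config <path> <your_code_directory>
    print("wrote", path.name)
```

In the patterns, "..." inside quotes stands for any string literal, $DB and $INPUT are metavariables, and the trailing ellipsis allows additional arguments to execute.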
To set up your custom rules for optimal SAST scanning, you should consider organizing them by category and reviewing them regularly to ensure they are up-to-date and effective. You should also consider integrating Semgrep with your custom rules into your CI/CD pipeline to ensure they run consistently and thoroughly.
Running a SAST scan with Semgrep
Running a SAST scan with Semgrep is a simple process that requires just a few commands in the terminal. In this tutorial, we will walk through the steps to run a scan using Semgrep.
Step 1: Install Semgrep. The first step is to install Semgrep, for example with pip:
$ python3 -m pip install semgrep
This will download and install the latest version of Semgrep from PyPI. On macOS, Homebrew (brew install semgrep) is an alternative.
Step 2: Create a Semgrep configuration file. The next step is to create a configuration file for Semgrep. This file specifies which rules should be run during the scan and which files to scan. Here is an example configuration file:
This configuration file specifies two rulesets to use (Secret-detection and Cryptography) and includes all .py, .html, and .js files in the scan.
Step 3: Run the Semgrep scan. Once the configuration file has been created, the Semgrep scan can be run using the following command:
$ semgrep --config=<your_config> <your_code_directory>
This command runs Semgrep with the given configuration and scans all files in the specified directory.
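For scans that combine several rulesets and file filters, it can help to assemble the command programmatically. The sketch below is one way to do that, assuming the rulesets are passed with repeated --config flags and the file filters with --include; the helper name and the ruleset and path values are placeholders.

```python
import shlex


def build_scan_command(configs, target, include=(), fail_on_findings=False):
    """Assemble a semgrep command from rulesets, file globs, and a target."""
    cmd = ["semgrep"]
    for config in configs:
        cmd += ["--config", config]   # one --config per ruleset or rule file
    for glob in include:
        cmd += ["--include", glob]    # only scan files matching these globs
    if fail_on_findings:
        cmd.append("--error")         # non-zero exit code when findings exist
    cmd.append(target)
    return cmd


cmd = build_scan_command(
    configs=["p/secrets", "rules/sql-concat.yaml"],  # placeholder rulesets
    target="src/",
    include=["*.py", "*.html", "*.js"],
    fail_on_findings=True,
)
print(shlex.join(cmd))
```

The --error flag is mainly useful in CI, where a failing exit code can block a merge when the scan reports findings.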
Step 4: View the results. After the scan, Semgrep will output a list of any issues found. These issues have details about their location and the rule that triggered them.
Here is an example output:
In this example, Semgrep found an issue in the Foo.py file that violates the cryptography.CVE-2019-16056 rule.
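Semgrep also supports machine-readable output formats such as JSON and SARIF, which can be useful for automation, integration with CI/CD pipelines, or other custom workflows.
To generate JSON output, you can use the --json flag when running Semgrep:
$ semgrep --config=<your_config> --json <your_code_directory>
The JSON output contains a results array in which each finding records the rule that matched (check_id), the file path, the start and end positions, and extra details such as the message and severity.
To generate SARIF output, a standard JSON-based format for static analysis results, you can use the --sarif flag when running Semgrep:
$ semgrep --config=<your_config> --sarif <your_code_directory>
SARIF output can be ingested by tools that understand the format, such as GitHub code scanning.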
You can use these formats to customize the output to suit your needs better.
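As an example of putting the JSON format to work, the sketch below turns a report into one summary line per finding. The sample report is hand-written for illustration and much shorter than real output; the field names it relies on (results, check_id, path, start, extra) follow Semgrep's JSON output, while the values and the summarize helper are made up.

```python
import json

# Illustrative sample only; real output carries more fields and real values.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "check_id": "rules.sql-query-string-concat",
      "path": "app/db.py",
      "start": {"line": 12, "col": 5},
      "end": {"line": 12, "col": 60},
      "extra": {"message": "User input is concatenated into a SQL query.", "severity": "ERROR"}
    }
  ],
  "errors": []
}
"""


def summarize(raw: str) -> list[str]:
    """Turn Semgrep JSON output into one 'path:line [severity] rule' line per finding."""
    report = json.loads(raw)
    lines = []
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        location = f"{finding['path']}:{finding['start']['line']}"
        lines.append(f"{location} [{severity}] {finding['check_id']}")
    return lines


for line in summarize(SAMPLE_OUTPUT):
    print(line)
```

A summary like this can be posted as a CI comment or used to fail a build when findings of a given severity appear.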
Streamline your SAST Scanning with Jit
There you have it: Semgrep is a strong foundation for static analysis, and Jit makes it easier to put to work. With Jit, you can seamlessly integrate Semgrep and your custom rules into your DevSecOps toolchain, in the IDE and as part of CI, increasing development velocity with continuous security. Start for free here.