Build a custom Python linter in 5 minutes

geoffreycopin - Jan 20 '23 - Dev Community

Creating a custom linter can be a great way to enforce coding standards and detect code smells. In this tutorial, we'll use Sylver, a source code query engine, to build a custom Python linter in just a few lines of code.

Sylver's main interface is a REPL console in which we can load our project's source code and query it using a SQL-like query language called SYLQ. Once we have written SYLQ queries expressing our linting rules, we can save them into a ruleset that can be run like a traditional linter.

Installation

If sylver --version doesn't output a version number >= 0.2.2, go to https://sylver.dev to download a fresh copy of the software.

Project setup

We'll use the following Python file to test our linting rules:

#main.py
from users.models import *
from auth.models import check_password

foo = 100
O = 100.0

my_dict = {'hello': 'world'}

if my_dict.has_key('hello'):
    print('It works!')

if 'hello' in my_dict:
    print('It works!')

Starting the REPL

Starting the REPL is as simple as invoking the following command at the root of your project:

sylver query --files="src/**/*.py" --language=python

The REPL can be exited by pressing Ctrl+C or typing :quit at the prompt.

We can now execute SYLQ queries by typing the code of the query, followed by a ;.
For instance, the following query retrieves all the if statements (denoted by the node type IfStatement):

match IfStatement;

The results of the query will be formatted as follows:

$0 [IfStatement main.py:1:9-23:10]
$1 [IfStatement main.py:1:12-23:13]

The code of a given if statement can be displayed by typing :print followed by the node alias (for instance: :print $1). The parse tree can be displayed using the :print_ast command (for instance: :print_ast $1).

Rule 1: wildcard imports (inspired by F403)

This rule will flag all the imports of the form from x import *.

The first step is to get familiar with the tree structure of Python's import statements, so let's print an ImportFromStatement node along with its AST:

λ> match ImportFromStatement;

$2 [ImportFromStatement main.py:1:1-27:1]
$3 [ImportFromStatement main.py:1:2-39:2]

λ> :print $2

from users.models import *

λ> :print_ast $2

ImportFromStatement {
. ● module_name: DottedName {
. . Identifier { users }
. . Identifier { models }
. }
. WildcardImport { * }
}

It appears that the faulty part of the import statement (the wildcard: *) is represented by a WildcardImport node.
So this first rule can easily be expressed in SYLQ:

match WildcardImport;
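
As a point of comparison (this is not part of Sylver), the same check can be sketched with Python's standard ast module, which can help sanity-check what the SYLQ query is expected to report. The function name find_wildcard_imports and the SOURCE sample are illustrative assumptions; the sample reuses the two imports from main.py.

```python
import ast

# Sample input: the two import lines from main.py.
SOURCE = """\
from users.models import *
from auth.models import check_password
"""


def find_wildcard_imports(source: str) -> list:
    """Return the line numbers of every `from x import *` statement."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # A wildcard import is an ImportFrom node with an alias named "*".
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]


print(find_wildcard_imports(SOURCE))  # prints [1]
```

Only the first import is reported, which is the statement Sylver shows as containing the WildcardImport node.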

Rule 2: Ambiguous variable name (inspired by E741)

This style-oriented rule will detect variables named 'l', 'I' or 'O', as these names are easily confused with the digits 1 and 0 in some fonts.

Same as before, let's analyze the tree structure of an assignment:

λ> match Assignment;

$4 [Assignment main.py:1:4-10:4]
$5 [Assignment main.py:1:5-10:5]
$6 [Assignment main.py:1:7-29:7]

λ> :print_ast $5

Assignment {
. ● left: Identifier { O }
. ● right: Float { 100.0 }
}


The variable's Identifier can be accessed through the left field of the Assignment node. We can match the Identifier's text against a regex by using the built-in matches method:

match a@Assignment when a.left.text.matches(`^(I|O|l)$`);

Here, the binding operator @ binds the Assignment node to the name a, so that the when clause can refer to it.
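
To see which names the regex accepts, here is a rough equivalent written with Python's standard ast and re modules rather than Sylver. The helper name find_ambiguous_assignments and the SOURCE sample are assumptions made for illustration; the pattern is the one used in the SYLQ query.

```python
import ast
import re

# Same pattern as in the SYLQ query: the whole name must be I, O or l.
AMBIGUOUS = re.compile(r"^(I|O|l)$")

# Sample input: the three assignments from main.py.
SOURCE = """\
foo = 100
O = 100.0
my_dict = {'hello': 'world'}
"""


def find_ambiguous_assignments(source: str) -> list:
    """Return (name, line) pairs for assignments to ambiguous names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                # Only plain names are checked, not attributes or subscripts.
                if isinstance(target, ast.Name) and AMBIGUOUS.match(target.id):
                    hits.append((target.id, node.lineno))
    return hits


print(find_ambiguous_assignments(SOURCE))  # prints [('O', 2)]
```

Because the pattern is anchored with ^ and $, a longer name such as my_dict or foo is not flagged even though it contains one of the three letters.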

Rule 3: has_key() is deprecated (inspired by W601)

This rule signals uses of the deprecated dictionary has_key method.

Here is the tree representation of a call to has_key:

Call {
. ● function: Attribute {
. . ● object: Identifier { my_dict }
. . ● attribute: Identifier { has_key }
. }
. ● arguments: ArgumentList {
. . String { 'hello' }
. }
}

This query can be expressed using nested patterns, as follows:

match Call(function: Attribute(attribute: 'has_key'));
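
The nested pattern reads as "a Call whose function is an Attribute whose attribute is has_key". As an illustration outside of Sylver, the same shape can be sketched with Python's standard ast module, where the corresponding fields are named func and attr. The function name find_has_key_calls and the SOURCE sample are assumptions for this example.

```python
import ast

# Sample input: the dictionary checks from main.py.
SOURCE = """\
my_dict = {'hello': 'world'}

if my_dict.has_key('hello'):
    print('It works!')

if 'hello' in my_dict:
    print('It works!')
"""


def find_has_key_calls(source: str) -> list:
    """Return the line numbers of every call of the form `x.has_key(...)`."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        # Mirror the nested pattern: Call -> Attribute -> 'has_key'.
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "has_key"
    ]


print(find_has_key_calls(SOURCE))  # prints [3]
```

Only the first if statement is reported; the second one already uses the in operator.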

Creating the ruleset

The following ruleset uses our linting rules:

id: customRules

language: python

rules:
    - id: F403
      severity: warning
      message: "wildcard import"
      note: "wildcard imports are discouraged because the programmer often won’t know where an imported object is defined"

      query: >
        match WildcardImport



    - id: E741
      severity: info
      message: "ambiguous variable name"
      note: "variables named I, O and l can be very hard to read"

      query: >
        match a@Assignment when a.left.text.matches(`^(I|O|l)$`)


    - id: W601
      severity: error
      message: ".has_key() is deprecated"
      note: "'.has_key()' was deprecated in Python 2. It is recommended to use the 'in' operator instead"

      query: >
        match Call(function: Attribute(attribute: 'has_key'))


Assuming that it is stored in a file called ruleset.yaml at the root of our project, we can run it with the following command:

sylver ruleset run --files "**/*.py" --rulesets ruleset.yaml
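
If the ruleset should run from a script (for instance in a pre-commit hook), the command can be wrapped in a few lines of Python. This is only a sketch: build_command and run_linter are hypothetical helper names, the flags are exactly those of the command above, and nothing is assumed about what Sylver's exit code means, so it is simply returned to the caller.

```python
import shutil
import subprocess


def build_command(files="**/*.py", ruleset="ruleset.yaml"):
    """Build the argument list for the `sylver ruleset run` command."""
    # The glob is passed as-is, like the quoted pattern on the command line.
    return ["sylver", "ruleset", "run", "--files", files, "--rulesets", ruleset]


def run_linter(files="**/*.py", ruleset="ruleset.yaml"):
    """Run the ruleset and return sylver's exit code, or None if not installed."""
    if shutil.which("sylver") is None:
        print("sylver was not found on PATH")
        return None
    completed = subprocess.run(build_command(files, ruleset))
    return completed.returncode


if __name__ == "__main__":
    run_linter()
```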

Getting updates

For more information about new features and/or cool SYLQ one-liners, connect with Sylver on Twitter or Discord!