Creating a custom linter can be a great way to enforce coding standards and detect code smells. In this tutorial, we'll use Sylver, a source code query engine, to build a custom Python linter in just a few lines of code.
Sylver's main interface is a REPL console, in which we can load the source code of our project and query it using a SQL-like query language called SYLQ. Once we have authored SYLQ queries expressing our linting rules, we'll be able to save them into a ruleset that can be run like a traditional linter.
Installation
If sylver --version doesn't output a version number >= 0.2.2, go to https://sylver.dev to download a fresh copy of the software.
Project setup
We'll use the following Python file to test our linting rules:
#main.py
from users.models import *
from auth.models import check_password
foo = 100
O = 100.0
my_dict = {'hello': 'world'}
if my_dict.has_key('hello'):
    print('It works!')
if 'hello' in my_dict:
    print('It works!')
Starting the REPL
Starting the REPL is as simple as invoking the following command at the root of your project:
sylver query --files="src/**/*.py" --language=python
The REPL can be exited by pressing Ctrl+C or typing :quit at the prompt.
We can now execute SYLQ queries by typing the code of the query, followed by a semicolon (;).
For instance, to retrieve all the if statements (denoted by the node type IfStatement):
match IfStatement;
The results of the query will be formatted as follows:
$0 [IfStatement main.py:1:9-23:10]
$1 [IfStatement main.py:1:12-23:13]
The code of a given if statement can be displayed by typing :print followed by the node alias (for instance: :print $1). The parse tree can be displayed using the :print_ast command (for instance: :print_ast $1).
Rule 1: wildcard imports (inspired by F403)
This rule will flag all the imports of the form from x import *.
The first step is to get familiar with the tree structure of Python's import statements, so let's print an ImportFromStatement node along with its AST:
λ> match ImportFromStatement;
$2 [ImportFromStatement main.py:1:1-27:1]
$3 [ImportFromStatement main.py:1:2-39:2]
λ> :print $2
from users.models import *
λ> :print_ast $2
ImportFromStatement {
. ● module_name: DottedName {
. . Identifier { users }
. . Identifier { models }
. }
. WildcardImport { * }
}
It appears that the faulty part of the import statement (the wildcard: *) is represented by a WildcardImport node.
So this first rule can easily be expressed in SYLQ:
match WildcardImport;
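For readers who want to check what this rule should flag, here is a minimal sketch of the same check written with Python's standard ast module. It is not part of Sylver and does not use SYLQ; the function name find_wildcard_imports is a hypothetical helper introduced for illustration only.

```python
import ast

# Sample source mirroring the two import lines of main.py.
SOURCE = """\
from users.models import *
from auth.models import check_password
"""


def find_wildcard_imports(source: str) -> list[int]:
    """Return the line number of every `from x import *` statement.

    Hypothetical helper: a plain-Python analogue of `match WildcardImport`.
    """
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # In Python's own AST, a wildcard import is an ImportFrom node
        # whose list of imported names contains the alias "*".
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]


print(find_wildcard_imports(SOURCE))  # [1]
```

Only the first import is reported, which is the statement the SYLQ query is expected to match in main.py.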
Rule 2: ambiguous variable name (inspired by E741)
This style-oriented rule will detect variables named 'l', 'I' or 'O', as these names can be confusing.
As before, let's analyze the tree structure of an assignment:
λ> match Assignment;
$4 [Assignment main.py:1:4-10:4]
$5 [Assignment main.py:1:5-10:5]
$6 [Assignment main.py:1:7-29:7]
λ> :print_ast $5
Assignment {
. ● left: Identifier { O }
. ● right: Float { 100.0 }
}
The variable's Identifier can be accessed through the left field of the Assignment node. We can match the Identifier's text against a regex by using the builtin matches method:
match a@Assignment when a.left.text.matches(`^(I|O|l)$`);
Here the Assignment node is bound to a using the binding operator: @.
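To see which names the regex `^(I|O|l)$` accepts, the following sketch applies the same pattern with Python's re and ast modules. It is only an illustration of the rule's logic, assuming simple name = value assignments; find_ambiguous_assignments is a hypothetical helper, not a Sylver API.

```python
import ast
import re

# Same pattern as in the SYLQ query: the whole name must be I, O or l.
AMBIGUOUS = re.compile(r"^(I|O|l)$")


def find_ambiguous_assignments(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each assignment to an ambiguous name.

    Hypothetical helper: only plain `name = value` targets are inspected.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and AMBIGUOUS.match(target.id):
                    found.append((node.lineno, target.id))
    return found


SOURCE = "foo = 100\nO = 100.0\n"
print(find_ambiguous_assignments(SOURCE))  # [(2, 'O')]
```

The anchors ^ and $ matter here: without them, any name containing one of the three letters (such as foo... or Output) could be flagged.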
Rule 3: has_key() is deprecated (inspired by W601)
This rule signals uses of the deprecated dictionary has_key method.
Here is the tree representation of a call to has_key:
Call {
. ● function: Attribute {
. . ● object: Identifier { my_dict }
. . ● attribute: Identifier { has_key }
. }
. ● arguments: ArgumentList {
. . String { 'hello' }
. }
}
This query can be expressed using nested patterns, as follows:
match Call(function: Attribute(attribute: 'has_key'));
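The nested pattern reads as "a Call whose function is an Attribute whose attribute is has_key". As a rough cross-check, the same shape can be sketched with Python's standard ast module, where the corresponding fields are named func and attr. This is an analogue for illustration, not Sylver's implementation, and find_has_key_calls is a hypothetical name.

```python
import ast

# The part of main.py that exercises both dictionary lookups.
SOURCE = """\
my_dict = {'hello': 'world'}
if my_dict.has_key('hello'):
    print('It works!')
if 'hello' in my_dict:
    print('It works!')
"""


def find_has_key_calls(source: str) -> list[int]:
    """Return the line number of every `<expr>.has_key(...)` call.

    Hypothetical helper mirroring the nested SYLQ pattern.
    """
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)                # outer pattern: Call
        and isinstance(node.func, ast.Attribute)     # function: Attribute
        and node.func.attr == "has_key"              # attribute: 'has_key'
    ]


print(find_has_key_calls(SOURCE))  # [2]
```

The calls to print are Call nodes too, but their function is a plain name rather than an attribute access, so they are not reported.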
Creating the ruleset
The following ruleset uses our linting rules:
id: customRules
language: python
rules:
  - id: F403
    severity: warning
    message: "wildcard import"
    note: "wildcard imports are discouraged because the programmer often won’t know where an imported object is defined"
    query: >
      match WildcardImport
  - id: E741
    severity: info
    message: "ambiguous variable name"
    note: "variables named I, O and l can be very hard to read"
    query: >
      match a@Assignment when a.left.text.matches(`^(I|O|l)$`)
  - id: W601
    severity: error
    message: ".has_key() is deprecated"
    note: "'.has_key()' was deprecated in Python 2. It is recommended to use the 'in' operator instead"
    query: >
      match Call(function: Attribute(attribute: 'has_key'))
Assuming that it is stored in a file called ruleset.yaml at the root of our project, we can run it with the following command:
sylver ruleset run --files "**/*.py" --rulesets ruleset.yaml
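If the linter needs to be launched from a Python script (for instance a pre-commit hook), the command above can be wrapped with subprocess. The sketch below only reuses the subcommand and flags shown in this tutorial; the function names are hypothetical, and the meaning of Sylver's exit code is not covered here, so the caller is left to inspect the result.

```python
import shutil
import subprocess


def build_sylver_command(files_glob: str, ruleset_path: str) -> list[str]:
    """Build the argument list for `sylver ruleset run` (hypothetical helper)."""
    return [
        "sylver", "ruleset", "run",
        "--files", files_glob,       # passed unexpanded, like the quoted glob in a shell
        "--rulesets", ruleset_path,
    ]


def run_ruleset(files_glob: str = "**/*.py", ruleset_path: str = "ruleset.yaml"):
    """Run the ruleset, or return None when sylver is not on the PATH."""
    if shutil.which("sylver") is None:
        return None
    return subprocess.run(
        build_sylver_command(files_glob, ruleset_path),
        capture_output=True,
        text=True,
        check=False,  # assumption: exit-code semantics are unknown, so do not raise
    )


if __name__ == "__main__":
    result = run_ruleset()
    print("sylver not found" if result is None else result.stdout)
```

Passing the arguments as a list avoids shell quoting issues and leaves the glob for Sylver to resolve.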
Getting updates
For more information about new features and/or cool SYLQ one-liners, connect with Sylver on Twitter or Discord!