Rego for beginners Part 2: Combining queries with AND/OR and custom messages

SnykSec - Nov 10 '23 - Dev Community

This blog post series offers a gentle introduction to Rego, the policy language from the creators of the Open Policy Agent (OPA) engine. If you’re a beginner and want to get started with writing Rego policy as code, you’re in the right place.

In this three-part series, we’ll go over the following:

As a reminder, Rego is a declarative query language from the makers of the Open Policy Agent (OPA) framework. The Cloud Native Computing Foundation (CNCF) accepted OPA as an incubation-level hosted project in April 2019, and OPA graduated from incubating status in 2021.

Rego is used to write policy as code, which applies programming practices such as version control and modular design to the evaluation of cloud and infrastructure as code (IaC) resources. OPA is the engine that evaluates policy as code written in Rego, and Snyk uses Rego for custom rules.

Part 1 recap

In Part 1 of this blog post series, we explained that a Rego rule is a conditional assignment. A rule queries the input to find a match for a condition, and if a match is found, a value is assigned to a variable.

You can read a rule like this:

THIS VARIABLE    := HAS THIS VALUE {
    IF THESE CONDITIONS ARE MET
}

Here's the example we used, which represents a corporate policy that only Alice, a network administrator, should have permission to create and delete virtual networks in the prod account:

allow := true {
  input.user == "alice"
}

OPA evaluates a JSON or YAML input document against a rule to produce a policy judgment. The input document below represents the currently logged-in user:

{
  "user": "alice"
}

If you use OPA to evaluate the input against this rule, it finds a match for the query input.user == "alice". Therefore, the variable in the rule head, allow, is assigned the value in the rule head, true. Here's the output proving this:

{
  "allow": true
}

OPA has delivered the decision that Alice, the currently logged-in user, is allowed to create and delete virtual networks in the prod account. The input is compliant with the rule.

AND and OR

So far, we've only shown rules with a single query. A rule can also contain multiple queries. If it does, the queries represent multiple conditions that must all be met in order for a variable to be assigned. There's an implicit AND — "This condition must be met AND this condition must be met."

For example, in the rule below, both input.user == "alice" AND input.environment == "prod" must be true in order for the variable allow to be assigned the value true:

allow := true {
  input.user == "alice"
  input.environment == "prod"
}
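
As a small aside, the same AND can also be written on one line, because Rego accepts a semicolon between expressions in a rule body. The sketch below should behave the same as the rule above. The package name is only an assumption for illustration, and the syntax is the pre-1.0 Rego style used throughout this series.

```rego
# Hypothetical package name, assumed for illustration.
package rules.check_user

# A semicolon joins expressions the same way a line break does:
# both conditions must hold for allow to be assigned true.
allow := true {
  input.user == "alice"; input.environment == "prod"
}
```

The multi-line form is generally easier to read, especially as the number of conditions grows.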

In some cases, OR might be more appropriate. You can represent OR by using the same head in multiple rules:

allow = true {
  input.user == "alice"
}

allow = true {
  input.user == "bob"
}

This set of rules can be read like so:

allow is true if user is "alice" OR if user is "bob".

Technically, the set of rules forms a single rule because the head is the same for both. Because you're defining this rule in multiple steps, it's called an incremental rule. 

If you like, you can get rid of the second head and put the bodies together. A more succinct way of writing the above is:

allow = true {
  input.user == "alice"
} {
  input.user == "bob"
}
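
One way to convince yourself that this really behaves like OR is to add a few unit tests next to the rule. The following is a minimal sketch, not part of the original example: the package name, the test names, and the user "carlotta" are assumptions, and the with keyword is used to substitute a made-up input for each test. The last test uses the not keyword, which is covered later in this post.

```rego
# Hypothetical package name, assumed for illustration.
package rules.check_user

allow = true {
  input.user == "alice"
} {
  input.user == "bob"
}

# Either body is enough to make allow true.
test_alice_allowed {
  allow with input as {"user": "alice"}
}

test_bob_allowed {
  allow with input as {"user": "bob"}
}

# Neither body matches this user, so allow is undefined.
test_carlotta_not_allowed {
  not allow with input as {"user": "carlotta"}
}
```

Saved to a file, this can be run with the opa test command. On OPA 1.0 or later, the older syntax used in this series should additionally need the --v0-compatible flag.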

You might be wondering why we've used the unification operator = rather than the assignment operator :=. That's because variables are immutable in Rego. Even though rules with the same head are treated as a single incremental rule, using the assignment operator would effectively "assign" the same variable multiple times, and that isn't allowed in Rego. Instead, we use the unification operator in the rule head because it unifies multiple rules with the same name.

If it's confusing to remember when to use which operator in the rule head, there's a simpler way, thanks to default values and a bit of syntactic sugar.

Default values in rule heads

The default value given to a variable in the head of a rule is true. So, Rego offers some syntactic sugar here: When a rule assigns the value true to the variable, you can omit the := true from the rule head.

That means this AND rule…

allow := true {
  input.user == "alice"
  input.environment == "prod"
}

…is the same as this AND rule:

allow {
  input.user == "alice"
  input.environment == "prod"
}

And likewise, this OR rule…

allow = true {
  input.user == "alice"
} {
  input.user == "bob"
}

...is the same as this OR rule:

allow {
  input.user == "alice"
} {
  input.user == "bob"
}

"Sweet" indeed!

default keyword

As we discussed in Part 1, if there are no matches in the input for a rule query, the variable in the rule head is not assigned the value in the head.

To demonstrate this, let's return to our example rule, which says that only Alice, a network administrator, should have permission to create and delete virtual networks in the prod account:

allow := true {
  input.user == "alice"
}

And we'll say we have an input document where the currently logged-in user is Bob:

{
  "user": "bob"
}

Since input.user is not "alice", when we evaluate the rule against the input, OPA does not find a match in the input. Therefore, allow is not assigned the value true, and the result of the evaluation is empty:

{}

We say in this case that the value of allow is undefined. Whenever OPA queries input to evaluate a rule, it only returns values that match. If there is no matching value, there's nothing to return, so the result is empty.

What if we want OPA to return false if allow is not explicitly true? We can use the default keyword to set a default value. This means if a rule evaluation isn't explicitly true, it returns a specific value (in this case, false) instead of returning an empty set of results. To do this, we write an additional rule that also uses allow:

default allow = false

Now, if OPA determines that input.user is not "alice", allow is no longer undefined. Instead, it takes on the default value, which we've declared is false:

{
  "allow": false
}

Note that when you specify the default keyword, you use the unification operator = instead of the assignment operator := in both the rule where you define the default value and the rule where you define the conditional assignment. Again, that's because variables are immutable. If you try to use the assignment operator, you're "assigning" the same variable multiple times, which Rego doesn't allow. We use the unification operator instead:

default allow = false

allow = true {
  input.user == "alice"
}

You can, of course, take advantage of the syntactic sugar we described earlier and leave out the = true in the rule with the conditional assignment. This is perfectly acceptable and perhaps easier to use because you don't need to remember which operator to use in the conditional assignment:

default allow = false

allow {
  input.user == "alice"
}
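
To see the default value in action without editing an input file, you could pair the rule with two small unit tests. This is only a sketch: the package name and test names are assumptions, and the with keyword swaps in a made-up input for each test.

```rego
# Hypothetical package name, assumed for illustration.
package rules.check_user

default allow = false

allow {
  input.user == "alice"
}

# The rule body matches, so allow is true.
test_alice_is_allowed {
  allow with input as {"user": "alice"}
}

# The rule body does not match, so allow falls back to the default.
# Without the default line, allow would be undefined here and this test would fail.
test_bob_gets_default_false {
  allow == false with input as {"user": "bob"}
}
```

Running opa test against this file should report both tests as passing. On OPA 1.0 or later, the older syntax used in this series should additionally need the --v0-compatible flag.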

Custom messages

Sometimes, you want to return a series of messages rather than a simple pass or fail, true or false/undefined result. You can do so by using the rule head deny[msg] and by assigning the desired message to the variable msg. The rule below checks if the user is not Alice, and if that's the case, it assigns the string "User is denied access" to msg, which is then added to the deny set (we'll talk more about sets in Part 3):

deny[msg] {
  input.user != "alice"
  msg := "User is denied access"
}

To test this out, let's suppose our input document contains the name of the currently logged-in user:

{
    "user": "bob"
}

When we evaluate the rule against the input above using OPA's Rego Playground or the opa eval -i input.json -d check_user.rego "data.rules.check_user" --format pretty command (for instructions, see our Part 1 blog post), the result looks like this:

{
  "deny": [
    "User is denied access"
  ]
}

What's happening here? We're actually creating a set rule, a concept we'll return to in our next blog post in this series. For now, just understand that rather than a single true or false/undefined result, we're returning a set of messages assigned to the deny variable. In Rego, a set is an unordered collection of unique elements, such as integers { 1, 2, 3 } or strings { "alice", "bob", "carlotta" } or even other sets { { 1, 2 }, { 3, 4 } }. You can make a set out of any supported Rego type, or even mix and match types within a single set.

In the case of our example rule, each element in the deny set is a string containing a message. There's only one element in the set for this particular input document, but there can be more, and we'll show you an example later in this blog post.

What if we want to return additional information in the message? We can use the sprintf built-in function to display the value of the input.user field that caused a deny result:

deny[msg] {
  input.user != "alice"
  msg := sprintf("User %v is denied access", [input.user])
}

The sprintf function takes two arguments — a string and an array of values. In this case, the only element in the array is a string represented by input.user. We use %v as a placeholder in the first argument, and the value in the array takes its place when the rule is evaluated.

Now, if we evaluate the rule using the following input…

{
    "user": "bob"
}

…we see this result:

{
  "deny": [
    "User bob is denied access"
  ]
}
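
Because the second argument to sprintf is an array, a message can include more than one value. The sketch below assumes the input also carries the environment field from the AND example earlier in this post. The package name and the message wording are made up for illustration.

```rego
# Hypothetical package name, assumed for illustration.
package rules.check_user

deny[msg] {
  input.user != "alice"
  # Each %v is filled, in order, from the array in the second argument.
  msg := sprintf("User %v is denied access to %v", [input.user, input.environment])
}
```

With an input of {"user": "bob", "environment": "prod"}, the deny set should contain the single message "User bob is denied access to prod". Note that if input.environment were missing, the sprintf expression would be undefined, so no message would be added at all.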

The not keyword

You can negate an expression by prefacing it with the not keyword so that it means the opposite. Most of the time, you'll want to use this in a query to specify the absence of a property from the input. So, for example, this query:

input.tags.environment

…means "The input document has a tags.environment property, and the value is not false," and this query:

not input.tags.environment

…means "The input document does not have a tags.environment property or tags.environment is set to false." There's no overlap or middle ground — an expression and its inverse are mutually exclusive.

Here's an example rule that assigns true to deny if the input does not have a department tag:

deny {
  not input.tags.department
}

Let's use this input:

{
  "tags": {
    "environment": "staging"
  }
}

If we were to evaluate this input against the rule above, we'd see that deny is true because the input is missing the required department property:

{
  "deny": true
}
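
The "missing or false" behavior described above is easy to overlook, so here is a small sketch that exercises all three cases with unit tests. The package name, test names, and the "finance" value are assumptions for illustration, and the with keyword substitutes a made-up input for each test.

```rego
# Hypothetical package name, assumed for illustration.
package rules.check_tags

deny {
  not input.tags.department
}

# The department tag is absent, so the not expression is true.
test_missing_department_is_denied {
  deny with input as {"tags": {"environment": "staging"}}
}

# The department tag is present but set to false, which not treats the same way.
test_false_department_is_denied {
  deny with input as {"tags": {"department": false}}
}

# The department tag has a value other than false, so deny is undefined.
test_present_department_is_not_denied {
  not deny with input as {"tags": {"department": "finance"}}
}
```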

Evaluating an example rule with OPA

Let's experiment with the concepts we've discussed in this blog post by evaluating an example rule. As in Part 1, we will focus on two ways of interacting with OPA:

  • Using the Rego Playground
  • Using OPA’s command-line tool

For instructions on using these interfaces, see Part 1.

This time, we're using more of a real-world example involving a Kubernetes pod. Here's the JSON manifest we will use as input:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx-demo",
    "labels": {
      "release" : "stable"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:1.14.2",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  }
}

And here's the rule we'll be evaluating it against, which we've written to enforce the company policy "Kubernetes pods must be labeled with release and environment":

deny[msg] {
  input.kind == "Pod"
  not input.metadata.labels.release
  msg := sprintf("Pod %v is missing release label", [input.metadata.name])
} {
  input.kind == "Pod"
  not input.metadata.labels.environment
  msg := sprintf("Pod %v is missing environment label", [input.metadata.name])
}

This rule demonstrates some concepts we've discussed in this blog post:

  • deny[msg] to return a set of custom messages instead of true or false/undefined
  • Both AND and OR rule structure:
    • Deny if the Kubernetes object is a pod AND it's missing the release label, OR:
    • Deny if the Kubernetes object is a pod AND it's missing the environment label
  • The not keyword to check for the absence of a property
  • The sprintf function to return a message that lists the name of the noncompliant pod

For your convenience, we've created a playground with this content already: https://play.openpolicyagent.org/p/KNVK9kEvIT 

If you evaluate the rule by selecting the Evaluate button in the playground or, if you're running OPA locally, by executing a command such as opa eval -i input.json -d check_pod.rego "data.rules.check_pod" --format pretty, you'll see this output:

{
  "deny": [
    "Pod nginx-demo is missing environment label"
  ]
}

As we can see, the Kubernetes pod we're checking is noncompliant with our rule because the input does not contain a labels.environment property.

Now, let's remove the labels.release property. The labels section of the input should look like this:

    "labels": {
    }

If you evaluate the rule now, you'll see that the deny set contains two messages:

{
  "deny": [
    "Pod nginx-demo is missing environment label",
    "Pod nginx-demo is missing release label"
  ]
}

Finally, let's add both a labels.release and labels.environment property to the input, so it looks like this:

    "labels": {
      "release" : "stable",
      "environment": "prod"
    }

What happens if we evaluate the rule again? We see that the deny set is empty:

{
  "deny": []
}

This means our pod is compliant because OPA did not add any messages to the deny set. Hooray!
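
Instead of editing the input by hand three times, the same three scenarios can be captured as unit tests. The sketch below makes a few assumptions: the package name is inferred from the data.rules.check_pod query above, the test names are made up, and the test inputs are trimmed down to the fields the rule actually reads. It also writes the policy as two separate rules that share the deny[msg] head, which is the longer form of the chained bodies used earlier.

```rego
# Package name assumed from the data.rules.check_pod query used above.
package rules.check_pod

# Same policy as above, written as two rules that share the deny[msg] head.
deny[msg] {
  input.kind == "Pod"
  not input.metadata.labels.release
  msg := sprintf("Pod %v is missing release label", [input.metadata.name])
}

deny[msg] {
  input.kind == "Pod"
  not input.metadata.labels.environment
  msg := sprintf("Pod %v is missing environment label", [input.metadata.name])
}

# Only the release label is present, so the environment message is in the set.
test_release_only_reports_environment {
  deny["Pod nginx-demo is missing environment label"] with input as {
    "kind": "Pod",
    "metadata": {"name": "nginx-demo", "labels": {"release": "stable"}}
  }
}

# No labels at all, so both rules add a message.
test_no_labels_reports_two_messages {
  count(deny) == 2 with input as {
    "kind": "Pod",
    "metadata": {"name": "nginx-demo", "labels": {}}
  }
}

# Both labels are present, so the deny set is empty.
test_both_labels_are_compliant {
  count(deny) == 0 with input as {
    "kind": "Pod",
    "metadata": {"name": "nginx-demo", "labels": {"release": "stable", "environment": "prod"}}
  }
}
```

Running opa test against this file should report three passing tests. On OPA 1.0 or later, the older syntax used in this series should additionally need the --v0-compatible flag.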

What’s next?

Be sure to return to our blog to read Rego for Beginners Part 3, where we’ll explore set rules, object rules, functions, and iteration.

In the meantime, here are some useful resources:

If you’re interested in using Rego to write custom rules for Snyk IaC, check out our documentation here. In addition to Snyk’s built-in security and compliance-mapped rulesets, IaC+ custom rules enable you to set customized security controls across your SDLC.

IaC+ gives you a single view and controls for your configuration issues from code to cloud with an issues UI, ruleset, and policy engine spanning IDE, SCM, CLI, CI/CD, Terraform Cloud, and deployed cloud environments such as AWS, Azure, and Google Cloud.
