How does Open Policy Agent (OPA) work?

Adam Connelly - Oct 19 '21 - Dev Community

In this article, I want to give an overview of Open Policy Agent, explain why you would want to use it, and show how you can use OPA with your Spacelift account. Although OPA can be used for many purposes, I’m going to focus on how it can be used alongside Infrastructure as Code.

What is OPA?

Open Policy Agent provides a way of declaratively writing policies as code and then using those policies as part of a decision-making process. It uses a policy language called Rego, allowing you to write policies for different services using the same language.

OPA can be used for a number of purposes, including:

  • Authorization of REST API endpoints.
  • Allowing or denying Terraform changes based on compliance or safety rules.
  • Integrating custom authorization logic into applications.
  • Implementing Kubernetes Admission Controllers to validate API requests.

OPA was originally created by Styra, and is now a part of the Cloud Native Computing Foundation (CNCF), alongside other CNCF technologies like Kubernetes and Prometheus.

How does OPA work?

We can visualise how OPA works using the following diagram:

[Image: How does OPA work]

As you can see, OPA accepts a policy, input and query, and generates a response based on that. The input can be any valid JSON document, allowing OPA to integrate with any tool that produces JSON output.

Why would I want to use it?

In the following sections, I’ll go into more specific examples of using OPA that should help make things clearer, but before I do, here’s a quick list of reasons why I’m interested in using OPA:

  • Policies as code allow you to follow your standard development lifecycle with PRs, CI, etc., and provide you with a history of changes to your policies.
  • OPA is designed to work with any kind of JSON input, meaning it can easily integrate with any tool that produces JSON output.
  • Because OPA integrates with a number of different tools, it allows you to use a standard policy language across many parts of your system, rather than relying on multiple vendor-specific technologies.
  • OPA supports unit testing, making it easier and faster to iterate on your policies with confidence that they won’t break.

Using OPA with Terraform

Let’s try to make that a bit less theoretical by using a specific example: Terraform. Terraform can produce a plan in JSON format via the terraform show command. This means that we can define policies for our infrastructure, and use OPA to make a decision about whether a plan is safe to apply or not:

[Image: Using OPA with Terraform]

For example, say we have the following Terraform definition to create an EC2 instance:

provider "aws" {
 region = "eu-central-1"
}

resource "aws_instance" "web" {
 ami           = "ami-00003c1d"
 instance_type = "t3.micro"
}

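Now say we want to ensure that our Terraform resources have a Name tag. We could enforce that by creating a file called plan.rego with the following content (note that, as written, the rule is satisfied as soon as at least one resource change has a Name tag, which is enough for our single-resource example):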

package spacelift

allow {
   resource_change := input.resource_changes[_]

   resource_change.change.after.tags["Name"]
}

In order to use OPA to evaluate our policy, we need to take the following steps:

  1. Generate a Terraform plan as JSON.
  2. Run opa eval to verify whether that plan passes our policy or not.

Step 1 – Generate our Terraform plan as JSON

To get a JSON representation of our plan, we need to output our plan to a file, and then use the terraform show command to output that plan as JSON:

terraform plan -out spacelift.plan
terraform show -json spacelift.plan > spacelift.json

Step 2 – Run opa eval

We can then use opa eval to evaluate our plan against our policy:

$ opa eval --data plan.rego --input spacelift.json "data.spacelift.allow"
{}

As you can see, we’re using the query data.spacelift.allow because our policy is loaded as a data file, and we defined our allow rule in the spacelift package. You can also see that opa eval produced empty output ({}). This means that our allow rule is undefined for this input (its body didn’t evaluate to true), so the query returned no result.
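
If you want to see which resources are responsible for a result like this, it can help to look at the plan JSON directly. Here’s a minimal Python sketch (not part of OPA or Terraform) that lists the addresses of resource changes whose planned state has no Name tag. It assumes the plan has a top-level resource_changes array, as the policy above does; the function names and the sample plan are made up for illustration.

```python
import json


def load_plan(path: str) -> dict:
    """Load a plan produced by `terraform show -json`."""
    with open(path) as plan_file:
        return json.load(plan_file)


def resources_missing_tag(plan: dict, tag: str = "Name") -> list:
    """List addresses of resource changes whose planned state lacks `tag`."""
    missing = []
    for resource_change in plan.get("resource_changes", []):
        after = resource_change.get("change", {}).get("after")
        if after is None:
            # Resources being deleted have no planned state, so skip them.
            continue
        tags = after.get("tags") or {}  # "tags" may be absent or null
        if tag not in tags:
            missing.append(resource_change.get("address", "<unknown>"))
    return missing


# Trimmed, hypothetical plan for the untagged instance defined earlier.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_instance.web",
            "type": "aws_instance",
            "change": {
                "actions": ["create"],
                "after": {"instance_type": "t3.micro", "tags": None},
            },
        }
    ]
}

print(resources_missing_tag(sample_plan))  # ['aws_instance.web']
```

Unlike the allow rule, this sketch checks every resource change rather than stopping at the first one with a Name tag. To run it against a real plan, you could pass load_plan("spacelift.json") instead of sample_plan.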

Let’s adjust our Terraform definition to include a Name tag:

resource "aws_instance" "web" {
 ami           = "ami-00003c1d"
 instance_type = "t3.micro"

 tags = {
   Name = "my-instance"
 }
}

Now if we generate our plan and evaluate the policy again, we should get a slightly different output:

$ terraform plan -out spacelift.plan && \
 terraform show -json spacelift.plan > spacelift.json
... lots of Terraform output

$ opa eval --data plan.rego --input spacelift.json "data.spacelift.allow"
{
 "result": [
   {
     "expressions": [
       {
         "value": true,
         "text": "data.spacelift.allow",
         "location": {
           "row": 1,
           "col": 1
         }
       }
     ]
   }
 ]
}

This tells us that the allow rule has evaluated to true. We can get that in a slightly more concise way using the pretty format option:

$ opa eval --data plan.rego --input spacelift.json --format pretty "data.spacelift.allow"
true

At this stage, you can probably imagine how you could integrate this into your CI/CD pipeline to enforce naming schemes, security rules, and various other organizational policies.
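
As one possible starting point, here’s a minimal Python sketch of a CI gate built on the opa eval JSON output shown above. It treats empty output ({}) as a failed check and only passes when every returned expression is true. The function names are my own, and the sketch assumes that opa is on the PATH and that plan.rego and spacelift.json are in the working directory.

```python
import json
import subprocess


def is_allowed(opa_output: dict) -> bool:
    """Return True only if the query produced results and all of them are true.

    An undefined rule makes `opa eval` print {}, which has no "result" key,
    so it is treated as "not allowed".
    """
    values = [
        expression.get("value")
        for result in opa_output.get("result", [])
        for expression in result.get("expressions", [])
    ]
    return bool(values) and all(value is True for value in values)


def run_policy_check(
    policy: str = "plan.rego",
    plan: str = "spacelift.json",
    query: str = "data.spacelift.allow",
) -> bool:
    """Run the same opa eval command as above and interpret its JSON output."""
    completed = subprocess.run(
        ["opa", "eval", "--data", policy, "--input", plan, query],
        capture_output=True,
        text=True,
        check=True,  # raise if opa itself fails, e.g. on a syntax error
    )
    return is_allowed(json.loads(completed.stdout))
```

In a pipeline step, you could then exit with a non-zero status when run_policy_check() returns False, which most CI systems will report as a failed job.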

Other Examples

The following examples illustrate some possible use-cases for OPA with Terraform. All of the examples have been taken from the Spacelift plan policy documentation, but have been altered to work with plain vanilla Terraform.

Example 1. Require human review when resources are deleted or updated

Adding new resources is usually a fairly safe operation, but updating or deleting existing resources can carry more risk. You could flag these for human review using the following policy:

package spacelift

warn[sprintf(message, [action, resource.address])] {
  message  := "action '%s' requires human review (%s)"
  review   := {"update", "delete"}
  resource := input.resource_changes[_]
  action   := resource.change.actions[_]
  review[action]
}

Example 2. Require commits to be reasonably sized

Once a PR goes over a certain size it becomes difficult to review without missing things. The same is true for Terraform plans. The following policy can be used to warn when the number of changes goes over a certain threshold:

package spacelift

warn[msg] {
   msg := too_many_changes[_]
}

too_many_changes[msg] {
   threshold := 50
   res := input.resource_changes
   ret := count([r | r := res[_]; r.change.actions != ["no-op"]])
   msg := sprintf("more than %d changes (%d)", [threshold, ret])
   ret > threshold
}
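
To make the counting logic concrete, here’s the same check as a small Python sketch. It mirrors the Rego above rather than replacing it: a resource change counts unless its actions are exactly ["no-op"]. The function names and the sample plan are hypothetical.

```python
def count_changes(plan: dict) -> int:
    """Count resource changes whose actions are anything other than ["no-op"]."""
    return sum(
        1
        for resource_change in plan.get("resource_changes", [])
        if resource_change["change"]["actions"] != ["no-op"]
    )


def too_many_changes(plan: dict, threshold: int = 50) -> list:
    """Return a warning message if the plan changes more than `threshold` resources."""
    changed = count_changes(plan)
    if changed > threshold:
        return [f"more than {threshold} changes ({changed})"]
    return []


# Hypothetical plan: 51 resources being created and 3 left untouched.
sample_plan = {
    "resource_changes": (
        [{"change": {"actions": ["create"]}}] * 51
        + [{"change": {"actions": ["no-op"]}}] * 3
    )
}

print(too_many_changes(sample_plan))  # ['more than 50 changes (51)']
```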

Example 3. Blast radius

The following policy attempts to determine the risk of a particular plan by assigning different weightings to the change type (create, update, delete), along with the affected resource type (ECS cluster, EC2 instance, etc.). It takes the approach that an update or delete is riskier than a create because it affects an existing resource:

package spacelift

warn[msg] { msg := blast_radius_too_high[_] }

blast_radius_too_high[sprintf("change blast radius too high (%d/100)", [blast_radius])] {
   blast_radius := sum([blast |
                        resource := input.resource_changes[_];
                        blast := blast_radius_for_resource(resource)])

   blast_radius > 100   
}

blast_radius_for_resource(resource) = ret {
   blasts_radii_by_action := { "delete": 10, "update": 5, "create": 1, "no-op": 0 }

   ret := sum([value | action := resource.change.actions[_]
                   action_impact := blasts_radii_by_action[action]
                   type_impact := blast_radius_for_type(resource.type)
                   value := action_impact * type_impact])
}

# Let's give some types of resources special blast multipliers.
blasts_radii_by_type := { "aws_ecs_cluster": 20, "aws_ecs_user": 10, "aws_ecs_role": 5 }

# By default, blast radius has a value of 1.
blast_radius_for_type(type) = 1 {
   not blasts_radii_by_type[type]
}

blast_radius_for_type(type) = ret {
   blasts_radii_by_type[type] = ret
}
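
To see how the numbers add up, here’s a worked example of the same arithmetic as a Python sketch. It uses the weights from the policy above, but the function names and the three-resource plan are made up for illustration. Each resource contributes the sum of its action weights multiplied by its type weight, and types without a special multiplier default to 1.

```python
ACTION_WEIGHTS = {"delete": 10, "update": 5, "create": 1, "no-op": 0}
TYPE_WEIGHTS = {"aws_ecs_cluster": 20, "aws_ecs_user": 10, "aws_ecs_role": 5}


def blast_radius_for_resource(resource_change: dict) -> int:
    """Sum of (action weight * type weight) over the resource's actions."""
    type_weight = TYPE_WEIGHTS.get(resource_change["type"], 1)
    return sum(
        # Actions without a weight contribute nothing, as in the Rego version.
        ACTION_WEIGHTS.get(action, 0) * type_weight
        for action in resource_change["change"]["actions"]
    )


def blast_radius(plan: dict) -> int:
    """Total blast radius across every resource change in the plan."""
    return sum(
        blast_radius_for_resource(resource_change)
        for resource_change in plan.get("resource_changes", [])
    )


# Hypothetical plan touching three resources.
sample_plan = {
    "resource_changes": [
        {"type": "aws_ecs_cluster", "change": {"actions": ["delete"]}},  # 10 * 20 = 200
        {"type": "aws_instance", "change": {"actions": ["update"]}},     # 5 * 1 = 5
        {"type": "aws_s3_bucket", "change": {"actions": ["create"]}},    # 1 * 1 = 1
    ]
}

total = blast_radius(sample_plan)
print(total, total > 100)  # 206 True
```

Here, deleting a single ECS cluster is enough to push the plan over the limit of 100, while the other two changes barely register.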

Unit Testing

How can we make sure that our policies work as we expect and that they don’t break over time as we make changes to them? You guessed it: unit testing! Luckily for us, OPA has first-class support for testing via the opa test command.

To create tests for our policy, all we need to do is create another Rego file with a series of rules prefixed with test_. Each rule starting with that prefix defines a separate test.

Let’s go ahead and create a file called plan_test.rego, with the following contents:

package spacelift

test_allow_missing_name_tag {
 not allow with input as {
     "resource_changes": [
       {
         "change": {
           "after": {
             "tags": null,
           },
         }
       }
     ]
   }
}

As you can see, OPA makes it really easy to specify the policy input using the with <target> as <value> syntax (here, with input as {...}). This allows us to create very concise tests by only including values we care about in the policy input, rather than having to use the entire plan output.

Let’s go ahead and run opa test:

$ opa test .
data.spacelift.test_allow_missing_name_tag: PASS (188.503µs)
--------------------------------------------------------------------------------
PASS: 1/1

Unsurprisingly, our test passes. That’s not very exciting, so let’s add a new test to ensure our resources include an Environment tag:

test_allow_missing_environment_tag {
 not allow with input as {
     "resource_changes": [
       {
         "change": {
           "after": {
             "tags": { "Name": "my-instance" },
           },
         }
       }
     ]
   }
}

Running opa test again shows us a failure:

$ opa test .
data.spacelift.test_allow_missing_environment_tag: FAIL (118.078µs)
--------------------------------------------------------------------------------
PASS: 1/2
FAIL: 1/2

We can fix this by updating our policy to also require an Environment tag:

allow {
   resource_change := input.resource_changes[_]
   resource_change.change.after.tags["Name"]
   resource_change.change.after.tags["Environment"]
}

At this stage, if you run the test command again, it should show two passes:

$ opa test .
PASS: 2/2

If you want to know more about OPA testing, the official docs are full of great examples and information about what you can do.

OPA + Spacelift

At this stage, hopefully, you’ve got a pretty good idea of what OPA is as well as how it can be useful to you. It’s not too difficult to see how you could start integrating OPA into your development process or even use it as part of production systems.

Luckily, Spacelift does all the heavy lifting for you, allowing you to get the benefits of using OPA for Policy-as-Code without having to implement everything from scratch. In this section, I want to showcase some of the functionality that Spacelift provides that relates to OPA.

1) Policy Types

Spacelift allows you to use OPA policies to manage various aspects of your Spacelift account, not just during planning. For example, you can use policies to control who can log in to your account, along with what they have access to. For more information see https://docs.spacelift.io/concepts/policy.

2) PR Checks

Spacelift can automatically trigger planning runs whenever you push changes to your VCS provider. For example, here’s what you might see in GitHub after creating a PR:

[Image: PR Checks 1]

If all goes well your check will succeed, but if a policy is violated, a failed check will be reported:

[Image: PR Checks 2]

You can then view the details of the failure in Spacelift:

[Image: PR Checks 3]

3) Manual Approvals

Spacelift plan policies use a slightly different format than we used in our example policies earlier in this post. Not only do they allow you to specify a message to be displayed, but they also have the concept of deny and warn:

package spacelift

deny["you shall not pass"] {
 true
}

warn["hey, you look suspicious"] {
 true
}

The deny rule fails the run completely, while the warn rule just displays a warning in the logs.

warn rules take on another role when a run is going to deploy changes (vs just showing the planned changes against a PR). If any warnings are reported during a deployment, the run will wait for manual approval before applying any changes.
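
The behaviour described above can be summed up as a small decision function. This is only a Python sketch of the rules as I’ve described them, not Spacelift’s actual implementation or API; the function name, the parameter names, and the outcome labels are all hypothetical.

```python
def plan_policy_outcome(deny: list, warn: list, is_deployment: bool) -> str:
    """Summarise what happens to a run given its deny and warn messages."""
    if deny:
        # Any deny message fails the run completely.
        return "failed"
    if warn and is_deployment:
        # Warnings on a run that would deploy changes wait for manual approval.
        return "awaiting manual approval"
    if warn:
        # On a PR run, warnings are only displayed in the logs.
        return "passed with warnings"
    return "passed"


print(plan_policy_outcome(["you shall not pass"], [], is_deployment=True))
# failed
print(plan_policy_outcome([], ["hey, you look suspicious"], is_deployment=True))
# awaiting manual approval
print(plan_policy_outcome([], ["hey, you look suspicious"], is_deployment=False))
# passed with warnings
```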

Let’s use the following example policy, designed to warn if a certain set of suggested tags aren’t found on our resources:

suggested_tags := { "Name", "Environment" }

warn[sprintf("resource %q does not have all suggested tags (%s)", [resource.address, concat(", ", missing_tags)])] {
 resource := input.terraform.resource_changes[_]
 tags := resource.change.after.tags

 missing_tags := { tag | suggested_tags[tag]; not tags[tag] }
 count(missing_tags) > 0
}

If we then attempt to add resources that don’t contain all of those tags, Spacelift will pause the run before applying the changes, and give us the chance to manually approve or deny the run:

[Image: Manual approvals]

This allows you to build complex workflows where certain changes are completely blocked, but others are allowed as long as the changes are reviewed first.

Check out the plan policy cookbook for more ideas about what you can do.

Spacelift Terraform Provider

Spacelift provides a Terraform provider for managing your Spacelift account. This means that you can manage the policies available within your account, along with the Stacks they apply to, in code.

For example, to add the policy we defined earlier to Spacelift we can use the spacelift_policy resource like this:

resource "spacelift_policy" "plan" {
 type = "PLAN"

 name = "Plan Policy"
 body = file("${path.module}/plan.rego")
}

We can then attach this policy to a Spacelift Stack using the spacelift_policy_attachment resource:

resource "spacelift_policy_attachment" "mystack-plan" {
 policy_id = spacelift_policy.plan.id
 stack_id  = spacelift_stack.mystack.id
}

Taking this a step further, we could create a custom module for defining Spacelift Stacks that ensures that all Stacks have a certain set of policies attached by default.

What's Next?

I hope you’ve enjoyed this post, and can see the value that OPA brings to the table. If you’re interested in trying out Spacelift to see what it has to offer, why not sign up for a free trial? You can set up a Spacelift account in minutes, and get started on your Open Policy Agent journey!
