# select * from code;

> Use SQL to query and scan infrastructure as code for insights and security best practices.

By Steampipe Team
Published: 2023-10-04


Infrastructure as code (IaC) has unleashed a Cambrian explosion of tools and techniques for modeling IT infrastructure and the policies governing it. From Terraform to Kubernetes to policy-as-code, we now define our desired infrastructure states declaratively. This evolution has enabled infrastructure automation, validation, and self-service. But it has also created a bewildering landscape of complex IaC formats and concepts. Terraform plans, Kubernetes manifests, and CloudFormation templates all model infrastructure differently. Under the hood, though, there is a unified data model for IaC that represents cloud infrastructure as structured data.

With the right lens you can extract a coherent graph of resources, relationships, and metadata — not only from IaC definitions, but also from live deployments. Steampipe is that lens: a single binary that plugs into your CI/CD pipelines, taps into the entire Turbot ecosystem of API connectors and security checks, and enables ad-hoc queries against both IaC resources and deployed infrastructure. Do it all without having to script multi-lingual conversations with APIs and CLIs. Like everything else about IaC, the language you use to reason over it should be declarative. Steampipe's core language, SQL, meets the need. 

## Query infrastructure as data

Let's look at some basic examples across the full range of the common data model, starting with this snippet of AWS-flavored Terraform.

```hcl
resource "aws_instance" "turbot_bastion_host" {
  ami           = "ami-a1b2c3d4e5f6789ab"
  instance_type = "t2.micro"

  tags = {
    Name        = "Turbot Bastion Host"
    Environment = "Prod"
  }
}

resource "aws_instance" "silverwater_bastion" {
  ami           = "ami-0f9e8d7c6b5a43210z"
  instance_type = "t2.micro"

  tags = {
    Name        = "Silverwater Bastion"
    Environment = "Dev"
  }
}
```

This query answers the question "What are the AMIs for our bastions?" and finds two rows.

```sql
select
  name,
  arguments ->> 'ami' as image_id
from
  terraform_resource
where
  arguments -> 'tags' ->> 'Name' ~ 'Bastion'
```

```
+---------------------+------------------------+
| name                | image_id               |
+---------------------+------------------------+
| silverwater_bastion | ami-0f9e8d7c6b5a43210z |
| turbot_bastion_host | ami-a1b2c3d4e5f6789ab  |
+---------------------+------------------------+
```

Once the instances are deployed, this query asks the same question of the live infrastructure. It returns the same rows, augmented with facts (`instance_id`, `instance_state`) known only at runtime.


```sql
select
  title as name, 
  image_id, 
  instance_id,
  instance_state
from
  aws_ec2_instance 
where 
  tags ->> 'Name' ~ 'Bastion'
```

```
+---------------------+------------------------+---------------------+----------------+
| name                | image_id               | instance_id         | instance_state |
+---------------------+------------------------+---------------------+----------------+
| Silverwater Bastion | ami-0f9e8d7c6b5a43210z | i-0957d10a9e54d560e | running        |
| Turbot Bastion Host | ami-a1b2c3d4e5f6789ab  | i-0b42074c10ba42ff0 | running        |
+---------------------+------------------------+---------------------+----------------+
```

The two queries are not exactly the same, but they share a common model that abstracts both the parsing of static Terraform resources and the API calls to live infrastructure.
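Because both tables live in the same Postgres database, you can go a step further and join the two sides directly. Here's a sketch (the join key and column choices are illustrative) that surfaces bastions whose deployed AMI has drifted from the one declared in code:

```sql
select
  t.name,
  t.arguments ->> 'ami' as declared_ami,
  e.image_id as deployed_ami
from
  terraform_resource as t
  join aws_ec2_instance as e
    on e.tags ->> 'Name' = t.arguments -> 'tags' ->> 'Name'
where
  t.type = 'aws_instance'
  -- flag rows where code and cloud disagree
  and e.image_id is distinct from t.arguments ->> 'ami'
```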

Now let's look through a GCP-flavored Kubernetes lens. Here's part of the `values.yaml` file for a Helm chart.

```yml
images:
  repository: gcr.io/google-samples/microservices-demo

adService:
  create: true
  name: adservice
  resources:
    requests:
      cpu: 200m
      memory: 180Mi
    limits:
      cpu: 300m
      memory: 300Mi
```

This query uses the Kubernetes plugin to ask the Helm chart: "What are the resources specified for the adservice app?"

```sql
select distinct 
  key_path, 
  value
from
  helm_value
where
  key_path::text ~ 'adService.resources'
```

```
+-------------------------------------+-------+
| key_path                            | value |
+-------------------------------------+-------+
| adService.resources.limits.cpu      | 300m  |
| adService.resources.limits.memory   | 300Mi |
| adService.resources.requests.cpu    | 200m  |
| adService.resources.requests.memory | 180Mi |
+-------------------------------------+-------+
```

This query also uses the Kubernetes plugin to ask the same question of the corresponding live Kubernetes deployment.

```sql
select distinct
  namespace,
  jsonb_pretty(template -> 'spec' -> 'containers' -> 0 -> 'resources') as resources
from
  kubernetes_deployment
where
  name ~ 'adservice'
```

```
+------------------------+---------------------------+
| namespace              | resources                 |
+------------------------+---------------------------+
| gke-microservices-demo | {                         |
|                        |     "limits": {           |
|                        |         "memory": "300Mi" |
|                        |     },                    |
|                        |     "requests": {         |
|                        |         "memory": "180Mi" |
|                        |     }                     |
|                        | }                         |
+------------------------+---------------------------+
```
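Since `helm_value` and `kubernetes_deployment` are both just tables, a join can also compare a declared value with its deployed counterpart. A sketch, with the key path and container index hard-coded to match this chart:

```sql
select
  hv.value as declared_memory_limit,
  d.template -> 'spec' -> 'containers' -> 0
    -> 'resources' -> 'limits' ->> 'memory' as deployed_memory_limit
from
  helm_value as hv,
  kubernetes_deployment as d
where
  hv.key_path::text = 'adService.resources.limits.memory'
  and d.name ~ 'adservice'
```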

This query asks: "Which node is running the app?"

```sql
with images as (
  select
    name,
    jsonb_array_elements_text(image -> 'names') as image_name
  from
    kubernetes_node,
    jsonb_array_elements(images) as image
)
select
  name
from
  images
where
  image_name ~ 'microservices-demo'
limit 1
```

```
+-----------------------------------------------------+
| name                                                |
+-----------------------------------------------------+
| gke-smyth-test-cluster-1-default-pool-42aa2e29-7264 |
+-----------------------------------------------------+
```

And now, switching from the Kubernetes plugin to the GCP plugin, this query asks: "What are the instance details?"

```sql
select
  machine_type_name,
  status,
  zone_name,
  network_interfaces -> 0 ->> 'networkIP' as networkIP
from
  gcp_compute_instance 
where 
  name = 'gke-smyth-test-cluster-1-default-pool-42aa2e29-7264'
```

```
+-------------------+---------+---------------+-------------+
| machine_type_name | status  | zone_name     | networkip   |
+-------------------+---------+---------------+-------------+
| e2-medium         | RUNNING | us-central1-c | 10.128.0.48 |
+-------------------+---------+---------------+-------------+
```

## Check infrastructure as data

Because the common model enables queries against both statically-defined resources and deployed resources, [compliance mods](https://hub.steampipe.io/mods?objectives=compliance) can check both types. For example, this Terraform code defines two AWS S3 buckets, one with logging and one without.

```hcl
provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "my_bucket_1" {
  bucket = "jon-test-project-foo-bucket-01"

  logging {
    target_bucket = aws_s3_bucket.my_bucket_2.bucket
    target_prefix = "log/"
  }
}

resource "aws_s3_bucket" "my_bucket_2" {
  bucket = "jon-test-project-foo-bucket-02"
}
```
The [AWS Compliance](https://hub.steampipe.io/mods/turbot/aws_compliance) mod includes a control ([s3_bucket_logging_enabled](https://hub.steampipe.io/mods/turbot/aws_compliance/controls/control.s3_bucket_logging_enabled)) that checks to ensure buckets will be logged. If we deploy these buckets and run the control, we'll see that bucket 2 is in the Alarm state. 

![](/images/blog/2023-10-select-star-from-code/aws-compliance-check-logging.png)

But why even let that happen? The corresponding [Terraform AWS Compliance](https://hub.steampipe.io/mods/turbot/terraform_aws_compliance) mod also includes an [s3_bucket_logging_enabled](https://hub.steampipe.io/mods/turbot/terraform_aws_compliance/controls/control.s3_bucket_logging_enabled) control. We can use that to detect the misconfiguration and fix it before the buckets ever get deployed.

![](/images/blog/2023-10-select-star-from-code/tf-aws-compliance-check-logging.png)

Mods that check statically-defined resources have long been available for both Terraform ([AWS](https://hub.steampipe.io/mods/turbot/terraform_aws_compliance), [Azure](https://hub.steampipe.io/mods/turbot/terraform_azure_compliance), [GCP](https://hub.steampipe.io/mods/turbot/terraform_gcp_compliance)) and [Kubernetes](https://hub.steampipe.io/mods/turbot/kubernetes_compliance). The queries that power all these controls exploit the power of the common model: it's just SQL, whether you're reasoning over static resources or live APIs.
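To give a feel for what powers these controls, here is a simplified sketch, not the mod's actual query, of a logging check over static Terraform:

```sql
select
  name,
  case
    when arguments -> 'logging' is not null then 'ok'
    else 'alarm'
  end as status
from
  terraform_resource
where
  type = 'aws_s3_bucket'
```

The real control also handles newer `aws_s3_bucket_logging` resources and reports richer metadata, but the shape is the same: a SQL query over the common model, returning a status per resource.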

Now we've broadened the scope of queries against static resources. In [Interrogate Terraform](https://steampipe.io/blog/interrogate-terraform) we showcase a new capability: you can now query plan files and state files along with `.tf` files. And in [Interrogate Kubernetes](https://steampipe.io/blog/interrogate-kubernetes) we discuss the analogous new ability to query helm charts and manifests along with live clusters. These new query targets make it even easier to reason over static resources and compare them to those you've deployed. It all follows logically from a unified model that abstracts the differences between static resources and live infrastructure. 

## GitOps for infrastructure as data

You've long been able to install Steampipe in [CI/CD pipelines](https://steampipe.io/docs/integrations/overview#cicd-pipelines) and use it to run checks and queries. That becomes even more powerful now that you can query and check Terraform plans and state files, as well as Kubernetes charts and manifests. But wait, there's more! We're also announcing [new GitHub actions](https://steampipe.io/blog/github-actions) that make it easier than ever to use Steampipe, along with plugins and mods, in GitHub workflows. Use the [Setup action](https://github.com/marketplace/actions/setup-steampipe) to install Steampipe itself, along with any of the plugins you'll need. Then use the [Check action](https://github.com/marketplace/actions/steampipe-check) to install mods that run your checks.
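Put together, a minimal workflow might look like the following sketch. The action names and inputs here are illustrative; check the linked marketplace pages for the exact, current syntax.

```yml
name: terraform_compliance
on: pull_request
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install Steampipe and configure the Terraform plugin
      - uses: turbot/steampipe-action-setup@v1
        with:
          plugin-connections: |
            connection "terraform" {
              plugin = "terraform"
              paths  = ["./**/*.tf"]
            }
      # Run the Terraform AWS Compliance mod against the PR's code
      - uses: turbot/steampipe-action-check@v1
        with:
          mod-url: https://github.com/turbot/steampipe-mod-terraform-aws-compliance
```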

The Check action has a special trick up its sleeve. If a compliance mod detects a misconfiguration in a static resource that's included in a pull request, the action will annotate the PR with the alarm reported by the control — at the line number in the resource file that triggered the alarm! Again, all this follows logically from a unified model that enables common declarative queries over static resources and live APIs.

## select &#x002A; from cloud join select &#x002A; from code

Steampipe's original mantra was <em>select * from cloud</em>. We thought it was amazing to be able to query within and across live APIs using basic SQL, and those of you who've experienced that tell us you agree. In an IaC world, though, the resources you can see via APIs are defined in code, so we made it possible to query the code too. In three companion posts — [Interrogate Terraform](https://steampipe.io/blog/interrogate-terraform), [Interrogate Kubernetes](https://steampipe.io/blog/interrogate-kubernetes), and [Steampipe-powered GitHub actions](https://steampipe.io/blog/github-actions) — we explore the latest developments enabled by an IaC data model that unifies everything.

Questions? Comments? Bring them to the [Turbot community](/community/join)! We'd love to know your thoughts.



