Announcement

New: Caching, Config and Improved Quals Handling

Learn more about the v0.3.0 release of the opensource Steampipe CLI.

Steampipe Team
5 min. read - Mar 19, 2021
Learn more about the v0.3.0 release of the opensource Steampipe CLI.

What is Steampipe?

SQL is an expressive language for asking questions of structured data. Steampipe, an open-source project from Turbot, enables cloud pros (e.g. software developers, operations engineers and security teams) to query their favorite cloud services with SQL.

Steampipe's intuitive command line interface (CLI) makes it easy to ask questions of your cloud resources and services. Traditional tools and custom scripts that provide visibility into these services are cumbersome, inconsistent across providers and painful to maintain. Steampipe provides a consistent, explorable and interactive approach to solve this problem across an ever growing list of cloud services.

steampipe cli
>
select
region,
instance_state as state,
instance_type as type
from
aws_ec2_instance;

+-----------+---------+-----------+
| region    | state   | type      |
+-----------+---------+-----------+
| eu-west-1 | running | t3.medium |
| eu-west-2 | running | m5a.large |
| us-east-1 | running | t3.large  |
+-----------+---------+-----------+
    

New in Release v0.3.0

tl;dr | We now enable query caching by default | Steampipe configuration has moved from environment variables to file-based configuration | Improved query optimization enables more complex joins across tables that have quals requirements.

Query Caching

In version v0.2.0 we released query caching as a preview capability (it needed to be manually enabled via an environment variable); Query caching is now enabled by default.

The first time you run a query, Steampipe retrieves data from the cloud service API and returns the live data. With caching enabled, that result is stored (in-memory) for subsequent queries. The cache is local to your device, the duration is configurable (the default is 5 minutes) and the data does not persist when Steampipe exits (unless you run Steampipe as a service).

How fast is the cache?

Fast (1 Second)

The first time you query a connection, Steampipe needs to create and authenticate API connections, in this case it took a little over 1 second to establish connections across 16 AWS regions and return results.

> .timing on
> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;

+-----------+---------------------+----------------+---------------+-----------------+
| region    | instance_id         | instance_state | instance_type | title           |
+-----------+---------------------+----------------+---------------+-----------------+
| eu-west-1 | i-003d889c8dc91f939 | running        | t3.medium     | Dev Bastion     |
| eu-west-2 | i-072ee9d889c80c59a | running        | m5a.large     | Squid           |
| us-east-1 | i-0667842133f5baeb7 | stopped        | t3.large      | WinBastion      |
| us-east-2 | i-059b0d1eaa04232f8 | running        | t3.large      | ECS Host        |
| us-east-2 | i-0e6f804203eb894eb | running        | t2.micro      | Linux Bastion   |
+-----------+---------------------+----------------+---------------+-----------------+
Time: 1.045864439s
    

Real Fast (¼ Second)

With the connections now cached, the same query returns in less than ¼ second.

> .timing on
> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;

+-----------+---------------------+----------------+---------------+-----------------+
| region    | instance_id         | instance_state | instance_type | title           |
+-----------+---------------------+----------------+---------------+-----------------+
| eu-west-1 | i-003d889c8dc91f939 | running        | t3.medium     | Dev Bastion     |
| eu-west-2 | i-072ee9d889c80c59a | running        | m5a.large     | Squid           |
| us-east-1 | i-0667842133f5baeb7 | stopped        | t3.large      | WinBastion      |
| us-east-2 | i-059b0d1eaa04232f8 | running        | t3.large      | ECS Host        |
| us-east-2 | i-0e6f804203eb894eb | running        | t2.micro      | Linux Bastion   |
+-----------+---------------------+----------------+---------------+-----------------+
Time: 243.693268ms
    

Blazing Fast ( < 1 Microsecond)

With query caching enabled, subsequent queries to the same table are more than 1000x faster!

$ export STEAMPIPE_CACHE=true
$ steampipe query
> .timing on
> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;

+-----------+---------------------+----------------+---------------+-----------------+
| region    | instance_id         | instance_state | instance_type | title           |
+-----------+---------------------+----------------+---------------+-----------------+
| eu-west-1 | i-003d889c8dc91f939 | running        | t3.medium     | Dev Bastion     |
| eu-west-2 | i-072ee9d889c80c59a | running        | m5a.large     | Squid           |
| us-east-1 | i-0667842133f5baeb7 | stopped        | t3.large      | WinBastion      |
| us-east-2 | i-059b0d1eaa04232f8 | running        | t3.large      | ECS Host        |
| us-east-2 | i-0e6f804203eb894eb | running        | t2.micro      | Linux Bastion   |
+-----------+---------------------+----------------+---------------+-----------------+
Time: 653.39µs
    

File-Based Configuration

In previous versions of Steampipe, configuration options would need to be set as environment variables. For example, to enable the query caching preview in v0.2.0 (discussed above) you had to set an environment variable:

Environment variables are now deprecated in favor of configuration files. Steampipe will create a ~/.steampipe/config/default.spc file upon installation, but the naming is inconsequential as Steampipe will load ALL configuration files from ~/.steampipe/config that have a .spc extension.

The configuration files themselves use HCL format for their syntax. A list of configuration options is available in the docs. Using the default configuration option we can enable the Steampipe cache and set the timeout to 5 minutes:

Improved Query Optimization

Much of the magic in how Steampipe is able to query and represent data is a result of our use of Postgres Foreign Data Wrappers (FDW). FDW provide a standardized way of handling access to remote objects from PostgreSQL.

This works flawlessly when querying data that can be listed (e.g. a list of IAM users in an account), but in some circumstances data does not exist unless qualifying paramaters are passed to the API. An example of this is the aws_iam_policy_simulator. Selecting all records (*) from the table would be meaningless, no data exists unless principal_arn, action, and resource_arn are specified:

Steampipe v0.3.0 now includes improved algorithms to recognize when database quals (query restrictions) can be adversely affected by query optimizations and will be able to handle more edge cases effectively.


Putting it all Together

We can see the combined impact of caching and improved quals in the following use case: Query a list of all IAM users in a given account, and then run a policy simulator test for the s3:DeleteBucket permission.

Before v0.3.0 this query would fail with the following error: 'List' call requires an '=' qual for all columns: principal_arn,action,resource_arn, now it runs flawlessly and performance is improved with cached results:

With the improved quals processing in this version the query executes and runs in a little over 7 seconds:

steampipe cli
>
select
u.name,
decision
from
aws_iam_policy_simulator p,
aws_iam_user u
where
action = 's3:DeleteBucket'
and resource_arn = '*'
and p.principal_arn = u.arn;

  +-----------------+--------------+
  | name            | decision     |
  +-----------------+--------------+
  | jan_levinson    | explicitDeny |
  | kelly_kapoor    | allowed      |
  | ryan_howard     | allowed      |
  | jim_halpert     | allowed      |
  | dwight_schrute  | allowed      |
  | michael_scott   | implicitDeny |
  +-----------------+--------------+
  
  Time: 7.252459667s
        

Running the same query with cached aws_iam_user data results in a 5x performance improvement:

steampipe cli
>
select
u.name,
decision
from
aws_iam_policy_simulator p,
aws_iam_user u
where
action = 's3:DeleteBucket'
and resource_arn = '*'
and p.principal_arn = u.arn;

  +-----------------+--------------+
  | name            | decision     |
  +-----------------+--------------+
  | jan_levinson    | explicitDeny |
  | kelly_kapoor    | allowed      |
  | ryan_howard     | allowed      |
  | jim_halpert     | allowed      |
  | dwight_schrute  | allowed      |
  | michael_scott   | implicitDeny |
  +-----------------+--------------+
  
  Time: 1.450723871s
        

Yes, we think that is super cool too!

Our team has fun solving these tricky query edge cases. If you think this kind of work is fun too, please join our growing Steampipe community! You can contribute by adding additional tables, helping with documentation or creating a brand new plugin! We hope the new capabilities in v0.3.0 prove to be a huge time saver for you in your day-to-day cloud work. For even more good stuff, checkout the full release notes on Steampipe v0.3.0.