What is Steampipe?
SQL is an expressive language for asking questions of structured data. Steampipe, an open-source project from Turbot, enables cloud pros (e.g. software developers, operations engineers and security teams) to query their favorite cloud services with SQL.
Steampipe's intuitive command line interface (CLI) makes it easy to ask questions of your cloud resources and services. Traditional tools and custom scripts that provide visibility into these services are cumbersome, inconsistent across providers and painful to maintain. Steampipe provides a consistent, explorable and interactive approach to solve this problem across an ever growing list of cloud services.
select region, instance_state as state, instance_type as typefrom aws_ec2_instance;
+-----------+---------+-----------+ | region | state | type | +-----------+---------+-----------+ | eu-west-1 | running | t3.medium | | eu-west-2 | running | m5a.large | | us-east-1 | running | t3.large | +-----------+---------+-----------+
New in Release v0.3.0
tl;dr | We now enable query caching by default | Steampipe configuration has moved from environment variables to file-based configuration | Improved query optimization enables more complex joins across tables that have quals requirements.
Query Caching
In version v0.2.0 we released query caching as a preview capability (it needed to be manually enabled via an environment variable); Query caching is now enabled by default.
The first time you run a query, Steampipe retrieves data from the cloud service API and returns the live data. With caching enabled, that result is stored (in-memory) for subsequent queries. The cache is local to your device, the duration is configurable (the default is 5 minutes) and the data does not persist when Steampipe exits (unless you run Steampipe as a service).
How fast is the cache?
Fast (1 Second)
The first time you query a connection, Steampipe needs to create and authenticate API connections, in this case it took a little over 1 second to establish connections across 16 AWS regions and return results.
> .timing on> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;
+-----------+---------------------+----------------+---------------+-----------------+ | region | instance_id | instance_state | instance_type | title | +-----------+---------------------+----------------+---------------+-----------------+ | eu-west-1 | i-003d889c8dc91f939 | running | t3.medium | Dev Bastion | | eu-west-2 | i-072ee9d889c80c59a | running | m5a.large | Squid | | us-east-1 | i-0667842133f5baeb7 | stopped | t3.large | WinBastion | | us-east-2 | i-059b0d1eaa04232f8 | running | t3.large | ECS Host | | us-east-2 | i-0e6f804203eb894eb | running | t2.micro | Linux Bastion | +-----------+---------------------+----------------+---------------+-----------------+ Time: 1.045864439s
Real Fast (¼ Second)
With the connections now cached, the same query returns in less than ¼ second.
> .timing on> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;
+-----------+---------------------+----------------+---------------+-----------------+ | region | instance_id | instance_state | instance_type | title | +-----------+---------------------+----------------+---------------+-----------------+ | eu-west-1 | i-003d889c8dc91f939 | running | t3.medium | Dev Bastion | | eu-west-2 | i-072ee9d889c80c59a | running | m5a.large | Squid | | us-east-1 | i-0667842133f5baeb7 | stopped | t3.large | WinBastion | | us-east-2 | i-059b0d1eaa04232f8 | running | t3.large | ECS Host | | us-east-2 | i-0e6f804203eb894eb | running | t2.micro | Linux Bastion | +-----------+---------------------+----------------+---------------+-----------------+ Time: 243.693268ms
Blazing Fast ( < 1 Microsecond)
With query caching enabled, subsequent queries to the same table are more than 1000x faster!
$ export STEAMPIPE_CACHE=true$ steampipe query> .timing on> select region, instance_id, instance_state, instance_type, title from aws_ec2_instance;
+-----------+---------------------+----------------+---------------+-----------------+ | region | instance_id | instance_state | instance_type | title | +-----------+---------------------+----------------+---------------+-----------------+ | eu-west-1 | i-003d889c8dc91f939 | running | t3.medium | Dev Bastion | | eu-west-2 | i-072ee9d889c80c59a | running | m5a.large | Squid | | us-east-1 | i-0667842133f5baeb7 | stopped | t3.large | WinBastion | | us-east-2 | i-059b0d1eaa04232f8 | running | t3.large | ECS Host | | us-east-2 | i-0e6f804203eb894eb | running | t2.micro | Linux Bastion | +-----------+---------------------+----------------+---------------+-----------------+ Time: 653.39µs
File-Based Configuration
In previous versions of Steampipe, configuration options would need to be set as environment variables. For example, to enable the query caching preview in v0.2.0 (discussed above) you had to set an environment variable:
Environment variables are now deprecated in favor of configuration files. Steampipe will create a ~/.steampipe/config/default.spc
file upon installation, but the naming is inconsequential as Steampipe will load ALL configuration files from ~/.steampipe/config
that have a .spc
extension.
The configuration files themselves use HCL format for their syntax. A list of configuration options is available in the docs. Using the default configuration option we can enable the Steampipe cache and set the timeout to 5 minutes:
Improved Query Optimization
Much of the magic in how Steampipe is able to query and represent data is a result of our use of Postgres Foreign Data Wrappers (FDW). FDW provide a standardized way of handling access to remote objects from PostgreSQL.
This works flawlessly when querying data that can be listed (e.g. a list of IAM users in an account), but in some circumstances data does not exist unless qualifying paramaters are passed to the API. An example of this is the aws_iam_policy_simulator. Selecting all records (*) from the table would be meaningless, no data exists unless principal_arn, action, and resource_arn are specified:
Steampipe v0.3.0 now includes improved algorithms to recognize when database quals (query restrictions) can be adversely affected by query optimizations and will be able to handle more edge cases effectively.
Putting it all Together
We can see the combined impact of caching and improved quals in the following use case: Query a list of all IAM users in a given account, and then run a policy simulator test for the s3:DeleteBucket
permission.
Before v0.3.0 this query would fail with the following error: 'List' call requires an '=' qual for all columns: principal_arn,action,resource_arn
, now it runs flawlessly and performance is improved with cached results:
With the improved quals processing in this version the query executes and runs in a little over 7 seconds:
select u.name, decisionfrom aws_iam_policy_simulator p, aws_iam_user uwhere action = 's3:DeleteBucket' and resource_arn = '*' and p.principal_arn = u.arn;
+-----------------+--------------+ | name | decision | +-----------------+--------------+ | jan_levinson | explicitDeny | | kelly_kapoor | allowed | | ryan_howard | allowed | | jim_halpert | allowed | | dwight_schrute | allowed | | michael_scott | implicitDeny | +-----------------+--------------+ Time: 7.252459667s
Running the same query with cached aws_iam_user data results in a 5x performance improvement:
select u.name, decisionfrom aws_iam_policy_simulator p, aws_iam_user uwhere action = 's3:DeleteBucket' and resource_arn = '*' and p.principal_arn = u.arn;
+-----------------+--------------+ | name | decision | +-----------------+--------------+ | jan_levinson | explicitDeny | | kelly_kapoor | allowed | | ryan_howard | allowed | | jim_halpert | allowed | | dwight_schrute | allowed | | michael_scott | implicitDeny | +-----------------+--------------+ Time: 1.450723871s
Yes, we think that is super cool too!
Our team has fun solving these tricky query edge cases. If you think this kind of work is fun too, please join our growing Steampipe community! You can contribute by adding additional tables, helping with documentation or creating a brand new plugin! We hope the new capabilities in v0.3.0 prove to be a huge time saver for you in your day-to-day cloud work. For even more good stuff, checkout the full release notes on Steampipe v0.3.0.