v0.10.0: Better concurrency and caching →
Announcement

v0.10.0: Better concurrency and caching

Discover the great new features in Steampipe's open source v0.10.0 release!

Steampipe Team
7 min. read - November 24, 2021

What is Steampipe?

Steampipe is open source software for interrogating your cloud. Run SQL queries, compliance controls and full governance benchmarks from the comfort of your CLI.

Steampipe’s codified operations framework gives you the power to test your cloud resources against security, compliance and cost benchmarks, and to build your own custom control frameworks.

Our multi-threaded Golang CLI makes your custom SQL controls blazing fast with unlimited integration options via our embedded PostgreSQL database.

steampipe cli
>
select
region,
instance_state as state,
instance_type as type
from
aws_ec2_instance;

+-----------+---------+-----------+
| region    | state   | type      |
+-----------+---------+-----------+
| eu-west-1 | running | t3.medium |
| eu-west-2 | running | m5a.large |
| us-east-1 | running | t3.large  |
+-----------+---------+-----------+
        

tl;dr

Better concurrency + caching = faster control runs
5 new plugins.
New and updated mods.
Even more goodies in the full release notes.

Better concurrency + caching = faster control runs

Steampipe v0.10 delivers major improvements to concurrency and caching that will, among other benefits, enable up to 5 controls to run in parallel and share a common cache. In v0.9 there was already plenty of parallelism and smart caching. When using the AWS plugin to query the aws_s3_bucket table, for example, there were three kinds of concurrency in play.

  1. Across sub-API calls. If your query asked for columns requiring separate S3 sub-APIs, such as GetBucketVersioning, GetBucketTagging, and GetBucketReplication, those sub-API calls happened concurrently.

  2. Across regions. When your account was configured for multiple regions, Steampipe queried all of them concurrently.

  3. Across accounts. When you used a connection aggregator, Steampipe parallelized across all aggregated accounts.

In v0.9 the foreign data wrapper (FDW) was also responsible for managing plugins and the query results cache. In v0.10 plugin management moves from the FDW to a separate plugin manager, which enables parallel database connections to use the same set of Steampipe connections.

Sharing plugin processes across database connections

To unpack that terminology, any client that connects to Postgres -- for example, psql or steampipe query -- uses a database connection. A Steampipe connection, though, exists between the FDW and a data source specified by the connection argument in a plugin's .spc file. If you aggregate two accounts AWS_A and AWS_B as AWS_ALL, you've specified two Steampipe connections. If you then run steampipe query in one terminal and steampipe check in another that adds up to four Steampipe connections, each an independing API client and gRPC server running in its own OS process. That's the situation shown on the left side of this diagram.

In v0.10, the FDW is no longer reponsible for managing plugins. When Steampipe starts the database service, it also starts the plugin manager. When a Steampipe client aggregates two AWS accounts, there will still be two plugin processes serving those two Steampipe connections. But that's all. A second Steampipe client will share the services of those plugin processes.

Sharing cache across database connections

In v0.9 the FDW held the cache of query results delivered from the plugins it managed. That meant a query running in one terminal and a benchmark running in another could not share cache hits. It also limited the performance to be gained by spawning database connections during control runs.

In v0.10, the query results cache moves from the FDW to the plugins. That means we can speed up control runs by spawning database connections. A steampipe check command runs faster not only because the work happens in parallel, but also because those database connections share a common plugin-managed cache. If any client causes a datum to be cached, it's available to other clients using other database connections. Eventually API throttling becomes the gating factor, so by default we stop at 5 database connections, but the new mechanism delivers a major speedup.

Caching row subsets

Shared cache across database connections was already a big improvement, but v0.10 does even more. In v0.9 a query that adds or alters "quals" (i.e. the WHERE clause) would refetch data and consume additional space in the cache. In v0.10 the results fetched from the broader query are cached as before, but now are available to the narrower query. This saves cache space and is much faster.

These two sequences show the difference. In v0.9:

select * from github_repository where full_name = 'turbot/steampipe-plugin-aws'
Time: 634.072742ms
select * from github_repository where full_name = 'turbot/steampipe-plugin-aws' and not private
Time: 808.203013ms

There's no cache benefit for the second query.

The same sequence in v0.10:

select * from github_repository where full_name = 'turbot/steampipe-plugin-aws'
Time: 692.34312ms
select * from github_repository where full_name = 'turbot/steampipe-plugin-aws' and not private
Time: 9.45468ms

The second query uses the cache.

New plugins

Since our last release, we've added 5 new plugins:

  • Airtable - query tables defined in an Airtable database
  • Datadog - query dashboards, logs, metrics users, and more
  • Google Sheets - query tables defined in Google Sheets
  • Twilio - query accounts, calls, messages, and more
  • VMWare vSphere - query datastores, hosts, networks, vms

We are always improving the suite of plugins. Highlights during this cycle: 11 new tables added to AWS, 22 new tables added to Azure Active Directory, and improved handling of context cancellation for many plugins.

New and updated mods

Let’s get building!

Steampipe delivers tools to build, execute and share cloud configuration, compliance, and security frameworks using SQL, HCL and a little elbow grease. To support those tools, it maps a growing suite of APIs to tables that you can query, and join across, in Postgres.

Do you want to help us expand the open source documentation and control coverage for CIS, PCI, HIPAA, and NIST? Add tables to existing plugins? Create plugins to bring new APIs into the mix? The best way to get started is to join our Slack workspace and raise your hand; we would love to talk to you!

For even more good stuff in v0.10.0, check out the full release notes on GitHub.