Announcement

Zero-ETL for SQLite: Live-query cloud APIs with 100+ new extensions

A new family of SQLite extensions enable SQL queries that translate APIs to foreign tables.

Steampipe Team
5 min. read - Dec 12, 2023
A new family of SQLite extensions enable SQL queries that translate APIs to foreign tables.

Your SQLite database just gained a new superpower: the ability to fill tables with data from cloud APIs. Actually there are more than 100 of these superpowers, that's how many new SQLite extensions just appeared in the world. We'll use the GitHub extension to illustrate but they all work the same way. You write SQL like this.

select
state, closed_at, title
from
github.github_issue
where
repository_full_name = 'turbot/steampipe-plugin-github'
order by state desc, closed_at desc;

And you get results like this.

+--------+---------------------------+-----------------------------------------------------------------------------------------------------------+
| state | closed_at | title |
+--------+---------------------------+-----------------------------------------------------------------------------------------------------------+
| OPEN | <null> | A way to query for assignee on Pull Requests |
| OPEN | <null> | Add `created_at` list key column and potentially others in `github_actions_repository_workflow_run` table |
| CLOSED | 2023-11-06T23:38:53-08:00 | Queries fail on GHE due to missing fields |
| CLOSED | 2023-10-24T08:14:09-07:00 | 502 Bad Gateway with repos with large amount of PRs

In this case we're asking the GitHub extension to tell us about its own issues. The github_issue table is just one of 55 tables provided by the GitHub extension, and that extension is just one of 100+. Some represent APIs as a handful of tables, others as hundreds of tables; the AWS extension leads the pack with over 450. There are more than 2000 tables in all, each clearly documented with copy/paste/run examples.

When you query one of these tables, the extension fills with live data from underlying APIs. It retains query results in its cache for five minutes, after which a repeat of the same query will again fetch from APIs. Why this ephemeral approach? It's our take on zero-ETL: pull data directly from sources, and use SQL as the uniform way to ask questions about the current state of those sources. APIs are fast enough to answer a wide range of such questions without needing to warehouse the data.

These queries can join with your own SQLite tables, and/or with queries of virtual tables provided by other extensions. In Using Shodan to test AWS Public IPs we showed how to query for AWS EC2 instances with public IP addresses deemed vulnerable by Shodan.

select
instance_id,
ports,
vulns,
security_groups
from
aws_ec2_instance
left join
shodan_host on public_ip_address = ip
where
public_ip_address is not null;

Ready to try? Let's get started.

Install and configure the GitHub extension

To install the extension, download the version for your OS and CPU architecture. Now it's the usual drill you'll be familiar with if you've ever loaded and used a SQLite extension. In the SQLite shell, with a version of SQLite that enables extension loading, do this.

.load /home/jon/anywhere-sqlite/steampipe_sqlite_extension_github.so

To configure the extension with your GitHub token, do this.

select github_configure('token='"ghp_…""');

Steampipe plugins often support multiple authentication methods that you configure in .spc files, or by way of environment variables, or by defaulting to the credentials you use with a CLI tool. With a standalone SQLite extension there's no .spc file. The standard env vars used to configure a Steampipe plugin will work if you set them before starting your instance of SQLite. So in this case if you've set GITHUB_TOKEN you can skip the configure_github() call.

What does Zero-ETL mean?

Zero to query in seconds

It's quick and easy to install an extension, configure credentials, and immediately query the predefined foreign tables it provides.

Install the SQLite extension for a Steampipe plugin, configure credentials, and you're ready to go. Immediately query the provided tables, join across them, or join with you own tables and/or those provided by other Steampipe SQLite extensions.

Zero infrastructure

No data warehouse is required, and persistence is optional. Just write SQL queries to ask questions of cloud APIs and receive instant answers from live API calls.

Our zero-ETL approach stores nothing permanently by default; your foreign tables represent the current state of the systems whose APIs they pull from. But you do have the option to retain results. To do that, prepend create materialized view TABLE_NAME as to any query of a foreign table.

Zero roadblocks

These open-source extensions are available for more than 100 services. Each provides prebuilt tables that reflect lessons learned by users and contributors as the suite has evolved and matured.

There are thousands of tables that embody consistent access patterns, and each table is well-documented with examples. Missing a table? Add one! Missing a plugin? Build one!

See it in action

Tap into the Steampipe ecosystem

The Steampipe hub currently provides more than 100 extensions that all work like the ones you've seen here. There are extensions to query infrastructure (AWS, Azure, GCP, Kubernetes), SaaS (ClickUp, Jira, Google Workspace, Microsoft 365, Salesforce, Slack), security tools (1Password, CrowdStrike, PAN-OS, Trivy), and IaC (Ansible, CloudFormation, Terraform), directories (Google Directory, LDAP, Okta), and network tools (crt.sh, ipstack, Tailscale). The ecosystem keeps growing as we and our contributors create new Steampipe plugins that will now also be available as standalone SQLite extensions.

Are you ready to get started? Head over to Steampipe downloads, try a few of these extensions, and let us know how it goes!