Announcement

Zero-ETL for Postgres: Live-query cloud APIs with 100+ new FDWs

A new family of Postgres extensions enable SQL queries that translate APIs to foreign tables.

Steampipe Team
5 min. read - Dec 12, 2023
A new family of Postgres extensions enable SQL queries that translate APIs to foreign tables.

Your Postgres database just gained a new superpower: the ability to fill tables with data from cloud APIs. Actually there are more than 100 of these superpowers, that's how many new Postgres foreign data wrapper extensions (FDWs) just appeared in the world. We'll use the GitHub extension to illustrate but they all work the same way. You write SQL like this.

select state, closed_at, title
from github.github_issue
where repository_full_name = 'turbot/steampipe-plugin-github'
order by state desc, closed_at desc;

And you get results like this.

+--------+---------------------------+-----------------------------------------------------------------------------------------------------------+
| state | closed_at | title |
+--------+---------------------------+-----------------------------------------------------------------------------------------------------------+
| OPEN | <null> | A way to query for assignee on Pull Requests |
| OPEN | <null> | Add `created_at` list key column and potentially others in `github_actions_repository_workflow_run` table |
| CLOSED | 2023-11-06T23:38:53-08:00 | Queries fail on GHE due to missing fields |
| CLOSED | 2023-10-24T08:14:09-07:00 | 502 Bad Gateway with repos with large amount of PRs

In this case we're asking the GitHub extension to tell us about its own issues. The github_issue table is just one of 55 tables provided by the GitHub extension, and that extension is just one of 100+. Some represent APIs as a handful of tables, others as hundreds of tables; the AWS extension leads the pack with over 450. There are more than 2000 tables in all, each clearly documented with copy/paste/run examples.

When you query one of these tables, the extension fills with live data from underlying APIs. It retains query results in its cache for five minutes, after which a repeat of the same query will again fetch from APIs. Why this ephemeral approach? It's our take on zero-ETL: pull data directly from sources, and use SQL as the uniform way to ask questions about their current state. APIs are fast enough to answer a wide range of such questions without needing to warehouse the data.

Ready to try? Let's get started.

Install and configure the GitHub extension

Visit Steampipe downloads to find the installer for your OS, and run it to put the extension's three files into your Postgres installation.

Now connect to your server as usual, using psql or another client, most typically as the postgres user. Then run these commands which are typical for any Postgres foreign data wrapper.

As with all Postgres extensions, you start like this:

CREATE EXTENSION steampipe_postgres_fdw_github;

To use a foreign data wrapper, you first create a server:

CREATE SERVER steampipe_github FOREIGN DATA WRAPPER steampipe_postgres_github OPTIONS (config 'token="ghp_..."');

Use OPTIONS to configure the extension to use your GitHub access token. (Alternatively, the standard environment variables used to configure a Steampipe plugin – GITHUB_TOKEN in this case – will work if you set them before starting your instance of Postgres.)

The tables provided by the extension will live in a schema, so define one:

CREATE SCHEMA github;

Now import the schema defined by the foreign server into the local schema you just created:

IMPORT FOREIGN SCHEMA github FROM SERVER steampipe_github INTO github;

Now run a query!

The foreign tables provided by the extension live in the github schema, so by default you'll refer to tables like github.github_my_repository. If you set search_path = 'github', though, the schema becomes optional and you can write queries like this.

select
count(*)
from
github_my_repository
count
-------
447

If you have a lot of repos, the first run of that query will take a few seconds. The second run will return results instantly, though, because the extension includes a powerful and sophisticated cache.

And that's all there is to it! Every Steampipe foreign data wrapper works exactly like this one.

What does Zero-ETL mean?

Zero to query in seconds

It's quick and easy to install an extension, configure credentials, and immediately query the predefined foreign tables it provides.

Install the FDW for a Steampipe plugin, configure credentials, and you're ready to go. Immediately query the provided tables, join across them, or join with you own tables and/or those provided by other Steampipe FDWs.

Zero infrastructure

No data warehouse is required, and persistence is optional. Just write SQL queries to ask questions of cloud APIs and receive instant answers from live API calls.

Our zero-ETL approach stores nothing permanently by default; your foreign tables represent the current state of the systems whose APIs they pull from. But you do have the option to retain results in a Postgres materialized view. To do that, prepend create materialized view TABLE_NAME as to any query of a foreign table. You can then use refresh materialized view TABLE_NAME to fill the view with current data from underlying APIs.

Zero roadblocks

These open-source extensions are available for more than 100 services. Each provides prebuilt tables that reflect lessons learned by users and contributors as the suite has evolved and matured.

There are thousands of tables that embody consistent access patterns, and each table is well-documented with examples. Missing a table? Add one! Missing a plugin? Build one!

See it in action

Tap into the Steampipe ecosystem

The Steampipe hub currently provides more that 100 extensions that all work like the ones you've seen here. There are extensions to query infrastructure (AWS, Azure, GCP, Kubernetes), SaaS (ClickUp, Jira, Google Workspace, Microsoft 365, Salesforce, Slack), security tools (1Password, CrowdStrike, PAN-OS, Trivy), and IaC (Ansible, CloudFormation, Terraform), directories (Google Directory, LDAP, Okta), and network tools (crt.sh, ipstack, Tailscale). The ecosystem keeps growing as we and our contributors create new Steampipe plugins that will now also be available as standalone Postgres extensions.

Are you ready to get started? Head over to Steampipe downloads, try a few of these extensions, and let us know how it goes!