How To

Why build an HTTP client into a database? So you can ingest web data directly!

When there isn't a Steampipe plugin to meet your need, the Net plugin's net_http_request table can save the day.

Jon Udell
5 min. read - Dec 08, 2022
When there isn't a Steampipe plugin to meet your need, the Net plugin's net_http_request table can save the day.

When a plugin provides a Steampipe table, it typically populates the table by making an API call, then mapping the JSON response to a Go struct which the Steampipe foreign data wrapper then maps to a Postgres table. But not all tables work this way. The CSV plugin builds Postgres tables from one or more CSV files. The Code plugin provides one table, code_secret, that you join with a column in any other table to scan that column for secrets. The Trivy plugin embeds the Trivy scanner and runs it to populate tables.

Steampipe tables can do all sorts of surprising things! Did you know, for example, that the Net plugin's net_http_request encapsulates a full-blown HTTP client?

select
response_status_code,
response_headers
from
net_http_request
where
url = 'https://steampipe.io'
+----------------------+-----------------------------------------------------------------------
| response_status_code | response_headers >
+----------------------+-----------------------------------------------------------------------
| 200 | {"Access-Control-Allow-Origin":["*"],"Age":["102041"],"Cache-Control":
+----------------------+-----------------------------------------------------------------------

Here are three ways we've been using net_http_request.

1. Summarize the News and Reviews page

Our News and Reviews page lists articles about Steampipe published elsewhere. We wanted to chart them by month. The source of the data is the News and Reviews web page. Here's the query that produces data for the chart.

-- generate a series of months
with months as (
select
to_char(
generate_series (
date('2021-01-01'),
now()::date,
'1 month'::interval
),
'YYYY-MM'
) as month
),
-- read in the article and match publication dates
article_months as (
select
(regexp_matches(response_body, '>(\d{4,4}\-\d{2,2})-\d{2,2}<', 'g'))[1] as article_month,
count(*)
from
net.net_http_request
where
url = 'https://steampipe.io/blog/news-and-reviews'
group by
article_month
order by
article_month
)
-- join with series of months, report month and count
select
month,
coalesce(count, 0) as count
from
months
left join
article_months
on
month = article_month

And here's that data displayed as a chart on a Steampipe dashboard.

We find the dates by matching the HTML returned by net_http_request against a regular expression. The pattern we're looking for is easy to find that way.

Links are another pattern that's easy to match, so we made a link checker that only relies on net_http_request and Postgres' regexp_match function. It embeds this query in a dashboard that enables you to choose a starting URL, check its links, and then click through to check any of those links.

The result is a serviceable interactive link checker made with 100 lines of HCL and SQL!

3. List ClickUp tasks

We just started using ClickUp to manage this blog, so of course we want to query our tasks and also correlate them with preview links available by way of the Vercel plugin.

Good news: Graza Andersson has built a ClickUp plugin.

Bad news: A bug in the upstream Go SDK stopped us from using the plugin to list all our tasks.

Good news: net_http_request to the rescue!

We expect the SDK bug to be resolved soon, but meanwhile here's a query that gets the job done.

-- call the api
with response_body as (
select
response_body::jsonb
from
net_http_request
where
url = 'https://api.clickup.com/api/v2/team/367...759/task'
and request_headers = jsonb_object( array ['authorization', 'pk_44...VR4'] ) -- redacted secret
),
-- extract tasks
tasks as (
select
jsonb_array_elements( response_body -> 'tasks' ) as task
from
response_body
)
-- report tasks
select
task ->> 'name' as name,
task -> 'status' ->> 'status' as status,
task -> 'creator' ->> 'username' as creator,
task -> 'assignees' -> 0 ->> 'username' as assignee,
to_char( to_timestamp( (task ->> 'date_created')::numeric / 1000 ), 'YYYY-MM-DD' ) as date_created,
to_char( to_timestamp( (task ->> 'date_closed')::numeric / 1000 ), 'YYYY-MM-DD') as date_closed,
task ->> 'url' as url
from
tasks
order by
task ->> 'date_updated' desc

In this case it's a temporary solution, we'll soon have the much richer capabilities of the plugin. But what if there's no plugin for an API you'd like to use with Steampipe? The combined power of net_http_request and Postgres' JSON features might take you farther than you'd guess.

Two-way traffic

The examples we've seen here default to HTTP GET, but you can also specify method = 'POST' in a net_http_request query. And along with request headers, you can send a request body. We haven't yet needed to write a query that posts to an endpoint, but the possibilities are intriguing. Suppose, for example, your query returns data that you want to send as a notification to Slack. The conventional way would be to embed Steampipe as a component in a program that runs the query, retrieves the result, and posts it to Slack. With a full-blown HTTP client built into Steampipe you can consider a more unified solution: Steampipe as both query engine and HTTP client.

If you try using Steampipe this way, let us know how it goes!