Announcement

Scaling an open source community

How we use Steampipe and Turbot Pipes to manage a growing collection of open-source repos.

Turbot Team
5 min. read - Oct 05, 2023
How we use Steampipe and Turbot Pipes to manage a growing collection of open-source repos.

The Steampipe project has spawned a growing collection of open-source repositories. Currently our hub lists 136 plugins and 42 mods. We also manage contributions to the Steampipe core and the plugin SDK. At this scale and pace it's critical to standardize the configuration of our repos, and GitHub's native tooling doesn't make that easy. Equally critical: we need to track PRs and stale issues across all the repos as the number of contributions grows over time.

Happily the project has also spawned the tools we need to manage all these repos. The GitHub plugin enables queries that drive controls, benchmarks, and reports. Pipes enables us to run these on a schedule, cut snapshots to a shared team workspace, and notify the team when new snapshots appear.

The custom mods that package these queries are available for you to reuse and remix, or you can just take these ideas and run with them. Let's see how it all works.

Checking repo settings

For plugins, our checks include these conditions:

  • The plugin has a standard description

  • It uses the correct mandatory topics

  • It uses semantic versioning

Here, for example, is a control that checks for a standard description.

control "plugin_repo_description" {
title = "Plugin repo has standard description"
sql = <<-EOT
(
select
url as resource,
case
when description like 'Use SQL to instantly query %. Open source CLI. No DB required.' then 'ok'
else 'alarm'
end as status,
name_with_owner || ': ' || description as reason,
name_with_owner
from
github_my_repository
where
name_with_owner like 'turbot/steampipe-plugin-%'
)
union
(
select
url as resource,
case
when description like 'Use SQL to instantly query %. Open source CLI. No DB required.' then 'ok'
else 'alarm'
end as status,
name_with_owner || ': ' || description as reason,
name_with_owner
from
github_repository
where
full_name in (select jsonb_array_elements_text(to_jsonb($1::text[])))
)
EOT
param "github_external_repo_names" {
description = "External repo names."
default = var.github_external_repo_names
}
}

As with all controls, the query that drives this one reports the three required columns resource, status, and reason. It's a UNION of two subqueries because the repos published to the hub are a mix of some in the turbot GitHub organization and others maintained elsewhere by project contributors. So the first subquery uses github_my_repository to check plugin repos belonging to turbot and the second uses github_repository to check repos belonging to contributors. The contributed repos are defined by a mod variable that's passed as a parameter to the query and instantiated as $1 in the query.

variable "github_external_repo_names" {
type = list(string)
description = "A list of community repositories to run checks for."
default = [ "francois2metz/steampipe-plugin-ovh", "theapsgroup/steampipe-plugin-vault", "francois2metz/steampipe-plugin-scalingo", "francois2metz/steampipe-plugin-gandi", "francois2metz/steampipe-plugin-airtable", "theapsgroup/steampipe-plugin-gitlab", "ellisvalentiner/steampipe-plugin-confluence", "theapsgroup/steampipe-plugin-keycloak" ]
}

Partial output from the query looks like this.

https://github.com/turbot/steampipe-plugin-slack | ok | turbot/steampipe-plugin-slack: Use SQL to instantly query users, channels, emoji and more from your Slack workspace. Open source CLI. No DB required.
https://github.com/theapsgroup/steampipe-plugin-vault | ok | theapsgroup/steampipe-plugin-vault: Use SQL to instantly query Hashicorp Vault secrets, certs and more. Open source CLI. No DB required. >

Note that standard descriptions aren't all identical! The control enforces a common pattern but each plugin describes its query targets in its own way.

You can run the control individually in your console like so:

steampipe check control.plugin_repo_description

More typically, you'll want to package sets of control as benchmarks.

benchmark "plugin" {
title = "Plugin Repository Checks"
children = [
control.plugin_repo_description,
control.plugin_repo_has_mandatory_topics,
control.plugin_uses_semantic_versioning,
]
}

There's another benchmark for mods, which checks for monotonic versioning and a different set of mandatory tags. And there's a common benchmark as well.

benchmark "common" {
title = "Common Repository Checks"
children = [
control.license_is_apache,
control.repo_homepage_links_to_hub,
control.repo_wiki_disabled,
control.repo_projects_disabled,
control.repo_is_public,
]
}

You can run steampipe dashboard and visit localhost:9194 to view benchmarks and their controls in your browser. But of course you'll want to share such results with your team. So we run the benchmarks and controls in a Pipes pipeline, capture results as snapshots, and notify a Slack channel when each snapshot completes.

Tracking open PRs and stale issues

Here's our dashboard report for open pull requests raised against plugins and mods.

There are more PRs than these, but this report focuses our attention on those from contributors outside the turbot organization. Here's the query that finds that set of PRs. It searches for open PRs in org:turbot excluding those from members of the organization. The GitHub plugin makes this easy to do with nothing more than basic SQL.

select
repository_full_name as "Repository",
title as "Pull Request",
now()::date - created_at::date as "Age in Days",
now()::date - updated_at::date as "Last Updated (Days)",
author ->> 'login' as "Author",
url
from
github_search_pull_request
where
query = 'org:turbot is:open'
and repository_full_name ~ 'turbot/steampipe-(plugin|mod)'
and repository_full_name <> 'turbot/steampipe-plugin-sdk'
and author ->> 'login' not in (
select
m.login as member_login
from
github_organization_member m
where
m.organization = 'turbot'
)
order by
"Age in Days" desc;

Naturally we want to see trends as well. For that we take advantage of the fact that these dashboards, running in a Pipes pipeline, produce daily snapshots. The data contained in those snapshots is accessible by way of the Turbot Pipes plugin. We use it to query the data in those snapshots and build this dashboard.

Here's the query that drives the Open Plugin and Mod Pull Requests chart.

select
created_at as "Date",
sum((r ->> 'Age in Days')::numeric) as "Total Days"
from
pipes_workspace_snapshot,
jsonb_array_elements(data -> 'panels' -> 'github_tracker.table.container_dashboard_github_open_plugin_mod_pull_request_report_anonymous_container_0_anonymous_table_0' -> 'data' -> 'rows') as r
where
dashboard_name = 'github_tracker.dashboard.github_open_plugin_mod_pull_request_report'
group by
created_at
order by
created_at

The query finds all the snapshots for a dashboard, reaches into each snapshot's JSON, grabs the rows for a panel, then reports dates and the sum of Age in Days for each date.

The dashboard that contains these charts, as well as reports like the one shown above, runs on a schedule in a Pipes pipeline.

These dashboards, packaged as mods that we install into a team workspace, have helped us focus and prioritize. And the charts show us that were making good progress!

Mix, stir, and serve hot

The ingredients of this solution include: the GitHub plugin (now even more awesome); the Pipes plugin; the plugin management, SQL, control execution, and dashboard services provided by Steampipe; and the snapshots, pipelines, notifications provided in a team workspace by Pipes. It's surprisingly easy to stir all these ingredients together, bind them with HCL and SQL, and deliver the views you need as steward of a growing community of open-source contributors. They're friendly folks, and you can meet them in our Slack community.