# Scaling an open source community

> How we use Steampipe and Turbot Pipes to manage a growing collection of open-source repos.

By Turbot Team
Published: 2023-10-05


The Steampipe project has spawned a growing collection of open-source repositories. Currently our [hub](https://hub.steampipe.io) lists 136 [plugins](https://hub.steampipe.io/plugins) and 42 [mods](https://hub.steampipe.io/mods). We also manage contributions to the [Steampipe core](https://github.com/turbot/steampipe) and the [plugin SDK](https://github.com/turbot/steampipe-plugin-sdk). At this scale and pace it's critical to standardize the configuration of our repos, and GitHub's native tooling doesn't make that easy. Equally critical: we need to track PRs and stale issues across all the repos as the number of contributions grows over time.

![monthly contributions](/images/blog/2023-10-repo-management/monthly-contributions.png)

Happily the project has also spawned the tools we need to manage all these repos. The [GitHub plugin](https://hub.steampipe.io/plugins/turbot/steampipe-plugin-github) enables queries that drive controls, benchmarks, and reports. Pipes enables us to run these on a schedule, cut snapshots to a shared team workspace, and notify the team when new snapshots appear.

The custom mods that package these queries are [available](link) for you to [reuse and remix](https://steampipe.io/blog/remixing-dashboards), or you can just take these ideas and run with them. Let's see how it all works.

## Checking repo settings

For plugins, our checks include these conditions:

- The plugin has a standard description

- It uses the correct mandatory topics

- It uses semantic versioning

Here, for example, is a control that checks for a standard description.

```
control "plugin_repo_description" {
  title = "Plugin repo has standard description"
  sql = <<-EOT
    (
    select
      url as resource,
      case
        when description like 'Use SQL to instantly query %. Open source CLI. No DB required.' then 'ok'
        else 'alarm'
      end as status,
      name_with_owner || ': ' || description as reason,
      name_with_owner
    from
      github_my_repository
    where
      name_with_owner like 'turbot/steampipe-plugin-%'
    )
    union
    (
    select
      url as resource,
      case
        when description like 'Use SQL to instantly query %. Open source CLI. No DB required.' then 'ok'
        else 'alarm'
      end as status,
      name_with_owner || ': ' || description as reason,
      name_with_owner
    from
      github_repository
    where
      full_name in (select jsonb_array_elements_text(to_jsonb($1::text[])))
    )
  EOT
  param "github_external_repo_names" {
    description = "External repo names."
    default     = var.github_external_repo_names
  }
}
```

As with all controls, the query that drives this one reports the three [required columns](https://steampipe.io/docs/reference/mod-resources/control#required-control-columns) `resource`, `status`, and `reason`. It's a UNION of two subqueries because the repos published to [the hub](https://hub.steampipe.io/plugins) are a mix of some in the `turbot` GitHub organization and others maintained elsewhere by project contributors. So the first subquery uses [github_my_repository](https://hub.steampipe.io/plugins/turbot/github/tables/github_my_repository) to check plugin repos belonging to `turbot` and the second uses [github_repository](https://hub.steampipe.io/plugins/turbot/github/tables/github_repository) to check repos belonging to contributors. The contributed repos are defined by a [mod variable](https://steampipe.io/docs/mods/mod-variables) that's passed as a parameter to the query and instantiated as `$1` in the query.

```hcl
variable "github_external_repo_names" {
  type        = list(string)
  description = "A list of community repositories to run checks for."
  default     = [ "francois2metz/steampipe-plugin-ovh", "theapsgroup/steampipe-plugin-vault", "francois2metz/steampipe-plugin-scalingo", "francois2metz/steampipe-plugin-gandi", "francois2metz/steampipe-plugin-airtable", "theapsgroup/steampipe-plugin-gitlab", "ellisvalentiner/steampipe-plugin-confluence", "theapsgroup/steampipe-plugin-keycloak" ]
}
```

Partial output from the query looks like this.

```
https://github.com/turbot/steampipe-plugin-slack      | ok     | turbot/steampipe-plugin-slack: Use SQL to instantly query users, channels, emoji and more from your Slack workspace. Open source CLI. No DB required.
https://github.com/theapsgroup/steampipe-plugin-vault | ok     | theapsgroup/steampipe-plugin-vault: Use SQL to instantly query Hashicorp Vault secrets, certs and more. Open source CLI. No DB required.             >
```

Note that standard descriptions aren't all identical! The control enforces a common pattern but each plugin describes its query targets in its own way.

You can run the control individually in your console like so:

```
steampipe check control.plugin_repo_description
```

More typically, you'll want to package sets of control as benchmarks.

```
benchmark "plugin" {
  title = "Plugin Repository Checks"
  children = [
    control.plugin_repo_description,
    control.plugin_repo_has_mandatory_topics,
    control.plugin_uses_semantic_versioning,
  ]
}
```

There's another benchmark for mods, which checks for monotonic versioning and a different set of mandatory tags. And there's a common benchmark as well.

```
benchmark "common" {
  title = "Common Repository Checks"
  children = [
    control.license_is_apache,
    control.repo_homepage_links_to_hub,
    control.repo_wiki_disabled,
    control.repo_projects_disabled,
    control.repo_is_public,
  ]
}
```

You can run `steampipe dashboard` and visit `localhost:9194` to view benchmarks and their controls in your browser. But of course you'll want to share such results with your team. So we run the benchmarks and controls in a Pipes pipeline, capture results as snapshots, and notify a Slack channel when each snapshot completes.

## Tracking open PRs and stale issues

Here's our dashboard report for open pull requests raised against plugins and mods.

![plugin and mod prs](/images/blog/2023-10-repo-management/open-prs-from-externals.png)

There are more PRs than these, but this report focuses our attention on those from contributors outside the `turbot` organization. Here's the query that finds that set of PRs. It searches for open PRs in `org:turbot` excluding those from members of the organization. The GitHub plugin makes this easy to do with nothing more than basic SQL.

```sql
select
  repository_full_name as "Repository",
  title as "Pull Request",
  now()::date - created_at::date as "Age in Days",
  now()::date - updated_at::date as "Last Updated (Days)",
  author ->> 'login' as "Author",
  url
from
  github_search_pull_request
where
  query = 'org:turbot is:open'
  and repository_full_name ~ 'turbot/steampipe-(plugin|mod)'
  and repository_full_name <> 'turbot/steampipe-plugin-sdk'
  and author ->> 'login' not in (
    select
      m.login as member_login
    from
      github_organization_member m
    where
      m.organization = 'turbot'
    )
order by
  "Age in Days" desc;
```

Naturally we want to see trends as well. For that we take advantage of the fact that these dashboards, running in a Pipes pipeline, produce daily snapshots. The data contained in those snapshots is accessible by way of the [Turbot Pipes](https://hub.steampipe.io/plugins/turbot/pipes) plugin. We use it to query the data in those snapshots and build this dashboard.

![trends](/images/blog/2023-10-repo-management/open-pr-issue-trends.png)

Here's the query that drives the `Open Plugin and Mod Pull Requests` chart.

```sql
select
  created_at as "Date",
  sum((r ->> 'Age in Days')::numeric) as "Total Days"
from
  pipes_workspace_snapshot,
  jsonb_array_elements(data -> 'panels' -> 'github_tracker.table.container_dashboard_github_open_plugin_mod_pull_request_report_anonymous_container_0_anonymous_table_0' -> 'data' -> 'rows') as r
where
  dashboard_name = 'github_tracker.dashboard.github_open_plugin_mod_pull_request_report'
group by
  created_at
order by
  created_at
```

The query finds all the snapshots for a dashboard, reaches into each snapshot's JSON, grabs the rows for a panel, then reports dates and the sum of `Age in Days` for each date. 

The dashboard that contains these charts, as well as reports like the one shown above, runs on a schedule in a Pipes pipeline.

![pipeline](/images/blog/2023-10-repo-management/issue-pr-pipeline.png)

These dashboards, packaged as mods that we [install](https://turbot.com/pipes/docs/mods#installing-custom-mods) into a team workspace, have helped us focus and prioritize. And the charts show us that were making good progress!

## Mix, stir, and serve hot

The ingredients of this solution include: the GitHub plugin (now [even more awesome](https://steampipe.io/blog/github-graphql)); the Pipes plugin; the plugin management, SQL, control execution, and dashboard services provided by Steampipe; and the snapshots, pipelines, notifications provided in a team workspace by Pipes. It's surprisingly easy to stir all these ingredients together, bind them with HCL and SQL, and deliver the views you need as steward of a growing community of open-source contributors. They're friendly folks, and you can meet them in our [Slack community](/community/join).






