As Steampipe's new community lead I've been looking for ways to keep track of the blistering pace of activity in this young project. There are a lot of GitHub repos, let's focus particularly on plugins and mods. A GitHub search finds 75 results, including everything I'm looking for, but doesn't break them down by type.
I can use the github_my_repository table in the GitHub plugin to do that breakdown.
select count(*), 'plugins' as categoryfrom github_my_repository where full_name ~ 'turbot/steampipe-plugin'union select count(*), 'mods' as categoryfrom github_my_repositorywhere full_name ~ 'turbot/steampipe-mod';
count | category -------+--------- 24 | mods 51 | plugins
Good start!
How many open and closed issues?
Now let's look at open and closed issues across these categories.
with repo_names as ( select full_name from github_my_repository where full_name ~ 'turbot/steampipe-(plugin|mod)'),issues as ( select * from repo_names r join github_issue gi on r.full_name = gi.repository_full_name),open_issues as ( select i.full_name, count(*) as open_issues from issues i where i.closed_at is null group by i.full_name),closed_issues as ( select i.full_name, count(*) as closed_issues from issues i where i.closed_at is not null group by i.full_name)select case when o.full_name is not null then o.full_name else c.full_name end as full_name, o.open_issues, c.closed_issuesfrom open_issues o full join closed_issues c on o.full_name = c.full_nameorder by full_name;
full_name | open_issues | closed_issues -------------------------------------------+-------------+--------------- turbot/steampipe-mod-alicloud-compliance | 1 | 14 turbot/steampipe-mod-alicloud-thrifty | 2 | 3 turbot/steampipe-mod-aws-compliance | 5 | 129 turbot/steampipe-mod-aws-tags | | 1 turbot/steampipe-mod-aws-thrifty | 10 | 16 turbot/steampipe-mod-aws-top10 | 1 | turbot/steampipe-mod-azure-compliance | 3 | 34 turbot/steampipe-mod-azure-tags | | 1 turbot/steampipe-mod-azure-thrifty | 3 | 4 turbot/steampipe-mod-digitalocean-thrifty | 3 | 3 turbot/steampipe-mod-gcp-compliance | 1 | 23 turbot/steampipe-mod-gcp-labels | | 2 turbot/steampipe-mod-gcp-thrifty | | 5 turbot/steampipe-mod-github-sherlock | | 6 turbot/steampipe-mod-oci-compliance | 3 | 13 turbot/steampipe-mod-oci-tags | 1 | turbot/steampipe-mod-oci-thrifty | 3 | 5 turbot/steampipe-mod-zoom-compliance | | 1 turbot/steampipe-plugin-alicloud | 21 | 112 turbot/steampipe-plugin-aws | 12 | 349 turbot/steampipe-plugin-azure | 34 | 170 turbot/steampipe-plugin-azuread | 7 | 12 turbot/steampipe-plugin-bitbucket | 9 | 20
Reporting stale issues
We can use a control in the github-steampipe-mod-sherlock mod to run a quick check that flags issues older than 30 days.
steampipe check control.issue_older_30_days
Or we can use SQL to drill down for more insight. For example, what's the average age of open issues by repo?
with repo_names as ( select full_name from github_my_repository where full_name ~ 'turbot/steampipe-(plugin|mod)')select gi.repository_full_name, count(*), avg(extract(day from current_timestamp - gi.created_at))::int as avg_issue_days_openfrom github_issue gijoin repo_names ron gi.repository_full_name = r.full_namewhere gi.closed_at is nullgroup by gi.repository_full_nameorder by avg_issue_days_open desc;
repository_full_name | count | avg_issue_days_open -------------------------------------------+-------+--------------------- turbot/steampipe-plugin-cloudflare | 2 | 200 turbot/steampipe-plugin-equinix | 1 | 196 turbot/steampipe-plugin-chaos | 1 | 181 ... snip ... turbot/steampipe-plugin-turbot | 2 | 14 turbot/steampipe-mod-oci-tags | 1 | 7 turbot/steampipe-plugin-heroku | 1 | 5
Visualizing release cadence
How often are Steampipe plugins and mods updated? The tables we can use to answer this question include github_release, github_tag, and github_commit. We don't yet cut releases for most plugins and mods, but we can look at tags.
with repo_names as ( select full_name from github_my_repository where full_name ~ 'turbot/steampipe-(plugin|mod)'),issues as ( select * from repo_names r join github_issue gi on r.full_name = gi.repository_full_name),open_issues as ( select i.full_name, count(*) as open_issues from issues i where i.closed_at is null group by i.full_name),closed_issues as ( select i.full_name, count(*) as closed_issues from issues i where i.closed_at is not null group by i.full_name)select case when o.full_name is not null then o.full_name else c.full_name end as full_name, o.open_issues, c.closed_issuesfrom open_issues o full join closed_issues c on o.full_name = c.full_nameorder by full_name;
full_name | open_issues | closed_issues -------------------------------------------+-------------+--------------- turbot/steampipe-mod-alicloud-compliance | 1 | 14 turbot/steampipe-mod-alicloud-thrifty | 2 | 3 turbot/steampipe-mod-aws-compliance | 5 | 129 turbot/steampipe-mod-aws-tags | | 1 turbot/steampipe-mod-aws-thrifty | 10 | 16 turbot/steampipe-mod-aws-top10 | 1 | turbot/steampipe-mod-azure-compliance | 3 | 34 turbot/steampipe-mod-azure-tags | | 1 turbot/steampipe-mod-azure-thrifty | 3 | 4 turbot/steampipe-mod-digitalocean-thrifty | 3 | 3 turbot/steampipe-mod-gcp-compliance | 1 | 23 turbot/steampipe-mod-gcp-labels | | 2 turbot/steampipe-mod-gcp-thrifty | | 5 turbot/steampipe-mod-github-sherlock | | 6 turbot/steampipe-mod-oci-compliance | 3 | 13 turbot/steampipe-mod-oci-tags | 1 | turbot/steampipe-mod-oci-thrifty | 3 | 5 turbot/steampipe-mod-zoom-compliance | | 1 turbot/steampipe-plugin-alicloud | 21 | 112 turbot/steampipe-plugin-aws | 12 | 349 turbot/steampipe-plugin-azure | 34 | 170 turbot/steampipe-plugin-azuread | 7 | 12 turbot/steampipe-plugin-bitbucket | 9 | 20
There are no dates available via the API wrapped by github_tag
but we can join with github_commit
to find them.
with tags as ( select repository_full_name, commit_sha, name from github_tag where repository_full_name = 'turbot/steampipe-plugin-github' order by commit_sha ),commits as ( select repository_full_name, sha, committer_datefrom github_commitwhere repository_full_name = 'turbot/steampipe-plugin-github'order by sha)select t.repository_full_name, t.commit_sha, t.name, c.committer_datefrom tags tjoin commits con c.sha = t.commit_shaorder by t.name desc, c.committer_date desc;
repository_full_name | commit_sha | name | committer_date --------------------------------+------------------------------------------+--------+--------------------- turbot/steampipe-plugin-github | 4514cf7450dc545c4051aa94f801016f9b2098a7 | v0.6.1 | 2021-09-23 12:31:07 turbot/steampipe-plugin-github | 49ce3d89293e2f28997b96f0dfb1879c1382d3bd | v0.6.0 | 2021-09-09 07:47:37 turbot/steampipe-plugin-github | 8771752e9b13fa8d59ba73433a756a8707de8a1d | v0.5.1 | 2021-06-06 01:24:22 turbot/steampipe-plugin-github | 5dc64b75886936c793d8dd6dcb5182ec3ee612cd | v0.5.0 | 2021-05-27 18:01:57 turbot/steampipe-plugin-github | faf4b4fcafc3bcbf6f92de8f2f751c77b7acd80d | v0.4.0 | 2021-05-15 18:57:00 turbot/steampipe-plugin-github | bc4a26a64b23972e1a4891f31cf6d6bef90a1f66 | v0.3.0 | 2021-04-30 21:18:25 turbot/steampipe-plugin-github | 4d05c6efdd391cb7f02c00ac89fd5fcd175c49c1 | v0.2.0 | 2021-03-18 20:08:54 turbot/steampipe-plugin-github | 43173a8434a9ddb8c0c6de375c2a70757adbd4cd | v0.1.1 | 2021-02-25 16:24:43 turbot/steampipe-plugin-github | c542c3d53e33a2676a1df75f22911b54526989c0 | v0.1.0 | 2021-02-18 19:11:30 turbot/steampipe-plugin-github | bd54547306edf15dc0ea73d1b7b090e2467da765 | v0.0.5 | 2021-01-28 18:49:51
Let's have some fun with this data. How about visualizing it on a timeline? There are lots of charting libraries that can do timelines, we'll arbitrarily pick the ApexCharts implementation. It uses a structure like this:
data: [ { x: 'Code', y: [ new Date('2019-03-02').getTime(), new Date('2019-03-04').getTime() ] }, { x: 'Test', y: [ new Date('2019-03-04').getTime(), new Date('2019-03-08').getTime() ] } ]
You might not think of Postgres as the tool of choice for wrangling data into this format, but watch. First, capture the above query as a view.
create view timeline_data as ( with tags as ( select repository_full_name, commit_sha, name from github_tag where repository_full_name = 'turbot/steampipe-plugin-github' order by commit_sha ), commits as ( select repository_full_name, sha, committer_date from github_commit where repository_full_name = 'turbot/steampipe-plugin-github' order by sha ) select t.repository_full_name, t.commit_sha, t.name, c.committer_date from tags t join commits c on c.sha = t.commit_sha order by t.name desc, c.committer_date desc);
To produce start and end dates:
select name, to_char(committer_date, 'YYYY-MM-DD') as start, lag(to_char(committer_date,'YYYY-MM-DD')) over () as endfrom timeline_data;
name | start | end --------+------------+------------ v0.6.1 | 2021-09-23 | v0.6.0 | 2021-09-09 | 2021-09-23 v0.5.1 | 2021-06-06 | 2021-09-09 v0.5.0 | 2021-05-27 | 2021-06-06 v0.4.0 | 2021-05-15 | 2021-05-27 v0.3.0 | 2021-04-30 | 2021-05-15 v0.2.0 | 2021-03-18 | 2021-04-30 v0.1.1 | 2021-02-25 | 2021-03-18 v0.1.0 | 2021-02-18 | 2021-02-25 v0.0.5 | 2021-01-28 | 2021-02-18 v0.0.4 | 2021-01-22 | 2021-01-28 v0.0.2 | 2021-01-21 | 2021-01-22 v0.0.1 | 2021-01-21 | 2021-01-21
To transform that data into the JSON objects used by the timeline:
with raw_data as ( select name, to_char(committer_date, 'YYYY-MM-DD') as start, lag(to_char(committer_date,'YYYY-MM-DD')) over () as end from timeline_data),raw_json as ( select json_build_object( 'x', rd.name, 'y', array[rd.start, rd.end] ) as object from raw_data rd),cooked_json as ( select concat( '{ ' , 'x: "' , (object->>'x') , '", ' , 'y: ' , '[new Date("' , ( replace(object->'y'->>0,'"','')) , '").getTime()' , ',' , 'new Date("' , ( replace(object->'y'->>1,'"','')) , '").getTime()' , ']' , '}, ' ) as timeline_object from raw_json)select replace(timeline_object, '""', 'Date.now()') as timeline_datafrom cooked_json;
timeline_data ------------------------------------------------------------------------------------------ { x: "v0.6.1", y: [new Date("2021-09-23").getTime(),new Date(Date.now()).getTime()]}, { x: "v0.6.0", y: [new Date("2021-09-09").getTime(),new Date("2021-09-23").getTime()]}, { x: "v0.5.1", y: [new Date("2021-06-06").getTime(),new Date("2021-09-09").getTime()]}, { x: "v0.5.0", y: [new Date("2021-05-27").getTime(),new Date("2021-06-06").getTime()]}, { x: "v0.4.0", y: [new Date("2021-05-15").getTime(),new Date("2021-05-27").getTime()]}, { x: "v0.3.0", y: [new Date("2021-04-30").getTime(),new Date("2021-05-15").getTime()]}, { x: "v0.2.0", y: [new Date("2021-03-18").getTime(),new Date("2021-04-30").getTime()]}, { x: "v0.1.1", y: [new Date("2021-02-25").getTime(),new Date("2021-03-18").getTime()]}, { x: "v0.1.0", y: [new Date("2021-02-18").getTime(),new Date("2021-02-25").getTime()]}, { x: "v0.0.5", y: [new Date("2021-01-28").getTime(),new Date("2021-02-18").getTime()]}, { x: "v0.0.4", y: [new Date("2021-01-22").getTime(),new Date("2021-01-28").getTime()]}, { x: "v0.0.2", y: [new Date("2021-01-21").getTime(),new Date("2021-01-22").getTime()]}, { x: "v0.0.1", y: [new Date("2021-01-21").getTime(),new Date("2021-01-21").getTime()]},
Paste that data into a copy of the ApexCharts timeline example, and voilà!
If you'd rather do that bit of data wrangling another way, Postgres can oblige. For example, the pl/python extension can work in Steampipe, and you can do lots with it. The main advantage here would be easier string-building. If there's interest in pl/python I can explore it in a future post.
Finding external contributors
Open source projects are often sponsored by companies whose employees are the main contributors. When others contribute issues or commits it's worth celebrating, but how to keep track of those people? The GitHub plugin has the all ingredients needed to do that. In github_my_organization we can find user logins for all the GitHub organizations we belong to; in github_commit we can find commits to a repo; in github_issue we can find issues filed for a repo.
To find external contributors, I started with a list of GitHub user logins for the turbot
organization where the Steampipe plugins and mods live. Since some contributions track to users in the related turbotio
org, the query to find all internal contributors looks like this.
select g.name, g.login, jsonb_array_elements_text(g.member_logins) as member_loginfrom github_my_organization gwhere g.login = any( array['turbot', 'turbotio'] );
name | login | member_login --------------------+----------+------------------- Turbot | turbot | bigdatasourav Turbot | turbot | binaek Turbot | turbot | cbruno10 Turbot | turbot | dboeke
To find committers in a repo:
select distinct author_login from github_commit where repository_full_name = 'turbot/steampipe-plugin-aws';
To find external committers, subtract one list from the other.
with committers as ( select distinct g.repository_full_name, g.author_login from github_commit g where g.repository_full_name = 'turbot/steampipe-plugin-aws'),users_in_excluded_orgs as ( select g.name, g.login, jsonb_array_elements_text(g.member_logins) as member_login from github_my_organization g where g.login = any(array['turbotio','turbot']))select c.repository_full_name, c.author_loginfrom committers cwhere not exists ( select c.author_login from users_in_excluded_orgs u where c.author_login = u.member_login )order by lower(c.author_login);
full_name | author_login -----------------------------+----------------- turbot/steampipe-plugin-aws | davidhammturner turbot/steampipe-plugin-aws | fitchtravis turbot/steampipe-plugin-aws | gazoakley turbot/steampipe-plugin-aws | jackdelab turbot/steampipe-plugin-aws | jzendle turbot/steampipe-plugin-aws | kpapagno turbot/steampipe-plugin-aws | RyanJarv turbot/steampipe-plugin-aws | sankeyraut turbot/steampipe-plugin-aws | Tucker-Eric
The actual list was longer because it included some GitHub logins for people who were, but are no longer, members of one of the organizations in play. If there's an automated way to cull that list please let me know. Meanwhile it's not too hard to eyeball the list and delete former employees.
To summarize activity for each external contributor, we'll now combine all the ingredients we've seen so far.
with committers as ( select distinct g.repository_full_name, g.author_login from github_commit g where g.repository_full_name = 'turbot/steampipe-plugin-aws'),users_in_excluded_orgs as ( select g.name, g.login, jsonb_array_elements_text(g.member_logins) as member_login from github_my_organization g where g.login = any(array['turbotio','turbot'])),external_contributors as ( select c.repository_full_name, c.author_login from committers c where not exists ( select c.author_login from users_in_excluded_orgs u where c.author_login = u.member_login )),commits as ( select g.repository_full_name, g.author_login, count(*) as commits from github_commit g join external_contributors e on g.repository_full_name = e.repository_full_name and g.author_login = e.author_login group by g.repository_full_name, g.author_login),issues as ( select g.repository_full_name, g.author_login, count(*) as issues from github_issue g join external_contributors e on g.repository_full_name = e.repository_full_name and g.author_login = e.author_login group by g.repository_full_name, g.author_login)select c.repository_full_name, c.author_login, c.commits, i.issuesfrom commits cleft join issues ion c.repository_full_name = i.repository_full_name and c.author_login = i.author_loginorder by lower(c.author_login);
repository_full_name | author_login | commits | issues -----------------------------+-----------------+---------+-------- turbot/steampipe-plugin-aws | davidhammturner | 1 | turbot/steampipe-plugin-aws | fitchtravis | 1 | 1 turbot/steampipe-plugin-aws | gazoakley | 3 | turbot/steampipe-plugin-aws | jackdelab | 1 | 3 turbot/steampipe-plugin-aws | jzendle | 1 | 1 turbot/steampipe-plugin-aws | kpapagno | 1 | 1 turbot/steampipe-plugin-aws | RyanJarv | 1 | 1 turbot/steampipe-plugin-aws | sankeyraut | 1 | 4 turbot/steampipe-plugin-aws | Tucker-Eric | 1 |
It's nice to see this data at a glance! Next, of course, we'll want to project the view across all the repos to which external contributors have filed issues or made commits. That'll make it easier to find and thank davidhammturner, fitchtravis, gazoakley, and everyone else who's contributed to Steampipe. When repeat contributors show up we'll immediately know their prior context, and we'll be able to extend a special welcome to new folks.
The code behind this concise view is, admittedly, not so easy to see at a glance. In Postgres there are good ways to improve the readability of a long query. We've seen an example of using a view to encapsulate a live query. There are also materialized views that cache slow queries to disk for immediate recall. And you can write functions to augment the chunking power that CTEs deliver. If you write a lot of SQL for use in Steampipe, you'll be able to achieve a high standard of readibility and modularity. But that's a topic for another post.
The bottom line
You can use Steampipe's GitHub plugin to build compelling views of activity in your open source project, and you can use that information to showcase and connect with your contributors. Try these recipes for yourself, and let us know what you discover!