The Google Sheets plugin provides two kinds of tables. First, there are data tables that map the data in each spreadsheet tab to a database table. In our sample sheet the tabs are "Dashboard", "Students", "Books", "Marks", and "Employees".
select * from "Marks"
These queries deliver results that correspond to what you see in the Marks tab of the sample sheet.
student | subject | score---------+---------+-------Bob | art | 70Bob | history | 70Sue | art | 89Bob | math | 77Bob | science | 71Sue | math | 87Sue | history | 89Sue | science | 88
The plugin also provides three introspection tables that describe: 1) the spreadsheet as a whole, 2) each tab, and 3) each cell across all tabs. Click these links to review them; you can also use .inspect
in the Steampipe CLI.
1 .inspect googlesheets_spreadsheet
Reports metadata about the spreadsheet, including creation time and permissions.
Reports metadata about each tab in the spreadsheet, including order in the sheet, charts, protected ranges, and merges.
Reports every cell in the spreadsheet, across all tabs, including row and column coordinates, values, and formulas.
Join spreadsheets with tables
If you're using the GitHub plugin, and you want to query a subset of the repos visible to you in the github-my-repository
table, one common pattern is to match repo names using a regular expression.
selectfull_namefromgithub_my_repositorywherefull_name ~ 'turbot/steampipe-(mod|plugin).+'
You can alternatively list the names in a Google Sheet.
selectfull_namefromgithub_my_repositoryjoinreposheet."Repos"using(full_name)
This pattern delivers a couple of benefits. First, it enables you to list names that aren't easy to capture with a regex. Second, it abstracts the list away from any particular query and makes it maintainable by people who can write to the spreadsheet but can't (or don't want to) edit queries.
The pattern extends to custom controls as well. In that case, though you can use mod variables to encode the list of names, such a list is again specific to that context and not easily team-accessible.
Suppose you want to enforce a license_is_apache rule for a set of GitHub repos listed in a column of a spreadsheet like so:
full_nameturbot/steampipe-plugin-githubturbot/steampipe-mod-github-sherlock...
Here's a custom control that joins github_my_repository
with the spreadsheet to check that rule.
control "license_is_apache" {title = "License is Apache 2.0"sql = <<EOTselecthtml_url as resource,casewhen license_spdx_id = 'Apache-2.0' then 'ok'else 'alarm'end as status,full_name || ' license is ' || license_name as reason,full_namefromgithub_my_repositoryjoinreposheet."Repos"using (full_name)EOT}
To run it:
steampipe check control.license_is_apache
Enforce named ranges
We often use spreadsheets as if they were databases, relying on implied referential integrity that does not actually exist. In Using SQL to check spreadsheet integrity we considered a spreadsheet used for event planning. Sessions referenced in one tab are intended to refer to sessions defined in another, but nothing guarantees they do. We showed how to write a custom control, sessions_valid_in_sessions_table
, to check for broken references.
There's another kind of referential integrity that spreadsheets aspire to but often fail to achieve: named ranges. You may intend the first thousand rows of column A in a sheet to represent a canonical list of repository names. But that intention only lives in your head until you give the range A1:A1000 a name and then refer to the range using the name.
Here's a sheet about Steampipe plugins that gets it half right. It groups plugins by schema type -- dynamic vs fixed -- and counts items in each group. The dynamic plugins are captured in a named range, and counted using that range. But the fixed plugins are captured and counted using raw cell addresses.
If you intend to use named ranges exclusively, the formula in cell D1 is an error. Here's a query to find that error.
selectcell,regexp_matches(formula,'\w\d+:\w\d+')asreffrom reposheet.googlesheets_cell
+------+------------+| cell | ref |+------+------------+| D1 | {C2:C1000} |+------+------------+
When named ranges do exist, you can of course use them in formulas like =counta(dynamic)
. And now you can also use them in SQL!
selectcount(*)fromreposheet.googlesheets_cellwhere range = 'dynamic'
+-------+| count |+-------+| 3 |+-------+
Find secrets
In Find secrets everywhere we showed how to use the code plugin's code_secret table to look for secrets in various AWS nooks and crannies.
The same method works with any column of any table provided by any plugin. If you've configured the Google Sheets plugin for a single spreadsheet, this query scans all of its tabs for secrets in values or formulas.
select*fromgooglesheets_cellwheresrc = value or src = formula
To scan multiple spreadsheets, use a connection aggregator to combine them.
connection "all_sheets" {type = "aggregator"plugin = "googlesheets"connections = ["sheet1", "sheet2", "sheet3"]}
Then use the all_sheets
connection to scan across all the sheets.
select*fromall_sheets.googlesheets_cellwheresrc = value or src = formula
SQL instead of formulas
The Dashboard tab of our sample spreadsheet summarizes the "Students" tab by grade.
Student GPAGPA Grade Count4.0 A 23.0 - 3.9 B 142.0 - 2.9 C 81.0 - 1.9 D 60 - 0.9 F 0
It does that with formulas, in column C, like =COUNTIFS(Students!H2:H31, "=4.0")
. With the googlesheets
plugin you can accomplish the same thing with a query.
select '4.0' as gpa, 'A' as grade, count(*) from "Students" where "GPA"::float = 4.0unionselect '3.0-3.9', 'B', count(*) from "Students" where "GPA"::float between 3.0 and 3.9unionselect '2.0-2.9', 'C', count(*) from "Students" where "GPA"::float between 2.0 and 2.9unionselect '1.0-1.9', 'D', count(*) from "Students" where "GPA"::float between 1.0 and 1.9unionselect '0-0.9', 'E', count(*) from "Students" where "GPA"::float between 0 and 0.9order by grade
This snippet of SQL is arguably both more readable and more maintainable than a collection of formulas scattered across multiple cells in a spreadsheet.
Pivot the Postgres way
To see the average GPA for students by home state and major, you'd ordinarily use a pivot table.
With Steampipe and the googlesheets
plugin you can do the same thing in SQL with the help of the built-in Postgres tablefunc
extension.
First, some context for Postgres extensions. Steampipe is an app that launches Postgres and activates a type of extension, called a foreign data wrapper, that converts plugin-sourced API data into database tables. But steamipe-postgres-fdw isn't the only extension you can activate in the Steampipe instance of Postgres. There's a long list of contributed extensions that users of Postgres, and thus also of Steampipe, can use. One of these, tablefunc, provides functions that you can use to create the same pivot table directly in SQL.
Here's the definition of the function we'll use.
crosstab ( *source_sql* text, *category_sql* text ) -> setof recordProduces a "pivot table" with the value columns specified by a second query.
Postgres functions, including those built in to the database and those you write yourself, can return primitive types such as integers, strings, dates, and arrays. But they can also return sets of records that behave, in SELECT contexts, just like tables, and crosstab
is that kind of function.
Postgres functions can share a name but differ with respect to their signatures, and that's also true for crosstab
. The variant we'll use here takes two parameters. Both are SQL snippets of type text
. source_sql selects the raw data for the pivot; category_sql selects the columns (Art, English, etc.).
Because source_sql and category_sql include quoted table names and column names, we'll use Postgres' dollar-quoted string constants to quote SQL snippets.
Here's the query.
select * from crosstab (-- SQL snippet to select raw data for the pivot$$select"Home State","Major",avg("GPA"::float)::numericfrom"Students"group by"Home State", "Major"order by"Home State", "Major"$$,-- SQL snippet to select categories$$select distinct"Major"from"Students"$$)-- definition of the output tableas table_def (state text,art numeric,english numeric,math numeric,physics numeric);
And here's the SQL output corresponding to the pivot table in the spreadsheet.
+-------+--------+---------+--------+---------+| state | art | english | math | physics |+-------+--------+---------+--------+---------+| AK | <null> | 1.6 | <null> | 2.4 || CA | <null> | 2.5 | <null> | 2.2 || FL | <null> | 3.4 | 3 | <null> || MA | 2.5 | <null> | 3 | <null> || MD | 2.2 | <null> | 1.9 | <null> || NC | <null> | 2.4 | <null> | 2.8 || NE | <null> | 2.7 | <null> | 4 || NH | <null> | 3.9 | <null> | 3.8 || NY | 2.7 | <null> | 3.6 | <null> || RI | 4 | <null> | 3.2 | <null> || SC | 1.3 | <null> | 3.6 | <null> || SD | 3.1 | <null> | 3.5 | <null> || WI | <null> | 1.5 | <null> | 2.9 |+-------+--------+---------+--------+---------+
Spreadsheets and SQL
We use spreadsheets to capture and analyze data. With Steampipe + Google Sheets you can decouple capture from analysis. Many will agree that a web-based spreadsheet is a great tool for collaborative data entry, but few will argue that it's the best tool for analysis. Data scientists often prefer R and Python; SQL is a powerful alternative too. With Steampipe you can use SQL to validate data entry, then do your analysis with readable and maintainable SQL code.
What about displaying the results of that analysis? If you're a user of Metabase, Tableau, PowerBI, Superset, or another visualization tool that connects to Postgres, any of these will work. But we hope you'll also try Steampipe Dashboards!