Steampipe Table & Column Standards
- Naming
- Standard Columns
- Data Types
- Table and Column Descriptions
- Column Defaults and null
- Standardized Structure
Naming
Use snake_case for all table and column names.
Table names are in the format
{plugin}_{service}_{resource_type}
. Generally, table names should match the corresponding Terraform resource name.Use singular form (not plural) for table names, e.g.
aws_s3_bucket
, notaws_s3_buckets
.For columns derived from nested object fields, the column should contain the path, snake cased. For example
Foo.Bar.Baz
should will be in a column namedfoo_bar_baz
:"foo": {"bar": {"baz": "value"}}Use Terraform as a strong inspiration for field names, when to expand arrays, etc. Being consistent with Terraform is a desirable, and minimum position. Standard columns are an exception and should be consistent in our tables regardless of the Terraform name (they will very rarely conflict anyway)
When naming columns for which there is no direct equivalent:
- Where the field contains an arn or arns, explicitly suffix with
_arn
:- Good:
attached_policy_arns
- Bad:
attached_policies
- Good:
- Where the field contains an id, explicitly suffix with
_id
:- Good:
aws_account_id
- Bad:
aws_account
- Good:
- Where the field contains a name but references something that may also have an id or arn, explicitly suffix with
_name
:- Good:
role_name
- Bad:
role
- Good:
- Where the field contains an arn or arns, explicitly suffix with
Standard Columns
ALL tables that represent a resource should contain the following standard columns:
Column Name | Data Type | Description |
---|---|---|
title | ColumnType_STRING | The display name for this resource. |
akas | ColumnType_JSON | A JSON array of AKAs (also-known-as) that uniquely identify this resource. The format of the AKAs varies by plugin (arns in aws, resource paths for azure) but they must be unique and should be immutable. |
tags | ColumnType_JSON | The tags on this resource, as a map of key:value pairs. Many resources support tags, though not all in the same format. If the provider tags are in a different format, expose them in the native format in a tags_raw column, and convert them to key:value map in the tags column. When tags are simple labels with no key:value (like Github issue labels), use the format label:true . |
You may choose to define additional standard columns that are specific to your plugin as well, and it is recommended to do so when appropriate. For example, we define standard columns for our cloud provider plugins:
- AWS
partition
account_id
region
- Azure
subscription_id
resource_group
region
- Google
project
location
Data Types
Use the appropriate data type so that you can search and filter intelligently. Most of this is fairly self-explanatory but there are a couple items worth pointing out:
- Steampipe does not not support native Postgres arrays - use
ColumnType_JSON
for arrays - There are 2 valid IP address formats,
ColumnType_IPADDR
andColumnType_CIDR
which correspond to Postgres inet and cidr data types:- Use
ColumnType_IPADDR
for single ip address -10.11.12.13
. - Use
ColumnType_IPADDR
when a file can either be a single single ip address OR a cidr range -192.168.0.0/24
,10.11.12.13
. - Use
ColumnType_CIDR
for cidr ranges that are ALWAYS represented as a cidr -192.168.0.0/24
,10.11.12.13/32
. - The essential difference between
ColumnType_IPADDR
andColumnType_CIDR
data types is thatColumnType_IPADDR
accepts values with nonzero bits to the right of the netmask, whereasColumnType_CIDR
does not. For example,192.168.0.1
is valid forColumnType_IPADDR
but not forColumnType_CIDR
.
- Use
Table and Column Descriptions
- While technically optional, all tables and columns should contain a
Description
. This is added as a comment in the postgres schema and will be used:- To show more info within the cli in the
.inspect
command. - To generate help/reference documentation on hub.steampipe.io
- To show more info within the cli in the
- The descriptions should be pretty brief (1-2 sentences), and generally should be taken from the provider's API docs.
- The descriptions should start with a capital letter, and end with a period.
Column Defaults and null
In general, use null
when a field isn't present instead of setting a default.
Standardized Structure
Arrays should be stored in their native format as jsonb.
Fields containing an array of deep and important information (e.g. security group rules) may be expanded into a separate table. For example,
aws_vpc_security_groups
has an associated table ofaws_vpc_security_group_rules
. Use this model when the data is both important to query and large in scale.Cloud providers sometimes store data in an array, even if they only ever have one value (e.g. AWS Subnet IPv6 CIDR Associations). In this case, you may choose to expand to columns as if there was a single object.
- The original field (e.g. foo) should NOT be used, and should NOT have the full JSON array. Instead, we exclude the array data (it's noisy), but leave the field name available in case the provider actually uses an array in the future.
- Generally, nested object fields like Foo.Bar.Baz are stored as foo_bar_baz - see Naming
JSON objects should be stored as
ColumnType_JSON
(jsonb), not a delimited string. If the JSON contains sub-objects that are json as string, convert to json (for example inline policies in AWS roles).For JSON/YAML objects fields, if the raw format is also useful in itself (for example, the
template_body
inaws_cloudformation_stack
), you may choose to create 2 columns:fieldname_src
: The string representation asColumnType_STRING
.fieldname
: The object representation asColumnType_JSON
(for joining, querying, etc).
Some JSON/YAML fields may allow multiple schema formats to represent the same object. For example, AWS IAM policies allow you to specify an array of
Action
s, or a singleAction
as a string, and are not case sensitive. In such a case, it is often useful to convert all of these objects to the same format to simplify searching and filtering. In such a case, you should keep the original object format in thefieldname
column, and add an additionalfieldname_std
column in the standardized format.Some fields are base64 encoded in the cloud provider's API. These can be evaluated on a case-by-case basis, but generally they should be decoded - If someone wants the column, they more than likely want to view or search the decoded text.
Key columns should appear first, then the rest added alphabetically, then "standard" columns last. Note that help (
.inspect
, online docs) order the columns alphabetically regardless of the order in thecreate table
statement.