It's Just SQL!
Steampipe leverages PostgreSQL Foreign Data Wrappers to provide a SQL interface to external services and systems. Steampipe uses an embedded PostgreSQL database (currently, version 14.2.0), and you can use standard Postgres syntax to query Steampipe.
Basic SQL
Like most popular relational databases, Postgres complies with the ANSI SQL standard - If you know SQL, you already know how to query Steampipe!
You can query all the columns in a table:
select * from aws_ec2_instance;
This is inefficient though -- you should only query the columns that you need. This will save Steampipe from making API calls to gather data that you don't want anyway:
selectinstance_id,instance_type,instance_statefromaws_ec2_instance;
You can filter rows where columns only have a specific value:
selectinstance_id,instance_type,instance_statefromaws_ec2_instancewhereinstance_type = 't2.small';
or a range of values:
selectinstance_id,instance_type,instance_statefromaws_ec2_instancewhereinstance_type in ('t2.small', 't2.micro');
or match a pattern:
selectinstance_id,instance_type,instance_statefromaws_ec2_instancewhereinstance_type like '%small';
You can filter on multiple columns, joined by and
or or
:
selectinstance_id,instance_type,instance_statefromaws_ec2_instancewhereinstance_type = 't2.small'and instance_state = 'stopped';
You can sort your results:
selectname,runtime,memory_sizefromaws_lambda_functionorder byruntime;
You can sort on multiple columns, ascending or descending:
selectname,runtime,memory_sizefromaws_lambda_functionorder byruntime asc,memory_size desc;
You can group and use standard aggregate functions. You can count results:
selectruntime,count(*)fromaws_lambda_functiongroup byruntimeorder bycount desc;
or sum them:
selectruntime,sum(memory_size)fromaws_lambda_functiongroup byruntime;
or find min, max, and average:
selectruntime,min(memory_size),max(memory_size),avg(memory_size)fromaws_lambda_functiongroup byruntime;
You can exclude duplicate rows:
select distinctinstance_typefromaws_all.aws_ec2_instance_type
or exclude all but one matching row:
select distinct on (name)name,log_group_namefromaws_all.aws_cloudwatch_log_stream
Of course the real power of SQL is in combining data from multiple tables!
You can join tables together on a key field. When doing so, you may need to alias the tables (with as
) to disambiguate them:
selectinstance.instance_id,instance.subnet_id,subnet.availability_zonefromaws_ec2_instance as instancejoin aws_vpc_subnet as subnet on instance.subnet_id = subnet.subnet_id;
You can use outer joins (left, right, or full) when you want to find non-matching rows as well. For example to see all your volumes and the number snapshots from them:
selectv.volume_id,count(s.snapshot_id) as snapshot_countfromaws_ebs_volume as vleft join aws_ebs_snapshot as s on v.volume_id = s.volume_idgroup byv.volume_id;
or to find snapshots from volumes that no longer exist:
selects.snapshot_id,s.volume_idfromaws_ebs_volume as vright join aws_ebs_snapshot as s on v.volume_id = s.volume_idwherev.volume_id is null;
You can use union queries to combine datasets. Note that union all
is much more efficient if you don't need to eliminate duplicate rows.
selectname,arn,account_idfromaws_iam_roleunion allselectname,arn,account_idfromaws_iam_userunion allselectname,arn,account_idfromaws_iam_group;