Ellis Valentiner is the lead data scientist at Virtual Facility and the author of two Steampipe plugins: Confluence and WeatherKit.
When he first saw Steampipe on Hacker News, Ellis thought it sounded interesting but wasn't sure what problem he'd use it to solve. Soon, though, a first use case emerged: auditing IAM permissions at the company where he worked at the time. He wanted to extract data about users, their groups, and the roles attached to those groups, then visualize it all as a graph.
He'd already been doing things the hard way, using the AWS CLI and Python/boto3 to retrieve JSON payloads and normalize them into database tables he could query and join with SQL. "That's when I remembered: Steampipe says it can select * from cloud. Let's give that a shot." He immediately tossed all the bash and Python scripts. Coming from a data background, he was very familiar with SQL and found it a much easier and more natural approach.
SQL and Postgres benefits
He learned SQL in grad school and says that while the language has continued to evolve, the fundamentals remain the same. If you haven't used SQL in a while, he says, you might need a refresher, but it's easy to get back up to speed. Beyond the API superpowers that Steampipe brings to the table, he stresses the universality of databases that support SQL. "It's one thing to have a data frame in R or Python on your machine," he says, "but eventually you've got to work with other people, and with data that doesn't fit on your machine."
Another benefit: Steampipe inherits Postgres's JSONB support. Data scientists who are more familiar with how R and Python handle JSON may not realize how far the Postgres JSON functions have come in recent years. "I use them all the time," he says, "and it's almost always faster than pulling the data out, massaging it in another language, and putting it back in. You can go really far with the built-in JSON functions."
As a longtime R and Python user, he knows that you can fit a lot of data in memory nowadays, and you can do a lot with that in-memory data. "But what I've found using Steampipe," he says, "is that if I were to stay in the Python ecosystem I'd be making redundant API calls, and worrying about how to persist data across a series of steps in an analytics workflow." You don't always know in advance how you'll use the data sourced from APIs, and the database is a good environment in which to figure that out.
"Sometimes I'll persist Steampipe data in tables or materialized views," he says, "but it's also really nice to leverage the Steampipe cache, which I use in all kinds of ways, including from Python or R, both of which can connect to Steampipe." A Postgres database is a very natural way to unify different data-oriented languages and apps. "Coming from the data world," he says, "I'm well aware that 99% of the work is taking data from here and putting it there. It's amazing how much work that is; companies have teams dedicated to data wrangling. It's so much simpler once you've got the data into a database."
The Confluence plugin
Every organization struggles with writing documentation and managing knowledge. At his last company, Confluence was the tool for that job. Ellis had been writing scripts to answer questions like "Which pages haven't been updated in 6 months?" and "How consistent are the tags on documents?" He'd also been combining Confluence data with data from Jira and GitHub to answer questions like "Which tickets are referenced in pull requests?" or "Which doc pages reference those tickets?" Internal communication was much smoother when the team could answer these questions. Steampipe looked like the right tool for the job, but while there were plugins for Jira and GitHub, there wasn't yet one for Confluence. So he decided to create one.
This was Ellis' first experience with Go. He found it straightforward to learn from existing plugins but knows there's much more to learn. "There are so many features in Steampipe, and so many nuggets of info in the docs, it's hard to absorb all that info, or even know what you don't know, or what you should want to learn." Aspiring plugin writers, take heart: even authors of published plugins have lots to learn and appreciate help from other Steampipe users and developers. "Everyone is so friendly and helpful," Ellis says. "I love being part of this community."
The WeatherKit plugin
Ellis' new plugin works with Apple's WeatherKit. Why did he create it? He's always been fascinated by weather data, and as a remote worker at a fully distributed company, he'd like to see a dashboard that reports the weather for all the places where team members and partners are working.
Dashboards
As soon as he got started with dashboards as code, Ellis saw that this approach was more effective than using the dashboard features of languages like Python (via Jupyter) or R (via Shiny), or of tools like Tableau and Superset.
"The HCL + SQL combination is so powerful," he says. "It's not no-code, but it's incredibly low-code. There's very little boilerplate; it strips away all the unnecessary ceremony so I can focus on data and content." He soon realized that this approach wasn't only useful for data provided by Steampipe plugins but would also be a great way to light up data that lives in other databases.
"For my current company," he says, "I'm building an internal dashboard, and Steampipe is perfect for what we need: move quickly to a proof of concept, then iterate by just editing queries and seeing results in real time."
He doesn't see a need to build dashboards for compliance because "there's so much richness packed into the Steampipe compliance mods." And he thinks Steampipe Cloud is a good way to make those compliance dashboards easily available to the team in a shared space.
Thank you, Ellis, for all you've done with and for Steampipe. We look forward to your continuing adventures!
If you're interested in developing Steampipe plugins and dashboards, check out the guides for plugin and dashboard authors, then bring your questions and comments to our friendly Slack community.