# Cloud resource tagging strategies for your organization

> Ensuring compliance, conformance, and accuracy in your organization's tagging practices

By Chris Farris
Published: 2023-04-14


In almost every organization I talk to, resource tagging is a management priority, yet it is nearly impossible for cloud operations and builders to get it right. Developers are focused on building; they don't live in the accounting system or know the appropriate cost center for an application.

While the builder might know the technical or business owners at creation time, organizational changes mean that information quickly becomes outdated. Operations teams are reluctant to risk production changes just to update a cost center or contact tag. Tag information becomes less and less accurate, so operations teams put less effort into making it accurate.

How can we make tagging less of a Sisyphean effort? Regularly monitoring the status of tags helps ensure they are present, conformant, and, most importantly, accurate.

## What makes a good tagging strategy?

To be successful, tagging needs to be:
1. **Compliant** - e.g., does the tag exist?
2. **Conformant** - e.g., does the tag value reflect the form and function expected? If the `Owner` tag is supposed to be an email address, is it in the form of an email address?
3. **Accurate** - e.g., does the tag value accurately reflect a practical reality?

In my experience in dealing with security remediations or incident response, there are a few key tags you should have:
* **Name** - What is this resource?
* **Owner** - Who is the person who can make a business decision about a resource? This is critical if downtime needs to be scheduled or if legal or contractual impacts may occur when the resource is compromised.
* **TechContact** - Who can answer specific technical questions about a resource? While the owner can answer questions about business impact, the Technical Contact should be able to answer questions about the resource or application's behavior and configuration.
* **Application** - What application does the resource belong to?
* **Environment** - Is this production, development, or something in between?
* **CostCenter** - How should the cost of this specific resource be allocated to break out the cloud bill?

Let's say we have an EC2 Instance that runs an AI-based video compression algorithm. Let’s also say the tags look like this:

Name: SonofAnton<br/>
Owner: Erlich Bachman<br/>
TechContact: bertram.gilfoyle@piedpiper.com<br/>
CostCenter: 12345<br/>
Environment: Production

This resource is not compliant with the tagging policy. It has five of the required tags (`Name`, `Owner`, `TechContact`, `Environment`, and `CostCenter`), but it lacks an `Application` tag.

This resource is mostly conformant. The `Name` is freeform, the `TechContact` is in the form of an email address, and the `CostCenter` is a 5-digit number. The `Owner` tag is not conformant because it is not in the form of an email address. The `Environment` tag is not conformant if the allowed values are "dev, test, prod".

The accuracy of tags is where the rubber meets the road. Is `12345` an actual cost center, or was it added by an engineer who didn’t know what cost center to use and put in a conformant value so the pipeline would deploy? Is Erlich Bachman really the business owner? Can you reach him in an emergency while he is on hiatus in Tibet?

To address tagging compliance, the following query will identify all the S3 buckets that lack an `Owner` tag:
```sql
select
  arn, title
from
  aws_tagging_resource
where
  arn LIKE 'arn:aws:s3:%'
and
  tags ->> 'Owner' is Null
```
But the presence of an `Owner` tag is not enough. If your policy is that all contacts should be in the form of an email address, then an `Owner` tag of `Erlich` is not conformant. This query finds S3 buckets whose `Owner` tag is missing or is not in the form of an email address:

```sql
select
  arn, title,
  tags ->> 'Owner' as non_compliant_owner_tag
from
  aws_tagging_resource
where
  arn LIKE 'arn:aws:s3:%'
and
  (regexp_match(tags ->> 'Owner', '[\w]@[\w]')) [ 1 ] is Null
```

```
+--------------------------+-------------+-------------------------+
| arn                      | title       | non_compliant_owner_tag |
+--------------------------+-------------+-------------------------+
| arn:aws:s3:::fooli-test1 | fooli-test1 | Erlich                  |
+--------------------------+-------------+-------------------------+
```
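
The same approach can be extended to the other conformance rules described for the example instance: a five-digit `CostCenter` and an `Environment` drawn from "dev, test, prod". The following is a minimal sketch under those assumptions; the column aliases are illustrative, and the `coalesce` calls are there so that a missing tag is reported alongside a malformed one.

```sql
select
  arn, title,
  tags ->> 'CostCenter' as cost_center_tag,
  tags ->> 'Environment' as environment_tag
from
  aws_tagging_resource
where
  arn LIKE 'arn:aws:s3:%'
and (
  -- CostCenter is assumed to be exactly five digits
  coalesce(tags ->> 'CostCenter', '') !~ '^[0-9]{5}$'
  -- Environment is assumed to be one of the allowed values
  or coalesce(tags ->> 'Environment', '') not in ('dev', 'test', 'prod')
)
```

Adjust the pattern and the allowed-value list to whatever your own policy specifies.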

Another way to ensure conformance and accuracy is to ensure the tag values are part of a canonical set of approved values. In the following two examples, we have a CSV export of our application portfolio (`fooli_apm.fooli_apps`) with the list of approved applications and their cost centers.

```csv
application,costcenter
MemeFactory,24601
Nuculus,31789
MemeRecommendationEngine,51961
```
This query returns S3 buckets that have invalid `Application` tag values:
```sql
select
  r.arn, r.title,
  r.tags ->> 'Application' as non_compliant_app_tag
from
  aws_tagging_resource as r
left join fooli_apm.fooli_apps as a
on r.tags ->> 'Application' = a.Application
where
  r.arn LIKE 'arn:aws:s3:%'
and a.Application is Null
```

```
+--------------------------+-------------+-----------------------+
| arn                      | title       | non_compliant_app_tag |
+--------------------------+-------------+-----------------------+
| arn:aws:s3:::fooli-test2 | fooli-test2 | meme-factory          |
+--------------------------+-------------+-----------------------+
```
The proper value for the meme factory is `MemeFactory`, not `meme-factory`.

We can also ensure that applications and cost-center values align:
```sql
select
  r.arn, r.title,
  r.tags ->> 'Application' as Application,
  r.tags ->> 'CostCenter' as CostCenter,
  a.costcenter as CorrectCostCenter
from
  aws_tagging_resource as r
left join fooli_apm.fooli_apps as a
  on r.tags ->> 'Application' = a.Application
where
  r.arn LIKE 'arn:aws:s3:%'
and a.costcenter != r.tags ->> 'CostCenter'
```

```
+--------------------------+-------------+-------------+------------+-------------------+
| arn                      | title       | application | costcenter | correctcostcenter |
+--------------------------+-------------+-------------+------------+-------------------+
| arn:aws:s3:::fooli-test1 | fooli-test1 | MemeFactory | 98765      | 24601             |
+--------------------------+-------------+-------------+------------+-------------------+
```
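
The single-tag compliance check from earlier in this section can also be widened to the whole list of key tags. This is a rough sketch rather than a definitive implementation: it assumes the six tag keys listed above are your required set, and the `required_key` and `missing_tags` names are made up for illustration.

```sql
select
  r.arn,
  -- collect every required key that has no value on this resource
  array_agg(k.required_key order by k.required_key) as missing_tags
from
  aws_tagging_resource as r
cross join
  unnest(array[
    'Name', 'Owner', 'TechContact',
    'Application', 'Environment', 'CostCenter'
  ]) as k(required_key)
where
  r.arn LIKE 'arn:aws:s3:%'
and
  r.tags ->> k.required_key is Null
group by
  r.arn
```

Each row is a bucket with at least one missing tag, which makes the result easy to hand to whoever is responsible for remediation.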

## Tagging Strategy or Tagging Strategies?

If you've spent time at a large organization, you know that mergers and acquisitions are a way of life. When one company acquires another, the chances of the acquired company's tagging strategy matching that of its new corporate overlord are pretty much nil. The same problem arises in a merger. If a larger company buys a smaller one, the odds are the combined entity will adopt the larger company's tagging standard. But in a merger, two companies of similar size have to reconcile their tagging strategies.

Can an enterprise maintain multiple tagging strategies? In most cases, the tags serve similar purposes. `Application` vs. `app`. `TechContact` vs. `CreatedBy`. `Owner` vs. `BusinessOwner`. Maybe one company prefers CamelCase while the other wants snake_case.

Should the business priority be to re-deploy all the resources to meet the new tagging standard, or should the newly combined entity adopt a tagging Rosetta Stone? Can a company use any of `app`, `application`, or `Application`?

This query uses the [COALESCE](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-coalesce/) function to select a tag based on a prioritized list of possible tag keys.
```sql
select
  arn,
  tags ->> 'Name' as Name,
  COALESCE(
    tags ->> 'ExecutiveOwner',
    tags ->> 'executive_contact',
    tags ->> 'Owner'
  ) as BusinessOwner,
  COALESCE(
    tags ->> 'TechnicalContact',
    tags ->> 'tech_contact',
    tags ->> 'created_by'
  ) as TechnicalContact,
  COALESCE(
    tags ->> 'Application',
    tags ->> 'application',
    tags ->> 'app'
  ) as Application,
  COALESCE(
    tags ->> 'CostCenter',
    tags ->> 'cost_center',
    tags ->> 'billing_code'
  ) as CostCenter,
  _ctx ->> 'connection_name' AS AccountName
from
  aws_tagging_resource;
```
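
Deciding which keys belong in each `COALESCE` list is easier once you know which keys are actually in use. As a sketch, assuming the `tags` column is exposed as `jsonb` (the `tag_key` and `resource_count` aliases are illustrative), the following counts how many resources carry each tag key:

```sql
select
  k.tag_key,
  count(*) as resource_count
from
  aws_tagging_resource as r
cross join lateral
  -- expand each resource's tag map into one row per key
  jsonb_object_keys(r.tags) as k(tag_key)
group by
  k.tag_key
order by
  lower(k.tag_key), resource_count desc
```

Ordering by the lowercased key puts variants such as `Application` and `application` next to each other, so near-duplicates are easy to spot.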

## Take back your tagging

Often organizations change as frequently as their cloud resources. Tagging standards come and go based on business needs. Important contacts move on or are reassigned. Regularly reviewing your tag keys and values is one way to tame the endless frustration of tagging. Pulling this data and [making it available to management or finance](https://steampipe.io/blog/siloed-data) is just one way Steampipe can help.


