IaC and DevSecOps: choosing the best tools for infrastructure code analysis and protection

Hello, readers! This is Anastasia Berezovskaya, an application development security engineer at Swordfish Security. Today we will once again talk about static scanning, but this time we will shift the focus from software code to infrastructure code.

This topic was partly covered in our colleague's article about the security of containerized applications in DevSecOps. This article gives a brief theoretical overview of the “Infrastructure as Code” approach, the place of IaC security in the DevSecOps cycle, methods of static analysis of configuration files, and the key features of working with the KICS tool.

Once again about IaC

In this section, we will provide a brief description of IaC. If you are already familiar with it, you can move straight on to the part dedicated to security. And for those who are out of the loop, or rather, out of the container, we will devote part of the article to theory.

The Infrastructure as Code (IaC) approach no longer surprises anyone: it is found in both new startups and large companies. One of its main advantages is reuse. Instead of manually retyping in the console a sequence of commands that is perfectly tailored to the task and follows all the best practices, the engineer uses infrastructure automation tools. The ‘install.sh’ script can be called the progenitor of this approach.

In the strict sense, IaC systems must meet a number of requirements. Three principles are usually singled out as the most important:

  • Idempotency: the ability of an operation to produce the same result regardless of how many times it is performed.

  • Reproducibility: the same IaC code always creates identical infrastructure.

  • Composability: the ability to create complex systems from simpler, reusable components.
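
The first of these principles is the easiest to show in code. Below is a minimal sketch in Python, not taken from any real IaC tool: the function names `append_line` and `ensure_line` and the sample settings are made up for illustration. It contrasts an operation that changes the result on every run with one that converges to the same result.

```python
def append_line(lines: list[str], line: str) -> list[str]:
    """Non-idempotent: every call adds another copy of the line."""
    return lines + [line]


def ensure_line(lines: list[str], line: str) -> list[str]:
    """Idempotent: add the line only if it is not already present."""
    return lines if line in lines else lines + [line]


# Hypothetical configuration lines, used only as sample data.
config = ["PermitRootLogin no"]
setting = "PasswordAuthentication no"

once = ensure_line(config, setting)
twice = ensure_line(once, setting)
print(once == twice)  # True: repeating the operation changes nothing

appended_twice = append_line(append_line(config, setting), setting)
print(len(appended_twice))  # 3: the setting is now duplicated
```

An `install.sh` built from operations of the first kind has to be run exactly once; one built from operations of the second kind can be re-run safely.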

From a practical point of view, these principles are applicable in much the same way as the “confidentiality, integrity, and availability” triad in information security. That is the theoretical basis. We will return to these points.

There are two common classifications of IaC: by functional purpose and by type of IaC language.

By functional purpose, that is, by the tasks that IaC tools solve, they are divided into two groups. The first includes infrastructure management tools, also called orchestration tools. They create, modify, and destroy infrastructure components such as virtual machines, storage, and network components. Management can occur at the level of machine images (Packer), at the level of virtual machines (Cloudify, Terraform, Vagrant), and at the level of container technologies (Docker Swarm, Kubernetes).

The second group includes configuration management tools. They install, update, and manage software in infrastructure components. This group includes, for example, Ansible, Chef, Puppet.

The second classification is much more common in the literature. It divides tools according to the fundamental paradigm of the IaC description language: declarative and imperative (sometimes also called procedural). Everyone has probably already encountered this in programming.

🤓 Interesting historical fact

When describing programming languages, we often use the word "paradigm." It acquired its modern meaning — the leading system of ideas and concepts, the initial conceptual scheme — only in the 1960s along with Thomas Kuhn's work "The Structure of Scientific Revolutions." Previously, this word meant "template," "example," "model" and was used more often in rhetoric, linguistics, and grammar.

Imperative approach: how to achieve the desired state. The user defines the sequence of actions that the system will perform to bring the infrastructure to the desired state. The main focus is on "how" this state is achieved. Tools with imperative languages include AWS CDK, Pulumi, Puppet, Ansible, and Chef.

Declarative approach: what should be achieved. The user sets the desired state of the system. Defining the sequence of steps to transform the infrastructure according to this state is the task of the IaC tool. Solutions like Terraform, AWS CloudFormation, and Azure Resource Manager use a declarative approach.

At first glance, the declarative approach seems better. Firstly, configuration files reflect the current state of the infrastructure. Secondly, it is less labor-intensive, as most of the work is done automatically. However, the imperative approach provides greater manageability and flexibility. Detailed control over the process allows for more precise and specific results. That is why the declarative approach has not completely displaced the imperative one.
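
The difference between the two styles can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Terraform or any other tool is implemented: the `plan` function, the resource names, and the `size` attribute are all hypothetical. In the imperative style the user writes the list of steps; in the declarative style the user writes only the desired state, and the tool derives the steps by comparing it with the current state.

```python
def plan(current: dict[str, dict], desired: dict[str, dict]) -> list[tuple[str, str]]:
    """Derive the actions that turn the `current` state into the `desired` one."""
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("destroy", name))
    return actions


# Imperative style: the user spells out the steps and their order by hand.
imperative_steps = [("update", "web-1"), ("create", "web-2"), ("destroy", "old-db")]

# Declarative style: the user states the end result; the steps are computed.
current = {"web-1": {"size": "small"}, "old-db": {"size": "large"}}
desired = {"web-1": {"size": "medium"}, "web-2": {"size": "small"}}

print(plan(current, desired))
# [('update', 'web-1'), ('create', 'web-2'), ('destroy', 'old-db')]
print(plan(desired, desired))  # []: nothing to do once the state matches
```

The empty second plan also illustrates idempotency: applying the same description to infrastructure that already matches it produces no actions.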

A bit of pedantry: when introducing such a classification, it is more correct to talk about the paradigm of the description language itself, rather than of the system. This distinction is related to how the language itself describes and manages the infrastructure.

There are two established trends regarding the use of languages. The first is the use of general-purpose programming languages (GPL) for managing infrastructure. Examples of this are Pulumi and AWS CDK, which support languages such as Java, JavaScript, and Go. The other trend is domain-specific languages (DSL), which are designed to model a specific domain, in this case infrastructure. These include Terraform with HCL, as well as CloudFormation and OpenStack, which use YAML or JSON.

The statistics we found on the prevalence of different types of IaC date back to 2019, and much has changed since then. At that time, the most popular tool was Docker with 59.0% usage among developers. The most popular cloud IaC tools were Ansible (52.2%), Chef (36.3%), and Terraform (34.1%). The study itself is quite interesting to read. It is based on 44 interviews with DevOps engineers who shared how they create and maintain IaC code, what practices they consider successful and problematic, and what pros and cons they see in existing tools.

👮 Security & IaC

General security tips here are not much different from those in any other area. For the sake of completeness, let's briefly list them:

  • It is necessary to update the relevant software in a timely manner (in the current climate, with allowance for the behavior of some foreign vendors 🙄).

  • It is necessary to monitor the use of external components (a recurring trend in reports from recent years).

  • It is important to manage accounts correctly (account separation, privilege minimization — where would we be without strict authentication).

  • It is necessary to set up logging and monitoring.

— Boring! I've seen it all before! 🥱

Indeed, these are high-level abstractions. And where are the features of the "as code" approach? Let's switch to an imperative description language. IaC concepts fit perfectly into DevOps. Therefore, for each stage of the well-known figure-eight, you can highlight points that are specific to IaC. This is described in detail in the article “Best Practices for Securing Infrastructure as Code (IaC) in the DevOps SDLC”.

The Guardian blog recommends following security frameworks.

Automation has reached the threat planning stage! Here you can mention a tool like IaCAssist from ThreatModeler. It integrates as a plugin into the development environment and transforms configuration files into full-fledged threat models. It then analyzes them, identifies potential threats, and offers recommendations for improving security.

ThreatCanvas is a similar tool for cloud applications, particularly for AWS, Azure, and Google Cloud Platform (GCP). It has a free version called ThreatCanvas Lite. A more functional and aesthetically pleasing tool is IriusRisk, which already falls under the Security as Code approach. It also has a free version called IriusRisk Community Edition. Both tools have a built-in AI assistant that helps create a diagram from a verbal description, and both integrate with Jira. IriusRisk can also export a diagram in draw.io format. Both applications seemed like excellent options for quickly sketching threat models. However, both free versions are only available online, which may conflict with your company's internal security policy.

Unfortunately, we didn't get our hands on the paid versions, and we haven't come across any open-source equivalents. If you have experience with similar tools or know of good open-source alternatives, please share them in the comments. ✍

Next, we move on to the more technical part. But first, for better understanding, let's describe what a typical day in the life of a DevOps specialist looks like.

  • The DevOps engineer writes an IaC script.

  • The DevOps engineer commits changes to the IaC script in the version control system (VCS).

  • The changes trigger static tests in the CI tool.

  • The CI tool incorporates the changes to the IaC script if the tests pass (and rejects the changes otherwise).

  • The CI tool deploys the updated script in the sandbox and/or test environment.

  • The CI tool deploys the updated IaC script on the target servers.
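
The accept-or-reject decision in the steps above can be sketched as a small gate function. This is an illustrative sketch, not the API of any CI system: the name `should_merge`, the `fail_on` parameter, and the list of severity labels are assumptions chosen to resemble typical scanner output.

```python
# Assumed severity scale, ordered from least to most serious.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def should_merge(severity_counts: dict[str, int], fail_on: str = "HIGH") -> bool:
    """Return True if no finding is at or above the `fail_on` severity."""
    threshold = SEVERITY_ORDER.index(fail_on)
    blocking = sum(
        count
        for severity, count in severity_counts.items()
        if SEVERITY_ORDER.index(severity) >= threshold
    )
    return blocking == 0


print(should_merge({"HIGH": 0, "MEDIUM": 2, "LOW": 1}))         # True
print(should_merge({"HIGH": 1, "LOW": 1}))                      # False
print(should_merge({"HIGH": 1, "LOW": 1}, fail_on="CRITICAL"))  # True
```

The threshold is a policy choice: a strict team may fail the pipeline on MEDIUM findings, while a team that is only starting to adopt scanning may begin with CRITICAL and tighten it over time.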

At this stage, we can finally start using the long-familiar advantages of application development, such as code reuse, collaboration, testing, and static code analysis.

According to Scheme 1, the first entity for analysis is the template. To put it in order, we apply methods already known to us: linters and static analyzers. Some authors treat secrets in templates as separate sub-entities, for which dedicated secret scanning tools can be used.

During the development stage, these checks are first performed locally using IDE plugins and pre-commit hooks. In CI/CD, the checks are repeated.

We will skip the remaining stages of the figure-eight today and move on to a more detailed look at static analysis methods.

📑 Static analysis of templates

Here we will talk about the main methods of static analysis of infrastructure code. Some of these methods were carried over from static analysis of application code.

Code smells in the context of IaC are parts of the code that may signal potential problems with the quality, security, or maintainability of your infrastructure. They come in several types.

Implementation smells are specific coding practices in scripts. They can affect the readability and usability of the code. Examples include overly long scripts and incorrect formatting.

Design smells are problems in the architecture or structure of scripts. They can make it difficult to maintain and scale the infrastructure. Examples include code duplication or overly complex configurations that are difficult to understand.

While the previous two types are mostly a job for linters, security smells are patterns that lead to vulnerabilities in the infrastructure. Hard-coded secrets and empty passwords for accounts are classic examples.
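
To make the idea of a security smell concrete, here is a minimal sketch of a pattern-based detector for the two classic examples just mentioned. The two regular expressions and the function name `find_security_smells` are hypothetical and deliberately simplistic; real tools ship far larger rule sets and understand the structure of the file instead of matching line by line.

```python
import re

# Hypothetical patterns for illustration only.
SMELL_PATTERNS = {
    "hard-coded secret": re.compile(
        r"(password|passwd|secret|token|api_key)\s*[:=]\s*['\"]?[^\s'\"]+",
        re.IGNORECASE,
    ),
    "empty password": re.compile(
        r"(password|passwd)\s*[:=]\s*(''|\"\")?\s*$",
        re.IGNORECASE,
    ),
}


def find_security_smells(script: str) -> list[tuple[int, str]]:
    """Return (line number, smell name) pairs for every matching line."""
    findings = []
    for number, line in enumerate(script.splitlines(), start=1):
        for smell, pattern in SMELL_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, smell))
    return findings


# Made-up sample configuration.
sample = "\n".join([
    "user: admin",
    'password: "s3cr3t"',
    'db_password: ""',
    "port: 5432",
])

print(find_security_smells(sample))
# [(2, 'hard-coded secret'), (3, 'empty password')]
```

A sketch like this produces false positives easily: for instance, a value that is a template reference to a vault would also be reported as a hard-coded secret. That is one reason why dedicated tools are preferable to home-grown regular expressions.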

The search for "smells" is one of the most popular and well-studied topics in the field of IaC script analysis in the academic community. Researchers note that the latter type is quite common: in some projects, up to 32.9% of IaC scripts may contain at least one of them.

Code smells can be specific to certain frameworks (e.g., Puppet or Chef) or common to all IaC tools. Examples of framework-specific tools are puppet-lint and Foodcritic, for Puppet and Chef respectively. There are also tools such as IaCSec and GLITCH, which are aimed at finding flaws in multiple languages.

The wave of popularity of machine learning, and LLMs in particular, has not bypassed the topic we are discussing today. For example, the DeepIaC tool uses deep learning (convolutional neural networks) to detect anti-patterns in Ansible scripts. According to its creators, DeepIaC achieved an accuracy ranging from 79% to 92% when tested on a sample of more than 18,000 scripts extracted from GitHub repositories. Code and process metrics, which serve as features for machine learning algorithms, are also used to detect defects. They help create a "profile" of the code that can be used to predict the presence of defects. Methods such as Bag of Words (BOW), TF-IDF, and Random Forest can be encountered here. Their discussion is far beyond the scope of our article, but if you are interested, details are available in the article.
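
For readers who have not met these terms, here is a rough sketch of how a script is turned into Bag of Words counts and TF-IDF weights using only the Python standard library. It is not the feature extraction used in the cited research: the tokenizer, the unsmoothed `log(N / df)` formula, and the sample tasks are assumptions for illustration, and the classification step (for example, Random Forest) is omitted because it would require a third-party library.

```python
import math
import re
from collections import Counter


def tokenize(script: str) -> list[str]:
    """Split a script into lowercase word-like tokens."""
    return re.findall(r"[a-z_]+", script.lower())


def tf_idf(scripts: list[str]) -> list[dict[str, float]]:
    """Return one {token: weight} vector per script."""
    bags = [Counter(tokenize(s)) for s in scripts]  # bag of words per script
    # Number of scripts in which each token occurs at least once.
    doc_freq = Counter(token for bag in bags for token in bag)
    n = len(scripts)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            token: (count / total) * math.log(n / doc_freq[token])
            for token, count in bag.items()
        })
    return vectors


# Made-up one-line "scripts" used as sample data.
scripts = [
    "apt: name=nginx state=present",
    "apt: name=nginx state=latest",
    "user: name=admin password=admin",
]
vectors = tf_idf(scripts)
print(vectors[2]["name"])      # 0.0: "name" occurs in every script
print(vectors[2]["password"])  # positive: "password" is specific to this script
```

Tokens that appear everywhere get a weight of zero, while tokens that are characteristic of one script stand out. Vectors of this kind are what a classifier is then trained on.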

Another interesting tool is SecGuru, which is designed to analyze security policies in Azure cloud infrastructures. It is based on model checking, a formal verification method that automatically checks whether a system complies with given properties. SecGuru applies this technology to access control lists (ACLs).

Unfortunately, despite the rigorous theoretical grounding of these tools, they have not gone beyond scientific articles and are rarely found in industrial use. What tools do large companies use?

KICS for IaC security

The most popular tools are KICS from Checkmarx, Trivy from Aqua Security, and Checkov from Bridgecrew. They work on similar principles, but KICS seemed the most useful to us. It uses a library of rules written in Rego, and rules can be added and modified. Moreover, this library turned out to be the most complete: it covers the main best practices recommended for Docker, Kubernetes, and other platforms. In addition, the tool is actively supported by the community (unlike the academic projects mentioned earlier).

According to the diagram in Figure 1, KICS can be used in three designated places:

  • IDE Plugin: KICS provides a plugin for VS Code.

  • Pre-commit hooks: it can be easily included in existing checks.

  • CI/CD: the tool offers a long list of integrations (Azure Pipelines, Bamboo, Bitbucket Pipelines, CircleCI, GitHub Actions, GitLab CI, Jenkins, TeamCity, Travis, AWS CodeBuild), so you will most likely find the one used in your company.

A local KICS run is performed with a simple command:

docker run -t -v {path_to_scan}:/path checkmarx/kics scan -p /path

As a small test example, let's write a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY . .

RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8000

CMD ["python", "app.py"]

And this is what the results will look like:

Scanning with Keeping Infrastructure as Code Secure v2.1.3

Preparing Scan Assets: Done                                                     
Executing queries: [---------------------------------------------------] 100.00%

Healthcheck Instruction Missing, Severity: LOW, Results: 1
Description: Ensure that HEALTHCHECK is being used. The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working
Platform: Dockerfile
CWE: 710
Learn more about this vulnerability: https://docs.kics.io/latest/queries/dockerfile-queries/b03a748a-542d-44f4-bb86-9199ab4fd2d5

	[1]: ../../path/Dockerfile:1

		001: FROM python:3.11-slim
		002: 
		003: WORKDIR /app

Missing User Instruction, Severity: HIGH, Results: 1
Description: A user should be specified in the dockerfile, otherwise the image will run as root
Platform: Dockerfile
CWE: 250
Learn more about this vulnerability: https://docs.kics.io/latest/queries/dockerfile-queries/fd54f200-402c-4333-a5a4-36ef6709af2f

	[1]: ../../path/Dockerfile:1

		001: FROM python:3.11-slim
		002: 
		003: WORKDIR /app

Results Summary:
CRITICAL: 0
HIGH: 1
MEDIUM: 0
LOW: 1
INFO: 0
TOTAL: 2

As we can see, it points to two obvious shortcomings of the Dockerfile: there is no health check (HEALTHCHECK) to confirm that the container is still working, and no non-root user is set (USER), so the container will run as root.
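
To show what these two findings amount to, here is a toy re-implementation of both checks in Python, together with one possible hardened version of the Dockerfile. This is only a sketch: KICS itself evaluates Rego queries, not this code. The user name `appuser` and the health check command, which assumes the application answers HTTP requests on port 8000 at `/`, are assumptions, and the hardened file has not been run through KICS here.

```python
def dockerfile_instructions(text: str) -> list[str]:
    """Return the upper-cased first word of each non-empty, non-comment line.

    Simplification: continuation lines (after a trailing backslash) are not
    joined, so their first word is also treated as an instruction.
    """
    keywords = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            keywords.append(stripped.split()[0].upper())
    return keywords


def toy_checks(text: str) -> list[str]:
    """Report the two findings from the scan above if they apply."""
    present = set(dockerfile_instructions(text))
    findings = []
    if "HEALTHCHECK" not in present:
        findings.append("Healthcheck Instruction Missing")
    if "USER" not in present:
        findings.append("Missing User Instruction")
    return findings


original = """FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8000
CMD ["python", "app.py"]
"""

# One possible fix; the user name and the probed URL are assumptions.
hardened = """FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
RUN useradd --create-home appuser
USER appuser
EXPOSE 8000
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/')"
CMD ["python", "app.py"]
"""

print(toy_checks(original))
# ['Healthcheck Instruction Missing', 'Missing User Instruction']
print(toy_checks(hardened))  # []
```

Note that the `USER` instruction is placed after `pip install`, so packages are still installed as root at build time, while the application itself runs as the unprivileged user.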

Conclusion

According to the statistics collected by Snyk in 2021, 40% of companies do not use IaC testing in pipelines. At the same time, 32% of organizations believe that this practice can significantly improve the security of the company's infrastructure. Hence, the obvious recommendation is to use the advantages of IaC in security, where linters and static code analysis tools play an important role. They allow you to automatically identify potential security issues, non-compliance with best practices, and configuration errors at the development stage. According to research, the use of linters and static analysis can reduce the number of vulnerabilities in IaC by 70-80%.

The use of tools like KICS in combination with other IaC security practices can significantly increase the level of infrastructure security and reduce the risks associated with misconfiguration.
