LLM Instructions Best Practices
Our goal is to have a set of LLM instructions and example context that allow for the generation of semantically and logically correct Terraform configurations for the Octopus Terraform provider.
This page outlines the best practices for working with LLMs.
To understand how we improve the output of LLMs, we must understand what we control. There are multiple ways to influence the output generated by an LLM, including:
- Examples - specifically, example Terraform configurations based on the Octopus Terraform provider
- Free text embedded in examples, such as description or script fields
- Custom system prompts
- LLM fine-tuning - we don't do this yet, but may consider it once the Terraform provider reaches version 1.0
- LLM selection - we stick with one LLM for now (GPT-4.1), but have the option of selecting others
All examples provided to the LLM are generated from Octopus resources in a sample space exported to Terraform using Octoterra. The exported Terraform is then passed to the LLM in the context of a prompt.
There are two types of sample Octopus resources:
- Opinionated and functional examples
- Non-functional, but semantically valid, examples that exist only to provide variety in the generated Terraform
Both types of sample Octopus resources are passed as context in the prompt to generate the desired output. The opinionated examples take precedence and are selected based on the prompt. The non-functional examples are passed to the LLM with every prompt to effectively document the Octopus Terraform provider.
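The selection logic described above can be sketched as follows. This is a minimal illustration, not the real tooling: the keyword matching, function name, and example contents are all hypothetical stand-ins.

```python
# Hypothetical sketch of assembling example context for a prompt:
# one opinionated example selected by the prompt, plus every
# non-functional example appended to document the provider.

NON_FUNCTIONAL_EXAMPLES = [
    "# non-functional example documenting octopusdeploy_project attributes",
    "# non-functional example documenting octopusdeploy_channel attributes",
]

OPINIONATED_EXAMPLES = {
    "kubernetes": "# opinionated Kubernetes project (exported by Octoterra)",
    "azure web app": "# opinionated Azure Web App project (exported by Octoterra)",
}

def build_context(prompt: str) -> list[str]:
    """Select the opinionated example matching the prompt, then append
    every non-functional example so the provider is always documented."""
    context = []
    lowered = prompt.lower()
    for keyword, example in OPINIONATED_EXAMPLES.items():
        if keyword in lowered:
            context.append(example)  # opinionated example takes precedence
            break
    context.extend(NON_FUNCTIONAL_EXAMPLES)  # included with every prompt
    return context
```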
We do not manually modify the exported Terraform; if the generated Terraform needs to be customised, Octoterra itself is updated. This is because we must be able to quickly re-export the sample Octopus space at any time, and manual modifications to the exported Terraform would be lost on re-export.
There are many free text fields available in Octopus resources, notably:
- Description or notes fields
- Script text
We can use these fields to embed additional instructions for the LLM. We are, in effect, "talking to" the LLM via the free text fields.
This is most useful when combined with the non-functional example resources.
For example, a step may be added to a non-functional example project to demonstrate an example in the property bag. The step's description can describe the resulting key/value pair in the property bag. The free text description is included in the generated Terraform configuration, providing additional context for the LLM.
Important
This approach is not suitable for the opinionated examples, as we do not want end users to see low-level instructions embedded in description fields.
System prompts are specifically formatted instructions passed to the LLM alongside the sample Terraform configurations.
We use the strategies documented here as a base.
System prompts are used to provide high-level descriptions of Terraform configurations and to penalize undesirable behaviors.
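As a sketch, the system prompt, example configurations, and user prompt might be assembled into an OpenAI-style chat request like this. The prompt wording and function name below are illustrative only; the real prompts follow the strategies referenced above.

```python
# Hypothetical sketch of assembling chat messages: system prompt first,
# then each example configuration, then the user's prompt.

SYSTEM_PROMPT = (
    "You generate Terraform for the Octopus Terraform provider. "
    "You will be penalized for inventing resource attributes."
)

def build_messages(examples: list[str], user_prompt: str) -> list[dict]:
    """Place the system prompt first, the example Terraform
    configurations next, and the user's prompt last."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example in examples:
        messages.append(
            {"role": "system", "content": "Example Terraform:\n" + example}
        )
    messages.append({"role": "user", "content": user_prompt})
    return messages
```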
We have three approaches to testing:
- Testing a baseline prompt that recreates the opinionated example
- Testing known variations of the opinionated example
- Generating random prompts with an LLM
This involves testing the output of a simple prompt like:
Create a Kubernetes project called "Web App"
This must recreate the example project verbatim.
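The baseline check can be as simple as a golden-file comparison. The sketch below assumes whitespace-only differences should not fail the test; the helper names are hypothetical.

```python
# Hypothetical sketch of the baseline test: the generated Terraform
# must match the opinionated example, ignoring trailing whitespace.

def normalise(hcl: str) -> str:
    # Strip leading/trailing blank space and per-line trailing
    # whitespace so only meaningful differences fail the comparison.
    return "\n".join(line.rstrip() for line in hcl.strip().splitlines())

def matches_golden(generated: str, golden: str) -> bool:
    """Return True when the generated configuration recreates the
    opinionated example verbatim (modulo trailing whitespace)."""
    return normalise(generated) == normalise(golden)
```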
We support common variations on the baseline prompt such as:
- Adding tenants
- Using different accounts (OIDC or password-based)
- Altering environment names
- Adding steps like smoke testing or Slack notifications
- Changing the project group
- Using a different feed
- Adding a runbook called "whatever" with a step that "does something"
- Adding variables
- Modifying scheduled project triggers
- Adding channels (e.g., a hotfix channel)
Tip
The non-functional examples allow the LLM to implement variations on the opinionated example.
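The variation tests can be driven from a single list of suffixes appended to the baseline prompt. The wording of each variation below is illustrative; only the categories come from the list above.

```python
# Hypothetical sketch of deriving variation prompts from the baseline.

BASELINE = 'Create a Kubernetes project called "Web App"'

VARIATIONS = [
    'Add a tenant called "Acme".',
    'Use an OIDC account.',
    'Rename the environments to "Test" and "Live".',
    'Add a Slack notification step.',
    'Add a runbook called "Restart" with a step that restarts the app.',
]

def variation_prompts() -> list[str]:
    """Combine the baseline prompt with each variation suffix."""
    return [f"{BASELINE}. {v}" for v in VARIATIONS]
```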
We can use LLMs to generate prompts that we then use to test our own implementation.
The purpose of these prompts is to validate that our tooling generates semantically valid Terraform configuration (i.e., Terraform configuration that can be successfully applied). It is OK if the generated prompts are "word salad" because we want to stress the LLM and see how it responds to random instructions.
Important
It is not expected that the sample prompts can generate logically valid Terraform configuration. The prompts will often ask for things that would be impossible for an Octopus expert to solve.
This is a prompt that can generate sample prompts:
Given these prompts:
'Create a Kubernetes project called "My K8s Project 2". Add a step to deploy a K8s deployment. Add a step to deploy a Kubernetes ClusterIP service. Use the "Security" lifecycle. Add a step to deploy a Kubernetes ingress resource. Enable variable debugging.'
'Create an Azure Web App project called "My Azure Project 1"'
Write 5 more prompts including a mix of:
feeds
accounts
steps
platforms like helm, kustomize, aws lambdas, azure web apps, azure functions, cloud formation, arm templates, bicep templates
lifecycles
project groups
project names
step run conditions
rollout strategies like blue/green and canary
variables
environments
target tags
packages
cloud providers
retention policies
inline scripts
script from packages
tenanted, untenanted, or tenanted and untenanted deployments
tenants
variable scopes
tenant tags
Do not reuse project names from previous responses.
Include a random number suffix on the project name from 1 to 10000.
Print each example in an individual markdown code block.
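To check that the output of these random prompts is semantically valid, the generated configuration can be written to a temporary directory and checked with the real `terraform init -backend=false` and `terraform validate` commands. The sketch below assumes the Terraform CLI is on `PATH`; the function name is a hypothetical stand-in, and the subprocess runner is injectable so the logic can be exercised without Terraform installed.

```python
import subprocess
import tempfile
from pathlib import Path

def terraform_validates(hcl: str, run=subprocess.run) -> bool:
    """Write the generated configuration to main.tf, then run
    `terraform init -backend=false` followed by `terraform validate`.
    Returns True only when both commands succeed."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.tf").write_text(hcl)
        for args in (
            ["terraform", "init", "-backend=false"],
            ["terraform", "validate"],
        ):
            result = run(args, cwd=workdir, capture_output=True, text=True)
            if result.returncode != 0:
                return False
    return True
```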