LLM Instructions Best Practices
Our goal is to have a set of LLM instructions and example context that allow for the generation of semantically and logically correct Terraform configurations for the Octopus Terraform provider.
This page outlines the best practices for working with LLMs.
To understand how we improve the output of LLMs, we must understand what we control. There are multiple ways to influence the output generated by an LLM, including:
- Examples - specifically, example Terraform configurations based on the Octopus Terraform provider
- Free text embedded in examples, such as description or script fields
- Custom system prompts
- LLM fine-tuning - we don't do this yet, but may consider it once the Terraform provider reaches version 1.0
- LLM selection - we stick with one LLM for now (GPT-4.1), but have the option of selecting others
All examples provided to the LLM are generated from Octopus resources in a sample space exported to Terraform using Octoterra. The exported Terraform is then passed to the LLM in the context of a prompt.
There are two types of sample Octopus resources:
- Opinionated and functional examples
- Non-functional, but semantically valid, examples that exist only to provide variety in the generated Terraform
Both types of sample Octopus resources are passed as context in the prompt to generate the desired output. The opinionated examples take precedence and are selected based on the prompt. The non-functional examples are passed to the LLM with every prompt to effectively document the Octopus Terraform provider.
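The selection logic described above can be sketched as follows. This is a minimal illustration, not the real tooling: the keyword matching, function name, and example contents are all hypothetical stand-ins.

```python
# Hypothetical sketch of assembling example context for a prompt:
# one opinionated example selected by the prompt, plus every
# non-functional example appended to document the provider.

NON_FUNCTIONAL_EXAMPLES = [
    "# non-functional example documenting octopusdeploy_project attributes",
    "# non-functional example documenting octopusdeploy_channel attributes",
]

OPINIONATED_EXAMPLES = {
    "kubernetes": "# opinionated Kubernetes project (exported by Octoterra)",
    "azure web app": "# opinionated Azure Web App project (exported by Octoterra)",
}

def build_context(prompt: str) -> list[str]:
    """Select the opinionated example matching the prompt, then append
    every non-functional example so the provider is always documented."""
    context = []
    lowered = prompt.lower()
    for keyword, example in OPINIONATED_EXAMPLES.items():
        if keyword in lowered:
            context.append(example)  # opinionated example takes precedence
            break
    context.extend(NON_FUNCTIONAL_EXAMPLES)  # included with every prompt
    return context
```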
We do not manually modify the exported Terraform; if the generated Terraform needs to be customised, Octoterra itself is updated. This is because we must be able to quickly re-export the sample Octopus space at any time, and manual modifications to the exported Terraform would be lost on re-export.
There are many free text fields available in Octopus resources, notably:
- Description or notes fields
- Script text
We can use these fields to embed additional instructions for the LLM. We are, in effect, "talking to" the LLM via the free text fields.
This is most useful when combined with the non-functional example resources.
For example, a step may be added to a non-functional example project to demonstrate an example in the property bag. The step's description can describe the resulting key/value pair in the property bag. The free text description is included in the generated Terraform configuration, providing additional context for the LLM.
Important
This approach is not suitable for the opinionated examples, as we do not want end users to see low-level instructions embedded in description fields.
System prompts are specifically formatted instructions passed to the LLM alongside the sample Terraform configurations.
We use the strategies documented here as a base.
System prompts are used to provide high-level descriptions of Terraform configurations and to penalize undesirable behaviors.
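As a sketch, the system prompt, example configurations, and user prompt might be assembled into an OpenAI-style chat request like this. The prompt wording and function name below are illustrative only; the real prompts follow the strategies referenced above.

```python
# Hypothetical sketch of assembling chat messages: system prompt first,
# then each example configuration, then the user's prompt.

SYSTEM_PROMPT = (
    "You generate Terraform for the Octopus Terraform provider. "
    "You will be penalized for inventing resource attributes."
)

def build_messages(examples: list[str], user_prompt: str) -> list[dict]:
    """Place the system prompt first, the example Terraform
    configurations next, and the user's prompt last."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example in examples:
        messages.append(
            {"role": "system", "content": "Example Terraform:\n" + example}
        )
    messages.append({"role": "user", "content": user_prompt})
    return messages
```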
We have three approaches to testing:
- Testing a baseline prompt that recreates the opinionated example
- Testing known variations of the opinionated example
- Generating random prompts with an LLM
This involves testing the output of a simple prompt like:
Create a Kubernetes project called "Web App"
This must recreate the example project verbatim.
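The baseline check can be as simple as a golden-file comparison. The sketch below assumes whitespace-only differences should not fail the test; the helper names are hypothetical.

```python
# Hypothetical sketch of the baseline test: the generated Terraform
# must match the opinionated example, ignoring trailing whitespace.

def normalise(hcl: str) -> str:
    # Strip leading/trailing blank space and per-line trailing
    # whitespace so only meaningful differences fail the comparison.
    return "\n".join(line.rstrip() for line in hcl.strip().splitlines())

def matches_golden(generated: str, golden: str) -> bool:
    """Return True when the generated configuration recreates the
    opinionated example verbatim (modulo trailing whitespace)."""
    return normalise(generated) == normalise(golden)
```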
We support common variations on the baseline prompt such as:
- Adding tenants
- Using different accounts (OIDC or password-based)
- Altering environment names
- Adding steps like smoke testing or Slack notifications
- Changing the project group
- Using a different feed
- Adding a runbook called "whatever" with a step that "does something"
- Adding variables
- Modifying scheduled project triggers
- Adding channels (e.g., a hotfix channel)
Tip
The non-functional examples allow the LLM to implement variations on the opinionated example.
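The variation tests can be driven from a single list of suffixes appended to the baseline prompt. The wording of each variation below is illustrative; only the categories come from the list above.

```python
# Hypothetical sketch of deriving variation prompts from the baseline.

BASELINE = 'Create a Kubernetes project called "Web App"'

VARIATIONS = [
    'Add a tenant called "Acme".',
    'Use an OIDC account.',
    'Rename the environments to "Test" and "Live".',
    'Add a Slack notification step.',
    'Add a runbook called "Restart" with a step that restarts the app.',
]

def variation_prompts() -> list[str]:
    """Combine the baseline prompt with each variation suffix."""
    return [f"{BASELINE}. {v}" for v in VARIATIONS]
```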
We can use LLMs to generate prompts that we then use to test our own implementation.
The purpose of these prompts is to validate that our tooling generates semantically valid Terraform configuration (i.e., Terraform configuration that can be successfully applied). It is OK if the generated prompts are "word salad" because we want to stress the LLM and see how it responds to random instructions.
Important
It is not expected that the sample prompts can generate logically valid Terraform configuration. The prompts will often ask for things that would be impossible for an Octopus expert to solve.
This is a prompt that can generate sample prompts:
Given these prompts:
'Create a Kubernetes project called "My K8s Project 2". Add a step to deploy a K8s deployment. Add a step to deploy a Kubernetes ClusterIP service. Use the "Security" lifecycle. Add a step to deploy a Kubernetes ingress resource. Enable variable debugging.'
'Create an Azure Web App project called "My Azure Project 1"'
Write 5 more prompts including a mix of:
feeds
accounts
steps
platforms like helm, kustomize, aws lambdas, azure web apps, azure functions, cloud formation, arm templates, bicep templates
lifecycles
project groups
project names
step run conditions
rollout strategies like blue/green and canary
variables
environments
target tags
packages
cloud providers
retention policies
inline scripts
script from packages
tenanted, untenanted, or tenanted and untenanted deployments
tenants
variable scopes
tenant tags
Do not reuse project names from previous responses.
Include a random number suffix on the project name from 1 to 10000.
Print each example in an individual markdown code block.
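To check that the output of these random prompts is semantically valid, the generated configuration can be written to a temporary directory and checked with the real `terraform init -backend=false` and `terraform validate` commands. The sketch below assumes the Terraform CLI is on `PATH`; the function name is a hypothetical stand-in, and the subprocess runner is injectable so the logic can be exercised without Terraform installed.

```python
import subprocess
import tempfile
from pathlib import Path

def terraform_validates(hcl: str, run=subprocess.run) -> bool:
    """Write the generated configuration to main.tf, then run
    `terraform init -backend=false` followed by `terraform validate`.
    Returns True only when both commands succeed."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "main.tf").write_text(hcl)
        for args in (
            ["terraform", "init", "-backend=false"],
            ["terraform", "validate"],
        ):
            result = run(args, cwd=workdir, capture_output=True, text=True)
            if result.returncode != 0:
                return False
    return True
```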