Tagging Data in LLM Prompts for More Reliable Output
What’s the best way for a prompt to operate on runtime data? Should we use string concatenation like we did for SQL before we knew better?
LangChain, for example, provides templates that work with many types of models. If you use native APIs instead of an abstraction layer like LangChain, you have to learn the Best Way for every model. But even if you do use LangChain, you still have the problem of getting the team to use it consistently, since LLMs are so adept at handling uncertainty that sloppy prompts rarely fail outright.
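For concreteness, here is a minimal sketch of that idea. It assumes a recent LangChain where PromptTemplate lives in langchain_core.prompts, the langchain-openai package, an API key in the environment, and an illustrative model name; none of these specifics come from the post itself:

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# One template, declared once, with named variables.
prompt = PromptTemplate.from_template("Summarize in one line: {text}")

# The template composes with any LangChain-compatible chat model;
# swapping providers doesn't change the prompt plumbing.
chain = prompt | ChatOpenAI(model="gpt-4o-mini")
# from langchain_anthropic import ChatAnthropic
# chain = prompt | ChatAnthropic(model="claude-3-5-sonnet-latest")  # same template

reply = chain.invoke({"text": "LangChain templates decouple prompts from models."})
print(reply.content)
```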
Wishy-washy instructions result in more randomness even with identical inputs, temperatures, top-p values, and random seeds.
Let’s ask ChatGPT
The conversation with GPT below confirms my belief that variables, and references to them, should be explicit even when testing shows that it's unnecessary. As with code, readability and fewer surprises usually trump everything else.
Let's say a prompt says: "List the top cars in order of price and sales {cars} {sales}" and I provide a dictionary containing cars and sales keys. Does the LLM know the difference between cars and sales automatically, or should I add that information to the prompt explicitly?
In LangChain, when you create a prompt template and provide variables, the template will replace placeholders like {cars} and {sales} with the corresponding values from the dictionary you provide. However, the LLM does not inherently know the names of the variables unless they are explicitly mentioned in the prompt itself. The LLM only sees the final, rendered text after the placeholders have been filled in with their corresponding values.
Here’s a step-by-step breakdown of what happens:
1. Template Creation: You define a prompt template, e.g., "List the top cars in order of price and sales {cars} {sales}".
2. Providing Variables: You provide a dictionary with the variables, e.g., {"cars": ["Car A", "Car B"], "sales": [100, 200]}.
3. Rendering the Prompt: LangChain replaces {cars} and {sales} in the template with the values from the dictionary. The rendered prompt might look like: "List the top cars in order of price and sales Car A, Car B 100, 200" (see the sketch after this list).
4. LLM Execution: The LLM receives the rendered prompt as text. It does not know that "Car A, Car B" came from a variable named cars or that "100, 200" came from a variable named sales. The LLM just sees the plain text and processes it accordingly.
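To make step 4 concrete, here is a small sketch of the rendering, again assuming LangChain's PromptTemplate from langchain_core, with the dictionary values pre-joined into strings for readability:

```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "List the top cars in order of price and sales {cars} {sales}"
)

# PromptTemplate is a Runnable, so it accepts the variable dictionary
# directly and returns a prompt value we can render to plain text.
rendered = prompt.invoke({"cars": "Car A, Car B", "sales": "100, 200"})
print(rendered.to_string())
# List the top cars in order of price and sales Car A, Car B 100, 200

# Note: the names "cars" and "sales" appear nowhere in the rendered
# text; they vanish at formatting time, before the model sees anything.
```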
How to Ensure Clarity in the Prompt
If you need the LLM to distinguish between cars and sales, you should make sure the prompt is clear and unambiguous in the rendered form. For example, you could rewrite the template like this:
List the top cars in order of price and sales.
Cars: {cars}
Sales: {sales}
When filled, the prompt might look like:
List the top cars in order of price and sales.
Cars: Car A, Car B
Sales: 100, 200
This way, the LLM can clearly distinguish between the car names and sales numbers.
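A matching sketch of the labeled template, under the same PromptTemplate assumption as above, shows the variable names surviving into the text the model actually sees:

```python
from langchain_core.prompts import PromptTemplate

# Labeling each variable in the template preserves the distinction
# between cars and sales after rendering.
prompt = PromptTemplate.from_template(
    "List the top cars in order of price and sales.\n"
    "Cars: {cars}\n"
    "Sales: {sales}"
)

print(prompt.format(cars="Car A, Car B", sales="100, 200"))
# List the top cars in order of price and sales.
# Cars: Car A, Car B
# Sales: 100, 200
```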