Tagging Data in LLM Prompts for Repeatable Output

Terris Linenbach
2 min read · Sep 27, 2024

What’s the best way for a prompt to operate on runtime data? Should we use string concatenation like we did to inject data into SQL statements before we knew better?

LangChain, for example, provides templates that support many types of models. If you use native APIs instead of an abstraction layer like LangChain, you have to learn the Best Way for every model. But even if you use LangChain, you still need to get everyone on the team to use it consistently: LLMs are so adept at handling uncertainty that a sloppy prompt usually still produces plausible output, so inconsistencies are easy to miss.

Wishy-washy instructions result in more randomness, even with identical inputs, temperatures, top-p values, and random seeds.

Let’s ask ChatGPT

The conversation with ChatGPT below confirms my belief that variables, and references to them, should be explicit even when testing shows that this is unnecessary. As with code, readability and fewer surprises usually trump everything else.

Let’s say a prompt says: “List the top cars in order of price and sales {cars} {sales}” and I provide a dictionary containing cars and sales keys. Does the LLM know the difference between cars and sales automatically, or should I add that information to the prompt explicitly?

In LangChain, when you create a prompt template and provide variables, the template will replace placeholders like {cars} and {sales} with the corresponding values from the dictionary you provide. However, the LLM does not inherently know the names of the variables unless they are explicitly mentioned in the prompt itself. The LLM only sees the final, rendered text after the placeholders have been filled in with their corresponding values.

Here’s a step-by-step breakdown of what happens:

  1. Template Creation: You define a prompt template, e.g., "List the top cars in order of price and sales {cars} {sales}".
  2. Providing Variables: You provide a dictionary with the variables, e.g., {"cars": ["Car A", "Car B"], "sales": [100, 200]}.
  3. Rendering the Prompt: LangChain replaces {cars} and {sales} in the template with the values from the dictionary. For example, the rendered prompt might look like:
     List the top cars in order of price and sales Car A, Car B 100, 200
  4. LLM Execution: The LLM receives the rendered prompt as text. It does not know that "Car A, Car B" came from a variable named cars and "100, 200" came from a variable named sales. The LLM just sees the plain text and processes it accordingly.
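The steps above can be sketched in plain Python. This is a minimal stand-in, not LangChain itself: it assumes that `str.format` behaves like the template substitution described here, and joining each list with commas is an assumption chosen to reproduce the rendered text shown above.

```python
# Stand-in for a prompt template; uses str.format rather than LangChain itself.
template = "List the top cars in order of price and sales {cars} {sales}"

# Runtime data keyed by the placeholder names.
variables = {"cars": ["Car A", "Car B"], "sales": [100, 200]}

# Assumption: each list is joined with ", " before substitution.
# Passing the raw lists would render Python's list repr instead.
rendered = template.format(
    cars=", ".join(variables["cars"]),
    sales=", ".join(str(n) for n in variables["sales"]),
)

# This string is all the model receives. Nothing in it marks
# which values came from which variable.
print(rendered)
```

Printing the rendered string before sending it is a cheap way to see the prompt exactly as the model will.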

How to Ensure Clarity in the Prompt

If you need the LLM to distinguish between cars and sales, you should make sure the prompt is clear and unambiguous in the rendered form. For example, you could rewrite the template like this:

List the top cars in order of price and sales.
Cars: {cars}
Sales: {sales}

When filled, the prompt might look like:

List the top cars in order of price and sales.
Cars: Car A, Car B
Sales: 100, 200

This way, the LLM can clearly distinguish between the car names and sales numbers.
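One way to keep labels and data in sync is to derive the labels from the dictionary keys. The sketch below is only an illustration: `render_labeled` is a hypothetical helper, not part of LangChain or any other library, and the comma-joined formatting of the values is an assumption.

```python
def render_labeled(instruction: str, data: dict[str, list]) -> str:
    """Render an instruction followed by one labeled line per variable.

    Hypothetical helper for illustration; not a library API.
    """
    lines = [instruction]
    for name, values in data.items():
        # Label each line with the variable name so the fields stay distinguishable.
        joined = ", ".join(str(v) for v in values)
        lines.append(f"{name.capitalize()}: {joined}")
    return "\n".join(lines)


prompt = render_labeled(
    "List the top cars in order of price and sales.",
    {"cars": ["Car A", "Car B"], "sales": [100, 200]},
)
print(prompt)
```

Because the labels come from the keys, adding a variable to the dictionary also adds its label to the prompt, so the two are less likely to drift apart across a team.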

Written by Terris Linenbach

He/him. Coder since 1980. Always seeking the Best Way. CV: https://terris.com/cv