Querying AWS Athena Using Natural Language with LangChain and ChatGPT

Terris Linenbach
1 min read · Jan 8, 2024


Let’s use GPT-4 programmatically to query virtually any data that is accessible to AWS Athena (CSV files, SQL databases, CloudWatch Logs, …) using natural language. This solution uses LangChain and Jupyter. There is also a more verbose LlamaIndex version.

Your questions don’t need to refer to table or column names. GPT-4 does its best to generate the SQL statements. For example, if your data is about sales, you can ask, “Which customers most recently made a purchase exceeding $2 million?” You may have seen a similar demo (using Amazon Q) at re:Invent 2023, except that it used a Bedrock model (probably Titan).

First, install Python 3 and Jupyter.

Second, install dependencies and specify your OpenAI API key:

python -m ensurepip
pip install --upgrade pip setuptools
pip install langchain langchain-openai sqlalchemy PyAthena

export OPENAI_API_KEY=YOUR-KEY-HERE
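A variable exported in one shell is only visible to processes started from that shell, so a Jupyter kernel launched elsewhere will not see the key. The helper below is a small sketch for checking that from the first notebook cell; `require_env` is a hypothetical name, not part of LangChain or the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with a clear message instead of a later authentication error.
        raise RuntimeError(f"{name} is not set in this kernel's environment")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before anything else fails fast if Jupyter was started from a shell that never ran the `export` line.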

Finally, modify the notebook and run it.
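The notebook itself is not reproduced here, so the following is only a rough sketch of what such a cell might look like, not the author's code. It assumes the PyAthena SQLAlchemy URL format, AWS credentials supplied by the default boto3 chain, and the LangChain import paths from the 0.1-era releases (they may have moved in later versions). The region, schema, and bucket values are placeholders, and `athena_url` and `ask` are hypothetical helper names.

```python
from urllib.parse import quote_plus


def athena_url(region: str, schema: str, s3_staging_dir: str, work_group: str = "primary") -> str:
    """Build a PyAthena SQLAlchemy URL with empty credentials (default AWS chain is used)."""
    return (
        f"awsathena+rest://:@athena.{region}.amazonaws.com:443/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
        f"&work_group={quote_plus(work_group)}"
    )


def ask(question: str, url: str):
    """Have GPT-4 write SQL for the question, then run that SQL against Athena."""
    # Third-party imports stay local so athena_url() works without them installed.
    from langchain.chains import create_sql_query_chain
    from langchain_community.utilities import SQLDatabase
    from langchain_openai import ChatOpenAI
    from sqlalchemy import create_engine

    db = SQLDatabase(create_engine(url))            # reflects the table schemas
    llm = ChatOpenAI(model="gpt-4", temperature=0)  # reads OPENAI_API_KEY
    sql = create_sql_query_chain(llm, db).invoke({"question": question})
    return db.run(sql)


# Placeholder values; replace with your own region, database, and results bucket.
url = athena_url("us-east-1", "sales", "s3://my-bucket/athena-results/")
```

With real values in place, `ask("Which customers most recently made a purchase exceeding $2 million?", url)` would send the question to GPT-4 and execute whatever SQL comes back. It is worth printing the generated SQL before running it, since the model's output is not guaranteed to be valid Athena syntax.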

You can ignore the following warning, which results from PyAthena remaining compatible with SQLAlchemy 1.x:

SADeprecationWarning: The dbapi() classmethod on
dialect classes has been renamed to import_dbapi().
backwards compatibility.
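If the warning clutters the notebook output, one option is to filter it by message text with Python's standard `warnings` module. This is a sketch rather than something the original notebook does, and `silence_dbapi_rename_warning` is a hypothetical helper name. Matching on the message keeps other deprecation warnings visible.

```python
import warnings


def silence_dbapi_rename_warning() -> None:
    """Hide only the dbapi() -> import_dbapi() rename warning."""
    warnings.filterwarnings(
        "ignore",
        # The pattern is matched against the start of the warning message.
        message=r"The dbapi\(\) classmethod on dialect classes has been renamed",
    )


# Call once, before the SQLAlchemy engine is created.
silence_dbapi_rename_warning()
```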


Written by Terris Linenbach

He/him. Coder since 1980. Always seeking the Best Way. CV: https://terris.com/cv
