Querying AWS Athena Using Natural Language with LangChain and ChatGPT

Terris Linenbach
Jan 8, 2024

Let’s use GPT-4 programmatically to query, in natural language, virtually any data that AWS Athena can access (CSV files, SQL databases, CloudWatch Logs, …). This solution uses LangChain and Jupyter; there is also a more verbose LlamaIndex version.

Your questions don’t need to mention table or column names; GPT-4 does its best to translate them into SQL statements. For example, if your data is about sales, you can ask “which customers most recently made a purchase exceeding $2 million.” Perhaps you saw a similar demo (using Amazon Q) at re:Invent 2023, except that it used a Bedrock model (probably Titan).
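To make the idea concrete, here is a sketch of the kind of SQL a model might produce for that question. The customers and orders tables, their columns, and the rows are all hypothetical. The query is hand-written for illustration and is not actual GPT-4 output. It runs against an in-memory SQLite database so it works without AWS.

```python
import sqlite3

# Hypothetical sales schema and data; the article does not describe real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount REAL,
    order_date TEXT
);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO orders VALUES
    (10, 1, 2500000, '2023-11-02'),
    (11, 2, 1800000, '2023-12-15'),
    (12, 3, 3100000, '2023-12-20'),
    (13, 1, 2200000, '2023-12-01');
""")

# Illustrative translation of:
# "which customers most recently made a purchase exceeding $2 million"
# The question never names a table or column; the SQL has to map
# "customers" and "purchase" onto the schema.
generated_sql = """
SELECT c.name, MAX(o.order_date) AS last_big_purchase
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
WHERE o.amount > 2000000
GROUP BY c.name
ORDER BY last_big_purchase DESC
"""

rows = conn.execute(generated_sql).fetchall()
for name, last_date in rows:
    print(name, last_date)
```

Globex is excluded because its only order is below $2 million. Against Athena, whose engine is Trino/Presto-based, the generated SQL may differ in dialect details.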

First, install Python 3 and Jupyter.

Second, install dependencies and specify your OpenAI API key:

python -m ensurepip
pip install --upgrade pip setuptools
pip install langchain langchain-openai sqlalchemy PyAthena

export OPENAI_API_KEY=YOUR-KEY-HERE
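You can confirm from inside the notebook that these steps took effect. Note that export only affects processes started from that shell, so Jupyter has to be launched from the same terminal, or the key has to be set some other way. The helper name missing_prerequisites is hypothetical and not part of any library.

```python
import os
from collections.abc import Mapping
from importlib.util import find_spec
from typing import Optional

# Import names of the packages installed above (PyAthena imports as "pyathena").
REQUIRED_MODULES = ["langchain", "langchain_openai", "sqlalchemy", "pyathena"]


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> list:
    """Return human-readable problems; an empty list means we're ready to go."""
    env = os.environ if env is None else env
    problems = [
        f"module {name!r} is not importable"
        for name in REQUIRED_MODULES
        if find_spec(name) is None
    ]
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in this kernel's environment")
    return problems


print(missing_prerequisites() or "All prerequisites found.")
```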

Finally, modify the notebook and run it.
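The notebook itself isn’t reproduced here. Below is a rough sketch of one way to wire the pieces together. It is not the author’s code and rests on several assumptions:

- It uses PyAthena’s SQLAlchemy dialect (the awsathena+rest:// URL scheme).
- It uses LangChain’s SQLDatabase wrapper and create_sql_query_chain helper as shipped in the early-2024 langchain/langchain-community releases.
- athena_uri and ask are hypothetical helper names.
- The region, schema, and S3 staging directory are placeholders you would change for your account.

The URL leaves the credentials empty, so boto3’s default credential chain (environment variables, ~/.aws, or an instance role) supplies them.

```python
from urllib.parse import quote_plus


def athena_uri(region: str, schema: str, s3_staging_dir: str) -> str:
    """Build a PyAthena SQLAlchemy URL that relies on boto3's default credentials."""
    # The staging dir (where Athena writes query results) must be URL-encoded.
    return (
        f"awsathena+rest://:@athena.{region}.amazonaws.com:443/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )


def ask(question: str, uri: str) -> str:
    """Have GPT-4 write SQL for `question`, run it on Athena, return the rows as text."""
    # Imported lazily so athena_uri() works even without these packages installed.
    from langchain.chains import create_sql_query_chain
    from langchain_community.utilities import SQLDatabase
    from langchain_openai import ChatOpenAI

    db = SQLDatabase.from_uri(uri)  # reflects table schemas to put in the prompt
    llm = ChatOpenAI(model="gpt-4", temperature=0)  # reads OPENAI_API_KEY
    sql = create_sql_query_chain(llm, db).invoke({"question": question})
    return db.run(sql)  # executes the generated SQL as-is
```

A call might look like ask("which customers most recently made a purchase exceeding $2 million", athena_uri("us-east-1", "my_database", "s3://my-bucket/athena-results/")).

Because ask() executes whatever SQL the model writes, read-only AWS credentials are a sensible precaution. A LangChain SQL agent (create_sql_agent) is an alternative that can retry after a failed query, at the cost of more LLM calls.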

Ignore the following warning, which appears because PyAthena remains compatible with SQLAlchemy 1.x:

SADeprecationWarning: The dbapi() classmethod on
dialect classes has been renamed to import_dbapi().
…
backwards compatibility.
