Using ChatGPT as a Data Transformation IDE
Using ChatGPT to transform a loosely structured spreadsheet into CSV-compatible data.
The Problem
Consider the following Excel spreadsheet containing tasks exported from a Monday.com workspace:
NFL
Quarterback Development
Name Subitems Person Status Date Priority Quick Notes Timeline - Start Timeline - End Item ID (auto generated)
Practice passing Joe Namath, Joe Montana Working on it 2023-12-30 High 2023-09-05 2023-10-28 5165798149
This spreadsheet is challenging to process because it contains semi-structured data. “NFL,” which is the name of the workspace, can be ignored. “Quarterback Development” is the name of a Kanban board. Any number of tasks may appear below it. The spreadsheet might contain multiple boards. Finally, the Person column contains a comma-separated list of names.
I would like to transform this spreadsheet into a CSV file that Atlassian’s Jira can import, such as:
Category,Summary,Assignee,Status,Start date,Due date
Quarterback Development,Practice passing,joe@jets.com,In Progress,2023-09-05,2023-10-28
Quarterback Development,Practice passing,joe@49ers.com,In Progress,2023-09-05,2023-10-28
Jira issues may have at most one assignee. Therefore, while the example spreadsheet contains one issue, the CSV file contains one row for each assignee.
A Computationally Expensive Experiment Which is Probably Mostly Bad For Earthlings on Multiple Levels
I could write a script that transforms this spreadsheet, but it’s tricky. My first instinct is to export the spreadsheet to a CSV file. However, since the Kanban board names appear on their own row, that is not possible, so I would need to transform the file. I don’t want to manually change the file — that’s boring! I could write a macro but I have bad memories of coding in Visual Basic in 1998 so let’s not go there.
I don’t seem to have any option but to find a library that can read Excel spreadsheets and start coding.
Is ChatGPT up to the task? Can I paste the spreadsheet as-is into ChatGPT and transform it?
Upon pasting the spreadsheet into ChatGPT, ChatGPT cheerfully summarizes the input. So far so good, but I need to tell ChatGPT to transform the data. After much trial and error, I arrive at the following prompt:
The next message you receive will contain tables.
The name of each table appears in the first row
above it. Output this name in a column named
Category.
Rename the "Timeline - End" column to "Due date".
Remove commas and double quotes from the Name
column.
Rename the "Timeline - Start" column to
"Start date".
The Person column may contain multiple names. Output
each name in its own row.
For the Person column, change the following values:
From "Joe Namath" to "joe@jets.com”
From "Joe Montana" to "joe@49ers.com".
Rename the Person column to Assignee.
For the Status column, change the following values:
From "Backlog" to "To Do"
From "Backlog - Tackle this Month" to "To Do"
From "Canceled" to "Done"
From "Working on it" to "In Progress"
From "Stuck" to "To Do".
Rename the Name column to Summary.
Output as a single table in CSV format.
If a value is blank, output a comma unless it’s the
last column.
Do not output the following columns:
Subitems, Priority,
Quick Notes, Date, Item ID.
Repeat.
The punctuation and whitespace is for my own sanity. ChatGPT ignores it.
The above spreadsheet is a simple example. The real spreadsheet contains over 100 rows, which is a problem because it doesn’t fit in ChatGPT’s context window. I can only paste in a few boards at a time. The last line in the prompt specifies “Repeat” to tell ChatGPT that each input message will contain data to be transformed instead of a prompt.
How the Sledgehammer Performed as a Fly Swatter
If GPT3 performed badly, I would have switched to GPT4 anyway, so I started with the better model. The only reason to use GPT3 is to avoid paying $20 per month for GPT4. Twenty bucks was a bargain for this mini-project.
Using ChatGPT to transform an Excel spreadsheet that is meant for human consumption into a custom CSV file worked surprisingly well, but it wasn’t perfect.
- Manually pasting text into ChatGPT multiple times is poor UX.
- ChatGPT doesn’t create a CSV file to my local disk.
- When a value contains a comma or a double quote, ChatGPT outputs a badly formatted CSV line. The rules for CSV files are tricky, and ChatGPT doesn’t know them yet.
- Finally, when both the start and end dates are blank, ChatGPT outputs a single comma instead of two, despite the instruction “If a value is blank, output a comma unless it’s the last column.” Sometimes ChatGPT gets confused.
ChatGPT gracefully handles a single input containing items for multiple boards. It notices when a new board name appears in the input, despite the fact that the data is structured weirdly. There is no header row containing a field name like “board name.” I didn’t give ChatGPT a heads up that this could happen. It just knew.
Where does this magic come from? Would Bard do the same? Neural networks are unpredictable.
Don’t assume anything is too hard for ChatGPT. Do yourself a favor and try the easy way first.
If you study this particular solution closely, you’ll see a fundamental flaw when it comes to “AI memory.” ChatGPT only works when the tasks for an entire board fit into ChatGPT’s context window. That’s because the board name, which is copied to the Category field in every row, only appears once in the input (pasted) message.
I’m sure I would have gotten better results by using advanced features like plugins or the confusingly named “code interpreter.” LangChain can handle all of the automation, although I would have to write code to split up the spreadsheet due to the board name. I didn’t have time for all that. The point of this exercise was to see if ChatGPT could do the job faster than I could.
ChatGPT and I accomplished a seemingly trivial goal faster than I could have done alone. I felt like I was working with “someone” the whole time. It sounds silly, but ChatGPT motivated me to stay on task.
As usual, the solution turned out to be more complicated because the problem was harder than it initially seemed.
ChatGPT is limited by the relatively tiny amount of data that it can process. It also makes unexpected mistakes. GPT4 isn’t better than a capable software engineer intern, but its needs are meager. Infinite context windows will either put software developers out of work or free them up to do their real job instead of moving items from Monday.com to Jira. I digress.
I fixed the CSV file by hand and imported it into Jira. All told, this project took about four hours, which is about the amount of time it would have taken to write code that actually worked. Of course, I might be overestimating my coding ability and luck. We will never know!
This experience was funner and more rewarding than coding. Unlike GitHub Copilot, ChatGPT executed my plain language instructions. ChatGPT didn’t just write code — it ran it.
I pasted a spreadsheet into ChatGPT which transformed it into custom CSV text. This required a credit card, $20 (eventually), some thinking, the most basic UI imaginable, and a willingness to try and try again. All without a code editor, compiler, or command line.
But Wait. There’s More!
The above prompt is a reasonably detailed description of a typical data transformation program.
The more precise the description, the better ChatGPT performs.
ChatGPT will even provide the “code” (or Skynet instructions) it used, in your language of choice:
Output these instructions as a Python program
Unfortunately, the code doesn’t work as-is since the input file is an Excel spreadsheet and not a text file, but it’s a good start. ChatGPT can help with that too!