Spreadsheets were the first “killer app” for computers, and got a lot right out of the box.

Vintage magazine ad for VisiCalc software.

First, they were the original no-code app. Or perhaps more accurately, “lo-code”: structured, visual, and with just a sprinkling of code-like formulas that allowed you to achieve great things. Second, they were self-contained and ran on your own computer. Everything was like that before the internet, but it was a great match for a company’s jealously-guarded inner workings.

The tension between privacy and networking remains fraught. But let’s return to that later, and focus first on the lo-code part. That little sprinkling of code in spreadsheets was liberating for many, but is daunting if you’ve never done it before. How do you discover what to type?

😐

I watched a user struggling with this once, staring at that blinking cursor. I knew what they wanted was $Spend >= 1000. But how could they discover that? I was there to help, so I told them how and they said “ah!” So one person knew how to write one kind of formula.

I’ve taken to calling this “co-code”: lo-code where the little bit of code is written with the help of a friend or colleague.

Since that time, a lot of spreadsheets have added AI assistants,* and even if they haven’t, ChatGPT is a browser tab away. At Grist, we’ve found assistants work great – if you give them enough context from the spreadsheet, and perhaps some sample data. With an AI assistant, spreadsheets can do more, for more people than before, with less friction. And it’s only going to get better.

One important awkwardness remains: AI assistants are often implemented as services hosted by one of a few large external companies. That is fundamentally different to getting casual help from a friend or colleague on your own turf and on your own terms. Work done in spreadsheets can be quite sensitive. Spreadsheets like Excel have their origins as desktop software, so sending data off-site seems somewhat of a backwards step, blurring previously clear dividing lines on data privacy.

There are AI assistants that run locally. Last year, to be frank, the best options weren’t great. This year, things are looking up.

With an AI assistant, spreadsheets can do more, for more people than before, with less friction. And it’s only going to get better.

At Grist, we have a benchmark intended to measure an AI’s effectiveness in generating spreadsheet formulas. We feed the model spreadsheets full of data, human descriptions of desired columns, and the expected output of a correct formula.

For example, one of the tests uses the following sheet:

For each specific test case (one of ~75), the assistant is fed a column name and sample prompt. In this case: “Win_Rate” and “ratio of wins to total games”. We compare the existing correct output to the result of a new column generated by the AI assistant, and grade it with a simple pass/fail.

On this benchmark, the (closed source, externally-hosted) gpt3.5-turbo model currently achieves 72% accuracy. Qualitatively, that hits the bar of being useful. The more expensive gpt-4 model bumps that up to 78%.

Photo by Josiah Farrow on Unsplash

Last year, local AI models couldn’t score at all on this benchmark. Things have changed dramatically since then. To focus on one family of models: Llama 2 was released, and the ecology around the llama.cpp open source project has made it easy to integrate. Easy is a relative term here – large language models are quite demanding of hardware, even with “quantized” models that do a lot more with less. Specifically, a 13b quantized Llama 2 model got 20% on our benchmark. A quantized version of the 70b variant got 50%.

Compared to last year, that is a huge improvement, and the quality of the engineering was a pleasure. Qualitatively, the 13b model isn’t quite useful for our purpose, although it is pleasantly easy to run and completes the benchmarks in minutes. The 70b model does hit the useful mark, but takes two days on the same hardware (though it could certainly run faster on better hardware).

The Llama 2 model has some licensing nuances, but nothing that prevents usage as an AI assistant within a company. And there are other models with leaderboards busily tracking their ranking and improvements. Given the progress in the last year in doing more with less, it seems reasonable to expect that a year from now there will be models that are both useful and easy to run.

Regardless, AI-assisted co-coding is establishing itself in a real way. What’s more, there are viable alternatives that allow us to work self-contained and recapture some of that pre-internet privacy. It’s looking hopeful that even more people will feel comfortable using the formulas that make the enduring spreadsheet so enduring, without having an expert hovering nearby.

* This seems like an appropriate place to mention that Grist also has an AI assistant. Though it’s a bit different in that it targets one specific use case (formula generation). More on that here.

If you’d like to try out a local model with a self-hosted instance of Grist, you can configure the AI-specific environment variables to use the llama-cpp-python wrapper.