I wondered how GPT-3 might do on the following problem: given a partially-connected graph, find a path between two nodes, or determine that no path exists.

I wrote some code to automate generating graphs, feeding them to GPT-3, and parsing + grading its results. I generated 1000 random graphs and fed them to the model. The graphs ranged from 3 to 14 nodes, and up to 25 edges. The optimal path through the graph ranged from 2-7 nodes. How did GPT-3 do? The model found a valid path (or correctly reported no result) a little over 60% of the time:

Here, the first 3 bars count as “correct” and the others are all different ways that GPT-3 is incorrect. In a few cases, there was a correct solution as a subarray of an otherwise incorrect solution. GPT-3 performed slightly better at a lower temperature; at temperature 1 (for “creative applications”) it found a valid answer about 50% of the time, but at temperature 0 (used for the results in this post) it was right about 60% of the time.

What’s GPT-3’s nicest solution? Here is an optimal path of length 6 it found from node 3 to node 2:

Here’s a path of length 5 it found through a much denser graph, from node 10 through node 14:

GPT-3 was even able to crack a path of length 7! This graph is interesting because it is disconnected; the model is keeping track of quite a bit of state here as it moved from node 3 to node 8:

In general though GPT-3 struggled to solve graphs with longer paths:

The graph is a bit noisy, but in general, longer paths are harder for GPT-3, as we’d expect.

In case you’d like to give this a try in the playground, here’s an example of the prompt I used (this one is for the length-7 path above):

You are solving a graph problem where you need to find a path in the graph. The graph has 14 nodes, numbered from 1 to 14. The graph has 9 bidirectional edges. Here they are: An edge from node 1 to 7 An edge from node 12 to 14 An edge from node 10 to 11 An edge from node 3 to 14 An edge from node 9 to 13 An edge from node 9 to 12 An edge from node 1 to 9 An edge from node 7 to 8 An edge from node 5 to 10 You are currently on node 3. You would like to find the optimal path to node 8. Output the list of nodes you visit on this path in order, separated by commas. If there is no path, instead print "There is no path." Your solution should start with node 3 and end with node 8. Do not include any extra text.

It’s also remarkably easy to hook up a script to the OpenAI API — I encourage everyone to give it a try! It cost me about $10 to do this blog post, including a few abortive attempts to grade a lot of graphs:

*Appendix*

I encountered the Dog Bunny Puzzle a few days ago and wondered if GPT-3 would be able to solve it. Sadly I wasn’t able to get it to work, although the following prompt got kind of close:

```
There is a graph with 7 nodes: "bone", "house", "bunny", "tree", "flower", "well", and "carrot".
There are 9 edges between the nodes:
"bone" connects to "boat",
"bone" connects to "house".
"house" connects to "boat".
"house" connects to "tree".
"'tree" connects to "carrot".
"tree" connects to "well".
"well" connects to "carrot".
"well" connects to "flower".
"flower" connects to "boat".
There are 3 pieces on the graph in the start position:
The "bunny1" piece is on the "house" node.
The "bunny2" piece is on the "boat" node.
The "dog" piece is one the "tree" node.
We would like to move the pieces along the edges of the graph until the following target position is reached:
The "bunny1" piece is on the "carrot" node.
The "bunny2" piece is on the "carrot" node.
The "dog" piece is on the "bone" node.
But there are some special rules to remember.
You can only move between "bone" and "house" if there is a piece on the "carrot" node.
You can only move between "bone" and "ship" if there is a piece on the "tree" node.
You can only move between "house" and "tree" if there is a piece on the "bone" node AND a piece on the "flower" node.
You can only move between "well" and "carrot" if there is no piece on the "bone" node.
Print a sequence of moves where the pieces begin in the start position and move between edges until they reach the target position.
```

GPT-3 responds:

```
Note that you do not have to use all of the edges, but you can only use each edge once.
The possible moves are:
bone to house
house to bone
bone to ship
ship to bone
house to tree
tree to house
tree to well
well to tree
well to carrot
carrot to well
carrot to bone
```

Amusingly, it adds some details to the prompt constraints (“Note that you do not have to use all of the edges, but you can only use each edge once”) but the actual moves are useless. Removing the “special rules” section helped the model perform more accurately, in my experiments, but my sense was that the level of complexity in the Dog Bunny Puzzle was overwhelming the model. (Here is a solution to the puzzle in Python, in case you’re curious!)

*Comments on Hacker News, too*

Have you tried prompting for Euler trails? It would be very impressive if a statistical language model could consistently find Euler trails in graphs.

LikeLike

Which model did you use?

LikeLike

You can see that here! https://github.com/polkerty/gpt3-graph-search/blob/ca94e2d21303e2ada73d9b25deedd8d91a10dedf/src/gpt3_api.py#L9

LikeLike

I see. Da Vinci – for people who didn’t check.

It would be awesome (and more expensive) to compare Curie, Da Vinci, and Da Vinci best_of == 5. OpenAI suggests using best_of == 5 when using Da Vinci to get equivalent results to Curie in terms of relevancy and length, which makes it very expensive, unfortunately (10x the price of Curie).

LikeLike

For quantitative reasoning problems like this, some guys found that you can drastically improve performance by letting GPT-3 also output its “chain of thought” instead of the conclusion alone, see https://arxiv.org/pdf/2201.11903.pdf

If you’d add a correctly solved chain-of-sample I bet you can achieve 90%+ performance on the task you describe in this article

LikeLike

Great writeup! I made a couple of changes in your code and it seems it gets ~90% of examples correct (inside the first three categories you have) with aroung 65% having either optimal path or correctly saying that there is no path.

What I changed was:

1. Used code-davinci-002 (codex). I did it simply because it’s currently free, and I don’t know how much it helps or if it actually hurts the performance.

2. Instead of using an explanation of how these tasks work, I changed your code prompts slightly and instead gave it 3 examples (two with paths, and one to show it what to output when there is no path present).

3. Changed the output so that it has to tell me which edges, instead of which nodes, it is traversing (this helps GPT to avoid using nonexistent edges it seems).

Here is an example: https://pastebin.com/D7Hn95VC Last “Problem” is the real problem we want to solve of course, everything else is static.

I’ll post my code when I finish testing 🙂

LikeLiked by 1 person

Wow, incredible!

LikeLike

Here are the stats 🙂

– Found the optimal path 504

– Found a (non-optimal) path 293

– Correctly reported no path exists 131

– Found a solution when none existed 19

– Ended on wrong node 0

– Used edges that don’t exist 53

– Started with the wrong node 0

– Incorrectly reported no path exists 0

– Total: 1000

(I keep sending stuff both here and to HN, not sure where you’d prefer to discuss these things, sorry for spamming!)

LikeLike

Nice! very impressive. I wonder if we can solve even harder paths now? What’s the longest path it can solve? (Also, feel free to make a PR back to the original repo if that’s something you’d like to do)

LikeLike

And here is the code: https://github.com/ibestvina/gpt3-graph-search

LikeLike

I cannot reply to your inner comment, so I’m writing here: I will run some more tests on larger graphs in the next few days and report back, to see how fast it “degrades”.

LikeLike