Keep the human in the loop
I found a great example that showcases how an LLM performs horribly when presented with a task it cannot digest or split into smaller tasks, yet how the same task suddenly becomes manageable when someone else does the splitting before feeding the LLM its input.¹
See, in the first image, Gemini's talent at executing what any 10-year-old kid would judge a simple request.² The task becomes genuinely simple if we split it up (a sketch of the pipeline follows the list). What we can do is:
- Remove the letters in the foreground
- Rotate and mirror the image
- Re-add the letters in the foreground.
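To make the decomposition concrete, here is a minimal Python sketch of that pipeline, using Pillow for the deterministic steps. The inpainting step is left as a placeholder, since it depends on whichever image model you call; `remove_letters`, the mask, the font file, and the text position are all hypothetical, here only for illustration.

```python
from PIL import Image, ImageDraw, ImageFont

def remove_letters(image: Image.Image, letter_mask: Image.Image) -> Image.Image:
    """Placeholder: hand the image and a mask covering the letters to an
    inpainting model and return the cleaned-up background."""
    raise NotImplementedError("call your inpainting model of choice here")

def flip_and_relabel(path: str, mask_path: str, text: str, out_path: str) -> None:
    img = Image.open(path)
    mask = Image.open(mask_path)

    # Step 1: remove the foreground letters (model-assisted inpainting).
    background = remove_letters(img, mask)

    # Step 2: rotate and mirror the image -- plain, deterministic operations
    # that need no model at all.
    flipped = background.rotate(180).transpose(Image.Transpose.FLIP_LEFT_RIGHT)

    # Step 3: draw the letters back on top of the transformed background.
    draw = ImageDraw.Draw(flipped)
    font = ImageFont.truetype("SomeFont.ttf", 72)  # hypothetical font file
    draw.text((40, 40), text, font=font, fill="white")

    flipped.save(out_path)
```

Only the first step needs a model; the other two are the kind of mechanical work any image library has handled for decades, which is exactly the point of splitting the task.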
SOTA LLMs can remove parts of an image and fill in the gaps almost flawlessly, a task that not long ago would've taken a decent amount of time, provided you had the talent. Notice in Figure 3 below how the shades, the gradients, and the colors are all preserved, to the point where it would be hard to tell the edits were made by an LLM if it weren't for the watermark.
The rest is a simple edit in Figma or whichever tool you use; the fonts are publicly available online, so re-adding the letters is straightforward. But we could always force AI in.
What this teaches us is that there is still a long way to go in learning how to interact with LLMs. You shouldn't assume that as the models get better, they will get closer to how you think, even if those models are a composite of algorithms and LLMs.


Footnotes
1. This is a widely discussed topic, and plenty of literature exists. RAG is one such method, and as ChatGPT became mainstream, many initiatives sprang up suggesting that LLMs should keep to-do lists, something that's taken for granted today now that Claude and Claude Code have shipped.
2. This is one of many such attempts, but I won't over-index on how LLMs do not understand. You can try this at home if you're curious.