Monday, March 3, 2025

Aya Goes for Walk and Finds Out Who Owns the Zebra


Aya Goes for Walk and Finds Out Who Owns the Zebra - Microsoft Designer Generated

Overview


There is so much being written about AI these days, in particular about Large Language Models (LLMs) and chatbots (oh my). Here goes our paltry contribution.

We will tell you straight off that we make heavy use of Copilot, Microsoft’s AI companion. Half the time it’s the Copilot inside the Visual Studio development environment that we are using. We estimate that we code roughly 30 to 50% “better” than without it. What do we mean by better? Copilot brings more ideas to the table, quicker. We are less blocked by coding issues (hello, lambda notation, I’m talking about you). It helps in debugging. It really is like having a pair programmer at your side. Copilot is not correct all the time, but boy does it help keep you from being blocked. In short, it has been transformative for us.

Outside of Visual Studio, we are using chatbots for planning trips and for asking questions that go beyond simple Google searches: questions that spin off into long conversations that are truly interesting and informative to us. Again, is it always correct? No, but it’s a time saver, and it’s prompting us to ask more and better questions and to think before we search for something. Perhaps this last point is the part we think many of the naysayers miss.

That said, lots of people are worried about AI. In fact, we just finished reading The Coming Wave: Technology, Power, and the 21st Century's Greatest Dilemma (2023) by Mustafa Suleyman and Michael Bhaskar. This book examines the transformative and potentially perilous impact of advanced technologies, particularly AI and synthetic biology. The book’s major theme is “the containment problem,” the task of maintaining control over powerful technologies. It’s a sobering read, yet we didn’t feel as moved to action as we thought we might be. Given that AI has so far been a net positive for us, as described above, the containment problem didn’t resonate. Maybe we’ll be changing our minds?

Recently, we read two articles on AI: the TechCrunch article “OpenAI announces new o3 models” and the Quanta Magazine article “Chatbot Software Begins to Face Fundamental Limitations.” (Quanta has some really good writing!)

One big question that both articles deal with is whether the newest models are approaching AGI, or “artificial general intelligence,” meaning AI that can perform any task a human can. The TechCrunch article mentions some mathematical benchmarks used to gauge that. The Quanta article mentions the Zebra puzzle, also called Einstein’s riddle.

Aya


The TechCrunch article mentions testing the AI models against the 2024 American Invitational Mathematics Exam (AIME) questions. I thought, hey, I should try this too and see how I do. (Sort of confirming my humanity?) Confident of a quick solution, I read the problem statement of 2024 AIME I Problems / Problem 1:

Every morning Aya goes for a 9-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of s kilometers per hour, the walk takes her 4 hours, including t minutes spent in the coffee shop. When she walks s+2 kilometers per hour, the walk takes her 2 hours and 24 minutes, including t minutes spent in the coffee shop. Suppose Aya walks at s + 1/2 kilometers per hour. Find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.

Six hours later I had a solution, sort of! (That is longer than Aya's walk took.) I had all the right ideas but forgot how to find (spoiler alert) the roots of a quadratic equation. It was a humbling experience. OpenAI’s o3 model scored 96.7% on the 2024 American Invitational Mathematics Exam, missing just one question. I wonder which question it missed?

What’s interesting about this experience is that I knew at the end I had to solve a quadratic equation but had forgotten how, so I asked Copilot for the details of solving one. Is that so wrong?
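For anyone who wants to check the arithmetic (more spoilers ahead), here is a minimal Python sketch of one way to set the problem up. It is our own illustration, not the official AIME solution, and the function name solve_aya_walk and its variable names are made up for this post. The idea is to write the two walks as equations, subtract them so the coffee-shop time t drops out, and then apply the quadratic formula to what is left.

```python
import math


def solve_aya_walk():
    """Sketch for 2024 AIME I Problem 1.

    Returns (s in km/h, t in minutes, total minutes at speed s + 1/2).
    """
    distance_km = 9.0
    slow_total_h = 4.0                # total time at speed s, coffee stop included
    fast_total_h = 2.0 + 24.0 / 60.0  # total time at speed s + 2, coffee stop included

    # The two walks give: 9/s + t/60 = 4  and  9/(s + 2) + t/60 = 2.4.
    # Subtracting removes t: 9/s - 9/(s + 2) = 1.6, i.e. 18 = 1.6 * s * (s + 2),
    # which rearranges to the quadratic s^2 + 2s - 18/1.6 = 0.
    gap_h = slow_total_h - fast_total_h
    a, b, c = 1.0, 2.0, -(2.0 * distance_km) / gap_h

    # Quadratic formula; only the positive root makes sense as a walking speed.
    s = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

    # Back-substitute to get the coffee-shop time, then the requested total.
    t_min = (slow_total_h - distance_km / s) * 60.0
    answer_min = distance_km / (s + 0.5) * 60.0 + t_min
    return s, t_min, answer_min


s, t_min, answer_min = solve_aya_walk()
print(f"s = {s:.2f} km/h, t = {t_min:.0f} min, answer = {answer_min:.0f} min")
```

Worked by hand, the quadratic is s^2 + 2s - 11.25 = 0, whose positive root is s = 2.5 km/h. That leaves t = 24 minutes in the coffee shop, and at 3 km/h the whole outing takes 180 + 24 = 204 minutes.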

Zebra


The Zebra puzzle first appeared on December 17, 1962, in Life International magazine. As the Quanta article says, “Also known as Einstein’s puzzle or riddle (likely an apocryphal attribution), the problem tests a certain kind of multistep reasoning.” According to the article, this kind of riddle requires composing a larger solution from solutions to subproblems, which is not easy for LLMs. But it should be easier for humans, I thought. So I gave the riddle a try and lost another 6 hours of my life!
 
The riddle consists of 15 sentences describing five houses on a street. Each sentence is a clue, such as “Coffee is drunk in the green house” or “The Lucky Strike smoker drinks orange juice.” Each house is a different color, and the houses are home to people of different nationalities who own different pets, drink different beverages, and smoke different cigarette brands (it was the 1960s!). The story’s headline asks: “Who Owns the Zebra?” Hint: Not Aya.
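If you would rather let a computer lose the six hours, here is a minimal brute-force sketch in Python. Two caveats: we only quoted two clues above, so the rest of the constraints below are taken from the commonly published version of the puzzle and are an assumption about the exact wording; and the function name solve_zebra and its helpers are made up for this post. Each attribute (color, nationality, drink, cigarette brand, pet) is tried as an assignment of house positions, and a branch is abandoned as soon as a clue fails, which is a small taste of the "solve subproblems, then combine" reasoning the Quanta article describes.

```python
from itertools import permutations


def solve_zebra():
    """Brute-force sketch; returns who owns the zebra and who drinks water."""
    orderings = list(permutations(range(5)))  # house positions 0..4, left to right

    def right_of(a, b):
        return a - b == 1

    def next_to(a, b):
        return abs(a - b) == 1

    # Clue numbers follow the commonly published list (clue 1: there are five houses).
    for red, green, ivory, yellow, blue in orderings:
        if not right_of(green, ivory):                        # clue 6
            continue
        for english, spaniard, ukrainian, norwegian, japanese in orderings:
            if (english != red                                # clue 2
                    or norwegian != 0                         # clue 10
                    or not next_to(norwegian, blue)):         # clue 15
                continue
            for coffee, tea, milk, orange_juice, water in orderings:
                if (coffee != green                           # clue 4
                        or ukrainian != tea                   # clue 5
                        or milk != 2):                        # clue 9
                    continue
                for old_gold, kools, chesterfield, lucky_strike, parliament in orderings:
                    if (kools != yellow                       # clue 8
                            or lucky_strike != orange_juice   # clue 13
                            or japanese != parliament):       # clue 14
                        continue
                    for dog, snails, fox, horse, zebra in orderings:
                        if (spaniard == dog                   # clue 3
                                and old_gold == snails        # clue 7
                                and next_to(chesterfield, fox)    # clue 11
                                and next_to(kools, horse)):       # clue 12
                            who = {english: "Englishman", spaniard: "Spaniard",
                                   ukrainian: "Ukrainian", norwegian: "Norwegian",
                                   japanese: "Japanese"}
                            return {"zebra_owner": who[zebra],
                                    "water_drinker": who[water]}
    return None


print(solve_zebra())
```

We are deliberately not printing the answer here; run it if you want the spoiler. The early continue statements are what keep this fast: without them the loops would walk through all 120^5 combinations.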

Pollyanna


What point are we trying to make? First, AI is here to stay and is not going to be easily “contained”. In fact, a quote from Chapter 8 of The Coming Wave says it all: “Today's world is optimized for curiosity, sharing, and research at a pace never seen before. Modern research works against containment. So too do the necessity and desire to make a profit.” Bold added by us.

Our second point is that we need to adapt to using these tools and use them to make us better. Just reading about the tools and using the tools has inspired us to relearn things we forgot as well as to answer new and more interesting questions. We think that this is positive. Perhaps we are “pollyannish” about AI, that is, people who tend to be excessively optimistic about it.

The term Pollyanna comes from the title character in the 1913 novel Pollyanna by Eleanor H. Porter. Pollyanna is a young girl who remains relentlessly positive and tries to find something good in every situation, no matter how challenging.

Guilty as charged!

In fact, we are not the types who stare at a ChatGPT prompt, ask dark or maleficent questions, and then go “Huh? Look at what it spit out.” We are always asking positive (for the most part) questions, looking to get to some new higher ground or state, be it in programming or otherwise. And maybe, being the Pollyanna types that we are, we miss that some people don’t do this.
