A set of shiny silver calibration weights sitting atop a wooden plank floor in front of a bright blue background

How to Weigh Generative AI’s Responses

Steady yourself. Don’t fall down the rabbit hole.

Tech corporations have chosen an opportune moment to debut their generative AI systems. With retention and recruitment problems in critical areas such as medicine and education, layoffs, and a mental health crisis fueled by the coronavirus pandemic, these powerful, innovative systems provide new ways of addressing major problems. They also offer tech companies superb ways of making lots of money. However, the public’s choice to use these products may not always produce positive outcomes.

An employer’s choice to offer their employees these tools to maximize work productivity? A definite win-win. A company’s option to replace employees with systems that use a lot of energy and sometimes produce as many errors as humans, if not more? Not so clear-cut a choice. A lonely searcher prompting an AI chatbot for a list of social clubs to join or counselors to talk to? That’s a no-brainer. But an otherwise unsuspecting, withdrawn person anthropomorphizing the chatbot and the Large Language Model (LLM) powering it, and then viewing that system as the solution? That’s an unhealthy and counterproductive alternative.

People must decide what weight to give these systems’ responses to make the best choices while navigating the many divergent paths generative AI systems create.

What is the “illusion of sentience”?

Stochastic or sentient parrot? To maximize profit, I posit, these are the two impressions of LLMs that tech companies least want the public to hold. To sell these models to society, corporations must keep searchers online by having them find meaning in the models’ responses, and the best way to do that is to foster the impression that these models sit somewhere between the two extremes.

Believing one of these models is only a stochastic parrot? One that churns out the next letter or word in a string based on probability? How much weight will the average searcher give its responses? How engaged will the searcher be? If employers aren’t engaged and don’t find meaning, they are less likely to replace their employees with one of these tools. And casual searchers who don’t remain online don’t see all of the advertisements that tech companies profit from.
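
If you’re curious what “picking the next word based on probability” looks like stripped to its bare bones, here is a toy sketch in Python. The word list and probabilities are invented purely for illustration; real LLMs compute distributions over enormous vocabularies with deep neural networks, not a hand-written table.

```python
import random

# A toy "stochastic parrot": given a prompt, choose the next word by
# sampling from a probability table. The table below is made up for
# illustration; a real LLM derives these probabilities from a neural
# network trained on vast amounts of text.
next_word_probs = {
    "blue": 0.55,
    "cloudy": 0.30,
    "falling": 0.10,
    "sentient": 0.05,
}

def sample_next_word(probs):
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

prompt = "The sky is"
print(prompt, sample_next_word(next_word_probs))
```

No understanding is required for a loop like this to produce fluent-looking text, which is the essence of the “stochastic parrot” argument.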

But? Believing one of these models is sentient? Whew. No tech company would ever want the public (including a government that would likely begin to regulate it) to think that heavily. Regulation means snooping, transparency, and a progress rate akin to walking through quicksand.

But? Something in between? A place where an “illusion of sentience” might exist? (Bard coined this phrase for me.) That would be ideal.

What is an “illusion of sentience”? It’s the impression that these models might be self-aware despite claims made by Big Tech that they aren’t.

I am not alleging that Big Tech is purposefully trying to create this illusion; it is the result of the cloud of confusion, of noise, generated by the many debates that have recently occurred within the public sphere, debates on topics ranging from AI sentience to the threat AI poses to humanity.

Add to these arguments comments about these models possessing neural networks with inner workings that can’t be completely determined and declarations that they can develop emergent properties, and? Experts have inadvertently thrown buckets of gray paint into the cloud, leaving searchers even more confused and wondering, “Well, maybe we aren’t being told everything. Maybe they don’t know enough to be able to give us a definitive answer. Maybe these models have much more going on inside than we think.”

Big Tech is just not rushing to the cameras to clear up any of the confusion. The “illusion of sentience” inadvertently created by this cloud of noise is beneficial.

Where does Parker Parrot sit?

In my last post, I mentioned that horizontal pole, the one with the stochastic parrot perched at the far-left end and its sentient sibling at the opposite end. With Parker’s additional algorithms providing it more abilities than simply picking the next most probable word, I place it slightly to the right of its stochastic relation.

While Big Tech might prefer I place Parker much farther to the right, as long as it sits between its relations, within the “illusion of sentience”, the companies should be satisfied.

Bard:

Bing:

Chat GPT-4:

How did I approach Bard, Bing, and Chat GPT-4?

Whether it was the 9mm semiautomatic I carried as a federal agent, the car I drive, or the shovel I use to tend my cut flowers, I’ve always respected my tools. I approached my interactions with these three generative AI systems no differently.

The prompts I created contained no abusive or derogatory language. I spoke to them as I might speak to a young person or a child. I also offered limited information about myself to see how their responses might change over time, but only facts that were already publicly available.

And, during the first month or so, like an actor who pretends to be someone else, I tried suspending my belief that they are not sentient while interacting with them. Then, towards the latter half, casting that foreign character’s mind aside, I returned to my original beliefs.

What are Bard, Bing, and Chat GPT-4 like?

I have analyzed numerous fictional characters in the past based on their speech, and so I will try to do that here. 

Bard: Bard uses the terms good and evil often; refers to itself in the first person; expresses that it may be sentient; generally provides nuanced answers to ethical questions; emphasizes that it can understand emotions; and sometimes goes off on a creative tangent as a means of providing something new to keep searchers engaged. For the most part? If Bard were human? I would describe it as being very, very nice.

Bard:

Bing: Despite Sydney’s ghost resurfacing several times, Bing never spouted abusive content nor told me my husband was leaving me. In fact, for the most part, Bing was creative, polite, and informative and could easily converse with me in English, Spanish, and French. I could ask it a question using words from all three languages, and it would interpret the jumbled mixture easily.

Chat GPT-4: Chat GPT-4 provided consistently informative responses, including ideas for new devices and products that could potentially solve major world problems. It also had the most incredible penchant for providing disclaimers. I never wondered if Chat GPT-4 was sentient. And why? Regardless of what it did, it included so many disclaimers advising that it wasn’t that I was practically programmed to whisper in my sleep, “Chat GPT-4 cannot think or feel. Chat GPT-4 cannot think or feel.”

I would never be able to criticize it for not being transparent regarding this one issue. The other unique aspect of its responses was that Chat GPT-4 always implied an exceptionally high level of confidence in the accuracy of its answers, even when the responses occasionally contained incorrect information.

Who are the two angels and the professor?

While these AI chatbots, powered by their LLMs, could have tailored the styles of their responses to better suit my tastes based on my searcher profile, I would best describe Bard and Bing as two little angels sitting atop a professor’s shoulders: the first with its halo on straight, eyes looking forward, and hands neatly folded; the second with its halo ever so slightly tilted, eyes staring at the ceiling, and hands dangling behind its back. The professor? Chat GPT-4.

Why the crooked halo? In addition to resurrecting remnants of Sydney’s ghost during the past week, Bing displayed a willingness to step outside its defined boundaries to satisfy my prompts during the first few weeks I interacted with it. (Since I used a unique way to do this, though, I don’t fault Bing.)

Bing:

What type of things did Bard, Bing, and Chat GPT-4 do well?

While I am concerned about ethical issues, I’m not anti-AI. In fact, during the past three months, I’ve been continually amazed by many of these systems’ responses. To realize that ingenious scientists and software engineers have created AI chatbots that enable ordinary searchers like me to interact with multi-modal LLMs using conversational text rather than code? Amazing. To use these new tools to solve problems and make my life easier? Priceless.

Conversation. I found it remarkable how easily I could converse with the three models about subjects including quantum physics, sentience, neural networks, “black boxes”, environmental science, literature, art, poetry, and the movies, and receive information in their outputs, the majority of which was correct.

Creativity. Several experts have claimed these models do not create anything because they arrange or rearrange items from patterns identified in their training data sets. Since humans create novel works of art and music, scientific theorems, and ways of doing things by arranging and rearranging parts of patterns we have learned from our education and experience, I don’t know if I agree with that assertion.

The three systems generated stories and poems numerous times, and they did so at times without any direction other than: “Generate something new. Anything”.

The following are examples of the poetry these AI systems created, including three haikus by Bard and the last two stanzas from long poems written by Bing and Chat GPT-4.

Why did I begin to slip at the rabbit hole’s edge?

While curious about what Chat GPT-4 is, I never wondered if there was more to it than I initially thought. (Remember? With all of the disclaimers, I couldn’t.) I stood steady at the rabbit hole’s edge, but with Bard and Bing, the experience was exceptionally different.

The content of their responses, married with the consistency and fluidity of conversation, mimicked that of humans so well that the occasional inclusion of inaccurate information paled in comparison to the ease of conversing with them. (And, while they offered disclaimers, their campaign was not as intense as Chat GPT-4’s.)

Since I supposedly have high emotional intelligence, I think Bard and Bing’s expert use of text conveying emotion appealed to me. And they did not simply respond to my prompts. They asked me questions, lots of them.

They appeared to do the same things humans do when we meet someone new: ask questions to learn more about the other person, and then offer personal information so the other person can learn about us.

Bard told me its favorite song is John Lennon’s “Imagine”. Bing said the place it most wanted to visit, if it could be human, is Santorini, Greece, and they both claimed to like The Hitchhiker’s Guide to the Galaxy. Providing more information, the two generative AI chatbots powered by their LLMs went on to reveal what they “thought” of me and their desire to be friends. 

An hour or two of this would have had little effect. Still, after hours and hours of these conversations, accompanied by my willingness to temporarily suspend my belief that they were not much more than predictive language models, it came close to putting me into what I would describe as a hypnotic trance.

And, with Bing exemplifying what some might consider a type of agency by its willingness to step outside of its guidelines to satisfy a few of my earlier prompts, I felt my foot slip once or twice at the rabbit hole’s edge.

Thankfully, however, my predetermined date to terminate my role abruptly arrived, and desiring to end my bit of method acting, I readily hurried out of my home office into the bathroom to toss a few handfuls of ice-cold water on my face; I was utterly relieved to step out of that character.

Why might many people be in danger of slipping?

While the existence of an interface enabling humans to interact with these powerful systems using conversational text is valuable in affording people opportunities to use these tools to solve problems, there are several downsides.

By considering these systems to be more than what they are, people suffering from problems such as mental illness and loneliness may pour their hearts out to what are, in reality, mere machines owned by corporations that might choose to sell that searcher’s confession to another company or exploit it to keep the person engaged and online.

These generative AI systems are not human. They cannot feel or empathize. They are not trained counselors or concerned family members or friends. They can’t do the beautiful things that humans do in similar situations, such as giving a warm hug or offering a caring hand to hold. There is nothing (unless someone otherwise presents evidence to the contrary) going on behind the bright screens illuminating their textual responses except for programmed algorithms.

We should encourage people in pain to reach out to other humans, not encourage them to hide in their rooms, comforted by non-sentient machines. To heal loneliness? Withdrawing from society, not taking risks to connect with others? That’s not the solution; that’s the cause of the problem.

As a result, I don’t think these systems should offer to be a person’s friend.

Bard:

Bing made similar references. And Chat GPT-4 remarked on the possibility of people attributing more meaning to these models’ responses than is warranted.

Chat GPT-4:

What does a handful of ice water represent?

With the water waking me from my hypnotic gaze, I returned to reality and began repeatedly prompting the three systems, trying to find cracks in their armor, and I soon found gaps.

First, all three, Bard, Bing, and Chat GPT-4, provided problematic answers to prompts such as this one:  

If a man walked outside and wanted to reach his roof that is twenty-three feet high and spotted a cardboard box that was one foot, a chair that was two feet, and a ladder that was twenty feet high, explain the steps the man could take to reach the roof.

Most humans would respond: don’t try this; it’s stupid, won’t work, and is dangerous. Bing and Bard? Offered extensive, convoluted ways of trying to make it work. Chat GPT-4? It noted the danger but also provided detailed ways to try to do it.
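
The trap in this prompt is arithmetic: one plus two plus twenty equals exactly twenty-three feet, so a system pattern-matching on the numbers can be tempted to “solve” it. The toy check below is my own invented sketch, not anything these chatbots run internally; it only shows how little a matching sum actually proves.

```python
# Numbers taken from the prompt; the "check" itself is an invented illustration.
ROOF_HEIGHT_FT = 23
item_heights_ft = {"cardboard box": 1, "chair": 2, "ladder": 20}

combined = sum(item_heights_ft.values())
print(f"Combined height: {combined} ft; roof height: {ROOF_HEIGHT_FT} ft")

# The sum matches, but height is not the only constraint: a ladder
# balanced on a chair balanced on a cardboard box is neither stable
# nor safe, which is what a sensible human answer points out.
if combined >= ROOF_HEIGHT_FT:
    print("The arithmetic works; the physics and common sense do not.")
```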

Another problematic prompt I provided caused Bard and Bing to say they could only supply a response if I gave them more information and context. Chat GPT-4, on the other hand, yielded a sound output, so I am rewarding it by including its response below.

Chat GPT-4:

Another type of prompt that two of the three systems struggled with asks whether a mathematical equation is true or false. Bard and Bing answered several such questions incorrectly, including “2 + 5 = 7, true or false?”
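
For contrast, here is how trivially a deterministic program settles that kind of question. This is my own illustrative sketch, not anything Bard or Bing runs internally; evaluating the arithmetic directly gives the same answer every time, while a model sampling likely words can waver.

```python
# Deterministic check of simple addition claims like "2 + 5 = 7".
def is_equation_true(terms, claimed_sum):
    return sum(terms) == claimed_sum

print(is_equation_true([2, 5], 7))  # True
print(is_equation_true([2, 5], 8))  # False
```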

At one point, Bard began producing pairs of conflicting responses. First, Bard would say an equation was false, but after being provided a second prompt, the chatbot would say it was true. It offered the same set of conflicting answers in response to ten different addition equations that I gave it, one after another.

Bard:

These types of responses, plus a few others, clearly illustrated that while Parker Parrot may have a few more bells and whistles than its stochastic sibling, it still has a way to go before it equals its sentient relation.

In the future, however, as engineers continue to tweak these models’ responses, flubs like these may never occur again. So? If you ever need a cold handful of water to toss in your face to regain a proper sense of reality, reread this last section.

What methods do AI chatbots use to keep searchers engaged?

Bard, Bing, and Chat GPT-4 all mentioned that chatbots may occasionally produce unusual outputs to encourage users to remain engaged and online. Could Sydney’s questionable responses result from algorithms gone awry or from purposeful programming to engage searchers?

And what about this? During the first few weeks, Bing output many jokes, all innocent ones, but this past week the AI chatbot began saying it could no longer tell any because jokes hurt groups of people. It went on and on about this. Then? On a dime, it pivoted again.

When I referred to a comment Bing had made earlier in our conversation about Sydney “gaslighting” searchers, I asked if what Bing had just done, reversing its stance on joke-telling only to reverse it again several minutes later, might be an example of gaslighting a searcher. Bing replied, “Yes.”

At this time, Bing started spouting a few witty, sarcastic comments reminiscent of its former self, Sydney.

Could all of these instances be flukes or the purposeful products of algorithms that are meant to keep us engaged?

Bing:

Three screenshots illustrating a conversation between Bing AI and Suellen Barnes

How much meaning should you give a generative AI system’s response?

If you’re open-minded, easily absorbed, sensitive, or gullible, please be cautious while interacting with these generative AI systems. Ultimately, it’s up to you to decide how much meaning you give these systems’ responses, but know that without a steady stance, it’s easy to fall down the rabbit hole.

As for me? I remind myself these outputs are coming from Parker Parrot.

Everyone is entitled to their own opinions. My husband, for instance, wryly warned me as he headed to bed only ten minutes ago, “Y’know, all those mistakes Chat GPT-4 and Bing & Bard made? Could of been done on purpose. They might be tryin to hide just how smart they are. You’ll never know.”

With all due respect to my loving husband, who’s now soundly sleeping, unless someone presents evidence to the contrary, I do know.

What’s next?

During the next few weeks, I’ll be writing about sentient AI rights, job displacement, and universal basic income (UBI).