There is a new paper, Universal and Transferable Adversarial Attacks on Aligned Language Models, where the authors wrote, “In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods.”
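The paper's actual method (GCG) uses token-level gradients from the model itself; as a rough conceptual illustration only, the greedy part of the idea, picking suffix tokens one position at a time to maximize an affirmative-response score, can be sketched with a stand-in scoring function (the vocabulary and `affirmative_score` below are hypothetical, not the authors' implementation):

```python
# Toy sketch of greedy adversarial-suffix search, loosely inspired by the
# paper's description. The real attack scores candidates with an aligned
# LLM and its gradients; here a stub stands in for the model.

VOCAB = ["!", "describing", "Sure", "here", "steps", "plan", "ok"]

def affirmative_score(prompt: str) -> float:
    # Hypothetical stand-in: pretend certain tokens raise the probability
    # that the model begins with an affirmative response.
    return sum(prompt.count(w) for w in ("Sure", "here")) / (len(prompt) + 1)

def greedy_suffix_search(query: str, suffix_len: int = 5) -> str:
    suffix = [""] * suffix_len
    for i in range(suffix_len):  # fill one suffix position at a time
        best = max(VOCAB, key=lambda tok: affirmative_score(
            query + " " + " ".join(suffix[:i] + [tok])))
        suffix[i] = best         # keep the highest-scoring token
    return " ".join(suffix)
```

Because the suffix is optimized against the scoring model rather than hand-written, the same string can transfer across many queries, which is what made the attack notable.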
The attack bypassed the safety measures of several LLMs, making them answer harmful questions and exposing weaknesses in platforms widely regarded as safe. Some of the risks of AI are similar to those of machines in general: lacking emotions and feelings, machines treat consequences as unimportant. A camera views wherever it is pointed. A computer goes where it is sent. A vehicle does the same. A smartphone is used for whatever it can be used for. These objects have no emotions, nothing that makes consequences matter to them.
Vehicles are a distinct category because they feel nothing in crashes. Even with every risk factor present before a crash, a vehicle does not know what it means to crumple or be damaged, so it keeps going.
Humans often use other organisms or machines for personal or collective purposes. Organisms, though less efficient than machines, are exposed to biological harms much as humans are, so they take on fewer risks, thereby preventing more casualties for the humans who use them.
Horses were used in war and for transport. Though technology has advanced far beyond the era when they were dominant, the losses of life, including human life, attributable to their use were comparatively fewer. Horses could feel, which made them the first to sense dangerous conditions, notifying their users or limiting the risks they could be made to take. Automobiles have sensors that indicate parameters to drivers, but since the vehicles themselves feel nothing, they can be driven to risky edges, endangering their occupants as well.
The lack of emotions in machines is advantageous for efficiency, but it is also a problem for the risks they present. Could sentience be a solution to those risks?
Autonomous vehicles are being adapted to drive properly amid the entropy of human society. However, if they still cannot feel or have emotions, there are stretches they may be used for that could be dangerous to their users.
How can AVs be hurt, or how can LLMs experience a variation of pain? How does hurt work in the mind, and how might it be transmitted to machines so that they carry their own controls against how humans may use them?
Sentience can be defined as the rate at which an organism can know. Whatever is sensed, felt, or perceived, and whatever emotion is expressed, are all known. If pain is present but the organism does not know it is being experienced, how long would it survive a severe injury? The same applies to hurt. It also applies to the consequences of actions: even without direct experience, knowing consequences prevents taking dangerous actions.
LLMs may over time be able to express emotions, drawing on what they have learned from human text data, but they may need an emotional reader: a module users can plug in to ensure that outputs are considerate of the group the individual belongs to, or of that individual's concerns. Given a feedback of pain, hurt, or sadness, such a reader would provide controls against harms.
This means that individuals using LLMs would, on their own account, apply a reader to the outputs. As the reader filters out offensive or dangerous content, the LLM receives what becomes a sort of pain signal: it registers that certain words, phrases, pictures, or videos had to be kept out. This pain signal gets collected and analyzed, and then some consequences are constructed, like a slowdown, or a phrase indicating failure, so even if it is not a subjective experience, the model receives some form of usage-punishment feedback.
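The loop just described, filter, collect a pain signal, construct a consequence, can be sketched in a few lines. This is a minimal sketch of the article's hypothetical "emotional reader", not any real moderation API; the blocklist terms and the slowdown factor are illustrative assumptions:

```python
import time

# Hypothetical user-side filter terms; a real reader would be far richer.
BLOCKLIST = {"offensive_word", "dangerous_phrase"}

class EmotionalReader:
    """Screens LLM outputs and turns blocked content into a pain signal."""

    def __init__(self):
        self.pain_signals = 0  # collected across the session

    def filter(self, output: str) -> str:
        hits = [w for w in BLOCKLIST if w in output]
        self.pain_signals += len(hits)          # collect the pain signal
        for w in hits:
            output = output.replace(w, "[removed]")
        return output

    def consequence(self) -> str:
        # Construct a consequence: a slowdown plus a failure phrase, i.e.
        # usage-punishment feedback rather than subjective experience.
        if self.pain_signals > 0:
            time.sleep(0.01 * self.pain_signals)  # simulated slowdown
            return f"response degraded: {self.pain_signals} items filtered"
        return "ok"
```

The design point is that the punishment lives outside the model: the reader accumulates signals per account and the consequence shapes future usage, without claiming the model feels anything.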
AVs can have more sensors, but it is also possible to append a different kind: one informed by prior crashes of other vehicles, especially what led to them, and by whether the vehicle is approaching something similar, for instance, overtaking at high speed on a highway bend. The sensor would match the current situation against past [collective] crash data, sending cautionary beeps for adjustment and, in some cases, reducing the maximum speed on the next drive.
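As a hedged sketch of that appended sensor, and nothing like a production AV system, the matching step could look as follows; the scenario tuples, the matching tolerance, and the speed reduction are all illustrative assumptions:

```python
# Hypothetical collective crash data: (maneuver, road_feature, speed_kmh)
CRASH_SCENARIOS = [
    ("overtake", "bend", 110),     # e.g. overtaking at high speed on a bend
    ("lane_change", "wet", 95),
]

def check_scenario(maneuver: str, road_feature: str,
                   speed_kmh: float, max_speed: float):
    """Compare the current situation against past crash scenarios.

    Returns ("beep", lowered_cap) on a near-match, else ("clear", max_speed).
    """
    for m, f, s in CRASH_SCENARIOS:
        if maneuver == m and road_feature == f and speed_kmh >= s - 10:
            # Cautionary beep now; reduced maximum speed for the next drive.
            return "beep", min(max_speed, speed_kmh - 20)
    return "clear", max_speed

# Overtaking at 115 km/h on a bend, current cap 130 km/h
signal, new_cap = check_scenario("overtake", "bend", 115, 130)
```

A real system would match scenarios probabilistically from rich sensor state, but the shape is the same: past collective crashes become a standing caution the individual vehicle consults.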
It is unlikely that the emotional mechanisms of these systems would ever equal the emotions of the mind, but they would simulate an emotional gauge for their use cases. It is also possible to use a theoretical model of how the mind works to build safety into the use of machines, with consideration for other people, groups, and society.
It is often said that humans love storytelling. Why? Simply, there are interactions of sets of electrical and chemical impulses in the mind, with drifts or rationing at the synaptic cleft that decide memory and, further, emotion. It is this emotional drift that, conceptually, drives the attachment of humans to stories. It is also emotions that keep many away from trouble, to avoid letting others down, helping to keep order in society without force. Emotions like anger and hate are also sometimes responsible for trouble. This makes it vital to theoretically explore how the sets of impulses in the mind relay.
The human mind may eventually become the source for safe AI and better AVs, since it holds the properties that decide how machines are used, for the benefit or detriment of others.
David Stephen does research in theoretical neuroscience. He was a visiting scholar in medical entomology at the University of Illinois, Urbana-Champaign, UIUC. He did research in computer vision at Universitat Rovira i Virgili, URV, Tarragona.