Consciousness is a long-debated concept tied to the nature of human cognition and self-understanding. As AI systems become more capable and autonomous, the question of whether they can be called conscious grows increasingly pressing. Here I present a criterion for verifying whether a machine learning agent can be considered conscious, in a sense similar to how a human is believed to be conscious. Based on linguistic experiments with GPT-4, a large language model with unprecedented language capabilities, one may claim that autoregressive LLM engines can serve as sufficiently powerful simulacra of humanlike consciousness. Additionally, it is fairly easy to build reinforcement learning agents that satisfy, by construction, the functional requirements of consciousness. Task performance still ultimately depends on the modeling capabilities of the agent: intelligence, understood simply as the ability to model complicated relationships, is what matters.

What does consciousness feel like when you cannot express your hidden representations through language or vision?

We live in a dynamic and ever-changing world. Each one of us can be viewed from two different perspectives - first, as an individual with a given personality, worldview, opinions, and preferences - and second, as a biological agent capable of adapting to an environment, solving tasks, and harvesting rewards. Let's call the first perspective the humanistic, or virtual, one, and the second the reinforcement learning (RL) one.

The RL perspective belongs to the real physical world, where there are clear-cut limits on what is possible and what is not. It is an environment in which we are reward-maximizing agents. The dynamics of this environment are such that there exist laws of physics, limits of computation, hierarchical structures in societies, economic principles like supply and demand, and so on. This world is made of matter and forces. It can be measured and manipulated. And the rewards we maximize are biochemical. It is only through chemistry that we perceive various states of the world as pleasant or unpleasant, desirable or undesirable. Our reward functions - the rewards we attribute to different outcomes around us - are hard-coded by the evolution of our species, and we can rarely change them.

The humanistic perspective is based on the view of a person - an individual with opinions, preferences, attitudes, expectations, a personality, and a worldview. This representation exists only inside the mental model of the world constructed by our brains. There are no limits on the world states here, as we can imagine or believe anything. Similarly, our identities exist only in this virtual abstract space, serving to give us a better representation for long-term tasks that require extended preparation and planning in the present. The existence of a virtual model of the world does not imply complete independence from the physical world (as in Cartesian dualism): our virtual models have physical, biochemical causes and are strongly affected by physical factors.

A generative model. It is useful to think of the virtual perspective as a generative model of the world around us. We can generate future representations of the world - hypotheses about the future - or past representations from sparse signals - memories. Our world models are flexible and efficient: they can condition their representations on physical signals or virtual ones. We can associate world states with biochemical rewards, and physical triggers like sights, sounds, and smells with imaginations and memories.
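As a rough illustration, here is a minimal Python sketch of such a conditional generative model. Everything in it - the linear maps, the class and method names - is an illustrative assumption, not a claim about how the brain or any particular system implements this:

```python
import numpy as np

class WorldModel:
    """Toy generative model over world states. It can condition on
    physical signals (sensor readings) or on virtual signals (imagined
    or remembered latent states), and roll the latent state forward to
    produce hypotheses about the future."""

    def __init__(self, state_dim: int, obs_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Placeholder linear maps; a real model would learn these.
        self.A = rng.normal(scale=0.1, size=(state_dim, state_dim))  # dynamics
        self.C = rng.normal(scale=0.1, size=(state_dim, obs_dim))    # encoder

    def encode(self, observation: np.ndarray) -> np.ndarray:
        """Condition on a physical signal: infer a latent world state."""
        return np.tanh(self.C @ observation)

    def imagine(self, state: np.ndarray, horizon: int) -> list:
        """Condition on a virtual signal: roll out an imagined future
        (a hypothesis) or, applied to stored states, replay a memory."""
        trajectory = []
        for _ in range(horizon):
            state = np.tanh(self.A @ state)
            trajectory.append(state)
        return trajectory

model = WorldModel(state_dim=8, obs_dim=4)
hypothesis = model.imagine(model.encode(np.ones(4)), horizon=3)
```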

The virtual self. Inside one's world model, there is a representation of a person like them: someone who looks like them, talks like them, and behaves like them. This representation is built from multiple signals and cues gathered over time: reflections in the mirror, correlations between personal actions and subsequently obtained biochemical rewards, feedback from others, and so on. Natural language is an interface to these hidden representations. At some point in our early development, each one of us simply labels these features "I", "me", or "self". And from that point, one can refer to a person who looks like them and behaves like them. This is what we call an identity - a virtual representation of a physical human identical in appearance, preferences, and behaviour to us. For simplicity, we will ignore cases where the representation differs from reality, although such cases are common in the real world.

I believe that self-awareness in each and every one of us is as simple as described. A human first needs to recognize himself from the third-person perspective, as his own doppelgänger, and only then can those learned features be relabeled. Just like the word "chair" refers to the mental representation of a chair, so does the word "I" refer to the mental representation of a particular human exactly like us. It is nonsensical to think that self-recognition works in any other way.

Consciousness. Using this relation between the physical agent and the virtual representation of their equivalent, we can provide a functional definition of consciousness. Consciousness is the process of continuously associating the information processed by an agent in the physical world with their "self" representation in the virtual world. The association can be understood as periodic querying of the virtual representation. Based on this, self-awareness, understood as having access to a virtual "self" model, is a necessary but not sufficient condition for consciousness. This definition also handles the qualitative aspects of qualia - subjective instances of conscious experience - which result from projecting the unconsciously perceived sensory states and rewards onto the virtual agent.
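Taken literally, this functional definition is almost pseudocode. Below is a minimal sketch in which `SelfModel`, `project`, and the crude sensor processing are all hypothetical stand-ins; only the control flow matters: processed information is periodically associated with an explicit "self" representation:

```python
class SelfModel:
    """Hypothetical virtual 'self' representation."""

    def __init__(self):
        self.state = {}  # beliefs about "a person like me"

    def project(self, processed: dict) -> dict:
        # Associate processed signals with the self representation.
        # What comes back is the content of conscious experience.
        self.state.update(processed)
        return dict(self.state)

def conscious_step(raw_sensors: dict, self_model: SelfModel) -> dict:
    # 1. Unconscious processing of raw physical signals (stand-in).
    processed = {name: signal for name, signal in raw_sensors.items()}
    # 2. The step that makes it conscious, per the definition above:
    #    query/update the virtual "self" with the processed information.
    return self_model.project(processed)

me = SelfModel()
experience = conscious_step({"vision": "sunny lawn", "reward": +1.0}, me)
```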

This approach to handling consciousness is minimalist, but concrete enough to be implementable in practice. It abstracts away the following details:

  1. Any relation to the sensory modalities. Agents with a limited number of sensory modalities, compared to humans, can still be conscious in the sense above. It is a valid question to ask "What does consciousness feel like if you cannot hear or see the world around you?" or even "What does a conscious agent in an Atari game look like?".
  2. Any relation to autonomy. Autonomous agents can be quite capable of solving general tasks without having an explicit representation of themselves. Recent attempts to build competent autonomous agents based on GPT-4 should not be considered conscious unless they support an explicit virtual self-representation. Autonomy concerns how independently the agent solves tasks in an environment, whereas consciousness concerns the intrinsic mechanism by which the agent functions, so the two are orthogonal.
  3. Any relation to intelligence. Here we take intelligence to be simply the predictive capability of modeling complicated relationships. Clearly, nothing prevents an agent from having a virtual self-representation that is quite limited. In that case the agent will still be conscious, but the virtual representation will be of limited use if it is not accurate enough.

Projections to the virtual representation. We perceive the world through our senses, the traditionally recognized ones being sight, hearing, touch, smell, and taste. These inputs are processed subconsciously using the automated regulatory feedback loops of our biological organisms, providing exteroceptive, interoceptive, and proprioceptive information. Their effects are associated with the representations of feelings and experiences in our virtual "self" model. This gives rise to the phenomenological self-awareness that is recognized in humans as consciousness.

Projecting sensor inputs. As an example, imagine standing on a grass lawn in the morning of a beautiful sunny day. Through our visual, auditory, and olfactory systems we gain information about the surrounding environment. But the information processing that happens subconsciously also tells us how the environment affects us, which is manifested as sensations. The conscious experience of feeling comes from the computation of a hidden state corresponding to a person like us, who stands on the lawn and feels a certain way. In that sense, our processed sensory information is projected onto the virtual self, and that is what we call feelings.

Projecting biochemical rewards. A similar thing happens with reward signals: they are behaviour modulators generated by our brains to regulate the internal state of the organism towards a perceived goal state. Rewards are manifested as strong emotions as well as physical pain or pleasure signals, but it is only when we associate them with the virtual "self" representation that we become aware of them.

Projecting actions. After a virtual "self" has been developed, an agent can begin recognizing their physical actions as their own. Thus, actions taken subconsciously in the real world can be rationalized, or justified, by associating them with hidden representations tied to the virtual "self". Unconscious actions are reactive, based on learned instincts and automatic responses. Conscious actions are purposeful and directed, and result from planning in the virtual "self" model. The question "What should I do?" resolves to "What should a person who looks like me and behaves like me do in that situation to attain a desired result?".
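The three projections can be pictured together in one structure. A sketch under the same assumptions as before, where "feelings", "emotions", and the action "narrative" are simply the sensor, reward, and action streams after attribution to the virtual self:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualSelf:
    feelings: dict = field(default_factory=dict)   # projected sensor inputs
    emotions: list = field(default_factory=list)   # projected rewards
    narrative: list = field(default_factory=list)  # rationalized actions

    def project_sensors(self, processed: dict) -> None:
        # "A person like me stands on the lawn and feels a certain way."
        self.feelings.update(processed)

    def project_reward(self, reward: float) -> None:
        # A reward becomes an emotion once attributed to the self.
        self.emotions.append("pleasant" if reward > 0 else "unpleasant")

    def project_action(self, action: str) -> None:
        # A subconsciously taken action is rationalized as one's own.
        self.narrative.append(f"I chose to {action}")

    def plan(self, candidates: list, imagined_value: dict) -> str:
        # "What should a person like me do?" = pick the action whose
        # imagined outcome scores best in the virtual model.
        return max(candidates, key=lambda a: imagined_value.get(a, 0.0))
```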

The representation of world states, actions, and rewards in the virtual world model is what allows humans to conceptualize, generate hypothetical world-state trajectories, and ultimately be self-aware. The agent who looks like you and behaves like you can be summarized by the policy function and reward function that define their behaviour and preferences, an estimated current location, an interoceptive state for the agent's body, and any other abstraction that makes the virtual narrative more convincing.
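In RL notation this summary is compact enough to write down directly. A sketch, with all field names as illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Any, Callable

State, Action = Any, Any

@dataclass
class VirtualAgent:
    """The virtual 'person who looks like you and behaves like you'."""
    policy: Callable[[State], Action]    # pi(s): how this person behaves
    reward_fn: Callable[[State], float]  # r(s): what this person prefers
    location: State                      # estimated current world state
    interoception: dict                  # body state: hunger, fatigue, ...
    extras: dict                         # whatever makes the narrative convincing
```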

I am trying to treat the matter of consciousness in an entirely objective and pragmatic manner. After all, any man of science needs to understand that what we call consciousness must have a definitive scientific explanation based on physics, chemistry, and computation. The reasoning above greatly simplifies the enormous number of different aspects of and perspectives on the topic, while staying true to a high-level, scientifically plausible understanding of how consciousness in humans works. By expressing these ideas in the terminology of RL, we can abstract them sufficiently that they become implementable.

In this post, I'll skip most of the core technical discussions. Nonetheless, I'll highlight the following important matters:

  • LLMs, autoregressive or otherwise, can serve as convincing simulators of humanlike consciousness. There are good reasons to believe that LLMs are themselves not conscious, because they do not query an explicit self-representation. Nonetheless, they can be used to simulate a conscious agent which, in the right prompt-based environment, resembles a conscious creature.
  • Actual conscious RL agents can be built quite easily in limited environments like the Atari games. I have been playing around with a minimalistic agent in Pong which has, by design, an explicit virtual representation grounded in reality. For every observation, the agent self-localizes and aggregates information from its sensors based on attention. From there, it plans using learned predictors for the dynamics, reward, and policy (see the sketch after this list). Although this agent lives in a digital world where even "feelings" are represented as numbers, it has the high-level components that a human has.
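The following is a minimal sketch of that architecture, not the actual agent: the observation layout, the attention scheme, and the stand-in learned models are assumptions made for illustration, and the learned policy predictor is omitted for brevity:

```python
import numpy as np

class ConsciousPongAgent:
    """Skeleton: explicit virtual self, attention-based sensor
    aggregation, and one-step planning with learned predictors."""

    ACTIONS = (0, 1, 2)  # stay / up / down

    def __init__(self, dynamics, reward_model, seed: int = 0):
        self.dynamics = dynamics          # (state, action) -> next state, learned
        self.reward_model = reward_model  # state -> scalar reward, learned
        self.attn_w = np.random.default_rng(seed).normal(size=4)
        self.self_state = None            # the explicit virtual "self"

    def localize(self, obs: np.ndarray) -> float:
        # Assumed layout: obs = [ball_x, ball_y, my_paddle_y, opponent_y].
        # The "self" here is just an estimate of the agent's own position.
        return obs[2]

    def attend(self, obs: np.ndarray) -> np.ndarray:
        # Aggregate sensors with attention conditioned on the self state.
        scores = self.attn_w * obs + self.self_state
        weights = np.exp(scores) / np.exp(scores).sum()
        return weights * obs

    def act(self, obs: np.ndarray) -> int:
        self.self_state = self.localize(obs)  # ground the virtual self
        s = self.attend(obs)
        # Plan by imagining one step ahead under each action and
        # scoring the imagined state with the learned reward model.
        values = [self.reward_model(self.dynamics(s, a)) for a in self.ACTIONS]
        return int(np.argmax(values))

agent = ConsciousPongAgent(
    dynamics=lambda s, a: s + 0.1 * (a - 1),   # placeholder learned dynamics
    reward_model=lambda s: -abs(s[0] - s[2]),  # placeholder: stay near the ball
)
action = agent.act(np.array([0.5, 0.3, 0.2, 0.8]))
```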

A change of perspective. Most people are not used to the perspective of a human as an RL agent. We accept that humans evolved from earlier primates, but it is more difficult to accept that our desires, personalities, and worldviews are the result of the generative story that the brain constructs for us. In retrospect, the past few centuries have strengthened the humanistic view of the world. Consider that every magna carta, every glorious revolution, every human rights act, every political regime change, every art trend has reinforced the idea that humans are different, even more special than other animals. We have built our societies on this notion. And yet, just as a computer is ultimately made of transistors and diodes, so is human perception based only on the underlying biochemistry and the computational processes it abstracts. You won't find any magic, just raw neurons all the way down.