Hallucination, prediction, guessing

Tyler Cowen makes a series of excellent points in a recent post where he muses over the value of large model hallucinations. He notes that he would not like for these systems to stop hallucinating, since the hallucinations have value in representing something. What this something is, he suggests, could be our statistical average view of the world. Where a hallucination is factually wrong, it reveals a deeper pattern of ‘wrongness’ in our overall understanding of the world. Maybe, then, hallucinations can allow us to map areas of our understanding that are candidates for research, improvement and factual discovery?

‘Hallucination’ is not merely a large-model phenomenon. People make things up too, and do so a lot of the time. You have surely done so as well at some point, when you did not know the answer to something and made, say, ‘an educated guess’. The educated guess is simply a statistical prediction given the knowledge you have trained on, and so exploring hallucinations will be a lot like exploring the nature of guessing.

A real question here is how guessing incorporates more than linguistic knowledge – and whether our guessing is better or worse than that of language models. Can we guess better if we are embodied? Guessing might be a much more important mode of cognition than we have allowed for – and may provide a really helpful mental model for thinking about artificial intelligence; not as an omniscient being, but as a different kind of guesser.

Our default assumption has been that these models predict the next word or set of words in any situation. This opens the possibility of thinking more carefully about the relationship between guessing and predicting – where we do one and where we do the other. Perhaps in a closed setting, such as a game, we predict, and in an open setting, or open world, we guess. There is a need for some conceptual clarity here.

What are some ideas we could explore? Maybe the below.

  • A guess is a prediction with a certain probability range.
  • A guess is a prediction but without a probability assignment at all – and where no such assignment makes sense. I.e. guesses are about uncertainty and predictions about probability.
  • A guess is very different from a prediction in that it serves as a basis for our actions as it is made – we do not evaluate guesses as we evaluate predictions. (Somewhat unclear – what is the key we are getting at here? That guesses are reflected in actions?)
  • A guess is never about a single fact, but about a complex system.
  • A guess is about the unfolding of a narrative, a prediction is about an isolated proposition.
  • A guess is a revealed prediction, surfaced through introspection (as when we say, thoughtfully, “I guess I believe he did not really mean it…”).

There is more to be done here — linking up the statements we make about the future in ways that help create a grammar of the future and how we relate to it in different ways. This grammar now changes with the advent of more powerful artificial intelligence, perhaps?

What is your cathedral?

Time is a funny thing, and the perspectives that you can get if you shift time around are extraordinarily valuable. Take a simple example: not long ago it was common to engage in building things that would take more than one generation to finish – giant houses, cathedrals and organizations. Today we barely engage in projects that take longer than a year – in fact, even that seems long to some people. A three-month project or a three-week sprint is preferable.

And there is some truth to this. Slicing time finely is a way to ensure that progress is made – even in very long projects. But the curious effect we are witnessing today where the slicing of time into finer and finer moments also shortens the horizons of our projects seems unfortunate.

Sir Martin Rees recently gave a talk at the Long Now Foundation where this was one of the themes he mused on. He offered a theory for why we find ourselves in this state: the pace of change is such that it makes no sense to undertake very long projects. We can build cathedrals in a year if we want to, and the more powerful our technology becomes the faster we will be able to do so. The extreme case? Starting to build a cathedral in an age where you know that within a short time frame – years – you will be able to 3D-print one quickly and at low cost makes no sense – better then to wait for the technology to reach a stage where it can solve the problem for you.

If we dig here we find a fundamental observation:

(i) In a society where technology develops fast, it always makes sense to ask whether the time t1 it takes to create something now is greater than the time t2 you would have to wait plus the much shorter time t3 it would then take to create it, that is, whether t1 > t2 + t3.

If you want to construct something that would take five years to build, but think you will be able to build it in two years if you wait one year – well, the rational thing to do is simply to wait and then do it, since you finish after three years rather than five – right?
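The comparison in (i) can be written out as a small calculation. This is only a sketch: the function name `should_wait` and its parameter names are hypothetical, and it assumes that total elapsed time is the only thing that matters, which is exactly the assumption the rest of this post questions.

```python
def should_wait(build_now: float, wait: float, build_later: float) -> bool:
    """Return True if waiting and then building finishes sooner than building now.

    build_now   -- t1, years needed with today's technology
    wait        -- t2, years to wait for better technology
    build_later -- t3, years needed once that technology exists
    """
    return wait + build_later < build_now


# The example from the text: 5 years now, or wait 1 year and build in 2.
print(should_wait(build_now=5, wait=1, build_later=2))  # True
```

Note what the function leaves out: nothing in it gives any weight to having been part of the building.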

That sentiment or feeling may be a driving factor, as Sir Martin argues, behind the collapse of our horizons to short-term windows. But it also seems to be something that potentially excludes us from the experience of being a part of something greater that will be finished not by you, but by generations to come.

The horizon of your work matters. It is fine to be “productive” in the sense that you finalize a lot of things, but maybe it would also be meaningful and interesting to have a Cathedral-project. Something you engage in that will live on beyond you, that will take 100 or 1,000 years to complete, if it is completed at all.

We have far too few such projects today. Arguably science is such a practice – but it is not a project. Think about it: if you were to start such a project or find one — what would it be? The Long Now Foundation has certainly found such a project in its clock, but that remains one of the few examples of “cathedral”-projects today (Sagrada Familia is also a good example – it is under way and is a proper cathedral, but we cannot all build cathedrals proper).

Computational vs Biological Thinking (Man / Machine XII)

Our study of thinking has so far been characterised by a need to formalize thinking. Ever since Boole’s “Laws of Thought” the underlying assumption and metaphor for thinking has been mathematical or physical – even mechanical and always binary. Logic has been elevated to the position of pure thought, and we have even succumbed to thinking that if we deviate from logic or mathematics in our thinking, then that is a sign that our thinking is flawed and biased.

There is great value to this line of study and investigation. It allows us to test our own thinking in a model and evaluate it from the perspective of a formal model for thinking. But there is also a risk associated with this project, a risk that may become more troubling as our surrounding world becomes more complex, and it is this: that we neglect the study of biological thinking.

One way of framing this problem is to say that we have two different models of thinking: computational and biological; the computational is mathematical and follows the rules of logic – and the biological is different, it forces us to ask things about how we think that are assumed in computational thinking.

Let’s take a very simple example – the so-called conjunction fallacy. The simplest rendition of this fallacy is a case often called “Linda the bank teller”.

This is the standard case:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

Linda is a bank teller.

Linda is a bank teller and is active in the feminist movement.


What computational thinking tells us is that the first proposition is always at least as probable as the second. This follows from the fact that the probability of a conjunction, P(A and B) = P(A) × P(B given A), can never exceed P(A), and is strictly smaller whenever P(A) is greater than 0 and P(B given A) is less than 1.
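The conjunction rule invoked here can be checked on a toy example. The sketch below uses an invented joint distribution over two yes/no variables; the numbers are hypothetical and chosen only for illustration, not taken from any study.

```python
from fractions import Fraction

# Hypothetical joint distribution over (is_bank_teller, is_feminist).
# The numbers are invented for illustration only; they sum to 1.
joint = {
    (True, True): Fraction(4, 100),
    (True, False): Fraction(1, 100),
    (False, True): Fraction(60, 100),
    (False, False): Fraction(35, 100),
}
assert sum(joint.values()) == 1

# P(teller): add up both cells in which Linda is a bank teller.
p_teller = sum(p for (teller, _), p in joint.items() if teller)

# P(teller and feminist): a single cell of the table.
p_teller_and_feminist = joint[(True, True)]

# The marginal includes the conjunction's cell, so it can never be smaller.
print(p_teller >= p_teller_and_feminist)  # True
```

Whatever numbers are put in the table, the marginal is a sum that includes the conjunction's cell, so the comparison cannot come out the other way.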

Yet a surprising number of people seem to think that it is more likely that Linda is a bank teller and active in the feminist movement. Are they wrong? Or are they just thinking in a different mode?

We could argue that they are simply chunking the world differently. The assumption underlying computational thinking is that it is possible to formalize the world into single statement propositions and that these formalizations are obvious. We thus take the second statement to be a compound statement – p AND q – and so we end up saying that it is necessarily less probable than just p. But we could challenge that and simply say that the second proposition is as elementary as the first.

What is at stake here is the idea of atomistic propositions or elementary statements. Underlying the idea of formalized propositions is the idea that there is a hierarchy of statements or propositions starting from “single fact”-propositions like “Linda is a bank teller” and moving on to more complex compound propositions like “Linda is a bank teller AND active in the feminist movement”.

Computational thinking chunks the world this way, but biological thinking does not. One way to think about it is to say that for computational thinking a proposition is a statement about the state of affairs in the world for a single variable, whereas for biological thinking it is a statement about the state of affairs for multiple related variables that are neither separable nor possible to chunk into individual parts.

It is the premises that set up the state space we are asked to predict, and they define it as one that contains facts about someone's activism. The premises determine the chunking of the state space, and the proposition “Linda is a bank teller and active in the feminist movement” is a singular, elementary proposition in the state space set up by the premises – not a compound statement.

What we must challenge here is the idea that chunking state spaces into elementary propositions is the same as chunking them into the smallest possible propositions. For computational thinking this holds true – but not for biological thinking.

The result of this line of argument is intriguing: it suggests that what is commonly identified as a bias here is a bias only if you assume that computational thinking is the ideal to which we are all to be held – but that in itself is a value proposition. Why is one way of chunking the state space better than another?

Another version of this argument is to say that the premises set up a proposition chunk that contains a statement about activism, so that “Linda is a bank teller” carries a suppressed second part, “and NOT active in the feminist movement”, which cannot be excluded. That this part is not written out does not mean the chunk lacks it: the premises set up activism as part of the natural chunking of the state space we are asked to predict.
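The two readings of the question can be placed side by side. The sketch below uses invented probabilities, hypothetical and for illustration only, to show that the answer flips depending on whether “Linda is a bank teller” is read as the bare statement or as carrying the suppressed “and NOT active in the feminist movement”.

```python
from fractions import Fraction

# Hypothetical probabilities, invented for illustration only.
p_teller_and_feminist = Fraction(4, 100)
p_teller_and_not_feminist = Fraction(1, 100)

# Formal reading: "Linda is a bank teller" says nothing about activism,
# so its probability covers both cases.
p_teller = p_teller_and_feminist + p_teller_and_not_feminist
formal_reading_prefers_first = p_teller >= p_teller_and_feminist

# Suppressed-negation reading: the first option is taken to mean
# "bank teller and NOT active in the feminist movement".
suppressed_reading_prefers_second = (
    p_teller_and_feminist > p_teller_and_not_feminist
)

print(formal_reading_prefers_first, suppressed_reading_prefers_second)  # True True
```

Under the formal reading the first option always wins; under the suppressed-negation reading the second option wins whenever the premises make activism more likely than not for a bank-teller Linda, as these invented numbers assume.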

The real failure, then, is to assume that “Linda is a bank teller” is the most probable statement – and that is not a failure of bias as such, but an interesting kind of thinking frame failure; the inability to move away from computational thinking instilled through study and application.

It is often said that economists become more rational than others, that they are infected with mathematical rationality through study. Maybe there is a similar, larger distortion in psychology, where tests are infected with computational thinking? Are there other biases that are just examples of being unable to move from the biological frame of thinking?