In this article Robin Hill suggests something that may seem both obvious and strange at the same time: that our artificial intelligence systems might not cut the world up the way we do, and might not use the same features to cluster concepts that we do. The example she gives is a dog: we recognize it by looking at the fur, the ears, the wet nose and so on – those are the features we focus on – but why should a machine focus on them? Why should we assume that it divides the world into wholes and parts the way we do?
This is a deep question, one associated with the often neglected subject of mereology[1] – a very useful way of thinking about concepts. In mereology we recognise the wholeness of the “dog” and the parts that go into constructing that whole, but we also quickly realise that there are many, many different ways in which we could construct that particular whole. This includes lumping together or slicing things differently across a number of different dimensions, including a temporal dimension. The dog may be made of moments in space.
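The point that one and the same whole admits many different decompositions can be made concrete with a toy sketch (the atoms and groupings below are purely illustrative, not a claim about how any system actually represents a dog):

```python
# Toy illustration: one and the same "whole" (modelled as a set of
# atoms) admits many different part-decompositions. Both partitions
# below cover exactly the same whole; they just slice it differently.

dog = {"fur", "ears", "nose", "tail", "legs"}  # the "whole", as atoms

# A familiar, anatomy-flavoured decomposition:
by_anatomy = [{"fur"}, {"ears", "nose"}, {"tail", "legs"}]

# An equally valid but alien decomposition (here: by alphabet):
by_alphabet = [{"ears", "fur"}, {"legs", "nose"}, {"tail"}]

def is_partition(whole, parts):
    """True if `parts` are non-empty, disjoint, and jointly exhaust `whole`."""
    seen = set()
    for part in parts:
        if not part or part & seen:   # empty part, or overlap with earlier parts
            return False
        seen |= part
    return seen == whole

assert is_partition(dog, by_anatomy)
assert is_partition(dog, by_alphabet)
```

Both decompositions pass the same formal test; nothing in the whole itself privileges one over the other – that is exactly the worry about machine-learned concepts.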
We could hypothesize that an all-knowing super-intelligence might converge on finding patterns in particle paths through space-time, and so would recognize entirely different concepts than we do. A clustering of such paths might conjure the concept of “Wegobans” – certain particle paths through space-time with strong commonalities that we cannot even begin to guess at.
Now, the way we slice and lump the world (to use the evocative language of Lee Anne Fennell’s excellent book Slices and Lumps: Division and Aggregation in Law and Life (2019)) is not arbitrary; it is rooted in evolution. Our concepts have evolved for a function – as Ruth Millikan has shown – and so we should expect human mereology to follow from that evolutionary path. But what if that particular mereology is not preserved in the training of large language models? What if it is a feature that is lost in the methods we currently use?
What would that mean?
We get into deep philosophy-of-language territory here: there seems to be a chance that we end up in false communication patterns, where the symmetry in the way we use the signs convinces us that we mean the same thing, while the structural composition of those signs into wholes and parts is radically different.
This is something like what Nelson Goodman suggested with his grue/bleen thought experiment[2], where a concept can be composed in arbitrarily many different ways – or at least along the temporal dimension of the concept[3]. That is in turn interesting because in any such faux communication there will be points where the difference in the composition of the signs leads to drastic breakdowns in the ability to convey meaning to each other.[4]
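Goodman’s point is easy to sketch in code. The two predicates below agree on every observation made before a cutoff date and only come apart afterwards – the cutoff here borrows the far-fetched date from the “allergy” footnote and is purely illustrative:

```python
from datetime import date

# Toy version of Goodman's "grue": a predicate observationally
# identical to "green" before a cutoff date, but composed differently
# along the temporal dimension. The cutoff date is illustrative.
CUTOFF = date(2024, 1, 31)

def green(colour, when):
    # the familiar concept: ignores when the observation is made
    return colour == "green"

def grue(colour, when):
    # green if observed before the cutoff, blue from then on
    if when < CUTOFF:
        return colour == "green"
    return colour == "blue"

# Before the cutoff the two predicates cannot be told apart...
assert green("green", date(2023, 6, 1)) == grue("green", date(2023, 6, 1))
# ...after it they diverge: same signs, different composition.
assert green("green", date(2025, 6, 1)) != grue("green", date(2025, 6, 1))
```

Two speakers who had only ever compared notes before the cutoff would have no evidence that they meant different things – which is the structure of the breakdown described above.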
And again, it is almost obvious to point out – but the study of the mereologies of artificial intelligences will be an absolutely essential piece of getting both security and safety right. This is slightly different from the study of explainability, where we look merely for a mapping from the system’s mereology to our own so that we can translate – and it is more fundamental: the view that we should look for general principles of mereological composition in large language models.
As we do so there are a number of interesting questions to consider, such as:
- Do we believe that AI mereologies become more like human mereologies as the size of the training data set grows? Or could the relationship between human / AI mereologies be, say, U-shaped? Why?
- Are there fundamentals in mereology that have to do with perception, and if so, does this mean that when we add sensors to AI that correspond to no human sensing capability, we will end up with vastly different mereologies? (Cf. what it may be like to be a bat, as Nagel asks – what is the mereology of a bat? Are the commonalities rooted in evolution in some way?)
- What do mereological safety risks look like and how can they be addressed in the best way?
- Is shared mereological composition a pre-condition for alignment?
And so on.
- [1] The sciences of wholes and parts; see for example https://plato.stanford.edu/entries/mereology/
- [2] See the “new riddle of induction”: https://en.wikipedia.org/wiki/New_riddle_of_induction
- [3] One interesting question is of course whether there are general dimensions along which a mereology is constructed, and whether these can be used to explore alternative mereologies.
- [4] A simple example may be one in which I mean “allergy” to be a medical condition that only applies before the 31st of January 2024 – a far-fetched example, but still.