Is Connectionism the Saviour of Artificial Intelligence?

Do Neural Networks Address the Philosophical Objections to AI?

In this essay we shall investigate the continuing quest to replicate general intelligence within a non-biological artifact, and ask whether the connectionist agenda successfully addresses the more serious objections to the possibility of true artificial intelligence. To answer this question we must first explore the underlying premises embraced by the artificial intelligentsia and examine how valid, and how open to empirical testing, these principles actually are. This is not the place to explain in detail the workings of artificial neural networks, rule-based inference systems or the functionalist model of mind and representation, so a certain familiarity with these topics will be assumed on the part of the reader. Instead we shall concentrate on examining whether connectionist architectures differ in their epistemological nature from equivalent symbolic computational systems and thus offer a more fruitful route towards the truly intelligent machine.

Firstly, then, we must define what we understand by the term ‘artificial intelligence’ (a.i.) and explore its guiding principles. The fundamental premise of a.i. is the conjecture that there exists some algorithmically calculable system capable of exhibiting general intelligence. If this is so, then there must exist some practical algorithm (executable in finite time with finite computing resources) capable of computing (reproducing) the behaviour of such a system. It is the quest for just such an algorithm (or, equivalently, a complex network of co-operating algorithms) which has occupied the a.i. research community for the last forty to fifty years. This assumption, essentially an appeal to the Church-Turing hypothesis, is taken by most a.i. advocates as so obviously true that it does not bear serious questioning. Setting aside for the moment matters of practicality and technological implementation, the key question is whether this assumption is justified, or whether general intelligence as exhibited in mammalian brains is somehow non-computational in nature and thus forever beyond the grasp of a.i. as currently pursued.

This assumption that there is something ultimately mechanical, and thus artificially reproducible, about the thinking process is not a new idea. Philosophers from Descartes through Leibniz to Husserl all pursued attempts to reduce rationality to its essential axioms, rules and inference strategies. Thomas Hobbes believed that mentality consisted of a set of fundamental atoms of thought linked by purely syntactic relations, so that rationality could be reduced to calculation (‘reason is nothing but reckoning’). Even the early Wittgenstein, in the Tractatus, set out his belief that reality is ultimately a set of facts describing the world, and that there is thus some ‘theory of reality’ which is in principle knowable.

Despite the early practical work of Charles Babbage, it was not until Alan Turing’s elegant and far-reaching insights into formalised symbolic computation that the potential realisation of an artificial mind arose. All modern digital computers are based on the concept of the Turing machine: a thought experiment devised by Turing in the 1930s as a means of formalising the notion of mechanical computation. Put simply, a Turing machine is an automaton which has a finite set of internal states, an infinite ‘tape’ passing through a read/write head, and a table of rules which decides which state to enter next and which action to take: write a new symbol, erase the current symbol, move the tape left or right, or stop. Through a suitable encoding of symbols and machine table, a given Turing machine can be made to perform any conceivable information-processing task which is computable, i.e. which can be expressed as a sequence of simple steps applied according to a ‘recipe’ or algorithm. Examples range from long division through to playing chess or finding the prime factors of large integers. The behaviour of each Turing machine is determined explicitly by its machine table. Turing’s great leap of imagination was to allow the rule table of any given Turing machine to be encoded onto the input tape. A special Turing machine with a suitably complex rule table could then read the tape, internalise the machine description and simulate the Turing machine described on the tape. This Universal Turing Machine concept underlies all modern digital computers, and also the computational model of the mind.
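
To make these mechanics concrete, here is a minimal sketch of a Turing machine simulator in Python (the state names, rule table and tape encoding are invented for this illustration). The machine shown performs one of the simplest imaginable tasks: incrementing a binary number written on its tape.

```python
# A minimal Turing machine simulator (an illustrative sketch only; the
# rule table and state names below are invented for this example).

def run_turing_machine(rules, tape, state="right", head=0, blank="_"):
    """Step the machine according to its rule table until it halts."""
    cells = dict(enumerate(tape))            # sparse tape: index -> symbol
    while state != "halt":
        symbol = cells.get(head, blank)
        new_symbol, move, state = rules[(state, symbol)]
        cells[head] = new_symbol
        head += {"R": 1, "L": -1, "N": 0}[move]
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Rule table: (state, symbol) -> (symbol to write, head move, next state).
# 'right' scans to the end of the number; 'carry' propagates the +1 leftwards.
increment = {
    ("right", "0"): ("0", "R", "right"),
    ("right", "1"): ("1", "R", "right"),
    ("right", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "N", "halt"),
    ("carry", "_"): ("1", "N", "halt"),
}

print(run_turing_machine(increment, "1011"))   # 1011 (11) + 1 -> "1100" (12)
```

The universal machine is exactly this pattern taken one step further: the rule table itself becomes data read from the tape.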

The formal processing of symbols according to a fixed table of rules was, until the advent of connectionist models of computation, the driving force of a.i. research. There were early successes in which formal logical inference was applied to clearly defined and partitioned problem domains to perform cognitive tasks conventionally associated with intelligence. Solving puzzles, playing draughts (Arthur Samuel’s checkers program) and deriving proofs for theorems in logic and mathematics (Newell and Simon’s Logic Theorist, 1956) are examples of tasks well within the capabilities of such rule-based systems. The 1970s and early 1980s were the high point of ‘Good Old Fashioned AI’, when the high church computationalists (led by Marvin Minsky et al.) were prone to overblown predictions of the successful recreation of true intelligence within their creaking rule-based a.i. programs. The prevailing viewpoint was that intelligence was just a matter of combining a few tens of thousands of facts about the world with suitably powerful inference rules, and that common sense reasoning could be achieved purely by virtue of sufficient memory and processing power.

But symbolic a.i. soon ran into serious difficulties as it tried to graduate from the toy problem domains of the kindergarten (e.g. the blocks world of Terry Winograd’s SHRDLU program). As soon as any reasoning about the real world is required, the number of possible relationships and rules which must be considered explodes and the whole creaking machine grinds to a halt. This is the notorious ‘frame problem’, which advocates of the rule-based approach are still struggling to circumvent. Yet even a two-year-old child has no trouble navigating accurately through a complex three-dimensional world, recognising a multitude of objects and understanding their physical properties and behaviour. Is there then an alternative computational mechanism at work which escapes the combinatorial explosion inherent in a top-down approach?

One currently popular approach adopts a computational architecture which, at a superficial level, has much in common with the mammalian brain. The connectionist approach (sometimes called ‘parallel distributed processing’) uses a highly connected network of simple processing nodes (artificial neurons) arranged as a sequence of connected layers. Each node accepts inputs via connections from many other nodes and computes a simple weighted sum over those inputs. If the sum exceeds some (adjustable) threshold value then the node ‘fires’ and passes an output value through connections to other nodes in the network. The connections are assigned ‘strengths’ which weight the output of a node, making its effect more or less significant to the nodes to which it connects.

Such a ‘neural network’ operates by applying an input pattern to the first layer of nodes and propagating the activation pattern through one or more layers of ‘hidden’ nodes, according to the current connection weights, until a new pattern emerges at the output layer. By assigning functional roles to specific output nodes, the network can be said to give an ‘answer’ to the question posed by the input pattern. For example, the input pattern may represent a portion of handwriting (scanned as pixel values) while the corresponding output is the network’s best guess at the letter or word represented.
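
A minimal sketch of this forward propagation, assuming a toy two-layer network of threshold nodes (the layer sizes and random weights are arbitrary choices for illustration):

```python
import numpy as np

def step(summed_input):
    """A node 'fires' (outputs 1) only if its summed input exceeds its threshold (0 here)."""
    return (summed_input > 0.0).astype(float)

def forward(layers, pattern):
    """Propagate an input pattern through successive layers of weighted connections."""
    activation = np.asarray(pattern, dtype=float)
    for weights in layers:
        # Each node takes a weighted sum of the previous layer, then thresholds it.
        activation = step(weights @ activation)
    return activation

rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 3)),   # 3 input nodes -> 4 'hidden' nodes
          rng.normal(size=(2, 4))]   # 4 hidden nodes -> 2 output nodes

print(forward(layers, [1.0, 0.0, 1.0]))   # the output pattern is the network's 'answer'
```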

A key feature of such networks is their ability to be trained by example and then to generalise from specific training examples to new cases. Not only can such networks produce correct classifications of perfect input patterns, they are also very good at correctly matching incomplete or distorted patterns in a way that is beyond a simple rule-based approach. Another distinguishing factor is their holistic nature, as opposed to the particulate nature of the knowledge in symbolic a.i. programs. The ‘knowledge’ embodied in a neural network is a property of the entire network (the connection weights, activation thresholds and node states). Instead of representing concepts as related atomic symbols, the entire network represents several concepts concurrently, with each node playing different roles in different concepts. This gives such networks great robustness, allowing their performance to degrade gracefully as nodes are removed. By comparison a conventional symbolic program is brittle and may fail catastrophically if even a single line of code is modified.
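
The simplest concrete instance of such training by example is the classic perceptron learning rule. In the sketch below (the two ‘pixel’ glyphs and all parameters are invented for illustration) a single threshold unit is trained on two clean patterns and then correctly classifies a distorted one:

```python
import numpy as np

# Two 3x3 'pixel' glyphs, flattened to 9-element vectors.
T = np.array([1,1,1, 0,1,0, 0,1,0], dtype=float)   # a crude letter T
L = np.array([1,0,0, 1,0,0, 1,1,1], dtype=float)   # a crude letter L
patterns, labels = [T, L], [1.0, 0.0]              # 1 means 'T', 0 means 'L'

w, b, lr = np.zeros(9), 0.0, 0.1
for _ in range(20):                                # repeated passes over the examples
    for x, target in zip(patterns, labels):
        output = float(w @ x + b > 0)              # threshold unit
        w += lr * (target - output) * x            # nudge weights towards the target
        b += lr * (target - output)

# Generalisation: a 'T' with its centre pixel flipped is still classified correctly,
# because the weighted sum still falls on the right side of the threshold.
noisy_T = T.copy()
noisy_T[4] = 0
print(float(w @ noisy_T + b > 0))                  # -> 1.0, i.e. 'T'
```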

The impressive ability of such networks to classify patterns has opened up a great number of potential applications: handwriting recognition, fingerprint matching, artificial ‘noses’ for drug detection and the automated inspection of assembly-line items for faults are just some examples. However, just as rule-based programs are poor at pattern recognition, neural networks have proved difficult to apply to problems where sequential, stepwise reasoning is needed. Nor are these networks immune from combinatorial explosion: the number of nodes needed to handle continuous real-world problems grows rapidly, and the number of connections between fully connected layers grows with the square of the number of nodes.

The question at the heart of this essay however is whether connectionist systems are actually fundamentally different from symbolic a.i. programs in a way that overcomes the main objections to a.i. Do neural networks with their soft, holistic approach solve problems which are in principle beyond the scope of symbolic systems or are they just an implementation technique for ‘smearing out’ our rule table and symbol set? Have the connectionists found an escape hatch from the locked room of rigid computational systems or does this avenue lead into yet another padded cell?

With our two complementary approaches in view, we shall briefly set out the main philosophical objections, most of which have been raised against the top-down symbolic approach. Later we shall ask whether connectionism offers any escape.

Objections of the external kind question whether any computer program, however complex, can even simulate the external behaviour of a rational mind (as in the Turing test). Many critics, such as Hubert Dreyfus, point to the manifest problems of formal rule-based systems: the frame problem and combinatorial explosion. Dreyfus claims that procedural knowledge (instinctive and conditioned behavioural know-how) gathered through bodily interaction with the world is required for true intelligence and common sense reasoning.

Another famous line of attack claims that any formal system, such as that instantiated within a computer program (whether top-down rule-based or a connectionist network), is subject to the limitations imposed by Gödel’s incompleteness theorem. J. R. Lucas first claimed that Gödel’s theorem places a fundamental limit on the intelligence achievable within any finite state machine realisation of the mind. Roger Penrose has famously cited mathematical insight as an example of a mode of human thought that is evidently not algorithmic. The human mathematician, says Penrose, can see the truth of a Gödel sentence which the corresponding formal system can never prove. Because human intelligence can manifestly perform inferences which are provably beyond any formal algorithmic system, argues Penrose, human intelligence cannot be algorithmic in nature. From this it follows that the quest for a.i. via the Turing machine is a hopeless pursuit.

But surely Penrose’s argument depends on the spurious claim that human rationality – human mathematical ability – is consistent. Gödel’s theorem applies only to consistent systems and says nothing about what can be deduced by an inconsistent one. Penrose ignores the fact that the mind is shot through with inconsistency and self-contradiction. Human reasoning provides no guarantee of soundness or freedom from error, yet Penrose insists that soundness must be a property of any machine intelligence. There is no practical sound algorithm guaranteed to win at chess, yet this does not prevent computers using heuristic algorithms from beating Grandmasters. As long as the algorithm being used is good enough, a guarantee of its soundness is not essential.

Other critics raise internal objections which grant that a computer simulation of mind is possible but deny that it can ever capture the intrinsic understanding or the non-computational reasoning powers of the human mind. The most famous argument here is the Chinese Room thought experiment proposed by John Searle, which seeks to show that the mere syntactic shuffling of symbols cannot capture true understanding. This argument, and its supposed refutation by the a.i. community, has been discussed at length in an earlier essay and will not be covered here.

Do these objections against symbolic a.i. apply equally to connectionist systems, or is there something fundamentally different about neural networks? Firstly, most current neural networks are still formal computational systems – indeed they are usually implemented (simulated) as virtual machines running on top of a conventional symbolic digital computer. Any rule-based program can be implemented in terms of a suitably trained neural network and vice versa – they are equivalent instantiations of the same abstract state machine. If general intelligence is indeed non-computable in nature, then current networks can never completely simulate the behaviour of a brain.

Consider a mammalian brain as a system with a set of inputs, a set of outputs and a finite number of internal elements, each of which can be in a finite number of states. Given an input pattern while in state S1, the system produces an output pattern and moves to a new internal state S2. The number of potential input patterns taken together with the set of potential internal states is, for practical purposes, unbounded. Furthermore, it is an open question whether this transition behaviour is decidable, i.e. whether the successor to a given state and input pattern can be computed by a finite, practical algorithm. If the brain is, as seems likely, a non-computational system, then no currently formalised connectionist network can approach a complete simulation of its behaviour across all states.
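
Stated formally (the notation here is my own shorthand, not standard), this picture amounts to a transition function over states and inputs:

```latex
% S = the set of internal states, I = input patterns, O = output patterns.
\[
  \delta : S \times I \to S \times O, \qquad \delta(S_1, i) = (S_2, o)
\]
% The open question is whether \delta is Turing-computable, i.e. whether some
% finite, practical algorithm yields (S_2, o) for every admissible pair (S_1, i).
```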

One possible remedy is to render the activation function of the nodes in our network non-computational. This would entail making the new state of a given node depend holistically upon the state of the entire network, and not just upon some first-order local neighbourhood of nodes. This corresponds to making our node activation functions second-order logical relationships rather than the first-order functions currently used. (We might even resort to Roger Penrose’s fabled quantum gravity to provide the required non-computability.)

Another question is whether a neural network can really be dismissed as either a strangely implemented rule-based system or a glorified lookup table for mapping input patterns to outputs. If either characterisation is true then such networks offer no salvation for the a.i. enthusiast. But the mere fact that one can devise a rule-based system which is functionally equivalent to a given neural network does not mean that the network cannot embody extended capabilities beyond replication as a set of rules. Consider a simple symbolic program which uses a formulaic rule to convert temperatures from Fahrenheit to Celsius. It is easy to train a neural network which performs the same mapping to any practical accuracy. Where then in such a network is the knowledge of the conversion rule? It is not explicit in any representation of concepts and symbols by the nodes and connections of the network, and neither is it implicit in the underlying activation functions and base operation of the computational medium. The hidden nodes of a trained network typically show patterns of activation which cluster around certain micro-features for certain classes of input pattern, but the features detected by these nodes do not correspond to hard symbols meaningful at the surface level of the problem domain. Instead of encoding the hard constraints of the conversion function itself, the network embodies a large number of ‘soft constraints’ within its connection strengths. By trying to satisfy as many of these soft constraints as possible the network produces the right answer – but not through the explicit or implicit application of any analogue of the conversion rule. The system behaves as if it embodies the hard rule while lacking any symbolic-level knowledge of such a rule.
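
A hedged sketch of the Fahrenheit example (the architecture, hyperparameters and training scheme below are arbitrary choices, not a canonical method): a toy network learns the conversion from examples alone, and after training no individual weight or node corresponds to the 5/9 factor or the offset of 32.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.5, size=(8, 1)), np.zeros((8, 1))  # 8 hidden nodes
W2, b2 = rng.normal(scale=0.5, size=(1, 8)), np.zeros((1, 1))  # 1 output node

F = np.linspace(-40, 212, 64).reshape(1, -1)       # training examples...
C = (F - 32) * 5 / 9                               # ...labelled by the hard rule
x, y = F / 100, C / 100                            # rescaled to keep gradients tame

for _ in range(10000):                             # plain batch gradient descent
    h = np.tanh(W1 @ x + b1)                       # hidden activations
    err = (W2 @ h + b2) - y                        # output error
    dW2, db2 = err @ h.T, err.sum(1, keepdims=True)
    dh = (W2.T @ err) * (1 - h**2)                 # backpropagated error
    dW1, db1 = dh @ x.T, dh.sum(1, keepdims=True)
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.05 * g / x.shape[1]

test = np.array([[68.0]]) / 100                    # 68 F is exactly 20 C
print((W2 @ np.tanh(W1 @ test + b1) + b2) * 100)   # typically close to 20
```

The conversion rule is recoverable from the network’s behaviour, but inspecting the weights and biases reveals only a diffuse pattern of soft constraints; nowhere is 5/9 or 32 stored.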

Neither can the network be dismissed as just a lookup table. A connectionist network can act as a glorified lookup table if it has enough hidden nodes to memorise, in effect, every legal pairing of inputs and outputs. But a cut-down network can still produce sensible results in cases where the inputs are incomplete or inconsistent – cases for which a simple lookup table of legal combinations has no answer.

This potential escape from a strict one-to-one mapping onto equivalent rules and symbols is vitally important. As Wittgenstein pointed out in his attack on his own earlier Tractatus, any analysis of the world into facts and rules is meaningful only within some defined context or for some purpose – the elements chosen for an explicit rule-based representation already reflect the narrow goals of our intention. Any attempt to distil knowledge into a set of context-free rules and axioms (a ‘theory of the world’), as must be done for a symbolic a.i. program, will destroy the very pragmatic ability of our intelligence to solve real-world problems.

What then of the limitations of current neural networks? How can they be applied more effectively to problems where sequential reasoning is required? While much of the underlying operation of the brain is indeed connectionist in nature, it seems likely that higher-level cognitive functions are implemented as virtual rule-based machines riding on a neural substrate. Daniel Dennett’s conception of nests of virtual machines implemented in terms of one another seems a more plausible architecture for explaining intelligence than either a purely symbolic or a purely connectionist approach. Rather than simple pattern-mappers or pure top-down inference engines, our minds are likely to be a rag-bag of co-operating and competing processes which combine to give a non-computational behaviour suited to the problem of survival in a complex world.

While the connectionist agenda is now in the ascendant (having recovered from the earlier setback inflicted by Minsky and Papert’s Perceptrons), we are still fundamentally adrift from a successful middle way to a.i. While our neural networks adopt a fuzzier approach to a fuzzy world, they remain formal computational systems subject to the fundamental limitations of all such systems. Only through the injection of a non-computational element into our silicon networks will we begin to see intelligent behaviour applicable to the real world and not just to the blocks world. If Mother Nature, that most ham-fisted of engineers, can create an intelligent artifact then perhaps in time we may do the same.

References

1. René Descartes, Meditations on First Philosophy
2. David Chalmers, The Conscious Mind
3. Daniel Dennett, Consciousness Explained
4. Stephen Priest, Theories of the Mind