Animal (under revision)


Learning to Play Chess

 

 


 

Background
The Eye
Non-visual Sensors
Proprioceptive Sensors
Effectors
The Gripper
The Network
Instructions
 


Note: Due to recent developments, the Animal network is being redesigned. Stay tuned for an update.


Background

Animal is a spiking neural network that learns to play chess. It is based on the notion that intelligence is a temporal discrete signal processing phenomenon. It is written in C++ for MS Windows® and DirectX®. I have only tested it with Windows 2000, so it may not work on other Windows platforms. Keep in mind that Animal is not really about chess. I use chess simply because it is a sufficiently complex causal environment that my spiking neural network (the artificial brain) can interact with. Another reason is that a software environment is cheap: all I need is a sufficiently fast desktop computer and enough free time to write code. If I had the resources, I would use a multi-legged robot in a real world environment and a full assortment of sensors and effectors.

Animal is not a typical chess program. There is no look-ahead tree-searching algorithm. Animal does not generate millions of moves like IBM's Deep Blue supercomputer. More significantly, Animal does not have a position evaluation function. It learns pretty much the same way a human being does, that is, by sensing and interacting with its environment through trial and error.

The program uses a virtual eye and a virtual gripper to sense and move virtual chess pieces on a virtual 2-D chess board. When first launched, Animal starts out with a completely blank network with no a priori knowledge (connections) whatsoever. The network has sensors and effectors that it uses to interact with the chess environment. Both eye and gripper can move around the board. They are programmed to remain within two squares of each other. So one of the first things that Animal must learn is to coordinate the motion of its eye and gripper. The human player also uses a gripper to grab and move pieces on the board. Animal has sensors in its eye that can sense what the human player is doing.
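The two-square rule can be pictured with a small sketch. This is not Animal's actual code. It assumes that "within two squares" means a Chebyshev distance of at most 2 (the larger of the file and rank differences) and that a move breaking the rule is simply ignored. The names Square, separation, onBoard and tryMove are made up for illustration.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical board coordinate: file and rank in the range 0..7.
struct Square {
    int file;
    int rank;
};

// Assumption: "within two squares" is measured as Chebyshev distance.
constexpr int kMaxSeparation = 2;

inline int separation(Square a, Square b) {
    return std::max(std::abs(a.file - b.file), std::abs(a.rank - b.rank));
}

inline bool onBoard(Square s) {
    return s.file >= 0 && s.file < 8 && s.rank >= 0 && s.rank < 8;
}

// Moves `mover` by (df, dr) only if the result stays on the board and
// within kMaxSeparation of `other`; otherwise the mover stays put.
inline Square tryMove(Square mover, Square other, int df, int dr) {
    Square next{mover.file + df, mover.rank + dr};
    if (onBoard(next) && separation(next, other) <= kMaxSeparation) {
        return next;
    }
    return mover;
}
```

With the eye at (3,3) and the gripper at (5,3), a further step to the right by the gripper is rejected, while a step back toward the eye is accepted.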

The main advantage of Animal's spiking network is that it is not designed with a specific environment or specific sensors and effectors in mind. That is to say, the network is not programmed to learn chess or anything in particular. It is only programmed to discover temporal correlations in multiple streams of sensory events without regard to the origin or the type of the events. Consequently, it can be adapted to serve as the brain for all sorts of intelligent systems and applications, such as robots, speech or optical character recognition systems. The network is also designed to be scalable, i.e., it automatically creates new neurons as it learns.

The Eye

The eye is a 3x3 array of sensors with two degrees of freedom for moving on the board. At each of the nine positions in the array, there are 78 sensors, 39 positive and 39 negative, for a total of 702 sensors. A positive sensor detects the onset of its assigned stimulus. A negative sensor detects the offset of its assigned stimulus. Here is a breakdown of the eye sensors:

11 Board Sensors

6 sensors for the piece types: pawn, knight, bishop, rook, queen and king.
2 sensors for piece color: black or white.
1 sensor for square occupied.
2 sensors for square color: black or white.

10 Gripper Sensors (computer)

6 sensors for holding each of the various pieces.
1 sensor for gripper holding a piece.
2 sensors for held piece's color: black or white.
1 sensor for gripper visible.

10 Gripper Sensors (human player)

6 sensors for holding each of the various pieces.
1 sensor for holding a piece.
2 sensors for held piece's color: black or white.
1 sensor for gripper visible.

8 Game Status Sensors

2 sensors for computer's or human player's move legality. These sensors indicate whether a square on the board is a legal destination for a piece being held.
2 sensors for computer's turn.
2 sensors for computer or human in check.
2 sensors for computer or human mated.

There are four locations on the board where Animal's eye can detect check, mate and turn status.
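Here is a rough sketch of how a positive/negative sensor pair and the eye's sensor count fit together. It is only an illustration, not the program's code. It reads the breakdown above as 39 stimuli per eye position, each watched by one positive and one negative sensor. SensorPair and the constant names are hypothetical.

```cpp
// One sensed stimulus watched by a positive/negative sensor pair.
struct SensorPair {
    enum class Event { None, Onset, Offset };

    bool previous = false;  // stimulus state at the last update

    // The positive sensor fires on onset, the negative sensor on offset.
    // Nothing fires while the stimulus stays the same.
    Event update(bool current) {
        Event e = Event::None;
        if (current && !previous) {
            e = Event::Onset;
        } else if (!current && previous) {
            e = Event::Offset;
        }
        previous = current;
        return e;
    }
};

// Sensor arithmetic from the breakdown above.
constexpr int kBoard = 6 + 2 + 1 + 2;            // 11
constexpr int kComputerGripper = 6 + 1 + 2 + 1;  // 10
constexpr int kHumanGripper = 6 + 1 + 2 + 1;     // 10
constexpr int kGameStatus = 2 + 2 + 2 + 2;       // 8
constexpr int kStimuliPerPosition =
    kBoard + kComputerGripper + kHumanGripper + kGameStatus;  // 39
constexpr int kSensorsPerPosition = 2 * kStimuliPerPosition;  // 78
constexpr int kEyePositions = 3 * 3;
constexpr int kEyeSensors = kEyePositions * kSensorsPerPosition;

static_assert(kSensorsPerPosition == 78, "78 sensors per eye position");
static_assert(kEyeSensors == 702, "702 eye sensors in total");
```

The point of the pair is that a steady stimulus produces no events at all. Only changes do.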

Non-visual Sensors

In addition to the visual sensors above, Animal uses 16 non-visual sensors, a positive and a negative sensor for each of the following eight stimuli:

Top edge sensor.

Bottom edge sensor.

Right edge sensor.

Left edge sensor.

Gripper open sensor.

Gripper closed sensor.

Gripper semi-open sensor.

Legal square sensor.

The last sensor in the above list is used to detect legal destination squares under Animal's gripper whenever Animal grabs a chess piece.

Proprioceptive Sensors

There are 260 sensors that keep track of motor events generated internally by Animal's effectors:

Eye moving left.

Eye moving right.

Eye moving up.

Eye moving down.

Gripper moving left.

Gripper moving right.

Gripper moving up.

Gripper moving down.

Gripper opening.

Gripper closing.

Gripper grasping control sensor.

Gripper motion control sensor.

Eye motion control sensor.

As usual, there are two sensors for every sensed phenomenon: positive for stimulus onset and negative for stimulus offset. The number obtained (26) is then multiplied by 10 because there are 10 effectors (each has a different activation duration) for every type of motor action.
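The count works out as 13 phenomena x 2 polarities x 10 durations = 260. The sketch below checks that arithmetic and shows one possible way to number the sensors. The flat ordering used by sensorIndex is purely an assumption for illustration, as are all the names.

```cpp
constexpr int kPhenomena = 13;   // items in the list above
constexpr int kPolarities = 2;   // positive (onset) and negative (offset)
constexpr int kDurations = 10;   // one per effector duration, 1..10

constexpr int kProprioceptiveSensors = kPhenomena * kPolarities * kDurations;

static_assert(kPhenomena * kPolarities == 26, "26 sensors before scaling");
static_assert(kProprioceptiveSensors == 260, "260 proprioceptive sensors");

// Hypothetical flat index for a sensor.
// phenomenon: 0..12, polarity: 0 (positive) or 1 (negative), duration: 1..10.
constexpr int sensorIndex(int phenomenon, int polarity, int duration) {
    return (phenomenon * kPolarities + polarity) * kDurations + (duration - 1);
}

static_assert(sensorIndex(0, 0, 1) == 0, "first sensor");
static_assert(sensorIndex(12, 1, 10) == 259, "last sensor");
```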

Effectors

Animal uses 100 effectors to move its eye and gripper, and grab and release chess pieces. There are 10 effectors for each action type. Every effector has a pre-programmed duration from 1 to 10.

Close gripper.

Open gripper.

Move gripper down.

Move gripper up.

Move gripper left.

Move gripper right.

Move eye down.

Move eye up.

Move eye left.

Move eye right.

In addition to the normal effectors above, there are 30 special excitatory and inhibitory effectors that are used to control final output. This way Animal can "think" about its actions without actually performing them. As with the other effectors, there are 10 pre-programmed durations for each effector type.

Activate/deactivate gripper actuator.

Activate/deactivate gripper motion.

Activate/deactivate eye motion.
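One way to picture the special effectors is as gates sitting between the normal effectors and the board. The sketch below is only a guess at the idea, not the actual mechanism. It assumes one gate per control type and that the gates start open. Channel and OutputGates are hypothetical names.

```cpp
// The three control types listed above.
enum class Channel { GripperActuator = 0, GripperMotion = 1, EyeMotion = 2 };

struct OutputGates {
    bool open[3] = {true, true, true};  // assumption: gates start open

    void activate(Channel c) { open[static_cast<int>(c)] = true; }
    void deactivate(Channel c) { open[static_cast<int>(c)] = false; }

    // True if an effector spike on this channel should reach the board.
    // With the gate closed, the network can still fire the effector
    // internally ("think") without anything moving.
    bool passes(Channel c) const { return open[static_cast<int>(c)]; }
};

// Effector arithmetic from the text.
constexpr int kDurations = 10;
constexpr int kNormalEffectors = 10 * kDurations;  // 10 action types
constexpr int kSpecialEffectors = 3 * kDurations;  // 3 control types

static_assert(kNormalEffectors == 100, "100 normal effectors");
static_assert(kSpecialEffectors == 30, "30 special effectors");
```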

 

The Gripper

The gripper has three degrees of freedom, two for moving around the board and one for opening and closing its grip actuator. There are three grip positions: open, semi-open and closed. The network uses the gripper to grab chess pieces and release them. The board's software is designed in such a way as to prevent Animal from grabbing pieces that have no legal moves. If Animal releases a chess piece on an illegal destination square, the piece immediately goes back to its original square. If the move is legal, the piece is placed on the destination square.
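The release rule can be sketched in a few lines. This is an illustration only. The isLegal callback is a hypothetical stand-in for the board's own move validation, which is not shown here, and the type and function names are made up.

```cpp
#include <functional>

// Hypothetical board coordinate.
struct Square {
    int file;
    int rank;
};

// Where the released piece ends up and whether the move counted.
struct ReleaseResult {
    Square finalSquare;
    bool moved;
};

// A legal release places the piece on the target square.
// An illegal release sends it straight back to its original square.
inline ReleaseResult release(
    Square origin, Square target,
    const std::function<bool(Square, Square)>& isLegal) {
    if (isLegal(origin, target)) {
        return {target, true};
    }
    return {origin, false};
}
```

For example, with a toy rule that only allows one step up the same file, releasing a piece from (4,1) on (4,2) moves it, while releasing it on (5,5) leaves it on (4,1).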

The Network (under revision)

Note: Due to recent developments, the following sections are now obsolete. Stay tuned for an update.

Animal uses a feed-forward, recurrent spiking neural network composed of several specialized layers. Most layers can be thought of as two-dimensional sheets of neurons. The following is a list of the various layers, not necessarily in signal propagation order.

Note: Keep in mind that the information on this page may not always be up-to-date as the network's design frequently changes.

Sensory Layer
Proprioceptive Layer
Signal Separation Layer
Coincidence Layer
Short-Term Memory (under development)
Long-Term Memory (under development)
Temporal Clustering Layer (under development)
Command Selection Layer
Motor or Effector Layer
Motivational Layer (not yet implemented)

Complementary Design

The most important principle that governs the design of the network is the Principle of Complementarity. Here is a list of the main complementary modules:

Sensory Layer / Motor Layer
Signal Separation Layer / Command Selection Layer
Short-Term Memory / Long-Term Memory
Reactive Memory / Anticipatory Memory
Perceptual Clustering / Conceptual Clustering

 

Sensory Layer

The axon of every sensor in the sensor layer divides into multiple branches or paths that contact individual neurons in the signal separation layer.

Proprioceptive Layer

This layer is like the normal sensor layer except that it receives sensory feedback signals originating internally from the system. For example, proprioceptive feedback signals are generated by all effectors. It also receives special signals that inform the system about the grip status of the gripper such as open, closed or semi-open. 

Signal Separation Layer

Signals from the sensor and proprioception layers are separated into as many distinct parallel paths as possible in the signal separation layer. This layer contains special two-input separation neurons that find contiguous correlations between two input signals. Massive feedback is used to find temporal correlations within and across individual sensory streams. Another important thing to note about this layer is that it is highly compartmentalized in that neurons in each compartment receive only input connections from a single type of sensor. This is analogous to the brain's cortical columns.

Coincidence Layer

After separation, signals are joined into small concurrent groups in the coincidence detection layer. This layer essentially forms single time-scale associations or clusters between the signals coming from the separation layer. The neurons in this layer can have an unlimited number of input connections. Their function is to detect concurrent signals. They send their output signals to the association layer. The separation and coincidence layers together form what is called recognition memory.
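A coincidence neuron of this kind might look something like the sketch below. It is a simplification, not the real network code. It assumes time advances in discrete ticks, that "concurrent" means arriving on the same tick, and that the neuron fires only when every one of its inputs has spiked on that tick. CoincidenceNeuron is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

class CoincidenceNeuron {
public:
    // Any number of inputs is allowed; -1 means "never spiked".
    explicit CoincidenceNeuron(std::size_t inputCount)
        : lastSpike_(inputCount, -1) {}

    // Records a spike arriving on `input` at time `tick`.
    void receive(std::size_t input, long tick) { lastSpike_.at(input) = tick; }

    // Fires only if every input spiked on exactly this tick.
    bool fires(long tick) const {
        if (lastSpike_.empty()) {
            return false;
        }
        for (long t : lastSpike_) {
            if (t != tick) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<long> lastSpike_;  // tick of the latest spike per input
};
```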

Short-Term Memory Layer

This is part of the mechanism responsible for causal learning and remembering short-term events. It receives its inputs from both the perceptual stream and the association layer. Event memory is also known as short-term memory or STM. It feeds directly into the association layer to form a massive loop. These events are repeated over a short time and contribute to the creation of associations over multiple time-scales in the association layer.

As seen in the diagram, the perceptual stream arriving at event memory is divided into two complementary streams, one for generating recent events and the other for anticipated or future events.

Long-Term Memory Layer

This is where long-term memory resides and where conceptual and causal learning occurs. This layer forms a massive loop with short-term memory to create associations over multiple time-scales. This layer is divided into two complementary cell assemblies or modules. The reactive module generates signals associated with recent events. The anticipatory module generates signals associated with probable future events. Both cell assemblies are crucial to adaptation in that they give the system a sense of how its environment evolves over time.

Temporal Clustering Layer

This layer receives afferent signals from the long-term memory layer. Its purpose is to combine reactive and anticipatory signals into single events representing single phenomena that span multiple time scales.

Command Selection Layer

The function of the command selection layer is to select inputs from the memory layer for motor output. Its organization is closely tied to that of the motor layer. For each and every effector neuron in the motor layer, there are two command neurons, one to start an action and another to stop it. The command selection layer fuses multiple signal streams into single streams. It is the complementary inverse of the signal separation layer.

Motor Layer

The motor or effector layer is the logical mirror image of the sensor layer. It works together with the command selection mechanism to maintain motor coordination. Effectors are tonically active cells. That is to say, once activated by a start command, they repeatedly fire for a pre-programmed duration unless stopped by a stop command. Animal uses 10 effectors for each type of action, such as move eye right, move gripper left and open gripper. There are 100 effectors and 200 command neurons.
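The start/stop behavior of a tonically active effector can be sketched as a simple countdown. This is an illustration under assumptions, not the program's code. It assumes the pre-programmed duration is counted in discrete time steps and that a new start command restarts the countdown. The Effector class and its method names are hypothetical.

```cpp
class Effector {
public:
    // duration: pre-programmed number of time steps, 1 to 10.
    explicit Effector(int duration) : duration_(duration) {}

    // Start command: begin (or restart) firing for the full duration.
    void start() { remaining_ = duration_; }

    // Stop command: cut the activity short.
    void stop() { remaining_ = 0; }

    // Advances one time step. Returns true if the effector fires.
    bool tick() {
        if (remaining_ <= 0) {
            return false;
        }
        --remaining_;
        return true;
    }

private:
    int duration_;
    int remaining_ = 0;
};

// Counts from the text: two command neurons (start, stop) per effector.
constexpr int kEffectors = 10 * 10;
constexpr int kCommandNeurons = 2 * kEffectors;

static_assert(kEffectors == 100, "100 effectors");
static_assert(kCommandNeurons == 200, "200 command neurons");
```

An effector with a duration of 3 fires on three consecutive ticks after a start command and then goes quiet. A stop command silences it right away.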

Motivation Layer

The idea behind motivation is to use reward and punishment stimuli to motivate the network to behave a certain way, i.e., to play a good game of chess. The motivational system will involve both sensors and effectors. Whatever the final motivation principle turns out to be, it will be tied to expectation learning in short and long-term memory. Above all, it will be a simple mechanism based on the powerful Principle of Complementarity.

Instructions

Requirements

You will need the following:

MS Windows with DirectX 6 or better installed.
128 Mbytes of RAM. More is better.
Graphics card with at least 8 Mbytes of RAM.
Sound card, optional.

Note: Most Windows installations already have DirectX installed. If your version of Windows does not have it or if your version of DirectX is 5 or less, you can download the latest version (8.1 the last I checked) free from Microsoft. I found that setting my graphics card to 16-bit color gives the best performance. You may want to experiment with different settings. If you have an old graphics card, it may not support transparent color blitting, so you will see some strange effects such as a square box around the images.

User Instructions

Download the zipped file (186K) and unzip it into a destination directory. Double click on the 'animal.exe' file to start Animal. You will see the following:

Chess board.

Chess pieces.

Animal's eye, blue ring.

Animal's gripper, also blue.

Human's gripper, red.

List of various statistics on the right.

Notice that Animal's eye and gripper move every half a second. These random movements are called saccades. Animal cannot see anything on the board unless it moves or the eye moves. The eye and the gripper are programmed to remain within two squares of each other. By the time the number of cells reaches about 8,000 or so, Animal should be coordinating the motion of its eye and gripper, i.e., they will begin to move in unison. Animal learns to generate its own saccades as it moves around the board. I have added a couple of sound effects to Animal's actions, so you may want to turn up the volume of your speakers a little. Turn them off if you find the sound annoying. 

To make your move, use the arrow keys on your keyboard to move the red gripper. Make sure you move it over the eye many times so that Animal can learn to recognize it. To grab a chess piece, move the red gripper over the piece and press the space bar to grab it. To release the piece onto a square, press the space bar again. A piece can only be released on a legal destination square.

A Few Observations:

The current version of Animal can only play black. Also, the Save and Load features are temporarily disabled due to a bug. Stay tuned for an update. Even though the program's code is not optimized, Animal moves rather fast. One reason is that spiking networks are much faster than traditional ANNs. I use a 450 MHz machine for testing. If you use a faster machine, Animal's movements may be too fast to fully appreciate. The program is written so that new cells are added as needed. By the time the number of cells reaches around 12,000, you may notice some new and interesting behavior. Sometimes Animal displays an uncanny insect-like behavior. You may observe Animal's gripper opening and closing or moving back and forth independently of eye movements. Animal uses feedback to "think." If the eye appears to be stuck in a corner or moving back and forth, do not panic. Eventually it will start moving again. If not, just start a new game. Like any newborn, Animal engages in random play.

There is no motivation mechanism as of yet. Animal does not have any innate "desire" to move its pieces on the board when it is its turn. It may or may not develop the desire. It does not yet have a sense of purpose and has no idea what a good move is. After around 10,000 cells, try checking its king to see if it eventually gets out of check.

Future Plans

My hope is that, when I implement motivation, attention and working memory in Animal, it will learn to play a passable game of chess. I have plans for many improvements. If I can get Animal to play at a beginner's level, I have every reason to suppose that, given sufficient computer speed/memory, playing experience and training, it can reach expert and even grandmaster levels. When that happens, I plan to adapt the network to what can only be described as the holy grail of game AI, the ancient Chinese board game of Go (Weichi). Go defies non-intelligent (non-learning) approaches to game board programming such as the brute force, tree-searching methods used by Deep Blue and commercial chess programs.

Enjoy.


Animal Chess Graphics by Joanna Field

Microsoft®, DirectX® and MS Windows® are registered trademarks of Microsoft Corporation.

 

©2004-2005 Louis Savain

Copy and distribute freely