The aim of this project is to develop software and hardware sufficiently powerful to simulate an entire human brain. The project is divided into two parts: the development of the software and the design of the hardware.
The complexity of the human brain is such that it cannot be simulated in its entirety on anything less than a supercomputer. The power of a supercomputer can be made available at relatively low cost using what is termed a 'beowulf' cluster: a supercomputer assembled from a network of PCs built with off-the-shelf components (see http://tux.anu.edu.au/Projects/Beowulf/ & http://www.beowulf.org/ for more details). An alternative, which will also be considered, is the use of a larger number of PCs communicating over the internet to the same end. Rough calculations (brainsim) estimate that around 100 Tb = 100,000 Gb of storage is required to hold a state of the brain representing a single instant of activity, and this is after making significant simplifications to the model in use. Note: storage systems on this scale are available from, e.g., Sun Microsystems from around 200,000 UK pounds. The processing power required is a less significant factor since the simulation need not run in real time. Based on price versus storage capacity, the best-value hard drives currently available have a capacity of 45Gb and cost roughly 125 UK pounds. To achieve the required storage capacity we assume each PC in the cluster has 3 such hard drives, which means that 100,000 / (45 * 3) ~= 740 PCs are required in the cluster. A very conservative estimate of 1500 pounds per PC brings the cost of this machine to a round million; the cost in dollars would probably be similar regardless of exchange rates. The processing power available in such a network would be more than adequate for the needs of this project, which is to be capable of simulating a network of 100 billion neurons at all, rather than to simulate such a network in real time. Such a large cluster is, however, a formidable feat of engineering, an order of magnitude greater in scale than most current efforts.
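The sizing arithmetic above can be written out as a short calculation. This is only a sketch of the estimate as stated; the capacities and prices are the document's own year-2000 figures, not current ones:

```python
import math

# Figures from the estimate above (year-2000 prices, in UK pounds).
total_storage_gb = 100_000   # ~100 Tb to hold one brain state
drive_capacity_gb = 45       # best-value drive at ~125 UKP
drives_per_pc = 3
cost_per_pc = 1_500          # very conservative all-in estimate per PC

storage_per_pc = drive_capacity_gb * drives_per_pc         # 135 Gb per PC
pcs_needed = math.ceil(total_storage_gb / storage_per_pc)  # ~740 PCs
total_cost = pcs_needed * cost_per_pc

print(f"PCs required: {pcs_needed}")        # 741
print(f"Total cost:   {total_cost:,} UKP")  # 1,111,500 -- a round million
```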
The recent KLAT2 supercomputer (http://www.aggregate.org/KLAT2/) is a beowulf cluster of 64 1GHz Athlon-based PCs and could be considered representative of the current state of the art. However, the machine required by this project has an advantage over more general-purpose projects like KLAT2 in the simplicity of the design required. A lot of thought has to go into the design of a beowulf cluster in order to optimise the speed of communication between nodes on the network. The requirement for extensive message passing between machines is the major factor preventing a linear increase in the number of processors from yielding a linear increase in the total processing power that can be applied to a given problem. The majority of connections in a brain are local in nature, hence it should be possible to get away with purely local interconnections between processing nodes. This of course means that performance would be significantly reduced if the machine were applied to problems with anything other than a local topology. However, there are plenty of problems that fall into this category. The project also benefits from the work already done on other beowulf clusters, which should make setting up very much easier.
It is anticipated that the requirements and costs will vary significantly over time, especially given the rate at which new technology reaches the consumer PC market. The following requirements are the current estimates:
Knowledge of available bulk-purchase discounts would be appreciated, as would knowledge of alternative bulk storage technologies that would bring costs down (e.g. a cheap supplier of RAID arrays). It is quite important that each PC be identical: this avoids tricky configuration issues having to be resolved on an individual basis, and it avoids the slowest PCs becoming bottlenecks while the power of faster PCs is wasted.
Given that the base units are a main expense in this design, it may be preferable to reduce the number of units involved to something more convenient. Assuming the motherboards have 4 IDE channels, each capable of housing a master and a slave hard drive (some motherboards support this already; otherwise an additional controller card costs around 100 UKP), each node can have 8 x 45 = 360Gb of storage. Only 278 such nodes would then be required to hold the entire simulation on disk. This of course reduces the available processing power by a factor of around two and a half (assuming linear scaling). A program and/or graph for optimising these values may be written later.
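A first cut of the optimisation program mentioned above could look like the following sketch. The cost model is an illustrative assumption (it takes the 1,500 UKP base price to exclude drives, prices drives at 125 UKP each, and adds a 100 UKP controller card whenever a node carries more than 4 drives); the real figures would need checking against suppliers:

```python
import math

TOTAL_GB = 100_000     # one full brain state
DRIVE_GB = 45          # per-drive capacity
DRIVE_COST = 125       # UKP per drive
BASE_COST = 1_500      # UKP per node, assumed here to exclude drives
CONTROLLER_COST = 100  # extra IDE controller needed beyond 4 drives

def nodes_needed(drives_per_node):
    """Nodes required to hold the full state on disk."""
    return math.ceil(TOTAL_GB / (DRIVE_GB * drives_per_node))

def total_cost(drives_per_node):
    """Whole-cluster cost under the assumed per-node cost model."""
    per_node = BASE_COST + drives_per_node * DRIVE_COST
    if drives_per_node > 4:
        per_node += CONTROLLER_COST
    return nodes_needed(drives_per_node) * per_node

for d in (3, 8):
    print(f"{d} drives/node: {nodes_needed(d)} nodes, "
          f"{total_cost(d):,} UKP total")
```

With these assumptions the 8-drive configuration needs 278 nodes instead of 741 and comes out cheaper overall, at the cost of the reduced aggregate processing power noted above.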
The prices above are based on information available from PCIndex (www.pcindex.co.uk) and various local on-line stores for the smaller items. In a project of this size, a few other factors that have not yet been considered are:
It is proposed that the nodes would be arranged in a 27x27 (=729) or 17x17 (=289) rectangular grid with each PC
connected to the four neighbouring PCs immediately to its North, East, South & West
respectively. PCs at the opposite edges of the grid would be linked together to form
a continuous sheet. In the worst case, a message on the big grid would have to travel
up to 13 hops in each dimension, passing through as many as 25 intermediaries to reach its destination.
This would impose an unacceptable performance cost on a conventional beowulf cluster.
However, for full brain simulation it
is anticipated that each node should rarely if ever need messages to be transmitted through even
a single intermediary node. Hence the network cards on neighbouring PCs could be connected
directly, without the additional complication of hubs and routers. Note, however, that the brain
is a three-dimensional structure, and a 3D topology of 9x9x9 (=729) or 7x7x6 (=294)
may be more appropriate.
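The worst-case message distance on such a wrap-around (toroidal) grid is easy to tabulate: each axis contributes at most floor(size/2) hops. A small sketch comparing the candidate topologies:

```python
def max_hops(dims):
    """Worst-case hop count between two nodes on a wrap-around grid:
    each axis contributes at most floor(size / 2) hops."""
    return sum(d // 2 for d in dims)

for dims in [(27, 27), (17, 17), (9, 9, 9), (7, 7, 6)]:
    nodes = 1
    for d in dims:
        nodes *= d
    print(f"{'x'.join(map(str, dims)):>8} ({nodes} nodes): "
          f"worst case {max_hops(dims)} hops")
```

This shows the appeal of the 3D layouts: 9x9x9 halves the worst-case distance of the 27x27 sheet (12 hops versus 26) while connecting the same 729 nodes.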
Extending the previous idea, two additional network cards would be required on each node, giving
connections up and down respectively. This is possible using a motherboard with 6 PCI slots
available. Due to the local nature of the connections in this design it would
be possible to develop an alternative connection scheme using one or more PCI slots, potentially
with higher throughput. Such a 'beowulf' card, although
desirable in principle, would add significantly to the development costs and is thus omitted
from immediate consideration. Firewire cards are another alternative; however, their cost is
currently prohibitive.
Note: other beowulf architectures often use several network cards together
as a 'bonded channel' to increase the overall message throughput. This scheme is not possible
here as insufficient PCI slots would be available.
Another possibility to consider is the small-world phenomenon. Put simply, this states that a small
number of random interconnections in an otherwise locally connected network can greatly decrease
the average separation between any two nodes in the network. Random rewiring could improve the
performance of this machine on more general problems. Also, it is possible that the brain uses such
a scheme in order to transmit information faster (likely, even, considering nature's tendency to
discover such phenomena before man does).
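The effect is easy to demonstrate on a toy network. The sketch below builds a locally connected ring of 729 nodes, adds a handful of random long-range links, and measures the average shortest-path length with breadth-first search (the ring stands in for the grid purely for brevity; the sizes and the shortcut count are arbitrary illustrative choices):

```python
import random
from collections import deque

def avg_path_length(adj):
    """Mean shortest-path length over all node pairs, via BFS from each node."""
    total = pairs = 0
    for start in range(len(adj)):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring_lattice(n, k=2):
    """Ring of n nodes, each linked to its k nearest neighbours per side."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

random.seed(1)
local = ring_lattice(729)
rewired = ring_lattice(729)
for _ in range(20):                     # a few random long-range shortcuts
    a, b = random.sample(range(729), 2)
    rewired[a].add(b)
    rewired[b].add(a)

print(f"local only    : {avg_path_length(local):.1f} hops on average")
print(f"+ 20 shortcuts: {avg_path_length(rewired):.1f} hops on average")
```

Even 20 extra links (well under 1% of the roughly 1,500 local ones) cut the average separation several-fold, which is the small-world effect described above.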
Note: I would be eager to hear from those with a greater knowledge and experience of developing
beowulf systems with comments & criticisms of the above design.
The software is divided into three components:
The application-level software is considered in more detail in the associated document:
It is difficult to give an accurate estimate of the performance of a system before a prototype
has been built, and particularly before the software it is to run has been written.
The two main bottlenecks will be disk access and network data transfer. For this application
disk access will be particularly intense, since nowhere near enough RAM can be fitted in the system:
4Gb per machine is the typical upper limit at present.
Direct Memory Access controllers will play a large part in reducing the
cost of disk access, but extensive testing will be required to determine to what extent it can be minimised.
It may turn out that a special filesystem and driver need to be written in order to minimise
this cost. However, this should not be entered into lightly.
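The document's own figures show why disk access dominates. A rough sketch of the per-node arithmetic, using the 100,000 Gb state estimate and the 4Gb RAM ceiling from above:

```python
total_state_gb = 100_000   # one brain state, per the storage estimate
ram_per_node_gb = 4        # typical per-machine ceiling at present

for nodes in (729, 294):
    state_per_node_gb = total_state_gb / nodes
    ram_fraction = ram_per_node_gb / state_per_node_gb * 100
    print(f"{nodes} nodes: {state_per_node_gb:.0f} Gb of state per node, "
          f"only {ram_fraction:.1f}% of which fits in RAM")
```

With under 3% of each node's share of the state resident in memory, nearly every update must touch the disk, which is what makes the filesystem and DMA questions above worth the effort.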
The network data-transfer bottleneck needs to be approached with similar caution. The card's maximum data
transfer rate of 100Mb/s needs to be considered, along with how quickly data can be
read in from and written out to it. At present, even the volume of data to be transferred has
not been estimated.
TO BE COMPLETED
The estimated Relative Thought Rate for the 729 Model is 1/5000 (SlowDown 5000), giving 17 minutes of simulated time a week (ouch!).
The estimated Relative Thought Rate for the 294 Model is 1/15000 (SlowDown 15000).