The aim of this project is to develop software and hardware sufficiently powerful to simulate an entire human brain. The project is divided into two parts: the development of the software and the design of the hardware.
The complexity of the human brain is such that it cannot be simulated in its entirety on anything less than a supercomputer. The power of a supercomputer can be made available at relatively low cost using what is termed a 'beowulf' cluster: a supercomputer assembled from a network of PCs built with off-the-shelf components (see http://tux.anu.edu.au/Projects/Beowulf/ & http://www.beowulf.org/ for more details). An alternative, which will also be considered, is the use of a larger number of PCs communicating over the internet to the same end. Rough calculations (brainsim) estimate that around 100 Tb = 100,000 Gb of storage is required to hold a state of the brain representing a single instant of activity, and this is after making significant simplifications to the model in use. Note: storage systems on this scale are available from, e.g., Sun Microsystems from around 200,000 UK pounds. The processing power required is a less significant factor since the simulation need not run in real time. Based on price versus storage capacity, the best-value hard drives currently available have a capacity of 45Gb and cost roughly 125 UK pounds. To achieve the required storage capacity we assume each PC in the cluster has 3 such hard drives, which means that 100,000 / (45 * 3) ~= 740 PCs are required in the cluster. A very conservative estimate of 1500 pounds per PC brings the cost of this machine to a round million; the cost in dollars would probably be similar regardless of exchange rates. The processing power available in such a network would be more than adequate for the needs of this project, which is to be capable of simulating a network of 100 billion neurons at all, rather than to simulate such a network in real time. Such a large cluster is, however, a formidable feat of engineering, an order of magnitude greater in scale than most current efforts.
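The sizing arithmetic above can be written out as a short calculation. This is only a sketch of the estimate as stated; the capacities and prices are the document's own year-2000 figures, not current ones:

```python
import math

# Figures from the estimate above (year-2000 prices, in UK pounds).
total_storage_gb = 100_000   # ~100 Tb to hold one brain state
drive_capacity_gb = 45       # best-value drive at ~125 UKP
drives_per_pc = 3
cost_per_pc = 1_500          # very conservative all-in estimate per PC

storage_per_pc = drive_capacity_gb * drives_per_pc         # 135 Gb per PC
pcs_needed = math.ceil(total_storage_gb / storage_per_pc)  # ~740 PCs
total_cost = pcs_needed * cost_per_pc

print(f"PCs required: {pcs_needed}")        # 741
print(f"Total cost:   {total_cost:,} UKP")  # 1,111,500 -- a round million
```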
The recent KLAT2 supercomputer (http://www.aggregate.org/KLAT2/) is a beowulf cluster of 64 1GHz Athlon-based PCs and could be considered representative of the current state of the art. However, the machine required by this project has an advantage over more general-purpose projects like KLAT2 in the simplicity of the design required. A lot of thought has to go into the design of a beowulf cluster in order to optimise the speed of communication between nodes on the network. The requirement for extensive message passing between machines is the major factor preventing a linear increase in the number of processors from yielding a linear increase in the total processing power that can be applied to a given problem. The majority of connections in a brain are local in nature, hence it should be possible to get away with purely local interconnections between processing nodes. This of course means that performance would be significantly reduced if the machine were applied to problems with anything other than a local topology. However, there are plenty of problems that fall into this category. The project also benefits from the work already done on other beowulf clusters, which should make setting up very much easier.
It is anticipated that the requirements and costs will vary significantly over time, especially given the rate at which new technology reaches the consumer PC market. The following requirements are the current estimates:
Knowledge of available bulk-purchase discounts would be appreciated, as would knowledge of alternative bulk storage technologies that would bring costs down (e.g. a cheap supplier of RAID arrays). It is quite important that each PC be identical: this avoids tricky configuration issues having to be resolved on an individual basis, and it avoids the slowest PCs becoming bottlenecks while the power of faster PCs is wasted.
Given that the base units are a main expense in this design, it may be preferable to reduce the number of units involved to something more convenient. Assuming the motherboards have 4 IDE channels, each capable of housing a master and a slave hard drive (some motherboards support this already; otherwise an additional controller card costs around 100 UKP), each node can have 8 x 45 = 360Gb of storage. Only 278 such nodes would then be required to hold the entire simulation on disk. This of course reduces the available processing power by a factor of around two and a half (assuming linear scaling). A program and/or graph for optimising these values may be written later.
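A first cut of the optimisation program mentioned above could look like the following sketch. The cost model is an illustrative assumption (it takes the 1,500 UKP base price to exclude drives, prices drives at 125 UKP each, and adds a 100 UKP controller card whenever a node carries more than 4 drives); the real figures would need checking against suppliers:

```python
import math

TOTAL_GB = 100_000     # one full brain state
DRIVE_GB = 45          # per-drive capacity
DRIVE_COST = 125       # UKP per drive
BASE_COST = 1_500      # UKP per node, assumed here to exclude drives
CONTROLLER_COST = 100  # extra IDE controller needed beyond 4 drives

def nodes_needed(drives_per_node):
    """Nodes required to hold the full state on disk."""
    return math.ceil(TOTAL_GB / (DRIVE_GB * drives_per_node))

def total_cost(drives_per_node):
    """Whole-cluster cost under the assumed per-node cost model."""
    per_node = BASE_COST + drives_per_node * DRIVE_COST
    if drives_per_node > 4:
        per_node += CONTROLLER_COST
    return nodes_needed(drives_per_node) * per_node

for d in (3, 8):
    print(f"{d} drives/node: {nodes_needed(d)} nodes, "
          f"{total_cost(d):,} UKP total")
```

With these assumptions the 8-drive configuration needs 278 nodes instead of 741 and comes out cheaper overall, at the cost of the reduced aggregate processing power noted above.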
The prices above are based on information available from PCIndex (www.pcindex.co.uk) and various local on-line stores for the smaller items. In a project of this size, a few other factors that have not yet been considered are:
It is proposed that the nodes would be arranged in a 27x27 (=729) or 17x17 (=289) rectangular grid with each PC
connected to the four neighbouring PCs immediately to its North, East, South & West
respectively. PCs at the opposite edges of the grid would be linked together to form
a continuous sheet. In the worst case, a message on the big grid would have to travel
up to 13 hops in each dimension, passing through as many as 25 intermediaries to reach its destination.
This would impose an unacceptable performance cost on a conventional beowulf cluster.
However, for full brain simulation it
is anticipated that each node should rarely if ever need messages to be transmitted through even
a single intermediary node. Hence the network cards on neighbouring PCs could be connected
directly, without the additional complication of hubs and routers. Note, however, that the brain
is a three-dimensional structure, and a 3D topology of 9x9x9 (=729) or 7x7x6 (=294)
may be more appropriate.
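The worst-case message distance on such a wrap-around (toroidal) grid is easy to tabulate: each axis contributes at most floor(size/2) hops. A small sketch comparing the candidate topologies:

```python
def max_hops(dims):
    """Worst-case hop count between two nodes on a wrap-around grid:
    each axis contributes at most floor(size / 2) hops."""
    return sum(d // 2 for d in dims)

for dims in [(27, 27), (17, 17), (9, 9, 9), (7, 7, 6)]:
    nodes = 1
    for d in dims:
        nodes *= d
    print(f"{'x'.join(map(str, dims)):>8} ({nodes} nodes): "
          f"worst case {max_hops(dims)} hops")
```

This shows the appeal of the 3D layouts: 9x9x9 halves the worst-case distance of the 27x27 sheet (12 hops versus 26) while connecting the same 729 nodes.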
Extending the previous idea, two additional network cards would be required on each node, giving
connections up and down respectively. This is possible using a motherboard with 6 PCI slots
available. Due to the local nature of the connections in this design it would
be possible to develop an alternative connection scheme using one or more PCI slots, potentially
with higher throughput. Such a 'beowulf' card, although
desirable in principle, would add significantly to the development costs and is thus omitted
from immediate consideration. Firewire cards are another alternative; however, their cost is
currently prohibitive.
Note: other beowulf architectures often use several network cards together
as a 'bonded channel' to increase the overall message throughput. This scheme is not possible
here as insufficient PCI slots would be available.
Another possibility to consider is the small-world phenomenon. Put simply, this states that a small
number of random interconnections in an otherwise locally connected network can greatly decrease
the average separation between any two nodes in the network. Random rewiring could improve the
performance of this machine on more general problems. Also, it is possible that the brain uses such
a scheme in order to transmit information faster (likely, even, considering nature's tendency to
discover such phenomena before man does).
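The effect is easy to demonstrate on a toy network. The sketch below builds a locally connected ring of 729 nodes, adds a handful of random long-range links, and measures the average shortest-path length with breadth-first search (the ring stands in for the grid purely for brevity; the sizes and the shortcut count are arbitrary illustrative choices):

```python
import random
from collections import deque

def avg_path_length(adj):
    """Mean shortest-path length over all node pairs, via BFS from each node."""
    total = pairs = 0
    for start in range(len(adj)):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring_lattice(n, k=2):
    """Ring of n nodes, each linked to its k nearest neighbours per side."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, k + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

random.seed(1)
local = ring_lattice(729)
rewired = ring_lattice(729)
for _ in range(20):                     # a few random long-range shortcuts
    a, b = random.sample(range(729), 2)
    rewired[a].add(b)
    rewired[b].add(a)

print(f"local only    : {avg_path_length(local):.1f} hops on average")
print(f"+ 20 shortcuts: {avg_path_length(rewired):.1f} hops on average")
```

Even 20 extra links (well under 1% of the roughly 1,500 local ones) cut the average separation several-fold, which is the small-world effect described above.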
Note: I would be eager to hear from those with a greater knowledge and experience of developing
beowulf systems with comments & criticisms of the above design.
The software is divided into three components:
The application-level software is considered in more detail in the associated document:
It is difficult to give an accurate estimate of the performance of a system before a prototype
has been built, and particularly before the software it is to run has been written.
The two main bottlenecks will be disk access and network data transfer. For this application
disk access will be particularly intense, since nowhere near enough RAM can be fitted in the system:
4Gb per machine is the typical upper limit at present.
Direct Memory Access controllers will play a large part in reducing the
cost of disk access, but extensive testing will be required to determine to what extent it can be minimised.
It may turn out that a special filesystem and driver need to be written in order to minimise
this cost. However, this should not be entered into lightly.
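The document's own figures show why disk access dominates. A rough sketch of the per-node arithmetic, using the 100,000 Gb state estimate and the 4Gb RAM ceiling from above:

```python
total_state_gb = 100_000   # one brain state, per the storage estimate
ram_per_node_gb = 4        # typical per-machine ceiling at present

for nodes in (729, 294):
    state_per_node_gb = total_state_gb / nodes
    ram_fraction = ram_per_node_gb / state_per_node_gb * 100
    print(f"{nodes} nodes: {state_per_node_gb:.0f} Gb of state per node, "
          f"only {ram_fraction:.1f}% of which fits in RAM")
```

With under 3% of each node's share of the state resident in memory, nearly every update must touch the disk, which is what makes the filesystem and DMA questions above worth the effort.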
The network data-transfer bottleneck needs to be approached with similar caution. The card's maximum data
transfer rate of 100Mb/s needs to be considered, along with how quickly data can be
read in from and written out to it. At present, even the volume of data to be transferred has
not been estimated.
TO BE COMPLETED
The estimated Relative Thought Rate for the 729 Model is 1/5000 (SlowDown 5000), giving 17 minutes of simulated time a week (ouch!).
The estimated Relative Thought Rate for the 294 Model is 1/15000 (SlowDown 15000).