Some guidelines on the network choice, from the VRANA and LoBoS experiences


In general, the recommended way to build PC clusters is to use switches
rather than a point-to-point (P-to-P) parallel architecture. It is true
that point-to-point can be very efficient and cheap for medium-sized
clusters, but there are some drawbacks that require special expertise
to handle. Let's name a few of them:

- if one node dies, many others (or all of them) are affected
  (it is possible to cheat here a little bit, like we do)

- to prevent a cable nightmare, one has to make the cables
  on the spot oneself.

- special programs have to be written to build routing tables for each
  specific architecture. It is not possible to connect more than 10
  machines with hand-written routing tables. One such program was
  written here for the hypercube, and it could be extended to other
  architectures fairly easily, but you must spend some time on the
  problem (a minimal sketch of the idea is shown after this list).
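
To give a feel for what such a program does, here is a minimal sketch
(in Python) of dimension-order routing on a hypercube. The node
numbering and the output format are illustrative assumptions, not the
actual VRANA-4 code:

    # Dimension-order ("flip the lowest differing bit") routing on a
    # 2**D-node hypercube.  Prints, for every source/destination pair,
    # which directly connected neighbour the packet is forwarded to.
    # Turning this into real routing-table entries depends on the local
    # IP numbering of the point-to-point links.

    D = 3                        # 2**D nodes; 64 boxes would be D = 6

    def next_hop(src, dst):
        diff = src ^ dst
        lowest = diff & -diff    # lowest dimension where src and dst differ
        return src ^ lowest      # neighbour of src along that dimension

    for src in range(2 ** D):
        print("node %d:" % src)
        for dst in range(2 ** D):
            if dst != src:
                print("  to %d via neighbour %d" % (dst, next_hop(src, dst)))

A path found this way is at most D hops long (one hop per differing
address bit), so 64 boxes never need more than 6 hops.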


>> - What brand/model of NIC?

The NICs we are using are about $10. They are RTL8139B based.

>> - If not, do you think a VRANA-4-type cluster *could* work with multi-port
>>   NICs?

It could, but I don't think you can get 6 ports for $60.

>> - What topology exactly are you using?  If you tested different ones, which
>>   one(s) did you find the best for what programs?

The reason we decided to connect the machines in a hypercube is that
to connect 64 machines through a switch, one needs 64 equivalent ports
in one switch. The `cheap' switches usually have just 24 ports, which
means one has to connect several switches to get that many ports. Then
there is a bottleneck in the system: the switches have to be connected
to each other. The best I have seen is a 1 Gbit connection between 2
switches, but that is about 5 times slower than what is needed for a
fully non-blocking connection across 3 switches. Any proper solution
costs as much as or more than the CPU boxes do.
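
As a rough back-of-the-envelope (the numbers below are only an
illustration; the exact factor depends on the duplex mode and on the
traffic pattern):

    # Worst case for one 1 Gbit link between two switches, assuming
    # 100 Mbit full-duplex node ports and 24 nodes whose traffic has
    # to cross that link (illustrative numbers only).
    port_mbit   = 2 * 100          # 100 Mbit in each direction
    nodes       = 24
    uplink_mbit = 1000
    needed = nodes * port_mbit     # 4800 Mbit/s offered in the worst case
    print(needed / uplink_mbit)    # ~4.8, i.e. the "about 5 times" above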

>> - I'll very soon have a bunch of P III 866MHz nodes here.  As far as I can
>>   tell, there are 5 free PCI slots left in each.  What topology would you
>>   build with such systems?

Motherboards with 6 PCI slots are quite popular now; that is why we
decided to build the hypercube. But CHARMM could also run quite
efficiently on a 2-D torus, meaning each node is connected to its
left, right, up and down neighbours, with the ends wrapped around.
That means 4 NICs. However, if one node dies the remaining nodes are
no longer all connected, i.e. the routing tables have to be modified.
It also becomes a problem to partition such a system; a hypercube is
much more flexible to partition. Say 3 jobs want to use the torus:
it is impossible to do this efficiently unless you reconnect the
cables.
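
A minimal sketch of the wrap-around numbering on such a torus (the
grid size and the row-by-row numbering are assumptions, only for
illustration):

    # 2-D torus: NX x NY boxes numbered row by row, each box wired to
    # its left/right/up/down neighbour, with the ends wrapped around.
    NX, NY = 8, 8                            # 64 boxes

    def neighbours(node):
        x, y = node % NX, node // NX
        return [(x + 1) % NX + y * NX,       # right
                (x - 1) % NX + y * NX,       # left
                x + ((y + 1) % NY) * NX,     # up
                x + ((y - 1) % NY) * NX]     # down

    print(neighbours(0))                     # [1, 7, 8, 56]: 4 NICs per box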

>> - Did you have to change CHARMM to get it to work well (or at all?)
>>   with this system?

CHARMM runs on almost any parallel architecture. For efficient use of
the 2-D torus I would have to write new communication routines, but
that is easy because one could use the ring topology routines (they
are already in CHARMM) in 2 dimensions.

>> - From the (preliminary) benchmarks on VRANA-4 on http://kihp6.cmm.ki.si/
>>   parallel/summary.html, I saw that up to 16 nodes, it scales quite
>>   well, but for 32 nodes, there seems to be a noticeable dropoff.
>>   Do you think you will be able to improve this?

Not really with this hardware. I am hoping to do something for PME,
and maybe then it could scale better. We are also using these machines
for QM/MM calculations; for an average system (30-50 QM atoms) one can
get a speedup of 20 on 32 processors.

>> - Anything else that might be important to observe when building a
>>   VRANA-4 type system?

In short, the guideline is this:

~ 5 boxes: Just connect everyone to everyone (hypernet). No routing
           tables are needed and no special software. With only 4
           boxes one can connect them in a ring, which is basically
           equivalent to any other parallel architecture: hypercube,
           torus, etc. The routing tables can be written by hand or
           generated with the hypercube program (see the small
           example after this list).

~ 10-20 boxes: buy a switch for $1000 or less

more than 20 boxes: Start thinking big and go the LoBoS way, or build
                    2 or more loosely connected clusters, where each
                    group of 20 is still fully connected. In any case
                    CHARMM will not scale beyond about 20 machines...
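
For the 4-box ring mentioned above, the whole routing table can be
written by hand; a tiny illustration (the box numbering and the choice
of gateway are arbitrary):

    # 4 boxes in a ring 0-1-2-3-0: each box reaches two boxes directly
    # and the box across the ring through either neighbour.
    N = 4
    for src in range(N):
        left, right = (src - 1) % N, (src + 1) % N
        across = (src + 2) % N
        print("box %d: %d and %d direct, %d via %d"
              % (src, left, right, across, right))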


The bottom line: Don't try this (a hypercube or other P-to-P network)
                 at home, unless you know what you are doing. But then
                 you don't need to read the above :-)