VASP benchmark on dual-core dual-opteron cluster

Hi,
I think this benchmark might be interesting to some VASP users, so I am posting the results here. We have managed to compile VASP 4.6.26 with PGI 6.0.8/MPICH 1.2.6 and the GOTO BLAS library on our Opteron 275 cluster with a gigabit ethernet switch.

The purpose of this benchmark is to find out how to improve the parallel performance of VASP on this cluster, and to find the most effective way to run parallel VASP jobs on a dual-core dual-Opteron cluster.

At the moment each node has only 4 GB of memory and 4 cores. The test case is a system of 8 TiO2 unit cells. The benchmark was run on a separate 4-node cluster. We are thinking of upgrading to an InfiniBand switch and increasing the memory to 8 GB per node.
Regards
Jyh-Shyong Ho, Ph.D.
Research Scientist
National Center for High Performance Computing
Hsinchu, Taiwan, ROC
8core2node
Total CPU time used (sec): 904.177
User time (sec): 801.994
System time (sec): 102.182
Elapsed time (sec): 1384.815
Minor page faults: 5446219
Major page faults: 0
Voluntary context switches: 900481
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
4core2node
Total CPU time used (sec): 1717.815
User time (sec): 1557.233
System time (sec): 160.582
Elapsed time (sec): 2019.904
Minor page faults: 10251676
Major page faults: 0
Voluntary context switches: 1875775
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
2core2node
Total CPU time used (sec): 3168.858
User time (sec): 2974.274
System time (sec): 194.584
Elapsed time (sec): 3608.532
Minor page faults: 19797483
Major page faults: 0
Voluntary context switches: 2477002
free energy TOTEN = -210.161120 eV
Iteration 1( 14)
4core4node
Total CPU time used (sec): 1671.648
User time (sec): 1530.916
System time (sec): 140.733
Elapsed time (sec): 1859.653
Minor page faults: 10251508
Major page faults: 0
Voluntary context switches: 2015003
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
8core4node
Total CPU time used (sec): 791.257
User time (sec): 703.924
System time (sec): 87.333
Elapsed time (sec): 1095.928
Minor page faults: 5446097
Major page faults: 0
Voluntary context switches: 699000
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
16core4node
Total CPU time used (sec): 518.280
User time (sec): 382.768
System time (sec): 135.512
Elapsed time (sec): 1483.999
Minor page faults: 2777032
Major page faults: 0
Voluntary context switches: 1526597
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
1core1node
Total CPU time used (sec): 5399.341
User time (sec): 5237.087
System time (sec): 162.254
Elapsed time (sec): 5435.051
Minor page faults: 32855418
Major page faults: 0
Voluntary context switches: 3
free energy TOTEN = -210.159666 eV
Iteration 1( 14)
2core1node
Total CPU time used (sec): 3318.183
User time (sec): 3117.879
System time (sec): 200.305
Elapsed time (sec): 3442.154
Minor page faults: 19797485
Major page faults: 0
Voluntary context switches: 639192
free energy TOTEN = -210.161120 eV
Iteration 1( 14)
4core1node
Total CPU time used (sec): 1746.881
User time (sec): 1584.151
System time (sec): 162.730
Elapsed time (sec): 2071.770
Minor page faults: 10251756
Major page faults: 0
Voluntary context switches: 1759164
free energy TOTEN = -210.160380 eV
Iteration 1( 14)
tjf (Full Member, Leiden, Netherlands) replied:
I'm lazy. Can you rephrase those elapsed times in terms of speedups? ;-)
c00jsh00 replied:
There is no simple answer for the speedup on a dual-core dual-Opteron cluster, which is why I listed all of the timing data. The difference between the total CPU time and the elapsed time includes the time spent on I/O, on memory contention within a node, and on cross-node communication. The efficiency of the NFS file system for reading and writing the output files is also crucial for VASP performance on a cluster.

For the current configuration of our cluster, the elapsed times are:

Using one node:
5436 s (1core1node), 3442 s (2core1node), 2072 s (4core1node)
Using two nodes:
3609 s (2core2node), 2020 s (4core2node), 1384 s (8core2node)
Using four nodes:
1859 s (4core4node), 1095 s (8core4node), 1484 s (16core4node)
Using two cores per node:
3442 s (2core1node), 2020 s (4core2node), 1095 s (8core4node)
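If you prefer speedups, here is a minimal sketch (Python) that derives them, together with the parallel efficiency, from the elapsed times listed in the first post, taking the 1core1node run as the baseline:

# Speedup and parallel efficiency from the elapsed times (seconds) above.
# Baseline is the 1core1node run; efficiency = speedup / number of cores.
elapsed = {
    "1core1node": 5435.1, "2core1node": 3442.2, "4core1node": 2071.8,
    "2core2node": 3608.5, "4core2node": 2019.9, "8core2node": 1384.8,
    "4core4node": 1859.7, "8core4node": 1095.9, "16core4node": 1484.0,
}
baseline = elapsed["1core1node"]
for label, t in elapsed.items():
    cores = int(label.split("core")[0])   # e.g. "8core2node" -> 8 cores
    speedup = baseline / t
    efficiency = speedup / cores
    print(f"{label:>12s}: speedup {speedup:4.2f}x, efficiency {efficiency:4.0%}")

For example, 8core4node comes out at about 5.0x on 8 cores, while 16core4node drops back to about 3.7x on 16 cores because of the gigabit interconnect.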
c00jsh00 followed up:
Hi,
The bad performance of VASP on our dual-core dual-Opteron cluster was due to the slow gigabit ethernet interconnect. We set up a test environment of 4 nodes with an InfiniBand network, and here are the results for the same test case:

InfiniBand:
              16core4node    8core4node    4core4node
Total CPU       637.20 s      992.61 s     1306.25 s
User CPU        636.46 s      991.79 s     1305.46 s
Elapsed         685.40 s     1060.56 s     1396.31 s

The elapsed times are very close to the total CPU times, which means the CPUs are busy computing most of the time. Compare this with the results using gigabit ethernet:

Gigabit ethernet:
              16core4node    8core4node    4core4node
Total CPU      5037.52 s     5064.03 s     2811.70 s
User CPU       1200.56 s     1678.06 s     2653.25 s
Elapsed        5535.34 s     5399.91 s     2844.00 s

When the computing power of the nodes is much higher than the speed of the interconnect, you get very bad parallel performance.

The lesson for running parallel VASP: on clusters with single-core/single-CPU, single-core/dual-CPU, or dual-core/single-CPU nodes, a gigabit ethernet interconnect is OK; on a dual-core/dual-CPU cluster, an InfiniBand interconnect is necessary.
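A quick way to quantify this from the numbers above is the fraction of the wall-clock time each run spends in user-level computation (user CPU time divided by elapsed time). Treating user time as a proxy for useful work is a simplification, but it makes the difference between the two interconnects clear:

# Utilisation estimate from the tables above:
# user CPU time / elapsed time, with user time taken as a proxy for useful compute.
runs = {
    #  (interconnect, configuration): (user_cpu_s, elapsed_s)
    ("InfiniBand", "16core4node"): (636.46, 685.40),
    ("InfiniBand", "8core4node"):  (991.79, 1060.56),
    ("InfiniBand", "4core4node"):  (1305.46, 1396.31),
    ("gigabit",    "16core4node"): (1200.56, 5535.34),
    ("gigabit",    "8core4node"):  (1678.06, 5399.91),
    ("gigabit",    "4core4node"):  (2653.25, 2844.00),
}
for (net, config), (user, wall) in runs.items():
    print(f"{net:>10s} {config:>12s}: {user / wall:4.0%} of the wall time spent computing")

With InfiniBand all three runs stay above 90% of the wall time computing; on gigabit ethernet only the 4core4node run (one MPI process per node) does, and the 16core4node run drops to roughly 22%.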