Whoa – weird load ave and cpu freq reports from openSUSE 10.3
Just built a cluster of 25 Dell systems for our developers. These are Dell 1435SC systems, each with a pair of Dual-Core AMD Opteron 2218 processors and 8 GB RAM. We installed openSUSE 10.3 on them all and added the Ganglia gmond daemon along with OFED, the successor to the OpenIB InfiniBand stack.
Handed them off to our developers to certify and they came right back asking weird questions – like:
- Why is the load always at least 1.00?
- Are these *really* 1000 MHz CPUs?
- How come the screen background color is always dark puce?
(Just kidding about that last one – Prasad!)
Sure enough. Uptime shows load on all 25 systems is always at least 1.00. Usually right there. And the cpu MHz in /proc/cpuinfo is almost always 1000. I saw it at 2600 for all four cores on one machine and the next time I looked, it had dropped to 1000 on all four cores.
Here’s part of the output of “cat /proc/cpuinfo” for the first proc, number 0:
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2218
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2001.35
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
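Rather than reading the full dump on every node, the clock of each core can be pulled out on its own. The following is a minimal sketch, not something from the original investigation: the function name cpu_mhz is hypothetical, and it assumes the usual "processor" and "cpu MHz" fields of /proc/cpuinfo. A reading that drops from 2600 to 1000 on an idle machine would be consistent with CPU frequency scaling, though that is an assumption not confirmed here.

```shell
# cpu_mhz: read /proc/cpuinfo-style text on stdin and print one
# "cpuN MHz" line per logical processor.
cpu_mhz() {
    # Split each line on the colon, swallowing the whitespace around it,
    # so that "cpu MHz" stays a single field name.
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor" { cpu = $2 }
        $1 == "cpu MHz"   { printf "cpu%s %s\n", cpu, $2 }
    '
}

# Usage on a live system:
#   cpu_mhz < /proc/cpuinfo
```

Running it a few times in a row on one node should show whether all cores move between the two speeds together, as observed above.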
Here are the load aberrations reported by uptime:
for num in $(seq 50 74)
do
  echo -n supib$num
  ssh supib$num uptime
done 2>&1 | grep 'average'
supib50 12:03pm up 6 days 20:35, 3 users, load average: 2.18, 3.93, 3.99
supib51 12:03pm up 1 day 2:16, 0 users, load average: 1.99, 3.80, 3.82
supib52 12:03pm up 5 days 22:51, 0 users, load average: 1.00, 1.01, 1.31
supib53 12:03pm up 5 days 22:51, 0 users, load average: 1.00, 1.01, 1.31
supib54 12:03pm up 6 days 1:12, 0 users, load average: 1.00, 1.01, 1.32
supib55 12:03pm up 5 days 22:32, 0 users, load average: 1.00, 1.01, 1.31
supib56 12:03pm up 5 days 22:35, 0 users, load average: 1.00, 1.02, 1.32
supib57 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.01, 1.29
supib58 12:03pm up 6 days 0:54, 0 users, load average: 1.08, 1.02, 1.01
supib59 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib60 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib61 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib62 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib63 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib64 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib65 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.02, 1.00
supib66 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.01
supib67 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
supib68 12:03pm up 6 days 0:54, 0 users, load average: 2.00, 2.00, 2.00
supib69 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.01
supib70 not reachable through the network
supib71 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.02
supib72 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.01
supib73 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.01
supib74 12:03pm up 6 days 0:54, 0 users, load average: 1.00, 1.00, 1.00
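With 25 nodes, the 1.00 floor is easier to spot when the listing is cut down to a host name and its 1-minute load. This is a sketch under assumptions: load_summary is a hypothetical helper, and it expects lines shaped like the ones above, with the host name first and the three load figures last.

```shell
# load_summary: read "host <uptime output>" lines on stdin and print
# "host load1" for every line that carries a load average.
# Lines without one (for example an unreachable host) are skipped.
load_summary() {
    awk '/load average:/ {
        sub(/,$/, "", $(NF-2))   # drop the comma after the 1-minute value
        print $1, $(NF-2)
    }'
}

# Usage: pipe the output of the ssh loop into it, e.g.
#   ... | load_summary | sort -k2 -n
```

Sorting on the second column puts the lowest loads first, which makes it obvious whether any node ever sits below 1.00.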
OK, so this is weird. Off to dig.
OK, turns out that the OFED packages that ship stock with openSUSE 10.3, which I’ll list here:
ofed-1.2.5-18.x86_64.rpm
ofed-doc-1.2.5-18.x86_64.rpm
ofed-kmp-default-1.2.5_2.6.22.5_31-18.x86_64.rpm
… install a module named ib_mthca, the driver for Mellanox InfiniBand host channel adapters.
Once that module is loaded into the kernel, the load average steadily rises to the floor of about 1.00 that I reported earlier.
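To line the two observations up on a given node, the module state can be printed next to the current load. This is a rough sketch rather than the procedure actually used: module_loaded and report_node are hypothetical names, and the code assumes the standard /proc/modules and /proc/loadavg files.

```shell
# module_loaded NAME: succeed if NAME is the first field of any line of
# /proc/modules-style text read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# report_node: print whether ib_mthca is loaded alongside the 1-minute
# load average (the first field of /proc/loadavg).
report_node() {
    if module_loaded ib_mthca < /proc/modules; then
        state=loaded
    else
        state=absent
    fi
    echo "ib_mthca=$state load1=$(cut -d' ' -f1 /proc/loadavg)"
}

# Usage on a node:
#   report_node
```

Comparing the output on an idle node before and after the module is loaded should show whether the load climbs as described.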
Odd – moving to report to the openib folks.
New bug …
https://bugs.openfabrics.org/show_bug.cgi?id=866