Visible number of cpu on OpenVZ and LXC

Franck Pachot - Aug 5 '21 - Dev Community

You may have found this blog post because you hit the same issue. Note that, for the moment, I have no solution. I hope to remove this first paragraph and add a solution soon ;)

I was trying to install YugabyteDB on Jelastic, which uses Virtuozzo/OpenVZ virtualization, but starting the yb-master failed with:

*** Check failure stack trace: ***
F0805 13:33:58.388178    28 locks.h:201] Check failed: cpu < n_cpus_ (21 vs. 6) 
    @     0x7f2760d809a1  yb::(anonymous namespace)::DumpStackTraceAndExit()
    @     0x7f276016200d  google::LogMessage::Fail()
    @     0x7f2760164536  google::LogMessage::SendToLog()
    @     0x7f2760161a6a  google::LogMessage::Flush()
    @     0x7f2760165159  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f27695992c8  yb::percpu_rwlock::get_lock()
    @     0x7f276958ebcd  yb::log::Log::Reserve()
    @     0x7f27695905bb  yb::log::Log::AsyncAppendReplicates()
    @     0x7f2769849e67  yb::consensus::LogCache::AppendOperations()
    @     0x7f276982b0d2  yb::consensus::PeerMessageQueue::AppendOperations()
    @     0x7f276985b8e5  yb::consensus::RaftConsensus::AppendNewRoundsToQueueUnlocked()
    @     0x7f276985a2b3  yb::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
    @     0x7f276985a7df  yb::consensus::RaftConsensus::BecomeLeaderUnlocked()
    @     0x7f276986fcc4  yb::consensus::RaftConsensus::DoElectionCallback()
    @     0x7f2760e1ec54  yb::ThreadPool::DispatchThread()
    @     0x7f2760e1b40f  yb::Thread::SuperviseThread()
    @     0x7f275c6de694  start_thread
    @     0x7f275be1b41d  __clone

In short, this means that, in this percpu_rwlock assertion, the current CPU number (21) is higher than the number of CPUs available (6). How is this possible?

Apparently, on OpenVZ, the reported number of CPUs (n_cpus_ here) is the number of virtual CPUs made visible by K8s, but the logical CPU number (cpu here) comes from the hypervisor and can be any of the VM's processors.
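To make the broken invariant concrete, here is a minimal sketch of a per-CPU structure, assuming it works like the one behind locks.h: an array sized from the visible CPU count and indexed with the current CPU number. It uses sysconf(_SC_NPROCESSORS_CONF) as a stand-in for n_cpus_, and percpu_init() and percpu_slot() are hypothetical names; this is not YugabyteDB's actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>  /* sysconf() */

/* Hypothetical per-CPU structure: one slot per configured CPU.
 * A real per-CPU rwlock would hold a lock in each slot; a counter
 * is enough here to show how the slot is chosen. */
static long  percpu_n;      /* plays the role of n_cpus_ */
static long *percpu_slots;  /* indexed by the current CPU number */

/* Size the array from the configured CPU count (stand-in for n_cpus_). */
static int percpu_init(void) {
    percpu_n = sysconf(_SC_NPROCESSORS_CONF);
    if (percpu_n < 1)
        return -1;
    percpu_slots = calloc((size_t)percpu_n, sizeof *percpu_slots);
    return percpu_slots ? 0 : -1;
}

/* Return the slot for `cpu`, or NULL in the situation where the
 * real CHECK(cpu < n_cpus_) aborts the process instead. */
static long *percpu_slot(int cpu) {
    if (cpu < 0 || cpu >= percpu_n) {
        fprintf(stderr, "cpu < n_cpus_ violated (%d vs. %ld)\n", cpu, percpu_n);
        return NULL;
    }
    return &percpu_slots[cpu];
}
```

With the values from the stack trace, percpu_slot(21) on a 6-slot array takes the error branch, which is the point where the real check aborts.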

To check this, I wrote this little C program that prints both values:

  • the logical CPU number, from sched_getcpu()
  • the visible number of CPUs (the same as shown by lscpu), from sysconf(_SC_NPROCESSORS_CONF)

cat > sched_getcpu.c <<'CAT'
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <stdio.h>
#include <unistd.h>  /* sysconf() */
int main(void) {
    /* CPU this process runs on vs. number of configured CPUs (sysconf returns long) */
    printf("sched_getcpu = %3d _SC_NPROCESSORS_CONF = %3ld\n", sched_getcpu(), sysconf(_SC_NPROCESSORS_CONF));
    return 0;
}
CAT
type gcc || yum install -y gcc
gcc sched_getcpu.c && for i in {1..10} ; do ./a.out ; done

I'm running this in a container on Jelastic, with OpenVZ virtualization:

[root@yb-tserver-0 ~]# yum install -y virt-what
Package virt-what-1.18-4.el7.x86_64 already installed and latest version
Nothing to do
[root@yb-tserver-0 ~]# virt-what
openvz
lxc

And the result is:

[root@yb-tserver-0 ~]# gcc sched_getcpu.c && for i in {1..10} ; do ./a.out ; done
sched_getcpu =  31 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  15 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  17 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  17 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  19 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  21 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  19 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  19 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  21 _SC_NPROCESSORS_CONF =   6
sched_getcpu =  19 _SC_NPROCESSORS_CONF =   6
[root@yb-tserver-0 ~]#

Bad luck: the number of the processor I'm running on is always larger than the number of CPUs (which excludes the CPUs taken offline by K8s). One value comes from the host, the other from the Kubernetes limit. Programs that want to manage processor affinity, like YugabyteDB (OpenJDK and Puppet had the same problem), need to find a workaround.
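A program could at least detect this mismatch before building per-CPU structures. Here is a minimal sketch, assuming Linux: it reads the current CPU with the getcpu system call (the number sched_getcpu() reports) and compares the highest sample with sysconf(_SC_NPROCESSORS_CONF). The helper names are hypothetical. A single process tends to stay on one CPU, so, like the 10-run loop above, this is a heuristic rather than a proof.

```c
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* sysconf(), syscall() */

/* Current CPU number via the getcpu system call, or -1 on error. */
static int current_cpu(void) {
    unsigned cpu;
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return -1;
    return (int)cpu;
}

/* Highest CPU number in `samples`, or -1 when n == 0. */
static int max_cpu(const int *samples, int n) {
    int max = -1;
    for (int i = 0; i < n; i++)
        if (samples[i] > max)
            max = samples[i];
    return max;
}

/* 1 if every sampled CPU number can index an array of n_cpus slots. */
static int numbering_consistent(const int *samples, int n, long n_cpus) {
    return max_cpu(samples, n) < n_cpus;
}

/* Sample the current CPU up to 64 times and compare with the
 * configured CPU count; prints a warning when they disagree. */
static int check_this_host(int n) {
    int samples[64];
    if (n > 64)
        n = 64;
    for (int i = 0; i < n; i++)
        samples[i] = current_cpu();
    long conf = sysconf(_SC_NPROCESSORS_CONF);
    int ok = numbering_consistent(samples, n, conf);
    if (!ok)
        fprintf(stderr, "CPU %d seen, but only %ld CPUs configured\n",
                max_cpu(samples, n), conf);
    return ok;
}
```

Fed with the Jelastic output shown above (maximum 31 against 6 configured CPUs), numbering_consistent() returns 0; fed with the KVM output shown further down (maximum 7 against 8), it returns 1.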

So, should the container show only the online vCPUs? Probably, but at least the count needs to be consistent with the processor numbers. Here is an example from an Oracle Cloud free-tier VM running 1/8 OCPU, where the OS sees 2 CPUs:

[opc@a ~]$ lscpu | grep -E "^Model|^CPU|^Hyper"

CPU op-mode(s):        32-bit, 64-bit
CPU(s):                2
CPU family:            23
Model:                 1
Model name:            AMD EPYC 7551 32-Core Processor
CPU MHz:               1996.250
Hypervisor vendor:     KVM

This is 2 vCPUs on a 64-thread processor.

The virtualization is KVM:

[root@yb-tserver-0 cores]# virt-what
lxc
kvm

And my little program shows consistent sched_getcpu() numbers, all below _SC_NPROCESSORS_CONF:

[root@yb-tserver-0 cores]# for i in {1..10} ; do ./a.out ; done
sched_getcpu =   6 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   1 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   4 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   3 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   5 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   7 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   1 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   4 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   6 _SC_NPROCESSORS_CONF =   8
sched_getcpu =   2 _SC_NPROCESSORS_CONF =   8

And showing only the visible CPUs is a feature: it was considered a bug when LXC displayed all the host CPUs in /sys.

Let's look at where the CPU count comes from in the YugabyteDB master server, which reads /sys/devices/system/cpu/present:

root@node88695-yb-demo ~ $ cat /sys/devices/system/cpu/present
0-5
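The highest usable CPU index is the last number of that list. As a minimal sketch (parse_cpulist() and read_cpulist() are hypothetical helpers, not the libgutil code YugabyteDB uses), parsing the kernel's CPU-list format, such as "0-5" or "0-3,8-11", looks like this:

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel CPU list such as "0-5" or "0-3,8-11\n".
 * Stores the number of CPUs and the highest CPU number;
 * returns 0 on success, -1 on malformed input. */
static int parse_cpulist(const char *s, long *count, long *highest) {
    char *end;
    *count = 0;
    *highest = -1;
    while (*s && *s != '\n') {
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;
        long hi = lo;               /* single CPU unless a range follows */
        s = end;
        if (*s == '-') {
            const char *p = s + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            s = end;
        }
        *count += hi - lo + 1;
        if (hi > *highest)
            *highest = hi;
        if (*s == ',')
            s++;                    /* next range */
        else if (*s && *s != '\n')
            return -1;
    }
    return 0;
}

/* Read the first line of a CPU-list file, e.g. /sys/devices/system/cpu/present. */
static int read_cpulist(const char *path, long *count, long *highest) {
    char buf[4096];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? parse_cpulist(buf, count, highest) : -1;
}
```

For "0-5" it reports 6 CPUs with highest number 5, so a sched_getcpu() value of 21 cannot index a per-CPU array sized from this file. For a list like "0-64" it reports 65 CPUs, and any CPU number up to 64 fits.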

This looks like correct behavior, but it is not consistent with the numbers returned by sched_getcpu() under some hypervisors. We probably need to remove this assert (as is already done for OSX).

My dirty hack is to add the following before the command (exec /home/yugabyte/bin/yb-master in my case) to hijack sched_getcpu() with LD_PRELOAD, so that it always returns the last processor number, _SC_NPROCESSORS_CONF - 1:

yum install -y gcc && echo -e "#define _GNU_SOURCE\n#include <unistd.h>\nint sched_getcpu (void) { return sysconf(_SC_NPROCESSORS_CONF)-1 ; };\n" > sched_getcpu.c && gcc -shared -o /tmp/sched_getcpu.so -fPIC sched_getcpu.c ; export LD_PRELOAD=/tmp/sched_getcpu.so && 
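The one-liner always reports the same CPU, which avoids the crash but presumably sends every thread to the same per-CPU slot. A possible variant, sketched here as an untested assumption rather than a fix from the YugabyteDB project, keeps the real CPU number when it is in range and clamps it to the last configured CPU only when it is not. It reads the real number with the getcpu system call, because calling sched_getcpu() from inside the replacement would call itself.

```c
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* sysconf(), syscall() */

/* LD_PRELOAD replacement for glibc's sched_getcpu().
 * Returns the real CPU number when it is below the configured CPU
 * count, and the last configured CPU otherwise. */
int sched_getcpu(void) {
    unsigned cpu = 0;
    long n = sysconf(_SC_NPROCESSORS_CONF);
    if (n < 1)
        n = 1;
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return (int)(n - 1);   /* same answer as the one-liner */
    return cpu < (unsigned long)n ? (int)cpu : (int)(n - 1);
}
```

It would be built and preloaded the same way: gcc -shared -fPIC -o /tmp/sched_getcpu.so sched_getcpu.c, then export LD_PRELOAD=/tmp/sched_getcpu.so before the exec.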

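There's also the alternative of allowing a larger range of CPUs by making libgutil.so read a custom file, /tmp_devices_system_cpu_present, instead of /sys/devices/system/cpu/present (both paths are 31 characters long, so the string can be patched in place in the binary library):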

echo 0-64 > /tmp_devices_system_cpu_present ; sed -e 's@/sys/devices/system/cpu/present@/tmp_devices_system_cpu_present@g' -i /home/yugabyte/lib/yb/libgutil.so
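The sed trick relies on the same-length constraint: the path is overwritten in place, and a longer or shorter string would shift every following byte of the library. A minimal sketch of such a binary patch, with a hypothetical helper name, refuses any replacement that would change the size:

```c
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of `from` with `to` inside a binary buffer.
 * Returns -1 unless both strings have the same (non-zero) length,
 * otherwise the number of replacements made. */
static long patch_same_length(unsigned char *buf, size_t len,
                              const char *from, const char *to) {
    size_t n = strlen(from);
    long hits = 0;
    if (n == 0 || strlen(to) != n)
        return -1;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, from, n) == 0) {
            memcpy(buf + i, to, n);
            hits++;
            i += n - 1;   /* resume right after the replaced bytes */
        }
    }
    return hits;
}
```

Patching "/sys/devices/system/cpu/present" to "/tmp_devices_system_cpu_present" succeeds because both are 31 bytes, while a shorter target such as "/tmp/present" is rejected.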

I patch the existing yb-master and yb-tserver StatefulSets this way to force sched_getcpu() to always return the highest configured CPU number:

for i in yb-master yb-tserver ; do kubectl get statefulsets $i -n yb-demo -o yaml | awk '/^ *exec [/]home[/]yugabyte[/]bin[/]yb-/{sub(/exec/,patch" exec")}{print}' patch='yum install -y gcc ; echo -e "#define _GNU_SOURCE\\n#include <unistd.h>\\nint sched_getcpu (void) { return sysconf(_SC_NPROCESSORS_CONF)-1 ; };\\n" > sched_getcpu.c ; gcc -shared -o /tmp/sched_getcpu.so -fPIC sched_getcpu.c ; export LD_PRELOAD=/tmp/sched_getcpu.so ; ' | kubectl apply -f /dev/stdin -n yb-demo ; done

or to hardcode a 0-64 range covering the host CPUs:

for i in yb-master yb-tserver ; do kubectl get statefulsets $i -n yb-demo -o yaml | tee $i.b.yaml | awk '/^ *exec [/]home[/]yugabyte[/]bin[/]yb-/{sub(/exec/,patch" exec")}{print}' patch=' echo 0-64 > /tmp_devices_system_cpu_present ; sed -e "s@/sys/devices/system/cpu/present@/tmp_devices_system_cpu_present@g" -i /home/yugabyte/lib/yb/libgutil.so ; ' | tee $i.e.yaml | kubectl apply -f /dev/stdin -n yb-demo ; done

But be aware that this is completely unsupported. Check https://github.com/yugabyte/yugabyte-db/issues/9619 for a solution.
