Linux Optimus Setup

I’ve finally obtained a laptop. It is an end-of-life Metabox Prime-S P950EP, based on the Clevo P950EP6. Its sole purpose is mobile development and demonstration of `Edict' (and other associated software) at some meetups around Melbourne; I will continue to use my desktop for almost all day-to-day work.

It includes what has become the standard combination of an Intel GPU, for simple desktop workloads, and an nVidia GPU, for more resource-intensive graphics operations.

Naturally (for me) I’m using Linux for the majority of my work. However, I rapidly encountered the constellation of platform quirks and driver friction that one hears about so often in the Linux community in relation to nVidia devices.

ACPI

If your device hangs with a black screen as your graphical environment is starting, and the GPU is disabled, you may need to work around some ACPI bugs.

My particular laptop appears to require that Windows 10 ACPI functionality be disabled. We can do this by adding an option to the kernel’s command line; in my case by appending this line to my default GRUB configuration:

/etc/default/grub
GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} acpi_osi='!Windows 2015'"

Other laptops require some combination of `Windows 2009', `Windows 2013' and `Windows 2015', or the negation of some or all of these. Unfortunately, the only way to discover the correct combination may be through brute force.
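
As a starting point for that brute force search, the sketch below lists the single-value candidates built from the three strings above, and checks whether a kernel command line carries any acpi_osi setting at all. The function names are hypothetical, and it only enumerates one string at a time; combinations of several would still need to be assembled by hand.

```shell
#!/bin/sh
# Print each OSI string from the text, both as-is and negated with '!',
# in the same quoting style used in /etc/default/grub above.
acpi_osi_candidates() {
    for osi in "Windows 2009" "Windows 2013" "Windows 2015"; do
        for prefix in '' '!'; do
            printf "acpi_osi='%s%s'\n" "$prefix" "$osi"
        done
    done
}

# Succeed if a kernel command line contains an acpi_osi= setting.
# Reads the running kernel's /proc/cmdline unless another file is given.
cmdline_has_acpi_osi() {
    grep 'acpi_osi=' "${1:-/proc/cmdline}" > /dev/null
}
```

After editing /etc/default/grub the configuration needs regenerating (on many distributions with grub-mkconfig -o /boot/grub/grub.cfg, though the output path is an assumption and varies). Following a reboot, cmdline_has_acpi_osi gives a quick check that the option actually reached the kernel.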

bumblebee/bbswitch

The most commonly cited mechanism for switching between GPUs on Linux is a combination of bumblebee and bbswitch.

Bumblebee provides a method to execute an application using a hidden X server (for exclusive use of the nVidia driver), and an environment that is modified to promote nVidia’s libGL.so (and friends) above the default Intel installation.

bbswitch is a kernel module that provides a robust mechanism to power down (and up) the nVidia GPU, and manage the loading and unloading of the nvidia kernel module.

However, given that bbswitch appears to trigger ACPI-related system hangs on newer systems, the better option appears to be avoiding it altogether. Instead we can rely on the kernel’s default PCIe power management facilities.

/etc/modprobe.d/blacklist-nvidia.conf
blacklist nvidia
blacklist nvidia-uvm
blacklist nvidia-drm
blacklist nvidia-modeset

The above prevents the nvidia module from being automatically loaded at boot but, unlike the module_blacklist kernel parameter, does not prevent it from being loaded manually.
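
To double-check that the blacklist entries are where we expect, something like the following sketch can help. The helper name is hypothetical, and it only looks for plain "blacklist" lines in one directory; it does not consult every location modprobe itself reads.

```shell
#!/bin/sh
# Succeed if MODULE has a "blacklist" line in any *.conf file under DIR
# (default /etc/modprobe.d). modprobe treats '-' and '_' in module names
# as equivalent, so both sides are normalised to '_' before comparing.
is_blacklisted() {
    module=$(printf '%s' "$1" | sed 'y/-/_/')
    dir=${2:-/etc/modprobe.d}
    cat "$dir"/*.conf 2>/dev/null | sed 'y/-/_/' |
        grep -E "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" > /dev/null
}
```

For example, is_blacklisted nvidia-drm && echo "blacklisted" should print the message once the file above is in place.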

/etc/bumblebee/bumblebee.conf
[driver-nvidia]
PMMethod=none
AlwaysUnloadKernelDriver=true

Bumblebee requires that the PMMethod directive be set to none so as to avoid the use of bbswitch. Power management is then left to the kernel.

The kernel will only power down the device when the driver is unloaded, so we also require AlwaysUnloadKernelDriver.

xinitrc.d

Alas, while the driver was not loaded at the point my greeter was displayed, it was loaded at some point while XFCE was starting up.

After evaluating some overkill solutions to answering the `who loaded the module' question via `systemtap', I instead used a technique more commonly used for blacklisting the module.

/etc/modprobe.d/blacklist-nvidia.conf
install nvidia /tmp/trace.sh

The install directive will execute the listed command instead of loading the module.

/tmp/trace.sh
#!/bin/sh
/usr/bin/pstree > /tmp/trace

Instead of actually loading the module we’ll dump a list of all running processes in the system. With a bit of luck one might see a likely candidate.
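
If the full process tree is too noisy, a variant of the trace script could also record just the chain of parents leading to the install command. The sketch below assumes a Linux /proc filesystem, and ancestry is a hypothetical helper name. As far as I know it is only informative when modprobe was spawned by a userspace program; a load requested by the kernel itself would likely show only kernel threads, which is where the pstree dump remains the better tool.

```shell
#!/bin/sh
# Print the ancestors of PID (default: this shell), one "pid command"
# pair per line, walking upwards until no further parent is visible.
ancestry() {
    pid=${1:-$$}
    while [ -r "/proc/$pid/stat" ]; do
        printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm")"
        # The parent PID is the fourth field of /proc/PID/stat. The
        # command name in field two may contain spaces, so drop
        # everything up to the last ')' before splitting.
        pid=$(sed 's/.*) //' "/proc/$pid/stat" | cut -d' ' -f2)
    done
}
```

Defined inside /tmp/trace.sh, a line such as ancestry >> /tmp/trace would append the chain beneath the pstree output.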

In our case the likely offender was nvidia-settings, which is a sufficiently unique name to just grep `/etc' and come out with a call to `/etc/X11/xinit/xinitrc.d/95-nvidia-settings'. It’s extraordinarily easy to accidentally trigger nVidia binaries/libraries into loading the kernel module, so anything related is a good candidate.

The 95-nvidia-settings script belongs to the nvidia-drivers package which we obviously can’t remove. But we can disable it by removing execute permissions from the script (and thus punt the problem back to our future selves when we next reinstall the driver and undo our changes).
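
Removing the execute permission is a one-line chmod; the sketch below wraps it in a hypothetical helper that defaults to the script path found above and refuses to act on a missing file.

```shell
#!/bin/sh
# Remove execute permission (for everyone) from an xinitrc.d script.
# Defaults to the nvidia-settings script located earlier in the text.
disable_xinit_script() {
    script=${1:-/etc/X11/xinit/xinitrc.d/95-nvidia-settings}
    [ -f "$script" ] || { echo "no such script: $script" >&2; return 1; }
    chmod a-x "$script"
}
```

This needs to be run as root, and again after any driver reinstall that restores the permissions.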

Power Management

Now that the driver is likely to be unloaded by default we can set the PCIe bus to automatically power down when idle.

echo "auto" > /sys/bus/pci/devices/0000:01:00.0/power/control

An easy method for this is to use something like powertop --auto-tune, or to automate it via a `Laptop Mode Tools' rule or the systemd `tmpfiles.d' facilities.
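
For the tmpfiles.d route, a rule of type `w' writes a value into an existing file at boot. The sketch below generates such a rule for a given PCI address; pci_power_rule is a hypothetical helper, and the address is the one from my machine, so yours may differ.

```shell
#!/bin/sh
# Print a tmpfiles.d rule that writes "auto" to the power/control
# attribute of the PCI device at the given address.
pci_power_rule() {
    printf 'w /sys/bus/pci/devices/%s/power/control - - - - auto\n' "$1"
}
```

Redirecting the output of pci_power_rule 0000:01:00.0 into a file under /etc/tmpfiles.d (for instance nvidia-power.conf, a name I have made up) should have the setting applied on each boot; systemd-tmpfiles --create applies it immediately.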

Confirmation

To verify we’ve got the correct behaviour after all this we reboot, log in and then check:

  • The GPU fan isn’t overly loud, and
  • lsmod | grep nvidia does not report any loaded modules, and
  • optirun glxinfo | grep NVIDIA reports the vendor is some variant of `NVIDIA'
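
The module and power checks above can be scripted roughly as follows. Both function names are hypothetical, and the default PCI address is the one from my machine. The runtime_status attribute is the kernel's own report of the device's runtime power state, and I would expect it to read `suspended' when the GPU is idle and the driver is unloaded.

```shell
#!/bin/sh
# Print the names of loaded modules beginning with "nvidia", one per
# line. Reads /proc/modules unless another file is given.
loaded_nvidia_modules() {
    grep -E '^nvidia' "${1:-/proc/modules}" | cut -d' ' -f1
}

# Print the runtime power state the kernel reports for a PCI device.
pci_runtime_status() {
    cat "/sys/bus/pci/devices/${1:-0000:01:00.0}/power/runtime_status"
}
```

An empty result from loaded_nvidia_modules corresponds to the lsmod check in the list.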

I hope this helps someone avoid a goodly number of painful days rebooting their system.