A while ago I decided to assess Linux on tablets by acquiring a Microsoft Surface Go 2 and installing Arch-based Manjaro, replacing the preinstalled Windows 10 entirely. While the tablet experience itself is - for the most part - surprisingly smooth, one issue really took a toll on me and impeded the mobile experience drastically: the physical power button stopped working right after the first hibernate (suspend to disk) cycle. Without a power button, the typical mobile experience degrades: one can’t put the device to sleep or turn off the screen with the press of a button. Suspending from the menu or attaching the keyboard (called “Type Cover”) are disappointing workarounds. Not using hibernate wasn’t an option for me either due to the battery drain during software suspend (or “suspend to RAM”). The issue was reproducible on all kernels I tried, both vanilla and linux-surface ones. I encountered and commented on an issue in the linux-surface GitHub repository describing this very problem, but after another year passed without any activity there, I decided to look into it myself.

Here I’m going to document meticulously how I investigated and finally rectified the issue, albeit with just another workaround, though this one comes without any noticeable drawbacks.

Investigation

The very first thing to do when encountering such an issue is to ensure it can be reliably reproduced. Here, the power button was always working as expected (triggering whatever was configured in the “Power Settings”, e.g. software suspend) after a normal “cold” boot. After executing systemctl hibernate to suspend to disk and then resuming operation by powering the system back on, all power button presses were seemingly ignored by the system. Further suspend or hibernate cycles didn’t help; a full reboot was required to get it back to a working state.

A good next step is to check logs for potentially relevant messages. The aforementioned GitHub issue mentioned an ACPI Error in dmesg output to look out for, which I could indeed verify on my own system: The very first time the power button is pressed after hibernation, the log outputs something like

ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20210730/evevent-255)

Afterwards there’s just silence. Since the open ACPI standard (Advanced Configuration and Power Interface) is typically used for everything related to power management on current operating systems, I wasn’t surprised to see an ACPI error here. Since a lot of things happening in the ACPI subsystem - especially pressing a power button - are event-based, another strategy to quickly gather more data is to launch the acpid daemon followed by acpi_listen. This will show a live view of ACPI events as they occur on the system. In my case, pressing the power button on a freshly (re-)booted system produced

button/power PBTN 00000080 00000000 K

However, after the first hibernate cycle, pressing the power button no longer produced any ACPI events. That confirmed the issue, but sadly didn’t provide further insights.

Looking up ACPI Error: No installed handler for fixed event on the search engine of my choice yielded only a few results. Luckily, in one of them - a closed bug report on bugzilla.kernel.org - a commenter tried to explain the meaning of that error message:

this is got from the FADT table of your laptop"
           Control Method Power Button (V1) : 1
           Control Method Sleep Button (V1) : 1
this means that there is control method power button, and power button events are routed via GPE rather than fixed ACPI event, thus we do not register an handler for the fixed power button event(0x00000002).

[125352.494915] ACPI Error: No installed handler for fixed event [0x00000002]
(20110112/evevent-272)
this error message means that there is an power button fixed event generated, although the power button interrupt should be GPE.

The ACPI Specification

Well, deciphering these words without any further knowledge of ACPI itself wasn’t going anywhere. For me, the most useful introduction to the concepts of ACPI after reading the Wikipedia article was the rather lengthy ACPI Specification at uefi.org, specifically chapter 4, “ACPI Hardware Specification” and in there section 4.7, “ACPI Hardware Features” and 4.8, “ACPI Register Model”. What follows is my interpretation of these resources, which is most likely not accurate at all. For a credible take on these topics, don’t quote me and read the specification yourself.

My personal summary of the relevant bits: ACPI itself is a specification/open standard for power management features. One of the primary design goals was to keep this specification independent from operating systems and hardware platforms, which has the (sometimes just theoretical) benefit that an ACPI-compliant operating system should be able to utilize power management features of ACPI-compliant hardware. On the hardware side, ACPI requires - among lots of other stuff - the presence of various ACPI Registers, an ACPI BIOS and ACPI Tables. The ACPI tables are a chunk of data placed into memory by the BIOS/UEFI at an expected address prior to the actual operating system kernel taking control. They contain lots of metadata to describe power management features of the particular hardware either in the form of structured static data (think of a struct in C) or executable platform-independent AML (ACPI Machine Language) code. An ACPI-compatible operating system has to parse that structure and execute the AML code in there to figure out where in memory the ACPI registers are located, what the supported suspend modes are or which power-saving states are available on the CPU and other devices. For that, the Linux kernel relies on the ACPI Component Architecture (ACPICA), an open source platform-independent reference implementation of the ACPI specification maintained by Intel. In the kernel sources, it can be found in drivers/acpi/acpica/.

Lots of ACPI functionality, especially power button handling as relevant for this article, is based on event processing. According to my probably oversimplified understanding, pushing the power button causes the firmware to set the “PM1 Event Registers” and trigger an interrupt. The corresponding interrupt handler as set up by the OS then identifies that the interrupt was caused by ACPI and subsequently hands over execution to the ACPI subsystem of the kernel and finally the ACPICA. Later on, the event will be reported to userspace so that applications such as the GNOME Power Manager get notified and can react accordingly (e.g. suspend the system). In the specification, the PM1 Event Registers are a set of registers bundled into multiple Register Blocks, which then form Register Groupings. For the purpose of this article, we can ignore that and just remember that the register data reported by the kernel is in reality a consolidated view of multiple individual registers. The next thing to know is that in ACPI there are generally two types of events: Fixed Events and General Purpose Events (GPE). Fixed events are certain events mandated by the specification, such as the power button, sleep button and various timers. GPEs on the other hand seem to be an extension point allowing manufacturers to define additional hardware-specific events using a set of ACPI General Purpose Registers.

For reasons beyond my understanding, the power button (subsection 4.8.3.1.1 in the specs) can either cause fixed events (called Fixed Power Button) or general purpose events (called Control Method Power Button). I didn’t manage to figure out the benefit of using one over the other, but that’s something the manufacturer has to decide anyway. This brings us to the ACPI Tables: They are the manufacturer’s (or firmware’s) description of the ACPI hardware present on a system. We learn from the specification that the type of power button “is indicated by the PWR_BUTTON flag in the FADT”. Now, as the term implies there is not a single ACPI table but quite a lot of them, and they all use 4-letter acronyms: BERT, DRTM, ECDT, EINJ, ERST and so on (isn’t the list of tables in the Linux kernel documentation a joy to look at?). To make matters more interesting, the signatures of the actual tables in the ACPI firmware differ in a few cases from their actual name: For example, the FADT - or Fixed ACPI Description Table - we are looking for actually uses the signature FACP. While probably obvious to anyone familiar with this, I spent quite a lot of time wondering why the firmware of my Surface tablet seemed to be missing a table as crucial as the FADT.

For now, this basic knowledge gained from the ACPI specification should suffice to understand the error message we received from the kernel upon pressing the power button after a hibernation a bit better: ACPI Error: No installed handler for fixed event indicates that the kernel received a fixed event it didn’t prepare a handler for, probably because the ACPI tables didn’t say that this system uses a Fixed Power Button. The curious comment on the closed bug report confirms that:

from the FADT table of your laptop
Control Method Power Button (V1) : 1
[...] 
this means that there is control method power button, and power button events are routed via GPE rather than fixed ACPI event, thus we do not register an handler 

Debugging ACPI events

Looking at the kernel log (e.g. via dmesg) on the Surface tablet, we seem to be in the same boat as the original bug reporter:

[    0.697597] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[    0.697620] ACPI: button: Power Button [PWRB]

The presence of a device called PNP0C0C is - according to the specification - an indicator that my tablet uses a Control Method Power Button and thus should emit general purpose events when the power button is pressed. However, right after the first hibernation cycle the kernel suddenly complains about receiving a fixed event. I decided to dig deeper by looking at the raw event data.

When the kernel receives an ACPI-related event, it fetches the contents of the ACPI registers to figure out what exactly had happened. Pressing the power button is signalled to the OS via the PM1 Event Registers, of which there are two kinds: PM1 Status Registers and PM1 Enable Registers. Since neither the logs I saw in dmesg nor the output of acpi_listen contains such raw data, I had a look at the kernel (5.x) sources surrounding my ACPI Error for debugging clues.

The function that logs our ACPI Error is a dispatcher called acpi_ev_fixed_event_dispatch. It lives in drivers/acpi/acpica/evevent.c and fails because it has no reference to an event handler for a supposedly fixed event:

if (!acpi_gbl_fixed_event_handlers[event].handler) {
	...
	ACPI_ERROR((AE_INFO, "No installed handler for fixed event - %s (%u), disabling", acpi_ut_get_event_name(event), event));
	return (ACPI_INTERRUPT_NOT_HANDLED);
	...
}

Since the Surface tablet uses a “Control Method Power Button” instead of fixed events, this is sort of expected. During initialization, the kernel assigns fixed event handlers only if the device in question mandates it: The array acpi_gbl_fixed_event_handlers is populated by acpi_install_fixed_event_handler() in drivers/acpi/acpica/evxface.c, which in case of the power button is called from acpi_device_install_notify_handler() (in drivers/acpi/bus.c) only if the ACPI device is of the correct type, i.e. is a “Fixed Hardware Power Button”:

if (device->device_type == ACPI_BUS_TYPE_POWER_BUTTON) {
	status = acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON, acpi_device_fixed_event, device);
} ...

Based on our prior observations we can safely assume that this handler wasn’t installed on the Surface. But why does the kernel try to dispatch a fixed event? For that, we can follow the trail leading to acpi_ev_fixed_event_dispatch. That dispatcher is called from acpi_ev_fixed_event_detect (in drivers/acpi/acpica/evevent.c), which effectively checks the PM1 Status and PM1 Enable registers to figure out whether a fixed event has occurred. This function also contains a debug statement to show the raw values of both registers, which is exactly what we’re looking for:

ACPI_DEBUG_PRINT((ACPI_DB_INTERRUPTS,
			  "Fixed Event Block: Enable %08X Status %08X\n",
			  fixed_enable, fixed_status));

Debug logs from ACPI_DEBUG_PRINT aren’t shown by default (due to their sheer volume), but can be enabled selectively by adding acpi.debug_layer and acpi.debug_level to the kernel command line. The Linux Kernel documentation has a short guide on that, including some examples. So to receive output from that particular statement, I added acpi.debug_layer=0x4 acpi.debug_level=0x8000000 to the GRUB command line during boot. With that set, dmesg -w started displaying PM1 register values as soon as the power button was pressed.
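In case the two magic numbers look arbitrary: as far as I can tell from the kernel's ACPI debug documentation, 0x4 is the ACPI_EVENTS layer and 0x8000000 is the ACPI_LV_INTERRUPTS level. The following Python sketch merely rebuilds the parameter string from those names. The helper kernel_params is made up for illustration, and the two tables list only a small subset of the available masks.

```python
# Subset of the ACPICA debug masks (values as listed in the kernel's
# ACPI debug documentation; many more layers and levels exist).
DEBUG_LAYERS = {
    "ACPI_HARDWARE": 0x00000002,
    "ACPI_EVENTS": 0x00000004,
}
DEBUG_LEVELS = {
    "ACPI_LV_INFO": 0x00000004,
    "ACPI_LV_INTERRUPTS": 0x08000000,
}


def kernel_params(layers, levels):
    """Hypothetical helper: OR the named masks together and format them."""
    layer_mask = 0
    for name in layers:
        layer_mask |= DEBUG_LAYERS[name]
    level_mask = 0
    for name in levels:
        level_mask |= DEBUG_LEVELS[name]
    return f"acpi.debug_layer={layer_mask:#x} acpi.debug_level={level_mask:#x}"


print(kernel_params(["ACPI_EVENTS"], ["ACPI_LV_INTERRUPTS"]))
# acpi.debug_layer=0x4 acpi.debug_level=0x8000000
```

Note that, according to the same documentation, these parameters only have an effect on kernels built with CONFIG_ACPI_DEBUG.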

After a fresh reboot, the output was:

Fixed Event Block: Enable 00000020 Status 00000100

That line always occurred twice, once when the button is pressed and a second time upon release. Interestingly, after a forced hibernate and resume cycle, the value of the Enable register suddenly differed:

Fixed Event Block: Enable 00000120 Status 00000100
ACPI Error: No installed handler for fixed event - PowerButton (2), disabling

There we go, that definitely looks like we’re on the right track. The specification contains tables that show the purpose of each individual bit for both registers. In both cases the PM1 Status register has bit 8 set (0x100), called PWRBTN_STS, which is optional and indicates that the power button has been pressed. That’s fine. The PM1 Enable register has in both cases bit 5 set (0x20), which is the “global enable bit” (GBL_EN). The interesting part is bit 8 of the PM1 Enable register: That one isn’t set after a reboot and is suddenly present after hibernation. It is called PWRBTN_EN and the specification describes its purpose as “This optional bit is used to enable the setting of the PWRBTN_STS bit to generate a power management event (SCI or wake).” From my understanding of the terminology, this means that if both PWRBTN_STS and PWRBTN_EN bits are set, the resulting event (via an SCI - a System Control Interrupt) is a fixed one. However, the specification for PWRBTN_EN continues with:

Support for the power button is indicated by the PWR_BUTTON flag in the FADT being reset (zero). If the PWR_BUTTON flag is set or a power button device object is present in the ACPI Namespace, then this bit field is ignored by OSPM.

Since the Surface tablet uses a “Control Method Power Button”, the PWR_BUTTON flag is set in the ACPI FADT table (as we will see shortly) and the power button device object PNP0C0C is present, this field should be ignored by the OSPM (ACPI subsystem). I have no clue why the tablet’s firmware sets this bit after hibernation in the first place, but at the same time I’m wondering why it isn’t ignored by the kernel’s ACPI implementation.
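To make the difference between the two register dumps easier to see, here is a small Python sketch that translates the raw values into bit names. The bit positions are my reading of the PM1 Status and PM1 Enable tables in the ACPI specification, and the function name decode is made up for this example.

```python
# Bit positions as I read them from the PM1 register tables in the
# ACPI specification; unknown bits are reported by their number.
PM1_STATUS_BITS = {
    0: "TMR_STS", 4: "BM_STS", 5: "GBL_STS", 8: "PWRBTN_STS",
    9: "SLPBTN_STS", 10: "RTC_STS", 14: "PCIEXP_WAKE_STS", 15: "WAK_STS",
}
PM1_ENABLE_BITS = {
    0: "TMR_EN", 5: "GBL_EN", 8: "PWRBTN_EN",
    9: "SLPBTN_EN", 10: "RTC_EN", 14: "PCIEXP_WAKE_DIS",
}


def decode(value, names):
    """Return the names of all bits that are set in a register value."""
    return [names.get(bit, f"bit{bit}") for bit in range(32) if value & (1 << bit)]


print("Status:               ", decode(0x100, PM1_STATUS_BITS))
print("Enable after reboot:  ", decode(0x020, PM1_ENABLE_BITS))
print("Enable after resume:  ", decode(0x120, PM1_ENABLE_BITS))
```

The only difference between the two Enable values is PWRBTN_EN, which matches the observation from the debug log.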

At this point I decided to attempt a dirty workaround: What if the PWR_BUTTON flag in the ACPI FADT table wasn’t set anymore, thus announcing a “Fixed Power Button” and forcing the kernel to assign a handler for it? This would probably solve the issue after hibernation, but I wasn’t sure whether the button would still work after a regular boot (where the firmware would signal a GPE).
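To convince myself that this idea is at least plausible, the logic can be reduced to a toy model in Python. This is not the kernel's code, only my simplified reading of the detect/dispatch pair in evevent.c, limited to the power button. The names fixed_event_detect and handlers are invented for the sketch.

```python
ACPI_EVENT_POWER_BUTTON = 2   # the "(2)" from the kernel's error message
PWRBTN_BIT = 1 << 8           # PWRBTN_STS / PWRBTN_EN share bit 8


def fixed_event_detect(status, enable, handlers):
    """Toy model: a fixed event fires only if status AND enable bits are set.

    Returns the (possibly modified) enable register and the log lines.
    """
    log = []
    if status & PWRBTN_BIT and enable & PWRBTN_BIT:
        handler = handlers.get(ACPI_EVENT_POWER_BUTTON)
        if handler is None:
            # No handler registered: the event gets disabled for good.
            enable &= ~PWRBTN_BIT
            log.append("No installed handler for fixed event - PowerButton (2), disabling")
        else:
            log.append(handler())
    return enable, log


# Stock tables: no fixed handler is registered for the power button.
print(fixed_event_detect(0x100, 0x020, {}))   # cold boot: nothing to dispatch
print(fixed_event_detect(0x100, 0x120, {}))   # after resume: error, event disabled

# Patched tables: the kernel would register a fixed handler.
patched = {ACPI_EVENT_POWER_BUTTON: lambda: "power button handled"}
print(fixed_event_detect(0x100, 0x120, patched))
```

In this model the regular-boot case produces no fixed event at all, which is exactly the open question: whether the button press would still arrive through the GPE path is not something the sketch can answer.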

Workaround: Patching ACPI tables

The good news about overriding ACPI tables with your own data is that one doesn’t have to modify the vendor’s tables within the firmware. Most operating systems support temporarily replacing the firmware-supplied tables at boot with custom data (for Linux, read initrd_table_override.txt for some documentation). In case the changes don’t work as expected or break something, one can simply boot in a different configuration without the modifications.

The following steps are mostly taken from the DSDT article of ArchWiki. Even though it explains how to overwrite the DSDT, the same steps can be applied for any other ACPI table. In this case, we want to overwrite the PWR_BUTTON field in the FADT table. The kernel exposes its loaded ACPI tables within the sysfs filesystem, which allows us to dump the FADT table (which has the signature FACP, only god knows why) via

# cat /sys/firmware/acpi/tables/FACP > facp.dat

Since facp.dat is a binary table (strictly speaking, the FADT is a static data table rather than ACPI Machine Language (AML) bytecode, but the same tooling handles both), we require the Intel ACPI Source Language compiler/decompiler iasl that is part of the ACPICA implementation and available in most Linux distributions' repositories. Alternatively, we could also use the Microsoft ASL Compiler - which was also originally used to compile the Surface’s ACPI tables as the string MSFT in our table dump implies. Due to laziness I just went with iasl and didn’t encounter any issues.

To decompile the FACP’s AML data into ACPI Source Language (ASL), we use

$ iasl -d facp.dat

This creates a new human-readable text file facp.dsl. In there, we find the line for the PWR_BUTTON flag, which is labelled

Control Method Power Button (V1) : 1

and - as expected - set to 1. For our genius patch, we change its value to 0. For the overwrite on boot to work, we also have to increase the value of Oem Revision at the top of the file. All in all, my patch looked like this:

-[018h 0024   4]                 Oem Revision : 00000000
+[018h 0024   4]                 Oem Revision : 00000001
-            Control Method Power Button (V1) : 1
+            Control Method Power Button (V1) : 0
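The flag can also be checked directly in the binary dump, which is handy for verifying that the recompiled table really differs from the original. The following Python sketch assumes the standard FADT layout as I understand it from the specification: the Flags field is a little-endian 32-bit value at byte offset 112, with PWR_BUTTON being bit 4, and Oem Revision sits at offset 24 (the 018h from the listing above). The function inspect_fadt and the synthetic test table are made up for this example; for a real check one would pass in the contents of facp.dat or facp.aml.

```python
import struct

OEM_REVISION_OFFSET = 24   # 018h in the iasl listing
FLAGS_OFFSET = 112         # assumed offset of the FADT "Flags" field
PWR_BUTTON = 1 << 4        # set = control method power button (or none)


def inspect_fadt(table):
    """Return the Oem Revision and the power button model of a FADT dump."""
    if table[:4] != b"FACP":
        raise ValueError("not a FADT: signature is %r" % table[:4])
    (oem_revision,) = struct.unpack_from("<I", table, OEM_REVISION_OFFSET)
    (flags,) = struct.unpack_from("<I", table, FLAGS_OFFSET)
    kind = "control method" if flags & PWR_BUTTON else "fixed"
    return {"oem_revision": oem_revision, "power_button": kind}


def fake_fadt(oem_revision, flags):
    """Build a minimal synthetic table so the sketch runs without a dump."""
    table = bytearray(FLAGS_OFFSET + 4)
    table[0:4] = b"FACP"
    struct.pack_into("<I", table, 4, len(table))
    struct.pack_into("<I", table, OEM_REVISION_OFFSET, oem_revision)
    struct.pack_into("<I", table, FLAGS_OFFSET, flags)
    return bytes(table)


print(inspect_fadt(fake_fadt(0, PWR_BUTTON)))  # like the original table
print(inspect_fadt(fake_fadt(1, 0)))           # like the patched table

# Real usage (path is an example):
# with open("facp.aml", "rb") as f:
#     print(inspect_fadt(f.read()))
```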

The modified facp.dsl can now be recompiled back to AML by executing

$ iasl -tc facp.dsl

This creates a file facp.aml, which is the patched FADT in AML format. We then put it into a CPIO archive and move it to the boot partition (again, steps are taken from the ArchWiki):

$ mkdir -p kernel/firmware/acpi
$ cp facp.aml kernel/firmware/acpi
$ find kernel | cpio -H newc --create > acpi_override
# cp acpi_override /boot

The last step is to configure the bootloader to load the ACPI overrides via the initrd during boot. My Manjaro installation uses the GRUB2 bootloader, which can be instructed to include our CPIO archive by adding the line

GRUB_EARLY_INITRD_LINUX_CUSTOM="acpi_override"

to /etc/default/grub and invoking update-grub to regenerate the GRUB2 configuration on the boot partition.

After another reboot, we can verify that the override was applied by searching through dmesg output for messages such as:

ACPI: FACP ACPI table found in initrd [kernel/firmware/acpi/facp.aml][0x10c]
ACPI: Table Upgrade: override [FACP-MSFT  -MSFT    ]
ACPI: FACP 0x000000008CFF3000 Physical table override, new table: 0x000000008A64A000

Afterwards, the power button works as expected even prior to the first hibernation cycle. Unsurprisingly, the ACPI events seen in userspace differ after the override was applied, as can be seen in the following acpi_listen output after having pushed the power button:

Without FADT override:
button/power PBTN 00000080 00000000 K

With FADT override, after regular boot:
button/power PBTN 00000080 00000000 K
button/power PBTN 00000080 00000000
button/power LNXPWRBN:00 00000080 00000001

With FADT override, after hibernation:
button/power PBTN 00000080 00000000
button/power LNXPWRBN:00 00000080 00000005

With the ACPI override in place, an additional event will be raised when pressing the power button prior to the first hibernation cycle (probably the regular power button GPE). Since GNOME Power Manager seemed to process the events properly and I didn’t encounter any further issues, I didn’t investigate that further. Please keep in mind that this is just a workaround and not a clean solution.
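For anyone who wants to compare such event streams without squinting at hex columns, here is a small Python sketch that splits the acpi_listen lines from above into fields and counts events per source. The field names follow acpid's convention as I understand it (device class, bus ID, type, data); parse_event and sources are names I made up.

```python
from collections import Counter


def parse_event(line):
    """Split one acpi_listen line into its whitespace-separated fields."""
    fields = line.split()
    return {
        "device_class": fields[0],
        "bus_id": fields[1],
        "type": int(fields[2], 16),
        "data": int(fields[3], 16),
        "extra": fields[4:],   # e.g. the trailing "K" seen on some lines
    }


def sources(lines):
    """Count how many events each bus ID emitted."""
    return Counter(parse_event(line)["bus_id"] for line in lines if line.strip())


captures = {
    "without override": [
        "button/power PBTN 00000080 00000000 K",
    ],
    "override, regular boot": [
        "button/power PBTN 00000080 00000000 K",
        "button/power PBTN 00000080 00000000",
        "button/power LNXPWRBN:00 00000080 00000001",
    ],
    "override, after hibernation": [
        "button/power PBTN 00000080 00000000",
        "button/power LNXPWRBN:00 00000080 00000005",
    ],
}

for label, lines in captures.items():
    print(f"{label}: {dict(sources(lines))}")
```

The LNXPWRBN:00 source only shows up once the override is active, and the regular-boot capture contains one PBTN event more than the capture taken after hibernation.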