Porting a flash LED driver upstream

Translations: br
Publication date: Nov 20, 2020
Tags: nexus5, kernel

Now that I had a working serial cable for my Nexus 5, as described in the "Making an UART cable for the Nexus 5" post, I was ready to face some action and help in upstreaming.

Looking through Brian Masney's TODO page there were a couple options, but the one I ended going for was the rear flashlight. Hardware doesn't get much simpler than an LED and it would be easy to test if it was working.

To my surprise though, the downstream driver had over 3500 lines! But it wasn't only for the flash LED, it supported multiple LED types: WLED, Flash/Torch, RGB, MPP and KPDBL. To make it easier to port and since the flash would be the only one I would be able to test and be sure that worked, I decided to create a new file for the driver and only copy over what was needed for the flash LED.

I started by copying the probe function, compiling the driver, and seeing which errors of missing definitions were thrown. If those definitions had "flash" or "torch" in the name, I would also copy them, otherwise I just removed those references. I repeated this until eventually there were no more missing definition errors and my driver had everything needed for the flash LED.

At this point I still had a lot of compiling errors though, since the downstream driver was for kernel 3.4 and I was compiling for 5.7.6. So I just went through each error, referencing the definitions on both the downstream and upstream kernels, and made the necessary changes.

With the driver successfully compiling, I added a CONFIG for it and enabled it as a module in the defconfig used by Nexus 5 (qcom_defconfig). I also went through the devicetree files on the downstream tree to find which nodes I needed to describe the flash LED hardware to the driver, and the properties needed in them, so I could also add them upstream.

Note: The design of devicetrees can make the properties of a node be scattered across different files. Later I learned that I could compile the downstream kernel and read the source for the complete tree from the blob with dtc -I dtb -O dts -o downstream_dt.dts kernel_lge_hammerhead/arch/arm/boot/msm8974-hammerhead-rev-11.dtb.

With a driver that compiles, a valid devicetree, and configs enabling the driver as a module, I was ready to finally compile my kernel with the driver and flash it to the phone. So I commited my changes and went to battle.

And of course it failed. In fact, it failed so badly that the driver wasn't even binding to the device. Not having a solid understanding of devicetrees and how the binding between the devices and drivers worked, I started researching about it.

A great resource I found was Solving Device Tree Issues (more on eLinux). It was in fact using the dt_node_info script that presentation shows that I found out that the device was being loaded but the driver wasn't. Also, the debugging techniques shown, like enabling the dynamic debug prints on probe functions helped me discover that my driver's probe function wasn't even being called.

After a lot of reading, both online documentation and other drivers' code, I noticed that my driver was registering in the SPMI bus, which made sense to me since it needs to communicate over that bus, but given that the flash LED node in the devicetree was being registered in the platform bus, my driver also needed to, otherwise they would never bind. So that was something I needed to change.

But making my driver register on the platform bus, made me no longer have the spmi_device pointer that I needed to use the SPMI functions to read and write on the registers. Again, looking around on other drivers, like qcom-spmi-iadc, I discovered there was this cool regmap thing I could use instead to read and write on the registers over SPMI but without being specific to SPMI. I did the reasonable thing and just tried it out!

With these changes commited, the driver was now binding to the device, but the probe function was failing with the following messages:

[   14.547394] spmi spmi-0: pmic_arb_wait_for_done: transaction failed (0x3)
[   14.547405] qcom,leds-qpnp fc4cf000.spmi:pm8941@1:qcom,leds@d300: Unable to read from addr=5, rc(-5)
[   14.547503] qcom,leds-qpnp fc4cf000.spmi:pm8941@1:qcom,leds@d300: Regulator get failed(-517)
[   14.547512] qcom,leds-qpnp fc4cf000.spmi:pm8941@1:qcom,leds@d300: Unable to read flash config data

Given that the regulator was the one failing, and that I had just copied over the regulator nodes from the downstream devicetree, the problem was clearly there. I needed to find out what regulators were needed for the LED to work, and add them upstream if they weren't already there.

At this point I emailed Brian Masney asking for some light, and the advice he gave me was to compile and test the downstream kernel. Having something that works to use as a reference, even if it is very outdated, is invaluable.

Using an older toolchain to compile, as instructed in the build script, and after a minor problem, I got the downstream kernel compiling, flashed it, and verified that the LED and the downstream driver indeed worked. I should really have done this from the beginning... Imagine if the flash LED itself was actually faulty!

Then I started exploring the sysfs on this system to see how it worked. I found the regulator that was being used by the LED, whose status went to enabled whenever I turned the LED on with echo 1 > /sys/class/leds/led\:flash_torch/brightness.

With the downstream regulator node and devicetree, and the upstream driver for the regulator (qcom_spmi-regulator) and its dtbinding in hand, I started doing some detective work.

After some investigation I finally discovered that, given that the regulator address was 0xa000, the regulator named 8941_boost downstream is known as s4 in the upstream kernel, or also by its nickname pm8941_5v.

I still had to discover the other regulator's real identity, but having gained some confidence from that fine detective work, and with some tips from the devicetree, I bet all my money that pm8941_chg_boost's real identity was 5vs1, also called pm8941_5vs1.

Commit, flash, test, aaand... nope. It still didn't work, although I had clearly made some progress. Now the probe function was successfully being executed, although the read and write operations on the SPMI registers were still failing:

[   13.346704] leds_qpnp:qpnp_leds_probe: Probe called!
[   13.346746] spmi spmi-0: pmic_arb_wait_for_done: transaction failed (0x3)
[   13.346760] qcom,leds-qpnp fc4cf000.spmi:pm8941@1:qcom,leds@d300: Unable to read from addr=5, rc(-5)
[   13.347250] leds_qpnp:qpnp_dump_regs: ===== led:flash_0 LED register dump start =====
[   13.347285] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x40 = 0x0
[   13.347319] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x41 = 0x0
[   13.347353] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x42 = 0x0
[   13.347385] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x43 = 0x0
[   13.347419] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x44 = 0x0
[   13.347445] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x48 = 0x0
[   13.347470] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x49 = 0x0
[   13.347496] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x4b = 0x0
[   13.347530] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x4c = 0x0
[   13.347563] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x4f = 0x0
[   13.347589] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x46 = 0x0
[   13.347614] leds_qpnp:qpnp_dump_regs: led:flash_0: 0x47 = 0x0
[   13.347622] leds_qpnp:qpnp_dump_regs: ===== led:flash_0 LED register dump end =====

Since I was confident that the devicetree was now right, I went back to the driver code. I sprinkled some pr_debug() over the probe function and noticed that the reg value, which is read from the devicetree and used as the base address for all read and write operations, was 0, even though it should be 0xd300.

Oh. Really?? Well, this wouldn't be a complete adventure without me introducing a dumb bug myself, right? 😝

After fixing the bug, I recompiled, reflashed, rebooted, retested aaand... YESS!!!

Nexus 5's flashlight being turned on and off from the command line

Isn't it sooo beautiful? 😍

But let's not get too amazed by that gorgeous light. Now that I finally had a driver that actually worked, I rebased everything on top of mainline (at this point the kernel had already moved from 5.7 to 5.9) to make sure it still worked and sent an RFC patch!

And that's the tale of "How I ported a driver upstream"! Now, that's the end of this post, but not of the adventure. There are lots of things that I need to do to get that driver patch series actually included in mainline (some of which I will still find out from the feedback on my patches).

Since my goal here was always to just to get it working, changing as minimum as possible, and only after that clean it up and make it decent, this is what I'll need to start doing now πŸ™‚.

Just one last thing. This post might have made it seem that the problem resolutions were streamlined, but it wasn't like that at all. There were multiple points were I had no idea what to do, sat on it for multiple weeks, and even thought about giving up and working on another project.

But I'm glad I always kept trying and asking for help, because as much as it was frustrating at times, it was also so much fun and I learned so much. I can't even express my happiness the moment that light finally turned on after 4 months (on and off) of work.

And that's it! I hope to be back in a few months with a new post telling my adventure on getting an RFC patch for a new driver into an actual upstream driver πŸ™‚. See you then!