CardOS: Now compiling without Arduino!

Publication date: Jul 31, 2024

Tags: cardputer, esp32, os

In the "CardOS: Writing an OS for the Cardputer" post I wrote about the OS that I'm building for the Cardputer and said that the next step was to move away from the Arduino toolchain. It took me two months, but I finally did it. The end product was this commit, but in this post I'd like to go over the process of getting there.

I started by running the Arduino commands to build and flash with verbose output enabled and saving that output for reference. This way I could see what exactly Arduino was doing and understand the steps needed to get from a source to a binary flashed to the chip's memory.

I then made the obvious changes needed to convert from Arduino to C: I renamed the project's single source file from cardOS.ino to cardOS.c, turned the setup and loop functions into a main function, and changed the compilation step in my Makefile to call the ESP32 C cross-compiler (xtensa-esp32s3-elf-gcc) instead of Arduino. Arduino had already installed that cross-compiler on my machine, and I learned its path from Arduino's output.

Next I ran the compilation and fixed each error thrown by the compiler. Namely, I had to add a few includes and function prototypes, change the way a couple of variables were defined, and pass the -fno-builtin flag to the compiler so it would allow me to define my own stdlib functions. With that, I had an ELF file and needed to figure out how to do the flashing.

From looking through Arduino's output, I learned that esptool.py was used to convert the ELF file into a binary and was then called again to flash the binary onto the chip. Besides the application code, other things were being flashed: a bootloader and a partition table. To keep things simple, I did a quick test to check whether I could ignore them for now (that is, assume they had already been flashed): I tweaked the OS code and used the Arduino toolchain to build and flash just the application. The tweak showed up on the Cardputer, so the answer was yes.

With that in mind, I updated the Makefile to call esptool.py to convert the ELF generated by the compiler and then flash it to the board. At this point I had the whole procedure to get from the source to the flashed binary on the board figured out (or so I thought), so I ran it with make upload. But it did not work: the screen on the Cardputer just wouldn't turn on.

I realized assuming all the code would just work in the new setup was too optimistic and decided to simplify the test as much as possible: I changed the code in main to simply drive the GPIO1 pin high and connected an LED to that pin. But even then, the LED didn't turn on.
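For illustration, a bare-metal "drive a pin high" test can be as small as two register writes. The sketch below is not the code from the repository: the GPIO base address and the two register offsets are my assumptions about the ESP32-S3 register map and should be checked against the manual, and it assumes the pin is already routed as a plain GPIO. The register block is passed in as a pointer so the same function can be exercised on a host against a plain array.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ESP32-S3 values; verify them against the manual before use. */
#define GPIO_BASE_ADDR         0x60004000u    /* assumed start of the GPIO registers */
#define GPIO_OUT_W1TS_INDEX    (0x0008u / 4u) /* assumed: write 1 to set an output bit */
#define GPIO_ENABLE_W1TS_INDEX (0x0024u / 4u) /* assumed: write 1 to enable an output driver */

/* Enables the output driver of `pin` (0..31) and drives it high.
 * `gpio` points at the first register of the GPIO block. */
void gpio_drive_high(volatile uint32_t *gpio, unsigned pin)
{
    assert(pin < 32);
    uint32_t mask = UINT32_C(1) << pin;
    gpio[GPIO_ENABLE_W1TS_INDEX] = mask;
    gpio[GPIO_OUT_W1TS_INDEX] = mask;
}
```

On the target, the call would be something like gpio_drive_high((volatile uint32_t *)GPIO_BASE_ADDR, 1), followed by an endless loop so that main never returns.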

This is where I got stuck for a while. The issue was clearly somewhere in the application binary. My two main theories were that either the binary itself was malformed, or I was missing some initialization code, since Arduino included a bunch of extra files during compilation.

Configuring the addresses in the ELF file

Hoping the issue was in the binary itself, since identifying missing initialization code from Arduino could take a while, I started investigating it.

I used readelf to see the contents of both my ELF and the one generated by Arduino and compared them. The biggest change in the header was this:

Mine:

Entry point address:               0x40017d

Arduino's:

Entry point address:               0x40376778

Arduino's ELF had much more code in it, so I expected its address to be higher, but this was too much of a difference. It looked like some base address was intentionally set to something different.

I looked up the meaning of the "Entry point address" in an ELF and confirmed that this is the address where execution starts from. So getting it wrong could definitely make my code not work at all.
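The entry point is just a field in the ELF header, so it can also be read without readelf. Here is a minimal sketch for a little-endian 32-bit ELF like the ones above; the function name is made up for this example, and it only looks at the header bytes it needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ELF_CLASS_INDEX  = 4,  /* e_ident[EI_CLASS] */
    ELF_DATA_INDEX   = 5,  /* e_ident[EI_DATA] */
    ELF_CLASS_32     = 1,  /* ELFCLASS32 */
    ELF_DATA_LSB     = 1,  /* ELFDATA2LSB, little-endian */
    ELF_ENTRY_OFFSET = 24  /* e_entry follows e_ident[16], e_type, e_machine, e_version */
};

/* Stores the entry point address in *entry and returns 0, or returns -1
 * if buf does not start with a little-endian ELF32 header. */
int elf32_entry_point(const uint8_t *buf, size_t len, uint32_t *entry)
{
    assert(buf != NULL && entry != NULL);
    if (len < (size_t)ELF_ENTRY_OFFSET + 4)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[ELF_CLASS_INDEX] != ELF_CLASS_32 || buf[ELF_DATA_INDEX] != ELF_DATA_LSB)
        return -1;

    const uint8_t *p = buf + ELF_ENTRY_OFFSET;
    *entry = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 0;
}
```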

Looking further through the ELF's content:

Mine:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00400000 0x00400000 0x0183c 0x0183c R E 0x1000
  LOAD           0x00183c 0x0040283c 0x0040283c 0x001a1 0x0075c RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata .eh_frame
   01     .ctors .dtors .data .bss

Arduino's:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001020 0x3c000020 0x3c000020 0x2a678 0x2a678 RW  0x1000
  LOAD           0x02c000 0x3fc88000 0x3fc88000 0x0c018 0x0d8a8 RW  0x1000
  LOAD           0x039000 0x40374000 0x40374000 0x0ca97 0x0ca98 RWE 0x1000
  LOAD           0x046020 0x42000020 0x42000020 0x1f347 0x1f347 R E 0x1000
  LOAD           0x000000 0x50000000 0x50000000 0x00000 0x00010 RW  0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .flash_rodata_dummy .flash.appdesc .flash.rodata
   01     .dram0.dummy .dram0.data .dram0.bss
   02     .iram0.vectors .iram0.text .iram0.data
   03     .flash.text
   04     .rtc_noinit

After reading up online about ELF program headers and sections this started making sense.

Looking at the program headers on the Arduino ELF, there is indeed an address very close to the entry point address: 0x40374000 on segment 2. That is the base address I was suspecting. The code that starts executing then must be in the iram0.text section, since sections with text in the name represent code, and that one is mapped to segment 2. The questions that come to mind are:

Why is this address used?
How do I set this address in my ELF file?

To answer the first question, I went looking in the ESP32-S3 manual. Figure 4-1 shows a diagram of how each memory range is mapped. The starting address of the segment used for the entry point code in Arduino, 0x40374000, is inside the range 0x40370000 - 0x403dffff, which is shown as mapped to SRAM through an instruction bus. That totally makes sense!

But reading on, Table 4-1 further breaks down the memory regions and names the region containing 0x40374000 as "Internal SRAM0". Its description mentions that the first 16KB of the space can be reserved as a cache for instructions stored in the flash memory. If we do the math (16KB is 0x4000, and 0x40370000 + 0x4000 = 0x40374000), that means the usable instruction memory starts at... 0x40374000, exactly the address that was used for the code segment in Arduino's ELF! So that explains where this address came from.

To sum up, the problem I uncovered is that the ELF file I was generating had the code assigned to addresses that do not map to a memory region the ESP32's processor can fetch instructions from (i.e. one accessible through an instruction bus), and therefore it couldn't be executed.
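The address arithmetic can be written down as a small check. This sketch uses only the numbers quoted above from the manual; the macro and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_IBUS_START 0x40370000u   /* start of the SRAM range on the instruction bus */
#define SRAM_IBUS_END   0x403dffffu   /* last address of that range (inclusive) */
#define ICACHE_RESERVED (16u * 1024u) /* first 16KB, reservable as instruction cache */

/* First address usable for application code once the cache is reserved. */
#define SRAM_CODE_START (SRAM_IBUS_START + ICACHE_RESERVED)

static_assert(SRAM_CODE_START == 0x40374000u,
              "matches the code segment address in Arduino's ELF");

/* True if addr lies in the SRAM range that can hold application code. */
bool is_executable_sram(uint32_t addr)
{
    return addr >= SRAM_CODE_START && addr <= SRAM_IBUS_END;
}
```

With these definitions, is_executable_sram(0x40376778) is true for Arduino's entry point, while is_executable_sram(0x40017d) is false for the one in my original ELF.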

What was left was to answer the second question: How do I set the right address in my ELF file?

This was a big gap in my knowledge: I had no idea. But looking through Arduino's output, I saw that the compiler was being passed some .ld files through a -T parameter. Inside one of them, called memory.ld, I found addresses being defined! The -T flag entry in the compiler's manual page revealed that these files were linker scripts, so I knew what I needed to learn about next.

I found this wonderful blog post about linker scripts that taught me everything I needed to know. With that information I was able to write my linker script to assign the right addresses to each ELF section: Not only for the code (.text), but also for the zero-initialized variables (.bss) and other variables (.data), whose addresses I figured out similarly by referencing the manual and the Arduino ELF. A few additional sections also had to be assigned to get rid of linker errors.

As you might have noticed from the ELF contents, Arduino has two code sections, iram0.text and flash.text, while I only have one, .text. For that reason, I decided to assign my .text section to flash since it has much more space. However, there was still no sign of life from the Cardputer. Since I noticed that my code was small enough to fit entirely inside the SRAM, and Arduino's code started from SRAM, I decided to try that, and it worked!!

The ESP-IDF's application startup flow documentation describes what the second-stage bootloader (which is the one I'm relying on from Arduino) does and does not do, and it mentions that it is the application's duty to finish setting up the flash MMU. That is probably why using the flash address for the code didn't work and why Arduino splits the code between an SRAM part and a flash part. So at some point I will need to set up access to flash, but for now SRAM was enough.

Initialization code

Now that I had at least an LED turning on, I changed the code back to full operation, that is, initializing the display, rendering the shell and reading the keyboard.

The screen started getting cleared as usual but stopped midway. I removed the screen clearing routine temporarily and the shell prompt was written to the screen. I could type, but after a short interval my position would go back to the initial one. I realized that the Cardputer was getting reset after a precise interval.

I remembered from the application startup flow documentation that the bootloader enabled the watchdog. And since now the application code was fully provided by me, I had to take care of it myself: either keep feeding the watchdog or disable it.
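Of the two options, disabling is the shorter one. Watchdogs on this chip family are typically write-protected, so the usual pattern is unlock, clear the enable bit, lock again. The sketch below is not the code from the repository: the unlock key and the position of the enable bit are assumptions from memory, and the register addresses are deliberately left to the caller, so everything should be checked against the watchdog chapter of the manual. The chip may also have more than one watchdog that needs this treatment.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; verify them in the manual's watchdog chapter. */
#define WDT_UNLOCK_KEY 0x50D83AA1u         /* assumed value that lifts write protection */
#define WDT_ENABLE_BIT (UINT32_C(1) << 31) /* assumed enable flag in the config register */

/* Disables one watchdog given pointers to its write-protect register
 * and its first configuration register. */
void wdt_disable(volatile uint32_t *wprotect, volatile uint32_t *config0)
{
    assert(wprotect && config0);
    *wprotect = WDT_UNLOCK_KEY;  /* unlock the watchdog registers */
    *config0 &= ~WDT_ENABLE_BIT; /* clear the enable flag, keep the other fields */
    *wprotect = 0;               /* any other value locks them again */
}
```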

I chose to disable the watchdog for simplicity (as always), and after referencing the watchdog section in the ESP32-S3 manual and adding a few register writes to the code, it was done: the system no longer crashed. This allowed the shell to stay on screen, but it was garbled.

Again, remembering from the application startup flow page, the application code (me!) was responsible for initializing the .bss section to zero. Since the .bss section contains the zero-initialized variables, if I don't initialize it, those variables will contain trash, which is what I was seeing here.

I didn't know how to do this the proper way, but I could definitely do it the dirty way, that is, manually listing all global variables in a function and zeroing them out 🙈. Which I did and it worked! (Note: I have figured out how to do it properly since and I did it in this commit)
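For reference, a common way to do this properly is to have the linker script export symbols marking the start and end of .bss and to clear everything in between before any other code runs. The sketch below shows that general technique, not necessarily what the linked commit does; the symbol names in the comment are hypothetical and depend on the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Zeroes every 32-bit word in [start, end). */
void zero_words(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    for (uint32_t *p = start; p < end; p++)
        *p = 0;
}

/* On the target, with a linker script that defines (hypothetical) symbols
 * _bss_start and _bss_end around the .bss section, the first lines of main
 * would be:
 *
 *     extern uint32_t _bss_start, _bss_end;
 *     zero_words(&_bss_start, &_bss_end);
 */
```

This assumes the linker script aligns both ends of .bss to 4 bytes; otherwise a byte-wise loop is the safer choice.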

Finally I had a working OS without needing Arduino to build! And that's the full story behind the "Move from Arduino to generic C build flow" commit.

There was still one difference from before the move, though: it was much slower. That's because, for the last time referring back to the application startup flow page, the application is supposed to set the CPU clock frequency to the desired value. The default frequency is quite slow, but the startup code implicitly embedded by Arduino would set it to a higher value. I fixed that in a follow-up commit.

Further improvements

With the move away from Arduino done, I was finally able to work on some much needed improvements.

The first thing I did was to split the monolithic source file into several different files in this commit. I was really looking forward to this and it felt great to finally do it 😌. Now the code is much more organized, it's easier to find things and to focus on a single component.

I also enabled all the main compiler warnings in this commit, including the warning about implicit fallthrough in switch statements, which would have saved me some minutes investigating a bug early in this project.
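To show the kind of bug that warning catches, here is a made-up example (not the actual bug from the project): a switch where a forgotten break silently changes the result. GCC reports this with -Wimplicit-fallthrough, which is also part of -Wextra.

```c
enum key { KEY_ENTER, KEY_BACKSPACE, KEY_OTHER };

/* Returns how far the cursor moves for a key press. */
int cursor_delta(enum key k)
{
    int delta = 0;
    switch (k) {
    case KEY_BACKSPACE:
        delta = -1;
        break; /* if this break is forgotten, execution falls into
                  KEY_OTHER below and delta ends up as 1 */
    case KEY_OTHER:
        delta = 1;
        break;
    case KEY_ENTER:
        break;
    }
    return delta;
}
```

Without the first break the code still compiles and runs, it just moves the cursor the wrong way on backspace, which is why a compiler warning is so much cheaper than debugging it.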

Conclusion

This was another great step for the project. I had a lot of fun and learned so much from it.

The next big thing to tackle is implementing SD card read and write, and a file system. It will likely be a while before I get that done and come back with an update on the blog. Feel free to check the repository for the latest updates in the meantime if you're curious!