Nerd Ralph: April 2021

Friday, April 16, 2021

Pi ethernet gadget with reverse SSH proxy

I love my Pi Zeros, and I think every hacker should have one in their toolbox. When I got my first Pi Zero several years ago, I used a USB-TTL serial adapter to connect to the console UART on pins 8 and 10 of the Pi header. Once I learned how to set up the Zero as an Ethernet gadget, things got a bit easier. However, updating software was still a cumbersome process of downloading files to the host computer and then using scp to transfer them to the Pi. This blog post documents how to set up the Pi to use an SSH reverse proxy so that utilities like git and apt can reach the internet through the host.

When I got my first Pi Zero, I chose the Pi OS Lite image. This time I updated to the March 4, 2021 release and chose Pi OS with desktop, because it includes development tools like git. I followed the Ethernet gadget setup instructions: modifying config.txt and cmdline.txt, and creating an empty file called "ssh" on the boot partition. The next step is to set up multicast DNS (mDNS), the Zeroconf component that lets the host reach the Pi as raspberrypi.local. As mentioned in the Adafruit instructions, if you are using Windows, the easiest way to do this is to install Apple's Bonjour service.

Windows users can't use PuTTY for an SSH reverse proxy, since that feature is not supported. OpenSSH has supported reverse SOCKS proxies (remote dynamic forwarding) since version 7.6. To connect from Windows, I installed MSYS2 along with its OpenSSH 8.4 package; on Windows 10, WSL is probably the easiest option. To connect to the Pi and enable a reverse SOCKS5 proxy on port 1080, enter "ssh -R 1080 pi@raspberrypi.local".

Once connected to the Pi, set "http_proxy" to "socks5h://localhost:1080". The "h" at the end is important: it tells the client to pass hostnames to the proxy for resolution, so DNS lookups are done by the host computer rather than the Pi. I added the following line to ~/.profile to set it every time I log in:

export http_proxy="socks5h://localhost:1080"

Programs such as git and curl automatically use the SOCKS proxy when the http_proxy environment variable is set. Note that GitHub shows https URLs for repositories by default; these need to be changed to "http://" for the proxy to work, since http_proxy only applies to http URLs.

The last configuration step I recommend is setting the current date, since the Pi has no battery-backed RTC. I normally use ntpdate from the ntp project to set the date and time manually on Linux, but it does not work through a SOCKS proxy (NTP runs over UDP, while SSH dynamic forwarding only carries TCP). After some searching, I found a suggestion to use the HTTP Date: header from a reliable internet server. The command I use is:

sudo date -s "$(curl -sI google.com | grep '^Date:' | cut -d' ' -f3-7)"
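To make the field extraction concrete, here is a small C sketch (not part of the original setup) that parses the same header line the pipeline selects. The function name parse_http_date is hypothetical. It assumes the standard HTTP date format, e.g. "Date: Fri, 16 Apr 2021 12:34:56 GMT", where splitting on spaces makes fields 3 through 7 the day, month, year, time and zone that cut -d' ' -f3-7 hands to date -s.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Map a three-letter English month abbreviation to 0..11, or -1. */
static int month_index(const char *mon)
{
    static const char *const names[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    for (int i = 0; i < 12; i++) {
        if (strcmp(mon, names[i]) == 0)
            return i;
    }
    return -1;
}

/* Hypothetical helper: parse a header line such as
 *   "Date: Fri, 16 Apr 2021 12:34:56 GMT\r\n"
 * into a struct tm holding UTC broken-down time.
 * Returns 0 on success, -1 if the line is not a Date header in that format. */
int parse_http_date(const char *line, struct tm *out)
{
    char mon[4];
    int day, year, hour, min, sec;

    /* Same test as grep '^Date:' in the shell pipeline. */
    if (strncmp(line, "Date:", 5) != 0)
        return -1;

    /* Skip the weekday up to the comma, then read the fields that
     * cut -d' ' -f3-7 passes to date -s. The trailing "GMT" and any
     * CR/LF are simply left unparsed. */
    if (sscanf(line + 5, " %*[^,], %d %3s %d %d:%d:%d",
               &day, mon, &year, &hour, &min, &sec) != 6)
        return -1;

    int m = month_index(mon);
    if (m < 0)
        return -1;

    memset(out, 0, sizeof *out);
    out->tm_mday = day;
    out->tm_mon = m;
    out->tm_year = year - 1900;   /* struct tm counts years from 1900 */
    out->tm_hour = hour;
    out->tm_min = min;
    out->tm_sec = sec;
    return 0;
}
```

Feeding it the "Date:" line from curl -sI yields the same day, month, year and time that the shell pipeline hands to date -s.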

Once the Pi Zero is configured and has the correct date and time, I recommend running "apt update". If everything is working properly, apt will connect to the Raspbian servers through the SOCKS5 reverse proxy and update its local package index.

Saturday, April 3, 2021

Honey, I shrunk the Arduino core!

One of my gripes about the Arduino AVR core is that it is not an example of efficient embedded programming. One of the foundations of C++ is zero-overhead abstraction, yet the Arduino core has significant overhead. The basic Arduino Blink example compiles to almost 1 kB, most of which is code that is never used. Rewriting the AVR core is a task I'm not ready to tackle, but after writing picoCore, I realized I could use many of the same optimization techniques in an Arduino library. The result is ArduinoShrink, a library that can dramatically reduce the compiled size of Arduino projects. In this post I'll explain some of the techniques I used to achieve the coding trifecta of faster, better, and smaller.

The Arduino core is actually a static library that is linked with the project code. As Eli explains in a post on static linking, libraries like libc usually have only one function per object (.o) file, so that linking doesn't pull in unneeded code. The Arduino core isn't organized that modularly. However, it is compiled with gcc's "-ffunction-sections" option, which places each function in its own section so the linker (run with --gc-sections) can discard the unused ones. That mitigates much of the code bloat from the non-modular design.

With ArduinoShrink, I wrote more modular, self-contained code. For example, the Arduino delay() function calls micros(), which relies on the 32-bit overflow counter maintained by the timer0 interrupt. I simplified delay() so that it only needs the 8-bit timer value. If the user code never calls micros() or millis(), the timer0 ISR never gets linked in. By using a more efficient algorithm and writing the code in AVR assembler, I reduced delay() to 12 instructions taking 24 bytes of flash.
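The idea of timing a delay from the 8-bit counter alone can be sketched in C. This is a host-side illustration, not ArduinoShrink's 12-instruction assembler routine: read_timer is a hypothetical hook standing in for reading the TCNT0 register, the simulated timer exists only so the logic can run on a PC, and the 250 ticks per millisecond assumes a 16 MHz clock with timer0's /64 prescaler (the Arduino Uno configuration).

```c
#include <stdint.h>

/* Hypothetical stand-in for reading the 8-bit TCNT0 register. */
typedef uint8_t (*timer_read_fn)(void);

/* Busy-wait until at least `ticks` timer ticks have elapsed, using only the
 * 8-bit counter value: no overflow interrupt and no 32-bit shared counter.
 * Unsigned 8-bit subtraction gives the correct per-poll delta even when the
 * counter wraps from 255 to 0, as long as it is polled at least once every
 * 255 ticks. Returns the number of ticks actually counted. */
uint32_t delay_ticks(uint32_t ticks, timer_read_fn read_timer)
{
    uint8_t last = read_timer();
    uint32_t elapsed = 0;
    while (elapsed < ticks) {
        uint8_t now = read_timer();
        elapsed += (uint8_t)(now - last);
        last = now;
    }
    return elapsed;
}

/* Assumes 16 MHz / 64 = 250 kHz timer clock, i.e. 250 ticks per ms. */
#define TICKS_PER_MS 250u

uint32_t delay_ms(uint16_t ms, timer_read_fn read_timer)
{
    return delay_ticks((uint32_t)ms * TICKS_PER_MS, read_timer);
}

/* Simulated free-running timer for testing on a PC: every read advances
 * the counter by SIM_STEP ticks, wrapping at 256. */
#define SIM_STEP 7u
static uint8_t sim_tcnt0 = 200;
static uint32_t sim_total_ticks;

uint8_t sim_read_tcnt0(void)
{
    sim_tcnt0 = (uint8_t)(sim_tcnt0 + SIM_STEP);
    sim_total_ticks += SIM_STEP;
    return sim_tcnt0;
}
```

On real hardware the hook would simply return TCNT0; nothing here depends on the overflow ISR, which is why it can be left out of the link.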

To minimize code size and maximize speed, almost half of the code is written in AVR assembler. Despite decades of improvements in compiler optimization, on architectures like the AVR I can almost always write better assembler than the compiler generates. That's especially true for interrupt service routines such as the timer0 interrupt, which maintains the counters for millis() and micros(). My assembler version of that interrupt uses only 56 bytes of flash and is faster than the Arduino ISR written in C.
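For reference, the bookkeeping a timer0 overflow ISR does for millis() and micros() is small enough to model in C. The sketch below follows the fractional-millisecond scheme used by the official core's wiring.c, rewritten as a plain function so it can run on a PC. The variable names are modeled on that file, and the constants assume 16 MHz with a /64 prescaler, so each overflow is 1024 µs.

```c
#include <stdint.h>

/* 16 MHz with a /64 prescaler: one overflow every 64 * 256 cycles = 1024 us. */
#define US_PER_OVERFLOW 1024u
#define MILLIS_INC (US_PER_OVERFLOW / 1000u)          /* whole ms per overflow: 1 */
#define FRACT_INC  ((US_PER_OVERFLOW % 1000u) >> 3)   /* leftover 24 us in 8 us units: 3 */
#define FRACT_MAX  (1000u >> 3)                       /* 1 ms in 8 us units: 125 */

/* State the ISR maintains; volatile because the real ISR shares it with
 * millis() and micros(). */
static volatile uint32_t timer0_millis;
static volatile uint8_t timer0_fract;
static volatile uint32_t timer0_overflow_count;

/* Body of the timer0 overflow handler, written as an ordinary function. */
void timer0_overflow(void)
{
    uint32_t m = timer0_millis;
    uint8_t f = timer0_fract;

    m += MILLIS_INC;
    f += FRACT_INC;
    if (f >= FRACT_MAX) {   /* accumulated leftovers reached a full ms */
        f -= FRACT_MAX;
        m += 1;
    }
    timer0_fract = f;
    timer0_millis = m;
    timer0_overflow_count++;
}
```

After 1000 overflows (1.024 s) the millis counter reads 1024: the fractional counter absorbs the extra 24 µs per overflow and carries an extra millisecond 24 times. Every one of those multi-byte loads and stores costs instructions, which is where hand-written assembler can save space and time.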

One part that is still written in C is the digitalWrite() function. The Arduino core uses a set of tables in flash to map a pin number to an I/O port and bit, which takes a lot of code just to make digitalWrite(13, LOW) clear bit 5 of PORTB. Using Bill's discovery that these flash table lookups can be resolved at compile time, digitalWrite(13, LOW) compiles to a single instruction: "cbi PORTB, 5".
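A constant pin can collapse to one instruction because, once the pin number is known, both table lookups become constants. Here is a simplified host-side C sketch of that idea. It is not Bill's technique or ArduinoShrink's code: the tables are ordinary const arrays rather than PROGMEM data, sim_ports is a hypothetical stand-in for the PORTB/PORTC/PORTD registers, and digital_write is a hypothetical name. The mapping is the standard Arduino Uno (ATmega328P) one, where pin 13 is bit 5 of PORTB.

```c
#include <stdint.h>

enum { PORT_B, PORT_C, PORT_D };

/* Simulated output registers standing in for PORTB, PORTC and PORTD. */
volatile uint8_t sim_ports[3];

/* Arduino Uno pin mapping: D0-D7 -> PORTD bits 0-7, D8-D13 -> PORTB bits 0-5,
 * D14-D19 (A0-A5) -> PORTC bits 0-5. */
static const uint8_t pin_to_port[20] = {
    PORT_D, PORT_D, PORT_D, PORT_D, PORT_D, PORT_D, PORT_D, PORT_D,
    PORT_B, PORT_B, PORT_B, PORT_B, PORT_B, PORT_B,
    PORT_C, PORT_C, PORT_C, PORT_C, PORT_C, PORT_C
};
static const uint8_t pin_to_bit[20] = {
    0, 1, 2, 3, 4, 5, 6, 7,
    0, 1, 2, 3, 4, 5,
    0, 1, 2, 3, 4, 5
};

/* With a constant pin and optimization enabled, the compiler can fold both
 * lookups, leaving a read-modify-write of one known bit. On an AVR, that
 * single-bit set or clear of a low I/O register is what sbi/cbi do. */
static inline void digital_write(uint8_t pin, uint8_t value)
{
    volatile uint8_t *port = &sim_ports[pin_to_port[pin]];
    uint8_t mask = (uint8_t)(1u << pin_to_bit[pin]);

    if (value)
        *port |= mask;
    else
        *port &= (uint8_t)~mask;
}
```

When the pin is only known at run time, the same function still works but has to do both lookups and build the mask, which is the overhead the official core pays on every call.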

ArduinoShrink is also designed to significantly reduce interrupt latency. The original timer0 interrupt takes around 5 µs to run, during which time any other interrupts are delayed. The first instruction in my ISR is "sei", which re-enables interrupts so others can run, reducing the latency impact to a few cycles more than the hardware minimum. The official Arduino core disables interrupts in several places, such as when reading the millis counter. My solution is to detect whether the millis counter was updated during the read and, if so, re-read it, which avoids any interrupt latency impact.
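The re-read idea can be modeled in C on a PC. This is a sketch of the general technique, not ArduinoShrink's code, which is in assembler and may detect the update differently. The counter is stored as four bytes because an 8-bit AVR loads a 32-bit value one byte at a time, and the sim_* functions are hypothetical stand-ins for the timer ISR firing in the middle of a read.

```c
#include <stdint.h>

/* Simulated 32-bit millis counter, stored little-endian as avr-gcc lays out
 * a uint32_t. Reading it takes four separate byte loads. */
static volatile uint8_t sim_millis[4];

/* One-shot simulated "timer ISR": when armed, it stores a new counter value
 * right after the next byte is read. */
static int isr_armed;
static uint32_t isr_new_value;

void sim_set_millis(uint32_t v)
{
    for (int i = 0; i < 4; i++)
        sim_millis[i] = (uint8_t)(v >> (8 * i));
}

void sim_arm_isr(uint32_t new_value)
{
    isr_new_value = new_value;
    isr_armed = 1;
}

static void sim_maybe_fire_isr(void)
{
    if (isr_armed) {
        isr_armed = 0;
        sim_set_millis(isr_new_value);
    }
}

/* Naive read: four byte loads with no protection, so it can return a torn
 * value mixing old and new bytes. */
uint32_t millis_naive(void)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v |= (uint32_t)sim_millis[i] << (8 * i);
        sim_maybe_fire_isr();   /* an interrupt may land between byte loads */
    }
    return v;
}

/* Read twice and retry until both reads agree. The ISR runs only about once
 * per millisecond, so it fires at most once during two reads; a matching
 * pair is then a value the counter really held, and interrupts never need
 * to be disabled. */
uint32_t millis_reread(void)
{
    uint32_t a, b;
    do {
        a = millis_naive();
        b = millis_naive();
    } while (a != b);
    return b;
}
```

In the simulation, an update from 0x000000FF to 0x00000100 landing after the first byte makes the naive read return 0x000001FF, while the re-read version returns 0x00000100.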

The only limitation compared to the official AVR core is that the compiler must be able to resolve the pin number for the digital I/O functions at compile time. Even when the pin is hard-coded and LTO is enabled, avr-gcc does not always recognize that the pin is a compile-time constant. Since AVR is not a priority target for GCC optimizations, I can't rely on compiler improvements to remove this limitation. Therefore I plan to write a version of digitalWrite() that is much smaller and faster than the official one, even when avr-gcc can't determine the pin at compile time.

Although ArduinoShrink should be compatible with any Arduino sketch, given some of the compiler tricks I've used, it's quite possible I've missed a potential error. If you find what you think is a bug, open an issue in the GitHub repository.