Rick Swift & Apple & Embedded I make things. Sometimes, I’ll talk about it here.

My Gorram Frakking Blog

Using Templates to Statically Allocate Thread Working Area in ChibiOS

On both 8-bit AVR and 32-bit ARM (AT91SAM7X and SAM3S), I've been using ChibiOS. It's a nifty little OS that supports fully-static operation. That is to say, it's possible to allocate all OS structures statically, at compile time, so none need be allocated dynamically at run-time, an operation that can possibly fail. This also allows the exact memory requirements to be known before loading the code onto the target.

I wrote a CThread class (so named to avoid conflict with the OS Thread object) that wraps the allocation of the thread working area and OS thread creation. To do this, CThread is a template class, parameterizing the stack size. Clients subclass CThread and implement the virtual msg_t entry() method.

Note: If you're looking for the description of what goes wrong when the compiler fails to properly initialize static C++ object instances, I moved that material to a new post, Trouble with Static C++ Constructors in ChibiOS/ARM.

ChibiOS Threads

In ChibiOS, to create a thread, you allocate the thread working area with a macro provided by the OS, and call chThdCreateStatic():

[cpp]
WORKING_AREA(sMyThreadWA, 512);

msg_t
MyThreadEntry(void* inArg)
{
// …do work

return 0;
}

void
MyProgram::startAThread()
{
Thread* t = chThdCreateStatic(sMyThreadWA, sizeof (sMyThreadWA), NORMALPRIO, MyThreadEntry, NULL);
}
[/cpp]

The last parameter to chThdCreateStatic() is passed into the thread's entry point. We use it later to pass a pointer to the thread object.

As soon as chThdCreateStatic() is called, the thread begins executing. ChibiOS provides numerous synchronization primitives, but we won't get into those here.

CThread Class

The idea with the CThread wrapper is to provide a class to be subclassed to tidy up the creation of a thread. It would be used like this:

[cpp]
class
MyThread : public CThread<512>
{
protected:
virtual msg_t entry();
};
[/cpp]

And the implementation:

[cpp]
msg_t
MyThread::entry()
{
// …do work

return 0;
}
[/cpp]

Finally, the thread is allocated as a global (as before), and started:

[cpp]
MyThread sMyThread;

void
MyProgram::startAThread()
{
sMyThread.start(NORMALPRIO);
}
[/cpp]

Considerably tidier, isn't it?

Pulling this off requires two classes: a non-template BaseThread class that provides the basic thread functionality, and the CThread template class that derives from it. I split them to avoid redundant code generation for each stack size. Partial specialization or a smart compiler could probably achieve the same thing, but I wasn't sure how much luck I would have. The approach does cost an extra member variable in the base class: the working area size, captured at construction and used when the thread is started.

BaseThread::entry() should be pure virtual, but I had link errors on AVR with that.

Here's the complete implementation.

CThread.h:

[cpp]
/**
CThread.h

Created by Roderick Mann on 2/3/11.
Copyright 2011 Latency: Zero. All rights reserved.

*/

#ifndef __CThread_h__
#define __CThread_h__

#include "ch.h"

/**
*/

class
BaseThread
{
public:
BaseThread(void* inWorkingArea, size_t inWorkingAreaSize);

void start(tprio_t inPriority = NORMALPRIO);

msg_t sendMessage(msg_t inMsg, void* inContext);
Thread* getSysThread() { return mSysThread; }

protected:
virtual msg_t entry();

private:
static msg_t ThreadEntry(void* inArg);

void* mWorkingArea;
uint32_t mWorkingAreaSize;
Thread* mSysThread;
};

inline
msg_t
BaseThread::ThreadEntry(void* inArg)
{
BaseThread* self = static_cast<BaseThread*> (inArg);
return self->entry();
}

/**
*/

template<size_t inStackSize>
class
CThread : public BaseThread
{
public:
CThread()
:
BaseThread(mWorkingArea, sizeof(mWorkingArea))
{
}

protected:
virtual stkalign_t* getWorkingArea() { return mWorkingArea; }

private:
WORKING_AREA(mWorkingArea, inStackSize);
};

#endif // __CThread_h__
[/cpp]

And the implementation:

[cpp]
/**
CThread.cpp

Created by Roderick Mann on 2/3/11.
Copyright 2011 Latency: Zero. All rights reserved.
*/

#include "CThread.h"

#include "ch.h"

BaseThread::BaseThread(void* inWorkingArea, size_t inWorkingAreaSize)
:
mWorkingArea(inWorkingArea),
mWorkingAreaSize(inWorkingAreaSize),
mSysThread(NULL)
{
}

msg_t
BaseThread::entry()
{
return 0;
}

void
BaseThread::start(tprio_t inPriority)
{
mSysThread = chThdCreateStatic(mWorkingArea,
mWorkingAreaSize,
inPriority,
ThreadEntry,
this);
}
[/cpp]

What you see above is a little messier than it could be, given a number of issues I ran into while developing it, and concerns about code bloat. But it works reasonably well.
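Here's a minimal sketch of the same pattern that compiles on a desktop machine, handy for checking the trampoline logic without a target board. Everything in it is an assumption for illustration: FakeCreateStatic() is a hypothetical stand-in that simply calls the entry point inline (it is not the ChibiOS API), the msg_t typedef is a guess at a compatible type, and the plain byte array only approximates what WORKING_AREA() reserves. Unlike the real class, entry() is pure virtual here, since a desktop toolchain has no trouble linking that.

```cpp
#include <stddef.h>
#include <stdint.h>

// Stand-in types so the sketch builds without ChibiOS (assumptions).
typedef int32_t msg_t;
typedef msg_t (*entry_fn_t)(void*);

// Hypothetical stand-in for the OS call: runs the entry point right away
// on the caller's stack instead of creating a real thread.
inline msg_t
FakeCreateStatic(void* inWorkingArea, size_t inSize, entry_fn_t inEntry, void* inArg)
{
    (void) inWorkingArea;
    (void) inSize;
    return inEntry(inArg);
}

// Non-template base: holds the working area pointer and size, and owns
// the static trampoline that turns the void* argument back into an object.
class SketchBaseThread
{
public:
    SketchBaseThread(void* inWorkingArea, size_t inWorkingAreaSize)
        :
        mWorkingArea(inWorkingArea),
        mWorkingAreaSize(inWorkingAreaSize),
        mResult(0)
    {
    }
    virtual ~SketchBaseThread() {}

    void    start()                  { mResult = FakeCreateStatic(mWorkingArea, mWorkingAreaSize, ThreadEntry, this); }
    msg_t   result() const           { return mResult; }
    size_t  workingAreaSize() const  { return mWorkingAreaSize; }

protected:
    virtual msg_t entry() = 0;

private:
    static msg_t ThreadEntry(void* inArg)
    {
        return static_cast<SketchBaseThread*> (inArg)->entry();
    }

    void*   mWorkingArea;
    size_t  mWorkingAreaSize;
    msg_t   mResult;
};

// Template layer: the only thing it adds is the statically sized buffer.
template<size_t inStackSize>
class SketchThread : public SketchBaseThread
{
public:
    SketchThread() : SketchBaseThread(mStack, sizeof (mStack)) {}

private:
    uint8_t mStack[inStackSize];
};

// Example client, same shape as MyThread above.
class AnswerThread : public SketchThread<512>
{
protected:
    virtual msg_t entry() { return 42; }
};
```

Declaring an AnswerThread, calling start(), and reading result() exercises the whole path: the template passes its buffer to the base class, the base class hands this to the creation call, and the trampoline dispatches to the subclass's entry().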

Baro Sensor Works!

It's amazing how often a problem that seems unsolvable while you're working on it ends up having an easy solution after you put it aside for a while.

Several months ago I wrote about a problem with a Measurement Specialties barometric pressure sensor. I had come to the conclusion that either the sensor was faulty, or I had damaged it during installation on the board. I kept putting off desoldering it, partly because it's a challenging part to solder, and partly because I only had two spares, and they're expensive; I didn't want it to be faulty.

Baro Sensor

Well, late last night I got the bug to look at it again. The data sheet shows the calculations that need to be made to get a calibrated result, and shows "typical values" for each of the six factory calibration parameters, uncalibrated pressure and temperature measurements, and each step of the process. It never dawned on me that those values might all be part of one measurement and calculation operation.

So I wrote a small app on the Mac that used the same calculation code that was on the sensor board, but put in the example values instead. Sure enough, the result I got did not match, and I started looking into the intermediate results. I noticed one of those was exactly double the example value, and that got me looking at the implementation of the equation. Looking very closely at the data sheet, I started re-writing the equations. Turns out, the code I had found on their site was incorrect, and the code I wrote based solely on the data sheet worked correctly.

For reference, here is C/C++ code that works. mC1 through mC6 are the calibration parameters from the device ROM. mRawTemperature and mRawPressure are the raw sensor readings. mTemperature and mPressure are the final, calibrated results. Temperature is in degrees Celsius * 100, so you have to divide the result by 100 to get the temperature. Pressure is in millibars * 100, so do a similar division to get mbar.

[cpp]
{
…
// Signed literals (LL, not LLU) keep the math signed; dT goes negative below 20 °C.
int64_t dT = mRawTemperature - mC5 * 256LL;
mTemperature = 2000LL + dT * mC6 / 8388608LL;

int64_t offset = mC2 * 65536LL + dT * mC4 / 128LL;
int64_t sens = mC1 * 32768LL + dT * mC3 / 256LL;
mPressure = (mRawPressure * sens / 2097152LL - offset) / 32768LL;
…
[/cpp]
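To repeat the desktop sanity check described above, the same arithmetic can be pulled out into a free function. This is only a sketch under stated assumptions: the struct and function names are hypothetical, the calibration words are assumed to be 16-bit unsigned and the raw readings 32-bit unsigned, and the inputs mentioned afterwards are illustrative numbers rather than values quoted in this post.

```cpp
#include <stdint.h>

// Hypothetical container for the six calibration words from the device ROM.
// The 16-bit width is an assumption.
struct BaroCalibration
{
    uint16_t c1, c2, c3, c4, c5, c6;
};

// Hypothetical result type. Both fields are scaled by 100.
struct BaroReading
{
    int32_t temperature;    // degrees Celsius * 100
    int32_t pressure;       // millibars * 100
};

// Same equations as the snippet above, using signed 64-bit math throughout
// so a negative dT (below 20 degrees C) divides correctly.
inline BaroReading
CompensateBaro(const BaroCalibration& inCal, uint32_t inRawPressure, uint32_t inRawTemperature)
{
    const int64_t dT = static_cast<int64_t> (inRawTemperature) - inCal.c5 * 256LL;
    const int64_t temperature = 2000LL + dT * inCal.c6 / 8388608LL;

    const int64_t offset = inCal.c2 * 65536LL + dT * inCal.c4 / 128LL;
    const int64_t sens = inCal.c1 * 32768LL + dT * inCal.c3 / 256LL;
    const int64_t pressure = (inRawPressure * sens / 2097152LL - offset) / 32768LL;

    BaroReading result;
    result.temperature = static_cast<int32_t> (temperature);
    result.pressure = static_cast<int32_t> (pressure);
    return result;
}
```

With illustrative inputs c1..c6 = 40127, 36924, 23317, 23282, 33464, 28312, a raw pressure of 9085466, and a raw temperature of 8569150, the intermediate dT is 2366 and the function returns a temperature of 2007 and a pressure of 100009, i.e. 20.07 °C and 1000.09 mbar. Comparing the intermediates against a known-good worked example is what exposes a wrong divisor quickly.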

This sensor should prove to be very accurate, and will give us the balloon's pressure altitude, as well as the temperature inside the insulated payload box. Its only real drawback is a lower pressure limit of 10 mbar, corresponding to an altitude of about 26 km (~85 kft). We're hoping to go past 30 km (~100 kft). Hopefully the GPS will be a good backup.

Decimal Days and Metric Time

I always thought it would be cool if we counted time in units related to each other by powers of ten. This is known as decimal time. There'd be 100 seconds in a minute, 100 minutes in an hour, and ten hours in a day. A recent Twitter conversation got me thinking about this again.

The problem with this idea is that it requires the second to be redefined. To be sure, there are other problems, too, like how do you convince six billion people to change their notion of what seconds, minutes, and hours are?

But changing the length of a second means changing a lot of scientific constants, and that's a pretty serious undertaking. It would be better to start with the existing unit of a second (which, incidentally, is a metric unit), and build on top of that.

So, let's put aside the actual labels we'll give each of these units, and let's keep a second a second. Now let's put 100 seconds in a minute, 100 minutes in an hour, and that gives us 8.64 hours in a day. That's not a very nice, round number. How important is it that it be a "nice" number, though?

On Earth, the average day is 86,400 seconds long. On Mars, the average day (sol) is 88,775 seconds. In our decimal time units, that would be 8.88 hours. On Earth's moon, a day is 2,551,443 seconds long, or 255.1 hours (are you starting to see the advantage of decimal time?). Obviously, as our species spreads out to other celestial bodies, having a round number of hours in the day everywhere will be impossible, because not all days will be the same duration.

So, let's compare some other durations:

Duration            Traditional Units   Decimal Time
Second              1 s                 1 s
Traditional Minute  1 m                 0.6 m
Traditional Hour    1 h                 0.36 h
Day                 24 h                8.64 h
Shower              15 m                0.09 h
Typical Work Day    8 h                 2.88 h
Lunch               1 h                 0.36 h
Movie               2 h                 0.72 h

Yuck! The problem with decimal time based on the existing second is that there are no conveniently-sized units for most day-to-day human activity.

The quarter-hour, or 15 minutes, is 900 seconds. That's nearly 1000 seconds, so perhaps the kilosecond would be a convenient unit. Ten kiloseconds would be a little over 2 h 45 m traditional, so maybe we're on to something here.

Duration            Traditional Units   Kiloseconds
Kilosecond          0.28 h, 16.7 m      1 ks
Traditional Hour    1 h                 3.6 ks
Decimal Hour        2.78 h              10 ks
Day                 24 h                86.4 ks
Shower              15 m                1 ks
Typical Work Day    8 h                 29 ks
Lunch               1 h                 4 ks
Movie               2 h                 7 ks

I've rounded the kilosecond times for the activities, because their durations aren't very precise to begin with. It seems like the kilosecond could be a fairly convenient unit, after all.

Now we just need to find good names for these decimal units. Seconds are fine, but the rest need new names that are easy to say, abbreviate acceptably to unit labels, and don't sound cheesy (the old Battlestar Galactica used "centons," and I never did figure out how much time that represented).

We might also ponder how one writes decimal time. Traditionally, in the U.S. and other parts of the world (but definitely not all!), a time of day (or duration) is written as double-digit numerals separated by colons: 12:37:58. Decimal time can be written much more simply, as decimal hours in the day: 4.548. Now, imagine you want to add (in traditional units) 4 minutes and 22 seconds to 12:37:58. Try it. It sucks. But adding 262 seconds to 4.548 decimal hours is much easier: 4.548 h + 0.0262 h = 4.574 h.

It may seem weird, but if you grow up with these units, and everything around you uses them, they'd be obvious, and the units we use today would seem strange.

Sources

Metric time. (2011, February 17). In Wikipedia, The Free Encyclopedia. Retrieved 05:51, March 27, 2011, from http://en.wikipedia.org/w/index.php?title=Metric_time&oldid=414432996.

Timekeeping on Mars. (2011, January 15). In Wikipedia, The Free Encyclopedia. Retrieved 05:52, March 27, 2011, from http://en.wikipedia.org/w/index.php?title=Timekeeping_on_Mars&oldid=408111863.

Lunar day. (2011, March 17). In Wikipedia, The Free Encyclopedia. Retrieved 05:53, March 27, 2011, from http://en.wikipedia.org/w/index.php?title=Lunar_day&oldid=419235268.

NASA Mission Audio

http://wow05.ustream.tv/ustreamVideo/114136/BW2/streams/live_1_audio/playlist.m3u8

Troubleshooting Adventure

The last 24 hours have been an exercise in frustration, sleeplessness, wrong turns, dead ends, and embarrassment. I've been working on a little project that developed a problem, seemingly inexplicably, and I could not find the cause.

This little AVR-based project includes a 16x2 character LCD display. For the last week, it has been working like a champ. I got a lot of fundamentals worked out, and decided to start cleaning up the code. It had gotten to be quite a mess, as I quickly worked through the various building blocks of the overall device, and I needed to make it look more like the final product would.

I would make small changes, compile them and load them onto the device, making sure things worked, or worked the way I wanted them to. Suddenly, the LCD stopped working. I could tell the MCU was doing its job, as the heartbeat LED kept blinking, and I was getting debugging output from the serial console attached. But no LCD.

Since the last change I made was to add a MOSFET so that I could power down the LCD when not in use, I thought perhaps I had damaged the LCD. I removed the new circuitry, and spent some time searching for a similar LCD to try. Found one, popped it in, and it behaved the same way!

Perhaps I had damaged the MCU. Unlikely, because everything else was working perfectly. So I replaced that, flashed it, and tried again. No dice. Now, I was already operating on very little sleep this weekend, and it was after midnight. Had I been firing on all cylinders, I would've abandoned the effort and gone to bed. Or I would have looked for a software problem first. But I had been on such a roll, and I don't back down easily from a challenge like this.

I fired up the oscilloscope and started probing the connections between the MCU and LCD. They were one of the first things I had checked, making sure everything was still connected. Since I had tried the new MCU with the old LCD, I thought maybe it was damaging the MCU pin drivers. So I checked each one to see if it was changing state.

I found five that weren't! I pored over the code, thinking somewhere I had introduced a change (by accident), that disabled some of the pins. I couldn't find anything. I tried a third MCU. I tried a third LCD. Nothing. Same five pins not working. Then I realized that there must be an internal peripheral on those pins inside the MCU that was overriding the general I/O functionality. Looking at the data sheet, I saw that the JTAG interface lived on those pins, and a vague memory floated up: new MCUs have the JTAG enabled by default.

So, I disabled that, excited that I had figured things out, and tried again.

No luck. Still didn't work. Argghhhhhh!

I gave up. Went to bed (now well past 2 am), got up late the next morning, went to work, came home. Watched an hour of TV, then came in here to figure out the problem.

I decided to revert my source code back to a known-working revision. I was currently on revision 15, and the checkin comments showed the LCD had started working after r7. So, I updated my code to that revision, and tried it. It worked! Praise jeebus!

I tried to see the differences between that code and the latest, but there were too many changes. I updated to the latest code and tried again, just to verify the problem was in the code, and it still worked! WTF? Now, somewhere in here I had made a couple other changes to the code, trying to undo the most recent additions. The latest code had those changes (I had checked them in before reverting), so I was really confused.

But sure enough, it seemed to work. So I put back the code I had just taken out, to try to reproduce the problem. It still worked. WTH? (Ironically, I was now looking for failure, because that would tell me I had figured out the "root cause," as NASA likes to put it.) I could not reproduce the problem.

So, I cleaned up the recent experimentation, checked in the code, and set about to do new work. I wanted to measure the current consumption in various operating modes, so I wired in the ammeter.

Suddenly, the problem reappeared. Argghhhhhh!

I started to think that I wasn't giving the LCD sufficient voltage. The prototype design had a diode from the Vcc to the rest of the circuit, and I thought maybe that little voltage drop was enough to cause problems. That didn't really explain why it had been so reliable up until now, or why it was so unpredictable. I thought maybe the LCD backlight was drawing too much current, and the supply was sagging. Maybe I had accidentally increased its brightness in the code. So I set it to be really dim. No help. I removed the diode. No help.

It finally occurred to me that perhaps I wasn't giving the LCD enough time to get stable power before beginning the initialization sequence. I went looking for the lcdInit() call in the code to add a small delay before it. But I couldn't find it! Somewhere in last night's cleanup, I had deleted the call to initialize the LCD! The cascade of emotion that washed over me was intense. Relief that I had finally tracked down the problem, anger at the hours wasted, regret for the sleepiness I'd felt all day, all stewing in the embarrassment that I hadn't gone about the troubleshooting more methodically.

I put the call back in, and everything worked great. Phew.

But why did it sometimes work and sometimes not?

It turns out the LCD currently doesn't get shut down when the system goes to sleep or resets. So, when I ran older code, the LCD got initialized. When I then loaded newer code, the LCD remained initialized, and so it kept working. But when I disconnected power to insert the ammeter into the circuit, the LCD reset itself, and the newer code never re-initialized it. The same thing happened every time I deliberately pulled power while trying to track down the problem.
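The whole on-again, off-again behavior fits in a toy model. This is a sketch with made-up names, not the project's code: the only state that matters is whether the LCD controller has been initialized since it last lost power.

```cpp
// Toy model (all names hypothetical): the LCD controller keeps its own
// state, which survives an MCU reset or re-flash but not a loss of power.
struct LcdModel
{
    bool initialized;
};

// Removing power from the circuit clears the controller's state.
inline void
PowerCycle(LcdModel& ioLcd)
{
    ioLcd.initialized = false;
}

// Boots a firmware build on the MCU. Returns true if text would show up
// on the display. Flashing or resetting the MCU leaves the LCD state alone.
inline bool
BootFirmware(LcdModel& ioLcd, bool inBuildCallsLcdInit)
{
    if (inBuildCallsLcdInit)
    {
        ioLcd.initialized = true;
    }
    return ioLcd.initialized;
}
```

Boot a build that calls the init routine and the display works. Boot a build without it right afterwards and the display still works, because the LCD never lost power. Power-cycle and boot the same build again, and it fails. That is exactly the sequence that made the bug look intermittent.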

So many time-consuming wrong turns, so many red herrings. In the end, it was a software problem, one I should have caught much earlier, but because I didn't carefully examine all the changes made between revisions, I didn't notice it.

Hopefully a lesson re-learned.