Tuesday, March 26, 2013

Toward Reliable Virtual Wire Interface Messaging

As described in the previous posting on the Virtual Wire Interface (VWI), the communication channels are not reliable and lack node addressing. Messages may be lost due to noise, collisions, etc. The basic message passing provided by VWI and the original VirtualWire library is broadcast, and the application must, if needed, implement addressing and retransmission. This posting presents the Cosa VWI client-server example sketches, which demonstrate basic implementation techniques for addressing these issues.

The first step is to design a protocol for addressing and retransmission. The simplest method is to add a client node address and a message sequence number to the data messages sent from the client to the server. The server transmits an acknowledgement message back to the client, and the client retransmits the message until an acknowledgement is received (CosaVWIclient.ino). To keep things simple, data is only sent from the client to the server, and the server only sends acknowledgements to one client at a time.

There are error situations that are ignored in this simple solution. A final solution should include a maximum number of retransmissions. If this limit is exceeded, the communication channel or the server should be regarded as inoperative; e.g. the server is down.

The message structure for this simple protocol is sent as the payload of the VWI message. The data message sent from client to server contains the address of the client (32-bit), a message sequence number (8-bit) and the data payload. The acknowledgement message sent back from the server to the client contains the client address and the message sequence number as received in the data message. Additional data could be piggy-backed on the acknowledgement; this is a common method of reducing the number of messages in protocols. The extra message fields above are often hidden from the application as part of the protocol stack.

Note that messages are sent in binary form (struct) without any special serialization. Sending a message may be viewed as copying a memory block between processors. This works nicely in a homogeneous environment, such as between Arduinos, but causes problems between processors with a different byte order or data representation. The message types below use portable fixed-width integer types, but the byte order depends on the host architecture, and on other architectures the compiler may also insert padding between struct members (e.g. after the 8-bit nr field).

AVR is little endian (LSB first) while network order is big endian (MSB first). Many of the original VirtualWire examples use text strings as messages. Sending, for instance, an integer in textual representation (ASCII) avoids the byte order issue, but it may be inefficient, require longer messages, be more difficult to extend and require more processing to translate to and from text.

// Data message type
struct msg_t {
  uint32_t id;
  uint8_t nr;
  uint16_t data[12];
};

// Acknowledge message type
struct ack_t {
  uint32_t id;
  uint8_t nr;
};


To improve message passing between processors with different byte orders, the data could be put in a specific order before transmission and put back when received. This is known as network byte order. Typically the functions/macros htons()/htonl() (host-to-network) and ntohs()/ntohl() (network-to-host) are used to force the byte order of the data fields in the message.
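A minimal host-side sketch of this idea for msg_t, assuming a hypothetical 29-byte wire layout (id, nr, then the twelve data words), with each field written most significant byte first. The helper names (put16, put32, encode_msg, decode_msg, MSG_WIRE_SIZE) are illustrative only and not part of Cosa. Writing the bytes explicitly with shifts also sidesteps struct padding, since the buffer layout no longer depends on the compiler.

```cpp
#include <cassert>
#include <cstdint>

struct msg_t {
  uint32_t id;
  uint8_t nr;
  uint16_t data[12];
};

// Hypothetical wire size: 4 (id) + 1 (nr) + 12 * 2 (data) bytes, no padding.
const uint8_t MSG_WIRE_SIZE = 4 + 1 + 12 * 2;

// Store values most significant byte first (network byte order).
static uint8_t* put16(uint8_t* p, uint16_t v) {
  *p++ = static_cast<uint8_t>(v >> 8);
  *p++ = static_cast<uint8_t>(v);
  return p;
}

static uint8_t* put32(uint8_t* p, uint32_t v) {
  p = put16(p, static_cast<uint16_t>(v >> 16));
  return put16(p, static_cast<uint16_t>(v));
}

// Read values back into host representation, whatever the host byte order.
static const uint8_t* get16(const uint8_t* p, uint16_t& v) {
  v = static_cast<uint16_t>((p[0] << 8) | p[1]);
  return p + 2;
}

static const uint8_t* get32(const uint8_t* p, uint32_t& v) {
  uint16_t hi, lo;
  p = get16(p, hi);
  p = get16(p, lo);
  v = (static_cast<uint32_t>(hi) << 16) | lo;
  return p;
}

// Encode msg into buf (at least MSG_WIRE_SIZE bytes); returns bytes written.
uint8_t encode_msg(const msg_t& msg, uint8_t* buf) {
  uint8_t* p = put32(buf, msg.id);
  *p++ = msg.nr;
  for (uint8_t i = 0; i < 12; i++) p = put16(p, msg.data[i]);
  return static_cast<uint8_t>(p - buf);
}

// Decode a received buffer back into a host msg_t.
void decode_msg(const uint8_t* buf, msg_t& msg) {
  const uint8_t* p = get32(buf, msg.id);
  msg.nr = *p++;
  for (uint8_t i = 0; i < 12; i++) p = get16(p, msg.data[i]);
}
```

On the client, tx.send(buf, MSG_WIRE_SIZE) would then replace tx.send(&msg, sizeof(msg)), with decode_msg() applied on the server side.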

// Encoder/decoder for the Virtual Wire Interface
VirtualWireCodec codec;

// Virtual Wire Interface Transmitter and Receiver
VWI::Transmitter tx(Board::D9, &codec);
VWI::Receiver rx(Board::D8, &codec);
const uint16_t SPEED = 4000;

void setup()
{
  ...
  // Start virtual wire interface, transmitter and receiver
  VWI::begin(SPEED);
  tx.begin();
  rx.begin();
}


The above snippet from CosaVWIclient.ino is the setup() section for the Virtual Wire Interface (VWI). Note how the transmitter and receiver are declared to use the VirtualWire Codec and the Board pins D9 and D8, and that the transmitter and receiver are separate objects.

In the loop() section below, the client constructs a message with the node address (0xC05A0001) and the message sequence number (cnt). In the example sketch the data sent is two analog pin readings and a rotating data pattern.

  // Statistics; Number of messages and error count (retransmissions)
  static uint16_t cnt = 0;
  static uint16_t err = 0;

  // Message types (data and acknowledgement)
  msg_t msg;
  ack_t ack;
 

  // Initiate the message with id, sequence number and payload data
  msg.id = 0xC05A0001;
  msg.nr = cnt++;
  msg.data[0] = luminance.sample();
  msg.data[1] = temperature.sample();
  for (uint8_t i = 2; i < membersof(msg.data); i++)
    msg.data[i] = ((cnt << 8) | ((i << 4) + i)) ^ 0xa5a5;


The client sends the message and waits for an acknowledgement. The message is retransmitted if an acknowledgement is not received within the time limit (64 ms) or if the acknowledgement does not match the message (wrong size, address or sequence number).

  // Send message and receive acknowledgement
  uint8_t nr = 0;
  int8_t len;
  do {
    nr += 1;
    tx.send(&msg, sizeof(msg));
    tx.await();
    len = rx.recv(&ack, sizeof(ack), 64);
    if (len != sizeof(ack))
      DELAY(300);
  } while (len != sizeof(ack) || (ack.nr != msg.nr) || (ack.id != msg.id));


  // Check if a retransmission did occur and print statistics
  if (nr > 1) {
    err += 1;
    INFO("cnt = %ud, err = %ud, nr = %ud (%ud%%)",
     cnt, err, nr, (err * 100) / cnt);
  }


The client collects statistics on the number of messages (cnt) and on how many of them required retransmission (err). The DELAY(300) is used to reduce collisions between retransmissions and acknowledgements.
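The client loop above retries forever. A minimal host-side sketch of the same stop-and-wait logic with the retransmission limit suggested earlier, assuming a hypothetical Link interface that stands in for tx.send() followed by rx.recv(&ack, sizeof(ack), 64); LossyLink is a test double that loses a given number of transmissions. None of these names are part of the Cosa API.

```cpp
#include <cassert>
#include <cstdint>

struct msg_t { uint32_t id; uint8_t nr; uint16_t data[12]; };
struct ack_t { uint32_t id; uint8_t nr; };

// Hypothetical transport: returns true and fills ack if an acknowledgement
// arrived within the timeout (stands in for tx.send() + rx.recv(..., 64)).
struct Link {
  virtual ~Link() {}
  virtual bool transfer(const msg_t& msg, ack_t& ack) = 0;
};

// Stop-and-wait with an upper bound on transmissions.
// Returns the number of transmissions used, or 0 if max_tries was exceeded.
uint8_t send_reliable(Link& link, const msg_t& msg, uint8_t max_tries) {
  for (unsigned nr = 1; nr <= max_tries; nr++) {
    ack_t ack;
    if (link.transfer(msg, ack) && ack.id == msg.id && ack.nr == msg.nr)
      return static_cast<uint8_t>(nr);
    // On the AVR target a back-off such as DELAY(300) would go here.
  }
  return 0;
}

// Test double: loses the first `drops` transmissions, then echoes an ack.
struct LossyLink : Link {
  uint8_t drops;
  explicit LossyLink(uint8_t d) : drops(d) {}
  bool transfer(const msg_t& msg, ack_t& ack) override {
    if (drops > 0) { drops--; return false; }
    ack.id = msg.id;
    ack.nr = msg.nr;
    return true;
  }
};
```

A return value of 0 corresponds to the case where the server should be regarded as down.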

The server is even simpler than the client, as it only receives data messages, sends acknowledgements and "processes" new messages (CosaVWIserver.ino). Below is the essential section of the server loop().

  // Wait for a message
  rx.await();
  msg_t msg;
  int8_t len = rx.recv(&msg, sizeof(msg));

  // Check that the correct message size was received
  if (len != sizeof(msg)) return;
 
  // Send an acknowledgement
  ack_t ack;
  ack.id = msg.id;
  ack.nr = msg.nr;
  tx.send(&ack, sizeof(ack));
  tx.await();
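One detail the server above leaves open: if an acknowledgement is lost, the client retransmits a message the server has already processed. A minimal sketch of how the server could tell new messages from retransmissions, assuming a small hypothetical table keyed by client id that stores the last sequence number seen. Because the client waits for each acknowledgement before sending the next message, comparing with the last sequence number is sufficient. The server should still acknowledge duplicates and only skip processing them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-client duplicate filter for up to N clients.
template <uint8_t N>
class DuplicateFilter {
public:
  // Returns true if (id, nr) is a new message that should be processed.
  bool is_new(uint32_t id, uint8_t nr) {
    for (uint8_t i = 0; i < m_count; i++) {
      if (m_id[i] == id) {
        if (m_nr[i] == nr) return false;  // retransmission: already processed
        m_nr[i] = nr;
        return true;
      }
    }
    if (m_count < N) {                     // first message from this client
      m_id[m_count] = id;
      m_nr[m_count] = nr;
      m_count++;
    }
    return true;                           // table full: accept without tracking
  }

private:
  uint32_t m_id[N];
  uint8_t m_nr[N];
  uint8_t m_count = 0;
};
```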


The above retransmission solution must be extended with additional logic to handle multiple clients and/or multiple channels per client. The protocol header (address and sequence number) may also be compressed to reduce the size of the messages and improve wireless bandwidth efficiency, and throughput and flow control can be improved with a sliding window protocol. The above sketches are the basic framework for reliable messaging.

The two VWI example sketches may be used to benchmark the different Codecs with regard to error and retransmission rate. The bandwidth efficiency may be calculated from the statistics and the encoder parameters. The original VirtualWire Codec maps 4 data bits to 6 symbol bits (+50%), Manchester 4 to 8 bits (+100%), and 4B5B and fixed bitstuffing 4 to 5 bits (+25%).
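The codec overhead and the retransmission statistics can be combined into a rough effective data rate. A simplified sketch, assuming the average number of transmissions per message is known (for example, if the client summed nr over all messages and divided by cnt) and ignoring preamble, start symbol, checksum and acknowledgement frames; the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Data bits and symbol bits for the codecs listed above.
struct CodecRatio {
  const char* name;
  uint8_t data_bits;
  uint8_t symbol_bits;
};

const CodecRatio CODECS[] = {
  { "VirtualWire", 4, 6 },
  { "Manchester", 4, 8 },
  { "4B5B", 4, 5 },
  { "Bitstuffing", 4, 5 },
};

// Encoding overhead in percent, e.g. 4 -> 6 bits gives 50.
unsigned overhead_percent(const CodecRatio& c) {
  return (c.symbol_bits - c.data_bits) * 100u / c.data_bits;
}

// Effective payload rate in bits/s for a given line speed and the average
// number of transmissions per delivered message (1.0 = no retransmissions).
double effective_rate(uint32_t bps, const CodecRatio& c, double transmissions_per_msg) {
  return bps * static_cast<double>(c.data_bits) / c.symbol_bits / transmissions_per_msg;
}
```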

Typical retransmission rates are on the order of 1-5% depending on antenna, bit-rate/speed, noise and distance, but also on the length of the message; i.e. the number of bits used to transmit the message. With a uniform distribution of noise, collisions, etc., the probability of an error increases with the number of bits sent.
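Assuming independent bit errors with probability p, a frame of n transmitted (encoded) bits arrives intact with probability (1 - p)^n, so the frame error probability is 1 - (1 - p)^n. Codec overhead enters through n. A small sketch of this simplified model (the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Probability that a frame of `bits` transmitted bits contains at least one
// error, assuming independent bit errors with probability ber.
double frame_error_probability(double ber, unsigned bits) {
  return 1.0 - std::pow(1.0 - ber, static_cast<double>(bits));
}
```

For example, under this model a 29-byte payload expands to 348 symbol bits with the 4-to-6 codec (29 × 8 × 6/4) but 464 with Manchester (29 × 16), so Manchester is more exposed to random bit errors and relies on its better clock recovery to compensate.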

Another factor is how well the software Phase-Locked Loop (PLL) in the receiver code can regenerate the clock and sample the data channel correctly. Although Manchester code increases the number of bits by 100%, every bit contains a transition, so the clock is reliably recovered and errors due to drift are almost zero.

The Next Step

The above examples are part of the prototyping for further development of the Virtual Wire Interface (VWI) in Cosa. The idea is to evolve VWI to handle both addressing and automatic retransmission, as this is a very common usage pattern for wireless connections. The currently proposed changes to the API are 1) a new network address parameter to the begin() method, and 2) some new functions for retransmission control and statistics. All other details of address matching and retransmission would be hidden.

An addressing scheme is proposed to allow server and client address matching. Depending on the number of units needed in a small network, the address would be 16 or 32 bits. A simple interpretation of the address may be used to distinguish between servers and clients: a server would have a network address where the lowest byte is zero (0), which allows at most 255 clients per server. On recv(), a server would accept frames whose address matches its own in all but the lowest byte, while a client would require the whole address to match for acknowledgement frames.
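A minimal sketch of the proposed matching rules, assuming 32-bit addresses; the helper names are hypothetical and not part of the VWI API.

```cpp
#include <cassert>
#include <cstdint>

// A server address has low byte 0; its clients (1..255) share the upper
// three bytes with the server.
inline bool is_server(uint32_t addr) {
  return (addr & 0xffUL) == 0;
}

// Server side: accept frames from any of its own clients.
inline bool server_accepts(uint32_t server, uint32_t src) {
  return (src & 0xffffff00UL) == server && (src & 0xffUL) != 0;
}

// Client side: acknowledgement frames must carry the full client address.
inline bool client_accepts(uint32_t client, uint32_t dest) {
  return dest == client;
}
```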

VWI: [preamble, start-symbol, size, payload, checksum]
VWI+: [preamble, start-symbol, size, address, seqnr, payload, checksum]

The VWI frame contains a preamble sequence, a start symbol, the frame size, the payload and a 16-bit checksum. This would be extended (VWI+) to include the network address and a frame sequence number. The send() and recv() methods would be used as in VWI, and the handling of the new frame fields would be internal.

Sunday, March 24, 2013

Object-Oriented Interrupt Handling

Interrupt Service Routines (ISR) are traditionally written as callback functions. AVR provides a set of interrupt vector callback functions that may be defined in application code. These are defined by using the macro ISR() or SIGNAL().

Cosa approaches the challenge of integrating ISR functions into an object-oriented context with a number of design patterns. The first design pattern is to define an abstract class (interface) for Interrupt Handlers; Cosa/Interrupt.hh. This gives a common virtual method prototype for interrupt callback functions. The mapping provided is basically from the ISR() function to the Interrupt Handler method on_interrupt(). The major difference is that the object-oriented method executes with the instance as its context and has access to the instance member variables. The traditional ISR() callback function often needs global variables for its context and cannot be shared among interrupt service routines.

class Interrupt {
public:
  class Handler {
  public:
    virtual void on_interrupt() {}
    virtual void on_interrupt(uint8_t arg) {}
    virtual void on_interrupt(uint16_t arg) {}

  };
};


The second design pattern is to allow ISR() functions to access the internals of the class as if they were part of the class definition. In C++ this can be achieved by declaring the function a friend of the class. The C functions must first be declared extern "C". Below is a snippet from the Cosa class for handling External Interrupts (Cosa/Pins.hh) to give an idea of how this works.

extern "C" void INT0_vect(void) __attribute__ ((signal));
extern "C" void INT1_vect(void) __attribute__ ((signal));


class ExternalInterruptPin :
  public InputPin,
  public Event::Handler,
  public Interrupt::Handler {
private:
  friend void INT0_vect(void);
  friend void INT1_vect(void);

  ...
  virtual void on_interrupt();
  ...

};

The ExternalInterruptPin class uses multiple inheritance and inherits from three classes: InputPin, Event::Handler and Interrupt::Handler.

Fig.1: ExternalInterruptPin class hierarchy

InputPin provides the basic functionality for handling the pin, Interrupt::Handler the asynchronous events (interrupts) and Event::Handler the synchronous events.

For the standard Arduino board external interrupt pins, AVR defines two callback functions in the interrupt vector. These have the same names as in the ATmega328P documentation (INT0_vect and INT1_vect). The Cosa implementation of these functions calls the Interrupt Handler on_interrupt() method.

ISR(INT0_vect)
{
  if (ExternalInterruptPin::ext[0] != 0)
    ExternalInterruptPin::ext[0]->on_interrupt();
}


The ExternalInterruptPin constructor registers the instance. The default interrupt handler translates the interrupt into an Event::CHANGE_TYPE event, which is pushed onto the event queue. This keeps the amount of processing in the interrupt service routine to a minimum.

void
ExternalInterruptPin::on_interrupt()
{
  Event::push(Event::CHANGE_TYPE, this);
}


The Cosa implementation of external interrupt handling allows two levels of extension. The normal usage of ExternalInterruptPin is to sub-class and override the virtual method on_interrupt() (called from the interrupt service routine) and/or on_event() (called when the event is dispatched).
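The registration and friend-based dispatch described above can be tried out with a desktop compiler. In this host-side sketch a plain function, fake_INT0_vect(), stands in for ISR(INT0_vect), and ExtPin is a hypothetical, stripped-down stand-in for ExternalInterruptPin: the friend declaration gives the function access to the private instance table, and the per-instance counter shows that no global state is needed.

```cpp
#include <cassert>
#include <cstdint>

// Stands in for ISR(INT0_vect); declared first so it can be named as a friend.
void fake_INT0_vect();

class Handler {
public:
  virtual ~Handler() {}
  virtual void on_interrupt() {}
};

class ExtPin : public Handler {
  friend void fake_INT0_vect();  // may access the private instance table
  static ExtPin* ext[2];         // registered instances, per interrupt number
public:
  uint16_t count = 0;            // per-instance state instead of globals
  explicit ExtPin(uint8_t nr) { ext[nr] = this; }  // constructor registers
  void on_interrupt() override { count++; }
};

ExtPin* ExtPin::ext[2] = { nullptr, nullptr };

void fake_INT0_vect() {
  if (ExtPin::ext[0] != nullptr) ExtPin::ext[0]->on_interrupt();
}
```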

An example of usage is the Cosa IR class, Cosa/IR.hh. The IR::Receiver class uses an External Interrupt Pin to read a sequence of pulses from a TSOP4838 IR receiver module for remote control systems.

class IR {
public:

  class Receiver : private ExternalInterruptPin, private Link {
    ...
  private:

    ...
    /**
     * @override
     * Interrupt pin handler: Measure time periods of pulses in sequence
     * from IR receiver circuit. Push an event when a full sequence has
     * been received; READ_COMPLETED(this, code) where the code is the
     * received binary code or key if a key map was provided.
     */
    virtual void on_interrupt();
    ...
  };
};

The Event::Handler (Cosa/Event.hh) is also an abstract class to define the callback structure for events.

class Event {
public:

  ...
  class Handler {

  public:
    /**
     * Default null event handler. Should be redefined by sub-classes.
     * Called by Event::dispatch().
     * @param[in] type the event type.
     * @param[in] value the event value.
     */
    virtual void on_event(uint8_t type, uint16_t value) {}
  };
  ...

};
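The event queue behind this pattern can be modelled as a single-producer, single-consumer ring buffer: the interrupt handler only records (type, target, value), and loop() later dispatches the event to target->on_event(). This host-side sketch uses hypothetical names and is not Cosa's Event implementation; on the AVR the consumer side would also need protection against the ISR, e.g. by briefly disabling interrupts, which the sketch omits.

```cpp
#include <cassert>
#include <cstdint>

class EventHandler {
public:
  virtual ~EventHandler() {}
  virtual void on_event(uint8_t type, uint16_t value) { (void) type; (void) value; }
};

struct Event {
  uint8_t type;
  EventHandler* target;
  uint16_t value;
  // A delayed function call: invoke the target's handler now.
  void dispatch() { if (target != nullptr) target->on_event(type, value); }
};

// Ring buffer; N must be a power of two and at most N - 1 events are pending.
template <uint8_t N>
class EventQueue {
public:
  // Called from the interrupt handler.
  bool push(uint8_t type, EventHandler* target, uint16_t value) {
    uint8_t next = (m_head + 1) & (N - 1);
    if (next == m_tail) return false;      // full: event is dropped
    m_buf[m_head] = Event{ type, target, value };
    m_head = next;
    return true;
  }
  // Called from loop().
  bool pop(Event* event) {
    if (m_tail == m_head) return false;    // nothing pending
    *event = m_buf[m_tail];
    m_tail = (m_tail + 1) & (N - 1);
    return true;
  }
private:
  Event m_buf[N];
  volatile uint8_t m_head = 0;
  volatile uint8_t m_tail = 0;
};

// Example handler that records the events dispatched to it.
struct Recorder : EventHandler {
  uint8_t calls = 0;
  uint16_t sum = 0;
  void on_event(uint8_t, uint16_t value) override { calls++; sum += value; }
};
```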

The above design pattern for Object-Oriented Interrupt Handling has been applied to several of the Arduino libraries that have been ported and/or rewritten in the object-oriented style of Cosa. The main goal is to achieve higher levels of encapsulation, performance and code quality.

Saturday, March 16, 2013

The Analog Pin Classes; An Introduction to Event Driven Programming

An Arduino analog pin takes approximately 100 us to sample and convert a measurement from an analog to a digital value (at the maximum resolution, 10 bits, and the maximum recommended conversion frequency). The conversion is done by dedicated hardware in the processor. In the Arduino/Wiring implementation, analogRead(), the processor loops (busy-waits) until the conversion is completed. Other things could be done while waiting, or the processor could be put in idle mode to reduce power consumption.

This posting presents some of the ways to use the Cosa AnalogPin class. The first style of usage is, as in Arduino, purely synchronous; the processor busy-waits for the analog conversion.

AnalogPin sensor(Board::A0);
...
uint16_t luminance = sensor.sample();


As described in previous postings, Cosa is an object-oriented approach to programming the Arduino where the resources in the processor are C++ objects. Analog pins are objects with methods (member functions). In the above example the object named sensor is an instance of AnalogPin and may perform the methods available in the class. The method sample() corresponds directly to the Arduino function analogRead(), with the difference that no parameter is required as the pin number is already known by the AnalogPin instance.

Fig.1: AnalogPin Member Functions
 
The AnalogPin constructor requires the Board analog pin name (e.g. Board::A0) and also has a reference voltage parameter with the default value AVCC_REFERENCE. The fully expanded statement for the above example is:

AnalogPin sensor(Board::A0, AnalogPin::AVCC_REFERENCE);

There are three reference voltage types defined as an enum in the AnalogPin class: APIN_REFERENCE, the Arduino reference voltage pin (AREF); AVCC_REFERENCE, the power supply voltage; and A1V1_REFERENCE, the 1.1 V internal voltage reference. Depending on your application, select the correct reference voltage and scale the sampled values accordingly.
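As a worked example of the scaling step, the ATmega328P datasheet gives the conversion result as ADC = Vin × 1024 / Vref, so Vin = ADC × Vref / 1024. A minimal sketch, assuming a nominal 5 V AVCC and the nominal 1.1 V internal reference; the function name is hypothetical, and real reference voltages vary from device to device.

```cpp
#include <cassert>
#include <cstdint>

// Convert a 10-bit sample to millivolts for a reference voltage in mV.
// The 32-bit intermediate avoids overflow (1023 * 5000 > 65535).
uint16_t to_millivolts(uint16_t sample, uint16_t reference_mv) {
  return static_cast<uint16_t>((static_cast<uint32_t>(sample) * reference_mv) >> 10);
}
```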

The reference voltage may be changed with the method set_reference(). Note that the reference voltage is per analog pin (per sample) and not a single setting for all analog pins. This is a major difference compared to the Arduino analogReference() function, which is global and does not use strong data typing, i.e., provides no compile-time checking of the parameter. The Cosa AnalogPin class also provides an operator>> variant of the sample() method.

sensor >> luminance;

The latest sample value is available with the access method get_value(). This is an important aspect of Cosa AnalogPin as the state of the analog pin is maintained by the object. This reduces application code and memory footprint.

There are two primary programming styles in Cosa that allow applications to execute code while waiting for a conversion to complete. The first is to request a sample, execute some code and then wait for the conversion. This partially removes the busy-wait and adds concurrency.

sensor.sample_request();
...

// Some code to execute before waiting for the conversion to complete
...
uint16_t luminance = sensor.await();


For instance, you could print a message before waiting for the analog conversion to complete. This is an interesting example as the UART is also a hardware unit and works concurrently with the processor. In this case, you could have three concurrent activities: 1) the analog to digital conversion (ADC), 2) the UART transmitting characters from the IO buffer, and 3) the processor itself, which could be doing some computation with a previous sample value.

The Arduino processor (ATmega328P) has a number of hardware units (Timers, UART, SPI, I2C, EEPROM, etc.) that can all run concurrently with the processor. It is important to have a programming paradigm that does not limit this ability. IO processing should be coordinated (synchronized) by the processor and not performed in a serial, sequential fashion. An analog to digital conversion takes about 1800 instruction cycles (112 us) to complete, and the maximum number of conversions per second is less than 10,000. This should be compared to the up to 16,000,000 instructions per second that the processor can execute. Busy-waiting for the conversion is basically a waste of 1800 instruction cycles and "electricity".

The default behavior of the AnalogPin interrupt handler (ISR) is to push an Event::SAMPLE_COMPLETED_TYPE event with the object and the conversion value onto the event queue. The interrupt handler, on_interrupt(), is also a virtual method and may be replaced by the application if needed. The interrupt handler is per instance, which allows it to be adapted to application requirements.

The sketch must use an event dispatcher in the Arduino loop() to process the completion events. The events in the queue should be viewed as delayed function calls: instead of executing the action in the interrupt handler, it is deferred as an event.

void loop()
{
  Event event;
  Event::queue.await(&event);
  event.dispatch();
  ...

}

The event dispatcher will call the AnalogPin implementation of the virtual method Event::Handler::on_event(). The default behavior of this method is to handle two events, Event::TIMEOUT_TYPE, and the above Event::SAMPLE_COMPLETED_TYPE. On timeout the event handler will issue a sample_request() which allows periodic sampling of an analog pin by attaching the pin object to one of the Watchdog timer queues. There is one queue for each of the Watchdog timeout levels.

Watchdog::attach(&sensor, 64);

The above statement attaches the sensor to receive timeout events every 64 milliseconds and automatically request a new sample. The analog pin may be viewed as holding a continuously updated snapshot of the sampled value.

When receiving Event::SAMPLE_COMPLETED_TYPE, the AnalogPin default event handler checks if the value has changed and, if so, calls the on_change() virtual method. Applications should define their own action by sub-classing AnalogPin and implementing the on_change() virtual method. Another way of allowing extension is to provide a callback function. The advantage of using a virtual method is that the context of the callback is provided by the object.

The flow of control from interrupt to event handler and action is:

(1)  ISR(ADC_vect)
     => AnalogPin::on_interrupt(uint16_t value)
     ===> Event::push(Event::SAMPLE_COMPLETED_TYPE, this, value);

(2)  Event::queue.await(&event);

(3)  event.dispatch()

(4)  => AnalogPin::on_event()
     ===> AnalogPin::on_change()

  1. The Interrupt Service Routine (ISR) is called when the analog conversion is completed. It enqueues an Event::SAMPLE_COMPLETED_TYPE event together with the AnalogPin object and the converted value.
  2. The dispatch loop function Event::queue.await(&event) dequeues the new event.
  3. The event.dispatch() corresponds to calling the on_event() method of the receiving object; the AnalogPin object.
  4. The default on_event() behavior of the AnalogPin calls on_change() if the value on the analog pin has changed compared to the previous sample.

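Steps (4) and the sub-classing pattern can be illustrated with a host-side model. PinModel and LuminanceSensor are hypothetical names, and the event codes are placeholders for the values Cosa's Event class defines for Event::TIMEOUT_TYPE and Event::SAMPLE_COMPLETED_TYPE; the ADC and the event queue are left out so that only the default on_event() behavior and the on_change() override remain.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder event codes (the real values are defined by Cosa's Event class).
const uint8_t TIMEOUT_TYPE = 1;
const uint8_t SAMPLE_COMPLETED_TYPE = 2;

class PinModel {
public:
  virtual ~PinModel() {}
  uint16_t get_value() const { return m_value; }
  uint8_t requests = 0;

  // Stand-in for starting an ADC conversion.
  void sample_request() { requests++; }

  // Default behavior: request a sample on timeout, report changed values.
  virtual void on_event(uint8_t type, uint16_t value) {
    if (type == TIMEOUT_TYPE) {
      sample_request();
    }
    else if (type == SAMPLE_COMPLETED_TYPE && value != m_value) {
      m_value = value;
      on_change(value);
    }
  }

  virtual void on_change(uint16_t value) { (void) value; }

private:
  uint16_t m_value = 0;
};

// The application action is defined by sub-classing and overriding on_change().
class LuminanceSensor : public PinModel {
public:
  uint8_t changes = 0;
  uint16_t last = 0;
  void on_change(uint16_t value) override { changes++; last = value; }
};
```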
To summarize: The Cosa AnalogPin class allows the same sampling of analog pins as Arduino together with two asynchronous methods where applications may perform additional operations while waiting for the analog measurement to be completed. The Cosa AnalogPin interrupt handler is fully integrated with the Cosa Event handler and adapted to both timed sampling of analog pins and callback when analog pin values change.

Cosa also supports sampling a batch of analog pins. Please see the documentation of the Cosa AnalogPins class for more details.

Monday, March 11, 2013

The Virtual Wire Interface (VWI)

One of the main project goals for Cosa is to provide an efficient object-oriented programming platform for small Internet of Things/M2M devices. This will require a number of components and especially support for wireless communication.

The latest addition to Cosa is a set of classes to support low level wireless communication on RF315/433 devices.

Fig.1: RF433 Receiver/Transmitter

The starting point is the popular VirtualWire library, which has been ported from C to C++ and refactored to the object-oriented style of Cosa. It has also been extended to allow multiple codecs, i.e. methods of encoding/decoding messages.

Fig.2: Codec class hierarchy

Currently the following codecs are supported:
  1. The original 4-to-6 bit symbol codec from VirtualWire
  2. Manchester phase encoding
  3. 4B5B block coding
  4. Fixed bit stuffing(4)
The codecs can be selected dynamically at run-time. It is even possible to select them automatically depending on the message start symbols. Please note that there is no API for this feature yet.
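Run-time selection follows naturally from a class hierarchy with virtual encode/decode methods. The sketch below is a hypothetical, simplified interface (not Cosa's Codec class) that works on one 4-bit nibble at a time, with two of the listed codecs: 4B5B using the standard FDDI/100BASE-X table, and Manchester using one of the two common bit conventions. Whether Cosa uses the same tables and conventions is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical codec interface: 4 data bits <-> one symbol.
class Codec {
public:
  virtual ~Codec() {}
  virtual uint8_t symbol_bits() const = 0;
  virtual uint8_t encode4(uint8_t nibble) const = 0;
  virtual int decode4(uint8_t symbol) const = 0;  // -1 for invalid symbols
};

// 4B5B with the standard FDDI/100BASE-X data symbol table.
class Codec4B5B : public Codec {
  static const uint8_t TABLE[16];
public:
  uint8_t symbol_bits() const override { return 5; }
  uint8_t encode4(uint8_t nibble) const override { return TABLE[nibble & 0x0f]; }
  int decode4(uint8_t symbol) const override {
    for (uint8_t n = 0; n < 16; n++)
      if (TABLE[n] == symbol) return n;
    return -1;
  }
};

const uint8_t Codec4B5B::TABLE[16] = {
  0x1e, 0x09, 0x14, 0x15, 0x0a, 0x0b, 0x0e, 0x0f,
  0x12, 0x13, 0x16, 0x17, 0x1a, 0x1b, 0x1c, 0x1d
};

// Manchester: each data bit becomes two half-bits with a mid-bit transition.
// This sketch uses 1 -> 01 and 0 -> 10, most significant bit first.
class CodecManchester : public Codec {
public:
  uint8_t symbol_bits() const override { return 8; }
  uint8_t encode4(uint8_t nibble) const override {
    uint8_t symbol = 0;
    for (int b = 3; b >= 0; b--)
      symbol = static_cast<uint8_t>((symbol << 2) | (((nibble >> b) & 1) ? 0x1 : 0x2));
    return symbol;
  }
  int decode4(uint8_t symbol) const override {
    int nibble = 0;
    for (int b = 3; b >= 0; b--) {
      uint8_t pair = (symbol >> (2 * b)) & 0x3;
      if (pair == 0x1) nibble = (nibble << 1) | 1;
      else if (pair == 0x2) nibble = nibble << 1;
      else return -1;  // 00 or 11: missing mid-bit transition
    }
    return nibble;
  }
};
```

Because a receiver in this design would only hold a `const Codec*`, switching codec at run-time is a pointer assignment. The Manchester symbols also show why clock recovery is easy: every data bit contains a transition.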

The Virtual Wire Interface (VWI) is constructed with separate Receiver, Transmitter and Codec classes to allow fine-tuning of memory footprint for ultra small devices such as the ATtiny85. For instance, it is possible to use the Transmitter without including the Receiver.

Fig.3: VWI Static Member Functions

The Interrupt Service Routine (ISR) is controlled by static member functions in the VWI container class. The VWI::begin() member function is used to setup the Virtual Wire Interface for a given communication bit rate and idle mode. The ISR is also enabled by VWI::begin(). To allow power down sleep mode and ultra low power consumption there are two static member function to enable/disable the ISR (Timer Interrupt Handler).

Fig.4: VWI::Receiver Member Functions

The VWI::Receiver and VWI::Transmitter classes handle sending and receiving messages. The VWI::Receiver interface is simple: begin() for setup, await() to wait for a message and recv() to receive a message. An optional timeout limit may be given when receiving a message.

Fig.5: VWI::Transmitter Member Functions

The Virtual Wire Interface (VWI) also has a device driver for IOStream (VWIO). This allows printouts to be easily redirected and sent over the wireless interface without changes to the source code other than the Trace or other IOStream driver binding. All the printing functions in IOStream will then print to the wireless connection.

Fig.6: VWIO-IOStream::Device Classes

Below is a snippet from the CosaVWIOtrace example sketch:

VirtualWireCodec codec;
VWIO tx(Board::D12, &codec);


void setup()
{
  ...
  // Start virtual wire output stream and trace
  tx.begin(4000);
  trace.begin(&tx, PSTR("CosaVWIOtrace: started"));
  ...
}

void loop()
{
  // Monitor digital pin values
  trace << RTC::millis() << PSTR(": D0..10:");
  for (uint8_t i = 0; i < 11; i++)
    trace << ' ' << InputPin::read(i);
  trace << endl;
  SLEEP(2);

  // Monitor analog pin values
  trace << RTC::millis() << PSTR(": A0..7:");
  for (uint8_t i = 0; i < 8; i++)
    trace << ' ' << AnalogPin::sample(i);
  trace << endl;
  SLEEP(2);
}


The only parts that differ from printing to the serial output are the codec and VWIO device declarations and the calls that start the device and bind it to trace. The internal buffer in VWIO is sent as a VWI message when either it becomes full or a carriage return is received. This is a common flush policy for buffered devices.
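The flush policy can be modelled in a few lines. This host-side sketch uses a hypothetical BufferedSender class (not the VWIO implementation) with a configurable capacity and end-of-line character; send() stands in for transmitting one VWI message, and the capacity of 8 in the example is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

class BufferedSender {
public:
  BufferedSender(uint8_t capacity, char eol) : m_capacity(capacity), m_eol(eol) {}
  virtual ~BufferedSender() {}

  // Buffer one character; flush when the buffer is full or a line ends.
  void put(char c) {
    m_buf.push_back(c);
    if (m_buf.size() == m_capacity || c == m_eol) flush();
  }

  // Send any buffered characters as one message.
  void flush() {
    if (m_buf.empty()) return;
    send(m_buf);
    m_buf.clear();
  }

protected:
  virtual void send(const std::string& message) = 0;

private:
  std::string m_buf;
  uint8_t m_capacity;
  char m_eol;
};

// Example device that records the messages it would transmit.
struct Recorder : BufferedSender {
  std::vector<std::string> sent;
  Recorder() : BufferedSender(8, '\n') {}
  void send(const std::string& message) override { sent.push_back(message); }
};
```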

In the Cosa examples directory there are the following VWI sketches.
  1. CosaVWIsender, sends a simple message with an identity, sequence number and two analog readings. Can be modified for the different Codecs available for VWI. Can run on an ATtiny85.
  2. CosaVWIreceiver, receiver for CosaVWIsender. Can also be modified for the different Codecs. Sender and receiver should use the same codec. 
  3. CosaVWIOtrace, sends trace output over the Virtual Wire Interface. Should be used together with CosaVWImonitor. 
  4. CosaVWImonitor, receive stream of text printout over VWI.
  5. CosaVWItempsensor, ATtiny85 sketch that reads temperature measurements from a 1-Wire DS18B20 Digital Thermometer and sends the value using VWI. Message contains ROM identity of the 1-Wire device, sequence number and temperature reading. See Fig.7-8 below.
  6. CosaVWItempmonitor, receives temperature readings from CosaVWItempsensor(s) and prints to serial output.
  7. CosaVWIkey, simple application to demonstrate power down sleep mode and wakeup/send of a message on pressing a button. 
  8. CosaVWIclient is an example of a simple implementation of a reliable, in-order message protocol with addressing and message sequence numbering. This sketch retransmits messages if an acknowledgement is not received from CosaVWIserver. It collects and prints statistics on retransmissions and error rate, which allows the different Codecs to be compared with regard to throughput and packet drops (noise, etc).
  9. CosaVWIserver receives and prints messages from CosaVWIclient, and sends an acknowledgement back to the client.

The provided communication links are neither secure nor reliable; messages may be dropped due to noise, collisions and other types of errors. The basic communication style is a bit like UDP but without addressing. Applications must add retransmission or other methods to achieve more reliable communication if required. A protocol stack with windowing is planned to be added later in the project.

Fig.7: CosaVWItempsensor on ATtiny85 with 1-Wire Digital Thermometer

To allow small, ultra low-power devices, the VWI classes are fully adapted for the ATtiny85/85V. There are several examples and demonstration sketches in the examples directory. The 1-Wire example, CosaVWItempsensor, is less than 6 Kbyte without any size reduction, leaving more than 2 Kbyte of the ATtiny85's 8 Kbyte program memory for further application code.

Fig.8: CosaVWItempmonitor printout of received temperature readings

As a bonus the new object-oriented/C++ version of VirtualWire is ported back to Arduino and is available.

The next step for this sub-set of Cosa will be to introduce additional message formats to allow addressing of nodes (MAC/IP style), broadcast and point-to-point, and to build the next layer of the protocol stack for more reliable communication links and middleware.

Please note that Cosa also has a device driver for NRF24L01+. This device is a much more powerful chip for low power communication links.

[Update 2013-11-17]
Please note that since this blog post was written, the Virtual Wire Interface has been refactored to implement an abstract Wireless interface in Cosa. This allows applications to move more or less seamlessly between the different implementations of the Wireless interface. Currently there are three implementations: Virtual Wire (VWI), NRF24L01P and CC1101. The example sketches above work on all three.

There are some minor changes to the VWI interface but the overall functionality is the same (See the new VWI.hh). It is still possible to use the Cosa refactored Virtual Wire class directly.

[Update 2014-05-03]
Please note that the links above are broken and this post is not up to date. The examples directory is https://github.com/mikaelpatel/Cosa/tree/master/examples/Wireless.

[Update 2015-03-03]
Hamming(8,4) and Hamming(7,4) Codecs have been added. These map 4-bit data to 8- and 7-bit symbols respectively. The great advantage is single-bit error detection and correction, and possible multi-bit error detection. This reduces frame errors by almost 90% compared to the other Codecs (at 4 Kbps, 7 byte payload, 5 second message intervals, https://github.com/mikaelpatel/Cosa/blob/master/examples/Wireless/CosaWirelessDS18B20/CosaWirelessDS18B20.ino). The Hamming code encoders/decoders are table driven, which gives excellent performance and a low memory footprint (144 and 80 bytes of program memory respectively).
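A table-driven decoder is straightforward for Hamming(7,4) because the code is perfect: the 16 codewords plus all their single-bit error variants cover exactly the 128 possible 7-bit words. The sketch below is illustrative and is not the Cosa source; it uses the classic bit layout p1 p2 d1 p3 d2 d3 d4 and builds a 16-entry encode table and a 128-entry decode table that maps every received word to the data nibble of the nearest codeword. With full single-bit correction this variant cannot also detect double-bit errors; the extended Hamming(8,4) code adds an overall parity bit for that.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative table-driven Hamming(7,4) codec.
// Bit i of a codeword holds Hamming position i + 1: p1 p2 d1 p3 d2 d3 d4.
class Hamming74 {
public:
  Hamming74() {
    for (int word = 0; word < 128; word++) m_decode[word] = 0xff;  // unfilled marker
    for (uint8_t d = 0; d < 16; d++) {
      m_encode[d] = codeword(d);
      m_decode[m_encode[d]] = d;               // error-free codeword
      for (int b = 0; b < 7; b++)
        m_decode[m_encode[d] ^ (1 << b)] = d;  // every single-bit error
    }
  }

  // 4 data bits -> 7-bit symbol.
  uint8_t encode(uint8_t nibble) const { return m_encode[nibble & 0x0f]; }

  // 7-bit symbol -> 4 data bits, correcting any single-bit error.
  uint8_t decode(uint8_t symbol) const { return m_decode[symbol & 0x7f]; }

private:
  uint8_t m_encode[16];
  uint8_t m_decode[128];

  static uint8_t codeword(uint8_t d) {
    int d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    int p1 = d1 ^ d2 ^ d4;  // checks positions 1, 3, 5, 7
    int p2 = d1 ^ d3 ^ d4;  // checks positions 2, 3, 6, 7
    int p3 = d2 ^ d3 ^ d4;  // checks positions 4, 5, 6, 7
    return static_cast<uint8_t>(p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
                                (d2 << 4) | (d3 << 5) | (d4 << 6));
  }
};
```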

Tuesday, March 5, 2013

Benchmarking the Pin Classes

Time to present some of the performance evaluation of the design of the Cosa Pin classes. How well do they work compared to the standard Arduino/Wiring functions for digital/analog read and write to pins?

This posting is also a short introduction to the iostream (Cosa/IOStream.hh) and trace support classes in Cosa (Cosa/Trace.hh). These are a set of powerful classes that allow easy redirection of basic output operations to several devices: UART, Textbox on Canvas, Virtual Wire Interface, IOBuffer, etc. By changing just a few lines of code you can get trace output on another IOStream device. There is also support for an automatic prefix with the source line number, function name and log type.

The benchmark I want to present is CosaBenchmarkPins.ino. It measures the time to perform different types of read and write operations in both Arduino/Wiring and Cosa within the same sketch. This is possible as Cosa is written so that Arduino/Wiring code may be used together with Cosa. This helps migrate Arduino/Wiring libraries to more object-oriented variants. More on this in coming posts.

The benchmark begins with a declaration of the pins that are going to be used.  These are instances of the Pin classes and defined as:

#include "Cosa/Pins.hh"

...
InputPin inPin(Board::D7);
OutputPin outPin(Board::D8);
OutputPin dataPin(Board::D9);
OutputPin clockPin(Board::D10);
AnalogPin analogPin(Board::A0);


The setup() starts by initiating the UART (Cosa's version of Arduino Serial) and binding it to the trace output stream with a short banner. Note that whenever possible string constants are defined in program memory using the GCC-AVR PSTR() macro. The banner is given as a program memory string.

#include "Cosa/Memory.h"
#include "Cosa/Trace.hh"
#include "Cosa/IOStream/Driver/UART.hh"

...
void setup()
{
 ...
  // Start the trace output stream on the serial port
  uart.begin(9600);
  trace.begin(&uart, PSTR("CosaBenchmarkPins: started"));


  // Check amount of free memory and size of instances
  TRACE(free_memory());
  TRACE(sizeof(Event::Handler));
  TRACE(sizeof(InputPin));
  TRACE(sizeof(OutputPin));
  TRACE(sizeof(AnalogPin));

 ...
}

The setup() also prints the amount of free memory and the sizes of some of the used data structures.

Here is the first trace support macro: TRACE() expands to a print statement on the trace stream with a program memory string of the given expression followed by its value. C++ overloading selects the correct IOStream print member function:
     
     CosaBenchmarkPins: started 
     free_memory() = 1673 
     sizeof(Event::Handler) = 2 
     sizeof(InputPin) = 4 
     sizeof(OutputPin) = 4 
     sizeof(AnalogPin) = 9  


All the details of using strings in program memory (PSTR) and of printing to the trace stream are hidden in the TRACE() macro.
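The core idea behind TRACE() can be sketched on a host PC. The version below is a hypothetical approximation, not Cosa's definition: the preprocessor's stringizing operator (#expr) turns the expression into a string literal, and std::ostream overloading picks the right print for the value. The trace_stream pointer is an assumption standing in for the trace binding done by trace.begin(); Cosa additionally keeps the string in program memory, which this sketch does not model.

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>  // std::ostringstream, e.g. to capture trace output

// Stream that TRACE() writes to. Rebinding this pointer redirects all trace
// output, loosely mirroring how the sketch binds the trace stream to the UART.
static std::ostream* trace_stream = &std::cout;

// Hypothetical host-side TRACE(): #expr turns the expression into a string
// literal, and operator<< overloading selects the right print for its value.
#define TRACE(expr) \
  do { *trace_stream << #expr " = " << (expr) << '\n'; } while (0)
```

Redirecting the output is then a one-line change, for example pointing trace_stream at a std::ostringstream instead of std::cout.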

The next step in setup() is to measure the time to execute an empty loop block 1,000,000 times. To keep the compiler from optimizing the empty loop away, we insert a "nop" instruction into the loop body as inline assembly.

For the measurement we use the Cosa Real-Time Clock static class (Cosa/RTC.hh). The class implements the Arduino micros() and millis() functions but uses the class as a name space (RTC::).

#include "Cosa/RTC.hh"
...
void setup()
{
  uint32_t start, stop;
  uint32_t us, base;

 ...
  // Start the timers
  RTC::begin();

 ...
  // Measure the time to perform 1,000,000 iterations of an empty loop block
  start = RTC::micros();
  for (uint16_t i = 0; i < 1000; i++)
    for (uint16_t j = 0; j < 1000; j++) {

      // Here is where the test code will be put
      __asm__ __volatile__("nop");
    }
  stop = RTC::micros();
  base = (stop - start) / 1000L;
  INFO("Loop: %ul us per 1,000 nop loops\n", base);

 ...
}
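The arithmetic after the loop turns two microsecond timestamps into "us per 1,000 loops", which numerically equals nanoseconds per iteration (504 us per 1,000 loops is 504 ns per loop). A small host-side sketch of that step follows; the function name us_per_1000() is hypothetical. Because the subtraction is done on unsigned 32-bit values, the result stays correct even if the microsecond counter wraps (after about 71.6 minutes) between start and stop.

```cpp
#include <cstdint>

// Microseconds per 1,000 iterations, from two 32-bit micros()-style timestamps
// taken around a 1,000,000-iteration loop. Unsigned subtraction is modulo 2^32,
// so the difference stays correct if the counter wrapped once in between.
uint32_t us_per_1000(uint32_t start, uint32_t stop) {
  return (stop - start) / 1000UL;
}
```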

There is a new trace support macro in this section, INFO(). It takes a parameter list like printf_P(), but the printout is controlled by a trace log mask. There are eight levels of log messages: EMERGency, ALERT, CRITical, ERRor, WARNING, NOTICE, INFOrmation, and DEBUG. Each log message, when enabled, prints the line number, function name, log type, and the given message on the trace stream. Again, the message string is automatically put in program memory to save SRAM. The typical output from the above INFO() statement is:

     70:setup:info:Loop: 504 us per 1,000 nop loops  

The compiler/preprocessor provides the line number and function name when the sketch is built. The printout also contains the type of message, in this case "info".
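A mask-controlled log message of this kind can be sketched in standard C++. Everything below (LogLevel, log_mask, level_bit(), log_format(), LOG_INFO()) is hypothetical and only illustrates the technique: one mask bit per level, and a macro that lets __LINE__ and __func__ supply the call-site prefix. Note that in standard printf the format for an unsigned long is %lu.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical log levels, ordered like syslog priorities (0 = most severe).
enum LogLevel : uint8_t {
  LEVEL_EMERG, LEVEL_ALERT, LEVEL_CRIT, LEVEL_ERR,
  LEVEL_WARNING, LEVEL_NOTICE, LEVEL_INFO, LEVEL_DEBUG
};

static const char* const level_name[] = {
  "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"
};

// Trace log mask: one bit per level; a message is printed only if its bit is set.
static uint8_t log_mask = 0xFF;

inline uint8_t level_bit(LogLevel level) { return uint8_t(1u << level); }

// Formats "line:function:level:message", or returns "" if the level is masked out.
std::string log_format(int line, const char* func, LogLevel level,
                       const char* fmt, ...) {
  if ((log_mask & level_bit(level)) == 0) return std::string();
  char buf[128];
  int n = std::snprintf(buf, sizeof(buf), "%d:%s:%s:", line, func, level_name[level]);
  if (n < 0) return std::string();
  if (std::size_t(n) < sizeof(buf)) {
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf + n, sizeof(buf) - std::size_t(n), fmt, args);
    va_end(args);
  }
  return std::string(buf);
}

// The compiler supplies the line number and function name at each call site.
#define LOG_INFO(...) log_format(__LINE__, __func__, LEVEL_INFO, __VA_ARGS__)
```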

Now for the benchmark itself. It begins by measuring the time to perform 1M calls to the Arduino/Wiring digitalRead() and compares this with Cosa's inPin.is_set(), inPin >> var, and InputPin::read().

  start = RTC::micros();
  for (uint16_t i = 0; i < 1000; i++)
    for (uint16_t j = 0; j < 1000; j++) {
      digitalRead(7);
      __asm__ __volatile__("nop");
    }
  stop = RTC::micros();
  base = (stop - start) / 1000L;
  INFO("Arduino: %ul us per 1000 digitalRead(7)", base);


The Cosa versions of the benchmark use inPin.is_set(), etc., and report the measurement together with the speedup factor relative to Arduino/Wiring in the INFO message below:

  INFO("Cosa(%ulX): %ul us per 1000 inPin.is_set()", base/us, us);
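In this message base holds the Arduino/Wiring time and us the Cosa time, so base/us is the speedup. It is computed with integer division, which truncates: 4151/629 is about 6.6 and is printed as 6X. The hypothetical helper below reproduces the factors from the results that follow.

```cpp
#include <cstdint>

// Speedup factor as printed in the "Cosa(NX)" prefix: Arduino time divided by
// Cosa time with integer division, so fractional factors are truncated.
uint32_t speedup(uint32_t arduino_us, uint32_t cosa_us) {
  return cosa_us == 0 ? 0 : arduino_us / cosa_us;
}
```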


Running the full benchmark on an Arduino Nano (ATmega328P/16MHz) gives the following results:

CosaBenchmarkPins: started
free_memory() = 1557
sizeof(Event::Handler) = 2
sizeof(InputPin) = 4
sizeof(OutputPin) = 4
sizeof(AnalogPin) = 9
74:setup:info:Loop: 504 us per 1,000 nop loops

85:setup:info:Arduino: 4151 us per 1000 digitalRead(7)
95:setup:info:Cosa(6X): 629 us per 1000 inPin.is_set()
106:setup:info:Cosa(6X): 629 us per 1000 inPin >> var
116:setup:info:Cosa(7X): 567 us per 1000 InputPin::read(7)

128:setup:info:Arduino: 8302 us per 1000 digitalWrite(8, 1); digitalWrite(8, 0)
139:setup:info:Cosa(3X): 2327 us per 1000 outPin.write(1); outPin.write(0)
150:setup:info:Cosa(3X): 2327 us per 1000 outPin.set; outPin.clear()
161:setup:info:Cosa(3X): 2327 us per 1000 outPin << 1; outPin << 0
172:setup:info:Cosa(1X): 6541 us per 1000 OutputPin::write(8, 1); OutputPin::write(8, 0)

183:setup:info:Arduino: 8271 us per 1000 digitalWrite(8, !digitalRead(8))
193:setup:info:Cosa(3X): 2076 us per 1000 outPin.write(!outPin.read())
203:setup:info:Cosa(3X): 2076 us per 1000 outPin.is_set/clear/set()
215:setup:info:Cosa(3X): 2076 us per 1000 outPin >> var; outPin << !var
225:setup:info:Cosa(3X): 2076 us per 1000 outPin.set/is_clear()
235:setup:info:Cosa(6X): 1195 us per 1000 outPin.toggle()
245:setup:info:Cosa(2X): 3648 us per 1000 OutputPin::write(8, !OutputPin::read(8))
258:setup:info:Cosa(2X): 3743 us per 1000 OutputPin::read/write(8,0/1)
268:setup:info:Cosa(10X): 755 us per 1000 OutputPin::toggle(8)

283:setup:info:Arduino: 15 us per bit data transfer() digitalWrite()
297:setup:info:Cosa(5X): 3 us per bit data transfer() pin.write()
311:setup:info:Cosa(5X): 3 us per bit data transfer() pin.write/toggle()
325:setup:info:Cosa(1X): 11 us per bit data transfer() OutputPin::write()
339:setup:info:Cosa(3X): 4 us per bit data transfer() OutputPin::write/toggle()
373:setup:info:Cosa(7X): 2 us per bit data transfer() pin.write/toggle() unrolled

383:setup:info:Arduino: 17 us per bit data transfer() shiftOut()
393:setup:info:Cosa(4X): 4 us per bit data transfer() dataPin.write()

401:setup:info:Arduino: 112 us per analogRead()
408:setup:info:Cosa(1X): 112 us per analogPin.sample()
417:setup:info:Cosa(1X): 112 us per analogPin >> var
424:setup:info:Cosa(1X): 112 us per AnalogPin::sample()


[Update 2014-05-16]
Below are the latest benchmark measurements. A lot has happened since the initial post, and Cosa has been optimized further. Some pin operations are reduced to only 1-6 instructions. See below:

CosaBenchmarkPins: started
free_memory() = 1576
sizeof(Event::Handler) = 2
sizeof(InputPin) = 4
sizeof(OutputPin) = 4
sizeof(AnalogPin) = 12
F_CPU = 16000000
I_CPU = 16

119:void loop():info:Measure the time to perform an empty loop block
127:void loop():info:nop:315 ns

129:void loop():info:Measure the time to perform an input pin read
138:void loop():info:inPin.is_set():125 ns
150:void loop():info:inPin >> var:125 ns
160:void loop():info:InputPin::read(7):62 ns
170:void loop():info:read digitalRead(7):62 ns

172:void loop():info:Measure the time to perform an output pin write
182:void loop():info:outPin.write():910 ns
195:void loop():info:outPin._write():690 ns
206:void loop():info:outPin.set/clear():910 ns
219:void loop():info:outPin._set/_clear():691 ns
230:void loop():info:outPin << val:910 ns
241:void loop():info:OutputPin::write(8, val):314 ns
252:void loop():info:digitalWrite(8, val):314 ns
263:void loop():info:outPin.toggle():502 ns
276:void loop():info:outPin._toggle():596 ns
287:void loop():info:OutputPin::toggle(8):62 ns

289:void loop():info:Measure the time to perform input pin read/output pin write
298:void loop():info:outPin.write(!inPin.read()):1633 ns
308:void loop():info:inPin.is_set();outPin.clear/set():1633 ns
320:void loop():info:inPin >> var; outPin << !var:1633 ns
330:void loop():info:outPin.set(inPin.is_clear()):1633 ns
340:void loop():info:OutputPin::write(8, !InputPin::read(7)):565 ns
353:void loop():info:OutputPin::read(7)/write(8,0/1):849 ns

355:void loop():info:Measure the time to perform 8-bit serial data transfer
363:void loop():info:pin.write(data,clk):18 us
376:void loop():info:pin.write();clock.write(1/0):27 us
391:void loop():info:pin._write();clock._write(1/0):22 us
404:void loop():info:pin.write/toggle():20 us
419:void loop():info:pin._write/_toggle():20 us
432:void loop():info:OutputPin::write():12 us
445:void loop():info:OutputPin::write/toggle():8 us
477:void loop():info:pin.write/toggle() unrolled:15 us

479:void loop():info:Measure the time to read analog pin
485:void loop():info:analogPin.sample():112 us
494:void loop():info:analogPin.sample_request/await():112 us
503:void loop():info:analogPin >> var:112 us
510:void loop():info:AnalogPin::sample():112 us

512:void loop():info:Measure the time to read analog pin with varying prescale
521:void loop():info:prescale(128):bits(10):analogPin.sample():112 us
521:void loop():info:prescale(64):bits(9):analogPin.sample():56 us
521:void loop():info:prescale(32):bits(8):analogPin.sample():30 us
521:void loop():info:prescale(16):bits(7):analogPin.sample():17 us
521:void loop():info:prescale(8):bits(6):analogPin.sample():10 us
521:void loop():info:prescale(4):bits(5):analogPin.sample():6 us
521:void loop():info:prescale(2):bits(4):analogPin.sample():5 us
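To relate the updated figures to the hardware, nanoseconds can be converted to CPU cycles at 16 MHz (F_CPU = 16000000, one cycle is 62.5 ns), so 62 ns is a single cycle. For the analog reads, the ATmega328P datasheet specifies 13 ADC clock cycles per normal conversion, with the ADC clock derived as F_CPU divided by the prescale factor; the measured values above are a few microseconds higher than these nominal figures. Both helpers below are hypothetical.

```cpp
#include <cstdint>

// Converts a measured time in nanoseconds to the nearest whole number of CPU
// cycles for a clock of mhz MHz (16 on the ATmega328P/16MHz Nano).
uint32_t ns_to_cycles(uint32_t ns, uint32_t mhz) {
  return (ns * mhz + 500) / 1000;
}

// Nominal ADC conversion time in nanoseconds: 13 ADC clock cycles per normal
// conversion, with the ADC clock equal to F_CPU / prescale.
uint32_t adc_conversion_ns(uint32_t prescale, uint32_t mhz) {
  return 13 * prescale * 1000 / mhz;
}
```

For prescale 128 this gives 104 us against the measured 112 us.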

Sunday, March 3, 2013

The Pin Classes; An Introduction

Cosa contains a rich set of classes and member functions for handling the Arduino pins. The object-oriented nature of Cosa allows strong data typing and a declarative style of pin definition in sketches and libraries, but also a higher level of compiler optimization and reduced code size.

The Arduino/Wiring digital and analog I/O functions are replaced with an instance of a Pin class and member functions. Cosa provides several synonyms for the basic functions to allow natural expression depending on the type of application and coding style.

Fig. 1: InputPin Member Functions

The InputPin class has a wide range of member functions for reading the pin value. The Arduino/Wiring function digitalRead() corresponds directly to aPin.read() but can also be expressed as aPin.is_set(), aPin.is_high() or aPin.is_on(). The choice is yours and should depend on the context of your sketch and what you want to express; there is no extra overhead involved. The actual reading of the pin is also several times faster than the Arduino/Wiring implementation of digitalRead(). It is also possible to use the C++ operator>> syntax for reading an InputPin:

  InputPin aPin(Board::D7);
  uint8_t var;
  aPin >> var;

In the example code above, operator>> reads the InputPin aPin and assigns the value to the given variable, var. Note that the pin is specified with a Board symbol (Board::D7). The InputPin constructor also has a mode parameter with the default value NORMAL_MODE. The full, expanded statement for the above example is:

  InputPin aPin(Board::D7, InputPin::NORMAL_MODE);

To define an InputPin with the internal pullup resistor enabled, use the following statement:

  InputPin aPin(Board::D7, InputPin::PULLUP_MODE); 
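Why the synonyms cost nothing can be seen in a host-side model. InputPinModel below is hypothetical and not Cosa's implementation: a plain variable stands in for the AVR PIND register (on the ATmega328P, Arduino pin D7 is bit 7 of PIND), every read synonym is the same masked test, and operator>> stores 0 or 1 in the variable. The mode is only recorded here; a real driver would also enable the pullup.

```cpp
#include <cstdint>

// Hypothetical stand-in for the AVR PIND input register.
static uint8_t fake_pind = 0;

// Minimal host-side model of an input pin class (not Cosa's implementation).
class InputPinModel {
public:
  enum Mode { NORMAL_MODE, PULLUP_MODE };

  explicit InputPinModel(uint8_t bit, Mode mode = NORMAL_MODE)
    : m_mask(uint8_t(1u << bit)), m_mode(mode) {}

  // Synonyms: each one is the same masked register test.
  bool is_set() const { return (fake_pind & m_mask) != 0; }
  bool is_high() const { return is_set(); }
  bool is_on() const { return is_set(); }
  bool read() const { return is_set(); }

  // The mode is only recorded in this model.
  Mode mode() const { return m_mode; }

  // Stream-style read: pin >> var stores 1 or 0 in var.
  InputPinModel& operator>>(uint8_t& var) {
    var = is_set() ? 1 : 0;
    return *this;
  }

private:
  uint8_t m_mask;
  Mode m_mode;
};
```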

C++ also allows overloading of member functions; the compiler determines which function to use from the actual parameter list. An example is the member function read(), which has two implementations. The first (Fig. 1) corresponds to the Arduino/Wiring function digitalRead(), and the second, with two parameters (an OutputPin and the data direction), corresponds to the Arduino/Wiring function shiftIn().
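What the two-parameter read() has to do can be sketched independently of the hardware. The helper shift_in() and the BitOrder enum below are hypothetical; they follow the same clock-high, sample, clock-low sequence per bit as the Arduino core's shiftIn(), with callbacks in place of real pins so the logic can run on a host.

```cpp
#include <cstdint>

enum class BitOrder { MSB_FIRST, LSB_FIRST };

// Hypothetical shiftIn()-style helper: for each of the 8 bits, raise the clock,
// sample the data line, then lower the clock. sample_data() returns the data
// pin level and set_clock(level) drives the clock pin.
template <typename SampleFn, typename ClockFn>
uint8_t shift_in(SampleFn sample_data, ClockFn set_clock, BitOrder order) {
  uint8_t value = 0;
  for (uint8_t i = 0; i < 8; i++) {
    set_clock(true);
    uint8_t bit = sample_data() ? 1 : 0;
    if (order == BitOrder::MSB_FIRST)
      value |= uint8_t(bit << (7 - i));
    else
      value |= uint8_t(bit << i);
    set_clock(false);
  }
  return value;
}
```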

Fig. 2: OutputPin Member Functions

The OutputPin class has the member functions shown above. Several have an implicit new value associated with them, such as set() and clear(), which are synonyms for write(1) and write(0). There are other logically named pairs to choose from: high()/low() and on()/off(). All of this makes your program easier to read in its context. The fastest possible change to an OutputPin is the member function toggle().

The operator<< is defined for writing to an OutputPin as if it were a stream. The example below reads an InputPin inPin and writes the complement value to the OutputPin outPin.

  InputPin inPin(Board::D7); 
  OutputPin outPin(Board::D8);
  uint8_t var;
  inPin >> var;
  outPin << !var;
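The output side can be modelled the same way. OutputPinModel below is hypothetical, not Cosa's code: a plain variable stands in for the PORTB register (Arduino pin D8 is bit 0 of PORTB on the ATmega328P), the named pairs all reduce to setting or clearing one mask bit, and operator<< forwards to write(). On the AVR itself, writing a 1 to a bit in a PINx register toggles the corresponding PORTx bit; the model simply uses XOR. It also omits the interrupt protection that the real set() uses.

```cpp
#include <cstdint>

// Hypothetical stand-in for the AVR PORTB output register.
static uint8_t fake_portb = 0;

// Minimal host-side model of an output pin class (not Cosa's implementation).
class OutputPinModel {
public:
  explicit OutputPinModel(uint8_t bit) : m_mask(uint8_t(1u << bit)) {}

  void set() { fake_portb = uint8_t(fake_portb | m_mask); }
  void clear() { fake_portb = uint8_t(fake_portb & ~m_mask); }

  // Synonyms for set()/clear().
  void high() { set(); }
  void low() { clear(); }
  void on() { set(); }
  void off() { clear(); }

  void write(uint8_t value) { if (value) set(); else clear(); }
  void toggle() { fake_portb = uint8_t(fake_portb ^ m_mask); }
  bool is_set() const { return (fake_portb & m_mask) != 0; }

  // Stream-style write: pin << value.
  OutputPinModel& operator<<(uint8_t value) { write(value); return *this; }

private:
  uint8_t m_mask;
};
```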

All functions that change a pin value are protected with a synchronized block, i.e. interrupts are not allowed while the pin value is being changed. The Cosa/Types.h file adds some syntactic sugar that makes this easy to express:

  /**
   * Set the output pin.
   */
  void set()
  {
    synchronized {
      *PORT() |= m_mask;
    }
  }
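A block keyword like synchronized can be built from a for-loop macro that runs its body exactly once between a lock and an unlock. The sketch below is hypothetical and not necessarily Cosa's exact definition in Cosa/Types.h: an emulated interrupt flag replaces the AVR SREG register and cli(), and lock()/unlock() save and restore the previous state so nested blocks work.

```cpp
#include <cstdint>

// Host stand-in for the AVR global interrupt flag (the I bit in SREG).
static bool interrupts_enabled = true;

// Save the current interrupt state and disable interrupts (cli() on the AVR).
inline uint8_t lock() {
  uint8_t key = interrupts_enabled ? 1 : 0;
  interrupts_enabled = false;
  return key;
}

// Restore the state saved by lock(), so a nested block does not re-enable
// interrupts while an outer block is still active.
inline void unlock(uint8_t key) { interrupts_enabled = (key != 0); }

// Runs the following statement or block exactly once between lock() and
// unlock(). Leaving the block with break or return skips unlock().
#define synchronized \
  for (uint8_t sync_key_ = lock(), sync_once_ = 1; \
       sync_once_ != 0; \
       sync_once_ = 0, unlock(sync_key_))
```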

The keyword synchronized marks the block as protected from interrupts. Please see Cosa/Types.h for further details.

Finally, a few words on the usage of enum data types in Cosa. Most Arduino libraries and source code use global enums or #define for constant values. To increase scalability and readability, Cosa, whenever possible, uses the class as a name space.

The Board class collects all the Arduino pin symbols as a set of local enum declarations. The usage is Board::symbol, which allows the compiler to check your code further. For instance, it is not possible to select an arbitrary digital pin as a PWM pin; the PWM class will only allow the PWM pins defined in the Board class. This improves code quality, and also allows runtime checks to be removed and more optimized code to be generated by the compiler.
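The compile-time check works because each enum is a distinct type. In the hypothetical sketch below (BoardModel and PwmOutputModel are illustrative names, not Cosa's classes), the PWM-capable pins of the ATmega328P (D3, D5, D6, D9, D10, D11) get their own enum, and a constructor taking that enum rejects plain digital pins and raw integers at compile time, with no runtime check needed.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical board description: the class is used only as a name space.
class BoardModel {
public:
  BoardModel() = delete;
  enum DigitalPin : uint8_t {
    D0, D1, D2, D3, D4, D5, D6, D7, D8, D9, D10, D11, D12, D13
  };
  // PWM-capable pins on the ATmega328P (Uno/Nano), as a separate type.
  enum PWMPin : uint8_t { PWM0 = 3, PWM1 = 5, PWM2 = 6, PWM3 = 9, PWM4 = 10, PWM5 = 11 };
};

// Hypothetical PWM output that only accepts PWM pin symbols.
class PwmOutputModel {
public:
  explicit PwmOutputModel(BoardModel::PWMPin pin) : m_pin(pin), m_duty(0) {}
  void set(uint8_t duty) { m_duty = duty; }
  uint8_t pin() const { return m_pin; }
  uint8_t duty() const { return m_duty; }
private:
  BoardModel::PWMPin m_pin;
  uint8_t m_duty;
};

// Enum types do not convert implicitly into each other, so these hold at compile time.
static_assert(std::is_constructible<PwmOutputModel, BoardModel::PWMPin>::value,
              "PWM pin symbols are accepted");
static_assert(!std::is_constructible<PwmOutputModel, BoardModel::DigitalPin>::value,
              "plain digital pin symbols are rejected");
static_assert(!std::is_constructible<PwmOutputModel, int>::value,
              "raw integers are rejected");
```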

Fig. 3: Cosa Documentation (Doxygen)

Learn more about the Cosa Pin class hierarchy in the online documentation.