History of PULSEv2
==================

This document describes the history of the Pebble dbgserial console
leading up to the design of PULSEv2.

In The Beginning
----------------

In the early days of Pebble, the dbgserial port was used to print out
log messages in order to assist in debugging the firmware. These logs
were plain text and could be viewed with a terminal emulator such as
minicom. An interactive prompt was added so that firmware developers and
the manufacturing line could interact with the running firmware. The
prompt mode could be accessed by pressing CTRL-C at the terminal, and
could be exited by pressing CTRL-D. Switching the console to prompt mode
suppressed the printing of log messages. Data could be written into the
external flash memory over the console port by running a prompt command
to switch the console to a special "flash imaging" mode and sending it
base64-encoded data.

This setup worked well enough, though it was slow and a little
cumbersome to use at times. Some hacks were tacked on as time went on,
like a "hybrid" prompt mode which allowed commands to be executed
without suppressing log messages. These hacks didn't work terribly well.
But it didn't really matter as the prompt was only used internally and
it was good enough to let people get stuff done.

First Signs of Trouble
----------------------

The problems with the serial console started becoming apparent when
we started building out automated integration testing. The test
automation infrastructure made extensive use of the serial console to
issue commands to simulate actions such as button clicks, inspect the
firmware state, install applications, and capture screenshots and log
messages. From the very beginning the serial console proved to be very
unreliable for test automation's uses, dropping commands, corrupting
screenshots and other data, and hiding log messages. The test automation
harness which interacted with the dbgserial port became full of hacks
and workarounds, but was still very unreliable. While we wanted to have
functional and reliable automated testing, we didn't have the manpower
at the time to improve the serial console for test automation's use
cases. And so test automation remained frustratingly unreliable for a
long time.

PULSEv1
-------

During the development of Pebble Time, the factory was complaining that
imaging the recovery firmware onto external flash over the dbgserial
port was taking too long and was causing a manufacturing bottleneck. The
old flash imaging mode had many issues and was in need of a replacement
anyway, and improving the throughput to reduce manufacturing costs
finally motivated us to allocate engineering time to replace it.

The biggest reason the flash imaging protocol was so slow was that it
was extremely latency sensitive. After every 768 data bytes sent, the
sender was required to wait for the receiver to acknowledge the data
before continuing. USB-to-serial adapter ICs are used both at the
factory and by developers to interface the watches' dbgserial ports to
modern computers, and these adapters can add up to 16 ms of latency to
communications in each direction. The vast majority of the flash imaging
time was wasted with the dbgserial port idle, waiting for the sender to
receive and respond to an acknowledgement.
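
To put rough numbers on the latency problem, the sketch below computes
how busy the link is under this stop-and-wait scheme. The 768-byte chunk
size and the 16 ms per-direction latency come from the text above; the
baud rate, the base64 expansion of each chunk, and 8N1 framing are
assumptions made only for illustration, and `stop_and_wait_utilization`
is a hypothetical helper.

```python
import math


def stop_and_wait_utilization(chunk_bytes: int, baud: int,
                              one_way_latency_s: float,
                              bits_per_char: int = 10) -> float:
    """Fraction of time the link spends transmitting under stop-and-wait.

    Assumes each chunk is sent base64-encoded (4 characters per 3 bytes)
    with 8N1 framing (10 bits per character), and that the sender idles
    for one full round trip of adapter latency before sending the next
    chunk. Receiver processing and the acknowledgement's own transmission
    time are ignored, so the result is an optimistic upper bound.
    """
    chars = math.ceil(chunk_bytes / 3) * 4    # base64 expansion
    tx_time = chars * bits_per_char / baud    # seconds spent sending
    idle_time = 2 * one_way_latency_s         # waiting for the ack
    return tx_time / (tx_time + idle_time)


# Hypothetical link speed; 768 bytes and 16 ms are from the text.
util = stop_and_wait_utilization(768, 921_600, 0.016)
print(f"link busy {util:.0%} of the time")    # link busy 26% of the time
```

Under these assumptions roughly three quarters of the time is spent
waiting, and a faster baud rate only makes the ratio worse, because the
round-trip latency stays fixed while the transmit time shrinks.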

There were other problems too, such as a lack of checksums. If line
noise (which wasn't uncommon at the factory) corrupted a byte into
another valid base64 character, the corruption would go unnoticed and be
written out to flash. The integrity of the data was only verified after
the writing was complete, and any failure meant restarting the entire
transfer from the beginning.
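
The sketch below shows why base64 alone offers no protection: a single
character corrupted into another valid base64 character still decodes
without error. A per-block checksum would have caught it as soon as the
block arrived. CRC-32 (via Python's `zlib.crc32`) is chosen here only as
an example, and `corrupt_one_char` is a hypothetical helper standing in
for line noise.

```python
import base64
import zlib


def corrupt_one_char(encoded: bytes, index: int) -> bytes:
    """Replace one base64 character with a different, still-valid one."""
    buf = bytearray(encoded)
    buf[index] = ord("A") if buf[index] != ord("A") else ord("B")
    return bytes(buf)


payload = bytes(range(48))              # 48 bytes -> 64 chars, no padding
sent = base64.b64encode(payload)
received = corrupt_one_char(sent, 10)   # simulated line noise

# Decoding still succeeds: every character is valid base64, so the
# decoder cannot tell anything went wrong and the data reaches flash.
decoded = base64.b64decode(received, validate=True)
print(decoded == payload)                               # False

# A checksum sent with the block lets the receiver reject it at once.
print(zlib.crc32(decoded) == zlib.crc32(payload))       # False
```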

Instead of designing a new flash imaging protocol directly on top of the
raw dbgserial console, as the old flash imaging protocol did, a
link-layer protocol was designed which the new flash imaging protocol
would operate on top of. This new protocol, PULSE version 1, provided
best-effort multiprotocol datagram delivery with integrity assurance to
any applications built on top of it. That is, PULSE allowed
applications to send and receive packets over dbgserial without
interfering with other applications simultaneously using the link, with
the guarantee that the packets will either arrive at the receiver intact
or not be delivered at all. It was designed around the use-case of flash
imaging, with the hope that other protocols could be implemented over
PULSE later on and that it would be the first step toward making test
automation reliable.
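
A minimal sketch of what "multiprotocol datagram delivery with integrity
assurance" means in practice. The frame layout (a one-byte protocol
number, the payload, and a CRC-32 trailer), the protocol numbers, and
the `Link` class are hypothetical and do not describe the real PULSE
wire format. The sketch also assumes frames have already been delimited
on the serial byte stream.

```python
import struct
import zlib
from typing import Callable, Dict

# Hypothetical frame layout (not the real PULSE wire format):
#   [protocol: 1 byte][payload: n bytes][CRC-32 of protocol+payload: 4 bytes]


def encode_frame(protocol: int, payload: bytes) -> bytes:
    body = bytes([protocol]) + payload
    return body + struct.pack("<I", zlib.crc32(body))


class Link:
    """Deliver intact frames to the registered protocol; drop anything else."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[bytes], None]] = {}
        self.dropped = 0

    def register(self, protocol: int, handler: Callable[[bytes], None]) -> None:
        self.handlers[protocol] = handler

    def receive(self, frame: bytes) -> None:
        if len(frame) < 5:                      # too short to be valid
            self.dropped += 1
            return
        body, (crc,) = frame[:-4], struct.unpack("<I", frame[-4:])
        handler = self.handlers.get(body[0])
        if zlib.crc32(body) != crc or handler is None:
            self.dropped += 1                   # corrupted or unknown protocol
            return
        handler(body[1:])                       # pass the payload only


# Two applications sharing one link (protocol numbers are made up).
flash_msgs, log_msgs = [], []
link = Link()
link.register(1, flash_msgs.append)
link.register(2, log_msgs.append)

link.receive(encode_frame(1, b"write"))
link.receive(encode_frame(2, b"hello"))
damaged = bytearray(encode_frame(2, b"oops"))
damaged[2] ^= 0x01                              # flip one bit in transit
link.receive(bytes(damaged))
print(flash_msgs, log_msgs, link.dropped)       # [b'write'] [b'hello'] 1
```

The damaged frame is never delivered, which is exactly the "intact or
not at all" guarantee; noticing the loss and recovering from it is left
to the application.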

Flash imaging turns out to be rather unique, with affordances that make
it easy to implement a performant protocol without protocol features
that many other applications would require. Writing to flash memory is
an idempotent operation: writing the same bytes to the same flash
address _n_ times has the same effect as writing them just once. And
writes to different addresses can be performed in any order. Because
of these features of flash, each write operation can be treated as a
wholly independent operation, and the data written to flash will be
complete as long as every write is performed at least once. The
communications channel for flash writes does not need to be reliable,
only error-free. The protocol is simple: send a write command packet
with the target address and data. The receiver performs the write and
sends an acknowledgement with the address. If the sender doesn't receive
an acknowledgement within some timeout, it re-sends the write command.
Any number of write commands and acknowledgements can be in-flight
simultaneously. If a write completes but the acknowledgement is lost in
transit, the sender can re-send the same write command and the receiver
can naively overwrite the data without issue due to the idempotence of
flash writes.
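
The write/acknowledge scheme can be simulated in a few lines. The sketch
below is hypothetical: `image_flash`, the chunk size, and the loss model
are invented for illustration, and flash writes are modelled as plain
overwrites of an erased (0xFF) buffer. It demonstrates the property the
paragraph relies on: because each write carries its own address and can
be repeated safely, re-sending every unacknowledged write until it is
acknowledged always yields a complete image, even when commands and
acknowledgements are dropped.

```python
import random
from typing import Set


def image_flash(image: bytes, chunk: int = 64, loss: float = 0.3,
                seed: int = 1) -> bytearray:
    """Simulate idempotent flash writes over a lossy but error-free link.

    Every write command carries its own target address, so commands may
    be applied in any order and any number of times. All unacknowledged
    writes stay in flight and are re-sent after each timeout.
    """
    rng = random.Random(seed)
    flash = bytearray(b"\xff" * len(image))               # erased flash
    pending: Set[int] = set(range(0, len(image), chunk))  # unacked addresses

    while pending:                      # each pass models one timeout period
        acked: Set[int] = set()
        order = rng.sample(sorted(pending), len(pending))  # arbitrary order
        for addr in order:
            if rng.random() < loss:
                continue                # write command lost in transit
            flash[addr:addr + chunk] = image[addr:addr + chunk]  # the write
            if rng.random() < loss:
                continue                # ack lost: the write still happened
            acked.add(addr)
        pending -= acked                # everything else gets re-sent
    return flash


image = bytes(random.Random(0).randrange(256) for _ in range(1000))
print(image_flash(image) == image)      # True
```

Setting `loss` to zero reduces this to a single pass; raising it only
adds passes, never corruption, since a repeated write stores the same
bytes at the same address.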

The new PULSE flash imaging protocol was a great success, reducing
imaging time from over sixty seconds down to ten, with the bottleneck
being the speed at which the flash memory could be erased or written.
After the success of PULSE flash imaging, attempts were made to
implement other protocols on top of it, with varying degrees of success.
A protocol for streaming log messages over PULSE was implemented, as
well as a protocol for reading data from external flash. There were
attempts to implement prompt commands and even an RPC system using
dynamically loaded binary modules over PULSE, but they required reliable
and in-order delivery, and implementing a reliable transmission scheme
separately for each application protocol proved to be very
time-consuming and bug-prone.

Other flaws in PULSE became apparent as it came into wider use. The
checksum used to protect the integrity of PULSE frames was discovered to
have a serious flaw, where up to three trailing 0x00 bytes could be
appended to or dropped from a packet without changing the checksum
value. This flaw, combined with the lack of explicit length fields in
the protocol headers, made it much more likely for PULSE flash imaging
to write corrupted data. This was discovered shortly after test
automation switched over to PULSE flash imaging.
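
One way a flaw of this shape can arise is a checksum that zero-pads the
data to a 4-byte word boundary before checksumming it, for example to
feed a CRC unit that only accepts whole 32-bit words. The sketch below
is a hypothetical illustration of that class of flaw, not the actual
PULSEv1 algorithm; `word_padded_checksum` and `frame_ok` are invented
helpers, the latter showing how an explicit length field closes the gap.

```python
import zlib


def word_padded_checksum(data: bytes) -> int:
    """Hypothetical flawed checksum: zero-pad to 4 bytes, then CRC-32.

    Any packet whose length is not a multiple of four is padded with
    zeros, so the same packet with extra trailing 0x00 bytes (up to the
    next word boundary) produces an identical checksum.
    """
    padded = data + b"\x00" * (-len(data) % 4)
    return zlib.crc32(padded)


def frame_ok(length_field: int, data: bytes, checksum: int) -> bool:
    """Hypothetical receiver check using an explicit length field."""
    return length_field == len(data) and word_padded_checksum(data) == checksum


packet = b"\x01\x02\x03\x04\x05"                  # 5 bytes, padded to 8
for extra in range(1, 4):
    grown = packet + b"\x00" * extra
    print(extra, word_padded_checksum(grown) == word_padded_checksum(packet))
# 1 True
# 2 True
# 3 True

# With the original length carried in the header, the change is caught.
print(frame_ok(len(packet), packet + b"\x00", word_padded_checksum(packet)))
# False
```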

Make TA Green Again
-------------------

Around January 2016, it was decided that the issues with PULSE that were
preventing test automation from fully dropping use of the legacy serial
console would best be resolved by taking the lessons learned from PULSE
and designing a successor. This new protocol suite, appropriately
enough, is called PULSEv2. It is designed with test automation in mind,
with the intention of completely replacing the legacy serial console for
test automation, developers and the factory. It is much better at
communicating and synchronizing link state, which solves problems that
test automation ran into when the firmware crashed and rebooted and
confused the test harness. It uses a standard checksum without the
flaws of its predecessor, and packet lengths are explicit. And it is
future-proofed by having an option-negotiation mechanism, which lets us
add new features to the protocol while keeping old and new
implementations interoperable.

Applications can choose to communicate with either best-effort datagram
service (like PULSEv1) or reliable datagram service that guarantees
in-order datagram delivery. Having the reliable transport available
made it very easy to implement prompt commands over PULSEv2. And it was
also surprisingly easy to implement a PULSEv2 transport for the Pebble
Protocol, which allows developers and test automation to interact with
bigboards using libpebble2 and pebble-tool, exactly like they can with
emulators and sealed watches connected to phones.
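
This history does not describe how PULSEv2's reliable transport works
internally. As a generic illustration of what in-order delivery asks of
a receiver, the sketch below shows a hypothetical go-back-N-style
receiver (`InOrderReceiver` is an invented name): datagrams carry
sequence numbers, only the next expected one is delivered, and anything
else is discarded and left for the sender to retransmit.

```python
from typing import List, Optional


class InOrderReceiver:
    """Hypothetical go-back-N-style receiver for reliable, in-order delivery.

    Only the next expected sequence number is delivered. Duplicates (a
    retransmission after a lost ack) and out-of-order arrivals are
    discarded; the sender's retransmission timer fills the gap. Each call
    returns a cumulative ack: the highest sequence number delivered so far,
    or None if nothing has been delivered yet.
    """

    def __init__(self) -> None:
        self.expected = 0
        self.delivered: List[bytes] = []

    def on_datagram(self, seq: int, payload: bytes) -> Optional[int]:
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        return self.expected - 1 if self.expected > 0 else None


rx = InOrderReceiver()
acks = [rx.on_datagram(0, b"a"),   # delivered
        rx.on_datagram(2, b"c"),   # datagram 1 was lost: discard
        rx.on_datagram(1, b"b"),   # retransmitted, delivered
        rx.on_datagram(1, b"b"),   # duplicate: discard, re-ack
        rx.on_datagram(2, b"c")]   # retransmitted, delivered
print(rx.delivered, acks)          # [b'a', b'b', b'c'] [0, 0, 1, 1, 2]
```

Real protocols use sequence numbers that wrap around within a small
window; unbounded integers just keep the sketch short.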

Test automation switched over to PULSEv2 on 2016 May 31. It immediately
cut down test run times and, once some bugs got shaken out, measurably
improved the reliability of test automation. It also made the captured
logs from test runs much more useful as messages were no longer getting
dropped. PULSEv2 was made the default for all firmware developers at the
end of September 2016.

<!-- vim: set tw=72: -->