A DSP Version of Coherent-CW (CCW) Obtain the advantages of coherent detection of Morse signals using DSP Coherent CW (CCW) is an old technique for digging weak CW signals out of the noise. The system synchronizes the receiver to the sender's keying. Thereafter, due to the rules of Morse code (each DAH equals exactly three "dit-times" and spaces between marking elements are always an integral multiple of one dit time)the receiver knows precisely when the carrier is allowed to turn on or off. This is useful information that would otherwise have to be transmitted at the expense of an additional power and/or bandwidth. Of course, the sender must have an absolutely rock-steady rhythm for the scheme to work. For this reason, CCW stations employ special KEYERS or keyboards which can generate perfectly timed code. Traditionally, CCW uses a keying rate of 12WPM (1OO milliseconds per dit) and a received audio tone of 800 Hz. The receiver divides its time into 100-ms windows, or frames. Once synchronized, there is complete certainty that the incoming tone is either present or absent for the entire 100-ms window the rules of Morse code prevent it from switching somewhere in the middle. This knowledge makes it possible to use a matched filter(sometimes called an integrate-and-dump filter) to reduce the received bandwidth down to about 9 Hz (main lobe), eliminating most of the noise while letting the CW tone pass through unscathed (see Fig 1). All coherent 800-Hz energy received during the window is integrated (accumulated), and only at the very end of each window does the receiver make its decision and the question is: during the previous 100 milliseconds, was the key at the transmitter up or down? The question is answered by comparing the measured amplitude of the800-Hz tone to some threshold value more received energy means the key probably was down, less energy means it probably was up. If the receiver decides the key was down, it sounds a local side tone oscillator for the next100 ms. Apart from the 100-ms delay between the incoming audio and the regenerated tone, the side tone signal follows the received code,faithfully,and of course there is absolutely no electrical noise present from that point on since the signal has been completely rebuilt by the receiver. It is like a repeater on a fiber-optic cable. At each stage, hardware makes a firm decision based on a marginal situation then announces confidently whether the window was marking or spacing. Noise (at least in its common form) is completely removed from the incoming signal, so a human operator can copy it without stress. That is, as long as the hardware makes the right decision at the end of each window! If the signal-to-noise ratio is so awful that the matched filter gets a wrong answer, a different kind of noise enters the picture: the system still generates perfect code at its output, but it's no longer the same message the transmitter originated. Then the human operator can step in and say, "Hey, that didn't make sense," and try to figure out what the real message must have been based on context, prior knowledge, etc. It has been argued that a seasoned, skilled CW operator can perform the same filtering operation as CCW performs with his brain, concentrating on the incoming code while blissfully ignoring all the other stuff in the receiver's passband at the time. Maybe so, but it's very hard work. An equally narrow(9-Hz) conventional audio filters wouldn't help either: the circuit's response time would be so slow it would spread out the leading(attack; and falling (decay) edges of the keyed waveform in time, blurring the decision (lnhrtwsf~n markR and February 1994 25 spaces. After all the votes are in, it would seem that coherent CW carries an advantage of some 20 dB over regular CW at 12 WPM. So why doesn't everybody do it that way? Answer: Up until now it has been quite difficult to get a CCW station on the air. Once synchronization had been established, in order to keep the transmitter and receiver from drifting apart,expensive frequency standards were needed at each end of the link. Transceivers had to be stabilized, usually by phase-locking their master oscillators to some common external standard such as WWVB. And that integrate-and-dump filter circuit was no piece of cake to build and align. These technical challenges have kept all but the most dedicated devotees away from the mode until now. Can DSP Help? Some years ago I designed and built a DSP engine dedicated to receiving CCW. It worked, but it was far too complicated some fifty ICs on the board. I concluded that no one else would ever build one of those things, and I was right. At the time, personal computers were just coming on the scene and I didn't think there was enough processing power in them to do the job. But then there appeared ATs,386s, 486s, and I thought it might be worth another look. Maybe, with very careful coding, we could do the DSP function on existing ham-shack computers with a minimum amount of external hardware. If so, the cost would be next to nothing and many more people could get in on CCW. After doing an audio spectrum analyzer project (see "A Receiver Spectral Display using DSP," in Jan 1992 QST) I realized that a CCW program using the same analog interface was feasible. This interface is a Sigma-Delta analog-to-digital converter circuit that measures about 4 inches by 1.8 inches and runs off a 9-V battery. It uses a handful of common CMOS chips and can be built in an evening. It also can be purchased assembled and tested,ready to hook up. This circuit samples the received audio 7,200 times each second, converts the measured voltages to numbers, and passes these to the computer through one of its serial ports (eg, COM1) running at 115kbaud. No modifications to the radio or the computer are required. It is exactly the same board described in the QST article and also in the '93 Handbook. Some of you will already have one. What Does the Software look like? The primary thing to keep in mind is that we are going to be hard pressed for time. At 7,200 samples per second,a new sample point's numerical value is fed into the computer every 139 microseconds. The software has to service the UART (serial port) interrupt,read the measured voltage, and perform the DSP filtering operation 7200times per second. With real tight coding it can be done, and in the time leftover we can do some other pretty neat things as well. The challenge is to reduce the time spent handling interrupts to the absolute minimum. The computer has to be able to process the incoming numerical samples at least as fast as they're acquired (ie, in less than 139 ms per sample, on average). For this algorithm to work, we cannot afford to turn off the interrupts while we do some heavy computing; the interrupts have to be always enabled to guarantee that we don't miss a single sample. DOS systems usually have several background tasks that become active periodically, sometimes shutting off the system interrupts for up to a millisecond at a time. So the program's first action is to take over the DOS timer interrupt to make sure no other program gets control of the machine even fora milliseconds while the DSP algorithm is running. In writing the program, I concentrated on shaving cycles from the serial port interrupt service routine. In real-time programming,what's crucially important is to reduce the amount of time the computer spends on things which are done often(eg, 7,200 times every second). Things which happen less frequently can be coded a little more sloppily. How to minimize the interrupt service time? Well, we would like very much to avoid any particularly long machine instructions, such as multiply or divide. The classic integrate-and-dump(I&D) filter works by first shifting the frequency of the incoming 800-Hz tone down to a baseband" (dc). This is done by mixing the audio with an 800-Hz reference tone. Once at baseband, the energy in the signal is split into two separate channels called In-phase (I)and quadrature (Q). These channels are then independently integrated over the 100-ms window. This is essentially an averaging over time operation and is only feasible because at0 Hz (dc) the I and Q channels don't change their values with time. However, if the incoming tone is not exactly at 800 Hz, a beat note will be generated and will cause problems for us, at least to some degree. Let's take an example. At 800 Hz, a 100-ms frame consists of exactly 80 cycles of that sinusoidal waveform. We mix it with our 800-Hz reference tone and we get exactly 0 Hz, or dc, which can be averaged. Now let's say the incoming frequency was not 800 Hz, but rather790 Hz. The beat note will be 10 Hz.But if you look at it over 100 ms, or a tenth of a second, there will be only one complete cycle. If we take the average value of its voltage over that 100-ms period, we come up with zero! It's positive for half the time, equally negative for the other half, and the average value is zero volts. It matters not one iota what the starting amplitude of that 790-Hz tone was; the output of our I&D filter will be zero. And likewise for any frequency that is spaced away from 800 Hz by an any exact multiple of 10 Hz. Between 800 Hz and 790 Hz,the filter's response takes on intermediate values, ranging all the way from0 dB at 800-Hz down to minus infinity dB (total attenuation) at 790 Hz. At the end of each 100-ms integration period, the classic hardware filter freezes the instantaneous voltages on its integrating capacitors and, based on the values of the I and Q channels averaged over the interval, computes the average amplitude of the 800-Hz tone during that period. (It could also compute the phase of the tone relative to that of the 800-Hz local reference oscillator, but that generally is not done. ) The hardware version of an I&D filter then has to dump all the charge from those capacitors instantly to start measuring the signal in the next window. This is physically impossible with real hardware integrators, so in practice we always waste a little of each frame right at the beginning while the capacitors are discharged of the voltages built up during the previous frame. (Well, not absolutely impossible: one could set up two complete filters and switch between them on alternate frames, I guess.). So, how can we get the same answer with a DSP algorithm, get around the old problems, and hopefully avoid having to build the hardware filter at all? Our basic challenge is to estimate the amplitude of an 800-Hz sinusoid which is assumed to be unvarying throughout the entire 100-ms window,and which is likely to be much weaker than any number of interfering carriers present in the receiver's passband at the same time. The classical DSP solution to this would be to emulate the hardware I&D filter in software. We multiply the incoming samples with a unit vector rotating at 800 Hz to get instantaneous values of the I and Q components of the baseband signal, then we integrate (add up) all these values over100 ms, eventually dividing by the total number of samples taken to obtain the averaged I and Q values. Then we compute the square root of the sum of the squares of the two mutually orthogonal components, and that's the answer. (We could also calculate the phase of the received sinewave by computing the arctangent of the ratio of the Q and I components. ) In this particular case, at7,200 samples per second, each 100- ms window would consist of 720 samples. Which would mean720 x 2 multiplications, 720 x 2additions, plus a whole bunch of other time consuming stuff at the end of each window (such as calculating the square root, clearing the accumulators,etc). All of this is mathematically equivalent to ~fitting" an800-Hz sinusoid to the sampled data points by least-squares, then solving for its amplitude and phase. That's alot of number crunching to get done in tenth of a second, even for today's faster home computers. And there isn't a whole tenth of a second available either a good portion of the time February 1994 27 goes for overhead (pushes, pops, etc)in servicing those 720 interrupts the DSP algorithm has to operate in whatever time is left over after all that. Soit's tough, eh? Well, here is where we start getting lucky. Oh most computers a multiply instruction takes much longer to execute than a simple addition, so we would be far ahead if we could eliminate those two multiplies per sample and it just so happens we can. We know that at 7,200 samples per second, each individual cycle of an800-Hz tone takes exactly 9 samples to cover. In other words, during the time between each sample and the following one, an 800-Hz sinusoid will have advanced through 40 degrees of phase. And 40 x 9 = 360 degrees, which is one complete revolution exactly. Eureka! After 9 samples have been precessed, the coefficients we have to multiply the samples with will start to repeat, taking on the same sequence of values for the next 9 samples, and so on. So of our 720 samples in total,80 of them will be multiplied by two particular numbers, another 80 will be multiplied by a new two-number set,etc. This good fortune is entirely attributable to the fact that our sampling rate just happens to be an exact integral multiple of the frequency of the sinusoid whose amplitude we want to measure. We can take advantage of it by using something called the distributive law (in algebra): AxB + AxC+ AxD equals Ax(B+C+D). You get exactly the same answer in the end,but one way needs three multiplies and two adds, the other way needs only one multiply and two adds. We will use this to solve for the average I and Q component values with only 18 multiples per 720 samples instead of 1440! But there is another trick available. It turns out that the cosine of 40 degrees (one of our sample phases) has the same value as the cosine of 320 degrees (another one of our points). Likewise, of the remaining seven coefficient, six of them can be paired up in this way, the seventh is unity, and it's real easy to multiply by one! On the sine side, 8 of the 9 coefficients can be paired (allowing for sign changes),the other coefficient is zero (it's even easier to multiply by that, hi!). The bottom line is that by combining terms we can get by with just 8 multiplication instead of the 18 that would otherwise be needed. This is starting to look doable! What we have to obtain during each100-ms window are the end values of nine (9) accumulators, and on each of the 720 interrupts we only have to add the current sample into one of those nine accumulators. For each interrupt we sum into another accumulator, and after the ninth sample has been processed we start over with the first accumulator. Thus, aside from the housekeeping associated with servicing the interrupt, checking for overrun and/or clipping, figuring out which accumulator to address and such mundane things, we are left with a simple addition to perform. For sure, at the end of the 100-ms window we must do some further calculations, but we have all the time in the world (relatively speaking) to get them done in the 100 ms during which the data for the next-following window are being acquired. Now, there's one little complication: I would like to run not one, but three (3) I&D filters concurrently. These three filters should overlap in time so that by comparing the three outputs I can decide if the window phase is drifting and it would be advantageous to make a slight adjustment to the phasing cycle that determines when each window starts and ends. The most economical way to run three such I&D filters concurrently is never to clear the various accumulators. Instead, I keep a "delay line" in memory, consisting of the last 720samples taken. For each new sample, I add it into the appropriate accumulator, then subtract off the value of the sample taken 720 time slots earlier. In this way the accumulators are guaranteed not to overflow, because in the long run we subtract out as many counts as we add in. Furthermore,at the end of every 9 samples (one cycle of the 800-Hz audio signal)the nine accumulators always hold exactly the same numbers you'd get if you started 100 ms ago with cleared accumulators and integrated through one complete window's worth of samples. Try doing that with analog integrators! It works on the computer because digital fixed point arithmetic is absolute there are no errors such as would arise in an analog integrator due to charge leaking off a capacitor,component values changing slightly with temperature, etc. Any analog integrator would eventually saturate at one rail or the other due to such errors. The digital integrator can run for hours (or days) with absolutely no accumulated long-term error. We still must ~sample and hold" the values in those nine accumulators whenever we have to compute the amplitude of the measured component over that particular window. This involves moving nine 16-bit numbers to secondary storage positions. It is done with a single block move instruction and has no impact whatsoever on the processing of subsequent samples, soothe computer version can start to process each new window immediately(not throwing away any of the incoming energy) as opposed to the analog version, which has to wait for the capacitors to discharge fully before starting a new integration cycle. At the end of each window, we must then compute the amplitude of the measured 800-Hz component. This involves those eight multiplications mentioned above, then taking the square root of the sum of the squares. This is done in tightly coded assembly language in order to execute in the shortest possible time. As a fine-tuning aid for the operator, the program also measures the frequency of that 800-Hz component (to the nearest tenth of a Hz) and displays it on the computer screen, updated at the end of each 100-ms window. There is a trick to this: the frequency has to be measured after the DSP filtering operation. Otherwise, any nearby strong carrier could disrupt the measurement and give an erroneous reading. In fact, during each marking window (once we have decided that the 800-Hz tone was indeed present during that window) we also measure its phase (averaged over the entire100-ms window) and save this for later reference. On the next marking window, as long as it is not too far in time away from the last measured one, we measure the phase again. If there is any slight discrepancy between the frequency of the incoming 800-Hz tone(from the receiver) and our precise800-Hz reference tone (which is actually determined by the 1.8432-MHZcrystal oscillator on the Sigma-Delta board), there will be some phase shifting the detected signal (in the same way the amplitude of the "beat note" varies regularly whenever there is a slight difference between two compared frequencies). If we know the amount of phase shift as well as the time interval over which this phase shift accumulated, we can figure out how much the received frequency differs from the nominal 800-Hz value, and there you have it. The operator can set a software switch to enable this phase comparison to also occur across an intervening space frame (or not). If the transmitter is phase-coherent from one key-down to the next, then it is feasible to use those two key down periods to measure his frequency. If not, then the software only uses phases measured in consecutive marking frames(eg, during a "dah"), when the key is presumably held down for 300 ms continuously and the transmit carrier could not change its phase during that period. That is the essence of a DSP version of the integrate-and-dump filter. The algorithm runs on just about any IBM-compatible computer. The program incorporating this algorithm is called Coherent. There is enough time left over after the DSP calculations to allow for lots of other CCW goodies. For instance, Coherent has an auto-tune feature. It is important to keep the incoming audio tone centered in the filter's rather narrow passband. Coherent tracks the incoming signal's frequency. If it deviates more than half a hertz from the nominal 800-Hzvalue, Coherent issues a pulse on one of two RS232 control lines to make the receiver tune up or down by 1 Hz.Many modern rigs can tune in precise1-Hz steps by pressing the MIC up/down buttons, so Coherent makes the signals needed to do this automatically. Once the signal has been tuned in initially, the operator can sit back,put his feet up, and leave the driving to the computer. Even if his receiver drifts in frequency (or if the transmitter drifts) it's no problem because the software will retune the radio as necessary to maintain the CW tone at800 Hz.Coherent also has a frame-phasing tracking loop. After each 100-msframe is processed, the program looks at whether the SNR would have been better had the window ended 1 cycle(1.25 ms) earlier or 1 cycle later. If there is consistent evidence that going to a slightly earlier window would improve the SNR, then the program does this automatically. What this means is that once synchronization has been achieved, the operator can let the program track the incoming signal and adjust the phasing as necessary to maximize the SNR advantage the mode is capable of. With this system, special frequency standards and rig stabilization are no longer needed. The only equipment you need to operate CCW is a reasonably stable transceiver, the little Sigma-Delta interface board,and a computer. And, of course, the Coherent program also lets you send CCW just by typing on the keyboard. That should go without saying. As well, the program has a "beacon"mode, where a prestored CCW message can be scheduled to go out at precise time intervals based on the computer's clock. And Now,Something to think about... We have seen that by synchronizing our receiver to the keying at the remote transmitter it is possible to shrink the passband of our receiving filter down to a rather astonishing nine hertz or so,in the process eliminating much of the noise that would otherwise render the signal unreadable. That is fine for coherent CW stations, but what about ordinary CW.where the guy at the other end is sending by hand and his carrier can turn on or off at any arbitrary time? After all, the overwhelming majority of amateur stations around the world don't use coherent CW.Is there anything we can do with DSP to help dig these weak signals out of the mud? Consider a train of RF pulses, where a carrier is switched on for, say 100milliseconds,then switched off for the next 100milliseconds. Let's assume this signal is received by a normal amateur side band rig with its local oscillator tuned800 Hz away from the incoming carrier. Looking at the audio coming out of the speaker, what frequency components are present? Well,that depends on your point of view! On the one hand,if we take the position that for a frequency component to exist it must be present always with unvarying amplitude and phase,then we must presume many frequencies, all adding up to make the on/off pulsed waveform. On the other hand,if we examine the waveform on an oscilloscope, we see 80 cycles of a pure(800 Hz) sinewave inside each pulse with no other frequencies present during either the pulses or the silent period. Common sense tells us there is but one frequency(800 Hz), and that its only there some of the time. Since there is only one frequency in the signal, it would make a lot of sense to use an arbitrarily narrow filter (ie, 0-Hzwide) centered on that 800-Hz tone. Such a filter would eliminate all the noise (QRM, QRN)except that which happened to be on exactly the same frequency. The way we usually design highly selective (narrow) filters is to take many samples spaced over a long interval of time and combine them mathematically to isolate the contribution of all the individual frequencies. Analog filters use the same techniques, storing energy in reactive components. The narrower the filter (the higher the "Q") the longer the energy from any given cycle stays around inside the tank circuit. The drawback with all these filters is that it takes along time for them to attain their final output value after a step change in the input signal (as is the case when a CW carrier is keyed). The sharper a filter is in frequency, the longer it takes (in time) for it to respond. This is why experienced CW operators will tell you it doesn't pay to use IF filters much narrower than about 250 Hz when trying to copy code by ear. Narrower filters actually make it harder to copy because they obliterate (smear) the sharp leading edges of the keyed tones which the ear needs to recognize code patterns. But the characteristic time spreading of such filters is not a result of some insurmountable law of physics! It follows entirely from the particular way they were designed: they observe a signal over a long time span to make fine distinctions in frequency in order to realize the narrow response. The ideal filter for copying ordinary CW would be 0-Hz wide and have an instantaneous response time. When the key went down at the transmitter,the output of the "sliver" filter at the receiver would reflect the amplitude change immediately. What approach can we take in designing such a filter? A good question is: for any given signal, how much of it do we need and how long do we have to observe it before we can break it down into its component frequencies and state what the amplitude at any specific frequency must be? Could we take just a momentary snippet out of a waveform, analyze it extensively on a fast computer and figure out its complete spectral content just from that tiny portion we looked at? The answer will surprise you. The answer is yes! In fact, there is no minimum amount of time for which we need to observe a waveform in order to completely characterize it. In theory, we could sample a complex signal for just one instant and immediately know the amplitude of an 800-Hz sinusoid in it regardless of whatever other frequencies might be present. The calculation gets a30 QEX whole lot more complicated when many other frequencies are present,but it still can be done. The most straightforward case is when we know there is only one sinusoid, say at800 Hz, and we want to ascertain its amplitude with just one instantaneous glance at the waveform. Hmmm,if we knew the phase, it would be easy. With only a single voltage measurement taken at a known point along a sine curve (phase), we can determine its amplitude. Not knowing the phase in advance, we have to solve for it. Which means we need at least two (2) independent measurements, both taken at the same instant in time. The actual sampled voltage will do for one of them. For the other we can use the first derivative the rate the voltage is changing at that particular moment. This derivative can be obtained without taking anytime: convert the voltage to a current,run it through an inductor, and measure the instantaneous voltage across the inductor. Here we have an implementation of our "ideal CW filter for the simplest case where there is only one frequency component to resolve. When there are many frequency components to separate, we will obviously need more information, but it is all available in that same instant; the higher order derivatives are mutually orthogonal, hence independent, and they're there for the measuring. So it is possible (at least in principle) to design a CW receiving filter with arbitrarily narrow bandwidth (approaching 0 Hz) and virtually instantaneous response time. What's needed is hardware to differentiate a signal repeatedly and a very fast computing machine to crunch the numbers. Obtaining the software. The following can be ordered from the author: Coherent CCW software package,$20 Bare circuit board for constructing Sigma-Delta interface, $24 Assembled and tested Sigma-Delta board, ready to hook up, $95. All prices in US dollars, and please include $5 for airmail shipment to anywhere on the planet. For more information on CCW, contact: CCW interest group Peter Lumb,G3IRM 2 Briarwood Ave Bury St Edmunds Suffolk IP33 3QF England 0029 Sommet VertSt. Adolphe d'Howard, QCJOT 2BO Canada24-hr BBS: 514 226 7796 by Bill de Carle, VE21Q