** 2 page regular feature / 1571 words **

Good vibrations

Shiuming Lai discusses some advanced STe sound programming techniques...

Textbook explanations of the STe/TT's enhanced sound system have been doing the rounds for years: you have a set of control registers for playback and simple global attributes - tone, volume and balance. Correspondingly, numerous STe-enhanced programs exist (mainly games, on the PD and shareware circuit) whose only sonic improvements are some samples played using no more than these standard controls, perhaps in stereo. However, there is a lot of scope beyond the documented hardware features; the easy route, so often taken, is simply to do what the rules say you can. Safe, but hardly exciting!

To appreciate the newer hardware, let's take a look at how things were done before. The Yamaha YM2149 PSG (Programmable Sound Generator) fitted in the original ST - and, for compatibility, in all its derivatives - can be made to play sampled sound. Playing samples this way incurs a performance hit, which increases with the higher frequencies necessary for good quality, because the main processor must spoon-feed the PSG with large amounts of data - far more than could ever be used by internally synthesised sound (see boxout). Additionally, the main processor has to stand in as a Digital to Analogue Converter (DAC), because the PSG doesn't contain a dedicated circuit for this job.

Atari caught on to this and, with the revamped STe, added a new stereo sound system driven by DMA (Direct Memory Access). DMA relies on a special processor to handle the mundane and time-consuming task of moving large memory blocks between sub-systems within the computer, relieving the main processor to get on with something more useful, like processing. So there it was - a new sound system capable of playing straight samples without burdening the main processor. Around about the same time, another phenomenon was all the rage - Soundtrackers.
Let's take another step into history, this time looking at the ST's old rival, the Amiga. Commodore's machine had a custom sound chip with four independent voices, which could be either synthesised or, more interestingly for our purposes here, sampled. The architecture of this chip featured four DACs, each with variable sample period. It wasn't long before some bright spark realised the sample period increments were small enough to make a fair approximation of a musical scale, simply by playing a sample faster or slower. Thus the idea was born of making realistic four-track music using samples of real instruments. Although not the best method, the fidelity offered by 8-bit resolution and the available processor power of the day didn't warrant anything more advanced.

The STe/TT models have two audio DACs, also with step-variable sample period, but with only enough steps to give a rather less flexible choice of 6.258KHz, 12.517KHz, 25.033KHz and 50.066KHz playback frequencies. We need to use software to generate more voices and to overcome the small set of replay frequencies, both of which are required to play a Soundtracker tune - more commonly known as a tracker "module" or "mod".

Pitch shifting

Professional electronic musical instruments commonly use samples for sounds. However, they don't change pitch by modifying the D/A conversion rate; instead, the DAC runs at a constant rate and the sample is stretched or squashed with respect to time before reaching it. Stretched samples produce a lower tone while squashed ones produce higher notes. This offers the advantage that the basic quality of the sound can then be enhanced. In the first case, interpolation can be used to smooth out the coarse quantisation steps left by stretching - the equivalent of the "jaggies" resulting from enlargement of bitmap images.
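The stretch-and-squash idea can be sketched in C as a fixed-point resampler (a hypothetical routine for illustration, not taken from any actual player): a 16.16 phase accumulator steps through the source sample, and linear interpolation fills the gaps that stretching opens up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: pitch-shift an 8-bit sample by stretching or
 * squashing it in time, using a 16.16 fixed-point phase accumulator.
 * step < 1.0 (0x10000) stretches the sample (lower note),
 * step > 1.0 squashes it (higher note). Linear interpolation smooths
 * the coarse steps left by stretching. */
size_t resample(const int8_t *src, size_t src_len,
                int8_t *dst, size_t dst_len, uint32_t step)
{
    uint32_t phase = 0;                 /* 16.16 fixed-point read position */
    size_t n = 0;
    while (n < dst_len && (phase >> 16) + 1 < src_len) {
        uint32_t i = phase >> 16;       /* integer part: sample index */
        int32_t frac = phase & 0xFFFF;  /* fractional part            */
        int a = src[i], b = src[i + 1];
        /* linear interpolation between the two neighbouring levels */
        dst[n++] = (int8_t)(a + ((b - a) * frac) / 65536);
        phase += step;
    }
    return n;   /* number of output bytes generated */
}
```

With step = 0x8000 (one half), each source sample yields two output samples and the note drops an octave; step = 0x20000 would skip every other sample and raise it an octave.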
A simplified illustration of linear interpolation is shown in figure 1, where the mean of the preceding and succeeding amplitude levels is used to "fill the gap". Further improvement can be obtained with higher-order polynomial interpolation, which takes account of a range of preceding and succeeding values. This technique of artificially increasing the information density is called "oversampling" and is standard on most digital audio sources.

** STRETCH.GEM here **
** caption ** Figure 1: Post-processing of a stretched digital sound ** /caption **

This is how it has to be done on an STe, and the only chip able to perform the pitch shifting is the main processor. Unfortunately, a big chunk of its power is drained while it is engaged in this activity. Consequently, real-time interpolation is practically impossible on a stock STe except at the lowest frequencies, and is not usually implemented. The Falcon's D/A circuitry also has a fairly limited number of conversion rates, although at a much higher 16-bit resolution, complemented by a fast Digital Signal Processor (DSP) which typically handles both of these jobs.

Track mixing

The second stage in playing a tracker module on an STe is getting more than one voice out of each hardware channel. An obvious way to do this is to sum the amplitude data from n voices, then normalise by scaling with the factor 1/n to avoid clipping distortion (when the resultant value is outside the DAC's input range, the output literally gets clipped off at the amplitude extremes, making a very nasty noise). As far as the DAC is concerned, the input is still a single stream of data, and at all of its preset frequencies this works just fine. At the higher frequencies, though, a lot of processor muscle is needed: at 50KHz stereo, the processor has to generate around 100Kb of data per second, more or less flat out.
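The summing method of mixing can be sketched as follows (a minimal hypothetical routine, assuming signed 8-bit voice data): each output byte is the sum of the corresponding voice bytes scaled by 1/n, so the result can never leave the DAC's input range and no clipping occurs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of summed track mixing: n signed 8-bit voices are
 * added in a wide accumulator, then normalised by the factor 1/n so the
 * result always fits back into the DAC's 8-bit input range. */
void mix_voices(const int8_t *voices[], size_t n_voices,
                int8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int sum = 0;                          /* wide accumulator      */
        for (size_t v = 0; v < n_voices; v++)
            sum += voices[v][i];
        out[i] = (int8_t)(sum / (int)n_voices); /* normalise by 1/n    */
    }
}
```

The cost of the guaranteed headroom is compressed dynamic range: a single voice playing alone only ever reaches 1/n of full scale.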
Alternatively, the mixing can be simulated by time-domain multiplexing two voices into one signal and setting the DAC to twice the speed, to compensate for the twofold increase in data. Look at figure 2 (ignore the DMA buffer in the middle for the moment) to see how this works.

** MULTIPLX.GEM here **
** caption ** Figure 2: Functional block diagram of tracker replay operation with multiplexed track mixing ** /caption **

Multiplexing halves the processor load in generating data for the DACs, but it introduces a high-frequency noise component whose frequency is proportional to the speed of multiplexing. It's only of any use with the DACs at 50KHz, where the noise is pushed up to inaudible levels. In reality, both methods are compromises - the first compresses dynamic range whereas multiplexing loses frequency definition.

After mixing, all that's left is to transfer the data to the DACs. Normally, a sample is data physically present in memory. Tracker music, on the other hand, is effectively a long sample that is dynamically synthesised; it must therefore first be placed into memory. Simply dumping it to memory isn't a good idea, because you'll soon run out - a three-minute module at 50KHz stereo would require almost 18Mb! The solution is to apply a technique more commonly known for displaying smooth animated graphics, called "dual frame buffering". A series of seamless frame transitions is achieved by writing data to one "invisible" buffer of a pair while the other is output, then switching the physical frame address pointer when the invisible buffer is ready. A good example of this method being used to overcome physical memory limits is the NED Player sample player, reviewed last issue.

Stereo sample data are stored in memory as alternate left/right channel bytes, so if multiplexed mixing is used, the two multiplexed signals must themselves be interleaved to fit into the single linear memory space the playback hardware expects.
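The multiplexing stage boils down to a simple byte-interleave, sketched below as a hypothetical routine (not from any actual player). The same routine serves twice: once per channel to multiplex a pair of voices into one double-rate stream, and once more to merge the left and right streams into the byte ordering of figure 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of time-domain multiplexing: two streams are
 * interleaved byte-by-byte into one stream played at twice the rate,
 * so each source still appears at its original frequency. The ear
 * averages the alternating levels; the residue is a noise component
 * at the multiplexing rate. */
void interleave(const int8_t *a, const int8_t *b,
                int8_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = a[i];   /* even bytes: first stream  */
        out[2 * i + 1] = b[i];   /* odd bytes: second stream  */
    }
}
```

Compared with summing, there is no addition or scaling per output byte, which is where the halved processor load comes from.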
With the standard track ordering shown in figure 2, the respective sample bytes would be stored in memory as shown in figure 3.

** MEMORY.GEM here **
** caption ** Figure 3: Byte ordering of multiplexed tracker data output in memory ** /caption **

Buffering the sound data also has the advantageous side effect of ensuring it flows constantly, which is useful because the pitch shifting and the number of tracks in use vary from moment to moment (and can sometimes even overwhelm the processor if a high playback frequency and lots of effect commands are used). Figure 4 shows the "insides" of this DMA buffer.

** DMABUF.GEM **
** caption ** Figure 4: Data flow from processor into memory and out to DACs ** /caption **

STe tracker programming has broken many widely held beliefs to reach great heights. As always, we are indebted to demo coders for their skill in constantly pushing beyond technical boundaries. I hope you have found this article useful. The techniques can be adapted for other uses; for example, the multiplexing method of mixing could provide multi-voice effects for a game with much less processor overhead. Fewer sounds would be truncated by another effect starting before the current one finished, giving a more convincing and polished result.

** Boxout **

Analogue synthesised sound

Getting a noise out of a programmable sound chip takes only a few bits of data - set some registers to activate one of the preset waveforms, generated by on-board oscillators, and perhaps some others offering further control over the sound, such as envelope shaping or filters. A digital representation of the same sound explicitly defines the overall waveform shape, rather than containing information about how to generate it. To use another graphics analogy, synthesised sounds are like vector images whereas samples are "bitmap sounds".
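The gap is easy to quantify with a back-of-envelope sketch (hypothetical figures: the STe's 50.066KHz rate, stereo, one byte per channel; the PSG's state taken as its register file of roughly 14 bytes).

```c
#include <assert.h>

/* Back-of-envelope comparison of storage requirements, assuming the
 * STe's top DMA rate, stereo, 8 bits per channel. The PSG figure is
 * the YM2149's register file - a handful of bytes describing the
 * whole sound, however long it plays for. */
#define RATE      50066L  /* samples per second, per channel */
#define CHANNELS  2L
#define PSG_REGS  14L     /* approximate YM2149 sound-register count */

long sample_bytes(long seconds)
{
    return seconds * RATE * CHANNELS;
}
```

One second of sampled sound needs around 100Kb; a three-minute module works out at just over 18 million bytes, the "almost 18Mb" quoted earlier, while the PSG describes a tone in a dozen or so register writes.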
Figure 5 compares the data storage requirements of a hypothetical PSG sound and its digital sample equivalent, for a complete cycle of a wave with a period of 1s.

** PSG.GEM **
** caption ** Figure 5: Analogue synthesised wave and digital representation ** /caption **
** /boxout **