AppleCrate Polyphonic Music Synthesizer


Michael J. Mahon – April 30, 2005
Revised – July 21, 2005
Revised – October 04, 2008

Introduction

After I constructed and tested the AppleCrate, a parallel computer made of eight Apple //e’s, I wrote several test programs and demonstrations, but did not have a real "application" that used its parallel computing capabilities to produce a useful result.

Some conversations with other Apple II enthusiasts, particularly Simon Williams and Patrick Collins, led me to consider whether the sound-producing technology of DAC522, first exploited in Sound Editor v2.2, could be usefully adapted to the AppleCrate.

Since CRATE SYNTH was initially written, I have constructed a 17-processor AppleCrate II that uses the program to create 16-voice music (WhenIm64.mp3).

Background: DAC522

DAC522 is a software digital-to-analog converter for the Apple II that plays a stream of 11.025kHz sound samples through the 1-bit Apple speaker port using a pulse-width modulated (PWM) stream at a pulse rate of 22.05kHz, or two pulses per sample. The 22kHz pulse rate renders the pulses themselves virtually inaudible to human ears, but the average output, changed by varying the pulse width in proportion to sample values, reproduces the sampled sound to a precision of 5 bits. Since the period of a 22kHz pulse is 46 Apple II clock cycles, and the Apple II can only create pulse edges to one cycle resolution, at most 32 distinct pulse widths, or 5 bits of precision, can be produced using equal pulses.
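The figures above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of DAC522 itself. The clock rate is an assumption (a commonly quoted average Apple II rate; the text does not state it), and the `min_width` of 6 cycles in `pulse_width_cycles` is a hypothetical value used only to show that 32 one-cycle steps fit inside a 46-cycle period.

```python
import math

CPU_HZ = 1_020_484        # ASSUMED average Apple II clock rate, not stated in the text
PULSE_CYCLES = 46         # CPU cycles per PWM pulse period (from the text)
PULSES_PER_SAMPLE = 2     # two pulses per sample (from the text)
LEVELS = 32               # distinct pulse widths (from the text)

pulse_rate_hz = CPU_HZ / PULSE_CYCLES
sample_rate_hz = pulse_rate_hz / PULSES_PER_SAMPLE
bits = math.log2(LEVELS)


def pulse_width_cycles(sample, min_width=6):
    """Width in CPU cycles of the pulse for a 5-bit sample value.

    min_width is a hypothetical shortest pulse; each sample step is
    assumed to lengthen the pulse by exactly one cycle.
    """
    if not 0 <= sample < LEVELS:
        raise ValueError("sample must be 0..31")
    return min_width + sample


print(round(pulse_rate_hz), round(sample_rate_hz), bits)
```

Under these assumptions the pulse rate comes out slightly above the nominal 22.05kHz, and the longest pulse (sample value 31) is still shorter than the 46-cycle period.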

DAC522 is a set of pulse generators, each of which generates two pulses with phase and width controlled to one cycle accuracy, while fetching the next sample of a stream, testing for the end of the stream, computing the generator corresponding to the next sample, and vectoring to the selected generator. This process continues at a rate of 11kHz until the entire sample stream has been played, when DAC522 returns to its caller.

At a rate of 11kHz, 48KB of sound samples in an Apple II’s memory are played in a little over four seconds. It was evident that a music synthesizer capable of sustained sounds could not practically be based on DAC522 in its original form.
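As a quick check of that playing time, here is the calculation in Python, assuming the full 48KB holds one byte per sample at the nominal 11.025kHz rate:

```python
MEMORY_BYTES = 48 * 1024      # 48KB of samples, one byte per sample
SAMPLE_RATE_HZ = 11_025       # nominal DAC522 sample rate

play_seconds = MEMORY_BYTES / SAMPLE_RATE_HZ
print(f"{play_seconds:.2f} seconds")
```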

Direct Digital Synthesis

The fundamental problem a music synthesizer must address is the production of notes of many frequencies and arbitrary durations having specified waveshapes (voices). As has been noted, simply storing all the needed combinations in limited memory is not practical.

A workable solution is to store each waveshape needed as a single-frequency sample, then resample this waveshape on the fly to create any desired frequency.
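The resampling idea can be sketched in a few lines of Python. This is an illustration of the general phase-accumulator technique, assuming a 256-sample stored cycle and an 8.8 fixed-point phase increment; the sine waveshape and the function names are placeholders chosen for the example.

```python
import math

TABLE_SIZE = 256


def make_wave():
    """One stored cycle of 5-bit samples (a sine is used as a stand-in waveshape)."""
    return [round(15.5 + 15.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
            for i in range(TABLE_SIZE)]


def resample(wave, increment_8_8, count):
    """Read `count` samples from `wave` using a 16-bit phase accumulator.

    The high byte of the phase indexes the table; the low byte is the
    fractional part. An increment of 0x0100 plays the stored cycle as-is.
    """
    phase = 0
    out = []
    for _ in range(count):
        out.append(wave[phase >> 8])
        phase = (phase + increment_8_8) & 0xFFFF   # wraps after one full cycle
    return out


wave = make_wave()
octave_up = resample(wave, 0x0200, 128)     # skips every other stored sample
octave_down = resample(wave, 0x0080, 512)   # repeats each stored sample twice
```

Doubling the increment doubles the output frequency, and fractional increments give the pitches in between, all from a single stored cycle.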

Most instrument sounds change as a note sounds. For example, many sounds have an "attack" that sounds different from the rest of the note. And many instrument sounds change in amplitude as a note is held, usually decaying in amplitude or changing in "timbre" or spectral composition. Synthesis of notes with changes appropriate to particular instruments, therefore, requires that the synthesized waveform change as a function of the length of time the note is played.

As we shall see, SYNTH performs all the calculations required to carry out these tasks while it is generating the pulses corresponding to the previously calculated sample.

SYNTH Data Structures

The data structures used by SYNTH are designed for high speed. They make extensive use of single-byte pointers to page-aligned data structures to speed operation. In most cases, the data structures are 256 bytes long, so there is little waste in aligning them to page boundaries, but there are a few cases in which page-alignment leads to sparse memory usage, most notably in the code for SYNTH itself.

When SYNTH is called, it starts fetching from the "music" stream pointed to by the zero-page pointer ‘music’. A music stream is a sequence of "events" that are fetched sequentially by SYNTH and determine its operation.

Each "event" in a music stream is 3 bytes long. The first byte of an event is the "op" byte. The sign bit of the op byte specifies whether it is a "note" or a "command". Positive bytes specify notes, from 0..127 using the standard MIDI note mapping (with some exceptions to be described later). Negative bytes specify commands, including "stop playing" ($80), "rest" ($81), and "voice change" ($82).

The note number indexes two pitch tables embedded in the SYNTH code, ‘pitchhi’ containing the integer part and ‘pitchlo’ containing the fractional part of the phase increment. These tables reside in the "holes" following ‘gen5’ and ‘gen4’, respectively. Pitches higher than MIDI key number 112 are "silent" (frequency = 0), since their frequencies are too high to be properly reproduced at an 11kHz sample rate. (Key number 127 is a special case used for atonal percussive voices, as described later.)
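A sketch of how such a pair of tables could be computed is shown below in Python. It is not the code that built the actual tables: it assumes equal temperament with A (key 69) at 440Hz, uses the 43.31Hz unit quoted later in this article as the frequency of a phase increment of 1.0, and rounds to the nearest 1/256.

```python
BASE_HZ = 43.31        # frequency produced by a phase increment of 1.0 (from the article)
HIGHEST_KEY = 112      # keys above this are silent (from the article)
PERCUSSION_KEY = 127   # special case: increment of exactly 1.0


def midi_hz(key):
    """Equal-tempered frequency of a MIDI key, ASSUMING A4 (key 69) = 440Hz."""
    return 440.0 * 2.0 ** ((key - 69) / 12)


def pitch_entry(key):
    """Return (integer part, fractional part) of the phase increment."""
    if key == PERCUSSION_KEY:
        return 1, 0
    if key > HIGHEST_KEY:
        return 0, 0                      # frequency = 0, i.e. silent
    increment = round(midi_hz(key) / BASE_HZ * 256)
    return increment >> 8, increment & 0xFF


pitchhi = [pitch_entry(k)[0] for k in range(128)]
pitchlo = [pitch_entry(k)[1] for k in range(128)]
```

With these assumptions, key 112 is about 5.27kHz, just under half the sample rate, while key 113 would exceed it, which is consistent with the cutoff described above.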

If the op byte specifies a note or a rest command, the next two bytes specify the duration of the note or rest in 92-cycle sample periods (corresponding to a sample frequency of 11.092kHz). If the op byte is a voice change command, then the second byte is an index into the "voice table" specifying the voice to be played next and the third byte is ignored. If the op byte is a "stop playing" command, both following bytes are ignored.
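The event format can be illustrated with a small Python decoder. This is a sketch only: the byte order of the duration (low byte first) is an assumption not stated in the text, and the tuple layout returned by `parse_stream` is invented for the example.

```python
OP_STOP, OP_REST, OP_VOICE = 0x80, 0x81, 0x82


def parse_stream(data):
    """Decode 3-byte events until a "stop playing" command is found."""
    events = []
    for i in range(0, len(data) - 2, 3):
        op, b1, b2 = data[i], data[i + 1], data[i + 2]
        if op < 0x80:                                   # sign bit clear: a note
            events.append(("note", op, b1 | (b2 << 8)))  # ASSUMED low byte first
        elif op == OP_REST:
            events.append(("rest", b1 | (b2 << 8)))
        elif op == OP_VOICE:
            events.append(("voice", b1))                # third byte is ignored
        elif op == OP_STOP:
            events.append(("stop",))                    # both bytes are ignored
            break
        else:
            raise ValueError(f"unknown command ${op:02X}")
    return events


example = bytes([0x82, 3, 0,   60, 0x00, 0x10,   0x81, 0x80, 0x00,   0x80, 0, 0])
print(parse_stream(example))
```

The example stream selects voice 3, plays middle C (key 60) for 4096 sample periods, rests for 128 sample periods, and stops.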

The "voice table" is a variable-length table of the voices loaded into memory. It is indexed by voice change commands embedded in the music stream. An entry in the voice table is a single-byte page number pointing to the "envelope" table corresponding to a stored voice. The maximum size of the voice table is 128 entries, though in practice its length is limited by the smaller number of voices that will fit into memory. The voice table is embedded in the region of memory occupied by SYNTH, in the "hole" following ‘gen3’. A voice is selected by storing its envelope page number in the zero-page pointer ‘env’.

A "voice" is a digital representation of the sound of an instrument. It is composed of an envelope page, which is a page-aligned table of page pointers to waveforms, the collection of which specifies the sound of the instrument as a function of time from the inception of a note. Since both the waveforms and the envelope table are page-aligned, single byte page numbers suffice as pointers. Voice waveforms frequently begin with a few pages of unique waveforms, followed by repetitions of a relatively stationary waveform of diminishing or sustaining amplitude. The envelope table thus often refers repeatedly to waveform pages, resulting in greatly reduced voice storage requirements. Since the sound generators are also on page boundaries, starting at page $08, waveforms are represented as generator page pointers in the range of $08..$27 based upon their 5-bit sample values.
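The two-level lookup can be modeled in Python as follows. This is a sketch under stated assumptions: memory is represented as a dictionary of 256-byte pages, the page numbers $30 and $40..$42 are hypothetical, and the triangle waveshape is a placeholder for a real instrument.

```python
GEN_BASE_PAGE = 0x08   # page of the first pulse generator (from the text)


def sample_to_gen_page(value):
    """Map a 5-bit sample value to the page of its pulse generator ($08..$27)."""
    if not 0 <= value <= 31:
        raise ValueError("sample must be 0..31")
    return GEN_BASE_PAGE + value


# Hypothetical page numbers, chosen for this example only.
ENV_PAGE, LOUD, MEDIUM, SOFT = 0x30, 0x40, 0x41, 0x42

# Placeholder waveshape: rises from 0 to 31 and falls back to 0.
triangle = [(i if i < 128 else 255 - i) // 4 for i in range(256)]

memory = {
    LOUD: bytes(sample_to_gen_page(v) for v in triangle),
    MEDIUM: bytes(sample_to_gen_page(v // 2) for v in triangle),
    SOFT: bytes(sample_to_gen_page(v // 4) for v in triangle),
    # The envelope page reuses three waveform pages for all 256 time steps.
    ENV_PAGE: bytes([LOUD] * 4 + [MEDIUM] * 20 + [SOFT] * 232),
}


def next_generator_page(env_page, y, x):
    """Envelope step y selects a waveform page; phase x selects a sample in it."""
    wave_page = memory[env_page][y]
    return memory[wave_page][x]
```

In this toy voice, four pages of storage describe a decaying note that lasts for 256 envelope steps, which shows why repeated page references keep voice storage small.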

Atonal percussive voices are somewhat different, in that they are "played" directly from memory without resampling. (This means that atonal voices occupy 11 thousand bytes per second that they "sound".) SYNTH plays these voices without special case code by specifying a "pitch" of 127 ($7F) whose pitch table entry is a frequency of 1.0 (in units of 43.31Hz), corresponding to advancing exactly one stored waveform sample per sample period.

SYNTH Structure

The framework of DAC522-like sound production requires that all the work required for computing the next sample be completed within the 92-cycle sample period, simultaneous with the production of the two pulses specified by the previous sample.

SYNTH is composed of 32 distinct pulse generators, one for each duty cycle, and each is aligned on a page boundary so that only its 8-bit page number need be specified to vector to a given pulse generator. Initialization code and music stream processing is embedded in the "holes" between pulse generators.

Each pulse generator creates different precisely timed pulse widths, but all generators do basically the same work between pulse edges. The listing of generator 0 is shown below:

0800: 8D 30 C0 >6    gen0     sta   spkr       ; <==== start time: 0
0803: EA       >7             nop              ; Kill 2 cycles
0804: 8D 30 C0 >8             sta   spkr       ; <===== stop time: 6
0807: 85 EB    >9             sta   ztrash     ; Kill 3 cycles
0809: E6 ED    >10            inc   scount     ; Compute envelope
               >11            ciny             
080B: F0 01    >11            beq   *+3        ; If =, branch to iny
080D: A5       >11            dfb   $A5        ; "lda $C8" to skip iny
080E: C8       >11            iny               
               >11            eom              
080F: 18       >12            clc              
0810: A5 EC    >13            lda   frac       ; Compute next sample
0812: 65 FE    >14            adc   freq       
0814: 85 EC    >15            sta   frac       
0816: 8A       >16            txa              
0817: 65 FF    >17            adc   freq+1     
0819: AA       >18            tax              
081A: B1 06    >19            lda   (env),y    ; Next sample page
081C: 8D 30 C0 >20            sta   spkr       ; <==== start time: 46
081F: EA       >21            nop              ; Kill 2 cycles
0820: 8D 30 C0 >22            sta   spkr       ; <===== stop time: 52
0823: 85 EB    >23            sta   ztrash     ; Kill 3 cycles
0825: 8D 2A 08 >24            sta   :ptr+2     
0828: BD 00 00 >25   :ptr     ldaa  0*0,x      ; Fetch sample.
082B: 8D 3C 08 >26            sta   :sw0+2     
082E: C6 FC    >27            dec   dur        ; Decrement duration
               >28            cdec  dur+1      
0830: F0 02    >28            beq   *+4        ; If eq, branch to dec
0832: EA       >28            nop              ; Else kill 2 cycles and
0833: AD       >28            dfb   $AD        ;  "lda xxxx" to skip dec
0834: C6 FD    >28            dec   dur+1      ;    of zero-page param.
               >28            eom              
0836: A5 FD    >29            lda   dur+1      
0838: F0 03    >30            beq   :quit      ; Finished.
083A: 4C 00 00 >31   :sw0     jmp   0*0        ; Switch to gen, T = 89
               >32   
083D: 4C 40 09 >33   :quit    jmp   quit

As shown, ‘gen0’ is of typical length, and only uses $40 bytes, or ¼ of a page. The remainder of each generator’s page is used for other SYNTH code, or data tables, or is left unused. This sparse use of space is a conscious tradeoff to reduce the time required to vector dynamically to each generator. Approximately 5KB of SYNTH’s 8KB is unused, split into 26 page fragments.

The critically timed events are the sta spkr instructions. They toggle the state of the Apple’s speaker output and thereby generate the variable width high frequency pulses that perform the digital-to-analog conversion that is responsible for the synthesizer’s audio output. (Note that the cycle counts are all relative to the first cycle of the fetch of the corresponding instruction, not the execution cycle during which the toggle actually occurs. Since all toggling instructions are identical 4-cycle instructions in which the toggle occurs at the start of the 4th cycle, this relative method of counting produces correct results.)

This generator, gen0, generates the shortest duty cycle used by the synthesizer, corresponding to a sample value of 0. While this generator is "playing" a sample with a value of zero, it is computing the value of the next sample and doing all necessary bookkeeping, as detailed below:

Lines 10-11 count the number of samples produced so far, and advance the Y register by one every 256 sample times, or a little less than 1/40th of a second. The Y register, then, is an index into the current "envelope" page.

Lines 12-18 compute the next sample from the current waveform page by adding the 16-bit phase increment to the phase accumulator, for which the location ‘frac’ is the fractional part and X contains the integral part.

Lines 19 and 24 set up the current waveform page from the envelope table.

Lines 25-26 retrieve the correct waveform sample, stored as the page number of the corresponding pulse generator, and use it to set up :sw0 to vector to that generator next.

Lines 27-28 decrement the 16-bit duration of the note (measured in samples).

Lines 29-30 test the remaining duration and terminate the note if it has expired.

All generators perform all of these tasks, sometimes in a slightly different (but irrelevant) order. There is one exception to this: the "end test" in lines 29-30. This test is performed only in generators 0 through 3, so that a note may play on beyond its intended duration until its waveform amplitude is in the range of 0 to 3 (out of 31). The note fetch routine in SYNTH compensates for any extra samples played by subtracting them from the duration of the following note or rest.
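The bookkeeping in the gen0 listing can be restated as a Python model, which may be easier to follow than the cycle-scheduled assembly. This is an illustrative sketch only: it ignores timing entirely, represents memory as a dictionary of pages, and uses hypothetical page numbers ($30, $40) and field names that mirror the zero-page labels.

```python
def generator_step(state, memory):
    """Model the work one generator does between pulse edges.

    Returns (page of the next generator, finished flag).
    """
    # Lines 10-11: count samples; advance the envelope index every 256 samples.
    state["scount"] = (state["scount"] + 1) & 0xFF
    if state["scount"] == 0:
        state["y"] = (state["y"] + 1) & 0xFF

    # Lines 12-18: add the 16-bit phase increment; X holds the integral part.
    total = state["frac"] + state["freq_lo"]
    state["frac"] = total & 0xFF
    state["x"] = (state["x"] + state["freq_hi"] + (total >> 8)) & 0xFF

    # Lines 19 and 24: current waveform page from the envelope table.
    wave_page = memory[state["env"]][state["y"]]

    # Lines 25-26: the stored sample is the page number of the next generator.
    next_gen = memory[wave_page][state["x"]]

    # Lines 27-28: decrement the duration; borrow when the low byte reaches 0.
    state["dur_lo"] = (state["dur_lo"] - 1) & 0xFF
    if state["dur_lo"] == 0:
        state["dur_hi"] = (state["dur_hi"] - 1) & 0xFF

    # Lines 29-30: the note is over when the high byte is 0
    # (in SYNTH only generators 0 through 3 act on this test).
    return next_gen, state["dur_hi"] == 0


# Hypothetical test data: one envelope page and one sawtooth-like waveform page.
memory = {0x30: bytes([0x40] * 256),
          0x40: bytes(0x08 + (i % 32) for i in range(256))}
state = {"scount": 0, "y": 0, "frac": 0, "x": 0,
         "freq_lo": 0x80, "freq_hi": 1, "env": 0x30,
         "dur_lo": 3, "dur_hi": 1}
steps = [generator_step(state, memory) for _ in range(3)]
```

With a phase increment of 1.5 the model reads table positions 1, 3 and 4, and the third step reports the end of the note. Note that, as in the listing, the note ends when the high duration byte reaches zero after a borrow from the low byte.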

Since all notes start and end with sample values near 0, this has the effect of minimizing switching noises when one note transitions to another. As we shall see, SYNTH further capitalizes on this regularity by continuing to generate pseudo-samples with value 0 during note transitions and control operations, so there are no discontinuities in pulse generation as music is played.

The code for the 32 generators was actually created by an Applesoft program, which scheduled all the specified "work" instructions into the cycles between the speaker-toggling instructions, adding "padding" instructions as needed to produce generator-specific cycle-accurate timings for each of the toggling instructions. From a practical point of view, I found it simply too error-prone to repeatedly schedule all 32 generators manually as the synthesis strategy and code evolved. The BASIC program generates Merlin source code, and takes most of the pain out of making changes.

SYNTH Initialization

CRATE.SYNTH is an Applesoft program run on the master machine that boots the AppleCrate machines (if they are not already serving), and prompts the user for the music file to be played. That file is BLOADed into the master machine’s memory.

The music file contains the music streams and specifies the set of voices for the AppleCrate machines. CRATE.SYNTH reads the specified voices from the master machine’s disk and &POKEs them into each AppleCrate machine, allocating its memory, filling in its voice table, and relocating the voice envelope tables as it goes. As each machine’s loading is completed, SYNTH is &POKEd into its memory and &CALLed, whereupon it loops calling SERVE until a &BPOKE "start" signal is received.

When all the required machines have been loaded, CRATE.SYNTH prompts the user to start, and then it sends a &BPOKE, releasing all copies of SYNTH to start fetching and playing their music streams within three cycles of perfect synchronization.

Empirically, the AppleCrate machines diverge from synchronization at a rate of about one millisecond for every 40 seconds of execution. Since up to 10 milliseconds of temporal misalignment is virtually inaudible, machines that are started in sync remain sufficiently well synchronized for at least 400 seconds of playing time without any additional synchronization.
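The 400-second figure follows directly from the two numbers quoted above, as this short Python check shows (the function name is invented for the example):

```python
SECONDS_PER_MS_OF_DRIFT = 40   # observed: about 1 ms of divergence per 40 s
TOLERANCE_MS = 10              # misalignment considered virtually inaudible

max_unsynced_seconds = TOLERANCE_MS * SECONDS_PER_MS_OF_DRIFT


def drift_ms(seconds):
    """Expected divergence, in milliseconds, after `seconds` of playing."""
    return seconds / SECONDS_PER_MS_OF_DRIFT


print(max_unsynced_seconds, drift_ms(180))
```

A three-minute song, for example, would end with about 4.5 milliseconds of divergence.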

Audio Output

The audio output from the AppleCrate is quite simple. Eight 10k resistors, one per board, connect the boards’ speaker outputs to a mixing node. A 2.2k resistor from the node to ground, paralleled by a 0.1 uF capacitor, serves as a simple first-order lowpass filter. The output at the mixing node is 200-300 millivolts peak-to-peak, and is input to an audio amplifier.
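The corner frequency of this filter can be estimated with a short Python calculation. This is a rough sketch that assumes each speaker output behaves as an ideal low-impedance voltage source, so that the capacitor sees all eight 10k resistors and the 2.2k resistor in parallel; the real outputs may differ.

```python
import math


def parallel(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


R_MIX = 10_000.0    # one mixing resistor per board
BOARDS = 8
R_SHUNT = 2_200.0   # resistor from the mixing node to ground
C = 0.1e-6          # capacitor across the shunt resistor

# ASSUMPTION: ideal voltage sources, so every resistor appears in parallel at the node.
r_node = parallel([R_MIX] * BOARDS + [R_SHUNT])
cutoff_hz = 1.0 / (2.0 * math.pi * r_node * C)

# Fraction of the signal reaching the node when all boards move together.
gain_all = R_SHUNT / (R_MIX / BOARDS + R_SHUNT)

print(round(r_node), round(cutoff_hz), round(gain_all, 2))
```

Under these assumptions the node resistance is roughly 800 ohms and the corner frequency is close to 2kHz, well below the 22kHz pulse rate.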

The MIDI Converter

Music files for CRATE.SYNTH are created from standard MIDI files. MIDI.CVT is an Applesoft program that reads a user-specified MIDI file and converts it into a music file, composed of multiple music streams, one for each digital oscillator (machine running a copy of SYNTH) that is needed to play the music.

MIDI.CVT is a work in progress. It has grown incrementally from a conceptual prototype into a usable tool without the benefit of being rewritten for style or machine language speed. Bear that disclaimer in mind as you peruse its code. ;-)

My primary Apple //e has an 8MHz Zip Chip accelerator, which makes even Applesoft acceptably fast as a prototyping language. Still, even with acceleration, MIDI.CVT runs at a fraction of the "real time" required to play a piece of music. Running it at 1MHz will require patience!

MIDI.CVT works by merging all parallel "tracks" of the MIDI file into a single sequential stream of events. The most important MIDI events are "key down" and "key up" events, which specify the pitches and durations of notes to be played. But there are also control events such as tempo and voice changes in the MIDI stream.

Since a MIDI file may contain many parallel tracks, MIDI.CVT maintains multiple buffers, one for each track, which it refills as needed as the merging progresses. This approach permits arbitrary-sized MIDI files to be processed.

Similarly, MIDI.CVT produces multiple parallel music streams as output, which are combined at the end into a single music file. It maintains multiple buffers for the output streams, as well, which are written to disk as they fill, so that memory size does not constrain the size of file that may be processed.
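The merging step can be sketched in Python with a lazy k-way merge. This is not how MIDI.CVT is written (it is an Applesoft program with its own buffers); the generator below merely stands in for a per-track buffer that is refilled on demand, the event strings are placeholders, and times are assumed to be absolute rather than the delta times stored in a MIDI file.

```python
import heapq


def track_events(track_no, events):
    """Yield (time, track number, event) one at a time, like a refillable buffer."""
    for time, event in events:
        yield time, track_no, event


# Hypothetical tracks: (absolute time in ticks, placeholder event description).
tracks = [
    [(0, "key down 60"), (480, "key up 60")],
    [(0, "tempo"), (240, "key down 64"), (480, "key up 64")],
]

# heapq.merge pulls from each sorted track only as needed.
merged = list(heapq.merge(*(track_events(n, t) for n, t in enumerate(tracks))))

for time, track_no, event in merged:
    print(time, track_no, event)
```

Because only the head of each track is held at any moment, the size of the input does not limit what can be merged, which is the property described above.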

As MIDI.CVT scans the merged stream of MIDI events, it allocates "notes" to idle oscillators, preferring ones that have sounded the current voice previously (and so already need it in memory). In some cases, more oscillators are required than the AppleCrate supports (8), so it is necessary to preempt a currently sounding oscillator for the new note. In this case, it chooses the oscillator that has been sounding the longest, in an attempt to make the preemption as benign as possible.

A special case is made of "re-striking" a key that is already sounding—in that case, the current sounding is ended and the new one started on the same oscillator.

When a note ends, its oscillator is returned to "idle" status for re-use.
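The allocation rules described above can be summarized in a Python sketch. It is a simplified model, not a translation of MIDI.CVT: the class and method names are invented, a re-strike is recognized by key number alone, and ties among idle oscillators are broken by lowest index, which is an arbitrary choice.

```python
class Allocator:
    """One-pass assignment of notes to a fixed number of oscillators."""

    def __init__(self, count=8):
        self.sounding = [None] * count                  # (key, start time) or None if idle
        self.voices_used = [set() for _ in range(count)]

    def key_down(self, time, key, voice):
        osc = self._choose(key, voice)
        self.sounding[osc] = (key, time)
        self.voices_used[osc].add(voice)
        return osc

    def key_up(self, key):
        """Return the oscillator sounding `key` to idle status."""
        for osc, note in enumerate(self.sounding):
            if note is not None and note[0] == key:
                self.sounding[osc] = None
                return osc
        return None

    def _choose(self, key, voice):
        # Re-striking a sounding key reuses the same oscillator.
        for osc, note in enumerate(self.sounding):
            if note is not None and note[0] == key:
                return osc
        idle = [o for o, n in enumerate(self.sounding) if n is None]
        # Prefer an idle oscillator that already needs this voice in memory.
        for osc in idle:
            if voice in self.voices_used[osc]:
                return osc
        if idle:
            return idle[0]
        # Nothing idle: preempt the oscillator that has sounded the longest.
        return min(range(len(self.sounding)), key=lambda o: self.sounding[o][1])


alloc = Allocator(2)
print(alloc.key_down(0, 60, voice=1), alloc.key_down(10, 64, voice=2))
```

A usage example with only two oscillators makes preemption easy to see: a third simultaneous note takes over the oscillator whose note started earliest.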

There is much room for improvement in the way that MIDI.CVT allocates oscillators. The current algorithm is a sequential, one-pass method. It is easy to show that such an algorithm, with no knowledge of future events, cannot do an optimal assignment of notes/voices to oscillators. The consequence of a non-optimal assignment is that each oscillator (SYNTH machine) is required to sound a large proportion of the total voices required for a piece, which can easily overflow the memory of the SYNTH machines.

I am continuing to study ways to create better voice assignments, and I expect that this will result in major changes in MIDI.CVT in the future.

Voice Generation Tools

SYNTH requires voices that are samples of "instruments", represented by 256-byte pages of sampled waveforms compressed by use of an envelope table.

My starting point for current tonal voices has been synthesized instruments played at approximately 43Hz, or the key of F in octave 1. I sample to a .wav file at 11.025kHz, which creates a waveform with a period of about 258 samples—not perfect, but close enough to extract 256-sample waveforms easily.

The waveform is first "ramped" by subtracting the negative envelope of the waveform from it. This has the effect of making each cycle of the waveform start and end with the minimum sample value, allowing note transitions to be made without "pops". After ramping, the waveform is normalized in amplitude so that at its loudest point it covers the full range of 0..31. Then cycles are selected from the total waveform which differ significantly in amplitude, and the envelope table is constructed so that it will reproduce a good facsimile as the note sounds.
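A much simplified version of the ramping and normalizing steps is sketched below in Python. It is not the actual voice tool: it approximates the negative envelope by the minimum of each cycle (a crude, piecewise-constant estimate), assumes the input has already been cut into whole cycles of `cycle_len` samples, and leaves out the selection of cycles for the envelope table.

```python
def ramp_and_normalize(samples, cycle_len=256):
    """Shift each cycle so its minimum is 0, then scale the whole note to 0..31."""
    ramped = []
    for start in range(0, len(samples), cycle_len):
        cycle = samples[start:start + cycle_len]
        floor = min(cycle)              # crude stand-in for the negative envelope
        ramped.extend(s - floor for s in cycle)
    peak = max(ramped) or 1             # avoid dividing by zero on silence
    return [round(31 * s / peak) for s in ramped]


def to_generator_pages(values):
    """Map 0..31 sample values to the $08..$27 form that SYNTH stores."""
    return bytes(0x08 + v for v in values)


# Toy input: two 4-sample "cycles", the second quieter than the first.
toy = [-10, 0, 10, -10, -5, 0, 5, -5]
levels = ramp_and_normalize(toy, cycle_len=4)
pages = to_generator_pages(levels)
```

In the toy input each cycle begins and ends at its own minimum, so after ramping every cycle starts and ends at 0, and the loudest point of the note maps to 31.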

Atonal voices are also sampled at 11.025kHz, and the resulting .wav file is "ramped" and normalized in amplitude to a sample range of 0..31 (mapped to $08..$27 for use with SYNTH).

Voice files are named "V.n", where n is the MIDI "patch number" (0..127) corresponding to the instrument. Atonal percussive instruments are mapped to "V.k", where k is the MIDI "note number" (0..127) + 128 for the instrument in the MIDI "percussion channel".
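The naming rule can be written as two small Python helpers (the function names are invented for this example):

```python
def tonal_voice_file(patch):
    """File name for a tonal instrument, given its MIDI patch number."""
    if not 0 <= patch <= 127:
        raise ValueError("patch number must be 0..127")
    return f"V.{patch}"


def percussion_voice_file(note):
    """File name for an atonal instrument, given its percussion-channel note number."""
    if not 0 <= note <= 127:
        raise ValueError("note number must be 0..127")
    return f"V.{note + 128}"


print(tonal_voice_file(0), percussion_voice_file(38))
```

Adding 128 keeps the percussion files in the range V.128 through V.255, so they can never collide with the tonal files V.0 through V.127.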

Downloads

ShrinkIt disk image containing SYNTH, voices, and source