Accreted Drivel

Steven Tattersall's personal blog

YM2149 Sync-Square (Part One - How Square Waves Work)

Created on: 2021-07-14

Posted in atari chiptune ym2149


This is the start of a series of posts about the Sync Square feature in Atari ST chip trackers. It turns out that the subject looks simple but starts getting complicated, so I'll be splitting the whole thing into pieces to make it more manageable.

Warning: this topic gets pretty heavy, pretty quickly. So brew some strong coffee before reading.

This first part will be a fairly in-depth look at how the YM2149 chip in the ST generates square waves, since a good understanding of this is necessary to understand how the Sync Square works.

A quick Introduction to YM2149 Square Waves

If you know your YM, you can probably skip this section, although the fundamentals of note generation are quite useful.

The YM2149 is the sound chip used in the Atari ST. This chip is a licenced version of the AY-3-8910, which dates back to 1979. It's a really simple piece of hardware, which has been pushed to its limits in the past 40 years. Internally it features 3 square wave generators with some rudimentary volume control (16 non-linear levels), a single noise generator and a single envelope generator.

This article is mainly about the square wave generators. There are 3 of them, so they generally occupy the bulk of any piece of music you'll hear on the ST. These are usually called something like channels A/B/C.

For each square generator, there are 2 hardware registers that specify the period of the note. One register uses 8 bits for the least significant bits of the period, the other uses 4 bits for the most significant bits, making a total of 12 bits of period control.

For channel A, the least-significant part of the period is set in register 0, and the most-significant in register 1. (You can find all this in the original datasheet.)

Here is how to set the period of channel A to "$0567" in assembly language. This will indirectly control the note frequency. To select the YM register to write, the 68000 CPU writes a byte holding the register number to the address $ffff8800. Then we write the data to the byte at $ffff8802. So for channel A we select register 0 and write $67, then select register 1 and write $05.

Done that way, each register takes 2 instructions, but in assembly language we can write both the register select and the data in one instruction, by writing a 32-bit "longword" starting at $ffff8800, and hence writing to $ffff8800/1/2/3. We then do this twice, once per register. I have filled the remaining bits of the write with "ff" to show that they don't do anything:

    move.l      #$00ff67ff,$ffff8800.w	; Write $67 to register $00 (least-significant period)
    move.l      #$01ff05ff,$ffff8800.w	; Write $05 to register $01 (most significant)
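As a cross-check on the two constants above, here is a small Python sketch (not from the original code; the function names are made up for illustration). It splits a 12-bit period into its two register values and packs each into the longword layout just described: register number in the first byte, data in the third byte, and "ff" filler in the two bytes that don't do anything.

```python
def split_period(period):
    """Split a 12-bit tone period into (fine, coarse) register values."""
    if not 0 <= period <= 0xFFF:
        raise ValueError("period must fit in 12 bits")
    return period & 0xFF, (period >> 8) & 0x0F


def ym_longword(register, value, filler=0xFF):
    """Pack a register select and a data byte into one 32-bit write.

    The first byte lands on $ffff8800 (register select) and the third byte
    on $ffff8802 (data). The second and fourth bytes are filler.
    """
    return (register << 24) | (filler << 16) | (value << 8) | filler


fine, coarse = split_period(0x567)
print(f"${ym_longword(0, fine):08x}")    # channel A fine period (register 0)
print(f"${ym_longword(1, coarse):08x}")  # channel A coarse period (register 1)
```

This prints $00ff67ff and $01ff05ff, the same two values used in the move.l instructions.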

What effect will this have? The datasheet tells us that the frequency of the note will be

    F(t) = F(master) / 16TP

Where F(t) is the resultant note frequency, F(master) is the clock speed of the running YM, and TP is the period we have set. The YM in the Atari ST runs at 2 Megahertz, so 2000000 cycles per second, so our note frequency will be

    F(t) = 2000000 / (16 * $567) = 125000 / 1383 ~= 90.38 Hertz

... a reasonably low note, a rather off-key "F" in a low octave.
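The same sum can be done in a few lines of Python. This is only a sketch of the datasheet formula quoted above; the function name is hypothetical, and the handling of a period of 0 is an assumption borrowed from the observation later in this article that 0 behaves like 1.

```python
YM_CLOCK_HZ = 2_000_000  # YM2149 master clock on the ST, as given above


def tone_frequency(period, clock_hz=YM_CLOCK_HZ):
    """Note frequency in Hz for a 12-bit tone period (datasheet formula)."""
    # Assumption: a period of 0 acts like a period of 1.
    return clock_hz / (16 * max(period, 1))


print(round(tone_frequency(0x567), 2))  # 90.38
print(round(tone_frequency(0xFFF), 2))  # lowest note the 12 bits allow
print(tone_frequency(1))                # highest possible tone
```

With 12 bits of period the range runs from about 30.5 Hertz up to 125000 Hertz.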

Armed with this information we can also predict the exact lengths and shape of the square wave. Since the master clock is divided by "16TP", we know that the wavelength of the full square wave is 16TP cycles, and each half of the square wave will be 8TP cycles. In ASCII form this is:

        <------one square wave --------->

        <--8TP cycles--->

"Up"    +---------------+               +-- ...
        |               |               |
"Down"  |               +---------------+

                        <--8TP cycles--->

We can also tie these "8TP" cycles on the YM chip to the number of cycles on the ST's 68000 CPU. We know the ST's 68000 CPU and the YM run off the same clock via a divider; the ST runs at 8MHz, the YM at 2MHz. Every 4 cycles on the 68000 match 1 cycle on the YM2149. Slightly simplified, every NOP instruction on the CPU, which takes 4 cycles, is one single clock cycle on the YM. And in our diagram above, the "8TP" YM cycles will map to "32TP" 68000 cycles, or "8TP" 68000 NOP instructions. This will come in useful later.
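To make the conversion concrete, here is a minimal Python sketch for the $567 example. It assumes the plain ST clocks described above (8MHz CPU, 2MHz YM) and a 4-cycle NOP; the names are illustrative only.

```python
CPU_CYCLES_PER_YM_CYCLE = 4  # 8MHz CPU / 2MHz YM, on a plain ST
CPU_CYCLES_PER_NOP = 4


def half_wave_lengths(period):
    """Length of one half of the square wave in YM cycles, CPU cycles and NOPs."""
    ym_cycles = 8 * period
    cpu_cycles = ym_cycles * CPU_CYCLES_PER_YM_CYCLE
    nops = cpu_cycles // CPU_CYCLES_PER_NOP
    return ym_cycles, cpu_cycles, nops


print(half_wave_lengths(0x567))  # (11064, 44256, 11064)
```

The first and last numbers are always equal, which is the "one NOP per YM cycle" rule of thumb.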

What's Sync-Square?

One of the more interesting effects you can create using the Atari ST's YM2149 soundchip is "sync-square", or "square-sync". (It's also called "hard-sync" sometimes, although that term can be misleading.)

Normally with the YM2149, when you program tones using the square-wave features, you set the note period, and the YM autonomously goes off and oscillates the square wave up and down, based off that period. Ultimately, the YM's square wave is "running free" and we can't control it other than its frequency. For some effects, it would be really handy if you could control the timing of the edge of the square wave, and sync-square is designed to do that.

For example, using the envelope generator you can create a simple repeating sawtooth or triangle envelope at a low frequency to create a harsh or soft bass tone. If you then, on the same channel, play a square wave at a similar frequency, you get a really interesting interference effect between the two, which is one of the "squelchy bass" effects most characteristic in classic ST chiptunes.

However, since you can't control the square wave's position relative to the envelope (the "buzzer"), you can't guarantee the timbre of the effect when a note starts. So when the note starts it might sound "thin" or "fat", depending on the relative positions in the waves of the envelope and square. (You can, however, easily control the start of the envelope generator. When writing to the envelope register, the envelope itself is guaranteed to restart.)

Sync-square is designed to help with that, and trigger the square wave edge under the user's control. It was first documented by the Lord of YM, Gwem, in Alive Magazine issue 9 in 2005, and involves a specific sequence of writes to the YM:

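    move.l      #$00000000,$ffff8800.w  ; Write $00 to register $00 (fine period = 0)
    move.l      #$01000000,$ffff8800.w  ; Write $00 to register $01 (coarse period = 0)
    move.l      #$0000xx00,$ffff8800.w  ; Write new fine period to register $00
    move.l      #$0100xx00,$ffff8800.w  ; Write new coarse period to register $01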

... where the "xx" is the subsequent note period. It's worth reading the whole article, it's really good. One thing to note is that we can't control the "up" or "down" nature of the square wave, only the point at which we change it. Why does setting the period to 0, then to a new period, work? That will be explained later, but the introductory section should give a good clue.

Using this technique should allow us to do some really funky effects. For example, we could do a perfect PWM (Pulse-Width Modulation) effect with one timer. By setting a timer to do a square-sync that generates an edge, then setting a YM period that generates another edge somewhere in between timer interrupts, we can vary the width of the pulse. Time for an ASCII diagram!

    PWM with sync-square
    ====================
    
    A                  B
    <----timer width -->

   timer             timer
   interrupt         interrupt
   syncsquare        syncsquare  
    |                  |
    v                  v

    +---------------+  +---------------+ ...	output wave
    |               |  |               |
    |               +--+               +--+
    <---YM period-->

The ST's CPU timer interrupts happen at A and B, and reset the square wave. For example, if we set the timer to interrupt every 6000 (YM) cycles, and then use a YM period of 5000 cycles, we'll see the YM flip the square wave for us in between the interrupts, giving a 5:1 ratio between successive edges of the square wave (5000 cycles : 1000 cycles). This gives a "thinness" to the note, similar to duty cycle effects on the C64's SID chip. GwEm's Maxymiser is already using this technique, using some of the ideas I'll outline here, and we'll be adding it to ttrak at some point soon.
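The 5:1 example can be written out as a short Python sketch. It is idealised: it assumes every sync-square produces a clean edge exactly at the interrupt and restarts the YM's count from there, which is exactly the part that turns out not to be guaranteed. The function name is hypothetical. Note that a 5000-cycle gap between edges corresponds to a register period of 625, since each half-wave lasts 8TP cycles.

```python
def pwm_segments(timer_interval, ym_half_wave):
    """Lengths (in YM cycles) of the two parts of each timer interval.

    Idealised: assumes a clean edge at every interrupt, with the YM's
    count restarting from that point.
    """
    if not 0 < ym_half_wave < timer_interval:
        raise ValueError("the YM edge must land between two interrupts")
    if 2 * ym_half_wave <= timer_interval:
        raise ValueError("the YM would produce more than one edge per interval")
    return ym_half_wave, timer_interval - ym_half_wave


first, second = pwm_segments(6000, 8 * 625)
print(first, second, first / second)  # 5000 1000 5.0
```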

This is all well and good, but the main problem is that it doesn't work 100% reliably in the form given above. In some cases the code will fail to do a clean edge, and you get horrid clicking and glitching noises. The result varies depending on the period used and the distance between the timers. It also fails on emulators, but that's another story.

To improve on this, we need to understand a bit more about how the YM seems to work, and how the current square-sync code is functioning.

YM Under the Bonnet

People have been discussing the YM for years, as well as the AY-3-8910, the YM's older sibling (the YM is a licenced copy, but with some differences). I don't have access to a logic analyser, but we do have access to the original datasheet. We also have access to the ST's schematics, which show how it is connected to the CPU via GLUE, and also this fantastic die shot of the AY-3-8910. There's also a huge bunch of posts on Atari-Forum about how it might work. We could just take all these snippets and assume that they are correct, but never assume!

Taking all this together, I did some experiments to verify the behaviour.

If we disable all the interrupts on the ST, we can then run in "lock-step" with the YM and carefully alter register values, then record the output using professional equipment (that is, a breadboard wire poked into the monitor jack, connected to a 3.5mm cable). Doing this on an ST, it's perfectly possible to exactly control the behaviour of a square wave's edge and measure the effects that different period settings have.

My conclusions were effectively the same as what the Hatari developers had come to. As far as I can tell:

For all my tests of different periods and cycle write delays, the output from an ST seems to give absolutely predictable outcomes, matching the rules above. On the STE it doesn't, but we'll look at that later.

There are some clues that we're on the right track. For a start, this model is really simple, and that matches a 1979 chip which is visibly really simple. From the connections on the die shot images, we can see that the clock pulse is connected to the tone generators after a 3-stage divider (divides by 8) so that ties in too. There is also incredibly little logic in the tone generator blocks.

The update every 8 cycles neatly ties in with the calculation that a note's frequency is "master clock / (16 * register period)" as described in the datasheet. It's 8 rather than 16 since we need to both rise and fall to complete the square wave, so it updates twice as fast. This approach also explains why setting a period of 0 acts the same as a period of 1: the internal counter will always be greater-than-or-equal-to 1 at every update, so an edge will be generated every 8 cycles in either case.
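Here is a toy Python model of one tone generator, as one reading of the behaviour described in this section. It is a sketch, not a confirmed description of the silicon. It assumes that once every 8 YM cycles the counter is incremented and then compared with the period, and that a counter greater than or equal to the period flips the output and resets the counter. The class and function names are invented.

```python
class ToneGenerator:
    """Toy model of one YM square-wave channel (an assumed reading of the text)."""

    def __init__(self, period):
        self.period = period  # value held in the period registers
        self.counter = 0      # internal counter
        self.level = 0        # current output, 0 or 1

    def update(self):
        """One update, assumed to happen once every 8 YM clock cycles."""
        self.counter += 1
        if self.counter >= self.period:
            self.counter = 0
            self.level ^= 1
            return True   # an edge was generated
        return False


def edge_times(period, ym_cycles):
    """YM cycle numbers at which edges occur, starting from a fresh counter."""
    gen = ToneGenerator(period)
    return [t for t in range(8, ym_cycles + 1, 8) if gen.update()]


print(edge_times(3, 80))                       # [24, 48, 72]
print(edge_times(0, 40) == edge_times(1, 40))  # True
```

In this model a period of 3 gives an edge every 24 cycles (8TP), and periods 0 and 1 both give an edge on every update.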

There are some important things to note here:

  1. To ensure a square wave edge is created when we write to the period register, the written period must be something equal to or less than the current value of the YM's internal counter. Sometimes this value won't be known, but at other times we will know it (within a range tolerance) and we can use this to our advantage.
  2. For the YM to spot the period we have set, we must wait for 8 YM cycles to pass. If we don't, the YM can "miss" our setting. From our calculations earlier, assuming a 2MHz:8MHz clock ratio, 8 YM cycles comes out at 32 CPU cycles ("8 Nops"). This "32 cycles" has already been discovered by Ben of Overlanders in the comments to this video.
  3. If we are writing 2 register values to the YM (for both coarse and fine), it's possible that the YM updates while only one register is set and it sees a partial update. When the YM updates, it might see a "small" period value and add a squarewave edge without you wanting it.
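Point 3 is easy to show with numbers. The Python sketch below (hypothetical helper name) works out which period the YM would be holding if it updated between the two register writes. Going from $1ff to $200 is a deliberately nasty case chosen for illustration.

```python
def intermediate_period(old, new, fine_first=True):
    """Period the YM holds after only the first of the two register writes."""
    if fine_first:
        return (old & 0xF00) | (new & 0x0FF)   # new fine byte, old coarse bits
    return (new & 0xF00) | (old & 0x0FF)       # new coarse bits, old fine byte


old, new = 0x1FF, 0x200
print(hex(intermediate_period(old, new, fine_first=True)))   # 0x100
print(hex(intermediate_period(old, new, fine_first=False)))  # 0x2ff
```

Writing the fine register first briefly leaves a period of $100, roughly half of both the old and new values. If the YM updates at that moment and its counter has already passed $100, point 1 says an edge will be generated.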

Doing some tests

Given our theory about how the YM works, we can do some tests. The 68000 and YM run at a fixed ratio of 4:1 cycles, and on the 68000 we can turn off all interrupts and do very accurate cycle-timed tests where we change registers at precise times, then record the output.

Here's an example piece of code. We run it in Supervisor mode with all CPU interrupts disabled by setting the Status Register to $2700. The cycle timings (as multiples of 4-cycle NOPs) are commented for each line, since they will be important later.

        move.w  #2000-1,d7                  ;            d7=loop counter
    loop:
        move.l  #$00cc02cc,$ffff8800.w      ;7 nops      write short period (reset square)
        nop                                 ;1 nop
        move.l  #$00ccf4cc,$ffff8800.w      ;7 nops      write full period
        nop                                 ;1 nop
        rept   (240*8)                      ;1920 nops
        nop
        endr
        dbf    d7,loop                      ;3 nops

Let's assume that register 1, the coarse period, is set to 0. So all period settings will be in the range [0..255]. This makes our reasoning simpler and avoids the "updating 2 registers" problem.

What do we expect to happen?

And the good news is that, if we run this on an STFM, it creates a lovely clean square wave. You can try it yourself, since I'll attach the code and .PRG file at the bottom. Congratulations, you've made a square wave the hard way!

Here's a view of the output recorded from my STFM:

Even better, using self-modifying code, we can change that "$f4" above to change the period the YM uses. If the period is $f3 or above, we get a pure square wave, since this is enough to ensure the YM counter never reaches the set period before the code loops and we write "2" to the register again.

If we set something between, say, $80 and $f0, we hear a waveform resembling the Pulse-Width Modulation effect mentioned earlier. The YM will count up to e.g. $80, then make a square wave edge, and reset the counter. Then, because we are over halfway through our loop, the YM won't have time to do another square wave edge before we reset it. The closer to $f1 we get, the "thinner" the sound it creates:

    A                  B
    <-----our loop ---->

  Reset              Reset
    |                  |
    v                  v
    +---------------+  +---------------+ ...	output wave
    |               |  |               |
    |               +--+               +--+
    <---YM period-->
         e.g. $e0
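To put rough numbers on the diagram, here is an idealised Python sketch. It takes the loop length from the NOP counts in the code comments (7 + 1 + 7 + 1 + 1920 + 3 = 1939) and assumes the reset happens at the very start of the loop, ignoring the 16 NOPs spent on the two register writes. The function name is invented.

```python
LOOP_YM_CYCLES = 1939  # 7 + 1 + 7 + 1 + 1920 + 3 NOPs, one YM cycle per NOP


def loop_segments(period, loop_cycles=LOOP_YM_CYCLES):
    """(long, short) segment lengths in YM cycles for one pass of the loop.

    Idealised: treats the reset as happening at the very start of the loop
    and ignores the time taken by the register writes themselves.
    """
    ym_edge = 8 * period
    if ym_edge >= loop_cycles:
        return loop_cycles, 0          # YM never fires: plain square wave
    return ym_edge, loop_cycles - ym_edge


for period in (0x80, 0xC0, 0xE0, 0xF0):
    long_part, short_part = loop_segments(period)
    print(f"${period:02x}: {long_part} + {short_part}")
```

Under those assumptions $e0 splits the loop into 1792 + 147 YM cycles and $f0 into 1920 + 19, which is why the sound gets thinner as the period rises. At $80 the YM's second edge would be due at 2048 cycles, after the loop has already reset it.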

So far so good. If we set $f1 or $f2, we get something interesting: a pattern where sometimes the square wave resets, and sometimes it doesn't, but the pattern repeats in a stable fashion. This is because our entire loop takes 1939 nops (1939 YM cycles), or 242.375 YM updates -- not a whole number of YM updates. So on some loops (actually 3/8 of them!), there is one more YM update before the period is "reset".
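The 3/8 figure drops straight out of the arithmetic. This Python sketch counts how many 8-cycle YM updates land inside each successive pass of the 1939-cycle loop. The `phase` parameter (the offset between the loop start and the YM's update ticks) is an assumption, since the real alignment isn't known.

```python
LOOP_YM_CYCLES = 1939   # NOPs per pass of the loop, one YM cycle each
YM_CYCLES_PER_UPDATE = 8


def updates_per_loop(loops, phase=0):
    """Number of YM updates that fall inside each successive pass of the loop.

    `phase` is the (unknown, assumed) offset in YM cycles between the start
    of the first pass and the previous YM update.
    """
    counts = []
    for k in range(loops):
        start = k * LOOP_YM_CYCLES + phase
        end = start + LOOP_YM_CYCLES
        counts.append(end // YM_CYCLES_PER_UPDATE - start // YM_CYCLES_PER_UPDATE)
    return counts


counts = updates_per_loop(8)
print(counts)                           # [242, 242, 243, 242, 242, 243, 242, 243]
print(counts.count(243) / len(counts))  # 0.375
```

Changing `phase` shifts where the 243s fall, but there are always three of them in any eight consecutive passes.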

With most period values, this doesn't matter, but if you set $f1, for example, the YM should always create its own square wave edge because the period is low enough to hit that value before the loop gets back to a reset. After we set the period to "2" for our own square sync, sometimes a square sync will happen (creating an immediate second edge, looking like a "spike" in the sample), and sometimes the internal YM counter will only be 0 or 1, and the edge will be missed (looking like a big "normal" square wave edge.)

You can actually see this in the sample waveform here, a mixture of spikes and big edges:

Three out of every 8 reset points "miss" because not enough YM updates have happened, and so produce a big, normal square wave edge; all the rest produce a spike. And the pattern is stable (look at the top "spikes" repeating in a 2:2:1:2:2:1 pattern).

Even better, we can count the ratio of "spikes" to "normals". I count 69 items: 26 "normal" edges and 43 "spikes". 26 divided by 69 is 0.3768, almost exactly 0.375 or 3-eighths (the error is because I picked a total not divisible by 8). This suggests our theory is indeed correct!
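As a sanity check on that count, the Python sketch below asks how many passes in any run of 69 consecutive loops contain the extra YM update. It uses the same 1939-cycle loop and 8-cycle update as before, and the function name is made up. If one kind of edge accounts for 3 loops in every 8, these are the only totals a 69-item sample can show.

```python
LOOP_YM_CYCLES = 1939
YM_CYCLES_PER_UPDATE = 8


def loops_with_extra_update(first_loop, loops):
    """How many of `loops` consecutive passes contain 243 updates, not 242."""
    start = first_loop * LOOP_YM_CYCLES
    end = (first_loop + loops) * LOOP_YM_CYCLES
    total_updates = end // YM_CYCLES_PER_UPDATE - start // YM_CYCLES_PER_UPDATE
    return total_updates - loops * (LOOP_YM_CYCLES // YM_CYCLES_PER_UPDATE)


possible = {loops_with_extra_update(first, 69) for first in range(8)}
print(sorted(possible))   # [25, 26]
print(round(26 / 69, 4))  # 0.3768
```

So a count of 26 out of 69 is as close to 3-eighths as a sample of that size can get.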

If we choose a period of around $80 to $f0, we can get a nice stable Pulse-Width Modulation effect that we are looking for. The nearer to $f3 we get, the "thinner" the sound, until we hit the "wobbly" effects at $f1 and $f2. Excellent! Let's ship it. But...

The Bad News: the STE

Notice I said "if we run this on an STFM".

If we run it on an STE, it doesn't quite work. Even on "stable" periods like $f4 or above, there will occasionally be a glitch where the square wave edge doesn't appear. If you try $f1 or $f2 you tend to get a horrible unstable noise. What's going on?

(Here's a picture of the STE "missing" a sync-square edge:)

This took me a long time to work out. My first theory was that the writes to the YM, through the GSTMCU on the STE, somehow took longer, or the time it was written was unstable. However, the rediscovered ASIC designs suggest that this should not be the case.

It was only when I looked at the STE schematics that I realised the probable cause. The STFM runs from a single 32MHz clock, which is then subdivided down to 8MHz for the CPU and 2MHz to the YM. This means that all the chips run aligned with one another.

The STE is different; I suspect it was rejigged to handle things like DMA sound. The STE has 2 separate clock oscillator chips. On PAL, the oscillator for the CPU is at 32.084988MHz (called CCLK in the schematics).

The oscillator for the YM is called SCLK (or CLK2), and runs at 8.010613MHz. It's fed directly into the YM and doesn't touch the other clock. We can be confident about these values since they are labelled on the components on the circuit board.

And guess what? The ratio of these two clocks is not exactly 4! From my calculator, it's 4.005309956. This doesn't sound like much, but it means that 32 CPU cycles is no longer exactly 8 YM cycles -- it's 7.98 or so. And since the clocks will drift apart, and probably oscillate relative to each other, there are cases where leaving 32 CPU cycles isn't enough for the YM to complete its update and "notice" that we have set a low period momentarily to force a square sync. If the clocks don't oscillate it would happen around 1 in every 95 YM updates. In practice, the failure is quite random -- sometimes it won't happen for a few seconds, other times it will be frequent.

When using a period of $f1 or $f2, this clock difference is enough to account for several fewer YM cycles in the time the CPU takes to loop. This changes the pattern of syncs from the "three out of eight" on the STFM, to almost a regular pattern (but not quite).

In short: on STE, you can't trust an exact 4:1 ratio of CPU cycles to YM cycles, which has an impact on sync-square code, because 32 CPU cycles might not always be enough.
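The STE numbers can be reproduced in Python from the two oscillator values quoted above. The sketch follows the article in treating the oscillator ratio as the number of CPU cycles per YM cycle. It also assumes both clocks are perfectly steady, which the real ones probably aren't.

```python
CCLK_HZ = 32_084_988  # STE CPU oscillator (PAL), value quoted above
SCLK_HZ = 8_010_613   # STE YM oscillator, value quoted above

ratio = CCLK_HZ / SCLK_HZ            # treated as CPU cycles per YM cycle
ym_cycles_in_32_cpu = 32 / ratio     # would be exactly 8 on an STFM
shortfall = 8 - ym_cycles_in_32_cpu  # YM cycles "lost" per 32 CPU cycles
loop_ym_cycles = 1939 * 4 / ratio    # the 1939-NOP test loop, as seen by the YM

print(f"{ratio:.9f}")                # 4.005309956
print(f"{ym_cycles_in_32_cpu:.4f}")  # 7.9894
print(f"{1 / shortfall:.1f}")        # 94.3
print(f"{loop_ym_cycles:.2f}")       # 1936.43
```

A 32-cycle CPU wait falls about 0.0106 YM cycles short, which adds up to one whole YM cycle roughly every 94 waits, close to the figure given above. Over the whole test loop the YM sees about 1936.4 cycles rather than 1939.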

Other bad news: Emulators

Currently some of the emulators don't fully support this kind of effect either. Hatari is very close, and I suspect it will support it very soon; its low-level sound generation emulation is very good, it's just the handling of register writes which, at the time of writing, didn't cope with quick repeated writes with precise timing. Steem I have not tested recently, but I'm led to believe it didn't work correctly either.

Towards a better sync square

That's more than enough for Part One. In Part Two I'll look at applying some of the lessons from this part, to creating a robust sync square effect that avoids some of the problems we've had before. Some of them are probably becoming apparent from the work we've done here.

Sample code

This file syncsquare.zip contains an executable .PRG file for ST or STE, plus the (very hacky) code to build it. Use the left and right cursor keys to change the period settings, and hit space to play the tone for a second or so. I've also included my recordings of the output in WAV format for a range of periods, both on ST and STE.

Thanks

Thanks to gwEm/PHF and Damo/RG for the original impetus to look at this topic, and to Ben/OVR for the notes about the "8 NOPs" approach. Also thanks to spkr/SMFX and ggn/KUA for testing on their hardware and proofreading!
