The twelve notes
Why are there twelve notes?
Before we try to understand how we ended up with a 7-note scale (not a 5, 6, or 8-note one), let's understand why we’re choosing from a set of 12 notes to start with.
When you pluck a string (on a piano, guitar, violin, etc.) or create a pressure wave in a tube of air (organ, flute, human throat, etc.), the instrument creates not only a root note corresponding to the physical characteristics of the string (or air column) that is vibrating, but also a series of harmonics, whose frequencies are integer multiples of the root note's frequency. The harmonics are: the octave (2:1 frequency ratio), the perfect fifth (3:1), another octave (4:1), the major third (5:1), etc. These harmonics can be shifted down by whole octaves (by dividing them by a power of two) so that we can express their pitch ratios relative to the root note. For example, the perfect fifth has a pitch ratio of 3:2 relative to the root note (it's 3:1 divided by 2:1) and the major third has a pitch ratio of 5:4 (5:1 divided by 2:1 divided by 2:1). The further you go up the list of harmonics, the more complex the frequency ratios become (see table below).
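As a rough sketch of the "shift down by octaves" step, the snippet below halves each harmonic until its ratio falls between 1 and 2. The function name `reduce_to_octave` is made up for this illustration.

```python
from fractions import Fraction

def reduce_to_octave(harmonic: int) -> Fraction:
    """Halve a harmonic's ratio until it lands in the range [1, 2)."""
    ratio = Fraction(harmonic, 1)
    while ratio >= 2:
        ratio /= 2  # dropping one octave divides the frequency by two
    return ratio

# Harmonics 2:1 through 7:1, expressed relative to the root note.
for n in range(2, 8):
    print(f"{n}:1 -> {reduce_to_octave(n)}")
```

The octaves (2:1, 4:1) collapse back onto the root, 3:1 and 6:1 both land on 3/2 (the fifth), and 5:1 lands on 5/4 (the major third).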
We consider octaves (2:1, 4:1, 8:1 etc) to be the same note, so one way to create new notes is to use the sequence of fifths. If you stack fifths on top of each other from F (chosen arbitrarily), you end up with the notes F–C–G–D–A–E–B–F♯–C♯–G♯–D♯–A♯–E♯–B♯–F♯♯–C♯♯ etc. We could create an infinite number of notes, but let's observe that E♯ is the 13th note of the sequence of fifths, twelve fifths above F, so its frequency ratio is 3:2 to the power 12, or about 129.75. That's pretty close to 2:1 to the power 7 (128), so what if we just said that twelve stacked fifths are the same as seven stacked octaves and we call that note "F" instead of "E♯"? We can run the math and build a scale with 12 notes entirely based on the sequence of fifths! By arbitrarily starting the sequence on F, the first seven of these notes are the white keys, and the last five are the black keys! This doesn't explain why we want seven keys to be white (not four, five, six, eight, or nine), but it explains why we end up with a number of notes (or semitones) equal to 12.
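To check the arithmetic, here is a small sketch that stacks twelve fifths, compares the result with seven octaves (2 to the power 7 = 128), and then builds the twelve notes by folding each fifth back into a single octave. The helper `stack_fifths` and the `NAMES` list are just names chosen for this example.

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)
NAMES = ["F", "C", "G", "D", "A", "E", "B", "F♯", "C♯", "G♯", "D♯", "A♯"]

twelve_fifths = FIFTH ** 12           # 531441/4096, about 129.746
seven_octaves = Fraction(2) ** 7      # 128
gap = twelve_fifths / seven_octaves   # 531441/524288, about 1.0136

def stack_fifths(count: int) -> list[Fraction]:
    """Return `count` ratios from repeated fifths, each folded into [1, 2)."""
    ratios, ratio = [], Fraction(1)
    for _ in range(count):
        ratios.append(ratio)
        ratio *= FIFTH
        while ratio >= 2:
            ratio /= 2  # treat octaves as the same note
    return ratios

scale = dict(zip(NAMES, stack_fifths(12)))
print(float(twelve_fifths), float(gap))
for name, ratio in scale.items():
    print(f"{name}: {ratio}")
```

The leftover factor `gap` (about 1.4%) is known as the Pythagorean comma. It is the small error we accept when we decide that E♯ and F are the same note.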
Let's inventory these notes, and give names to intervals (table below).
Names of the intervals between the root note and the eleven other notes, together with their pitch ratios.
The astute reader will notice that the pitch ratios in the table do not correspond to the notes that we created by stacking fifths on top of each other. The ratios shown are called "5-limit tuning" and correspond to the actual harmonics. They give a better idea of which intervals sound more harmonious than others, but this tuning is impractical if you want to play with many instruments that have different ranges and can play in different keys.
If you created the notes through the circle of fifths, the major third would have a ratio of 81:64 and the minor third a ratio of 32:27 (this is called Pythagorean tuning), whereas the harmonics produce ratios of 5:4 and 6:5, respectively. So while the sequence of fifths provides a good explanation for why we have 12 notes, it’s not a great way to actually tune your instrument. It caused all sorts of issues with organs that needed to be in tune over many octaves.
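The two Pythagorean thirds can be derived directly from the fifth, and comparing them with the harmonic ratios shows how far apart the two systems are. This is only a sketch; the variable names are mine.

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)

# Pythagorean major third: four fifths up, then two octaves down.
pythagorean_major = FIFTH ** 4 / 4   # 81/64
# Pythagorean minor third: three fifths down, then two octaves up.
pythagorean_minor = 4 / FIFTH ** 3   # 32/27

# The same intervals taken from the harmonic series (5-limit tuning).
just_major = Fraction(5, 4)
just_minor = Fraction(6, 5)

print(pythagorean_major, pythagorean_minor)
print(pythagorean_major / just_major)  # how sharp the Pythagorean major third is
print(just_minor / pythagorean_minor)  # how flat the Pythagorean minor third is
```

Both comparisons come out to 81/80, a discrepancy of about 1.25% that is usually called the syntonic comma.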
Choosing a tuning technique is an art that involves compromises, and the closest we have to a universal standard way to define these intervals is 12-TET, or 12-tone equal temperament, where each semitone is a ratio of the twelfth root of 2 (~1.05946). The idea to remember here is that some intervals are more "harmonious" than others, and you can derive "harmoniousness" from the pitch ratios. A ratio of 2:1 produces a note that is barely distinguishable from the root from a psychoacoustic standpoint, so we give it the same name. A ratio of 3:2 is somewhat distinguishable, but quite harmonious, so it’s the basis for our 12-note sequence. The most dissonant interval in our sequence is the augmented fourth/diminished fifth (the dreaded "tritone" that cuts the scale in half), which has a pitch ratio of 25:18 at best (in equal temperament its value is exactly the square root of two). For the purpose of this article, we don’t need to understand all the math; we will just use the fact that fourths and fifths are perfect, that the tritone is the most dissonant interval, and that there are twelve equal semitones in the scale. The "equal semitone" statement is only true in an equal temperament, where fourths and fifths aren't quite perfect (they're each off by 1.96 cents, where a cent is one hundredth of an equal-temperament semitone, measured on a logarithmic scale). But I won't discuss pitch ratios any more; there are outstanding resources on this topic elsewhere.
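The equal-temperament numbers quoted above are easy to reproduce. The sketch below assumes the usual definition of a cent as 1/1200 of an octave on a logarithmic scale; the helper name `cents` is mine.

```python
import math

SEMITONE = 2 ** (1 / 12)  # twelfth root of two

def cents(ratio: float) -> float:
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

tempered_fifth = SEMITONE ** 7    # seven semitones
tempered_fourth = SEMITONE ** 5   # five semitones
tempered_tritone = SEMITONE ** 6  # six semitones, half an octave

print(round(SEMITONE, 5))
# The tempered fifth is slightly narrower than a pure 3:2 ...
print(round(cents(3 / 2) - cents(tempered_fifth), 2))
# ... and the tempered fourth slightly wider than a pure 4:3.
print(round(cents(tempered_fourth) - cents(4 / 3), 2))
print(tempered_tritone, math.sqrt(2))
```

Both differences come out to roughly 1.96 cents, and the six-semitone tritone matches the square root of two up to floating-point rounding.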
Side note on harmonics: the particular blend of harmonics (how strong each particular harmonic is relative to the others) is what gives every instrument its timbre. This is also how your brain is able to identify vowels. Your brain is so intent on analyzing harmonics that it will psychoacoustically create ghost root notes. For example, if you play a C and a G above, you will sometimes "hear" another C an octave below, because these are the first two harmonics of that low note. This is the concept behind throat singing.
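One crude way to picture the ghost root is as the greatest common divisor of the frequencies being played. This is a simplification of what the ear actually does, and the frequencies below (264 Hz and 396 Hz for a C and the G a pure fifth above it) are illustrative values I picked, not measurements.

```python
import math

def implied_root(low_hz: int, high_hz: int) -> int:
    """Largest frequency of which both notes are whole-number multiples."""
    return math.gcd(low_hz, high_hz)

c_hz, g_hz = 264, 396  # assumed example: a C and the G a 3:2 fifth above
root_hz = implied_root(c_hz, g_hz)

# Position of each played note in the harmonic series of the implied root.
print(root_hz, c_hz // root_hz, g_hz // root_hz)
```

With these values the implied root is 132 Hz, the C an octave below, and the two played notes sit at 2 and 3 times that frequency.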