4.1: An Introduction to Music Analysis with Computer
Andrew R. Brown
Note: A revised version of this article was published in 2007 as a chapter in the book "Computers in Music Education: Amplifying Musicality".
The computer is widely used as an analytical tool, most obviously in business, where spreadsheets are routinely used to track and forecast financial and inventory details. While there is a small group of dedicated musicologists working with computing tools for musical analysis, the computer is more commonly used by musicians for sequencing, recording, or publishing. Musical analysis can be aided by the computer using those same sequencing, recording and publishing tools to capture and examine music in familiar notations. Statistical analysis of music can be achieved by converting music to numbers and utilising the comprehensive resources of spreadsheets. Specialised programs for music analysis do exist, some of which will be mentioned below. They perform specific functions directly related to musicological theories. This article introduces the issues and tools of musical analysis with computers, exploring the potential and sign-posting the pitfalls.
Analysis of what, and how?
Analysing music is generally a search for features. What key is the piece in? Where does it modulate? How and when does the tempo change? One of the most significant feature searches involves finding patterns. Which pitches regularly coincide? When does a phrase reappear? What are the most common rhythms? In order to make features like these apparent, a variety of strategies can be used in analysis.
A common technique is reduction: limiting notes to a pitch class set, for example, or consolidating phrases into larger forms, or dissolving surface features into higher order structures, as in Schenkerian analysis. Another technique of analysis is to change the mode of representation: from audible to visual, as in transcription, or from one visual notation to another, such as changing note duration to a 'piano roll' display.
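As a small illustration of reduction to a pitch class set, the sketch below assumes the notes are already available as MIDI note numbers (60 is middle C) and discards octave, order, and repetition. The function name is hypothetical and is not taken from any analysis package.

```python
def pitch_class_set(midi_pitches):
    """Reduce MIDI note numbers to a sorted list of distinct pitch classes.

    Pitch class 0 is C, 1 is C sharp / D flat, and so on up to 11 (B).
    """
    return sorted({pitch % 12 for pitch in midi_pitches})


# A C major arpeggio spread over two octaves, plus a single D
melody = [60, 64, 67, 72, 76, 62]
print(pitch_class_set(melody))  # [0, 2, 4, 7]
```

The result keeps only which pitch classes are present, which is exactly the kind of detail-for-clarity trade that reduction involves.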
Music analysis generally focuses on some aspect(s) of music while ignoring others. Isolating an element can bring to light particular features which might otherwise be difficult to observe in the full context. This process is similar to creating a map. Maps can show different features depending upon their function, which might be geographic, political, or climatic. Maps might or might not indicate population centres or transportation routes. In the same way, a musical analysis is a map of the music, which highlights some features while it obscures others. A musical analysis might focus on any one of structure, rhythm, timbre, texture, melody, tonality, harmony, dynamics, expression, interpretation, spatialisation, context, text, or any other feature which the investigator considers important.
The significance of particular elements, or features, will not always be obvious. Choosing elements to examine is easier if we know precisely what is sought. If looking for structural boundaries, then cadences or rhythmic aspects would probably be appropriate. Often the analysis question is more vague, such as seeking to uncover the composer's processes, or perhaps even the composer's intention. Perhaps it seeks to uncover what defines a style or performance practice. In these cases a systematic exploration of several aspects of the music is likely. One of the most obscure, but perhaps most profound, cases might be an analysis which seeks to reveal the aspects of a piece which make it meaningful. In this realm the significance of elements is not at all obvious, and non-sonic elements such as social context become quite influential.
Can the machine 'listen'?
When people analyse they look, listen, think, discuss, and describe. These are not attributes we normally ascribe to computers, but one way or another, if the computer is to help in analysis it needs to have musical input, to process that data, and to communicate the results. The computer's equivalent of hearing is recording, or sampling. When these recordings are converted into amplitude and frequency analysis over time they tell us a great deal about the sound we are hearing. However, much analysis of music is done at a symbolic, or iconic, level where we talk about notes, phrases, parts, and voices. It is currently awkward for computers to extrapolate common music notation (CMN) from a recording. As a result, scores are often entered in various forms of code with similar properties to CMN, such as the Musical Instrument Digital Interface (MIDI) description.
The computer's equivalent of music reading is to have the music coded into a notational scheme. Considerations for such a scheme include readability, coherence with perceptual categorisation, and integration with historical analytic techniques. Another important consideration is that with any coding scheme we lose detail. As Robert Rowe points out,
The transformations of musical information made by listening systems are not lossless. In other words, the abstractions made do not produce a representation that can be transformed back into the original signal, as, for example, a Fourier analysis can be changed back into sound pressure wave. The abstraction of MIDI already throws away a good deal of timbral information about the music it represents. (Rowe 1993:120)
The computer is a medium of simulation. One of the most powerful aspects of the computer is its ability to model; to be like paper, like film, like a record, and so on. This notion underpins why the spreadsheet is a powerful tool for business, because it can model the financial and inventory aspects of the business, both actual and projected. While computers can offer the same power of modelling to the music analyst as they do to the business person, numbers in a spreadsheet correspond to a business's financial position more closely than numbers correspond to music. The blocks of construction are not always suitable as blocks of perception, and vice versa. A MIDI sequence only approximates the performance nuance that generated it, ignoring dynamic changes within a note's duration for example, and common music notation (CMN) is even more abstract.
Encoding the music for entry into the computer is, technically speaking, digitising, although this term usually refers to the sampling of an audio signal. Data entry is normally at a more abstract level; for example, music can be represented by alpha-numeric codes, by arranging visual symbols, or by writing nested 'sentences' of symbols to represent structural relationships. In this section of the article, music representations used for music analysis are presented as being iconic, symbolic, grammatical, or metaphoric.
The limitation of musical encoding is not the only obstacle to computer-assisted analysis. Music is more than the sum of its parts, so even if all its features could be captured, the analysis of emotive aspects such as tension, excitement, embarrassment, and so on is still problematic.
Computer Assisted Analysis
Common music notation is an iconic system. Notational marks indicate both audible (notes, accents) and non-sounding (barlines, key signatures) aspects of music. A great deal of music analysis deals with the score, so for the computer the score needs to be encoded. Once encoded, the score can be processed by the computer. Two common coding schemes are MIDI and DARMS.
MIDI (Musical Instrument Digital Interface) was designed for performance and deals with music as a series of events. In MIDI, a note begins with a note-on message which indicates the channel (1-16), the pitch number in semitones (0-127, where 60 is middle C), and the dynamic, or velocity (1-127). A note ends either with a dedicated note-off message or, very commonly, with a note-on message whose velocity is zero, indicating that the note should stop. A stream of MIDI messages, each a group of numbers, represents the notes in a piece. There is no time information inherent in the MIDI messages, so the computer program must time-stamp the arrival of each message to determine the duration and spacing of notes. Other events, such as pitch bend or volume change, have their own messages.
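The time-stamping step can be sketched as follows. This is a minimal illustration, not a MIDI parser: it assumes the receiving program has already stored each note message as a hypothetical tuple of (arrival time in seconds, channel, pitch, velocity), and it treats a velocity of zero as the end of a note.

```python
def pair_notes(events):
    """Turn time-stamped note messages into notes with durations.

    events: (time_seconds, channel, pitch, velocity) tuples in arrival order
            (an assumed format chosen for this example).
    Returns (onset, duration, channel, pitch, velocity) tuples sorted by onset.
    """
    sounding = {}  # (channel, pitch) -> (onset time, velocity) of the open note
    notes = []
    for time, channel, pitch, velocity in events:
        key = (channel, pitch)
        if velocity > 0:
            # Note starts: remember when it arrived and how loud it was
            sounding[key] = (time, velocity)
        elif key in sounding:
            # Note ends: duration is the gap between the two time-stamps
            onset, onset_velocity = sounding.pop(key)
            notes.append((onset, time - onset, channel, pitch, onset_velocity))
    return sorted(notes)


events = [
    (0.0, 1, 60, 90),  # middle C starts
    (0.5, 1, 60, 0),   # middle C stops
    (0.5, 1, 64, 80),  # E starts
    (1.5, 1, 64, 0),   # E stops
]
for note in pair_notes(events):
    print(note)
# (0.0, 0.5, 1, 60, 90)
# (0.5, 1.0, 1, 64, 80)
```

Notes that never receive a stop message are simply left out of the result, a choice a fuller program would probably want to report rather than ignore.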
DARMS (Digital Alternate Representation of Musical Scores) is one of the oldest digital music codes, and is widely used for computing in musicology. As its name implies, DARMS was designed to describe the visual appearance of a score. It has codes indicating beams, stem lengths, clefs, and so on. While at first this may seem excessive for analysis, there are times when visual details are important to the interpretation of a score. For example, the spelling of an accidental can make a change of key more evident. DARMS was designed for computer keyboard input, and employs a library of alpha-numeric symbols for each element of the score: !G (treble clef), !F (bass clef), !K2- (a key signature with two flats), and so on. Notes are described by their position on a stave. The bottom line of a stave is position 1, the space above that 2, the next line 3, and so on. The space-code for middle C on a treble clef is -1. The duration of notes is indicated by W (whole note), H (half note), and so on.
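The stave-position scheme described above can be made concrete with a short sketch. It is not a DARMS parser; it only converts a space-code to a letter name and octave, assuming scientific octave numbering (middle C is C4) and ignoring accidentals and key signatures. The function and table names are invented for the example.

```python
STEP_NAMES = "CDEFGAB"

# Diatonic index (octave * 7 + step within the octave) of the bottom stave line
BOTTOM_LINE = {
    "G": 4 * 7 + 2,  # treble clef: bottom line is E4
    "F": 2 * 7 + 4,  # bass clef: bottom line is G2
}


def space_code_to_pitch(code, clef="G"):
    """Convert a stave position (bottom line = 1) to a note name such as 'C4'."""
    index = BOTTOM_LINE[clef] + (code - 1)  # one diatonic step per position
    octave, step = divmod(index, 7)
    return f"{STEP_NAMES[step]}{octave}"


print(space_code_to_pitch(-1))       # C4 (middle C below the treble stave)
print(space_code_to_pitch(1))        # E4 (bottom line of the treble stave)
print(space_code_to_pitch(11, "F"))  # C4 (middle C above the bass stave)
```

Because the code records position rather than sounding pitch, the same space-code yields a different note under a different clef, which is the visual orientation the text describes.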
Most publishing and sequencing software programs can be used for score analysis. Sequencing and scoring programs enable the score to be heard. They allow aspects of a piece to be isolated by muting and soloing parts, and for alterations to be experimented with. However, the searching and finding capabilities are usually quite limited, and generally there will be little advantage over a printed score. Two programs which are intended for analysis of coded scores are Humdrum and SARA.
David Huron's Humdrum program is one of the most widely used computing tools for score analysis because it is very flexible and open. However, encoding a score can be time consuming if the work is not one of the roughly 9,000 already coded works available from the Centre for Computer Assisted Research in the Humanities. Conversion between MIDI files and Humdrum is possible. Humdrum is divided into two parts: 1) the Humdrum syntax, which defines how musical data is to be organised, and 2) the Humdrum toolkit of software functions for analysing the data. The toolkit functions include pattern matching, user-defined similarity checking, and statistical measurement of any coded attribute.
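To give a feel for what pattern matching on a coded score involves, here is a generic sketch in Python. It does not use Humdrum or its syntax; it assumes a melody is held as a list of MIDI note numbers and looks for a motif at any transposition by comparing successive intervals. All names are hypothetical.

```python
def intervals(pitches):
    """Successive intervals in semitones between neighbouring notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_pattern(melody, pattern):
    """Return the start indices where the pattern occurs at any transposition."""
    if len(pattern) < 2:
        raise ValueError("pattern needs at least two notes")
    target = intervals(pattern)
    source = intervals(melody)
    n = len(target)
    return [i for i in range(len(source) - n + 1) if source[i:i + n] == target]


# A four-note motif stated on C, then repeated a fifth higher on G
melody = [60, 62, 64, 60, 67, 69, 71, 67]
motif = [60, 62, 64, 60]
print(find_pattern(melody, motif))  # [0, 4]
```

Comparing intervals rather than absolute pitches is one design choice among many; matching on rhythm or contour would follow the same shape with a different attribute.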
Symbols imply a relationship between two objects. Symbols can be diagrammatic, like the guitar chord diagrams found on sheet music showing the finger position on the frets. Visual representation of sound is commonly used for timbral and dynamic analysis. The conventional waveform diagram often shown on computers simply translates each digit (sample) of a recorded (sampled) sound to a position on a graph. The waveform image rises and falls reflecting the sound's change in air pressure over time.
Statistical features of music can be usefully visualised using graphs. Graphs are useful in showing trends; for example, a graph of how often each pitch class occurs can suggest the harmonic tendencies of a piece.
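A count of pitch-class occurrences, the kind of table one might otherwise build in a spreadsheet, can be sketched in a few lines. The example assumes MIDI note numbers as input and uses sharp spellings only; the text bar chart stands in for a real graph.

```python
from collections import Counter

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_histogram(midi_pitches):
    """Count how often each pitch class occurs, omitting those that never do."""
    counts = Counter(pitch % 12 for pitch in midi_pitches)
    return {NAMES[pc]: counts[pc] for pc in range(12) if counts[pc]}


def text_bar_chart(histogram):
    """Render the counts as one row of '#' marks per pitch class."""
    return "\n".join(f"{name:<2} {'#' * n} {n}" for name, n in histogram.items())


melody = [60, 62, 64, 65, 67, 67, 64, 60]
histogram = pitch_class_histogram(melody)
print(histogram)  # {'C': 2, 'D': 1, 'E': 2, 'F': 1, 'G': 2}
print(text_bar_chart(histogram))
```

Here the weight on C, E and G hints at a C major orientation, though a count alone ignores duration and metrical position.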
Graphical symbols can be used to display pieces which may not lend themselves to conventional notation, in score-like fashion. A significant example of this usage is the mapping of sound-objects in Musique Concrète and electronic music. In such representations each visual symbol corresponds to a sound, and visual manipulations of the symbol, such as stretching or splicing, indicate similar changes to the sound object. This method was used by Pierre Schaeffer (1966) to articulate his theories of sonemes (sound objects).
The analysis of sampled audio is becoming common for electroacoustic music and ethnomusicology where there are no 'scores.' Waveforms provide overall amplitude and timbre information, and can be used to calculate timing data through comparison of onset peaks. Many programs (even basic sequencers) display audio waveforms.
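A very rough sketch of extracting timing data from a waveform follows. It assumes the audio is a list of sample values between -1.0 and 1.0, and it marks an onset wherever the peak level of a short frame rises above a threshold after a quieter frame. The `frame_size` and `threshold` values are illustrative assumptions; practical onset detectors are considerably more refined.

```python
def detect_onsets(samples, sample_rate, frame_size=100, threshold=0.1):
    """Return onset times in seconds, found by frame-wise amplitude thresholding."""
    onsets = []
    was_loud = False
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        level = max(abs(s) for s in frame)  # peak amplitude within this frame
        is_loud = level >= threshold
        if is_loud and not was_loud:
            # The level has just crossed the threshold: treat as a new onset
            onsets.append(start / sample_rate)
        was_loud = is_loud
    return onsets


# Synthetic test signal at 1000 samples per second: two bursts separated by silence
signal = [0.0] * 200 + [0.5] * 100 + [0.0] * 300 + [0.5] * 100
print(detect_onsets(signal, sample_rate=1000))  # [0.2, 0.6]
```

The spacing between successive onset times gives the inter-onset intervals from which rhythm and tempo can be estimated.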
More detailed timbral analysis is possible using Fourier analysis, which identifies the frequency of each harmonic and its amplitude over time. These are often displayed as 2D or 3D graphs or sonograms that can allow detailed comparison of instrument, ensemble, or architectural characteristics. A number of inexpensive audio editors, such as SoundEdit, support Fourier analysis, and AudioSculpt from Ircam has a very clear sonogram display. Spectral composers make use of Fourier analysis to generate pitch (frequency) sets for their compositions. This enables them to use a sound as a pitch source as well as a timbral component.
McAulay-Quatieri (MQ) analysis is a robust general sinusoidal analysis technique. Not unlike Fourier analysis, MQ analysis isolates the overtones in a sound; however, it usually displays them as horizontal bands over time. MQ analysis is often used to analyse sounds with a complex harmonic structure, or where multiple sound sources need to be differentiated. A number of audio processing toolkits, such as Lemur, implement MQ analysis.
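The idea behind Fourier analysis can be shown with a deliberately naive discrete Fourier transform written with the Python standard library only. It analyses a single block of samples rather than tracking change over time, and the test tone is chosen (as an assumption of the example) so that its partials fall exactly on analysis bins. Real tools use the much faster FFT together with windowing.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def strongest_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the constant (0 Hz) bin."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / len(samples)


# A 100 Hz tone with a weaker partial at 300 Hz, sampled 800 times per second
sample_rate = 800
tone = [
    math.sin(2 * math.pi * 100 * t / sample_rate)
    + 0.3 * math.sin(2 * math.pi * 300 * t / sample_rate)
    for t in range(200)
]
print(strongest_frequency(tone, sample_rate))  # 100.0
```

Listing every bin whose magnitude exceeds some level, rather than only the strongest, would give the kind of frequency set that can be reused as pitch material.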
Rather than examining the sounds themselves, it is often interesting to analyse the structural organisation of those sounds. Musical structures can be seen as similar to language structure. A music grammar describes rules for how music is commonly structured, and analysts apply those rules to pieces to uncover their structure. Such analysis is generally represented as a branching hierarchy, like a family tree. Two grammar-like analysis theories are Schenkerian analysis and the Generative Theory of Tonal Music (GTTM).
Schenkerian analysis is based on the work of Heinrich Schenker who viewed music as having layers of structure; the foreground in which all notes and ornaments are considered, the middle ground containing just the harmonically pivotal notes, and the background where any 'well composed' tonal piece was reducible to one of three patterns based on the tonic scale and triad. The process of analysis is a reductive one of uncovering fundamental structure (Ursatz). A Schenkerian analysis is visually depicted with a score of each layer drawn one under another, each showing only the notes relevant to that layer.
GTTM is, according to its developers, designed "to specify a structural description for any tonal piece; that is, the structure that the experienced listener infers in his hearing of the piece" (Lerdahl & Jackendoff, 1983:112). The theory is designed to infer the kinds of structures that we hear as listeners, rather than purely statistical features. There are four sets of rules: metrical and grouping rules to analyse rhythmic features, and time-span and prolongational rules to examine interaction between pitch and rhythm. The result is a multilevelled reduction of score detail through the rejection of all notes except those at structurally significant points in the score.
David Cope's EMI (Experiments in Musical Intelligence) program is famous for analysing music and reconstructing it in the style of various composers. Cope has made available a cut-down version called SARA, which is included on the CD-ROM of his book 'Experiments in Musical Intelligence.' SARA, like Humdrum, requires the score to be coded into a particular format. Each note entry contains a list including start-time, pitch (MIDI number), duration, channel number (part), and dynamic. SARA performs pattern matching processes on the data, producing a layer analysis not dissimilar to Schenkerian structural reductions. SARA identifies features which characterise the music, which Cope calls 'signatures.' The user can specify the pattern matching latitude of various parameters.
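The note-entry fields and the notion of matching latitude can be illustrated with a sketch. This is not SARA's file format or algorithm: the `NoteEvent` record, the tick-based time unit, and the `pitch_latitude` parameter are assumptions made for the example, which simply allows each interval to differ by a chosen number of semitones.

```python
from typing import NamedTuple


class NoteEvent(NamedTuple):
    start: int     # start time (assumed here to be in ticks)
    pitch: int     # MIDI note number
    duration: int  # length in the same time unit
    channel: int   # part number
    dynamic: int   # loudness, 1-127


def interval_list(notes):
    """Successive pitch intervals in semitones."""
    return [b.pitch - a.pitch for a, b in zip(notes, notes[1:])]


def matches(candidate, pattern, pitch_latitude=0):
    """True if every interval is within pitch_latitude semitones of the pattern's."""
    if len(candidate) != len(pattern):
        return False
    return all(
        abs(c - p) <= pitch_latitude
        for c, p in zip(interval_list(candidate), interval_list(pattern))
    )


pattern = [NoteEvent(0, 60, 480, 1, 80), NoteEvent(480, 64, 480, 1, 80),
           NoteEvent(960, 67, 480, 1, 80)]    # rising major triad
candidate = [NoteEvent(0, 62, 480, 1, 70), NoteEvent(480, 65, 480, 1, 70),
             NoteEvent(960, 69, 480, 1, 70)]  # rising minor triad

print(matches(candidate, pattern))                    # False
print(matches(candidate, pattern, pitch_latitude=1))  # True
```

Widening the latitude lets near variants of a figure count as the same pattern, at the cost of more false matches.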
Another, perhaps less academic, method of analysis is the description of music through analogy, or metaphor. Music can be considered to 'flow like a river', 'spiral downwards like a staircase', and so on. Music can be described as 'mechanical', 'organic', 'spacious', or 'dense'. While these terms might describe individual elements of the music, they are more likely to relate to the way several elements work together, or how the music is interpreted at an emotive, rather than functional level.
Even from this brief introduction, it is clear that the computer can contribute in many ways to musical analysis. With tools ranging from the flexible audiation of scores in sequencers, to detailed statistical analysis in Humdrum, computers can help investigators at any level of sophistication. Whether analysis isolates the elements of music or seeks to understand music in context, whether it is seen as purely theoretical or as an extension of music practice, the careful study of music is part of that reflective process which creates understanding and leads to meaning. Computers have an increasingly important place in that reflective analysis, but do not change the fact that the more we know through our investigations, the more we see is still unknown. As we grasp the music with our analytical tools it, like snow, disintegrates and melts through our fingers. But computers do help increase our grasp.
Centre for Computer Assisted Research in the Humanities. Braun Music Center, Stanford University, CA 94305-3076. http://ccrma-www.stanford.edu/CCARH
Lerdahl, F., and R. Jackendoff. 1983. A Generative Theory of Tonal Music. Cambridge, MA: MIT Press.
Rowe, R. 1993. Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: MIT Press.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Seuil.