Phonetics and Phonology

Phonetics studies the physical reality of speech sounds: how they are produced by the vocal tract, how they propagate as acoustic waves, and how the auditory system perceives them. Phonology studies the abstract sound systems that organize those physical events into the contrastive categories a given language exploits. The two fields share data but ask different questions: phonetics asks “what is there?”; phonology asks “what counts as the same or different in this language?”

The Phonetics / Phonology Distinction

A pair of acoustic events can be phonetically distinct (measurably different in formants, duration, voice onset time) yet phonologically identical — the language treats them as the same sound. Conversely, two events that look nearly identical on a spectrogram can be phonologically contrastive if substituting one for the other changes the meaning of a word. English aspirated [pʰ] in pin and unaspirated [p] in spin are phonetically distinct but phonologically the same phoneme /p/ — they are allophones in complementary distribution. Hindi treats aspirated and unaspirated stops as distinct phonemes: pal “moment” vs phal “fruit.”

The diagnostic for phonemic status is the minimal pair: two words differing in exactly one segment, with different meanings. English pat / bat establishes /p/ and /b/ as separate phonemes. By contrast, pat / spat is not a minimal pair, and no pair of English words differs only in [pʰ] versus [p], so aspiration is not contrastive in English.
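The segmental half of the minimal-pair test can be sketched in a few lines of Python. This is an illustration only: the function name is_minimal_pair and the hand-segmented transcriptions are assumptions, and the requirement that the two words differ in meaning still has to be checked by the analyst.

```python
def is_minimal_pair(a, b):
    """Return True if two segment lists have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

# Hand-segmented broad transcriptions (assumed for illustration).
pat = ["p", "æ", "t"]
bat = ["b", "æ", "t"]
spat = ["s", "p", "æ", "t"]

print(is_minimal_pair(pat, bat))   # same length, one differing segment
print(is_minimal_pair(pat, spat))  # different lengths, so not a minimal pair
```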

Articulatory Phonetics

Articulatory phonetics describes sound production in terms of the vocal tract — the supralaryngeal cavities (oral, nasal, pharyngeal) and the larynx with its vocal folds. Three independently controlled parameters generate most consonants: place of articulation (where in the vocal tract the airflow is constricted), manner of articulation (how it is constricted), and voicing (whether the vocal folds vibrate).

Places of Articulation (POA)

From front to back: bilabial (both lips, [p b m]), labiodental (lower lip + upper teeth, [f v]), dental (tongue tip + teeth, [θ ð] as in English thin, this), alveolar (tongue + alveolar ridge, [t d n s z l]), postalveolar (just behind the alveolar ridge, [ʃ ʒ tʃ dʒ] as in ship, measure, church, judge), retroflex (tongue tip curled back, [ʈ ɖ ɳ ʂ ʐ ɻ], characteristic of South Asian and some Australian languages), palatal (tongue body + hard palate, [c ɟ ɲ ʝ j]), velar (tongue body + soft palate, [k g ŋ x ɣ]), uvular (tongue back + uvula, [q ɢ ɴ χ ʁ], as in French r or Arabic qaf), pharyngeal (root of tongue + pharyngeal wall, [ħ ʕ], characteristic of Semitic languages), and glottal (vocal folds themselves as articulators, [ʔ h]).

Manners of Articulation (MOA)

  • Stop / plosive — complete closure followed by release: [p t k b d g ʔ]
  • Fricative — narrow constriction producing turbulent airflow: [f v θ ð s z ʃ ʒ x ɣ h]
  • Affricate — stop released into homorganic fricative: [tʃ dʒ ts dz]
  • Nasal — oral closure with velum lowered, air through nose: [m n ɲ ŋ ɴ]
  • Lateral — central closure with air over the sides of the tongue: [l ɭ ʎ ʟ]
  • Approximant — narrowing insufficient to cause turbulence: [j w ɹ ɻ ɰ]
  • Trill — multiple rapid closures: Spanish rr [r], French uvular [ʀ]
  • Tap / flap — single rapid contact: Spanish r [ɾ], American English intervocalic t in butter

Vowels

Vowels lack a constriction tight enough to cause turbulence; they are classified by tongue height (close / high, close-mid, open-mid, open / low), backness (front, central, back), and rounding of the lips. Daniel Jones (1881–1967) formalized the cardinal vowels: eight reference points defining the vowel space, [i e ɛ a ɑ ɔ o u] (primary cardinals), plus rounded / unrounded counterparts (secondary). Additional features include nasalization (French bon [bɔ̃]), length (Finnish tuli “fire” vs tuuli “wind”), and tongue-root position, Advanced Tongue Root [+ATR] vs Retracted Tongue Root [−ATR], which is central to West African vowel harmony systems (Akan, Yoruba). Diphthongs combine two vowel qualities within a single syllable nucleus (English eye [aɪ], cow [aʊ], boy [ɔɪ]).

The International Phonetic Alphabet (IPA)

The IPA, maintained by the International Phonetic Association (founded 1886, alphabet first published 1888, last revised 2020), aims to provide a distinct symbol for every contrastive sound in human languages. The current chart contains 107 base letters, 52 diacritics (aspiration ʰ, nasalization ◌̃, palatalization ʲ, velarization ˠ, dental ◌̪, retracted ◌̠, voiceless ◌̥, etc.), 4 prosodic markers (primary stress ˈ, secondary stress ˌ, long ː, half-long ˑ), and tone diacritics spanning level (˥˦˧˨˩) and contour tones. The guiding principle is one symbol per distinctive sound, with diacritics adding finer phonetic detail, so that a trained transcriber can write what they hear with minimal ambiguity.

Acoustic Phonetics

Acoustic phonetics measures the speech signal itself: the pressure waveform a microphone records. The standard tool is Praat (Paul Boersma and David Weenink, University of Amsterdam, 1995–present), a free cross-platform application that displays waveforms, spectrograms, pitch tracks, and intensity contours and computes formants.

A spectrogram plots frequency (vertical, 0–5000 Hz typically displayed) against time (horizontal), with intensity coded as darkness. Vowels appear as dark horizontal bands, the formants F1, F2, F3, F4, corresponding to resonances of the vocal tract. F1 correlates inversely with tongue height (low F1 ≈ 300 Hz for high vowels; high F1 ≈ 800 Hz for low vowels); F2 correlates inversely with backness (high F2 ≈ 2300 Hz for front vowels; low F2 ≈ 900 Hz for back vowels). The classic Peterson-Barney 1952 measurements established adult male reference values: [i] ≈ F1 270, F2 2290; [ɑ] ≈ F1 730, F2 1090; [u] ≈ F1 300, F2 870.
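As a rough illustration of how F1 and F2 locate a vowel, the sketch below assigns a measured (F1, F2) pair to the nearest of the three reference vowels quoted above. The function name nearest_vowel and the use of plain Euclidean distance in Hz are assumptions made for brevity; real analyses usually normalize across speakers and use a perceptual frequency scale.

```python
import math

# Adult male reference values in Hz, as quoted in the text from Peterson and Barney (1952).
REFERENCE = {
    "i": (270, 2290),
    "ɑ": (730, 1090),
    "u": (300, 870),
}

def nearest_vowel(f1, f2):
    """Return the reference vowel closest to (f1, f2) by Euclidean distance in Hz."""
    return min(REFERENCE, key=lambda v: math.dist((f1, f2), REFERENCE[v]))

print(nearest_vowel(280, 2200))  # low F1, high F2: a high front vowel
print(nearest_vowel(700, 1100))  # high F1: a low vowel
```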

Voice onset time (VOT, Lisker and Abramson 1964) measures the lag between the release of a stop consonant and the onset of vocal-fold vibration in the following vowel. Three categories cover most languages: prevoiced (negative VOT, voicing starts before release; Spanish b d g), short-lag (roughly 0–30 ms; English b d g, Spanish p t k), and long-lag / aspirated (roughly >40 ms; English p t k, Hindi pʰ tʰ kʰ). Hindi is notable for a four-way laryngeal contrast that goes beyond VOT alone: voiced, voiceless unaspirated, voiceless aspirated, and breathy-voiced (murmured) stops [bʱ dʱ gʱ].
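The three-way VOT classification can be written as a simple threshold function. This is a minimal sketch using the approximate boundaries given above; the function name vot_category is an assumption, and values in the 30–40 ms gap between the stated ranges are reported as ambiguous rather than forced into a category.

```python
def vot_category(vot_ms):
    """Classify a voice onset time (in milliseconds) using the approximate ranges in the text."""
    if vot_ms < 0:
        return "prevoiced"   # voicing starts before the release
    if vot_ms <= 30:
        return "short-lag"
    if vot_ms > 40:
        return "long-lag"    # aspirated
    return "ambiguous"       # 30-40 ms falls between the stated ranges

for value in (-80, 15, 35, 70):
    print(value, vot_category(value))
```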

Other acoustic measures: fundamental frequency F0 (perceived as pitch; typical adult male ~120 Hz, female ~210 Hz, child ~300 Hz), harmonics-to-noise ratio (HNR), jitter and shimmer (cycle-to-cycle perturbation, raised in pathological voice), and spectral tilt (slope of spectrum, varies with phonation type — modal, breathy, creaky).

Auditory and Neural Perception

Speech reaches the cochlea, which performs a frequency analysis along the basilar membrane (Békésy Nobel 1961). The auditory nerve carries spike trains to the brainstem, midbrain, primary auditory cortex (Heschl’s gyrus), and higher-order regions including the superior temporal gyrus and the planum temporale. Categorical perception (Liberman, Cooper, Shankweiler, Studdert-Kennedy 1967) — listeners discriminate within-category acoustic differences poorly but discriminate cross-category differences sharply — is a hallmark of speech perception. The McGurk effect (McGurk and MacDonald 1976) shows that visual information from lip-reading reshapes auditory phoneme perception: audio [ba] dubbed onto video of [ga] is often heard as [da].

Phonemes, Allophones, Distinctive Features

A phoneme is a minimal contrastive sound unit of a language, a mental category that distinguishes words. Allophones are predictable phonetic variants of a phoneme conditioned by environment. English /t/ has allophones [tʰ] word-initial (top), [t] after /s/ (stop), [ɾ] intervocalic in American English (butter), [ʔ] before syllabic nasals (button), and [t̚] unreleased word-final (cat). The distribution is largely complementary: each allophone appears in environments where the others do not.

Distinctive features decompose phonemes into smaller phonetic primes. The first systematic feature theory, Preliminaries to Speech Analysis (Roman Jakobson, Gunnar Fant, Morris Halle 1952), proposed binary, acoustically defined features. The Sound Pattern of English (Noam Chomsky and Morris Halle 1968) recast and expanded them in largely articulatory terms: [±consonantal], [±sonorant], [±voice], [±nasal], [±continuant], [±strident], [±anterior], [±coronal], [±high], [±low], [±back], [±round], [±tense], etc. Each phoneme is a feature matrix. Natural classes (sets of segments sharing features) explain why phonological rules apply to coherent groups (e.g., all voiceless stops aspirate word-initially in English).
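One way to picture feature matrices and natural classes is as dictionaries of binary values queried by a partial specification. The sketch below is illustrative only: the ten-segment inventory, the three features chosen, and the function name natural_class are assumptions, not a full SPE feature system.

```python
# Toy feature matrices: True stands for "+", False for "-".
FEATURES = {
    "p": {"voice": False, "continuant": False, "nasal": False},
    "t": {"voice": False, "continuant": False, "nasal": False},
    "k": {"voice": False, "continuant": False, "nasal": False},
    "b": {"voice": True,  "continuant": False, "nasal": False},
    "d": {"voice": True,  "continuant": False, "nasal": False},
    "g": {"voice": True,  "continuant": False, "nasal": False},
    "s": {"voice": False, "continuant": True,  "nasal": False},
    "z": {"voice": True,  "continuant": True,  "nasal": False},
    "m": {"voice": True,  "continuant": False, "nasal": True},
    "n": {"voice": True,  "continuant": False, "nasal": True},
}

def natural_class(**spec):
    """Return every segment whose features match all values in the partial specification."""
    return {
        seg for seg, feats in FEATURES.items()
        if all(feats[name] == value for name, value in spec.items())
    }

# [-voice, -continuant] picks out the voiceless stops, the class that aspirates in English.
print(sorted(natural_class(voice=False, continuant=False)))
```

Fewer features in the specification yield a larger class, which is why rules stated over small feature bundles apply to coherent groups of segments.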

Phonological Rules and Frameworks

Classical generative phonology (SPE 1968) modeled phonology as ordered rewrite rules: A → B / C__D (“A becomes B between C and D”). For example, English plural allomorphy:

  • /z/ → [əz] / [+strident, +coronal]__# (epenthesis after sibilants, ordered first: bushes [bʊʃəz])
  • /z/ → [s] / [−voice]__# (devoicing after the remaining voiceless segments: cats [kæts])
  • /z/ → [z] elsewhere (dogs [dɒgz])
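The plural pattern can be sketched as an ordered sequence of checks. The sketch assumes that the sibilant case is tested before the voiceless case, because sibilants such as [ʃ] are themselves voiceless and would otherwise trigger devoicing. The segment sets and the function name plural are simplifications introduced here, covering only a small consonant inventory.

```python
# Simplified segment classes (assumed for illustration, not a full English inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}

def plural(stem):
    """Append the surface form of underlying /z/ to a stem given as a list of segments."""
    final = stem[-1]
    if final in SIBILANTS:      # checked first: epenthesis after sibilants
        return stem + ["ə", "z"]
    if final in VOICELESS:      # then devoicing after other voiceless segments
        return stem + ["s"]
    return stem + ["z"]         # elsewhere

for word in (["k", "æ", "t"], ["b", "ʊ", "ʃ"], ["d", "ɒ", "g"]):
    print("".join(plural(word)))
```

Swapping the first two checks would yield [bʊʃs] for bushes, which illustrates why rule ordering matters in this framework.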

Autosegmental phonology (John Goldsmith 1976 PhD thesis) split the phonological representation into parallel tiers — segmental, tonal, prosodic — linked by association lines. Tone in particular floats free of segments: in many Bantu languages a single H tone can spread across multiple syllables, or a falling tone can be analyzed as HL on one vowel. Autosegmental representations replaced the awkward feature [±high tone] of SPE.

Feature geometry (George Clements 1985, Elizabeth Sagey 1986) organized features into a tree of class nodes — root → laryngeal, supralaryngeal → place → labial, coronal, dorsal — capturing the empirical fact that some features pattern together in assimilation while others do not.

Optimality Theory (Alan Prince and Paul Smolensky 1993/2004 Optimality Theory: Constraint Interaction in Generative Grammar) replaced rules with violable, ranked constraints. The architecture has three components: GEN generates candidate outputs from an input; CON supplies the constraint set, which each language ranks in its own way; EVAL evaluates the candidates against that ranking. The optimal candidate surfaces: the one that does best on the highest-ranked constraint on which the candidates differ, no matter how many lower-ranked constraints it violates. Constraints divide into markedness (penalize structurally costly outputs: NoCoda, *VoicedObstruentCoda, Onset) and faithfulness (penalize departures from input: Max, Dep, Ident). Cross-linguistic variation reduces to constraint reranking. OT dominated phonological theory through the 2000s; variants include Harmonic Grammar (weighted constraints) and Stratal OT.
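Evaluation under strict ranking can be sketched by comparing violation profiles lexicographically. Everything specific here is an assumption made for illustration: GEN is replaced by a hand-written candidate list, syllable boundaries are marked with ".", and Max and Dep are approximated by segment counts rather than a true input-output correspondence.

```python
VOWELS = set("aeiou")

def no_coda(inp, cand):
    """One violation per syllable that ends in a consonant."""
    return sum(1 for syl in cand.split(".") if syl[-1] not in VOWELS)

def max_io(inp, cand):
    """Crude Max: number of input segments missing from the candidate."""
    return max(0, len(inp) - len(cand.replace(".", "")))

def dep_io(inp, cand):
    """Crude Dep: number of candidate segments absent from the input."""
    return max(0, len(cand.replace(".", "")) - len(inp))

def evaluate(inp, candidates, ranking):
    """Pick the candidate whose violation tuple, read in ranking order, is smallest."""
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))

candidates = ["pat", "pa", "pa.ta"]  # stand-in for GEN applied to /pat/

print(evaluate("pat", candidates, [max_io, dep_io, no_coda]))  # faithfulness on top
print(evaluate("pat", candidates, [no_coda, dep_io, max_io]))  # coda repaired by deletion
print(evaluate("pat", candidates, [no_coda, max_io, dep_io]))  # coda repaired by epenthesis
```

Tuple comparison in Python is lexicographic, so a single violation of a higher-ranked constraint outweighs any number of violations lower down, which is the strict-domination property. The three calls differ only in ranking, mirroring the idea that languages differ by reranking the same constraints.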

Markedness

A segment or structure is marked if it is rarer cross-linguistically, acquired later, lost first in language attrition, neutralized in weak positions, or implied by others (an inventory with /y/ implies /i/, /u/). Voiceless sonorants are marked relative to voiced; ejectives and clicks are marked relative to pulmonic egressive stops; CCC-onset syllables are marked relative to CV. Markedness asymmetries motivate both inventory typology (Lindblom 1986 dispersion theory) and constraint rankings in OT.

Syllable Structure, Sonority, Phonotactics

A syllable typically has structure (Onset)(Nucleus)(Coda), with the nucleus the obligatory peak (usually a vowel; sometimes a sonorant consonant, as in English bottle [bɒtl̩] or Czech vlk “wolf”). Onsets are preferred and in some languages required (Arabic); codas are often disfavored cross-linguistically: Mandarin allows only a small set of sonorant codas, and Polynesian languages such as Hawaiian allow none at all.

The sonority hierarchy orders segments by inherent loudness: vowels > glides > liquids > nasals > fricatives > affricates > stops. The sonority sequencing principle holds that syllables rise in sonority toward the nucleus and fall away from it — explaining why prim [pɹɪm] is a possible English word but rpim is not.
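The sonority sequencing principle can be checked mechanically once each segment has a sonority rank. The sketch below assumes a numeric scale following the hierarchy above and a small hand-picked segment inventory; the name obeys_ssp is hypothetical. It treats the most sonorous segment as the nucleus and requires a strict rise before it and a strict fall after it.

```python
# Sonority ranks following the hierarchy in the text (higher = more sonorous).
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,   # stops
    "tʃ": 2, "dʒ": 2,                                 # affricates
    "f": 3, "s": 3, "z": 3, "ʃ": 3,                   # fricatives
    "m": 4, "n": 4,                                   # nasals
    "l": 5, "ɹ": 5,                                   # liquids
    "j": 6, "w": 6,                                   # glides
    "ɪ": 7, "æ": 7, "a": 7,                           # vowels
}

def obeys_ssp(segments):
    """True if sonority rises strictly to a single peak and falls strictly after it."""
    values = [SONORITY[s] for s in segments]
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling

print(obeys_ssp(["p", "ɹ", "ɪ", "m"]))  # prim
print(obeys_ssp(["ɹ", "p", "ɪ", "m"]))  # rpim
```

A strict check of this kind rejects English /s/ + stop onsets such as the one in string, a well-known exception that any fuller model has to handle separately.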

Phonotactics are the language-specific constraints on permissible sequences. English allows [stɹ] onset (string) but not [tn] or [pf]; German allows [pf] (Pfeffer); Russian allows [vstr] (встретить vstretit’ “to meet”); Hawaiian allows neither consonant clusters nor codas, so loanwords adapt aggressively (Merry Christmas → Mele Kalikimaka).

Stress, Meter, Intonation

Stress is relative prominence of one syllable over others, realized by some combination of pitch, duration, intensity, and vowel quality. Languages divide into fixed-stress (Finnish initial, French final, Polish penultimate, Czech initial) and variable / lexical stress (English, Russian, Spanish). Metrical phonology (Liberman and Prince 1977, Hayes 1995 Metrical Stress Theory) builds binary feet (trochaic SW, iambic WS) and parses syllables into prosodic words and phrases.

Intonation — the pitch contour over an utterance — conveys phrasing, focus, sentence type, and affect. The Autosegmental-Metrical framework (Pierrehumbert 1980 MIT PhD) and the ToBI annotation system (Tones and Break Indices, Silverman et al. 1992) describe intonation as sequences of pitch accents (H*, L*, L+H*, H+!H*) and boundary tones (L%, H%) linked to a prosodic hierarchy.

Tone

Roughly half of the world’s languages use lexical tone: pitch differences that distinguish words. Mandarin Chinese has four contrasting tones plus a neutral tone: tone 1 high level (mā 媽 “mother”), tone 2 mid-rising (má 麻 “hemp”), tone 3 low-dipping (mǎ 馬 “horse”), tone 4 high-falling (mà 罵 “scold”). Cantonese has six tones, three of them level and three contour; Vietnamese has six tones distinguished partly by phonation type (creaky in tone ngã, breathy in huyền); Hmong has up to eight tones. Tone systems divide into register (level tone heights, e.g., Bantu) and contour (tonal shapes, e.g., East Asian). Tonogenesis, the development of lexical tone, typically arises from the loss of voicing, glottalization, or coda consonants that previously perturbed pitch (Haudricourt 1954 on Vietnamese; Hombert, Ohala, Ewan 1979 general theory).

Vowel and Consonant Harmony

Vowel harmony requires vowels within a word to share some feature. Turkish has both backness harmony (front/back) and rounding harmony: ev “house” → ev-ler-im-iz-de “in our houses” (all front, unrounded); kol “arm” → kol-lar-ım-ız-da “in our arms” (all back; rounding harmony targets only high suffix vowels, and here they follow unrounded a, so they surface as unrounded ı; compare kol-um “my arm,” where the high vowel rounds after o). Finnish and Hungarian have backness harmony with neutral vowels (i, e in Finnish). Akan (Ghana) and Maasai (Kenya/Tanzania) have ATR harmony: words are entirely [+ATR] or entirely [−ATR]. Consonant harmony (long-distance assimilation of consonants for place or other features) is rarer but attested in Navajo (sibilant harmony) and Chumash.
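Backness harmony in the Turkish plural suffix can be sketched as a lookup on the last stem vowel. This is a simplified illustration: the function name turkish_plural is hypothetical, and the sketch ignores disharmonic loanwords and other exceptions.

```python
FRONT = set("eiöü")
BACK = set("aıou")

def turkish_plural(stem):
    """Attach -ler after a front last vowel and -lar after a back one."""
    last_vowel = next(ch for ch in reversed(stem) if ch in FRONT | BACK)
    return stem + ("ler" if last_vowel in FRONT else "lar")

print(turkish_plural("ev"))   # front-vowel stem
print(turkish_plural("kol"))  # back-vowel stem
```

Only the last stem vowel is inspected because harmony propagates from left to right: each suffix agrees with the vowel immediately before it.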

Reduplication

Reduplication, the repetition of part or all of a base, is productive in many languages: Malay orang “person” → orang-orang “people” (full reduplication); Tagalog basa “read” → ba-basa “will read” (CV-reduplication for future), as distinct from the infixed form b-um-asa “to read,” which involves no copying; Salishan languages use multiple reduplication patterns for diminutive, plural, distributive, and iterative. Theoretical treatment moved from rule-based template-filling (Marantz 1982) to correspondence-theoretic OT (McCarthy and Prince 1995 Faithfulness and Reduplicative Identity).
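The two copying patterns mentioned above can be sketched as string operations. The function names and the five-letter vowel set are assumptions, and the partial pattern is simplified to "copy everything up to and including the first vowel," which matches CV-reduplication only for bases that begin with a single consonant.

```python
VOWELS = set("aeiou")

def full_reduplicate(base):
    """Full reduplication: repeat the whole base, joined with a hyphen."""
    return f"{base}-{base}"

def cv_reduplicate(base):
    """Partial reduplication: prefix a copy of the base up to and including its first vowel."""
    for i, ch in enumerate(base):
        if ch in VOWELS:
            return base[: i + 1] + base
    return base  # no vowel found: leave the base unchanged

print(full_reduplicate("orang"))
print(cv_reduplicate("basa"))
```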

Historical Sound Change

Phonology connects to history through systematic sound change. Grimm’s Law (Jacob Grimm 1822, building on Rasmus Rask 1818) describes the Proto-Indo-European to Proto-Germanic stop shift: PIE voiceless stops became Germanic voiceless fricatives (PIE *p t k → Gmc f θ h, hence Latin pater, Sanskrit pitár, but English father); PIE voiced stops became Germanic voiceless stops (*d → t: Latin decem, English ten); PIE voiced aspirates became Germanic voiced stops or fricatives (*bʰ → b: Sanskrit bhárāmi, Latin ferō, English bear).
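Because Grimm’s Law is a chain shift, its three parts have to apply simultaneously: a *b that becomes p must not then go on to become f. A single dictionary lookup per segment captures this. The sketch assumes hand-segmented input, keeps voiced aspirates as single units, maps them to plain voiced stops only (not the fricative variants), and ignores all conditioning environments, including those later covered by Verner’s Law.

```python
# PIE stop -> Proto-Germanic reflex, applied in one pass so that outputs never feed other parts.
GRIMM = {
    "p": "f", "t": "θ", "k": "h",      # voiceless stops -> voiceless fricatives
    "b": "p", "d": "t", "g": "k",      # voiced stops -> voiceless stops
    "bʰ": "b", "dʰ": "d", "gʰ": "g",   # voiced aspirates -> voiced stops
}

def apply_grimm(segments):
    """Map each segment through the shift once; segments not listed pass through unchanged."""
    return [GRIMM.get(seg, seg) for seg in segments]

print(apply_grimm(["p", "d", "bʰ"]))
```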

Verner’s Law (Karl Verner 1875) resolved residue from Grimm’s Law: the Germanic voiceless fricatives (those arising from PIE voiceless stops, plus *s) were voiced when the immediately preceding syllable did not carry the PIE accent. This explains alternations such as English was / were, where the voiced *z later became r, and the older German was / waren, since leveled to war / waren.

Other classic sound laws: Bartholomae’s Law (Indo-Iranian aspirate clusters), Grassmann’s Law (Greek and Sanskrit dissimilation of aspirates), the High German Consonant Shift (~500–700 CE; English tooth, pepper, make : German Zahn, Pfeffer, machen), the Great Vowel Shift of English (~1400–1700: Middle English [iː uː] diphthongized while [eː oː aː] raised, explaining the mismatch between English spelling and pronunciation).

Comparative Method and Proto-Languages

Comparative reconstruction begins with cognate sets — words in related languages of common origin — and identifies regular sound correspondences. From Latin pater, Sanskrit pitár, Greek patḗr, Old Irish athair, Gothic fadar we reconstruct Proto-Indo-European *ph₂tēr “father.” The Neogrammarian hypothesis (Junggrammatiker, Leipzig 1870s — Karl Brugmann, August Leskien, Hermann Osthoff) held that sound change is exceptionless — given the same conditioning environment, every word undergoes the change. Apparent exceptions reduce to dialect borrowing, analogy, or unrecognized conditioning. The principle made comparative reconstruction rigorous and remains a cornerstone, though it has been refined by Labov’s empirical work on change in progress (lexical diffusion in certain change types).

Analysis Tools

  • Praat (Boersma and Weenink) — the canonical acoustic analysis suite, which also offers articulatory synthesis; automated through its own scripting language
  • Phon — phonological transcription database, originally for child language
  • R phonR package — vowel normalization, plotting, formant analysis
  • ELAN (Max Planck Nijmegen) — multimedia annotation for fieldwork
  • EMU-SDMS — speech database management
  • PsychoPy and OpenSesame — perception experiment design

Computational and Cognitive Phonology

Computational phonology applies finite-state methods (Kaplan and Kay 1994) to phonological rules — modeling the SPE-style derivation as a composition of finite-state transducers, which enables efficient parsing and generation. Cognitive / usage-based phonology (Joan Bybee 2001 Phonology and Language Use, 2010 Language, Usage and Cognition) rejects the abstract phoneme as the primary unit, arguing instead that phonological knowledge consists of exemplar clouds and emergent generalizations from frequency-weighted experience. High-frequency words show sound change earlier; phonetic detail is stored, not abstracted away.
