Syntax and Grammar

Syntax studies how words combine into phrases and sentences — the structural rules and principles that determine grammaticality, constituency, and the mapping between form and meaning. Modern syntax began with Noam Chomsky’s Syntactic Structures (1957) and has since branched into competing formal frameworks (Minimalism, HPSG, LFG, CCG, Construction Grammar, Dependency Grammar) and functional-typological traditions (Givón, Comrie, Dryer). All share a commitment to explicit, falsifiable models of grammatical knowledge.

Descriptive vs Prescriptive Grammar

Linguistic syntax is descriptive: it documents what speakers actually do, what counts as a well-formed sentence in their internalized grammar. Prescriptive grammar — the school-grammar tradition condemning split infinitives, stranded prepositions, double negatives, ain’t, etc. — is a sociopolitical artifact, not a description of linguistic structure. Double negation, for instance, is grammatically systematic in many varieties (African American English, Spanish, French, Russian) and ungrammatical only in standardized varieties that have adopted single-negation as a norm.

Constituency

A constituent is a group of words that functions as a unit in the grammar. Constituency is established by diagnostic tests rather than by intuition alone:

  • Substitution: can the string be replaced by a single pro-form? The tall man with a hat → he; if substitution preserves grammaticality and meaning, the string is a constituent.
  • Movement: can the string be fronted, clefted, or otherwise displaced as a unit? It was [the tall man with a hat] that I saw.
  • Coordination: can the string be coordinated with a like string? I saw [the tall man with a hat] and [the short woman with a cane].
  • Cleft / pseudocleft: What I saw was [the tall man with a hat].
  • Answer to a question: Who did you see? [The tall man with a hat].
  • Sentence fragment: can the string stand alone naturally in dialogue?
  • Ellipsis: in VP ellipsis, John ate the cake and Mary did too, the elided constituent is the VP ate the cake.

Phrase Structure Rules and X-Bar Theory

Early generative grammar (Chomsky 1957) wrote phrase structure rules in context-free grammar (CFG) form: S → NP VP; NP → (Det) (AdjP) N (PP); VP → V (NP) (PP). Cross-linguistic generalization led to X-bar theory (Ray Jackendoff 1977 X-bar Syntax: A Study of Phrase Structure), positing that all phrases share a uniform schema:

  • XP → (Specifier) X’
  • X’ → X’ (Adjunct) or (Adjunct) X’
  • X’ → X (Complement)

Every phrase is headed (NP by N, VP by V, PP by P, AP by A, and crucially the clause IP/TP headed by Inflection/Tense, CP by Complementizer). The schema unified syntactic generalizations across categories and across languages: the difference between SOV and SVO reduces to the head-final vs head-initial setting of the complement-to-head linear order.
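The phrase structure rules quoted at the start of this section can be run as a small recognizer. The sketch below is illustrative only: the rules PP → P NP and AdjP → Adj, the toy lexicon, and the function name recognize are assumptions that do not come from the text, and the recognizer only answers yes or no without building a tree.

```python
from functools import lru_cache

# Phrase structure rules from the text; parenthesized symbols are optional.
# PP -> P NP and AdjP -> Adj are assumptions added to close the grammar.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["(Det)", "(AdjP)", "N", "(PP)"]],
    "VP": [["V", "(NP)", "(PP)"]],
    "PP": [["P", "NP"]],
    "AdjP": [["Adj"]],
}

# Toy lexicon (hypothetical) mapping words to part-of-speech tags.
LEXICON = {
    "the": "Det", "a": "Det", "tall": "Adj", "man": "N", "hat": "N",
    "cake": "N", "with": "P", "ate": "V", "saw": "V",
}


def recognize(words, start="S"):
    """Return True if the rules derive the word sequence from `start`."""
    tags = tuple(LEXICON[w] for w in words)

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        """All positions j such that `symbol` can span tags[i:j]."""
        if symbol not in RULES:  # a part-of-speech tag
            if i < len(tags) and tags[i] == symbol:
                return frozenset({i + 1})
            return frozenset()
        result = set()
        for rhs in RULES[symbol]:
            positions = {i}
            for item in rhs:
                optional = item.startswith("(")
                name = item.strip("()")
                reached = set()
                for p in positions:
                    reached |= ends(name, p)
                # An optional symbol may also be skipped entirely.
                positions = (positions | reached) if optional else reached
            result |= positions
        return frozenset(result)

    return len(tags) in ends(start, 0)


sentence = "the tall man with a hat ate the cake".split()
print(recognize(sentence))
print(recognize("ate the man".split()))
```

The grammar has no left-recursive rule, which is what lets this simple top-down search terminate; a grammar with rules such as NP → NP PP would need a chart-based method instead.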

Generative Grammar — The Chomskyan Tradition

Noam Chomsky’s program is a sequence of increasingly austere formal systems aiming to characterize the human language faculty.

Syntactic Structures (1957)

Chomsky argued that finite-state grammars cannot generate English and that phrase structure grammars alone describe it only clumsily, missing generalizations such as the relation between active and passive sentences; transformational rules are required. He posited a base of phrase structure rules whose output is mapped to surface forms by transformations (passive, question formation, negation, affix-hopping). The terms deep structure and surface structure belong to the later Standard Theory.

Aspects of the Theory of Syntax (1965)

The “Standard Theory” added a lexicon, subcategorization frames, and the competence / performance distinction. Speakers’ tacit knowledge (competence) is the object of study; actual production (performance) is distorted by memory limits, lapses of attention, and errors.

Government and Binding (1981)

Lectures on Government and Binding (Chomsky 1981) replaced rule-by-rule transformations with a small set of general principles plus parameters. The architecture comprised modules:

  • X-bar theory — phrase structure
  • Theta theory — assignment of thematic / semantic roles (agent, patient, experiencer, instrument, theme, goal, source, etc.) by predicates to arguments. The theta criterion: each argument bears exactly one theta-role; each theta-role is assigned to exactly one argument.
  • Case theory — every NP requires Case (nominative, accusative, etc.) assigned by a Case-assigner (finite I assigns nominative to specifier; V assigns accusative; P assigns oblique). The Case Filter: every overt NP must have Case.
  • Binding theory — Principle A: an anaphor (himself, each other) must be bound in its local domain. Principle B: a pronoun (him, her) must be free in its local domain. Principle C: an R-expression (a name, a definite description) must be free everywhere.
  • Bounding theory / Subjacency — movement cannot cross more than one bounding node (originally NP and S; refined to phase boundaries in Minimalism).
  • Control theory — interpretation of PRO (silent subject of nonfinite clauses): John tried [PRO to leave] — PRO is controlled by John.
  • ECP (Empty Category Principle) — empty categories (traces) must be properly governed.
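The theta criterion stated in the list above is a one-to-one matching condition, so it can be checked mechanically. The following is a minimal sketch under simplifying assumptions: a predicate's theta grid is a list of distinct role names, an analysis is a list of (argument, role) pairs, and the function name satisfies_theta_criterion is hypothetical.

```python
from collections import Counter


def satisfies_theta_criterion(theta_grid, assignments):
    """Check the theta criterion for a single predicate.

    theta_grid  -- distinct roles the predicate assigns, e.g. ["agent", "theme"]
    assignments -- (argument, role) pairs; an argument that receives no role
                   is written as (argument, None)
    """
    roles_per_argument = Counter()
    arguments_per_role = Counter()
    for argument, role in assignments:
        roles_per_argument[argument] += 0  # register the argument
        if role is not None:
            roles_per_argument[argument] += 1
            arguments_per_role[role] += 1
    # Each argument bears exactly one theta-role.
    each_argument_ok = all(n == 1 for n in roles_per_argument.values())
    # Each theta-role in the grid goes to exactly one argument,
    # and no role outside the grid is handed out.
    each_role_ok = (
        all(arguments_per_role[role] == 1 for role in theta_grid)
        and set(arguments_per_role) <= set(theta_grid)
    )
    return each_argument_ok and each_role_ok


grid = ["agent", "theme"]  # a transitive verb such as "devour"
print(satisfies_theta_criterion(grid, [("John", "agent"), ("the cake", "theme")]))
print(satisfies_theta_criterion(grid, [("John", "agent")]))
```

The second call fails because the theme role is left unassigned, which mirrors the ungrammaticality of a transitive verb used without its object.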

Movement, Traces, and Locality

Movement displaces a constituent from its base position to a higher position, leaving a trace (later: a copy). Types include head movement (V-to-T, T-to-C: French finite verbs move to T; English finite auxiliaries move to C in inversion), A-movement (NP-movement to a Case position: passive, raising), and A’-movement (wh-movement, topicalization, focus).

John Robert Ross’s MIT dissertation Constraints on Variables in Syntax (1967) identified island constraints, environments from which movement is blocked. Standard examples, from Ross and from subsequent work, include the Complex NP Constraint (no extraction from a relative clause or noun complement clause), the Coordinate Structure Constraint (no extraction from one conjunct), the Wh-island (no extraction from an indirect question), and the Subject Condition (limited extraction from subjects). Luigi Rizzi’s Relativized Minimality (1990) unified many locality effects under one principle: an element of type X cannot move past an intervening element of the same type.

Principles and Parameters

The “Principles and Parameters” framework cast cross-linguistic variation as a finite set of binary choices in otherwise universal principles. Examples:

  • Pro-drop parameter — Spanish, Italian, Greek allow null subjects (hablo “[I] speak”); English and French do not.
  • Head-directionality parameter — head-initial (English, Romance, Bantu) vs head-final (Japanese, Turkish, Hindi).
  • V2 parameter — German, Dutch, Scandinavian require a finite verb in second position of main clauses.
  • Wh-movement parameter — overt (English, German) vs in-situ (Chinese, Japanese).
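The head-directionality parameter in the list above can be pictured as a single switch applied while a structure is spelled out as a word string. This is a rough sketch, assuming phrases are (head, complement) pairs with no specifiers or adjuncts; the function name linearize and the English glosses are illustrative choices.

```python
def linearize(phrase, head_initial=True):
    """Spell out a phrase as a list of words.

    A phrase is either a single word (str) or a (head, complement) pair.
    The same hierarchical structure yields different word orders depending
    on the head_initial setting.
    """
    if isinstance(phrase, str):
        return [phrase]
    head, complement = phrase
    head_words = linearize(head, head_initial)
    complement_words = linearize(complement, head_initial)
    if head_initial:
        return head_words + complement_words
    return complement_words + head_words


# A verb phrase whose complement is an adpositional phrase.
vp = ("come", ("from", "Tokyo"))
print(" ".join(linearize(vp, head_initial=True)))   # verb before PP, preposition
print(" ".join(linearize(vp, head_initial=False)))  # PP before verb, postposition
```

With head_initial=False the adposition follows its noun and the verb comes last, the pattern the text associates with Japanese, Turkish, and Hindi.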

Minimalist Program (1995–2020+)

The Minimalist Program (Chomsky The Minimalist Program 1995; Derivation by Phase 2001; Three Factors 2005; Problems of Projection 2013) sought the simplest theory that meets the interface requirements with sound (PF, phonological form) and meaning (LF, logical form). Core operations:

  • Merge — combines two syntactic objects into a new object. External Merge introduces lexical items; Internal Merge is movement (re-merging an already-merged element higher).
  • Move / Internal Merge — driven by feature checking.
  • Agree — a probe seeks a matching goal within its c-command domain; uninterpretable features are valued and deleted.
  • Phases — derivation proceeds in chunks (vP, CP); once a phase is complete, its complement is sent to the interfaces (Phase Impenetrability Condition).
  • Labeling — what determines the category of a merged constituent (Chomsky 2013, 2015).
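Merge as described above is often modeled as unordered set formation. The sketch below assumes that simplification: lexical items are plain strings, features and labels are left out entirely, and the helper names contains and internal_merge are hypothetical.

```python
def merge(a, b):
    """External Merge: combine two syntactic objects into the set {a, b}."""
    return frozenset({a, b})


def contains(obj, target):
    """True if `target` is `obj` itself or occurs somewhere inside it."""
    if obj == target:
        return True
    if isinstance(obj, frozenset):
        return any(contains(member, target) for member in obj)
    return False


def internal_merge(root, part):
    """Internal Merge: re-merge `part`, which is already inside `root`,
    at the top of the structure. The lower occurrence stays in place as a copy."""
    if root == part or not contains(root, part):
        raise ValueError("internal merge needs a proper part of the root")
    return merge(part, root)


dp = merge("the", "book")          # {the, book}
vp = merge("read", dp)             # {read, {the, book}}
moved = internal_merge(vp, dp)     # {{the, book}, {read, {the, book}}}

print(merge("the", "book") == merge("book", "the"))  # sets carry no linear order
print(contains(moved, "read"))
```

Because the output of merge is a set, linear order is not part of the structure; on this view ordering is settled later, at the interface with sound.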

Cartographic Approach

Parallel to mainstream Minimalism, the Cartographic Project (Luigi Rizzi, Guglielmo Cinque, Adriana Belletti; Università di Siena and University of Venice 1990s–2010s) elaborated a richly articulated left-periphery and IP-domain. Rizzi 1997 The Fine Structure of the Left Periphery split CP into ForceP > TopP* > FocP > TopP* > FinP. Cinque 1999 Adverbs and Functional Heads posited ~40 functional heads in fixed cross-linguistic order, hosting adverbial classes.

Alternative Generative Frameworks

Combinatory Categorial Grammar (CCG)

Mark Steedman (The Syntactic Process, 2000) extended Categorial Grammar (Ajdukiewicz 1935, Lambek 1958) with combinatory rules. Each lexical item carries a syntactic category, either primitive (S, N, NP) or functional (X/Y, X\Y), and combination proceeds by forward application (X/Y Y → X), backward application (Y X\Y → X), type-raising, and composition. CCG handles coordination and extraction elegantly via type-raising and composition, and supports surface-compositional semantics. It is widely used in computational parsing (CCGbank, EasyCCG).
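The combinatory rules named above are small enough to write out directly. The following sketch is an illustration, not any parser's actual API: the class name Fn and the function names are hypothetical, only forward composition and forward type-raising are shown, and semantics is omitted.

```python
from typing import NamedTuple


class Fn(NamedTuple):
    """A functional category: result, slash direction, argument."""
    result: object
    slash: str  # "/" looks right for its argument, "\\" looks left
    arg: object


def forward_apply(left, right):
    """X/Y  Y  ->  X"""
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left, right):
    """Y  X\\Y  ->  X"""
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def forward_compose(left, right):
    """X/Y  Y/Z  ->  X/Z"""
    if (isinstance(left, Fn) and isinstance(right, Fn)
            and left.slash == "/" and right.slash == "/"
            and left.arg == right.result):
        return Fn(left.result, "/", right.arg)
    return None


def type_raise(x, t):
    """X  ->  T/(T\\X)"""
    return Fn(t, "/", Fn(t, "\\", x))


NP, S = "NP", "S"
sees = Fn(Fn(S, "\\", NP), "/", NP)  # transitive verb: (S\NP)/NP

# Derivation 1: the verb takes its object first, then its subject.
vp = forward_apply(sees, NP)         # S\NP
print(backward_apply(NP, vp))

# Derivation 2: the type-raised subject composes with the verb first,
# giving S/NP, the kind of constituent used for coordination and extraction.
subject_verb = forward_compose(type_raise(NP, S), sees)
print(forward_apply(subject_verb, NP))
```

Both derivations end in category S; the second shows how type-raising and composition build a "subject plus verb" unit that ordinary phrase structure does not treat as a constituent.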

Head-driven Phrase Structure Grammar (HPSG)

Carl Pollard and Ivan Sag Head-Driven Phrase Structure Grammar (1994; foundational papers 1980s) developed a constraint-based, monostratal framework with no movement. Information is represented in typed feature structures (Attribute-Value Matrices). The Head Feature Principle ensures heads share features with mothers; subcategorization is encoded in a SUBCAT list discharged as arguments combine; Slash features encode extraction nonlocally without movement. Implementations include LKB, TRALE, MERGE.
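The core operation on attribute-value matrices is unification: two structures combine if their information is compatible and fail otherwise. The sketch below is a simplified illustration using nested Python dicts; it assumes untyped structures without structure sharing (reentrancy), both of which real HPSG systems rely on, and the feature names are illustrative.

```python
def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts or atomic values.

    Returns the combined structure, or None if the two clash.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only when they are identical.
    return fs1 if fs1 == fs2 else None


# What the verb "walks" demands of its subject (illustrative features).
subject_requirement = {"HEAD": {"CASE": "nom", "AGR": {"PER": 3, "NUM": "sg"}}}

she = {"HEAD": {"AGR": {"PER": 3, "NUM": "sg", "GEN": "fem"}}}
they = {"HEAD": {"AGR": {"PER": 3, "NUM": "pl"}}}

print(unify(subject_requirement, she))
print(unify(subject_requirement, they))
```

The first call succeeds and pools information from both sides; the second returns None because the NUM values clash, which is how agreement failures are ruled out in a constraint-based grammar.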

Lexical Functional Grammar (LFG)

Joan Bresnan and Ron Kaplan (founding paper 1982 “Lexical-Functional Grammar: A Formal System for Grammatical Representation”; Bresnan 2001 Lexical-Functional Syntax) split representation into constituent structure (c-structure, phrase tree) and functional structure (f-structure, an attribute-value matrix encoding subject, object, tense, etc.). The two are linked by structural correspondence functions. LFG accommodates non-configurational languages (Warlpiri, Latin) more naturally than configurational frameworks. Computational implementation: XLE (Xerox Linguistic Environment), ParGram project.

Construction Grammar

Adele Goldberg Constructions: A Construction Grammar Approach to Argument Structure (1995) and Charles Fillmore’s earlier work on frames developed a usage-based framework in which constructions, form-meaning pairings of any size from morphemes to argument-structure templates to idioms, are the primary units. The English ditransitive construction Subj V Obj1 Obj2 has its own meaning of caused transfer, independent of any particular verb. Similarly, the caused-motion construction supplies the motion meaning in Sue sneezed the napkin off the table, even though sneeze is otherwise intransitive. Constructional approaches reject the strict lexicon-grammar divide and emphasize storage of patterns at all levels of generality.

Dependency Grammar

Lucien Tesnière’s posthumous Éléments de syntaxe structurale (1959) developed grammar in terms of binary dependency relations between a head and its dependents — no phrase structure nodes. A verb is the structural center; subjects, objects, and adjuncts depend on it. Dependency representations are widely used in computational linguistics for their simplicity and direct semantic correspondence. The Universal Dependencies project (Joakim Nivre and collaborators, since 2014) provides a single dependency annotation scheme covering 100+ languages, supporting cross-linguistically consistent treebanks (UD 2.13 in 2024 covers 148 languages, 287 treebanks).
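A dependency analysis needs no phrase nodes, so it can be stored as one head index per token. The sketch below assumes 1-based token positions with 0 marking the root, in the style of the HEAD column of the CoNLL-U format used by Universal Dependencies; the function names are hypothetical and relation labels are omitted.

```python
def is_well_formed_tree(heads):
    """heads[i - 1] is the head of token i; 0 marks the root.

    A well-formed analysis has exactly one root, only valid head
    indices, and no cycles.
    """
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    if any(h < 0 or h > n for h in heads):
        return False
    for token in range(1, n + 1):
        seen = set()
        node = token
        while node != 0:  # walk up towards the root
            if node in seen:
                return False  # came back to a visited token: a cycle
            seen.add(node)
            node = heads[node - 1]
    return True


def dependents(heads, head):
    """Tokens whose head is `head`."""
    return [i for i, h in enumerate(heads, start=1) if h == head]


# "She reads books": the verb (token 2) is the structural center.
heads = [2, 0, 2]
print(is_well_formed_tree(heads))
print(dependents(heads, 2))
```

The flat list of head indices is one reason dependency representations are convenient computationally: the whole analysis is a single array of the same length as the sentence.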

Tree-Adjoining Grammar (TAG)

Aravind Joshi (Penn, 1975) developed TAG as a mildly context-sensitive formalism. Elementary trees (initial and auxiliary) are combined by substitution and adjoining. Lexicalized TAG (LTAG) associates each lexical item with elementary trees expressing its full syntactic potential. TAG generates a class of languages slightly larger than the context-free languages, which lets it capture cross-serial dependencies (Swiss German, Dutch).

Role and Reference Grammar (RRG)

Robert Van Valin and William Foley (Functional Syntax and Universal Grammar, 1984) and Van Valin (Exploring the Syntax-Semantics Interface, 2005) developed a functional-typological framework using a clause structure of nucleus, core, periphery, and a layered description of operators (aspect, modality, tense, evidentials, illocutionary force). RRG aims to handle non-configurational languages on equal footing with English-type languages.

Functional and Typological Approaches

T. Givón Functionalism and Grammar (1995), Bernard Comrie Language Universals and Linguistic Typology (1981/1989), Matthew Dryer’s contributions to the World Atlas of Language Structures, William Croft Radical Construction Grammar (2001) develop syntax with primary attention to discourse function and cross-linguistic patterns rather than formal abstract universals.

Clause Types and Information Structure

Major clause types include declarative, interrogative (polar / wh-), imperative, and exclamative; each is associated with a sentence force / illocutionary force. Topicalization fronts a topic (That book, I haven’t read), and left-dislocation sets a topic off with a resumptive pronoun (John, he left); focus marks new or contrastive information (cleft: It is JOHN who left). The theme-rheme or topic-comment distinction underlies much information packaging.

Agreement, Case, Alignment

Agreement is the morphological covariation of one element with another’s features. Examples include subject-verb agreement (English she walks), gender agreement on adjectives (Spanish el coche rojo, la casa roja), and noun-class agreement (Bantu noun classes).

Case marks the grammatical relation of an NP. Nominative-accusative alignment (most European, Indo-Iranian languages) groups the subject of intransitives (S) with the agent of transitives (A) — both nominative — and contrasts with the object (P / patient) — accusative. Ergative-absolutive alignment (Basque, Georgian, Hindi-Urdu in perfective aspect, most Australian languages, Mayan, Inuktitut) groups S with P — both absolutive — and contrasts with A — ergative. Split-ergativity is common, conditioned by tense/aspect (Hindi: nominative-accusative in imperfective, ergative-absolutive in perfective) or by NP type (Dyirbal: pronouns nominative-accusative, full NPs ergative-absolutive). Active-stative alignment splits S into agent-like S_A and patient-like S_O (Eastern Pomo, Acehnese, Lakhota). Tripartite (S, A, P all distinct) is rare (Nez Perce, Pitta-Pitta).
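The alignment types above differ only in which of S, A, and P share a marker, so the basic classification fits in a few comparisons. This is a minimal sketch assuming one abstract marker label per role; the function name is hypothetical, and active-stative systems, which split S into two roles, are outside what this three-way comparison can express.

```python
def classify_alignment(s_marker, a_marker, p_marker):
    """Classify alignment from the case markers on S, A, and P.

    S = sole argument of an intransitive, A = agent of a transitive,
    P = patient of a transitive.
    """
    if s_marker == a_marker and s_marker != p_marker:
        return "nominative-accusative"
    if s_marker == p_marker and s_marker != a_marker:
        return "ergative-absolutive"
    if len({s_marker, a_marker, p_marker}) == 3:
        return "tripartite"
    return "other"  # patterns the text does not discuss


print(classify_alignment("NOM", "NOM", "ACC"))
print(classify_alignment("ABS", "ERG", "ABS"))
print(classify_alignment("S", "A", "P"))
```

A split-ergative language would be modeled by calling the function once per condition, for example once with the imperfective markers and once with the perfective ones.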

Word Order Typology

Joseph Greenberg’s classic 1963 paper Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements proposed 45 universals, most of them implicational, based on a 30-language sample. Of the six logically possible orders of S, O, V, the empirical distribution is highly skewed; approximate shares from later, larger surveys are:

  • SOV ~45% (Japanese, Turkish, Hindi, Korean, Latin, Quechua, Basque)
  • SVO ~42% (English, Mandarin, Russian, Yoruba, Indonesian)
  • VSO ~9% (Classical Arabic, Irish, Welsh, Tagalog, Maori)
  • VOS ~2% (Malagasy, Fijian)
  • OVS ~1% (Hixkaryana — Carib family)
  • OSV <1% (Warao, a few Amazonian languages)

Strong implicational correlations: SOV languages tend to have postpositions, genitive-noun order, relative-clause-before-head, auxiliary-after-verb; SVO/VSO languages tend to have prepositions, noun-genitive order, head-before-relative-clause. These correlations were systematized by Dryer’s branching-direction theory (Dryer 1992).

Polysynthesis, Incorporation, Serial Verbs, Converbs

Polysynthesis packs multiple morphemes — corresponding to whole English clauses — into a single word. Inuktitut qangatasuukkuvimmuuriaqalaaqtunga “I will have to go to the airport.” Mohawk washakotya’tawitsherahetkvhta’se’ “He made the thing that one puts on one’s body ugly for her” (Mark Baker The Polysynthesis Parameter 1996). Mapudungun (Chile/Argentina) shows extensive verbal morphology. The polysynthesis parameter sets the language to require argument incorporation / pronominal arguments on the verb.

Noun incorporation (Mithun 1984 typology) — a noun stem is compounded into the verb, often demoting the corresponding argument. Serial verb constructions (West African languages, Sranan, Mandarin, Thai) chain multiple verbs in a single clause sharing arguments and tense: Yoruba ó mú ìwé wá “he took book come” (“he brought a book”). Converbs (Turkic, Mongolic, Tungusic, Korean, Japanese) are non-finite verbal forms expressing adverbial relations between clauses. Switch reference (Pomoan, Iroquoian, Papuan languages) marks on the verb whether the subject of the next clause is the same as or different from the current subject.

Computational Syntax

Parsing assigns syntactic structure to input strings. The two dominant paradigms are constituency parsing (output is a phrase-structure tree, evaluated against treebanks like the Penn Treebank, Marcus et al. 1993) and dependency parsing (output is a dependency tree, evaluated against UD treebanks).

Transition-based parsing processes tokens left-to-right with shift / reduce / arc-left / arc-right actions; classical algorithm Nivre 2003 An Efficient Algorithm for Projective Dependency Parsing. Graph-based parsing scores edges globally and finds the maximum-spanning tree (McDonald, Pereira, Ribarov, Hajič 2005). Both have been re-implemented with neural scoring functions (Chen and Manning 2014, Dozat and Manning 2017 Deep Biaffine Attention for Neural Dependency Parsing).
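The four transition actions named above can be sketched as operations on a stack and a buffer. The version below follows the arc-eager scheme as commonly described and is an assumption-laden illustration: the function name is hypothetical, arcs are unlabeled, and the action sequence is supplied by hand, whereas a real parser would choose each action with a trained classifier.

```python
def parse_arc_eager(n_tokens, actions):
    """Apply a sequence of transitions; return {dependent: head}.

    Tokens are numbered 1..n_tokens; 0 is an artificial root that
    starts on the stack.
    """
    stack = [0]
    buffer = list(range(1, n_tokens + 1))
    heads = {}
    for action in actions:
        if action == "shift":
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "arc-left":
            # Next input token becomes the head of the stack top.
            dependent = stack.pop()
            if dependent == 0 or dependent in heads:
                raise ValueError("arc-left needs a headless, non-root stack top")
            heads[dependent] = buffer[0]
        elif action == "arc-right":
            # Stack top becomes the head of the next input token.
            dependent = buffer.pop(0)
            heads[dependent] = stack[-1]
            stack.append(dependent)
        elif action == "reduce":
            # Discard a stack top that already has its head.
            if stack[-1] not in heads:
                raise ValueError("reduce needs a stack top that has a head")
            stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


# "She reads books quickly": tokens 1-4, with "reads" (2) as the root word.
actions = ["shift", "arc-left", "arc-right", "arc-right", "reduce", "arc-right"]
print(parse_arc_eager(4, actions))
```

Each token is pushed and popped a bounded number of times, so the number of transitions grows linearly with sentence length, which is the main practical appeal of this family of parsers.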

Major systems and pipelines: Stanford CoreNLP, spaCy (industrial Python NLP), UDPipe, Stanza, Berkeley Neural Parser. CCG parsers: C&C, EasyCCG, depCCG.

Probing Pretrained Language Models for Syntax

A line of research asks whether transformer language models (BERT, GPT) encode syntactic structure implicitly. John Hewitt and Christopher Manning A Structural Probe for Finding Syntax in Word Representations (NAACL 2019) showed that a linear transformation of BERT embeddings approximately recovers parse-tree distances. Ian Tenney, Dipanjan Das, Ellie Pavlick BERT Rediscovers the Classical NLP Pipeline (ACL 2019) found that BERT’s layers correspond roughly to the NLP pipeline (POS in lower layers, parsing in middle, semantics in higher). These findings suggest large language models acquire substantial syntactic structure from distributional learning alone, though what they acquire need not resemble a Chomskyan Universal Grammar.
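The idea behind a structural probe can be made concrete with a toy computation. The sketch below assumes the squared-distance formulation, in which the distance between two word vectors h_i and h_j under a learned matrix B is the squared length of B(h_i − h_j), and an absolute-difference objective against tree distances. The numbers are invented toy values, the function names are hypothetical, and no actual model or training loop is involved.

```python
def probe_squared_distance(B, h_i, h_j):
    """Squared length of B @ (h_i - h_j), with B given as a list of rows."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * x for w, x in zip(row, diff)) for row in B]
    return sum(x * x for x in projected)


def probe_loss(B, vectors, tree_distances):
    """Mean absolute gap between tree distance and probe distance.

    tree_distances maps word-index pairs (i, j) to the number of edges
    between the two words in the parse tree.
    """
    gaps = [
        abs(d_tree - probe_squared_distance(B, vectors[i], vectors[j]))
        for (i, j), d_tree in tree_distances.items()
    ]
    return sum(gaps) / len(gaps)


# Toy values (invented): two 2-dimensional "embeddings", one tree edge apart.
B = [[1, 0], [0, 2]]
vectors = [[0, 0], [1, 1]]
tree_distances = {(0, 1): 1}

print(probe_squared_distance(B, vectors[0], vectors[1]))
print(probe_loss(B, vectors, tree_distances))
```

Training a probe amounts to adjusting B to shrink this loss over a treebank; a low loss with a purely linear B is what is taken as evidence that tree distances are encoded in the embedding geometry.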

Adjacent