COBOL — Reference

Source: https://gcc.gnu.org/onlinedocs/gcobol/

COBOL

  • Created: 1959 by CODASYL (Conference on Data Systems Languages), with influence from Grace Hopper’s FLOW-MATIC. Designed under US DoD contract for portable business data processing.
  • Latest stable: COBOL 2023 = ISO/IEC 1989:2023 (published 2023). Predecessors: COBOL-60, -68, -74, -85, 2002, 2014.
  • Paradigms: Imperative, procedural, with object-oriented features added in COBOL 2002 (classes, interfaces, inheritance).
  • Typing: Static, with PICTURE clauses describing data layout at the byte/digit level rather than abstract types. Strong layout discipline; weak abstract type checking by modern standards.
  • Memory: No GC. Mostly static allocation declared in the DATA DIVISION; modern COBOL adds dynamic allocation (ALLOCATE/FREE). Working-storage, local-storage, linkage sections segregate lifetimes.
  • Compilation: Ahead-of-time. Major implementations: IBM Enterprise COBOL for z/OS (mainframe, dominant), IBM COBOL for AIX/Linux on Power, Micro Focus Visual COBOL (Windows/Linux/.NET/JVM), GnuCOBOL (free, transpiles to C), Fujitsu NetCOBOL, GCC COBOL frontend (gcobol) — merged into GCC 15 (2025) as the first FOSS COBOL compiler in the GCC tree.
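  • Primary domains: Mainframe batch + transaction processing, banking core systems, insurance, government, payroll, ERP. Widely repeated industry estimates put production COBOL at 200+ billion lines and claim that about 95% of card-swipe transactions touch COBOL; treat both as rough figures rather than measurements.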
  • Official docs: https://gcc.gnu.org/onlinedocs/gcobol/ (gcobol), https://gnucobol.sourceforge.io/ (GnuCOBOL), https://www.iso.org/standard/74527.html (ISO/IEC 1989:2023), https://www.ibm.com/docs/en/cobol-zos (Enterprise COBOL).

At a glance

COBOL is the language of records and reports: fixed-layout data, decimal arithmetic without floating-point error, hierarchical record definitions, and English-like statements. Its longevity comes from two things — (1) trillions of dollars of compiled production code that nobody dares touch, and (2) it is genuinely good at exactly what it does (mainframe batch, sequential file processing, packed-decimal financial math). Modern dialects (Enterprise COBOL 6+, Visual COBOL, gcobol) support OO, embedded XML/JSON, JVM/.NET interop, and Unicode, but the workhorse style is still COBOL-85 + structured programming + CICS/DB2.

Getting started

Install (free options):

  • GnuCOBOL (3.2, July 2023): sudo apt install gnucobol (Debian/Ubuntu), brew install gnu-cobol (macOS). Compiler is cobc.
  • gcobol (GCC’s COBOL frontend, merged into GCC 15, Apr 2025): build GCC with --enable-languages=cobol or use a distro that ships GCC 15+.
  • IBM COBOL for Linux on x86 (commercial, free trial): https://www.ibm.com/products/cobol-compiler-linux.
  • Micro Focus Visual COBOL Personal Edition (free for non-commercial).

Hello world (hello.cob, free-form):

       IDENTIFICATION DIVISION.
       PROGRAM-ID. HELLO.
       PROCEDURE DIVISION.
           DISPLAY "Hello, world!".
           STOP RUN.

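Compile + run with GnuCOBOL: cobc -x -free hello.cob && ./hello. With gcobol: gcobol -ffree-form -o hello hello.cob && ./hello. (For cobc, -x = build an executable rather than a loadable module, and -free = free-form source; in fixed format, columns 1-6 are reserved for sequence numbers and column 7 for the indicator. gcobol spells the format switch -ffree-form.)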

Project layout. No standardized layout. Convention is one program per .cob file, with shared record definitions in .cpy (COPY) files included via COPY MEMBER-NAME. (the period ends the COPY statement). Modules are linked statically or called dynamically via CALL "PROGRAM-NAME".

Build tools. No package manager. GNU make is standard for FOSS COBOL. On z/OS, JCL (Job Control Language) drives compile+link+execute steps; DBB (Dependency Based Build) + Zowe are the modern Git-based replacements for legacy SCM (Endevor, ChangeMan).

REPL. None. Edit-compile-run cycle. Some dialects (Micro Focus) ship animator/debugger that feels interactive.

Basics

Source format. Fixed format (the default for legacy code): columns 1-6 sequence numbers, column 7 indicator (* = comment, - = continuation, D = debug), columns 8-11 “Area A” (division/section/paragraph headers), columns 12-72 “Area B” (statements), 73-80 ignored. Free format (modern, COBOL 2002+): no column rules, use cobc -free or >>SOURCE FORMAT FREE directive.

Four divisions, in order:

  1. IDENTIFICATION DIVISION: PROGRAM-ID. plus author and date metadata.
  2. ENVIRONMENT DIVISION: CONFIGURATION SECTION (source/object computer) and INPUT-OUTPUT SECTION with FILE-CONTROL (assigns logical files to physical paths/datasets).
  3. DATA DIVISION: FILE SECTION (record layouts for files), WORKING-STORAGE SECTION (program-static), LOCAL-STORAGE SECTION (auto-storage per call), LINKAGE SECTION (parameters from caller).
  4. PROCEDURE DIVISION: the actual code, organized into SECTIONs and PARAGRAPHs.
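
The division order above can be sketched as a minimal free-form program. The program name DIVDEMO and the data name WS-GREETING are illustrative only, and the ENVIRONMENT DIVISION is left empty because this sketch uses no files.

```cobol
*> Minimal sketch: the four divisions in their required order.
*> Assumes free-form source (e.g. cobc -x -free divdemo.cob).
IDENTIFICATION DIVISION.
PROGRAM-ID. DIVDEMO.

ENVIRONMENT DIVISION.
*> CONFIGURATION and INPUT-OUTPUT sections would go here when needed.

DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-GREETING     PIC X(20) VALUE "Four divisions".

PROCEDURE DIVISION.
MAIN-PARA.
    DISPLAY WS-GREETING
    STOP RUN.
```

Only the IDENTIFICATION DIVISION and PROGRAM-ID are strictly required; the other divisions appear when the program has something to put in them.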

Types via PICTURE clauses. A PIC clause describes character layout, not abstract type. Examples:

01  CUST-ID         PIC 9(8).            *> 8-digit unsigned integer (display)
01  PRICE           PIC S9(7)V99 COMP-3. *> signed 7.2 packed decimal (5 bytes)
01  COUNTER         PIC S9(9) COMP-5.    *> native 32-bit signed binary
01  NAME            PIC X(30).           *> 30 chars
01  RATE            PIC 9V9(4).          *> 1 digit before, 4 after implicit decimal
  • 9 = digit, X = any char, A = letter, S = sign, V = implicit decimal point, (n) = repeat.
  • USAGE clauses: DISPLAY (default; one digit/char per byte, EBCDIC or ASCII), COMP/COMP-4/BINARY (native binary), COMP-3/PACKED-DECIMAL (BCD, 2 digits/byte + sign nibble — the workhorse for money), COMP-5 (native binary, no truncation to PIC width), COMP-1 (single-precision float), COMP-2 (double float — rarely used; you want COMP-3 for money).
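
To make the storage cost of each USAGE concrete, here is a small sketch that asks the compiler for item lengths with FUNCTION LENGTH. The WS- names are hypothetical, and it assumes GnuCOBOL in free format. With standard packing rules the four items should occupy 8, 5, 4, and 9 bytes respectively; how the numbers are formatted on output depends on the compiler.

```cobol
*> Sketch: compare byte lengths of the same kinds of number
*> under different USAGE clauses.
IDENTIFICATION DIVISION.
PROGRAM-ID. PICSIZE.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-CUST-ID      PIC 9(8).              *> DISPLAY: one byte per digit
01  WS-PRICE        PIC S9(7)V99 COMP-3.   *> packed: 9 digits + sign nibble
01  WS-COUNTER      PIC S9(9) COMP-5.      *> native binary fullword
01  WS-PRICE-TEXT   PIC S9(7)V99.          *> DISPLAY: sign shares a byte
PROCEDURE DIVISION.
MAIN-PARA.
    DISPLAY "PIC 9(8) DISPLAY     : " FUNCTION LENGTH(WS-CUST-ID)
    DISPLAY "PIC S9(7)V99 COMP-3  : " FUNCTION LENGTH(WS-PRICE)
    DISPLAY "PIC S9(9) COMP-5     : " FUNCTION LENGTH(WS-COUNTER)
    DISPLAY "PIC S9(7)V99 DISPLAY : " FUNCTION LENGTH(WS-PRICE-TEXT)
    STOP RUN.
```

The V in a PICTURE takes no storage at all, which is why the two S9(7)V99 items differ only because of their USAGE.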

Hierarchical records via level numbers. 01 is a record root; 02-49 are subordinate fields; 66 for renaming, 77 for standalone, 88 for condition names:

01  CUSTOMER.
    05  CUST-ID         PIC 9(8).
    05  CUST-NAME.
        10  FIRST-NAME  PIC X(15).
        10  LAST-NAME   PIC X(20).
    05  STATUS-CODE     PIC X.
        88  ACTIVE      VALUE "A".
        88  INACTIVE    VALUE "I" "X".

Then test with IF ACTIVE ... (88-level condition names).
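
A runnable sketch of the same record, assuming GnuCOBOL free format. The names are prefixed (WS-, CUST-) purely to stay clear of reserved words in any dialect; the sample values are made up.

```cobol
*> Sketch: testing and setting 88-level condition names.
IDENTIFICATION DIVISION.
PROGRAM-ID. CONDDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-CUSTOMER.
    05  WS-CUST-ID          PIC 9(8)  VALUE 12345678.
    05  WS-CUST-NAME.
        10  WS-FIRST-NAME   PIC X(15) VALUE "GRACE".
        10  WS-LAST-NAME    PIC X(20) VALUE "HOPPER".
    05  WS-STATUS-CODE      PIC X     VALUE "X".
        88  CUST-ACTIVE     VALUE "A".
        88  CUST-INACTIVE   VALUE "I" "X".
PROCEDURE DIVISION.
MAIN-PARA.
    *> True for either "I" or "X" in WS-STATUS-CODE.
    IF CUST-INACTIVE
        DISPLAY "Inactive customer: " WS-LAST-NAME
    END-IF
    *> SET stores the condition's first VALUE ("A") in the parent item.
    SET CUST-ACTIVE TO TRUE
    IF CUST-ACTIVE
        DISPLAY "Now active, status = " WS-STATUS-CODE
    END-IF
    STOP RUN.
```

SET ... TO TRUE is the write-side counterpart of the IF test: the literal "A" appears once, in the data definition.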

Variables/scoping. The unit of scope is the program: all sections and paragraphs share the DATA DIVISION storage, so data is effectively global within a program. LOCAL-STORAGE items are allocated and initialized afresh on each call, which is what recursion and re-entrancy need. Nested programs (COBOL-85+) provide real scoping, but a flat structure is overwhelmingly common.

Control flow. IF / ELSE / END-IF, EVALUATE / WHEN (case), PERFORM paragraph-name (call), PERFORM para-1 THRU para-2 (run a range of paragraphs), PERFORM ... n TIMES (counted loop), PERFORM VARYING I FROM 1 BY 1 UNTIL ... (the for-loop), PERFORM ... WITH TEST AFTER UNTIL ... (do-while). The explicit scope terminators END-IF/END-PERFORM/END-EVALUATE (COBOL-85+) close blocks unambiguously; always use them, because legacy “period scoping” (statements terminated by .) is a bug minefield.
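
A short sketch combining an inline PERFORM VARYING loop, EVALUATE TRUE, and an out-of-line PERFORM of a paragraph. All names are hypothetical; it assumes GnuCOBOL free format. The loop adds 1 through 5, so WS-TOTAL ends at 15.

```cobol
*> Sketch: structured control flow with explicit scope terminators.
IDENTIFICATION DIVISION.
PROGRAM-ID. FLOWDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-I            PIC 9(2).
01  WS-TOTAL        PIC 9(4) VALUE 0.
PROCEDURE DIVISION.
MAIN-PARA.
    PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 5
        EVALUATE TRUE
            WHEN WS-I = 1
                DISPLAY "first"
            WHEN WS-I >= 2 AND WS-I <= 4
                DISPLAY "middle"
            WHEN OTHER
                DISPLAY "last"
        END-EVALUATE
        ADD WS-I TO WS-TOTAL
    END-PERFORM
    PERFORM SHOW-TOTAL
    STOP RUN.
SHOW-TOTAL.
    DISPLAY "total = " WS-TOTAL.
```

Note that the only periods are at the end of each paragraph; everything inside is delimited by END- terminators.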

Procedures. Two flavors: (1) paragraphs invoked with PERFORM, which share storage and take no formal parameters. (2) CALL to a separate program: CALL "SUBPROG" USING BY REFERENCE A, BY VALUE B, BY CONTENT C. The called program declares matching LINKAGE SECTION items and PROCEDURE DIVISION USING .... BY REFERENCE (default) passes the address of the caller’s item, so the callee can modify it; BY VALUE (COBOL 2002+) passes the value itself, as C does; BY CONTENT passes the address of a temporary copy, so the callee still receives it like a reference but its changes never reach the caller’s item.
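
The difference between BY REFERENCE and BY CONTENT can be sketched with two programs in one source file. This assumes GnuCOBOL, where cobc -x -free treats the first program as the main entry point and makes the second callable; CALLER, ADDFUNDS, and all data names are hypothetical.

```cobol
*> Sketch: the callee updates the BY REFERENCE argument but can only
*> scribble on a temporary copy of the BY CONTENT argument.
IDENTIFICATION DIVISION.
PROGRAM-ID. CALLER.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-BALANCE      PIC S9(7)V99 COMP-3 VALUE 100.00.
01  WS-DEPOSIT      PIC S9(7)V99 COMP-3 VALUE 25.50.
01  WS-SHOW         PIC -(7)9.99.
PROCEDURE DIVISION.
MAIN-PARA.
    CALL "ADDFUNDS" USING BY REFERENCE WS-BALANCE
                          BY CONTENT   WS-DEPOSIT
    MOVE WS-BALANCE TO WS-SHOW
    DISPLAY "balance after call: " WS-SHOW
    MOVE WS-DEPOSIT TO WS-SHOW
    DISPLAY "deposit after call: " WS-SHOW
    STOP RUN.
END PROGRAM CALLER.

IDENTIFICATION DIVISION.
PROGRAM-ID. ADDFUNDS.
DATA DIVISION.
LINKAGE SECTION.
01  LK-BALANCE      PIC S9(7)V99 COMP-3.
01  LK-AMOUNT       PIC S9(7)V99 COMP-3.
PROCEDURE DIVISION USING LK-BALANCE LK-AMOUNT.
MAIN-PARA.
    ADD LK-AMOUNT TO LK-BALANCE
    *> This only changes the temporary copy made for BY CONTENT.
    MOVE ZERO TO LK-AMOUNT
    GOBACK.
END PROGRAM ADDFUNDS.
```

After the call the balance holds 125.50 while the caller's deposit field still holds 25.50. The LINKAGE SECTION pictures must match the caller's items exactly, since nothing checks them at run time.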

Strings. PIC X(n) is fixed-length, blank-padded, ASCII or EBCDIC depending on platform. STRING ... DELIMITED BY ... INTO ... for concatenation, UNSTRING for tokenization, INSPECT ... TALLYING / REPLACING for find/replace, MOVE FUNCTION UPPER-CASE(X) TO Y. National (Unicode UTF-16) via PIC N(n) and USAGE NATIONAL.
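
A sketch of the four string verbs mentioned above, assuming GnuCOBOL free format; the field names and sample data are invented for illustration.

```cobol
*> Sketch: STRING, UNSTRING, INSPECT TALLYING, and UPPER-CASE.
IDENTIFICATION DIVISION.
PROGRAM-ID. STRDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-FIRST        PIC X(15) VALUE "Grace".
01  WS-LAST         PIC X(20) VALUE "Hopper".
01  WS-FULL         PIC X(40) VALUE SPACES.
01  WS-CSV          PIC X(30) VALUE "1001,WIDGET,25".
01  WS-FIELD-1      PIC X(10).
01  WS-FIELD-2      PIC X(10).
01  WS-FIELD-3      PIC X(10).
01  WS-COMMAS       PIC 9(2)  VALUE 0.
PROCEDURE DIVISION.
MAIN-PARA.
    *> DELIMITED BY SPACE stops at the first blank, dropping the padding.
    STRING WS-FIRST DELIMITED BY SPACE
           " "      DELIMITED BY SIZE
           WS-LAST  DELIMITED BY SPACE
        INTO WS-FULL
    END-STRING
    DISPLAY "full name: " WS-FULL

    UNSTRING WS-CSV DELIMITED BY ","
        INTO WS-FIELD-1 WS-FIELD-2 WS-FIELD-3
    END-UNSTRING
    DISPLAY "second field: " WS-FIELD-2

    INSPECT WS-CSV TALLYING WS-COMMAS FOR ALL ","
    DISPLAY "comma count: " WS-COMMAS

    MOVE FUNCTION UPPER-CASE(WS-FULL) TO WS-FULL
    DISPLAY "upper: " WS-FULL
    STOP RUN.
```

STRING does not clear its target first, which is why WS-FULL is initialized to SPACES; reusing the field without re-clearing it leaves old characters behind.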

Collections. Arrays = “tables” via OCCURS:

01  MONTHLY-SALES.
    05  MONTH-AMT OCCURS 12 TIMES PIC S9(7)V99 COMP-3.

Subscript the repeating item, not the group: MONTH-AMT (3) is the third entry. Multi-dimensional: nested OCCURS. Searchable tables: OCCURS 100 TIMES ASCENDING KEY IS K INDEXED BY I, then SEARCH ALL does a binary search (the table must already be sorted on K).
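
A sketch of a keyed table and SEARCH ALL, assuming GnuCOBOL free format. The rate table, its keys (10 through 50), and all names are hypothetical.

```cobol
*> Sketch: binary search over a table sorted on its key.
IDENTIFICATION DIVISION.
PROGRAM-ID. TBLDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-RATE-TABLE.
    05  WS-RATE-ENTRY OCCURS 5 TIMES
            ASCENDING KEY IS WS-RATE-CODE
            INDEXED BY RATE-IDX.
        10  WS-RATE-CODE    PIC 9(2).
        10  WS-RATE-PCT     PIC 9V99.
01  WS-N            PIC 9(2).
PROCEDURE DIVISION.
MAIN-PARA.
    *> Load keys 10, 20, ... 50 in ascending order; SEARCH ALL relies
    *> on that order and does not verify it.
    PERFORM VARYING WS-N FROM 1 BY 1 UNTIL WS-N > 5
        COMPUTE WS-RATE-CODE (WS-N) = WS-N * 10
        COMPUTE WS-RATE-PCT (WS-N)  = WS-N / 100
    END-PERFORM
    SEARCH ALL WS-RATE-ENTRY
        AT END
            DISPLAY "code 30 not found"
        WHEN WS-RATE-CODE (RATE-IDX) = 30
            DISPLAY "code 30 found, rate digits: "
                    WS-RATE-PCT (RATE-IDX)
    END-SEARCH
    STOP RUN.
```

SEARCH ALL sets the index itself; a plain SEARCH (linear) would need SET RATE-IDX TO 1 first.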

Intermediate

Modules. Programs callable via dynamic CALL. COPY books (.cpy) are textual includes for shared record layouts — every shop has hundreds. REPLACING clause on COPY does textual substitution: COPY CUSTREC REPLACING ==:PFX:== BY ==CUST==. produces a copy with :PFX: replaced by CUST everywhere. Modern (COBOL-85+) nested programs allow lexical nesting with IS COMMON and IS GLOBAL visibility.

Error handling. Per-statement: ON SIZE ERROR, ON OVERFLOW, INVALID KEY, AT END, NOT ON SIZE ERROR. File I/O sets FILE STATUS codes (two-character values). Classic COBOL has no exceptions beyond these phrases and USE procedures in DECLARATIVES; COBOL 2002 added an exception mechanism (RAISE, RESUME, USE AFTER EXCEPTION CONDITION) but adoption is uneven. Mainframe shops use the Language Environment (LE) condition handler for cross-language error propagation.
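
A minimal sketch of per-statement error handling with ON SIZE ERROR, assuming GnuCOBOL free format; the names and values are illustrative. Here 500 * 300 = 150000 does not fit PIC 9(4), so the SIZE ERROR branch runs and WS-TOTAL keeps its previous value.

```cobol
*> Sketch: trapping arithmetic overflow on a COMPUTE.
IDENTIFICATION DIVISION.
PROGRAM-ID. SIZEERR.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-QTY          PIC 9(3) VALUE 500.
01  WS-PRICE        PIC 9(3) VALUE 300.
01  WS-TOTAL        PIC 9(4) VALUE 0.
PROCEDURE DIVISION.
MAIN-PARA.
    COMPUTE WS-TOTAL = WS-QTY * WS-PRICE
        ON SIZE ERROR
            DISPLAY "overflow: result does not fit PIC 9(4)"
        NOT ON SIZE ERROR
            DISPLAY "total = " WS-TOTAL
    END-COMPUTE
    STOP RUN.
```

Without the ON SIZE ERROR phrase the same statement would store a truncated result and carry on.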

Concurrency. None in the standard. Mainframe concurrency comes from outside the language: each CICS transaction runs as its own task, batch jobs are scheduled by JES2/JES3, and parallel sysplex provides cluster-level concurrency. COBOL programs themselves are normally single-threaded. Recent dialects (Visual COBOL on .NET/JVM) inherit host concurrency.

I/O. First-class, the language was designed around it.

  • Sequential files: OPEN INPUT INFILE / READ INFILE INTO REC AT END SET EOF TO TRUE END-READ / CLOSE INFILE.
  • Indexed files (VSAM/KSDS-style): ORGANIZATION IS INDEXED RECORD KEY IS CUST-ID ALTERNATE RECORD KEY IS NAME WITH DUPLICATES. READ ... KEY IS ..., START, REWRITE.
  • Relative files: access by record number.
  • Line-sequential (text files on Unix): GnuCOBOL/Visual COBOL extension.
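
The sequential read loop can be sketched end to end with a line-sequential text file, assuming GnuCOBOL free format. The file name demo.txt and all data names are hypothetical; the program writes two records first so that it is self-contained.

```cobol
*> Sketch: write a small text file, then read it back to end of file,
*> checking FILE STATUS after OPEN and WRITE.
IDENTIFICATION DIVISION.
PROGRAM-ID. SEQIO.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT DEMO-FILE ASSIGN TO "demo.txt"
        ORGANIZATION IS LINE SEQUENTIAL
        FILE STATUS IS WS-FS.
DATA DIVISION.
FILE SECTION.
FD  DEMO-FILE.
01  DEMO-REC        PIC X(20).
WORKING-STORAGE SECTION.
01  WS-FS           PIC XX.
01  WS-EOF-FLAG     PIC X VALUE "N".
    88  WS-EOF      VALUE "Y".
01  WS-COUNT        PIC 9(3) VALUE 0.
PROCEDURE DIVISION.
MAIN-PARA.
    OPEN OUTPUT DEMO-FILE
    PERFORM CHECK-STATUS
    MOVE "first record"  TO DEMO-REC
    WRITE DEMO-REC
    PERFORM CHECK-STATUS
    MOVE "second record" TO DEMO-REC
    WRITE DEMO-REC
    PERFORM CHECK-STATUS
    CLOSE DEMO-FILE

    OPEN INPUT DEMO-FILE
    PERFORM CHECK-STATUS
    PERFORM UNTIL WS-EOF
        READ DEMO-FILE
            AT END
                SET WS-EOF TO TRUE
            NOT AT END
                ADD 1 TO WS-COUNT
                DISPLAY "read: " DEMO-REC
        END-READ
    END-PERFORM
    CLOSE DEMO-FILE
    DISPLAY "records read: " WS-COUNT
    STOP RUN.
CHECK-STATUS.
    *> "00" means success; anything else is treated as fatal here.
    IF WS-FS NOT = "00"
        DISPLAY "file error, status " WS-FS
        STOP RUN
    END-IF.
```

On z/OS the ASSIGN TO target would be a DD name bound in JCL rather than a literal path, and the organization would normally be SEQUENTIAL.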

Stdlib highlights. Intrinsic functions (FUNCTION …): math (SQRT, LOG, MEAN), strings (UPPER-CASE, LOWER-CASE, REVERSE, TRIM), date/time (CURRENT-DATE, INTEGER-OF-DATE, DATE-OF-INTEGER), type conversion (NUMVAL, NUMVAL-C for currency-formatted strings). The day of the week comes from ACCEPT ... FROM DAY-OF-WEEK rather than from an intrinsic function. For in-language serialization, IBM Enterprise COBOL provides XML GENERATE / XML PARSE and JSON GENERATE / JSON PARSE as language extensions (one of the big wins for modernization); support in other compilers varies.
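
A few of these intrinsic functions in use, as a sketch assuming GnuCOBOL free format and the default currency sign ($). The names and the sample price string are made up. The date arithmetic spans February 2024, a leap-year month, so the difference is 29 days.

```cobol
*> Sketch: CURRENT-DATE, NUMVAL-C, and INTEGER-OF-DATE.
IDENTIFICATION DIVISION.
PROGRAM-ID. FUNCDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-NOW          PIC X(21).
01  WS-PRICE-TEXT   PIC X(12) VALUE " $1,234.50 ".
01  WS-PRICE        PIC 9(5)V99.
01  WS-SHOW         PIC Z(4)9.99.
01  WS-DAYS         PIC 9(7).
PROCEDURE DIVISION.
MAIN-PARA.
    *> CURRENT-DATE returns 21 characters; the first 8 are YYYYMMDD.
    MOVE FUNCTION CURRENT-DATE TO WS-NOW
    DISPLAY "today (YYYYMMDD): " WS-NOW(1:8)

    *> NUMVAL-C accepts a currency sign and thousands separators.
    COMPUTE WS-PRICE = FUNCTION NUMVAL-C(WS-PRICE-TEXT)
    MOVE WS-PRICE TO WS-SHOW
    DISPLAY "parsed price: " WS-SHOW

    *> Day numbers make date subtraction plain integer arithmetic.
    COMPUTE WS-DAYS = FUNCTION INTEGER-OF-DATE(20240301)
                    - FUNCTION INTEGER-OF-DATE(20240201)
    DISPLAY "days between: " WS-DAYS
    STOP RUN.
```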

Advanced

Memory model. WORKING-STORAGE is allocated once per run unit and persists across calls (effectively static). LOCAL-STORAGE is allocated and initialized per invocation (needed for recursive programs and for multithreaded programs). LINKAGE SECTION items are not allocated by the called program; they reference the caller’s storage. Pointers (USAGE POINTER, USAGE PROCEDURE-POINTER) and ADDRESS OF enable dynamic structures and FFI; ALLOCATE/FREE (COBOL 2002) allocate from the heap.
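
The lifetime difference between WORKING-STORAGE and LOCAL-STORAGE shows up as soon as a subprogram is called more than once. A sketch, assuming GnuCOBOL with both programs in one free-form source file; STORMAIN, CALLCNT, and the counters are hypothetical names.

```cobol
*> Sketch: a counter in WORKING-STORAGE survives between calls,
*> a counter in LOCAL-STORAGE is re-initialized on every call.
IDENTIFICATION DIVISION.
PROGRAM-ID. STORMAIN.
PROCEDURE DIVISION.
MAIN-PARA.
    CALL "CALLCNT"
    CALL "CALLCNT"
    CALL "CALLCNT"
    STOP RUN.
END PROGRAM STORMAIN.

IDENTIFICATION DIVISION.
PROGRAM-ID. CALLCNT.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-CALLS        PIC 9(3) VALUE 0.
LOCAL-STORAGE SECTION.
01  LS-CALLS        PIC 9(3) VALUE 0.
PROCEDURE DIVISION.
MAIN-PARA.
    ADD 1 TO WS-CALLS
    ADD 1 TO LS-CALLS
    DISPLAY "working-storage: " WS-CALLS
            "  local-storage: " LS-CALLS
    GOBACK.
END PROGRAM CALLCNT.
```

Across the three calls the working-storage counter climbs to 1, 2, 3 while the local-storage counter is 1 each time. A CANCEL of the subprogram (or declaring it IS INITIAL) would reset the working-storage copy as well.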

Concurrency deep dive. In Enterprise COBOL on z/OS, the THREAD compiler option enables thread-safe code, with per-invocation data kept in LOCAL-STORAGE; CICS transactions and IMS message regions provide the actual concurrency. On distributed platforms, Visual COBOL on .NET/JVM exposes host threads. The ISO standard itself defines no concurrency primitives.

FFI. Three patterns: (1) CALL "C-FUNCTION" USING ... with arguments whose layouts match the C parameter types; this works directly against C if calling conventions match. GnuCOBOL transpiles to C, so any C library is callable. (2) EXEC SQL preprocessor (DB2, Oracle, PostgreSQL via OpenESQL) embeds SQL with EXEC SQL ... END-EXEC, expanded by precompiler into CALLs to runtime. (3) EXEC CICS for CICS transaction services (EXEC CICS READ FILE('CUSTFILE') RIDFLD(CUST-ID) INTO(CUSTREC) END-EXEC). On z/OS, the Language Environment (LE) lets COBOL, PL/I, C, and assembler share a runtime, condition handling, and storage manager.

Reflection. Almost none. LENGTH OF returns the byte length of an item. ADDRESS OF returns a pointer. No introspection of record layouts at runtime; all layout knowledge is compile-time via copybooks.

Performance tools. On z/OS: Strobe, APA (Application Performance Analyzer), IBM Fault Analyzer for post-mortem dumps, CICS Performance Analyzer. Compile flags matter — OPT(2) (ARCH(11+) on modern z) for hot paths, LIST/MAP for storage-map listings. On distributed: standard profilers if compiled to native; .NET/JVM dialects use host profilers (PerfView, JFR).

God mode

REDEFINES lets two declarations alias the same storage — used to interpret a record different ways, or implement variant records:

01  TXN-RECORD.
    05  TXN-TYPE       PIC X.
    05  TXN-DATA       PIC X(99).
    05  PURCHASE       REDEFINES TXN-DATA.
        10  AMOUNT     PIC S9(7)V99 COMP-3.
        10  SKU        PIC X(15).
        10  FILLER     PIC X(79).
    05  REFUND         REDEFINES TXN-DATA.
        10  ORIG-TXN   PIC 9(12).
        10  AMOUNT     PIC S9(7)V99 COMP-3.
        10  FILLER     PIC X(82).

Combined with COMP-3 (packed decimal: 2 digits/byte plus sign nibble), this is the classic layout for mainframe transaction records, card transactions included.
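
A runnable, scaled-down sketch of the variant-record idea, assuming GnuCOBOL free format. The 20-byte payload, the type codes "P" and "R", and all names are hypothetical. Because WS-AMOUNT appears in both views, every reference to it must be qualified with OF.

```cobol
*> Sketch: one 20-byte payload interpreted two ways, selected by a
*> type byte. Both REDEFINES views are exactly 20 bytes.
IDENTIFICATION DIVISION.
PROGRAM-ID. REDEFDEMO.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-TXN-RECORD.
    05  WS-TXN-TYPE        PIC X.
        88  IS-PURCHASE    VALUE "P".
        88  IS-REFUND      VALUE "R".
    05  WS-TXN-DATA        PIC X(20).
    05  WS-PURCHASE REDEFINES WS-TXN-DATA.
        10  WS-AMOUNT      PIC S9(7)V99 COMP-3.
        10  WS-SKU         PIC X(15).
    05  WS-REFUND REDEFINES WS-TXN-DATA.
        10  WS-ORIG-TXN    PIC 9(12).
        10  WS-AMOUNT      PIC S9(7)V99 COMP-3.
        10  FILLER         PIC X(3).
01  WS-SHOW                PIC -(7)9.99.
PROCEDURE DIVISION.
MAIN-PARA.
    SET IS-PURCHASE TO TRUE
    MOVE 19.99        TO WS-AMOUNT OF WS-PURCHASE
    MOVE "SKU-000042" TO WS-SKU
    PERFORM SHOW-TXN

    *> Reusing the same bytes under the other layout.
    SET IS-REFUND TO TRUE
    MOVE 1234         TO WS-ORIG-TXN
    MOVE 19.99        TO WS-AMOUNT OF WS-REFUND
    PERFORM SHOW-TXN
    STOP RUN.
SHOW-TXN.
    EVALUATE TRUE
        WHEN IS-PURCHASE
            MOVE WS-AMOUNT OF WS-PURCHASE TO WS-SHOW
            DISPLAY "purchase " WS-SKU " amount " WS-SHOW
        WHEN IS-REFUND
            MOVE WS-AMOUNT OF WS-REFUND TO WS-SHOW
            DISPLAY "refund of txn " WS-ORIG-TXN " amount " WS-SHOW
    END-EVALUATE.
```

Nothing stops code from reading the payload through the wrong view; the type byte is a convention the program has to honor itself.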

COPY … REPLACING as a poor man’s macro system. Use ==:tag:== markers and substitute. Combined with conditional >>IF / >>ELSE / >>END-IF directives (COBOL 2002+), copybooks become a real preprocessor.

Embedded SQL precompilers. DB2 precompiler (DSNHPC) and Oracle Pro*COBOL translate EXEC SQL into calls + cursors + descriptor structs. SQLCODE/SQLSTATE returned in the SQLCA (SQL Communications Area) record. Modern: OpenESQL (Micro Focus) targets PostgreSQL/MS SQL/ODBC. CICS preprocessor (DFHECP1$) translates EXEC CICS similarly.

REPORT WRITER. A largely-forgotten gem of the standard: declarative reports with RD (Report Description) clauses defining headers, footers, control breaks, page format. INITIATE / GENERATE record / TERMINATE. Eliminates hand-coded pagination and totaling logic. Optional in COBOL-85, mandatory in 2002. Most shops moved to dedicated report tools (SAS, Crystal) but it’s still in the standard.

OO COBOL (2002+). CLASS-ID. WIDGET INHERITS FROM BASE. METHOD-ID. PAINT. INVOKE WIDGET "PAINT" RETURNING .... Visual COBOL leans into this; mainframe shops mostly ignore it.

GnuCOBOL → C. cobc -C hello.cob stops after translation and writes the generated C (hello.c); cobc -E only runs the COBOL preprocessor. Useful for understanding semantics, debugging, or hand-editing for unusual deployment. The C source is verbose but readable.

JCL integration. On z/OS, the COBOL program is just one step in a JCL job:

//STEP1   EXEC PGM=PAYROLL
//INFILE  DD DSN=PROD.CUSTOMER.MASTER,DISP=SHR
//OUTFILE DD DSN=PROD.PAYROLL.OUT,DISP=(NEW,CATLG),
//           SPACE=(CYL,(100,50)),DCB=(RECFM=FB,LRECL=200)

The DD statements bind logical file names (in SELECT INFILE ASSIGN TO INFILE) to physical datasets. Understanding JCL is essential for any z/OS COBOL work.

Language Environment callable services. On z/OS, LE provides callable services: CEELOCT (local time), CEEDATE (date format), CEE3ABD (abend), CEEDCOD (decode condition tokens). These form the language-neutral service layer shared by COBOL, PL/I, and C.

Idioms & style

  • Always use scope terminators (END-IF, END-PERFORM, END-EVALUATE, END-READ). Period-terminated statements are a bug magnet — one stray period closes a block early.
  • Use EVALUATE over chained IF/ELSE IF. It’s COBOL’s switch and supports WHEN OTHER, ALSO (several subjects tested at once), and ranges (THRU).
  • Use 88-level condition names instead of magic literals: 88 ACTIVE VALUE "A" then IF ACTIVE reads cleanly and centralizes the constant.
  • Naming: KEBAB-CASE-WITH-DASHES. Prefix indicators are common: WS- for working-storage, LK- for linkage, FD- for file description, WS-CUST-ID-N (N = numeric edited).
  • Use COMP-3 for currency. Decimal arithmetic, no float error, ~half the bytes of DISPLAY.
  • Avoid GO TO and ALTER. ALTER was removed in COBOL 2002 for a reason. Use PERFORM for structured control flow.
  • Keep paragraphs small and PERFORM them. Resist falling-through paragraph boundaries — it’s allowed and almost always a bug.
  • Formatter/linter: cobol-check (unit tests + assertions), cobolcritic, OpenSource COBOL Lint (rare). Most shops rely on code reviews + compiler warnings (cobc -W flags). IBM provides Code Coverage + Debug Tool for Enterprise COBOL.
  • Expert review focus: (1) Period placement — do scope terminators close every nested block? (2) MOVE between mismatched PICs — silent truncation or padding? (3) ON SIZE ERROR handled on every COMPUTE? (4) File status checked after every I/O verb? (5) INITIALIZE called on records before reuse? (6) Re-entrant programs using LOCAL-STORAGE rather than WORKING-STORAGE? (7) COMP vs COMP-3 vs DISPLAY — wrong choice for money or for binary keys?

Ecosystem

  • Mainframe runtimes: z/OS (CICS, IMS, DB2, JES, RACF, MQ), z/VSE, IBM i (formerly AS/400, now Power) with ILE COBOL.
  • Distributed compilers: GnuCOBOL (FOSS, transpiles to C), GCC gcobol (FOSS, native, GCC 15+), Micro Focus Visual COBOL (Win/Linux/.NET/JVM), Fujitsu NetCOBOL, IBM COBOL for Linux on x86/Power.
  • Modernization: Heirloom Computing (mainframe → JVM), AWS Mainframe Modernization (Blu Age, Micro Focus), Astadia, Asysco (mainframe → .NET).
  • DevOps for mainframe: IBM Z Open Editor (VS Code), Zowe (open-source mainframe CLI/SDK), DBB (Dependency Based Build, Git-based replacement for Endevor), Bridge for Git.
  • Testing: cobol-check (xUnit-style), MFUnit (Micro Focus), IBM zUnit, CoboMojo.
  • Notable users: Every major US bank, IRS, Social Security Administration, Fortune 500 insurance, FAA, every airline reservation system, most state DMVs and unemployment systems (the headline-grabbing pandemic crash-stories were all COBOL backends overwhelmed, not COBOL itself failing).

Gotchas

  • Period scoping. IF X = 1 MOVE A TO B. MOVE C TO D. — the second MOVE is outside the IF because of the period. Use END-IF. This single rule has caused more COBOL bugs than any other.
  • Implicit MOVE truncation. Moving PIC X(10) to PIC X(5) silently truncates the right side; moving PIC 9(5) to PIC 9(3) silently drops high-order digits without raising ON SIZE ERROR (that fires only on COMPUTE/arithmetic). MOVE has no ON SIZE ERROR phrase and no END-MOVE scope terminator, so the statement itself is no help; check lengths manually or use INSPECT.
  • EBCDIC vs ASCII. Mainframe COBOL data is EBCDIC; distributed COBOL is ASCII. Files moved between platforms need conversion (iconv, FTP ASCII mode, special FD clauses). PIC X(1) VALUE "A" is a different byte on z/OS vs Linux.
  • COMP vs COMP-5. COMP (a.k.a. COMP-4, BINARY) respects the PIC width: PIC S9(4) COMP truncates beyond 9999. COMP-5 is “true binary” and uses the full storage width regardless of PIC. Mixing them when interfacing with C will silently corrupt data.
  • COMP-3 sign nibble. Packed-decimal stores sign in the low nibble of the last byte: 0xC = positive, 0xD = negative, 0xF = unsigned. Hand-edited or imported data with the wrong sign nibble silently parses but fails arithmetic comparisons.
  • STOP RUN in a called program. On z/OS LE, STOP RUN from a sub-program may terminate the entire LE enclave, not just the sub-program. Use EXIT PROGRAM or GOBACK (the modern, always-correct verb) instead.
  • CALL overhead. Static CALL is cheap; dynamic CALL identifier does a load-on-first-call and pays a lookup cost. For inner loops, prefer static. Cancel dynamically loaded programs with CANCEL to free storage; otherwise they leak across CICS transactions.
  • File status not checked. READ/WRITE/OPEN set a 2-byte status code; if you don’t check it (AT END, INVALID KEY, or explicit FILE STATUS), failures silently propagate. Mandatory practice: every I/O verb has a status check.
  • INITIALIZE and FILLER. INITIALIZE skips FILLER items by default and only sets numeric items to zero and alphanumeric items to spaces. Use INITIALIZE ... REPLACING ALPHANUMERIC DATA BY ... for a custom fill value, or INITIALIZE ... WITH FILLER (COBOL 2002+) for full clearing.
  • Standards drift. “COBOL” without qualification could be any of -85, 2002, 2014, 2023, IBM Enterprise (which adds extensions), Micro Focus (adds different extensions), GnuCOBOL (subset). Always compile-check on the target dialect.
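
Two of the gotchas above (period scoping and silent MOVE truncation) are easy to reproduce. A sketch assuming GnuCOBOL free format, with invented names and values; the compiler may warn about some of these lines, which is rather the point.

```cobol
*> Sketch: a stray period ending an IF early, and MOVE truncation.
IDENTIFICATION DIVISION.
PROGRAM-ID. GOTCHAS.
DATA DIVISION.
WORKING-STORAGE SECTION.
01  WS-X            PIC 9     VALUE 2.
01  WS-WIDE         PIC 9(5)  VALUE 12345.
01  WS-NARROW       PIC 9(3)  VALUE 0.
01  WS-LONG         PIC X(10) VALUE "ABCDEFGHIJ".
01  WS-SHORT        PIC X(5)  VALUE SPACES.
PROCEDURE DIVISION.
MAIN-PARA.
    *> The period after the first DISPLAY ends the IF, so the second
    *> DISPLAY runs even though WS-X is not 1.
    IF WS-X = 1 DISPLAY "inside IF". DISPLAY "runs unconditionally".

    *> Same intent with an explicit terminator: nothing is displayed.
    IF WS-X = 1
        DISPLAY "inside IF"
        DISPLAY "still inside IF"
    END-IF

    *> Numeric MOVE keeps the low-order digits (345).
    MOVE WS-WIDE TO WS-NARROW
    DISPLAY "numeric MOVE kept: " WS-NARROW

    *> Alphanumeric MOVE keeps the leftmost characters (ABCDE).
    MOVE WS-LONG TO WS-SHORT
    DISPLAY "alphanumeric MOVE kept: " WS-SHORT
    STOP RUN.
```

Neither MOVE raises any run-time condition, which is why length checks have to be done by hand.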

Citations