Digital Forensics / E-Discovery / DFIR DSLs Family Index


type: language-family-index
family: forensic-evidence
languages_catalogued: 28
tags: [language-reference, family-index, forensic-evidence, dfxml, plaso, case-uco, aff4, yara, vql, kql, edrm, relativity, veris, sysmon, openc2]

Digital Forensics / E-Discovery / DFIR — Family Index

Family overview

Digital forensics and incident response (DFIR) is one of the most XML-and-JSON-fragmented domains in computing, and the reason is historical. Every wave of forensic tooling (the late-1990s disk-imaging vendors, the 2000s law-enforcement timeline tools, the 2010s threat-intel sharing era, and the 2020s detection-as-code wave) produced its own schema, its own container format, and its own query language, and almost never deprecated the prior generation. The result is a stack of roughly ten distinct layers, most with multiple competing standards: disk-image containers (E01/EVF, EVF2, AFF, AFF4, AFF4-L), filesystem and artifact descriptions (DFXML, TSK schema), timeline events (plaso/log2timeline JSON), cross-tool case ontologies (CASE / UCO, MAEC), detection rules (YARA / YARA-X, Sigma, Sysmon config XML), endpoint-query DSLs (Velociraptor VQL, OSQuery, GRR flows), SIEM/log query languages (KQL, EQL, SPL, Lucene-for-SIEM), incident-sharing schemas (VERIS, STIX/TAXII, OpenIOC, MISP), command-and-control automation (OpenC2), and the parallel e-discovery world (Relativity DAT, Concordance, EDRM load files, EDRM XML).

The disk-image container battle is the oldest and least-resolved: Guidance Software’s proprietary E01 / EVF (1998+, now OpenText after the 2017 acquisition) became the de facto law-enforcement standard despite being effectively undocumented, while the open alternative AFF (Simson Garfinkel, 2005) and its successor AFF4 (Cohen / Garfinkel / Schatz, 2009) won academic and open-tool adoption but never displaced E01 in police labs. AFF4-L (logical-evidence variant, NIST-funded, presented at DFRWS 2019) layered deduplicated logical-image semantics on top of ZIP, but uptake outside Evimetry’s own tooling has been modest. The 2018–2025 push for CASE (Cyber-investigation Analysis Standard Expression), layered on top of the UCO (Unified Cyber Ontology) and stewarded under the Linux Foundation’s Cyber Domain Ontology project since December 2021, finally provides a JSON-LD-based cross-tool evidence ontology — CASE 1.0.0 shipped August 2022 and is the first formally stable cross-vendor standard the field has had.

The detection / hunting layer changed shape twice in the 2020s. First, YARA (VirusTotal, ~2008), already cross-listed in network-protocol-dsls, was effectively frozen and superseded by YARA-X (Rust rewrite, VirusTotal), which reached stable 1.0.0 on 2025-06-04 and now powers VirusTotal Livehunt/Retrohunt. Second, the detection-as-code movement crystallised around Sigma (Florian Roth, 2016, cross-listed under government-civictech) as a vendor-neutral SIEM rule format that compiles down to KQL, EQL, SPL, Lucene, and others, making it possible to write a detection once and ship it to Microsoft Sentinel, Elastic Security, and Splunk. KQL (Microsoft Kusto, Defender XDR / Sentinel) is now the de facto SIEM query language at large scale; EQL (Elastic) introduced sequence-and-window semantics for endpoint events; Splunk SPL remains entrenched in enterprise SOCs.
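
The "write a detection once, ship it to several SIEMs" idea can be illustrated with a toy converter. This is a minimal sketch only: it is not the Sigma specification and not the pySigma API. The rule shape, the field names, the `ProcessEvents` table, and the `to_kql` / `to_lucene` helpers are all hypothetical, and only flat equality matches joined by AND are handled (real backends also map field names per product).

```python
# Toy illustration of compiling one vendor-neutral rule into two query dialects.
# NOT Sigma or pySigma: the rule layout and helper names are hypothetical.

rule = {
    "title": "whoami run as SYSTEM",
    "detection": {
        "selection": {"process_name": "whoami.exe", "user": "SYSTEM"},
    },
}


def _escape(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_kql(rule: dict, table: str = "ProcessEvents") -> str:
    """Render the selection as a KQL-style pipeline: table | where a == "x" and ..."""
    selection = rule["detection"]["selection"]
    clauses = [f'{field} == "{_escape(value)}"' for field, value in selection.items()]
    return f"{table} | where " + " and ".join(clauses)


def to_lucene(rule: dict) -> str:
    """Render the selection as a Lucene-style field:"value" AND ... query."""
    selection = rule["detection"]["selection"]
    clauses = [f'{field}:"{_escape(value)}"' for field, value in selection.items()]
    return " AND ".join(clauses)


print(to_kql(rule))
print(to_lucene(rule))
```

The point of the sketch is the shape of the pipeline: one declarative rule, several small renderers. Everything product-specific lives in the renderer, which is why the backends carry most of the maintenance burden.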

The e-discovery world runs in an almost completely parallel universe, governed by litigation practice rather than infosec. Despite the EDRM (Electronic Discovery Reference Model) consortium’s repeated attempts to standardise on XML (EDRM XML 1.2, 2010+), the dominant production format remains the Concordance-style DAT file — a 1990s pilcrow-and-thorn-delimited flat file, paired with OPT (Opticon) page-image cross-reference files. Relativity (kCura / Relativity LLC) won the e-discovery platform war in the 2010s and its DAT/OPT export is the closest thing to a real-world standard. The structural mismatch — DFIR moving toward JSON-LD ontologies while e-discovery still produces pilcrow-delimited flat files — is why “forensic data interchange” remains a polite fiction more than a reality.
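
To make the flat-file format concrete, here is a minimal sketch of reading a Concordance-style DAT file. It assumes the commonly cited default delimiters, namely ASCII 20 as the field separator (often rendered as a pilcrow) and thorn (U+00FE) as the text qualifier. Real productions can use other delimiters, and this naive split does not handle a separator embedded inside a qualified value. The helper names and sample field names are hypothetical.

```python
# Sketch of a Concordance-style DAT reader under assumed default delimiters.
FIELD_SEP = "\x14"   # ASCII 20, often displayed as a pilcrow
QUOTE = "\u00fe"     # thorn, used as the text qualifier


def format_dat_line(values):
    """Build one DAT row: every value wrapped in thorns, joined by ASCII 20."""
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in values)


def parse_dat_line(line):
    """Split one DAT row into values and strip the surrounding thorns."""
    out = []
    for field in line.rstrip("\r\n").split(FIELD_SEP):
        if len(field) >= 2 and field.startswith(QUOTE) and field.endswith(QUOTE):
            field = field[1:-1]
        out.append(field)
    return out


def parse_dat(lines):
    """Treat the first non-empty row as the header; return one dict per document."""
    rows = [parse_dat_line(line) for line in lines if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


sample = [
    format_dat_line(["BegDoc", "EndDoc", "Custodian"]),
    format_dat_line(["ABC0001", "ABC0003", "Smith, Jane"]),
]
records = parse_dat(sample)
print(records)
```

The comma inside `Smith, Jane` survives intact, which is the practical reason the format uses unusual delimiter characters rather than commas and double quotes.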

In our deep library

None catalogued. Forensic / e-discovery DSLs do not have standalone deep-library notes; they live in this Tier 3 family index plus cross-listings.

Cross-reference:

  • government-civictech — sibling family; STIX 2.1, TAXII 2.1, CSAF 2.0, Sigma, MISP, OpenIOC are catalogued there. This note cross-lists them for DFIR completeness but treats them as primary residents of government-civictech.
  • network-protocol-dsls — sibling; YARA (original C) and Sigma are catalogued there; this note covers the modern YARA-X Rust rewrite as the new primary.
  • api-description — adjacent; CASE/UCO are JSON-LD ontologies that fit a similar serialisation tradition as JSON-Schema and OpenAPI.
  • notation-spec — adjacent; UCO’s RDFS/OWL underpinnings sit next to formal-grammar and notation families.
  • identity-auth-policy — adjacent; authentication-event evidence (Windows Event Log, OIDC token traces, Entra sign-in logs) is a common forensic artifact source.
  • graph-log-event-query — sibling; KQL, EQL, SPL, Lucene-for-SIEM are deep relatives of the log-and-graph query family.

Tier 3 family table — Disk-image / evidence containers

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| EnCase E01 / EWF / EVF | 1998 | Guidance Software (now OpenText) | Proprietary segmented disk-image container with embedded metadata, hashing, compression | Dominant in law enforcement despite proprietary nature; libewf open-source reader is the de facto interop layer | https://github.com/libyal/libewf |
| EVF2 (Ex01 / Lx01) | ~2012 | Guidance / OpenText | Successor to E01 with AES encryption and stronger hashing | Active in EnCase product line; libewf has partial EVF2 read support | https://github.com/libyal/libewf/wiki/Format-EVF |
| AFF (Advanced Forensic Format) | 2005 | Simson Garfinkel + Basis Technology | Open disk-image container with extensible metadata segments | Legacy, superseded by AFF4 | https://www.forensicswiki.xyz/wiki/AFF |
| AFF4 v1.0 | 2009 (paper); v1.0 spec maturing through 2018+ | Michael Cohen, Simson Garfinkel, Bradley Schatz | Open ZIP-based container; supports streams, logical objects, hash-based dedup | Active standard, primary OSS alternative to E01; powers Evimetry, supported by Magnet AXIOM | https://github.com/aff4/Standard |
| AFF4-L (logical) | DFRWS 2019 (NIST-funded) | Schatz / Cohen / Garfinkel | Logical-evidence container variant of AFF4 with deduplication semantics | Active but niche outside Evimetry; primary use in selective-collection workflows | https://dfrws.org/wp-content/uploads/2019/06/2019_USA_paper-aff4_l_a_scalable_open_logical_evidence_container.pdf |
| AccessData FTK FXMLZ | ~2014 | AccessData (now Exterro) | XML-Zip case-export bundle from FTK | Active in FTK product line; effectively vendor-internal | https://www.exterro.com/digital-forensics-software/forensic-toolkit |
| DD / raw image | 1972 (Unix dd) | AT&T | Bit-for-bit raw image, no metadata | Universal as lowest-common-denominator format | https://man7.org/linux/man-pages/man1/dd.1.html |
| Volatility memory profile (.dwarf, .vtypes) | 2007+ (Vol 2), 2017+ (Vol 3) | The Volatility Foundation | Python-defined memory layouts + symbol tables for OS kernels | Active; Volatility 3 (Python rewrite, 2019+) uses ISF JSON symbol files instead of Python vtypes | https://volatility3.readthedocs.io/ |
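
Because a raw dd image carries no embedded metadata, its integrity hashes have to be computed and stored alongside it. A minimal sketch of chunked hashing follows; the function names `hash_stream` and `hash_image` are hypothetical, and the choice of MD5 plus SHA-256 is an assumption about common lab practice rather than a requirement of any format listed in this table.

```python
import hashlib
import io


def hash_stream(stream, chunk_size: int = 1024 * 1024) -> dict:
    """Hash a binary stream in fixed-size chunks so large images never load fully into memory."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def hash_image(path: str) -> dict:
    """Convenience wrapper: open a raw image file read-only and hash its contents."""
    with open(path, "rb") as fh:
        return hash_stream(fh)


# Demonstration on an in-memory stand-in for an image file.
demo = hash_stream(io.BytesIO(b"abc"))
print(demo)
```

Container formats such as E01 and AFF4 store equivalent digests inside the container, which is one of the main things a raw image gives up.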

Tier 3 family table — Timeline / artifact / case formats

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| DFXML (Digital Forensics XML) | 2009+ | Simson Garfinkel (NPS / NIST) | XML schema for fileobject, byte_runs, hashes, timestamps, hashdb | Active but slow-moving; still the canonical XML for filesystem-walk output | https://github.com/dfxml-working-group/dfxml_schema |
| plaso / log2timeline JSON | 2009 (log2timeline orig.); plaso rewrite 2012+ | Kristinn Gudjonsson + community | Plaso’s “super-timeline” event JSON; one event per row, plugin-derived | Very active; current stable release 20260119 (Feb 2026) per plaso releases | https://plaso.readthedocs.io/ |
| CASE (Cyber-investigation Analysis Standard Expression) | 2014 (DC3 + MITRE); 1.0.0 in Aug 2022 | DC3 / MITRE; Linux Foundation CDO since Dec 2021 | JSON-LD ontology layered on UCO; classes for Trace, Investigation, Action, ProvenanceRecord | Active standard, 1.0.0 stable since 2022; 2.x is the next major | https://caseontology.org/ |
| UCO (Unified Cyber Ontology) | 2014 (Ebiquity / UMBC origin) | UMBC Ebiquity → CDO under Linux Foundation (Dec 2021) | RDFS/OWL/JSON-LD cyber-domain ontology underlying CASE; classes for ObservableObject, Identity, Action | Active; CASE/UCO co-release 1.0.0 Aug 2022 | https://unifiedcyberontology.org/ |
| Sleuth Kit / Autopsy data model | 2002 (TSK); Autopsy data-model 2012+ | Brian Carrier | SQLite-backed case schema (tsk_files, tsk_objects, blackboard_artifacts) | Active, the canonical OSS case database used by Autopsy and downstream tools | https://www.sleuthkit.org/sleuthkit/docs/api-docs/ |
| MAEC (Malware Attribute Enumeration and Characterization) | 2010 | MITRE | JSON / XML schema for malware behaviour, capabilities, indicators | Legacy / niche; partially superseded by STIX 2.x malware-analysis object; v5.0 last major | https://maecproject.github.io/ |
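
To show what "a JSON-LD ontology layered on UCO" looks like on the wire, the sketch below builds a small CASE-style graph as a plain Python dict. Treat every identifier as an assumption: the namespace IRIs, class names, and property names reflect one reading of the CASE/UCO 1.x vocabulary and have not been validated against the published ontology, and the `kb:` prefix and `make_case_bundle` helper are hypothetical.

```python
import json

# Assumed namespace IRIs; check them against the published ontologies before use.
CONTEXT = {
    "kb": "http://example.org/kb/",
    "case-investigation": "https://ontology.caseontology.org/case/investigation/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
}


def make_case_bundle(file_name: str, investigation_name: str) -> dict:
    """Link an investigation to a provenance record, and that record to one file."""
    file_id = "kb:file-1"
    provenance_id = "kb:provenance-1"
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                "@id": "kb:investigation-1",
                "@type": "case-investigation:Investigation",
                "uco-core:name": investigation_name,
                "uco-core:object": [{"@id": provenance_id}],
            },
            {
                "@id": provenance_id,
                "@type": "case-investigation:ProvenanceRecord",
                "uco-core:object": [{"@id": file_id}],
            },
            {
                "@id": file_id,
                "@type": "uco-observable:File",
                "uco-core:hasFacet": [
                    {
                        "@type": "uco-observable:FileFacet",
                        "uco-observable:fileName": file_name,
                    }
                ],
            },
        ],
    }


bundle = make_case_bundle("evidence.E01", "Example investigation")
print(json.dumps(bundle, indent=2))
```

The design point is that objects reference each other by `@id` rather than by nesting, so output from several tools can be merged into one graph without rewriting either side.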

Tier 3 family table — DFIR query / detection languages

| Language | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Velociraptor VQL | 2018 | Velocidex (Mike Cohen, Carlos Canto) | SQL-like endpoint-query DSL with plugins for filesystem, registry, EVTX, memory artifacts | Very active; current 0.76 / 0.76.1 (Mar–Apr 2026); the most powerful OSS live-response DSL | https://docs.velociraptor.app/docs/vql/ |
| GRR Rapid Response flows | 2011 | Google | Python client-side flow language for fleet IR | Active but lower velocity than Velociraptor in the broader community | https://grr-doc.readthedocs.io/ |
| OSQuery (osquery SQL) | 2014 | Facebook (now Meta) | SQLite-backed virtual tables over OS state; queryable via standard SQL dialect | Active; governed by the osquery Foundation under the Linux Foundation since 2019 | https://osquery.readthedocs.io/ |
| YARA-X | 2024 (beta); 1.0.0 stable 2025-06-04 | VirusTotal (Victor Alvarez / plusvic) | Rust rewrite of YARA pattern-match DSL; ~99% rule compatibility with YARA | Stable; powers VT Livehunt + Retrohunt as of 2025 | https://virustotal.github.io/yara-x/ |
| YARA (original C) | ~2008 | VirusTotal | Pattern-match DSL for files/memory | Maintenance mode; new dev moved to YARA-X. Also catalogued in network-protocol-dsls | https://github.com/VirusTotal/yara |
| Sysmon config XML | 2014 | Microsoft Sysinternals (Mark Russinovich, Thomas Garnier) | XML-based detection / inclusion / exclusion rule grammar for Sysmon ETW provider | Active; SwiftOnSecurity and Olaf Hartong configs are de facto baselines | https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon |
| Splunk SPL | ~2004 | Splunk Inc. | Pipe-oriented search-and-stats DSL; the original commercial log-query language | Very active, dominant in enterprise SOCs; SPL2 introduced ~2021 | https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/WhatsInThisManual |
| Splunk CIM (Common Information Model) | ~2010 | Splunk Inc. | Field-name and data-model schema for normalising security telemetry | Active; published as the Splunk CIM Add-on | https://docs.splunk.com/Documentation/CIM/latest/User/Overview |
| Microsoft KQL (Kusto Query Language) | ~2014 (internal); 2018+ public | Microsoft (Azure Data Explorer team) | Pipe-style analytics DSL; powers Sentinel, Defender XDR Advanced Hunting, Azure Monitor | Very active, de facto cloud-SIEM query language at scale | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-query-language |
| EQL (Event Query Language) | 2018 (Endgame); folded into Elastic 2019 | Endgame Inc. → Elastic | Sequence-and-window correlation DSL for endpoint events; uses sequence by ... with maxspan semantics | Active, part of Elastic Security rule engine | https://www.elastic.co/guide/en/elasticsearch/reference/current/eql.html |
| Lucene query syntax (SIEM use) | 1999 (Lucene); SIEM use 2010s+ | Doug Cutting; widely embedded | Field:value boolean query DSL; Elastic SIEM, OpenSearch, others | Universal in Elastic/OpenSearch stacks | https://lucene.apache.org/core/9_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html |
| OpenC2 (Open Command and Control) | 2017 (OASIS TC); v1.0 in 2019, v2.0 CSD 2024-05 | OASIS OpenC2 TC | JSON-encoded action/target/args grammar for machine-to-machine security automation | Active; v2.0 in committee draft as of mid-2024 | https://openc2.org/ |
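
The "sequence by ... with maxspan" idea in the EQL row can be sketched in a few lines. This is a simplified model, not Elastic's implementation: it keeps at most one partial sequence per join key, assumes events arrive sorted by a numeric `ts` field in seconds, and uses hypothetical event fields and a hypothetical `match_sequences` helper.

```python
from collections import defaultdict

# Roughly the shape of query being modelled (approximate EQL, for orientation only):
#   sequence by host with maxspan=60s
#     [process where name == "winword.exe"]
#     [network where true]


def match_sequences(events, stages, by, maxspan):
    """Return lists of events that satisfy each stage in order, per join key, within maxspan."""
    partial = defaultdict(list)  # join key -> events matched so far
    matches = []
    for ev in events:
        key = ev.get(by)
        if key is None:
            continue
        seq = partial[key]
        # Drop a partial sequence whose first event is now too old.
        if seq and ev["ts"] - seq[0]["ts"] > maxspan:
            seq.clear()
        if stages[len(seq)](ev):
            seq.append(ev)
            if len(seq) == len(stages):
                matches.append(list(seq))
                seq.clear()
        elif stages[0](ev):
            # Event does not extend the sequence but can start a new one.
            seq[:] = [ev]
    return matches


events = [
    {"ts": 0, "host": "a", "type": "process", "name": "winword.exe"},
    {"ts": 5, "host": "b", "type": "network", "port": 443},
    {"ts": 10, "host": "a", "type": "network", "port": 443},
    {"ts": 100, "host": "b", "type": "process", "name": "winword.exe"},
    {"ts": 500, "host": "b", "type": "network", "port": 443},
]
stages = [
    lambda e: e["type"] == "process" and e.get("name") == "winword.exe",
    lambda e: e["type"] == "network",
]
hits = match_sequences(events, stages, by="host", maxspan=60)
print(hits)
```

Host `a` matches because its network event follows the process event within 60 seconds; host `b` does not, because its two events are 400 seconds apart. Plain field:value query languages such as Lucene have no construct for this ordered, time-bounded correlation.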

Tier 3 family table — E-discovery load files

| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Concordance DAT | ~1995 | Dataflight (Concordance) | Pilcrow (¶) and thorn (þ) delimited flat file; one row per document | Universal in e-discovery production despite 1990s vintage | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Import_Export_Load_file_specifications.htm |
| Relativity DAT | ~2006 | kCura / Relativity LLC | Concordance-compatible DAT used as Relativity’s import/export format | Dominant in modern e-discovery; Relativity holds the largest market share | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Export_Workflows/Production_load_file_export.htm |
| OPT (Opticon) | ~1996 | Opticon | CSV-style page-image cross-reference; one row per page; pairs with DAT | Universal as the page-image companion to DAT productions | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Import_Export_Load_file_specifications.htm |
| LFP | ~1998 | IPRO / LexisNexis | IPRO LFP image cross-reference (similar role to OPT) | Active in IPRO-anchored workflows | https://my.ipro.com/help/ |
| EDRM XML | 2010 (v1.0); v1.2 ~2012 | EDRM consortium (Duke Law / Relativity) | XML-based load-file alternative to DAT; carried document, family, and metadata | Adopted only sporadically; never displaced DAT in production practice | https://edrm.net/resources/frameworks-and-standards/edrm-xml/ |
| EDRM Load File (CSV) | 2010s | EDRM consortium | CSV-delimited alternative load file | Niche; vendor-specific variants dominate | https://edrm.net/ |
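
The OPT row describes a one-row-per-page cross-reference, and grouping those rows back into documents is the usual first step when loading a production. The sketch below assumes the commonly described seven-column layout (image key, volume, image path, document-break flag, folder break, box break, page count); the `OptDocument` class, `parse_opt` helper, and sample rows are hypothetical.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class OptDocument:
    first_page: str            # image key of the document's first page
    volume: str
    pages: list = field(default_factory=list)  # image paths, in page order


def parse_opt(lines):
    """Group page rows into documents; a 'Y' in column 4 marks a document's first page."""
    docs = []
    for row in csv.reader(lines):
        if not row:
            continue
        row = row + [""] * (7 - len(row))  # pad short rows to the assumed 7 columns
        image_key, volume, path, doc_break = row[0], row[1], row[2], row[3]
        if doc_break.strip().upper() == "Y" or not docs:
            docs.append(OptDocument(first_page=image_key, volume=volume))
        docs[-1].pages.append(path)
    return docs


sample = [
    "ABC0001,VOL001,IMAGES\\001\\ABC0001.tif,Y,,,2",
    "ABC0002,VOL001,IMAGES\\001\\ABC0002.tif,,,,",
    "ABC0003,VOL001,IMAGES\\001\\ABC0003.tif,Y,,,1",
]
docs = parse_opt(sample)
for doc in docs:
    print(doc.first_page, len(doc.pages))
```

The first page key of each document is what ties the OPT file to the matching row in the DAT file, typically through a begin-document field.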

Tier 3 family table — Incident / threat-modelling schemas

| Schema | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| VERIS | 2010 | Verizon RISK team | JSON Schema vocabulary for incident recording; Action/Asset/Actor/Attribute four-axis “A4” model | Active; underpins annual Verizon DBIR; VCDB is the public corpus | https://verisframework.org/ |
| STIX 2.1 | 2014 (STIX 1.x); 2.1 OASIS standard 2021 | OASIS CTI TC (orig. MITRE) | JSON cyber-threat-intel object model | Active; cross-listed in government-civictech | https://oasis-open.github.io/cti-documentation/ |
| TAXII 2.1 | 2014 (TAXII 1.x); 2.1 OASIS 2021 | OASIS CTI TC | HTTPS-based STIX transport protocol | Active; cross-listed in government-civictech | https://oasis-open.github.io/cti-documentation/ |
| Mandiant OpenIOC | 2011 | Mandiant (now Google Cloud) | XML schema for indicators-of-compromise; predates STIX | Legacy but still encountered; cross-listed in government-civictech | https://github.com/mandiant/OpenIOC_1.1 |
| MISP core format | 2013 | Belgian Defence / CIRCL | JSON event-and-attribute model for Malware Information Sharing Platform | Very active; cross-listed in government-civictech | https://www.misp-standard.org/ |
| Sigma | 2016 | Florian Roth, Thomas Patzke | YAML detection-rule DSL; vendor-neutral, compiles to KQL/EQL/SPL/Lucene/etc. | Very active; cross-listed in government-civictech and network-protocol-dsls | https://sigmahq.io/ |

Notable threads

  • CASE/UCO is the long-overdue cross-tool evidence ontology. Until 2018–2022, DFIR had no formally stable cross-vendor representation of “an investigation” — every tool (EnCase, FTK, X-Ways, Autopsy, Magnet AXIOM, Cellebrite) emitted its own report shape. CASE (built on UCO) finally provides JSON-LD classes for Trace, ProvenanceRecord, Action, and Identity that map to investigation primitives. The project was DoJ/NIST/DC3 + MITRE-driven, transitioned to the Linux Foundation Cyber Domain Ontology project in December 2021, and shipped CASE/UCO 1.0.0 in August 2022. Adoption is slow — vendor-internal formats are sticky — but CASE is now the only standard a multi-tool investigation can serialise into without losing semantics. Time-stamped claim 2026-05-10: verify CASE current minor version against caseontology.org before citing in production work.

  • plaso is the open-source super-timeline standard. Forensic timelines used to be built by hand from mactime, bodyfile, and event-log dumps. log2timeline (Kristinn Gudjonsson, 2009) and then its Python rewrite plaso (2012+) automated the process: 200+ parser plugins for OS artifacts (EVTX, Prefetch, MFT, Plist, SQLite stores, browser histories, $UsnJrnl) all emit into a unified event JSON. Current plaso release is 20260119 (Feb 2026) and a 20260415-pre is in test per the plaso docs. Output formats include plaso-storage, JSONL, Elasticsearch direct push, and L2tCSV. This is the closest the field has to a universal timeline interchange.

  • The E01 vs AFF4 container war never resolved. EnCase’s E01 / EWF format won law-enforcement adoption in the late 1990s and never lost it, despite being effectively undocumented (libewf reverse-engineered the spec). AFF (2005) and AFF4 (2009, with v1.0 maturing through NIST-funded work into the late 2010s) won academic and OSS mindshare but never displaced E01 in police labs. AFF4-L (DFRWS 2019) added logical-evidence deduplication semantics but is essentially a single-vendor (Evimetry) format in practice. The net effect: every forensic tool reads E01, every modern tool also reads AFF4, and the “industry standard” is whichever one your acquisition tool emitted.

  • Relativity DAT files are 1990s technology that won the e-discovery war. Despite EDRM XML (2010+, v1.2) being technically superior, the dominant e-discovery production format remains Concordance-style DAT — a pilcrow (¶) and thorn (þ) delimited flat file — paired with OPT (Opticon page-image cross-reference) and folders of natives + TIFFs + extracted-text. Relativity (now Relativity LLC) became the dominant e-discovery platform in the 2010s, and its DAT export is the de facto litigation-data-exchange format. The structural mismatch between this and modern JSON-LD DFIR formats (CASE/UCO) means the two domains barely interoperate.

  • Sigma → vendor-rule conversion is the detection-as-code lingua franca. Sigma (2016) lets a detection author write a single YAML rule that compiles to KQL (Microsoft Sentinel and Defender XDR Advanced Hunting), EQL (Elastic Security), SPL (Splunk), Lucene (OpenSearch), Chronicle YARA-L, QRadar AQL, and ArcSight. The Sigma backends (pySigma, and the legacy sigmac) do the real work: they are how a vendor-neutral rule turns into a runnable SIEM detection. Public Sigma rule repositories now hold thousands of community-maintained rules mapped to MITRE ATT&CK techniques. Cross-listed under government-civictech.

  • KQL is the de facto SIEM query language at cloud scale. Microsoft’s Kusto Query Language emerged from Azure Data Explorer (~2014 internal, public 2018+) and now powers Microsoft Sentinel, Defender XDR Advanced Hunting, Defender for Endpoint Advanced Hunting, Azure Monitor, and Log Analytics. Because Defender, Entra, and Sentinel together cover a large share of the enterprise security stack, much of the production-grade hunting done in 2026 is written in KQL. The closest open competitor at similar scale is Splunk SPL; Elastic’s EQL is more specialised for endpoint sequences.

  • Velociraptor VQL is the most powerful open-source live-response DSL. While other endpoint-DFIR tools (GRR, OSQuery) ship a constrained query surface, Velociraptor’s VQL is a near-complete SQL-flavoured language with plugins for filesystem walks, registry parsing, EVTX, memory artifacts, NTFS internals, file carving, YARA scans, Sigma-rule evaluation, and remote execution, all composable in a single query. The current release (0.76, March 2026; 0.76.1 April 2026) added a Bleve-backed local full-text search across query results and refactored the CLI to treat artifacts as mini-VQL programs. For incident response on a fleet, nothing else in OSS gets close.
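
The super-timeline idea from the plaso thread above (many artifact parsers emitting into one time-ordered event stream) can be sketched as a merge of per-source event lists. This is an illustration only: the event fields (`timestamp`, `source`, `message`) and the `merge_timelines` helper are hypothetical and do not reproduce plaso's actual event schema or storage format.

```python
import heapq
import json


def merge_timelines(*sources):
    """Merge per-parser event lists into one list ordered by timestamp."""
    ordered = [sorted(src, key=lambda e: e["timestamp"]) for src in sources]
    return list(heapq.merge(*ordered, key=lambda e: e["timestamp"]))


# Hypothetical parser outputs; timestamps are Unix epoch seconds.
evtx_events = [
    {"timestamp": 1700000000, "source": "EVTX", "message": "Logon event"},
    {"timestamp": 1700000120, "source": "EVTX", "message": "Process created"},
]
mft_events = [
    {"timestamp": 1700000060, "source": "MFT", "message": "File created: a.exe"},
]

timeline = merge_timelines(evtx_events, mft_events)

# One JSON object per line (JSONL), a convenient interchange shape for timelines.
jsonl = "\n".join(json.dumps(event, sort_keys=True) for event in timeline)
print(jsonl)
```

Interleaving the file-creation event between the two log events is the whole value of a super-timeline: causality across artifact types becomes visible only once everything shares one clock and one record shape.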

Citations