Digital Forensics / E-Discovery / DFIR DSLs Family Index
type: language-family-index family: forensic-evidence languages_catalogued: 28 tags: [language-reference, family-index, forensic-evidence, dfxml, plaso, case-uco, aff4, yara, vql, kql, edrm, relativity, veris, sysmon, openc2]
Digital Forensics / E-Discovery / DFIR — Family Index
Family overview
Digital forensics and incident response (DFIR) is one of the most XML-and-JSON-fragmented domains in computing, and the reason is historical: every wave of forensic tooling (the late-1990s disk-imaging vendors, the 2000s law-enforcement timeline tools, the 2010s threat-intel sharing era, the 2020s detection-as-code wave) produced its own schema, its own container format, and its own query language, and almost never deprecated the prior generation. The result is a stack of roughly ten distinct layers, most with several competing standards: disk-image containers (E01/EVF, EVF2, AFF, AFF4, AFF4-L), filesystem and artifact descriptions (DFXML, TSK schema), timeline events (plaso/log2timeline JSON), cross-tool case ontologies (CASE / UCO, MAEC), detection rules (YARA / YARA-X, Sigma, Sysmon config XML), endpoint-query DSLs (Velociraptor VQL, OSQuery, GRR flows), SIEM/log query languages (KQL, EQL, SPL, Lucene-for-SIEM), incident-sharing schemas (VERIS, STIX/TAXII, OpenIOC, MISP), command-and-control automation (OpenC2), and the parallel e-discovery world (Relativity DAT, Concordance, EDRM load files, EDRM XML).
The disk-image container battle is the oldest and least-resolved: Guidance Software’s proprietary E01 / EVF (1998+, now OpenText after the 2017 acquisition) became the de facto law-enforcement standard despite being effectively undocumented, while the open alternative AFF (Simson Garfinkel, 2005) and its successor AFF4 (Cohen / Garfinkel / Schatz, 2009) won academic and open-tool adoption but never displaced E01 in police labs. AFF4-L (logical-evidence variant, NIST-funded, presented at DFRWS 2019) layered deduplicated logical-image semantics on top of ZIP, but uptake outside Evimetry’s own tooling has been modest. The 2018–2025 push for CASE (Cyber-investigation Analysis Standard Expression), layered on top of the UCO (Unified Cyber Ontology) and stewarded under the Linux Foundation’s Cyber Domain Ontology project since December 2021, finally provides a JSON-LD-based cross-tool evidence ontology — CASE 1.0.0 shipped August 2022 and is the first formally stable cross-vendor standard the field has had.
The detection / hunting layer changed shape twice in the 2020s. First, YARA (VirusTotal, ~2008), already cross-listed in network-protocol-dsls, was effectively frozen and superseded by YARA-X (Rust rewrite, VirusTotal), which reached stable 1.0.0 on 2025-06-04 and now powers VirusTotal Livehunt/Retrohunt. Second, the detection-as-code movement crystallised around Sigma (Florian Roth, 2016, cross-listed under government-civictech) as a vendor-neutral SIEM rule format that compiles down to KQL, EQL, SPL, Lucene, and others, making it possible to write a detection once and ship it to Microsoft Sentinel, Elastic Security, and Splunk. KQL (Microsoft Kusto, Defender XDR / Sentinel) is now the de facto SIEM query language at large scale; EQL (Elastic) introduced sequence-and-window semantics for endpoint events; Splunk SPL remains entrenched in enterprise SOCs.
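
To make the "write once, ship to many" idea concrete, the following is a minimal Python sketch that renders one Sigma-like selection into a Lucene-style string and a KQL-style string. It is an illustration under stated assumptions, not how Sigma is actually compiled: the `rule` dict, the `to_lucene` and `to_kql` helpers, and the `Events` table name are all hypothetical, and real conversion is handled by Sigma backends whose API is not shown here. The sketch assumes the usual reading of a selection: fields are ANDed together, and a list of values for one field is ORed.

```python
# Toy illustration of rendering one vendor-neutral selection into two
# query dialects. All names here are hypothetical; this is not pySigma.

rule = {
    "title": "Failed network or RDP logon",
    "selection": {
        "EventID": "4625",
        "LogonType": ["3", "10"],
    },
}


def _as_list(value):
    """Treat a scalar as a one-element list so both cases share one path."""
    return value if isinstance(value, list) else [value]


def to_lucene(selection):
    """AND across fields, OR across the values listed for one field."""
    clauses = []
    for field, value in selection.items():
        values = _as_list(value)
        alternatives = " OR ".join(f'{field}:"{v}"' for v in values)
        clauses.append(f"({alternatives})" if len(values) > 1 else alternatives)
    return " AND ".join(clauses)


def to_kql(selection, table="Events"):
    """Same logic, using KQL's case-insensitive =~ and in~ operators."""
    clauses = []
    for field, value in selection.items():
        values = _as_list(value)
        if len(values) > 1:
            quoted = ", ".join(f'"{v}"' for v in values)
            clauses.append(f"{field} in~ ({quoted})")
        else:
            clauses.append(f'{field} =~ "{values[0]}"')
    return f"{table} | where " + " and ".join(clauses)


print(to_lucene(rule["selection"]))
print(to_kql(rule["selection"]))
```

The two outputs express the same condition in different surface syntax, which is the whole point of a vendor-neutral rule layer; a production backend additionally handles field-name mapping, modifiers, escaping, and type differences that this sketch ignores.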
The e-discovery world runs in an almost completely parallel universe, governed by litigation practice rather than infosec. Despite the EDRM (Electronic Discovery Reference Model) consortium’s repeated attempts to standardise on XML (EDRM XML 1.2, 2010+), the dominant production format remains the Concordance-style DAT file — a 1990s pilcrow-and-thorn-delimited flat file, paired with OPT (Opticon) page-image cross-reference files. Relativity (kCura / Relativity LLC) won the e-discovery platform war in the 2010s and its DAT/OPT export is the closest thing to a real-world standard. The structural mismatch — DFIR moving toward JSON-LD ontologies while e-discovery still produces pilcrow-delimited flat files — is why “forensic data interchange” remains a polite fiction more than a reality.
In our deep library
None catalogued. Forensic / e-discovery DSLs do not have standalone deep-library notes; they live in this Tier 3 family index plus cross-listings.
Cross-reference:
- government-civictech — sibling family; STIX 2.1, TAXII 2.1, CSAF 2.0, Sigma, MISP, OpenIOC are catalogued there. This note cross-lists them for DFIR completeness but treats them as primary residents of government-civictech.
- network-protocol-dsls — sibling; YARA (original C) and Sigma are catalogued there; this note covers the modern YARA-X Rust rewrite as the new primary.
- api-description — adjacent; CASE/UCO are JSON-LD ontologies that fit a similar serialisation tradition as JSON-Schema and OpenAPI.
- notation-spec — adjacent; UCO’s RDFS/OWL underpinnings sit next to formal-grammar and notation families.
- identity-auth-policy — adjacent; authentication-event evidence (Windows Event Log, OIDC token traces, Entra sign-in logs) is a common forensic artifact source.
- graph-log-event-query — sibling; KQL, EQL, SPL, Lucene-for-SIEM are deep relatives of the log-and-graph query family.
Tier 3 family table — Disk-image / evidence containers
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| EnCase E01 / EWF / EVF | 1998 | Guidance Software (now OpenText) | Proprietary segmented disk-image container with embedded metadata, hashing, compression | Dominant in law enforcement despite proprietary nature; libewf open-source reader is the de facto interop layer | https://github.com/libyal/libewf |
| EVF2 (Ex01 / Lx01) | ~2012 | Guidance / OpenText | Successor to E01 with AES encryption and stronger hashing | Active in EnCase product line; libewf has partial EVF2 read support | https://github.com/libyal/libewf/wiki/Format-EVF |
| AFF (Advanced Forensic Format) | 2005 | Simson Garfinkel + Basis Technology | Open disk-image container with extensible metadata segments | Legacy, superseded by AFF4 | https://www.forensicswiki.xyz/wiki/AFF |
| AFF4 v1.0 | 2009 (paper); v1.0 spec maturing through 2018+ | Michael Cohen, Simson Garfinkel, Bradley Schatz | Open ZIP-based container; supports streams, logical objects, hash-based dedup | Active standard, primary OSS alternative to E01; powers Evimetry, supported by Magnet AXIOM | https://github.com/aff4/Standard |
| AFF4-L (logical) | DFRWS 2019 (NIST-funded) | Schatz / Cohen / Garfinkel | Logical-evidence container variant of AFF4 with deduplication semantics | Active but niche outside Evimetry; primary use in selective-collection workflows | https://dfrws.org/wp-content/uploads/2019/06/2019_USA_paper-aff4_l_a_scalable_open_logical_evidence_container.pdf |
| AccessData FTK FXMLZ | ~2014 | AccessData (now Exterro) | XML-Zip case-export bundle from FTK | Active in FTK product line; effectively vendor-internal | https://www.exterro.com/digital-forensics-software/forensic-toolkit |
| DD / raw image | 1972 (Unix dd) | AT&T | Bit-for-bit raw image, no metadata | Universal as lowest-common-denominator format | https://man7.org/linux/man-pages/man1/dd.1.html |
| Volatility memory profile (.dwarf, .vtypes) | 2007+ (Vol 2), 2017+ (Vol 3) | The Volatility Foundation | Python-defined memory layouts + symbol tables for OS kernels | Active; Volatility 3 (Python rewrite, 2019+) uses ISF JSON symbol files instead of Python vtypes | https://volatility3.readthedocs.io/ |
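
Because a raw dd image carries no metadata, integrity information has to live outside the image. The following Python sketch, offered as one plausible approach and not a prescribed workflow, computes a SHA-256 digest in fixed-size chunks and records it in a sidecar file. The function names and the `.sha256` sidecar convention are assumptions made for illustration.

```python
import hashlib


def hash_image(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks
    so that large images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path):
    """Write '<digest>  <path>' next to the image and return the digest.
    The sidecar naming is a hypothetical convention, not a standard."""
    digest = hash_image(path)
    with open(path + ".sha256", "w", encoding="utf-8") as sidecar:
        sidecar.write(f"{digest}  {path}\n")
    return digest
```

Container formats such as E01 and AFF4 embed this kind of hash inside the container, which is one practical reason they are preferred over raw images for evidence handling.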
Tier 3 family table — Timeline / artifact / case formats
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| DFXML (Digital Forensics XML) | 2009+ | Simson Garfinkel (NPS / NIST) | XML schema for fileobject, byte_runs, hashes, timestamps, hashdb | Active but slow-moving; still the canonical XML for filesystem-walk output | https://github.com/dfxml-working-group/dfxml_schema |
| plaso / log2timeline JSON | 2009 (log2timeline orig.); plaso rewrite 2012+ | Kristinn Gudjonsson + community | Plaso’s “super-timeline” event JSON; one event per row, plugin-derived | Very active; current stable release 20260119 (Feb 2026) per plaso releases | https://plaso.readthedocs.io/ |
| CASE (Cyber-investigation Analysis Standard Expression) | 2014 (DC3 + MITRE); 1.0.0 in Aug 2022 | DC3 / MITRE; Linux Foundation CDO since Dec 2021 | JSON-LD ontology layered on UCO; classes for Trace, Investigation, Action, ProvenanceRecord | Active standard, 1.0.0 stable since 2022; 2.x is the next major | https://caseontology.org/ |
| UCO (Unified Cyber Ontology) | 2014 (Ebiquity / UMBC origin) | UMBC Ebiquity → CDO under Linux Foundation (Dec 2021) | RDFS/OWL/JSON-LD cyber-domain ontology underlying CASE; classes for ObservableObject, Identity, Action | Active; CASE/UCO co-release 1.0.0 Aug 2022 | https://unifiedcyberontology.org/ |
| Sleuth Kit / Autopsy data model | 2002 (TSK); Autopsy data-model 2012+ | Brian Carrier | SQLite-backed case schema (tsk_files, tsk_objects, blackboard_artifacts) | Active, the canonical OSS case database used by Autopsy and downstream tools | https://www.sleuthkit.org/sleuthkit/docs/api-docs/ |
| MAEC (Malware Attribute Enumeration and Characterization) | 2010 | MITRE | JSON / XML schema for malware behaviour, capabilities, indicators | Legacy / niche; partially superseded by STIX 2.x malware-analysis object; v5.0 last major | https://maecproject.github.io/ |
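
As a rough illustration of the DFXML row above, this Python sketch reads `fileobject` entries from a small hand-written fragment. The fragment is an assumption: it omits the XML namespace and most of the elements that real DFXML output carries, and the `read_fileobjects` helper is hypothetical. Check element names against the DFXML schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment (no namespace); not real tool output.
SAMPLE = """<dfxml>
  <fileobject>
    <filename>docs/report.txt</filename>
    <filesize>11</filesize>
    <hashdigest type="md5">5eb63bbbe01eeed093cb22bb8f5acdc3</hashdigest>
    <byte_runs>
      <byte_run offset="0" img_offset="4096" len="11"/>
    </byte_runs>
  </fileobject>
</dfxml>"""


def read_fileobjects(xml_text):
    """Return one dict per fileobject: name, size, hashes, byte runs."""
    root = ET.fromstring(xml_text)
    records = []
    for fobj in root.iter("fileobject"):
        hashes = {h.get("type"): h.text for h in fobj.findall("hashdigest")}
        # Each run is (offset within the image, length in bytes).
        runs = [
            (int(r.get("img_offset")), int(r.get("len")))
            for r in fobj.findall("byte_runs/byte_run")
        ]
        records.append({
            "filename": fobj.findtext("filename"),
            "filesize": int(fobj.findtext("filesize")),
            "hashes": hashes,
            "runs": runs,
        })
    return records


for record in read_fileobjects(SAMPLE):
    print(record["filename"], record["filesize"], record["hashes"])
```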
Tier 3 family table — DFIR query / detection languages
| Language | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Velociraptor VQL | 2018 | Velocidex (Mike Cohen, Carlos Canto) | SQL-like endpoint-query DSL with plugins for filesystem, registry, EVTX, memory artifacts | Very active; current 0.76 / 0.76.1 (Mar–Apr 2026); the most powerful OSS live-response DSL | https://docs.velociraptor.app/docs/vql/ |
| GRR Rapid Response flows | 2011 | Google | Python client-side flow language for fleet IR | Active but lower velocity than Velociraptor in the broader community | https://grr-doc.readthedocs.io/ |
| OSQuery (osquery SQL) | 2014 | Facebook (now Meta) | SQLite-backed virtual tables over OS state; queryable via standard SQL dialect | Active; under the Linux Foundation (osquery foundation) since 2019 | https://osquery.readthedocs.io/ |
| YARA-X | 2024 (beta); 1.0.0 stable 2025-06-04 | VirusTotal (Victor Alvarez / plusvic) | Rust rewrite of YARA pattern-match DSL; ~99% rule compatibility with YARA | Stable; powers VT Livehunt + Retrohunt as of 2025 | https://virustotal.github.io/yara-x/ |
| YARA (original C) | ~2008 | VirusTotal | Pattern-match DSL for files/memory | Maintenance mode; new dev moved to YARA-X. Also catalogued in network-protocol-dsls | https://github.com/VirusTotal/yara |
| Sysmon config XML | 2014 | Microsoft Sysinternals (Mark Russinovich, Thomas Garnier) | XML-based detection / inclusion / exclusion rule grammar for Sysmon ETW provider | Active; SwiftOnSecurity and Olaf Hartong configs are de facto baselines | https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon |
| Splunk SPL | ~2004 | Splunk Inc. | Pipe-oriented search-and-stats DSL; the original commercial log-query language | Very active, dominant in enterprise SOCs; SPL2 introduced ~2021 | https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/WhatsInThisManual |
| Splunk CIM (Common Information Model) | ~2010 | Splunk Inc. | Field-name and data-model schema for normalising security telemetry | Active; published as the Splunk CIM Add-on | https://docs.splunk.com/Documentation/CIM/latest/User/Overview |
| Microsoft KQL (Kusto Query Language) | ~2014 (internal); 2018+ public | Microsoft (Azure Data Explorer team) | Pipe-style analytics DSL; powers Sentinel, Defender XDR Advanced Hunting, Azure Monitor | Very active, de facto cloud-SIEM query language at scale | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-query-language |
| EQL (Event Query Language) | 2018 (Endgame); folded into Elastic 2019 | Endgame Inc. → Elastic | Sequence-and-window correlation DSL for endpoint events; uses sequence by ... with maxspan semantics | Active, part of Elastic Security rule engine | https://www.elastic.co/guide/en/elasticsearch/reference/current/eql.html |
| Lucene query syntax (SIEM use) | 1999 (Lucene); SIEM use 2010s+ | Doug Cutting; widely embedded | Field:value boolean query DSL; Elastic SIEM, OpenSearch, others | Universal in Elastic/OpenSearch stacks | https://lucene.apache.org/core/9_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html |
| OpenC2 (Open Command and Control) | 2017 (OASIS TC); v1.0 in 2019, v2.0 CSD 2024-05 | OASIS OpenC2 TC | JSON-encoded action/target/args grammar for machine-to-machine security automation | Active; v2.0 in committee draft as of mid-2024 | https://openc2.org/ |
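
The "sequence by ... with maxspan" idea in the EQL row can be sketched in plain Python. This is a deliberately simplified model, not the EQL engine: it handles only two-step sequences, keeps a single pending first event per key, and uses hypothetical event dicts with `ts` (seconds), `host`, and `type` fields.

```python
def find_sequences(events, first, second, key, maxspan):
    """Pair each `first` event with the next `second` event that shares
    the same `key` value and occurs within `maxspan` seconds."""
    pending = {}   # key value -> most recent unmatched `first` event
    matches = []
    for event in sorted(events, key=lambda e: e["ts"]):
        k = event[key]
        if event["type"] == first:
            pending[k] = event
        elif event["type"] == second and k in pending:
            start = pending.pop(k)
            if event["ts"] - start["ts"] <= maxspan:
                matches.append((start, event))
    return matches


events = [
    {"ts": 0, "host": "a", "type": "process"},
    {"ts": 5, "host": "b", "type": "process"},
    {"ts": 8, "host": "a", "type": "network"},
    {"ts": 100, "host": "b", "type": "network"},
]

# Host "a" matches (8 s apart); host "b" does not (95 s exceeds maxspan).
hits = find_sequences(events, "process", "network", key="host", maxspan=30)
```

The join key plays the role of `sequence by`, and the time bound plays the role of `maxspan`; ordering plus a bounded window is what distinguishes this style of correlation from a plain boolean filter.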
Tier 3 family table — E-discovery load files
| Format | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| Concordance DAT | ~1995 | Dataflight (Concordance) | Pilcrow (¶) and thorn (þ) delimited flat file; one row per document | Universal in e-discovery production despite 1990s vintage | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Import_Export_Load_file_specifications.htm |
| Relativity DAT | ~2006 | kCura / Relativity LLC | Concordance-compatible DAT used as Relativity’s import/export format | Dominant in modern e-discovery; Relativity holds the largest market share | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Export_Workflows/Production_load_file_export.htm |
| OPT (Opticon) | ~1996 | Opticon | CSV-style page-image cross-reference; one row per page; pairs with DAT | Universal as the page-image companion to DAT productions | https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Import_Export_Load_file_specifications.htm |
| LFP | ~1998 | IPRO / LexisNexis | IPRO LFP image cross-reference (similar role to OPT) | Active in IPRO-anchored workflows | https://my.ipro.com/help/ |
| EDRM XML | 2010 (v1.0); v1.2 ~2012 | EDRM consortium (Duke Law / Relativity) | XML-based load-file alternative to DAT; carries document, family, and metadata | Adopted only sporadically; never displaced DAT in production practice | https://edrm.net/resources/frameworks-and-standards/edrm-xml/ |
| EDRM Load File (CSV) | 2010s | EDRM consortium | CSV-delimited alternative load file | Niche; vendor-specific variants dominate | https://edrm.net/ |
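
For readers who have not handled a Concordance-style DAT file, the following Python sketch parses one. It assumes the commonly used default delimiters: ASCII 20 as the field separator (often displayed as a pilcrow), thorn (þ) as the text qualifier around every field, and the registered sign (®) as the in-field newline substitute. Real productions can deviate from these defaults and from the encoding assumed here, and the helper names are hypothetical.

```python
FIELD_SEP = "\x14"      # ASCII 20, commonly rendered as a pilcrow
QUOTE = "\u00fe"        # thorn, wraps every field
NEWLINE_SUB = "\u00ae"  # registered sign, stands in for newlines in a field


def parse_dat_line(line):
    """Split one DAT row into field values, assuming every field is quoted."""
    line = line.rstrip("\r\n")
    if not (line.startswith(QUOTE) and line.endswith(QUOTE)):
        raise ValueError("expected every field to be wrapped in the qualifier")
    inner = line[len(QUOTE):-len(QUOTE)]
    fields = inner.split(QUOTE + FIELD_SEP + QUOTE)
    return [field.replace(NEWLINE_SUB, "\n") for field in fields]


def read_dat(lines):
    """Treat the first row as the header and return one dict per document."""
    rows = [parse_dat_line(line) for line in lines if line.strip()]
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def make_line(values):
    """Build a sample row (illustration only)."""
    return FIELD_SEP.join(QUOTE + v + QUOTE for v in values)


sample = [
    make_line(["BEGDOC", "CUSTODIAN", "SUBJECT"]),
    make_line(["ABC0001", "Smith", "Line one" + NEWLINE_SUB + "line two"]),
]
documents = read_dat(sample)
```

Splitting on the full qualifier-separator-qualifier sequence, not on the bare separator, is a small design choice that tolerates a stray separator character inside a field value.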
Tier 3 family table — Incident / threat-modelling schemas
| Schema | First appeared | Origin | Type | Status (2026) | URL |
|---|---|---|---|---|---|
| VERIS | 2010 | Verizon RISK team | JSON Schema vocabulary for incident recording; Action/Asset/Actor/Attribute four-axis “A4” model | Active; underpins annual Verizon DBIR; VCDB is the public corpus | https://verisframework.org/ |
| STIX 2.1 | 2014 (STIX 1.x); 2.1 OASIS standard 2021 | OASIS CTI TC (orig. MITRE) | JSON cyber-threat-intel object model | Active; cross-listed in government-civictech | https://oasis-open.github.io/cti-documentation/ |
| TAXII 2.1 | 2014 (TAXII 1.x); 2.1 OASIS 2021 | OASIS CTI TC | HTTPS-based STIX transport protocol | Active; cross-listed in government-civictech | https://oasis-open.github.io/cti-documentation/ |
| Mandiant OpenIOC | 2011 | Mandiant (now Google Cloud) | XML schema for indicators-of-compromise; predates STIX | Legacy but still encountered; cross-listed in government-civictech | https://github.com/mandiant/OpenIOC_1.1 |
| MISP core format | 2013 | Belgian Defence / CIRCL | JSON event-and-attribute model for Malware Information Sharing Platform | Very active; cross-listed in government-civictech | https://www.misp-standard.org/ |
| Sigma | 2016 | Florian Roth, Thomas Patzke | YAML detection-rule DSL; vendor-neutral, compiles to KQL/EQL/SPL/Lucene/etc. | Very active; cross-listed in government-civictech and network-protocol-dsls | https://sigmahq.io/ |
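
The four-axis A4 model in the VERIS row can be illustrated with a small Python check. The record below is hand-written and only loosely modelled on VERIS; the nested field names and enumeration values are assumptions and should be verified against the published schema. The `missing_axes` and `axis_categories` helpers are hypothetical.

```python
A4_AXES = ("actor", "action", "asset", "attribute")

# Hand-written example record; nested names and values are assumptions.
incident = {
    "incident_id": "example-0001",
    "actor": {"external": {"variety": ["Organized crime"], "motive": ["Financial"]}},
    "action": {"hacking": {"variety": ["SQLi"], "vector": ["Web application"]}},
    "asset": {"assets": [{"variety": "S - Web application"}]},
    "attribute": {"confidentiality": {"data": [{"variety": "Credentials"}]}},
}


def missing_axes(record):
    """Return the A4 axes that are absent or empty in a record."""
    return [axis for axis in A4_AXES if not record.get(axis)]


def axis_categories(record):
    """Map each axis to the sorted category keys recorded under it."""
    return {axis: sorted(record.get(axis, {})) for axis in A4_AXES}


print(missing_axes(incident))
print(axis_categories(incident))
```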
Notable threads
- CASE/UCO is the long-overdue cross-tool evidence ontology. Until 2018–2022, DFIR had no formally stable cross-vendor representation of “an investigation”: every tool (EnCase, FTK, X-Ways, Autopsy, Magnet AXIOM, Cellebrite) emitted its own report shape. CASE (built on UCO) finally provides JSON-LD classes for Trace, ProvenanceRecord, Action, and Identity that map to investigation primitives. The project was DoJ/NIST/DC3 + MITRE-driven, transitioned to the Linux Foundation Cyber Domain Ontology project in December 2021, and shipped CASE/UCO 1.0.0 in August 2022. Adoption is slow because vendor-internal formats are sticky, but CASE is now the only standard a multi-tool investigation can serialise into without losing semantics. Time-stamped claim 2026-05-10: verify CASE current minor version against caseontology.org before citing in production work.
- plaso is the open-source super-timeline standard. Forensic timelines used to be built by hand from mactime, bodyfile, and event-log dumps. log2timeline (Kristinn Gudjonsson, 2009) and then its Python rewrite plaso (2012+) automated the process: 200+ parser plugins for OS artifacts (EVTX, Prefetch, MFT, Plist, SQLite stores, browser histories, $UsnJrnl) all emit into a unified event JSON. Current plaso release is 20260119 (Feb 2026) and a 20260415-pre is in test per the plaso docs. Output formats include plaso-storage, JSONL, Elasticsearch direct push, and L2tCSV. This is the closest the field has to a universal timeline interchange.
- The E01 vs AFF4 container war never resolved. EnCase’s E01 / EWF format won law-enforcement adoption in the late 1990s and never lost it, despite being effectively undocumented (libewf reverse-engineered the spec). AFF (2005) and AFF4 (2009, with v1.0 maturing through NIST-funded work into the late 2010s) won academic and OSS mindshare but never displaced E01 in police labs. AFF4-L (DFRWS 2019) added logical-evidence deduplication semantics but is essentially a single-vendor (Evimetry) format in practice. The net effect: every forensic tool reads E01, every modern tool also reads AFF4, and the “industry standard” is whichever one your acquisition tool emitted.
- Relativity DAT files are 1990s technology that won the e-discovery war. Despite EDRM XML (2010+, v1.2) being technically superior, the dominant e-discovery production format remains Concordance-style DAT, a pilcrow (¶) and thorn (þ) delimited flat file, paired with OPT (Opticon page-image cross-reference) and folders of natives + TIFFs + extracted-text. Relativity (now Relativity LLC) became the dominant e-discovery platform in the 2010s, and its DAT export is the de facto litigation-data-exchange format. The structural mismatch between this and modern JSON-LD DFIR formats (CASE/UCO) means the two domains barely interoperate.
- Sigma → vendor-rule conversion is the detection-as-code lingua franca. Sigma (2016) lets a detection author write a single YAML rule that compiles to KQL (Sentinel/Defender), EQL (Elastic Security), SPL (Splunk), Lucene (OpenSearch), Microsoft 365 Defender Advanced Hunting, Chronicle YARA-L, QRadar AQL, and ArcSight. The Sigma backends (pySigma, sigmac legacy) are the actual moneymakers: they’re how a vendor-neutral rule turns into a runnable SIEM detection. There are now public Sigma rule repositories with thousands of community-maintained rules mapped to MITRE ATT&CK. Cross-listed under government-civictech.
- KQL is the de facto SIEM query language at cloud scale. Microsoft’s Kusto Query Language emerged from Azure Data Explorer (~2014 internal, public 2018+) and now powers Microsoft Sentinel, Defender XDR Advanced Hunting, Defender for Endpoint Advanced Hunting, Azure Monitor, and Log Analytics. Because Microsoft owns roughly half the enterprise security stack via Defender + Entra + Sentinel, the practical effect is that most production-grade hunting in 2026 is written in KQL. The closest open competitor at similar scale is Splunk SPL; Elastic’s EQL is more specialised for endpoint sequences.
- Velociraptor VQL is the most powerful open-source live-response DSL. While other endpoint-DFIR tools (GRR, OSQuery) ship a constrained query surface, Velociraptor’s VQL is a near-complete SQL-flavoured language with plugins for filesystem walks, registry parsing, EVTX, memory artifacts, NTFS internals, file carving, YARA scans, sigma-rule evaluation, and remote execution, all composable in a single query. The current release (0.76, March 2026; 0.76.1 April 2026) added a Bleve-backed local full-text search across query results and refactored the CLI to treat artifacts as mini-VQL programs. For incident response on a fleet, nothing else in OSS gets close.
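
The plaso thread above mentions JSONL as an output format. As a hedged sketch of what consuming such a timeline might look like, the Python below sorts and renders events from JSON lines. The field names used here (`timestamp` in microseconds since the Unix epoch, `timestamp_desc`, `parser`, `message`) are assumptions about the export and should be checked against real plaso output; the sample lines are hand-written and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def load_timeline(lines):
    """Parse JSON lines and return events ordered by timestamp."""
    events = [json.loads(line) for line in lines if line.strip()]
    return sorted(events, key=lambda e: e["timestamp"])


def render(event):
    """Format one event as 'ISO time [parser] description: message'."""
    when = datetime.fromtimestamp(event["timestamp"] / 1_000_000, tz=timezone.utc)
    return (
        f'{when.isoformat()} [{event["parser"]}] '
        f'{event["timestamp_desc"]}: {event["message"]}'
    )


# Hand-written sample lines, deliberately out of order.
sample = [
    '{"timestamp": 1700000060000000, "timestamp_desc": "Last Visited Time", '
    '"parser": "sqlite", "message": "visited example.org"}',
    '{"timestamp": 1700000000000000, "timestamp_desc": "Creation Time", '
    '"parser": "filestat", "message": "report.txt created"}',
]

timeline = load_timeline(sample)
for event in timeline:
    print(render(event))
```

Merging events from unrelated parsers onto one time axis is the essence of a super-timeline; the value comes from the shared event shape, not from any single parser.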
Citations
- DFXML schema repository: https://github.com/dfxml-working-group/dfxml_schema
- plaso documentation (current 20260119): https://plaso.readthedocs.io/
- plaso releases (GitHub): https://github.com/log2timeline/plaso/releases
- AFF4 Standard: https://github.com/aff4/Standard
- AFF4-L DFRWS 2019 paper: https://dfrws.org/wp-content/uploads/2019/06/2019_USA_paper-aff4_l_a_scalable_open_logical_evidence_container.pdf
- CASE Community: https://caseontology.org/
- UCO Community: https://unifiedcyberontology.org/
- Linux Foundation CDO press release (UCO/CASE transition, Dec 2021): https://cyberdomainontology.org/2021/12/07/UCO-transitions-to-LF.html
- Velociraptor VQL docs: https://docs.velociraptor.app/docs/vql/
- Velociraptor 0.76 release notes (Mar 2026): https://docs.velociraptor.app/blog/2026/2026-03-10-release-notes-0.76/
- YARA-X 1.0.0 stable announcement (2025-06-04, VirusTotal blog): https://blog.virustotal.com/2025/06/yara-x-100-stable-release-and-its.html
- YARA-X documentation: https://virustotal.github.io/yara-x/
- VERIS framework: https://verisframework.org/
- VERIS GitHub (schema + enums): https://github.com/vz-risk/veris
- Verizon DBIR (2026 edition): https://www.verizon.com/business/resources/reports/dbir/
- Microsoft Defender XDR Advanced Hunting (KQL): https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-query-language
- Elastic EQL: https://www.elastic.co/guide/en/elasticsearch/reference/current/eql.html
- Splunk SPL: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/WhatsInThisManual
- Sysmon (Sysinternals): https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon
- SigmaHQ: https://sigmahq.io/
- OpenC2 Language Specification v2.0 (CSD, 2024-05): https://docs.oasis-open.org/openc2/oc2ls/v2.0/oc2ls-v2.0.html
- OASIS OpenC2 TC: https://openc2.org/
- Relativity load file specs: https://help.relativity.com/RelativityOne/Content/Relativity/Import_Export/Import_Export_Load_file_specifications.htm
- EDRM (Electronic Discovery Reference Model): https://edrm.net/
- Sleuth Kit API docs: https://www.sleuthkit.org/sleuthkit/docs/api-docs/
- Autopsy: https://www.autopsy.com/
- Volatility 3 docs: https://volatility3.readthedocs.io/
- OSQuery documentation: https://osquery.readthedocs.io/
- GRR Rapid Response: https://grr-doc.readthedocs.io/
- libewf (E01/EWF reader): https://github.com/libyal/libewf
- MAEC: https://maecproject.github.io/
- MISP core format: https://www.misp-standard.org/