CSV Field Reference Generator

Overview

The CSV Generator (generators/csv_generator.py) produces a spreadsheet-compatible field reference for all ECS fields. It exports field definitions to a simple CSV (Comma-Separated Values) format that can be easily imported into spreadsheet applications, databases, or custom analysis tools.

Purpose

This generator creates a human-readable, machine-parseable field catalog that's useful for:

Quick Reference - Search and filter fields in Excel/Google Sheets
Data Analysis - Analyze field usage patterns and statistics
Integration - Parse for custom tooling and automation
Documentation - Include in presentations or reports
Version Comparison - Diff CSV files to see field changes

The CSV format is intentionally simple and widely compatible, making ECS field data accessible to anyone with a spreadsheet application.

Architecture

High-Level Flow

┌─────────────────────────────────────────────────────────────────┐
│                     generator.py (main)                         │
│                                                                 │
│  Load → Clean → Finalize → Generate Intermediate Files          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│            intermediate_files.generate()                        │
│                                                                 │
│  Returns: (nested, flat)                                        │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼ flat dictionary
┌─────────────────────────────────────────────────────────────────┐
│              csv_generator.generate()                           │
│  1. base_first() - Sort fields (base fields first)              │
│  2. save_csv() - Write CSV with header + field rows             │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Output: fields.csv                           │
│                                                                 │
│  ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization   │
│  8.11.0,true,base,@timestamp,date,core,,2016-05-23...           │
│  8.11.0,true,http,http.request.method,keyword,extended,...      │
│  ...                                                            │
└─────────────────────────────────────────────────────────────────┘

Key Components

1. generate()

Entry Point: generate(ecs_flat, version, out_dir)

Orchestrates CSV generation:

Creates output directory
Sorts fields appropriately
Writes CSV file

2. base_first()

Purpose: Sort fields for readable output

Logic:

Base fields (no dots): @timestamp, message, tags, etc.
All other fields alphabetically: agent.*, as.*, client.*, ...

Rationale: Base fields are foundational and referenced frequently, so they appear at the top for easy access.
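
The ordering rule above can be sketched as follows. This is a minimal illustration, not necessarily the code in csv_generator.py: it assumes the flat dictionary is keyed by the dotted field name, and the `Field` alias and sample entries are made up for the example.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]  # assumed shape: a plain dict per field


def base_first(ecs_flat: Dict[str, Field]) -> List[Field]:
    """Return base fields (no dot in the name) first, then all others, each group sorted by name."""
    base_fields: List[Field] = []
    other_fields: List[Field] = []
    for name in sorted(ecs_flat):
        target = other_fields if "." in name else base_fields
        target.append(ecs_flat[name])
    return base_fields + other_fields


# Illustrative input only
flat = {
    "http.request.method": {"flat_name": "http.request.method"},
    "tags": {"flat_name": "tags"},
    "agent.id": {"flat_name": "agent.id"},
    "@timestamp": {"flat_name": "@timestamp"},
}
ordered = [f["flat_name"] for f in base_first(flat)]
print(ordered)  # ['@timestamp', 'tags', 'agent.id', 'http.request.method']
```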

3. save_csv()

Purpose: Write field data to CSV format

Features:

Header row with column names
One row per field
One additional row for each multi-field of that field
Consistent quoting and line endings
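
A rough sketch of the writing step, using Python's standard csv module. The writer settings here (minimal quoting, "\n" line endings) are assumptions chosen for illustration, and `write_rows` is a hypothetical helper, not a function from the generator.

```python
import csv
import io

HEADER = ["ECS_Version", "Indexed", "Field_Set", "Field", "Type",
          "Level", "Normalization", "Example", "Description"]


def write_rows(csvfile, rows):
    """Write the header plus pre-built rows with one fixed quoting and line-ending policy."""
    writer = csv.writer(
        csvfile,
        delimiter=",",
        quoting=csv.QUOTE_MINIMAL,  # assumed: quote only cells that need it
        lineterminator="\n",        # assumed: LF endings on every platform
    )
    writer.writerow(HEADER)
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(buffer, [
    ["8.11.0", "true", "base", "tags", "keyword", "core", "array",
     "production, eu-west-1", "List of keywords for event"],
])
output = buffer.getvalue()
print(output)
```

With minimal quoting, only the example value containing a comma is wrapped in double quotes; every other cell is written bare.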

CSV Structure

Columns

Column	Description	Example Values
ECS_Version	Version of ECS	8.11.0, 8.11.0+exp
Indexed	Whether field is indexed	true, false
Field_Set	Fieldset name	base, http, user, agent
Field	Full dotted field name	@timestamp, http.request.method
Type	Elasticsearch field type	keyword, long, ip, date
Level	Field level	core, extended, custom
Normalization	Normalization rules	array, to_lower, array, to_lower
Example	Example value	GET, 192.0.2.1, 2016-05-23...
Description	Short field description	HTTP request method, User email

Field Set Logic

Base fields (no dots in name): field_set = 'base'
- Examples: @timestamp, message, tags, labels
Other fields: field_set = first part before dot
- http.request.method → field_set = 'http'
- user.email → field_set = 'user'
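
The rule above fits in a small helper. `field_set_for` is a hypothetical name used for this sketch.

```python
def field_set_for(flat_name: str) -> str:
    """Return 'base' for names without a dot, else the part before the first dot."""
    parts = flat_name.split(".")
    return "base" if len(parts) == 1 else parts[0]


for name in ("@timestamp", "message", "http.request.method", "user.email"):
    print(f"{name} -> {field_set_for(name)}")
# @timestamp -> base
# message -> base
# http.request.method -> http
# user.email -> user
```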

Multi-Fields

Fields with multi-fields (alternate representations) get additional rows:

8.11.0,true,base,message,match_only_text,core,,Hello world,Log message
8.11.0,true,base,message.text,match_only_text,core,,,Log message

Multi-field rows:

Share version, indexed, field_set, level, description
Have their own field name and type
Have empty normalization and example
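
One way to picture how a parent field and its multi-fields turn into rows is the sketch below. `rows_for_field` is a hypothetical helper, and reading the indexed flag from an `index` key that defaults to true is an assumption for the example.

```python
from typing import Any, Dict, List


def rows_for_field(field: Dict[str, Any], version: str, field_set: str) -> List[List[str]]:
    """Build the parent row followed by one row per multi-field."""
    indexed = str(field.get("index", True)).lower()  # assumed key and default
    rows = [[
        version, indexed, field_set,
        field["flat_name"], field["type"], field["level"],
        ", ".join(field.get("normalize", [])),
        str(field.get("example", "")),
        field["short"],
    ]]
    for mf in field.get("multi_fields", []):
        rows.append([
            version, indexed, field_set,   # shared with the parent
            mf["flat_name"], mf["type"],   # specific to the multi-field
            field["level"],                # shared with the parent
            "", "",                        # normalization and example left empty
            field["short"],                # shared with the parent
        ])
    return rows


message = {
    "flat_name": "message",
    "type": "match_only_text",
    "level": "core",
    "normalize": [],
    "example": "Hello World",
    "short": "Log message optimized for viewing",
    "multi_fields": [{"flat_name": "message.text", "type": "match_only_text"}],
}
rows = rows_for_field(message, "8.11.0", "base")
for row in rows:
    print(row)
```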

Example Output

ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
8.11.0,true,base,@timestamp,date,core,,2016-05-23T08:05:34.853Z,Date/time when the event originated
8.11.0,true,base,message,match_only_text,core,,Hello World,Log message optimized for viewing
8.11.0,true,base,message.text,match_only_text,core,,,Log message optimized for viewing
8.11.0,true,base,tags,keyword,core,array,"production, eu-west-1",List of keywords for event
8.11.0,true,agent,agent.build.original,keyword,core,,,Extended build information
8.11.0,true,agent,agent.ephemeral_id,keyword,extended,,8a4f500f,Ephemeral identifier
8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier
8.11.0,true,http,http.request.body.bytes,long,extended,,1437,Request body size in bytes
8.11.0,true,http,http.request.method,keyword,extended,array,GET,HTTP request method
8.11.0,true,http,http.response.status_code,long,extended,,404,HTTP response status code
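
When reading this output back, a CSV parser is safer than splitting on commas, because quoted example values can contain commas themselves. The sketch below parses two of the rows above from a string. `parse_row` is a hypothetical helper, and converting Indexed to a bool and Normalization to a list is a convenience for tooling, not something the file format requires.

```python
import csv
import io

SAMPLE = (
    "ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description\n"
    '8.11.0,true,base,tags,keyword,core,array,"production, eu-west-1",List of keywords for event\n'
    "8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier\n"
)


def parse_row(row):
    """Convert the Indexed and Normalization cells into Python values."""
    parsed = dict(row)
    parsed["Indexed"] = row["Indexed"] == "true"
    parsed["Normalization"] = [n for n in row["Normalization"].split(", ") if n]
    return parsed


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(records[0]["Example"])        # production, eu-west-1
print(records[0]["Normalization"])  # ['array']
print(records[1]["Normalization"])  # []
```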

Usage Examples

See README.md for generator invocation commands.

Programmatic Usage

from generators.csv_generator import generate
from generators.intermediate_files import generate as gen_intermediate

# `fields` is the field structure produced by the
# Load → Clean → Finalize stages in generator.py

# Generate intermediate files
nested, flat = gen_intermediate(fields, 'generated/ecs', True)

# Generate CSV
generate(flat, '8.11.0', 'generated')
# Creates generated/csv/fields.csv

Analyzing Field Data

Count fields by type:

import csv
from collections import Counter

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    types = Counter(row['Type'] for row in reader)

print("Field types:")
for field_type, count in types.most_common():
    print(f"  {field_type}: {count}")

Find all extended-level fields:

import csv

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    extended = [row for row in reader if row['Level'] == 'extended']

print(f"Extended fields: {len(extended)}")
for field in extended[:5]:
    print(f"  {field['Field']}")

Fields by fieldset:

import csv
from collections import defaultdict

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    by_fieldset = defaultdict(list)
    for row in reader:
        by_fieldset[row['Field_Set']].append(row['Field'])

for fieldset in sorted(by_fieldset):
    print(f"{fieldset}: {len(by_fieldset[fieldset])} fields")
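
Compare two versions:

The version comparison use case listed under Purpose could look roughly like this. The sketch compares only the Field and Type columns and reads from in-memory strings; the sample rows are invented for illustration and do not describe real ECS releases. `field_types` and `diff_fields` are hypothetical helpers.

```python
import csv
import io


def field_types(csv_text):
    """Map each Field to its Type for one fields.csv export."""
    return {row["Field"]: row["Type"] for row in csv.DictReader(io.StringIO(csv_text))}


def diff_fields(old_text, new_text):
    """Return (added, removed, retyped) field names between two exports."""
    old, new = field_types(old_text), field_types(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    retyped = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, retyped


# Invented sample data, reduced to two columns
OLD = "Field,Type\nagent.id,keyword\nmessage,text\n"
NEW = "Field,Type\nagent.id,keyword\nmessage,match_only_text\nagent.name,keyword\n"

added, removed, retyped = diff_fields(OLD, NEW)
print("added:", added)      # added: ['agent.name']
print("removed:", removed)  # removed: []
print("retyped:", retyped)  # retyped: ['message']
```

For real files, read each export with open(path, newline='') and pass the text, or adapt `field_types` to accept a file object.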

Making Changes

Adding New Columns

To add a new column to the CSV:

Update header row:

schema_writer.writerow([
    "ECS_Version", "Indexed", "Field_Set", "Field",
    "Type", "Level", "Normalization", "Example", "Description",
    "New_Column"  # Add here
])

Add to data rows:

schema_writer.writerow([
    version,
    indexed,
    field_set,
    field['flat_name'],
    field['type'],
    field['level'],
    ', '.join(field['normalize']),
    field.get('example', ''),
    field['short'],
    field.get('new_property', 'default_value')  # Add here
])

Update multi-field rows similarly
Update documentation in this file
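
For the multi-field step, the row needs the same number of cells as the extended header. The sketch below shows one possible shape. `multi_field_row` is a hypothetical helper, and having the new column inherit the parent's value is an assumption; leaving it empty, like Normalization and Example, would be an equally reasonable choice.

```python
HEADER = [
    "ECS_Version", "Indexed", "Field_Set", "Field",
    "Type", "Level", "Normalization", "Example", "Description",
    "New_Column",  # hypothetical extra column
]


def multi_field_row(version, indexed, field_set, field, mf):
    """Build one multi-field row that matches the extended header."""
    return [
        version,
        indexed,
        field_set,
        mf["flat_name"],
        mf["type"],
        field["level"],
        "",  # normalization stays empty for multi-fields
        "",  # example stays empty for multi-fields
        field["short"],
        field.get("new_property", "default_value"),  # assumed: inherit from parent
    ]


parent = {"level": "core", "short": "Log message optimized for viewing"}
row = multi_field_row("8.11.0", "true", "base", parent,
                      {"flat_name": "message.text", "type": "match_only_text"})

# A length mismatch here would shift every later column in the output
assert len(row) == len(HEADER)
```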

Changing Field Sorting

To change sort order:

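from typing import Dict, List

def base_first(ecs_flat: Dict[str, Field]) -> List[Field]:
    # Custom sorting logic: keep exactly ONE of the return statements below
    fields_list = list(ecs_flat.values())

    # Option A: sort by level, then name
    return sorted(fields_list, key=lambda f: (f['level'], f['flat_name']))

    # Option B: sort by fieldset, then name
    # (swap this in for the return above; left active, it would be unreachable)
    # return sorted(fields_list, key=lambda f: (f['flat_name'].split('.')[0], f['flat_name']))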

Changing CSV Format

To modify CSV formatting:

schema_writer = csv.writer(
    csvfile,
    delimiter=';',  # Use semicolon instead
    quoting=csv.QUOTE_ALL,  # Quote all fields
    quotechar='"',
    lineterminator='\r\n'  # Windows line endings
)
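
Any consumer of the file has to use the same dialect settings as the writer. The round trip below illustrates this with the semicolon-delimited, fully quoted format from the snippet above; it uses an in-memory buffer and a two-column sample row instead of the real file.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=";",
    quoting=csv.QUOTE_ALL,
    quotechar='"',
    lineterminator="\r\n",
)
writer.writerow(["Field", "Example"])
writer.writerow(["tags", "production, eu-west-1"])
text = buffer.getvalue()

# Read back with the matching delimiter; the default "," would split the
# example value at its embedded comma instead of at the semicolon.
rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))
print(rows)  # [['Field', 'Example'], ['tags', 'production, eu-west-1']]
```

The analysis scripts under Analyzing Field Data would need the same delimiter argument passed to csv.DictReader after such a change.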

Filtering Fields

To exclude certain fields:

def generate(ecs_flat: Dict[str, Field], version: str, out_dir: str) -> None:
    ecs_helpers.make_dirs(join(out_dir, 'csv'))

    # Filter out custom fields
    filtered = {k: v for k, v in ecs_flat.items() if v['level'] != 'custom'}

    sorted_fields = base_first(filtered)
    save_csv(join(out_dir, 'csv/fields.csv'), sorted_fields, version)

Troubleshooting

Common Issues

Missing multi-fields

Symptom: Multi-fields not appearing in CSV

Check:

# Verify field has multi_fields
field = flat['message']
print('multi_fields' in field)
print(field.get('multi_fields'))

# Check multi-field structure
if 'multi_fields' in field:
    for mf in field['multi_fields']:
        print(f"  {mf['flat_name']}: {mf['type']}")

Empty normalization column

Symptom: Normalization column is always empty

Check that the field definitions have a normalize key:

field = flat['some.field']
print(field.get('normalize', []))  # Should be a list
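
To check every field at once rather than one at a time, a scan like the following may help. `normalize_report` is a hypothetical helper and the sample definitions are invented; the sketch assumes `flat` maps field names to dictionaries.

```python
def normalize_report(flat):
    """Split field names into those lacking a list-valued normalize key and those with rules."""
    missing, populated = [], []
    for name, field in sorted(flat.items()):
        rules = field.get("normalize")
        if not isinstance(rules, list):
            missing.append(name)      # key absent or not a list
        elif rules:
            populated.append(name)    # at least one normalization rule
    return missing, populated


# Invented field definitions for illustration
flat = {
    "tags": {"normalize": ["array"]},
    "agent.id": {"normalize": []},
    "some.field": {},
}
missing, populated = normalize_report(flat)
print("missing normalize:", missing)      # missing normalize: ['some.field']
print("non-empty normalize:", populated)  # non-empty normalize: ['tags']
```

If `populated` comes back empty for the real data, the column is empty because no field defines any rules, which points at the field definitions rather than at the CSV writer.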
