The CSV Generator (generators/csv_generator.py) produces a spreadsheet-compatible field reference for all ECS fields. It exports field definitions to a simple CSV (Comma-Separated Values) format that can be easily imported into spreadsheet applications, databases, or custom analysis tools.
This generator creates a human-readable, machine-parseable field catalog that's useful for:
- Quick Reference - Search and filter fields in Excel/Google Sheets
- Data Analysis - Analyze field usage patterns and statistics
- Integration - Parse for custom tooling and automation
- Documentation - Include in presentations or reports
- Version Comparison - Diff CSV files to see field changes
The CSV format is intentionally simple and widely compatible, making ECS field data accessible to anyone with a spreadsheet application.
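For the version-comparison use case, diffing two generated files can be scripted in a few lines. This is a sketch; the helper names and the file paths passed to it are hypothetical, and it assumes only that both files have the `Field` column described below:

```python
import csv

def field_names(path):
    """Collect the set of full field names from a generated fields.csv."""
    with open(path, newline='') as f:
        return {row['Field'] for row in csv.DictReader(f)}

def diff_fields(old_path, new_path):
    """Return (added, removed) field-name lists between two ECS versions."""
    old, new = field_names(old_path), field_names(new_path)
    return sorted(new - old), sorted(old - new)
```

For example, `added, removed = diff_fields('fields-8.10.csv', 'fields-8.11.csv')` (hypothetical file names) lists fields introduced and dropped between the two versions.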
```
┌───────────────────────────────────────────────────────────────────┐
│ generator.py (main)                                               │
│                                                                   │
│ Load → Clean → Finalize → Generate Intermediate Files             │
└──────────────────────────────┬────────────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────────────┐
│ intermediate_files.generate()                                     │
│                                                                   │
│ Returns: (nested, flat)                                           │
└──────────────────────────────┬────────────────────────────────────┘
                               │
                               ▼  flat dictionary
┌───────────────────────────────────────────────────────────────────┐
│ csv_generator.generate()                                          │
│   1. base_first() - Sort fields (base fields first)               │
│   2. save_csv()   - Write CSV with header + field rows            │
└──────────────────────────────┬────────────────────────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────────────┐
│ Output: fields.csv                                                │
│                                                                   │
│ ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,...  │
│ 8.11.0,true,base,@timestamp,date,core,,2016-05-23...              │
│ 8.11.0,true,http,http.request.method,keyword,extended,...         │
│ ...                                                               │
└───────────────────────────────────────────────────────────────────┘
```
Entry Point: generate(ecs_flat, version, out_dir)
Orchestrates CSV generation:
- Creates output directory
- Sorts fields appropriately
- Writes CSV file
base_first(ecs_flat)
Purpose: Sort fields for readable output
Logic:
- Base fields (no dots): @timestamp, message, tags, etc.
- All other fields alphabetically: agent.*, as.*, client.*, ...
Rationale: Base fields are foundational and referenced frequently, so they appear at the top for easy access.
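The sort can be expressed as a single tuple key. This is a sketch consistent with the behavior described above, not necessarily the exact implementation:

```python
def base_first(ecs_flat):
    """Sort base fields (no dot in the name) first, then everything
    else alphabetically by full field name."""
    fields = list(ecs_flat.values())
    # '.' in name evaluates to False for base fields, and False sorts
    # before True, so base fields lead; ties break on the name itself
    return sorted(fields, key=lambda f: ('.' in f['flat_name'], f['flat_name']))
```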
save_csv()
Purpose: Write field data to CSV format
Features:
- Header row with column names
- One row per field (plus multi-fields)
- Multi-fields get separate rows
- Consistent quoting and line endings
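The write loop has roughly the following shape. This is a sketch assuming the flat-field structure used elsewhere in this document (`flat_name`, `type`, `level`, `normalize`, `short`, optional `example` and `multi_fields`); the hardcoded `'true'` for Indexed and the quoting defaults are assumptions, and the real implementation may differ in such details:

```python
import csv

def save_csv(path, fields, version):
    """Write one header row, one row per field, and one row per multi-field."""
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["ECS_Version", "Indexed", "Field_Set", "Field",
                         "Type", "Level", "Normalization", "Example", "Description"])
        for field in fields:
            name = field['flat_name']
            field_set = name.split('.')[0] if '.' in name else 'base'
            writer.writerow([version, 'true', field_set, name,
                             field['type'], field['level'],
                             ', '.join(field.get('normalize', [])),
                             field.get('example', ''), field['short']])
            # Multi-fields share the parent's metadata but get their own
            # name and type, with empty normalization and example
            for mf in field.get('multi_fields', []):
                writer.writerow([version, 'true', field_set, mf['flat_name'],
                                 mf['type'], field['level'], '', '', field['short']])
```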
| Column | Description | Example Values |
|---|---|---|
| ECS_Version | Version of ECS | 8.11.0, 8.11.0+exp |
| Indexed | Whether field is indexed | true, false |
| Field_Set | Fieldset name | base, http, user, agent |
| Field | Full dotted field name | @timestamp, http.request.method |
| Type | Elasticsearch field type | keyword, long, ip, date |
| Level | Field level | core, extended, custom |
| Normalization | Normalization rules | array, to_lower |
| Example | Example value | GET, 192.0.2.1, 2016-05-23... |
| Description | Short field description | HTTP request method, User email |
- Base fields (no dots in name): field_set = 'base'
- Examples: @timestamp, message, tags, labels
- Other fields: field_set = first part before dot
- http.request.method β field_set = 'http'
- user.email β field_set = 'user'
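The derivation is a one-liner; here is a sketch with a hypothetical helper name:

```python
def field_set_of(flat_name):
    """Base fields have no dots; others take the segment before the first dot."""
    return 'base' if '.' not in flat_name else flat_name.split('.', 1)[0]
```

For example, `field_set_of('@timestamp')` yields `'base'` and `field_set_of('http.request.method')` yields `'http'`.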
Fields with multi-fields (alternate representations) get additional rows:

```
8.11.0,true,base,message,match_only_text,core,,Hello world,Log message
8.11.0,true,base,message.text,match_only_text,core,,,Log message
```

Multi-field rows:
- Share version, indexed, field_set, level, and description with the parent field
- Have a unique field name and type
- Have empty normalization and example
```
ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
8.11.0,true,base,@timestamp,date,core,,2016-05-23T08:05:34.853Z,Date/time when the event originated
8.11.0,true,base,message,match_only_text,core,,Hello World,Log message optimized for viewing
8.11.0,true,base,message.text,match_only_text,core,,,Log message optimized for viewing
8.11.0,true,base,tags,keyword,core,array,"production, eu-west-1",List of keywords for event
8.11.0,true,agent,agent.build.original,keyword,core,,,Extended build information
8.11.0,true,agent,agent.ephemeral_id,keyword,extended,,8a4f500f,Ephemeral identifier
8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier
8.11.0,true,http,http.request.body.bytes,long,extended,,1437,Request body size in bytes
8.11.0,true,http,http.request.method,keyword,extended,array,GET,HTTP request method
8.11.0,true,http,http.response.status_code,long,extended,,404,HTTP response status code
```

See README.md for generator invocation commands.
```python
from generators.csv_generator import generate
from generators.intermediate_files import generate as gen_intermediate

# Generate intermediate files
nested, flat = gen_intermediate(fields, 'generated/ecs', True)

# Generate CSV
generate(flat, '8.11.0', 'generated')
# Creates generated/csv/fields.csv
```

Count fields by type:
```python
import csv
from collections import Counter

with open('generated/csv/fields.csv') as f:
    reader = csv.DictReader(f)
    types = Counter(row['Type'] for row in reader)

print("Field types:")
for field_type, count in types.most_common():
    print(f"  {field_type}: {count}")
```

Find all extended-level fields:
```python
import csv

with open('generated/csv/fields.csv') as f:
    reader = csv.DictReader(f)
    extended = [row for row in reader if row['Level'] == 'extended']

print(f"Extended fields: {len(extended)}")
for field in extended[:5]:
    print(f"  {field['Field']}")
```

Fields by fieldset:
```python
import csv
from collections import defaultdict

with open('generated/csv/fields.csv') as f:
    reader = csv.DictReader(f)
    by_fieldset = defaultdict(list)
    for row in reader:
        by_fieldset[row['Field_Set']].append(row['Field'])

for fieldset in sorted(by_fieldset):
    print(f"{fieldset}: {len(by_fieldset[fieldset])} fields")
```

To add a new column to the CSV:
- Update the header row:

```python
schema_writer.writerow([
    "ECS_Version", "Indexed", "Field_Set", "Field",
    "Type", "Level", "Normalization", "Example", "Description",
    "New_Column"  # Add here
])
```

- Add to the data rows:

```python
schema_writer.writerow([
    version,
    indexed,
    field_set,
    field['flat_name'],
    field['type'],
    field['level'],
    ', '.join(field['normalize']),
    field.get('example', ''),
    field['short'],
    field.get('new_property', 'default_value')  # Add here
])
```

- Update multi-field rows similarly
- Update the documentation in this file
To change the sort order:

```python
def base_first(ecs_flat: Dict[str, Field]) -> List[Field]:
    # Custom sorting logic
    fields_list = list(ecs_flat.values())

    # Sort by level, then name
    return sorted(fields_list, key=lambda f: (f['level'], f['flat_name']))

    # Or by fieldset, then name:
    # return sorted(fields_list, key=lambda f: (f['flat_name'].split('.')[0], f['flat_name']))
```

To modify CSV formatting:
```python
schema_writer = csv.writer(
    csvfile,
    delimiter=';',          # Use semicolon instead
    quoting=csv.QUOTE_ALL,  # Quote all fields
    quotechar='"',
    lineterminator='\r\n'   # Windows line endings
)
```

To exclude certain fields:
```python
def generate(ecs_flat: Dict[str, Field], version: str, out_dir: str) -> None:
    ecs_helpers.make_dirs(join(out_dir, 'csv'))

    # Filter out custom fields
    filtered = {k: v for k, v in ecs_flat.items() if v['level'] != 'custom'}

    sorted_fields = base_first(filtered)
    save_csv(join(out_dir, 'csv/fields.csv'), sorted_fields, version)
```

Symptom: Multi-fields not appearing in CSV
Check:

```python
# Verify the field has multi_fields
field = flat['message']
print('multi_fields' in field)
print(field.get('multi_fields'))

# Check the multi-field structure
if 'multi_fields' in field:
    for mf in field['multi_fields']:
        print(f"  {mf['flat_name']}: {mf['type']}")
```

Symptom: Normalization column is always empty
Check that field definitions have a normalize key:

```python
field = flat['some.field']
print(field.get('normalize', []))  # Should be a list
```