CSV Field Reference Generator

Overview

The CSV Generator (generators/csv_generator.py) produces a spreadsheet-compatible field reference for all ECS fields. It exports field definitions to a simple CSV (Comma-Separated Values) format that can be easily imported into spreadsheet applications, databases, or custom analysis tools.

Purpose

This generator creates a human-readable, machine-parseable field catalog that's useful for:

  1. Quick Reference - Search and filter fields in Excel/Google Sheets
  2. Data Analysis - Analyze field usage patterns and statistics
  3. Integration - Parse for custom tooling and automation
  4. Documentation - Include in presentations or reports
  5. Version Comparison - Diff CSV files to see field changes

The CSV format is intentionally simple and widely compatible, making ECS field data accessible to anyone with a spreadsheet application.

Architecture

High-Level Flow

┌─────────────────────────────────────────────────────────────────┐
│                     generator.py (main)                         │
│                                                                 │
│  Load → Clean → Finalize → Generate Intermediate Files          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│            intermediate_files.generate()                        │
│                                                                 │
│  Returns: (nested, flat)                                        │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼ flat dictionary
┌─────────────────────────────────────────────────────────────────┐
│              csv_generator.generate()                           │
│  1. base_first() - Sort fields (base fields first)              │
│  2. save_csv() - Write CSV with header + field rows             │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Output: fields.csv                           │
│                                                                 │
│  ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization   │
│  8.11.0,true,base,@timestamp,date,core,,2016-05-23...           │
│  8.11.0,true,http,http.request.method,keyword,extended,...      │
│  ...                                                            │
└─────────────────────────────────────────────────────────────────┘

Key Components

1. generate()

Entry Point: generate(ecs_flat, version, out_dir)

Orchestrates CSV generation:

  • Creates output directory
  • Sorts fields appropriately
  • Writes CSV file

2. base_first()

Purpose: Sort fields for readable output

Logic:

  1. Base fields (no dots): @timestamp, message, tags, etc.
  2. All other fields alphabetically: agent.*, as.*, client.*, ...

Rationale: Base fields are foundational and referenced frequently, so they appear at the top for easy access.
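
The ordering described above can be sketched as follows. This is a minimal illustration, not the generator's actual code: the name base_first_sketch, the Field alias, and the sample dictionary are assumptions made for this example.

```python
from typing import Any, Dict, List

Field = Dict[str, Any]  # assumed shape: one plain dict per field


def base_first_sketch(ecs_flat: Dict[str, Field]) -> List[Field]:
    """Return base fields (no dot in the name) first, then all others.

    Both groups are ordered alphabetically by their flat name.
    """
    base: List[Field] = []
    others: List[Field] = []
    for name in sorted(ecs_flat):
        (others if '.' in name else base).append(ecs_flat[name])
    return base + others


# Hypothetical sample data, for illustration only
sample = {
    'http.request.method': {'flat_name': 'http.request.method'},
    'message': {'flat_name': 'message'},
    'agent.id': {'flat_name': 'agent.id'},
    '@timestamp': {'flat_name': '@timestamp'},
}
ordered_names = [f['flat_name'] for f in base_first_sketch(sample)]
print(ordered_names)
```

With this sample, the two base fields come out ahead of agent.id and http.request.method.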

3. save_csv()

Purpose: Write field data to CSV format

Features:

  • Header row with column names
  • One row per field, plus one additional row for each of its multi-fields
  • Consistent quoting and line endings

CSV Structure

Columns

Column        | Description              | Example Values
ECS_Version   | Version of ECS           | 8.11.0, 8.11.0+exp
Indexed       | Whether field is indexed | true, false
Field_Set     | Fieldset name            | base, http, user, agent
Field         | Full dotted field name   | @timestamp, http.request.method
Type          | Elasticsearch field type | keyword, long, ip, date
Level         | Field level              | core, extended, custom
Normalization | Normalization rules      | array, to_lower, "array, to_lower"
Example       | Example value            | GET, 192.0.2.1, 2016-05-23...
Description   | Short field description  | HTTP request method, User email
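
For tooling that consumes the CSV, it can help to convert the string columns into typed values. The sketch below is one possible approach: parse_row is a hypothetical helper, and it assumes that Indexed is written as lowercase true/false and that Normalization holds rule names joined by a comma and a space.

```python
import csv
import io
from typing import Any, Dict, List


def parse_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Convert one row from csv.DictReader into typed values."""
    parsed: Dict[str, Any] = dict(row)
    parsed['Indexed'] = row['Indexed'] == 'true'
    # Assumption: rules are joined with ", "; an empty cell means no rules
    parsed['Normalization'] = [n for n in row['Normalization'].split(', ') if n]
    return parsed


# Rows copied from the example output in this document
sample_csv = (
    "ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description\n"
    '8.11.0,true,base,tags,keyword,core,array,"production, eu-west-1",List of keywords for event\n'
    "8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier\n"
)
rows: List[Dict[str, Any]] = [
    parse_row(r) for r in csv.DictReader(io.StringIO(sample_csv))
]
for r in rows:
    print(r['Field'], r['Indexed'], r['Normalization'])
```

Note that the quoted Example value for tags survives as a single cell even though it contains a comma.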

Field Set Logic

  • Base fields (no dots in name): field_set = 'base'
    • Examples: @timestamp, message, tags, labels
  • Other fields: field_set = first part before dot
    • http.request.method → field_set = 'http'
    • user.email → field_set = 'user'
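
The rule above fits in a few lines. The helper name field_set_for is an assumption made for this sketch and may not match how the generator structures the same logic.

```python
def field_set_for(flat_name: str) -> str:
    """Return 'base' for names without a dot, else the part before the first dot."""
    head, dot, _rest = flat_name.partition('.')
    return head if dot else 'base'


for name in ['@timestamp', 'message', 'http.request.method', 'user.email']:
    print(name, '->', field_set_for(name))
```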

Multi-Fields

Fields with multi-fields (alternate representations) get additional rows:

8.11.0,true,base,message,match_only_text,core,,Hello world,Log message
8.11.0,true,base,message.text,match_only_text,core,,,Log message

Multi-field rows:

  • Share version, indexed, field_set, level, description
  • Have their own field name and type
  • Have empty normalization and example
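
A minimal sketch of how a field and its multi-fields could expand into rows under these rules. The function rows_for_field and the sample field are illustrative; the index key and its default of True are assumptions, while the other keys follow the field properties used elsewhere in this document.

```python
from typing import Any, Dict, List


def rows_for_field(field: Dict[str, Any], version: str) -> List[List[str]]:
    """Build the main row plus one row per multi-field."""
    flat_name = field['flat_name']
    field_set = flat_name.split('.')[0] if '.' in flat_name else 'base'
    # Assumption: a missing 'index' key means the field is indexed
    indexed = str(field.get('index', True)).lower()

    rows = [[
        version, indexed, field_set, flat_name, field['type'], field['level'],
        ', '.join(field.get('normalize', [])),
        str(field.get('example', '')),
        field['short'],
    ]]
    for mf in field.get('multi_fields', []):
        # Shared: version, indexed, field_set, level, description.
        # Own: name and type. Empty: normalization and example.
        rows.append([
            version, indexed, field_set, mf['flat_name'], mf['type'],
            field['level'], '', '', field['short'],
        ])
    return rows


# Sample field mirroring the rows shown above
message = {
    'flat_name': 'message', 'type': 'match_only_text', 'level': 'core',
    'normalize': [], 'example': 'Hello world', 'short': 'Log message',
    'multi_fields': [{'flat_name': 'message.text', 'type': 'match_only_text'}],
}
field_rows = rows_for_field(message, '8.11.0')
for row in field_rows:
    print(','.join(row))
```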

Example Output

ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description
8.11.0,true,base,@timestamp,date,core,,2016-05-23T08:05:34.853Z,Date/time when the event originated
8.11.0,true,base,message,match_only_text,core,,Hello World,Log message optimized for viewing
8.11.0,true,base,message.text,match_only_text,core,,,Log message optimized for viewing
8.11.0,true,base,tags,keyword,core,array,"production, eu-west-1",List of keywords for event
8.11.0,true,agent,agent.build.original,keyword,core,,,Extended build information
8.11.0,true,agent,agent.ephemeral_id,keyword,extended,,8a4f500f,Ephemeral identifier
8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier
8.11.0,true,http,http.request.body.bytes,long,extended,,1437,Request body size in bytes
8.11.0,true,http,http.request.method,keyword,extended,array,GET,HTTP request method
8.11.0,true,http,http.response.status_code,long,extended,,404,HTTP response status code

Usage Examples

See README.md for generator invocation commands.

Programmatic Usage

from generators.csv_generator import generate
from generators.intermediate_files import generate as gen_intermediate

# Generate intermediate files
# `fields` is the finalized schema from the Load, Clean, and Finalize steps
nested, flat = gen_intermediate(fields, 'generated/ecs', True)

# Generate CSV
generate(flat, '8.11.0', 'generated')
# Creates generated/csv/fields.csv

Analyzing Field Data

Count fields by type:

import csv
from collections import Counter

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    types = Counter(row['Type'] for row in reader)

print("Field types:")
for field_type, count in types.most_common():
    print(f"  {field_type}: {count}")

Find all extended-level fields:

import csv

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    extended = [row for row in reader if row['Level'] == 'extended']

print(f"Extended fields: {len(extended)}")
for field in extended[:5]:
    print(f"  {field['Field']}")

Fields by fieldset:

import csv
from collections import defaultdict

with open('generated/csv/fields.csv', newline='') as f:
    reader = csv.DictReader(f)
    by_fieldset = defaultdict(list)
    for row in reader:
        by_fieldset[row['Field_Set']].append(row['Field'])

for fieldset in sorted(by_fieldset):
    print(f"{fieldset}: {len(by_fieldset[fieldset])} fields")
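
Compare two versions:

The version-comparison use case can be handled with a small diff keyed on the Field column. This is a sketch only: field_types and diff_fields are hypothetical helpers, and the two inline CSV snippets (including their version numbers and demo.* fields) are made-up data, not real ECS release differences.

```python
import csv
import io
from typing import Dict, Set, Tuple


def field_types(csv_text: str) -> Dict[str, str]:
    """Map each Field name to its Type for one CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row['Field']: row['Type'] for row in reader}


def diff_fields(old_text: str, new_text: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, retyped) field names between two CSVs."""
    old, new = field_types(old_text), field_types(new_text)
    added = set(new) - set(old)
    removed = set(old) - set(new)
    retyped = {name for name in set(old) & set(new) if old[name] != new[name]}
    return added, removed, retyped


HEADER = "ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description\n"
# Made-up data for illustration only
old_csv = HEADER + (
    "1.0.0,true,base,message,text,core,,,Log message\n"
    "1.0.0,true,demo,demo.old,keyword,extended,,,Removed later\n"
)
new_csv = HEADER + (
    "2.0.0,true,base,message,match_only_text,core,,,Log message\n"
    "2.0.0,true,demo,demo.new,keyword,extended,,,Added later\n"
)

added, removed, retyped = diff_fields(old_csv, new_csv)
print("added:", sorted(added))
print("removed:", sorted(removed))
print("retyped:", sorted(retyped))
```

To compare real files, read each one with open(path, newline='').read() and pass the two strings to diff_fields.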

Making Changes

Adding New Columns

To add a new column to the CSV:

  1. Update header row:
schema_writer.writerow([
    "ECS_Version", "Indexed", "Field_Set", "Field",
    "Type", "Level", "Normalization", "Example", "Description",
    "New_Column"  # Add here
])
  2. Add to data rows:
schema_writer.writerow([
    version,
    indexed,
    field_set,
    field['flat_name'],
    field['type'],
    field['level'],
    ', '.join(field['normalize']),
    field.get('example', ''),
    field['short'],
    field.get('new_property', 'default_value')  # Add here
])
  3. Update multi-field rows similarly

  4. Update documentation in this file

Changing Field Sorting

To change sort order:

def base_first(ecs_flat: Dict[str, Field]) -> List[Field]:
    # Custom sorting logic
    fields_list = list(ecs_flat.values())

    # Option A: sort by level, then name
    return sorted(fields_list, key=lambda f: (f['level'], f['flat_name']))

    # Option B: sort by fieldset, then name.
    # Use this return in place of Option A; only one return can take effect.
    # return sorted(fields_list, key=lambda f: (f['flat_name'].split('.')[0], f['flat_name']))

Changing CSV Format

To modify CSV formatting:

schema_writer = csv.writer(
    csvfile,
    delimiter=';',  # Use semicolon instead
    quoting=csv.QUOTE_ALL,  # Quote all fields
    quotechar='"',
    lineterminator='\r\n'  # Windows line endings
)
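
To see what these options do before changing the generator, the same row can be rendered to an in-memory buffer under two configurations. This is a standalone sketch using only the standard csv module: the render helper is hypothetical, and the first configuration (comma delimiter, minimal quoting, \n line endings) is an assumed baseline rather than a confirmed description of the generator's current settings.

```python
import csv
import io

# One row whose last value contains a comma
row = ['8.11.0', 'tags', 'production, eu-west-1']


def render(**writer_options) -> str:
    """Write `row` with the given csv.writer options and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, **writer_options).writerow(row)
    return buffer.getvalue()


# Assumed baseline: only values that need it are quoted
minimal = render(delimiter=',', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
# Modified format from the snippet above: every value is quoted
quoted = render(delimiter=';', quoting=csv.QUOTE_ALL, quotechar='"',
                lineterminator='\r\n')

print(repr(minimal))  # '8.11.0,tags,"production, eu-west-1"\n'
print(repr(quoted))   # '"8.11.0";"tags";"production, eu-west-1"\r\n'
```

Any consumer of the file has to be updated to match: for example, csv.DictReader needs delimiter=';' to read the second variant.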

Filtering Fields

To exclude certain fields:

def generate(ecs_flat: Dict[str, Field], version: str, out_dir: str) -> None:
    ecs_helpers.make_dirs(join(out_dir, 'csv'))

    # Filter out custom fields
    filtered = {k: v for k, v in ecs_flat.items() if v['level'] != 'custom'}

    sorted_fields = base_first(filtered)
    save_csv(join(out_dir, 'csv/fields.csv'), sorted_fields, version)

Troubleshooting

Common Issues

Missing multi-fields

Symptom: Multi-fields not appearing in CSV

Check:

# Verify field has multi_fields
field = flat['message']
print('multi_fields' in field)
print(field.get('multi_fields'))

# Check multi-field structure
if 'multi_fields' in field:
    for mf in field['multi_fields']:
        print(f"  {mf['flat_name']}: {mf['type']}")
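
To check every field at once rather than one by one, the multi-field names defined in the flat dictionary can be compared against the Field column of the CSV. The helper missing_multi_fields and the sample data are assumptions made for this sketch.

```python
import csv
import io
from typing import Any, Dict, List


def missing_multi_fields(flat: Dict[str, Dict[str, Any]], csv_text: str) -> List[str]:
    """List multi-field names defined in `flat` that have no row in the CSV."""
    written = {row['Field'] for row in csv.DictReader(io.StringIO(csv_text))}
    expected = [
        mf['flat_name']
        for field in flat.values()
        for mf in field.get('multi_fields', [])
    ]
    return sorted(name for name in expected if name not in written)


# Illustrative data: the CSV lacks the message.text row
flat_sample = {
    'message': {
        'flat_name': 'message',
        'multi_fields': [{'flat_name': 'message.text', 'type': 'match_only_text'}],
    },
    'agent.id': {'flat_name': 'agent.id'},
}
csv_sample = (
    "ECS_Version,Indexed,Field_Set,Field,Type,Level,Normalization,Example,Description\n"
    "8.11.0,true,base,message,match_only_text,core,,Hello World,Log message\n"
    "8.11.0,true,agent,agent.id,keyword,core,,8a4f500d,Unique agent identifier\n"
)
missing = missing_multi_fields(flat_sample, csv_sample)
print(missing)
```

An empty result means every multi-field in the dictionary has a row. If names are reported, the multi-field branch of save_csv() is the likely place to look.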

Empty normalization column

Symptom: Normalization column is always empty

Check that the field definitions have a normalize key:

field = flat['some.field']
print(field.get('normalize', []))  # Should be a list

References