feat: extract timestamps from log lines #23

@STRRL

Background

LAPP currently has a multiline detector that identifies whether a line starts with a timestamp (via token graph probability matching). However, it does not extract the actual time.Time value from the detected timestamp region.

The DuckDB store has a timestamp column, but it's not populated with parsed values from the log content.

Proposal

Add a timestamp extraction step that:

  1. Reuses the existing detector's knowledge of which tokens at the line start are a timestamp
  2. Parses those tokens into a time.Time value
  3. Populates the timestamp field in the store

Approach

  • The detector already scans the first N bytes and identifies timestamp tokens. After detection, attempt to parse the matched region against the known formats in knownTimestampFormats.
  • Consider using a library like dateparse for flexible parsing, or build a format-matching table from the existing knownTimestampFormats list.
  • Extracted timestamps would enable time-based filtering, time-range queries, and time-series visualization of log patterns (similar to Loki's Drain Chunks).
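The format-matching table option could look roughly like the sketch below. It assumes the detector hands back the matched timestamp region as a string. `candidateLayouts` and `parseTimestamp` are hypothetical names, and the layouts listed are placeholders, not the actual contents of knownTimestampFormats.

```go
package main

import (
	"fmt"
	"time"
)

// candidateLayouts is a hypothetical stand-in for a table built from
// knownTimestampFormats; the real list in LAPP may differ.
var candidateLayouts = []string{
	time.RFC3339,          // e.g. 2024-03-01T12:34:56Z
	"2006-01-02 15:04:05", // fractional seconds in the input are also accepted
	time.Stamp,            // syslog style, carries no year
}

// parseTimestamp tries each layout in order against the region the detector
// flagged as a timestamp. It returns false if no layout matches.
func parseTimestamp(region string) (time.Time, bool) {
	for _, layout := range candidateLayouts {
		if t, err := time.Parse(layout, region); err == nil {
			return t, true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, s := range []string{
		"2024-03-01T12:34:56Z",
		"2024-03-01 12:34:56.789",
		"not a timestamp",
	} {
		t, ok := parseTimestamp(s)
		fmt.Println(s, "->", t, ok)
	}
}
```

Trying layouts in order is simple but linear in the table size. If the detector can report which format it matched, the lookup could go straight to the corresponding layout. Layouts without a year (such as `time.Stamp`) parse with year 0, so the caller would need to supply one before storing the value.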

Reference

Loki's Drain implementation attaches Chunks (time-series samples) to each log cluster, enabling pattern frequency over time. With extracted timestamps, LAPP could do the same and track how often each pattern appears over time.
