
Inconsistent Timestamp Formats in CVE JSON Records #114

@jgamblin

Hello CVE Project Team,

I am consuming the cvelistV5 JSON data and have noticed that multiple different timestamp formats are being used across the records. This inconsistency makes reliable, programmatic parsing difficult and requires consumers to build complex parsers to handle all variations.

Analysis of Timestamp Formats

I ran an analysis script across a full clone of the repository. The script scanned all 315,569 JSON files in the cves/ directory and cataloged the string formats used for timestamp-related keys defined in the CVE 5.x schema.

Summary Statistics

  • Total files scanned: 315,569
  • Total timestamps found: 1,792,599
  • Unique formats detected: 6 different ISO 8601 variations
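The cataloguing step can be pictured with a minimal sketch like the one below. This is not the analysis script mentioned in this issue; the regular expression, the `classify` function, and the label strings are all assumptions made for illustration. It collapses each timestamp string into a shape label (fraction present or not, `Z` or numeric offset or nothing) and tallies the labels.

```python
import re
from collections import Counter

# Hypothetical pattern covering the timestamp shapes listed in this issue.
_TS = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
    r"(?P<frac>\.\d+)?"               # optional fractional seconds, any length
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?"     # optional Z or numeric offset
)

def classify(ts: str) -> str:
    """Return a shape label for ts, or 'other' if it matches nothing."""
    m = _TS.fullmatch(ts)
    if m is None:
        return "other"
    label = "YYYY-MM-DDTHH:MM:SS"
    if m.group("frac"):
        label += ".f"
    tz = m.group("tz")
    if tz == "Z":
        label += "Z"
    elif tz:
        label += "+HH:MM"  # used for both positive and negative offsets
    return label

# Sample values taken from the example columns in this issue.
samples = [
    "2017-09-18T12:57:01",
    "2023-01-16T15:00:00Z",
    "2023-01-16T15:00:00.123456Z",
    "2023-01-16T15:00:00.123+00:00",
]
counts = Counter(classify(s) for s in samples)
```

Unlike the tables in this issue, the sketch does not distinguish millisecond from microsecond fractions; that would need one more branch on the length of the `frac` group.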

Detailed Findings by Field

dateUpdated (1,011,798 timestamps):

Format                        Example                      Count    Percentage
YYYY-MM-DDTHH:MM:SS.ffffffZ   2023-01-16T15:00:00.123456Z  800,823  79.1%
YYYY-MM-DDTHH:MM:SS           2017-09-18T12:57:01          192,572  19.0%
YYYY-MM-DDTHH:MM:SSZ          2023-01-16T15:00:00Z         10,355   1.0%
YYYY-MM-DDTHH:MM:SS.ffffff    2023-01-16T15:00:00.123      8,048    0.8%

dateReserved (315,567 timestamps):

Format                        Example                      Count    Percentage
YYYY-MM-DDTHH:MM:SS           2017-09-18T12:57:01          192,506  61.0%
YYYY-MM-DDTHH:MM:SS.ffffffZ   2023-01-16T15:00:00.123456Z  112,642  35.7%
YYYY-MM-DDTHH:MM:SSZ          2023-01-16T15:00:00Z         10,419   3.3%

datePublished (311,263 timestamps):

Format                        Example                      Count    Percentage
YYYY-MM-DDTHH:MM:SS           2017-09-18T12:57:01          175,519  56.4%
YYYY-MM-DDTHH:MM:SS.ffffffZ   2023-01-16T15:00:00.123456Z  118,726  38.1%
YYYY-MM-DDTHH:MM:SSZ          2023-01-16T15:00:00Z         17,018   5.5%

dateRejected (16,076 timestamps):

Format                        Example                      Count    Percentage
YYYY-MM-DDTHH:MM:SS           2017-09-18T12:57:01          12,918   80.4%
YYYY-MM-DDTHH:MM:SS.ffffffZ   2023-01-16T15:00:00.123456Z  3,158    19.6%

dateAssigned (2,820 timestamps):

Format                            Example                        Count  Percentage
YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM  2023-01-16T15:00:00.123+00:00  1,270  45.0%
YYYY-MM-DDTHH:MM:SS               2017-09-18T12:57:01            1,087  38.5%
YYYY-MM-DDTHH:MM:SS.ffffffZ       2023-01-16T15:00:00.123456Z    463    16.4%

datePublic (135,075 timestamps):

Format                            Example                        Count    Percentage
YYYY-MM-DDTHH:MM:SS               2017-09-18T12:57:01            114,073  84.4%
YYYY-MM-DDTHH:MM:SS.ffffffZ       2023-01-16T15:00:00.123456Z    17,084   12.6%
YYYY-MM-DDTHH:MM:SS+HH:MM         2023-01-16T15:00:00+00:00      2,634    2.0%
YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM  2023-01-16T15:00:00.123+00:00  1,270    0.9%
YYYY-MM-DDTHH:MM:SSZ              2023-01-16T15:00:00Z           14       <0.1%

Key Issues

  1. Timezone Inconsistency: A significant portion of timestamps (roughly 20-84% depending on field) lack timezone designators, so the value alone does not say whether it represents UTC, local time, or an unspecified timezone. The CVE schema description states "If timezone offset is not given, GMT (+00:00) is assumed", but that default appears only in the schema description, so consumers must discover it and apply it themselves.

  2. Multiple Valid Formats: 6 different valid ISO 8601 formats are in use, requiring parsers to implement fallback logic and format detection.

  3. Fractional Seconds Inconsistency: Some timestamps include microsecond precision (.ffffff) while others don't, even within the same field.

  4. Mixed Timezone Representations: Some records use Z (Zulu/UTC), others use explicit offsets like +00:00, and many omit timezone information entirely.
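To illustrate the fallback logic these issues push onto consumers, here is a minimal sketch of a tolerant parser. The function name `parse_cve_timestamp` is hypothetical, and treating a missing offset as UTC is an assumption based on the schema text quoted in item 1. It normalizes `Z`, pads fractional seconds to six digits so that older Python versions of `datetime.fromisoformat` accept them, and converts everything to UTC.

```python
import re
from datetime import datetime, timezone

_FRAC = re.compile(r"\.(\d+)")

def parse_cve_timestamp(ts: str) -> datetime:
    """Parse any of the shapes listed in this issue into an aware UTC datetime."""
    s = ts.strip()
    # Older fromisoformat() versions reject a trailing 'Z'.
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    # Pad or truncate the fraction to exactly 6 digits.
    s = _FRAC.sub(lambda m: "." + m.group(1)[:6].ljust(6, "0"), s, count=1)
    dt = datetime.fromisoformat(s)
    # Assumption: no offset means GMT, per the schema description.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

With a single standardized format, most of this function would collapse to one parsing call.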

Recommendation

For consistency and to improve machine-readability, I recommend standardizing on a single timestamp format as defined by RFC 3339 / ISO 8601.

Recommended format: YYYY-MM-DDTHH:MM:SS.sssZ

Example: 2025-10-26T14:30:00.123Z

This format:

  • ✅ Includes explicit UTC timezone designator (Z)
  • ✅ Provides sub-second precision (milliseconds, which can be extended to microseconds if needed)
  • ✅ Is unambiguous and widely supported
  • ✅ Complies with RFC 3339 and ISO 8601
  • ✅ Closely matches the most common shape already in use for dateUpdated (fractional seconds with a Z suffix, 79%), differing only in the number of fractional digits

Alternative (if sub-second precision is not needed): YYYY-MM-DDTHH:MM:SSZ

Example: 2025-10-26T14:30:00Z
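As a sketch of how a producer could emit either recommended form, the hypothetical helper below (`to_recommended` is not part of any CVE tooling) converts a datetime to UTC and formats it with a `Z` suffix. Treating a naive datetime as UTC is an assumption carried over from the schema's stated default.

```python
from datetime import datetime, timezone

def to_recommended(dt: datetime, timespec: str = "milliseconds") -> str:
    """Format dt as YYYY-MM-DDTHH:MM:SS.sssZ (or without fraction for 'seconds')."""
    if dt.tzinfo is None:
        # Assumption: naive values are GMT, as the schema description says.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # isoformat() truncates (does not round) to the requested precision.
    return dt.isoformat(timespec=timespec).replace("+00:00", "Z")

stamp = datetime(2025, 10, 26, 14, 30, 0, 123000, tzinfo=timezone.utc)
with_millis = to_recommended(stamp)                 # "2025-10-26T14:30:00.123Z"
without_fraction = to_recommended(stamp, "seconds")  # "2025-10-26T14:30:00Z"
```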

Impact on Consumers

Standardizing timestamp formats would:

  • Eliminate the need for complex multi-format parsing logic
  • Reduce parsing errors and timezone ambiguity
  • Improve data quality and interoperability
  • Make CVE data easier to consume programmatically
  • Align with industry best practices for timestamp representation

Additional Context

  • Analysis Tool: I created a Python script that scans the entire cvelistV5 repository and identifies timestamp format patterns. The script is available if you would like to review the methodology.
  • Schema Reference: Based on CVE Schema 5.x documentation
  • Fields Analyzed: dateUpdated, dateReserved, datePublished, dateRejected, dateAssigned, datePublic
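For readers who want to reproduce the field extraction, the following is a minimal sketch and not the script referenced above. The `iter_timestamps` helper and the sample record are made up for illustration; the walk simply visits every nested object and yields the values of the six analyzed keys wherever they appear.

```python
import json
from typing import Any, Iterator, Tuple

DATE_KEYS = {
    "dateUpdated", "dateReserved", "datePublished",
    "dateRejected", "dateAssigned", "datePublic",
}

def iter_timestamps(node: Any) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) for every analyzed date key found anywhere in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in DATE_KEYS and isinstance(value, str):
                yield key, value
            else:
                yield from iter_timestamps(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_timestamps(item)

# Made-up record fragment, only for demonstrating the walk.
record = json.loads("""
{
  "cveMetadata": {
    "dateReserved": "2017-09-18T12:57:01",
    "dateUpdated": "2023-01-16T15:00:00.123456Z"
  },
  "containers": {"cna": {"datePublic": "2023-01-16T15:00:00Z"}}
}
""")
found = sorted(iter_timestamps(record))
```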

Thank you for considering this request. Standardizing timestamp formats would greatly benefit all consumers of the cvelistV5 data.
