Description
Hello CVE Project Team,
I am consuming the cvelistV5 JSON data and have noticed that multiple different timestamp formats are being used across the records. This inconsistency makes reliable, programmatic parsing difficult and requires consumers to build complex parsers to handle all variations.
Analysis of Timestamp Formats
I ran an analysis script across a full clone of the repository. The script scanned all 315,569 JSON files in the cves/ directory and cataloged the string formats used for timestamp-related keys defined in the CVE 5.x schema.
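For readers who want to reproduce this kind of count, here is a minimal sketch of how a format classifier could look. It is not the script used for the numbers in this issue. The names `classify_timestamp`, `collect_formats`, and `DATE_KEYS` are hypothetical, and the sketch assumes that any number of fractional digits is reported under the `.ffffff` label.

```python
import re
from collections import Counter

# Hypothetical set of keys to inspect; taken from the fields listed in this issue.
DATE_KEYS = {
    "dateUpdated", "dateReserved", "datePublished",
    "dateRejected", "dateAssigned", "datePublic",
}

# Date and time, optional fractional seconds, optional "Z" or numeric offset.
_SHAPE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
    r"(?P<frac>\.\d+)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?"
)

def classify_timestamp(value: str) -> str:
    """Return a format label such as 'YYYY-MM-DDTHH:MM:SS.ffffffZ'."""
    match = _SHAPE.fullmatch(value)
    if match is None:
        return "unrecognized"
    label = "YYYY-MM-DDTHH:MM:SS"
    if match.group("frac"):
        label += ".ffffff"  # assumption: any fraction length gets this label
    tz = match.group("tz")
    if tz == "Z":
        label += "Z"
    elif tz:
        label += "+HH:MM"
    return label

def collect_formats(node, counts: Counter) -> None:
    """Walk a parsed JSON document and count (field, format) pairs."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in DATE_KEYS and isinstance(value, str):
                counts[(key, classify_timestamp(value))] += 1
            else:
                collect_formats(value, counts)
    elif isinstance(node, list):
        for item in node:
            collect_formats(item, counts)
```

To scan a clone, one could call `collect_formats(json.load(f), counts)` for every file under `cves/` with a shared `Counter`, then group the results by field.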
Summary Statistics
- Total files scanned: 315,569
- Total timestamps found: 1,792,599
- Unique formats detected: 6 different ISO 8601 variations
Detailed Findings by Field
dateUpdated (1,011,798 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 800,823 | 79.1% |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 192,572 | 19.0% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 10,355 | 1.0% |
| YYYY-MM-DDTHH:MM:SS.ffffff | 2023-01-16T15:00:00.123 | 8,048 | 0.8% |
dateReserved (315,567 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 192,506 | 61.0% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 112,642 | 35.7% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 10,419 | 3.3% |
datePublished (311,263 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 175,519 | 56.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 118,726 | 38.1% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 17,018 | 5.5% |
dateRejected (16,076 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 12,918 | 80.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 3,158 | 19.6% |
dateAssigned (2,820 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM | 2023-01-16T15:00:00.123+00:00 | 1,270 | 45.0% |
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 1,087 | 38.5% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 463 | 16.4% |
datePublic (135,075 timestamps):
| Format | Example | Count | Percentage |
|---|---|---|---|
| YYYY-MM-DDTHH:MM:SS | 2017-09-18T12:57:01 | 114,073 | 84.4% |
| YYYY-MM-DDTHH:MM:SS.ffffffZ | 2023-01-16T15:00:00.123456Z | 17,084 | 12.6% |
| YYYY-MM-DDTHH:MM:SS+HH:MM | 2023-01-16T15:00:00+00:00 | 2,634 | 2.0% |
| YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM | 2023-01-16T15:00:00.123+00:00 | 1,270 | 0.9% |
| YYYY-MM-DDTHH:MM:SSZ | 2023-01-16T15:00:00Z | 14 | <0.1% |
Key Issues
- Timezone Inconsistency: A significant portion of timestamps (20-84% depending on field) lack timezone designators, making it ambiguous whether they represent UTC, local time, or an unspecified timezone. The CVE schema description states "If timezone offset is not given, GMT (+00:00) is assumed", but this assumption must be explicitly documented for consumers.
- Multiple Valid Formats: 6 different valid ISO 8601 formats are in use, requiring parsers to implement fallback logic and format detection.
- Fractional Seconds Inconsistency: Some timestamps include microsecond precision (`.ffffff`) while others don't, even within the same field.
- Mixed Timezone Representations: Some records use `Z` (Zulu/UTC), others use explicit offsets like `+00:00`, and many omit timezone information entirely.
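To illustrate the fallback logic consumers currently need, here is a minimal sketch of a tolerant parser that accepts all six variants listed above and normalizes them to UTC. The function name `parse_cve_timestamp` is hypothetical. It treats a missing offset as UTC, following the schema statement quoted above, and it truncates fractions longer than six digits; both choices are assumptions of this sketch. A regular expression is used rather than `datetime.fromisoformat` because the set of strings that function accepts differs between Python versions.

```python
import re
from datetime import datetime, timedelta, timezone

# Date and time, optional fraction of any length, optional "Z" or numeric offset.
_TS = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?"
    r"(Z|[+-]\d{2}:\d{2})?"
)

def parse_cve_timestamp(value: str) -> datetime:
    """Parse any of the six observed variants into an aware UTC datetime."""
    match = _TS.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])

    # Pad or truncate the fraction to exactly six digits (microseconds).
    fraction = match.group(7) or ""
    microsecond = int(fraction[:6].ljust(6, "0")) if fraction else 0

    # Assumption: no designator means UTC, as the schema description states.
    designator = match.group(8)
    if designator is None or designator == "Z":
        offset = timedelta(0)
    else:
        sign = 1 if designator[0] == "+" else -1
        offset = sign * timedelta(
            hours=int(designator[1:3]), minutes=int(designator[4:6])
        )

    local = datetime(
        year, month, day, hour, minute, second, microsecond,
        tzinfo=timezone(offset),
    )
    return local.astimezone(timezone.utc)
```

With a single canonical format in the data, all of this could shrink to one strict parse call.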
Recommendation
For consistency and to improve machine-readability, I recommend standardizing on a single timestamp format as defined by RFC 3339 / ISO 8601.
Recommended format: YYYY-MM-DDTHH:MM:SS.sssZ
Example: 2025-10-26T14:30:00.123Z
This format:
- ✅ Includes an explicit UTC timezone designator (`Z`)
- ✅ Provides sub-second precision (milliseconds, which can be extended to microseconds if needed)
- ✅ Is unambiguous and widely supported
- ✅ Complies with RFC 3339 and ISO 8601
- ✅ Is the most common format already in use for `dateUpdated` (79%)
Alternative (if sub-second precision is not needed): YYYY-MM-DDTHH:MM:SSZ
Example: 2025-10-26T14:30:00Z
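As an illustration, the recommended format can be produced with the Python standard library alone. This is a minimal sketch; the helper name `to_recommended_format` is hypothetical, naive datetimes are assumed to be UTC, and sub-millisecond digits are truncated rather than rounded.

```python
from datetime import datetime, timezone

def to_recommended_format(moment: datetime) -> str:
    """Render a datetime as YYYY-MM-DDTHH:MM:SS.sssZ in UTC."""
    if moment.tzinfo is None:
        # Assumption: a naive value is already in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    utc = moment.astimezone(timezone.utc)
    # isoformat() emits "+00:00" for UTC; swap it for the "Z" designator.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")

stamp = datetime(2025, 10, 26, 14, 30, 0, 123000, tzinfo=timezone.utc)
print(to_recommended_format(stamp))  # 2025-10-26T14:30:00.123Z
```

For the alternative without sub-second precision, passing `timespec="seconds"` instead would yield `2025-10-26T14:30:00Z`.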
Impact on Consumers
Standardizing timestamp formats would:
- Eliminate the need for complex multi-format parsing logic
- Reduce parsing errors and timezone ambiguity
- Improve data quality and interoperability
- Make CVE data easier to consume programmatically
- Align with industry best practices for timestamp representation
Additional Context
- Analysis Tool: I created a Python script that scans the entire cvelistV5 repository and identifies timestamp format patterns. The script is available if you would like to review the methodology.
- Schema Reference: Based on CVE Schema 5.x documentation
- Fields Analyzed: `dateUpdated`, `dateReserved`, `datePublished`, `dateRejected`, `dateAssigned`, `datePublic`
Thank you for considering this request. Standardizing timestamp formats would greatly benefit all consumers of the cvelistV5 data.