
Commit 28a882c

feat: add support for complex types in dicts and lists (#483)
1 parent 9dfdbfb commit 28a882c

17 files changed, with 1561 additions and 171 deletions

CHANGELOG.rst

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ Change Log
 Unreleased
 __________
 
+* Added support for complex types in dictionaries and lists.
+
 [10.5.0] - 2025-08-19
 ---------------------
 

docs/how-tos/add-event-bus-support-to-an-event.rst

Lines changed: 51 additions & 1 deletion
@@ -51,11 +51,56 @@ Complex Data Types
 --------------------
 
 - Type-annotated Lists (e.g., ``List[int]``, ``List[str]``)
+- Type-annotated Dictionaries (e.g., ``Dict[str, int]``, ``Dict[str, str]``)
 - Attrs Classes (e.g., ``UserNonPersonalData``, ``UserPersonalData``, ``UserData``, ``CourseData``)
 - Types with Custom Serializers (e.g., ``CourseKey``, ``datetime``)
+- Nested Complex Types:
+
+  - Lists containing dictionaries (e.g., ``List[Dict[str, int]]``)
+  - Dictionaries containing lists (e.g., ``Dict[str, List[int]]``)
+  - Lists containing attrs classes (e.g., ``List[UserData]``)
+  - Dictionaries containing attrs classes (e.g., ``Dict[str, CourseData]``)
 
 Ensure that the :term:`Event Payload` is structured as `attrs data classes`_ and that the data types used in those classes align with the event bus schema format.
 
+Examples of Complex Data Types Usage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here are practical examples of how to use the supported complex data types in your event payloads:
+
+.. code-block:: python
+
+    # Example 1: Event with type-annotated dictionaries and lists
+    @attr.s(frozen=True)
+    class CourseMetricsData:
+        """
+        Course metrics with complex data structures.
+        """
+        # Simple dictionary with string keys and integer values
+        enrollment_counts = attr.ib(type=dict[str, int], factory=dict)
+
+        # List of dictionaries
+        grade_distributions = attr.ib(type=List[dict[str, float]], factory=list)
+
+        # Dictionary containing lists
+        student_groups = attr.ib(type=dict[str, List[str]], factory=dict)
+
+
+    # Example 2: Event with nested attrs classes
+    @attr.s(frozen=True)
+    class BatchOperationData:
+        """
+        Batch operation with collections of user data.
+        """
+        # List of attrs classes
+        affected_users = attr.ib(type=List[UserData], factory=list)
+
+        # Dictionary mapping course IDs to course data
+        courses_mapping = attr.ib(type=dict[str, CourseData], factory=dict)
+
+        # Complex nested structure
+        operation_results = attr.ib(type=dict[str, List[dict[str, bool]]], factory=dict)
+
 In the ``data.py`` files within each architectural subdomain, you can find examples of the :term:`Event Payload` structured as `attrs data classes`_ that align with the event bus schema format.
 
 Step 3: Ensure Serialization and Deserialization
@@ -103,7 +148,12 @@ After implementing the serializer, add it to ``DEFAULT_CUSTOM_SERIALIZERS`` at t
 Now, the :term:`Event Payload` can be serialized and deserialized correctly when sent across services.
 
 .. warning::
-   One of the known limitations of the current Open edX Event Bus is that it does not support dictionaries as data types. If the :term:`Event Payload` contains dictionaries, you may need to refactor the :term:`Event Payload` to use supported data types. When you know the structure of the dictionary, you can create an attrs class that represents the dictionary structure. If not, you can use a str type to represent the dictionary as a string and deserialize it on the consumer side using JSON deserialization.
+   The Open edX Event Bus supports type-annotated dictionaries (e.g., ``Dict[str, int]``) and complex nested types. However, dictionaries **without type annotations** are still not supported. Always use proper type annotations for dictionaries and lists in your :term:`Event Payload`. For example:
+
+   - ✅ Supported: ``Dict[str, int]``, ``List[Dict[str, str]]``, ``Dict[str, UserData]``
+   - ❌ Not supported: ``dict``, ``list``, ``Dict`` (without type parameters)
+
+   If you need to use unstructured data, consider creating an attrs class that represents the data structure.
 
 Step 4: Generate the Avro Schema
 ====================================
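
The warning in the hunk above separates annotated containers (``Dict[str, int]``) from bare ones (``dict``, ``list``, ``Dict``). As a rough illustration, and not code from this commit, that distinction can be checked with the standard ``typing`` helpers. The name ``is_fully_annotated`` is hypothetical and introduced only for this sketch.

```python
from typing import Dict, List, get_args, get_origin


def is_fully_annotated(tp) -> bool:
    """Return False if a bare dict/list container appears at any nesting depth.

    Hypothetical helper for illustration; it is not part of openedx-events.
    """
    if tp in (dict, list):
        return False
    if get_origin(tp) in (dict, list):
        args = get_args(tp)
        # Dict / List without parameters report no type arguments.
        return bool(args) and all(is_fully_annotated(arg) for arg in args)
    # Anything else (str, int, attrs classes, ...) has no container parameters.
    return True


print(is_fully_annotated(Dict[str, int]))
print(is_fully_annotated(List[Dict[str, str]]))
print(is_fully_annotated(dict))
print(is_fully_annotated(Dict))
```

A check like this could run in a unit test over the payload classes, so that unannotated containers are caught before schema generation raises.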

docs/how-tos/create-a-new-event.rst

Lines changed: 5 additions & 0 deletions
@@ -165,6 +165,11 @@ In our example, the event definition and payload for the enrollment event could
 - Try using nested data classes to group related data together. This will help maintain consistency and make the event more readable. For instance, in the above example, we have grouped the data into User, Course, and Enrollment data.
 - Try reusing existing data classes if possible to avoid duplicating data classes. This will help maintain consistency and reduce the chances of introducing errors. You can review the existing data classes in :ref:`Data Attributes` to see if there is a data class that fits your use case.
 - Each field in the payload should be documented with a description of what the field represents and the data type it should contain. This will help consumers understand the payload and react to the event. You should be able to justify why each field is included in the payload and how it relates to the event.
+- Use type-annotated complex data types when needed. The event bus supports dictionaries and lists with proper type annotations:
+
+  - ``Dict[str, int]`` for dictionaries with string keys and integer values.
+  - ``List[UserData]`` for lists containing attrs classes.
+  - ``Dict[str, List[str]]`` for nested complex structures.
 - Use defaults for optional fields in the payload to ensure its consistency in all cases.
 
 .. note:: When defining the payload, enforce :ref:`Event Bus` compatibility by ensuring that the data types used in the payload align with the event bus schema format. This will help ensure that the event can be sent by the producer and then be re-emitted by the same instance of `OpenEdxPublicSignal`_ on the consumer side, guaranteeing that the data sent and received is identical. For more information about adding event bus support to an event, refer to :ref:`Add Event Bus Support`.

openedx_events/event_bus/avro/deserializer.py

Lines changed: 17 additions & 2 deletions
@@ -41,7 +41,7 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         return deserializer(data)
     elif data_type in PYTHON_TYPE_TO_AVRO_MAPPING:
         return data
-    elif data_type_origin == list:
+    elif data_type_origin is list:
         # Returns types of list contents.
         # Example: if data_type == List[int], arg_data_type = (int,)
         arg_data_type = get_args(data_type)
@@ -52,7 +52,11 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         # Check whether list items type is in basic types.
         if arg_data_type[0] in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING:
             return data
-    elif data_type_origin == dict:
+
+        # Complex nested types like List[List[...]], List[Dict[...]], etc.
+        item_type = arg_data_type[0]
+        return [_deserialized_avro_record_dict_to_object(sub_data, item_type, deserializers) for sub_data in data]
+    elif data_type_origin is dict:
         # Returns types of dict contents.
         # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
         arg_data_type = get_args(data_type)
@@ -63,6 +67,17 @@ def _deserialized_avro_record_dict_to_object(data: dict, data_type, deserializer
         # Check whether dict items type is in basic types.
         if all(arg in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING for arg in arg_data_type):
             return data
+
+        # Complex dict values that need recursive deserialization
+        key_type, value_type = arg_data_type
+        if key_type is not str:
+            raise TypeError("Avro maps only support string keys. The key type must be 'str'.")
+
+        # Complex nested types like Dict[str, Dict[...]], Dict[str, List[...]], etc.
+        return {
+            key: _deserialized_avro_record_dict_to_object(value, value_type, deserializers)
+            for key, value in data.items()
+        }
     elif hasattr(data_type, "__attrs_attrs__"):
         transformed = {}
         for attribute in data_type.__attrs_attrs__:
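
The recursion these deserializer hunks add can be pictured with a simplified, standalone sketch. It is not the library function. `to_object` and `SIMPLE_TYPES` are hypothetical stand-ins, attrs classes are left out, and the string-key check is applied to every dictionary rather than only to those with complex values.

```python
from datetime import datetime
from typing import Dict, List, get_args, get_origin

# Assumed stand-in for the keys of SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.
SIMPLE_TYPES = {bool, int, float, str, bytes}


def to_object(data, data_type, deserializers=None):
    """Recursively rebuild Python values from already-decoded Avro data."""
    deserializers = deserializers or {}
    if data_type in deserializers:
        # Custom types (for example datetime) use their own deserializer.
        return deserializers[data_type](data)
    if data_type in SIMPLE_TYPES:
        return data
    origin, args = get_origin(data_type), get_args(data_type)
    if origin is list and args:
        # Recurse into every list item with the annotated item type.
        return [to_object(item, args[0], deserializers) for item in data]
    if origin is dict and args:
        key_type, value_type = args
        if key_type is not str:
            raise TypeError("Avro maps only support string keys.")
        # Recurse into every dict value with the annotated value type.
        return {key: to_object(value, value_type, deserializers) for key, value in data.items()}
    raise TypeError(f"Unsupported or unannotated type: {data_type}")


decoded = to_object(
    {"deadlines": ["2025-08-19T00:00:00"]},
    Dict[str, List[datetime]],
    {datetime: datetime.fromisoformat},
)
print(decoded)
```

Because each container level delegates to the same function, arbitrarily deep combinations such as `Dict[str, List[Dict[str, bool]]]` need no extra branches.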

openedx_events/event_bus/avro/schema.py

Lines changed: 140 additions & 33 deletions
@@ -4,8 +4,7 @@
 TODO: Handle optional parameters and allow for schema evolution. https://github.com/edx/edx-arch-experiments/issues/53
 """
 
-
-from typing import get_args, get_origin
+from typing import Any, Type, get_args, get_origin
 
 from .custom_serializers import DEFAULT_CUSTOM_SERIALIZERS
 from .types import PYTHON_TYPE_TO_AVRO_MAPPING, SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
@@ -74,37 +73,19 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types,
             raise Exception("Unable to generate Avro schema for dict or array fields without annotation types.")
         avro_type = PYTHON_TYPE_TO_AVRO_MAPPING[data_type]
         field["type"] = avro_type
-    elif data_type_origin == list:
-        # Returns types of list contents.
-        # Example: if data_type == List[int], arg_data_type = (int,)
-        arg_data_type = get_args(data_type)
-        if not arg_data_type:
-            raise TypeError(
-                "List without annotation type is not supported. The argument should be a type, for eg., List[int]"
-            )
-        avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[0])
-        if avro_type is None:
-            raise TypeError(
-                "Only following types are supported for list arguments:"
-                f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}"
-            )
-        field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "items": avro_type}
-    elif data_type_origin == dict:
-        # Returns types of dict contents.
-        # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
-        arg_data_type = get_args(data_type)
-        if not arg_data_type:
-            raise TypeError(
-                "Dict without annotation type is not supported. The argument should be a type, for eg., Dict[str, int]"
-            )
-        avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(arg_data_type[1])
-        if avro_type is None:
-            raise TypeError(
-                "Only following types are supported for dict arguments:"
-                f" {set(SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.keys())}"
-            )
-        field["type"] = {"type": PYTHON_TYPE_TO_AVRO_MAPPING[data_type_origin], "values": avro_type}
-    # Case 3: data_type is an attrs class
+    # Case 3: data_type is a list (possibly with complex items)
+    elif data_type_origin is list:
+        item_avro_type = _get_avro_type_for_list_item(
+            data_type, previously_seen_types, all_field_type_overrides
+        )
+        field["type"] = {"type": "array", "items": item_avro_type}
+    # Case 4: data_type is a dictionary (possibly with complex values)
+    elif data_type_origin is dict:
+        item_avro_type = _get_avro_type_for_dict_item(
+            data_type, previously_seen_types, all_field_type_overrides
+        )
+        field["type"] = {"type": "map", "values": item_avro_type}
+    # Case 5: data_type is an attrs class
     elif hasattr(data_type, "__attrs_attrs__"):
         # Inner Attrs Class
 

@@ -135,3 +116,129 @@ def _create_avro_field_definition(data_key, data_type, previously_seen_types,
         single_type = field["type"]
         field["type"] = ["null", single_type]
     return field
+
+
+def _get_avro_type_for_dict_item(
+    data_type: Type[dict], previously_seen_types: set, type_overrides: dict[Any, str]
+) -> str | dict[str, str]:
+    """
+    Determine the Avro type definition for a dictionary value based on its Python type.
+
+    This function converts Python dictionary value types to their corresponding
+    Avro type representations. It supports simple types, complex nested types (like
+    dictionaries and lists), and custom classes decorated with attrs.
+
+    Args:
+        data_type (Type[dict]): The Python dictionary type with its type annotation
+            (e.g., Dict[str, str], Dict[str, int], Dict[str, List[str]])
+        previously_seen_types (set): Set of type names that have already been
+            processed, used to prevent duplicate record definitions
+        type_overrides (dict[Any, str]): Dictionary mapping custom Python types to
+            their Avro type representations
+
+    Returns:
+        One of the following Avro type representations:
+        - A string (e.g., "string", "int", "boolean") for simple types
+        - A dictionary with a complex type definition for container types, such as:
+          - {"type": "array", "items": <avro_type>} for lists
+          - {"type": "map", "values": <avro_type>} for nested dictionaries
+          - {"name": "<TypeName>", "type": "record", "fields": [...]} for attrs classes
+        - A string with a record name for previously defined record types
+
+    Raises:
+        TypeError: If the dictionary has no type annotation, has non-string keys,
+            or contains unsupported value types
+    """
+    # Validate dict has type annotation
+    # Example: if data_type == Dict[str, int], arg_data_type = (str, int)
+    arg_data_type = get_args(data_type)
+    if not arg_data_type:
+        raise TypeError(
+            "Dict without annotation type is not supported. The argument should be a type, e.g. Dict[str, int]"
+        )
+
+    value_type = arg_data_type[1]
+
+    # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
+    avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(value_type)
+    if avro_type is not None:
+        return avro_type
+
+    # Case 2: Complex types (dict, list, or attrs class)
+    if get_origin(value_type) in (dict, list) or hasattr(value_type, "__attrs_attrs__"):
+        # Create a temporary field for the value type and extract its type definition
+        temp_field = _create_avro_field_definition("temp", value_type, previously_seen_types, type_overrides)
+        return temp_field["type"]
+
+    # Case 3: Unannotated containers (raise specific errors)
+    if value_type is dict:
+        raise TypeError("A Dictionary as a dictionary value should have a type annotation.")
+    if value_type is list:
+        raise TypeError("A List as a dictionary value should have a type annotation.")
+
+    # Case 4: Unsupported types
+    raise TypeError(f"Type {value_type} is not supported for dict values.")
+
+
+def _get_avro_type_for_list_item(
+    data_type: Type[list], previously_seen_types: set, type_overrides: dict[Any, str]
+) -> str | dict[str, str]:
+    """
+    Determine the Avro type definition for a list item based on its Python type.
+
+    This function handles conversion of various Python types that can be
+    contained within a list to their corresponding Avro type representations.
+    It supports simple types, complex nested types (like dictionaries and lists),
+    and custom classes decorated with attrs.
+
+    Args:
+        data_type (Type[list]): The Python list type with its type annotation
+            (e.g., List[str], List[int], List[Dict[str, str]], etc.)
+        previously_seen_types (set): Set of type names that have already been
+            processed, used to prevent duplicate record definitions
+        type_overrides (dict[Any, str]): Dictionary mapping custom Python types
+            to their Avro type representations
+
+    Returns:
+        One of the following Avro type representations:
+        - A string (e.g., "string", "long", "boolean") for simple types
+        - A dictionary with a complex type definition for container types, such as:
+          - {"type": "array", "items": <avro_type>} for lists
+          - {"type": "map", "values": <avro_type>} for dictionaries
+          - {"name": "<TypeName>", "type": "record", "fields": [...]} for attrs classes
+        - A string with a record name for previously defined record types
+
+    Raises:
+        TypeError: If the list has no type annotation, contains unsupported
+            types, or contains containers (dict, list) without proper type
+            annotations
+    """
+    # Validate list has type annotation
+    # Example: if data_type == List[int], arg_data_type = (int,)
+    arg_data_type = get_args(data_type)
+    if not arg_data_type:
+        raise TypeError(
+            "List without annotation type is not supported. The argument should be a type, e.g. List[int]"
+        )
+
+    item_type = arg_data_type[0]
+
+    # Case 1: Simple types mapped in SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING
+    avro_type = SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING.get(item_type)
+    if avro_type is not None:
+        return avro_type
+
+    # Case 2: Complex types (dict, list, or attrs class)
+    if get_origin(item_type) in (dict, list) or hasattr(item_type, "__attrs_attrs__"):
+        # Create a temporary field for the value type and extract its type definition
+        temp_field = _create_avro_field_definition("temp", item_type, previously_seen_types, type_overrides)
+        return temp_field["type"]
+
+    # Case 3: Unannotated containers (raise specific errors)
+    if item_type is dict:
+        raise TypeError("A Dictionary as a list item should have a type annotation.")
+    if item_type is list:
+        raise TypeError("A List as a list item should have a type annotation.")
+
+    # Case 4: Unsupported types
+    raise TypeError(f"Type {item_type} is not supported for list items.")
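
To see how the two new helpers compose with `_create_avro_field_definition`, the following standalone sketch collapses the list and dict branches into one recursive function. It is an approximation under stated assumptions: `avro_type` is a hypothetical name, `SIMPLE_AVRO` is an assumed stand-in for `SIMPLE_PYTHON_TYPE_TO_AVRO_MAPPING`, attrs records and type overrides are omitted, and the string-key check is an addition of this sketch (in the hunks shown here, only the deserializer raises for non-string keys).

```python
from typing import Dict, List, get_args, get_origin

# Assumed simple-type mapping; the real table lives in openedx_events' types module.
SIMPLE_AVRO = {bool: "boolean", int: "long", float: "double", str: "string", bytes: "bytes"}


def avro_type(py_type):
    """Map an annotated Python type to an Avro type definition (sketch only)."""
    simple = SIMPLE_AVRO.get(py_type)
    if simple is not None:
        return simple
    origin, args = get_origin(py_type), get_args(py_type)
    if origin is list and args:
        # Lists become Avro arrays whose items are resolved recursively.
        return {"type": "array", "items": avro_type(args[0])}
    if origin is dict and args:
        if args[0] is not str:
            raise TypeError("Avro maps only support string keys.")
        # Dicts become Avro maps whose values are resolved recursively.
        return {"type": "map", "values": avro_type(args[1])}
    raise TypeError(f"Type {py_type} is unsupported or lacks a type annotation.")


print(avro_type(Dict[str, List[str]]))
print(avro_type(List[Dict[str, int]]))
```

The first call yields the same map-of-arrays shape that appears for `audience_filters` in the schema file added by this commit.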
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+{
+  "name": "CloudEvent",
+  "type": "record",
+  "doc": "Avro Event Format for CloudEvents created with openedx_events/schema",
+  "fields": [
+    {
+      "name": "course_notification_data",
+      "type": {
+        "name": "CourseNotificationData",
+        "type": "record",
+        "fields": [
+          {
+            "name": "course_key",
+            "type": "string"
+          },
+          {
+            "name": "app_name",
+            "type": "string"
+          },
+          {
+            "name": "notification_type",
+            "type": "string"
+          },
+          {
+            "name": "content_url",
+            "type": "string"
+          },
+          {
+            "name": "content_context",
+            "type": {
+              "type": "map",
+              "values": "string"
+            }
+          },
+          {
+            "name": "audience_filters",
+            "type": {
+              "type": "map",
+              "values": {
+                "type": "array",
+                "items": "string"
+              }
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "namespace": "org.openedx.learning.course.notification.requested.v1"
+}
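
For orientation, a payload matching the `CourseNotificationData` record above might look like the sample below. The field values are invented for illustration, and `conforms` is a hypothetical checker covering only the Avro constructs this schema uses (string, map, array, record). It is not a general Avro validator.

```python
# Record definition copied from the schema file above, expressed as a Python dict.
COURSE_NOTIFICATION_RECORD = {
    "name": "CourseNotificationData",
    "type": "record",
    "fields": [
        {"name": "course_key", "type": "string"},
        {"name": "app_name", "type": "string"},
        {"name": "notification_type", "type": "string"},
        {"name": "content_url", "type": "string"},
        {"name": "content_context", "type": {"type": "map", "values": "string"}},
        {"name": "audience_filters", "type": {"type": "map", "values": {"type": "array", "items": "string"}}},
    ],
}


def conforms(value, avro_type) -> bool:
    """Check a decoded value against a small subset of Avro types."""
    if avro_type == "string":
        return isinstance(value, str)
    if isinstance(avro_type, dict):
        kind = avro_type["type"]
        if kind == "map":
            return isinstance(value, dict) and all(
                isinstance(k, str) and conforms(v, avro_type["values"]) for k, v in value.items()
            )
        if kind == "array":
            return isinstance(value, list) and all(conforms(v, avro_type["items"]) for v in value)
        if kind == "record":
            return isinstance(value, dict) and all(
                f["name"] in value and conforms(value[f["name"]], f["type"]) for f in avro_type["fields"]
            )
    return False


# Invented sample values; only the shape is meaningful.
sample = {
    "course_key": "course-v1:edX+DemoX+Demo_Course",
    "app_name": "discussion",
    "notification_type": "new_post",
    "content_url": "https://example.com/post/1",
    "content_context": {"title": "Welcome"},
    "audience_filters": {"roles": ["staff", "learner"]},
}
print(conforms(sample, COURSE_NOTIFICATION_RECORD))
```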
