Skip to content

Use Opensearch for learning resource API results whenever feasible #189

@mbertrand

Description

@mbertrand

Description/Context

Use opensearch for API results for lists when feasible (ie not for lists of learning paths requested by editors, perhaps other exceptions) and perhaps single object results as well.

Plan/Design

Option 1: OpenSearch endpoint for lists, database endpoints for individual objects and learning path CRUD

Continue using the database to return:

  • Published individual learning resource details (url: /api/v1/learning_resources/<pk>/, /api/v1/courses/<pk>/, etc)
  • Published child resources of a resource (url: /api/v1/learning_resources/<pk>/children/). Example: episodes in a podcast, resources in a learning path

Provide a new database-backed API endpoint - /api/v1/learning_path_admin/ - specifically for learning paths CRUD. This will return specific path details as well as lists of learning paths, both published and unpublished. Only users with editing permission will be allowed to access it.

Use opensearch for all read-only learning resource list endpoints (published only):

  • /api/v1/learning_resources/, /api/v1/courses/, /api/v1/programs/, etc
  • /api/v1/learning_resources/new/ (need to add created_on values to the index and sort by that)
  • api/v1/learning_resources/upcoming/ (need to build an opensearch query filtering by runs.published and (runs.start_date or runs.enrollment_start) then ordering by runs.start_date)

Separate views will be used for each:

  • learning_resources.views.LearningResourceViewSet(RetrieveModelMixin, viewsets.GenericViewSet) for individual objects and their children
  • learning_resources.views.LearningPathAdminViewSet(viewsets.ModelViewSet) for learning path CRUD
  • learning_resources_search.views.LearningResourcesSearchView(ESView) for lists of learning resources.

URL's would have to be loaded in the following order in open_discussions/urls.py:

    re_path(r"", include("learning_resources.urls")),
    re_path(r"", include("learning_resources_search.urls")),

Option 2: OpenSearch endpoint for lists and individual results, database for learning path editing only.

Similar to above, but use opensearch for individual object results and child resources of individual results

  • Would need to adjust serializers so that each child resource document in opensearch includes a parents field indicating the relation (and type of relation) to other resources. For example, /api/v1/podcasts/<id>/episodes could not rely on database relationships to get correct results, would need to query opensearch where parent_id=<id> and relation_type=podcast,

Separate views will be used for each:

  • learning_resources.views.LearningPathViewSet(viewsets.ModelViewSet) for learning path CRUD
  • learning_resources_search.views.LearningResourcesSearchView(ESView) for learning resources lists and individual objects

Option 3: Conditionally switch to an opensearch API view from a database API view, depending on parameters.

  @extend_schema(responses=LearningResourceSerializer(many=True))
  @action(methods=["GET"], detail=False, name="New Resources")
  def new(self, request: Request) -> QuerySet:  # noqa: ARG002
      if self.use_database():  # query param of db=True, or path editor requesting list of all paths
     
          page = self.paginate_queryset(self.get_queryset().order_by("-created_on"))
          serializer = self.get_serializer(page, many=True)
          return self.get_paginated_response(serializer.data)

      return LearningResourcesSearchView().get(request, sortby="-created_on")   
      # or insert the code from the view above here directly.

This does not seem like the best approach, for several reasons:

  • It mixes views from two different apps, which does not seem like good practice

  • Need to ensure that the different views always accept the same query parameters and apply filters consistently.

  • The views may have slightly different outputs. For example, the current search endpoint does not include next and previous page attributes, though maybe it should. And the current database-backed endpoint never includes this data structure:

    "metadata": {
        "aggregations": {},
        "suggest": []
    }
    

    Though again, the outputs could probably be made consistent, up to a point (if using database, can't get actual values for those metadata fields, other than empty ones like above).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions