Skip to content

Conversation

@Ash-Crow
Copy link
Collaborator

Purpose

Allows a user to find more easily the other users they search, with the following order of priority:

  • users they already share documents with (more recent first)
  • users that share the same full email domain
  • users that share the same partial email domain (last two parts)
  • other users

Solves #1521

Proposal

  • Add 2 functions in core/utils.py: users_sharing_documents_with() and extract_email_domain_parts()
  • Use them as keys to sort the results of a basic user search
  • User research through "full" email address (contains the '@') is left unaffected.

@Ash-Crow Ash-Crow force-pushed the sbl-proximity-search branch 3 times, most recently from fb5d1bf to d7cc384 Compare January 14, 2026 14:32
@Ash-Crow Ash-Crow changed the title 🚸(backend) sort user search results by proxmity with the active user 🚸(backend) sort user search results by proximity with the active user Jan 14, 2026
Allows a user to find more easily the other users they search, with
the following order of priority:
- users they already share documents with (more recent first)
- users that share the same full email domain
- users that share the same partial email domain (last two parts)
- other users
@Ash-Crow Ash-Crow force-pushed the sbl-proximity-search branch 3 times, most recently from 19706b0 to 768516c Compare January 14, 2026 14:49
@Ash-Crow Ash-Crow force-pushed the sbl-proximity-search branch from 768516c to 0eb0dc1 Compare January 14, 2026 14:54
@Ash-Crow Ash-Crow marked this pull request as ready for review January 14, 2026 14:58
@Ash-Crow Ash-Crow requested a review from lunika January 14, 2026 14:59
Copy link
Member

@lunika lunika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great implementation

Copy link
Member

@qbey qbey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting:) I like the python instead of SQL approach but I wonder if it will not bring too much overhead. I guess it could be ok for a first version but:

  • is the "closer" compute not too much? I mean if you put the user you share something with, wouldn't that be lighter to compute and enough? Could be a first step, before bringing the heavy machine gun.
  • we need to monitor the function execution time to see if at some point or for some users it starts to be long (either logger.info or prometheus)

Anyway, this is just a review to help (I know I only ask more questions and did not bring any solution ^^)

return re.findall(enums.MEDIA_STORAGE_URL_EXTRACT, xml_content)


def users_sharing_documents_with(user):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit afraid of this, because on each search query, you will load all user accesses, I don't have a better proposition and maybe we should let it like that but we should at least add a "timer" log to be able to see if this takes too much time. Maybe a cache could also help?

.values("user")
.annotate(last_shared=db.Max("created_at"))
)
return {item["user"]: item["last_shared"] for item in shared_qs}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return {item["user"]: item["last_shared"] for item in shared_qs}
return {item["user"]: item["last_shared"] for item in shared_qs.iterator()}

I would suggest to use an iterator here.

except ValidationError:
return "", ""

domain = email.split("@", 1)[1].lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such split is not "safe", the safe way is using email.headerregistry.Address but for performances sake, I guess we should keep it like this: maybe add a comment and rename the function unsafe_extract_email_domain_parts to say you did it this way on purpose could help, and might prevent the use of this method for another context :)

Note: django-lasuite already provides a get_domain_from_email

"document_id", flat=True
)
shared_qs = (
models.DocumentAccess.objects.filter(document_id__in=user_docs_qs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using a Subquery be better here? (I mean, to prevent data from being passed to Django for nothing)


### Changed

- 🚸(backend) sort user search results by proximity with the active user #1802
Copy link
Member

@sampaccoud sampaccoud Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create an issue now in django-lasuite so we don't forget to upstream it as it will have to be shared with all our apps once it has proven its efficiency?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants