-
Notifications
You must be signed in to change notification settings - Fork 510
🚸(backend) sort user search results by proximity with the active user #1802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fb5d1bf to
d7cc384
Compare
Allows a user to find more easily the other users they search, with the following order of priority: - users they already share documents with (more recent first) - users that share the same full email domain - users that share the same partial email domain (last two parts) - other users
19706b0 to
768516c
Compare
768516c to
0eb0dc1
Compare
lunika
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great implementation
qbey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting:) I like the python instead of SQL approach but I wonder if it will not bring too much overhead. I guess it could be ok for a first version but:
- is the "closer" compute not too much? I mean if you put the user you share something with, wouldn't that be lighter to compute and enough? Could be a first step, before bringing the heavy machine gun.
- we need to monitor the function execution time to see if at some point or for some users it starts to be long (either logger.info or prometheus)
Anyway, this is just a review to help (I know I only ask more questions and did not bring any solution ^^)
| return re.findall(enums.MEDIA_STORAGE_URL_EXTRACT, xml_content) | ||
|
|
||
|
|
||
| def users_sharing_documents_with(user): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit afraid of this, because on each search query, you will load all user accesses, I don't have a better proposition and maybe we should let it like that but we should at least add a "timer" log to be able to see if this takes too much time. Maybe a cache could also help?
| .values("user") | ||
| .annotate(last_shared=db.Max("created_at")) | ||
| ) | ||
| return {item["user"]: item["last_shared"] for item in shared_qs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return {item["user"]: item["last_shared"] for item in shared_qs} | |
| return {item["user"]: item["last_shared"] for item in shared_qs.iterator()} |
I would suggest to use an iterator here.
| except ValidationError: | ||
| return "", "" | ||
|
|
||
| domain = email.split("@", 1)[1].lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such split is not "safe", the safe way is using email.headerregistry.Address but for performances sake, I guess we should keep it like this: maybe add a comment and rename the function unsafe_extract_email_domain_parts to say you did it this way on purpose could help, and might prevent the use of this method for another context :)
Note: django-lasuite already provides a get_domain_from_email
| "document_id", flat=True | ||
| ) | ||
| shared_qs = ( | ||
| models.DocumentAccess.objects.filter(document_id__in=user_docs_qs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using a Subquery be better here? (I mean, to prevent data from being passed to Django for nothing)
|
|
||
| ### Changed | ||
|
|
||
| - 🚸(backend) sort user search results by proximity with the active user #1802 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe create an issue now in django-lasuite so we don't forget to upstream it as it will have to be shared with all our apps once it has proven its efficiency?
Purpose
Allows a user to find more easily the other users they search, with the following order of priority:
Solves #1521
Proposal
core/utils.py:users_sharing_documents_with()andextract_email_domain_parts()