Feature/collations#776
Conversation
|
This is very cool, but I think there would be potentially significant issues if this only generates migrations. For example, we often essentially implicitly (for better or for worse) rely on the fact that like 99% of the time the default collation in postgres databases ensures that our expressions behave the same way both in Elixir and in Postgres. If we decide to do this, for example, we probably need to actually handle this when we build references for doing things like sorts, or we need to in some way prevent expressions from running in Elixir. It's a fairly complex topic that would involve a fair bit of Ash core spelunking. Could you share a bit more about your use case and how you plan on using it? |
|
Thanks for your feedback and I can clearly see the limitations. Use case: I have documents and containers labeled something like: "Box 2", "Box 10", and I want ordering that respects the numbers. A column collation like en-u-kn-true gives me that for plain ORDER BY, offset, and keyset pagination out of the box. On the core concern: I don't think this introduces new behaviour, for two reasons:
The documentation should clearly document the scope and limitations of this feature. It would be a nice addition when ash can show a warning for the edge cases. But as I said, because the issue already exists, I don't think that it's totally necessary. Do you see it as a necessity? How tricky would it be to implement this in the ash package? I'm thinking of something like: Ash core: # data_layer.ex
@callback unportable_sort_fields(Ash.Resource.t()) :: [atom]# actions/sort.ex
if is_atom(field) and field in Ash.DataLayer.unportable_sort_fields(resource) do
Logger.warning("Sorting #{inspect(resource)}.#{field} in memory; its database " <>
"collation is not reproduced by Elixir term ordering, so results may differ " <>
"from a database sort.")
end# ash_postgres:
def unportable_sort_fields(resource) do
Enum.map(AshPostgres.DataLayer.Info.collations(resource), & &1.attribute)
end@frankdugan3 mentioned on Discord he'd thought about a feature like this too. Anything from your side worth folding in? |
|
Waaay back 2022, I was using a collation to sort job numbers (naturalsort via icu). At the time, also adding distinct caused it to fail to actually use the sort. You can see the conversation here: https://discord.com/channels/711271361523351632/833816594417057833/971781243987456010 At the moment, I don't currently do any custom collations, so I'm not sure if this is still an issue or not. But it does point to Zach's concern that this could have side effects if not properly integrated into Ash's logic at large. |
|
Also, your point about Postgres using the system C collation is entirely valid, I've run into Elixir and Postgres sorting differences before, can't remember the example unfortunately. |
|
Hm...yeah I think there is a way to do it, not necessarily a small project though. What we can do is add some kind of logic to expression evaluation in Ash that checks to see if a field ref is "safe" to use in some way before actually evaluating it. Or perhaps what operators are "safe" to use with a given sub-expression. If it's not safe to evaluate in elixir then it would return But the best way to do that would likely be with a custom type instead of with a DSL, something like this: attribute :foo, {AshPostgres.Type.CollatedString, collation: "collation"}That custom type can use our operator overloads method to indicate the type of expressions that use it for example. Pretty complex. |
|
Thanks a lot for taking the time to think this through and for the detailed feedback. I can definitely see the concerns around consistency between Elixir and Postgres evaluation, and how this quickly becomes a deeper topic than just generating migrations. Even though I still feel that the declarative approach could provide some practical value on its own. But I also understand the hesitation in introducing something that might lead to subtle inconsistencies in behavior. Given that, I’m totally fine with not moving this PR forward in its current form. I also really appreciate the emphasis on keeping the framework consistent and predictable. Your suggestion around a custom type and tighter integration with expression evaluation is very interesting. I might take a look at that approach when I have some time, although I suspect it’s a bit beyond my current experience level. Thanks again for your effort. |
|
Thank you for the submission! Its something we definitely need to solve for long term! |
|
@weljoda It may be wothwhile to distill the use case, discovered complications and proposed ideas in to a neat, self-contained issue for future efforts. That way we don't have to redo all the archaeological research later, and others that run into the lack of collation support can chime in, possibly creating a workable solution. Edit: Issue #650 is focused on something similar to Zach's suggestion |
Contributor checklist
Leave anything that you believe does not apply unchecked.
Hi everybody,
I added a feature to manage collations within the code. It is only used to create correct migrations, so that they don't need to be added manually and will not be forgotten, living silently in the db. It's as simple as:
and adding additional collations to the DB by using:
I added documentation and testcases and type validation.
Please reach out for any questions.
I would be happy if this will be merged :)
Greetings Johannes