Skip to content

Feature/collations#776

Closed
weljoda wants to merge 4 commits into
ash-project:mainfrom
weljoda:feature/collations
Closed

Feature/collations#776
weljoda wants to merge 4 commits into
ash-project:mainfrom
weljoda:feature/collations

Conversation

@weljoda

@weljoda weljoda commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Contributor checklist

Leave anything that you believe does not apply unchecked.

  • I accept the AI Policy, or AI was not used in the creation of this PR.
  • Bug fixes include regression tests
  • Chores
  • Documentation changes
  • Features include unit/acceptance tests
  • Refactoring
  • Update dependencies

Hi everybody,

I added a feature to manage collations within the code. It is only used to create correct migrations, so that they don't need to be added manually and will not be forgotten, living silently in the db. It's as simple as:

postgres do
  table "documents"
  repo MyApp.Repo

  collations do
    collation :attr_name, "de_AT"
  end
end

and adding additional collations to the DB by using:

defmodule MyApp.Repo do
  use AshPostgres.Repo, otp_app: :my_app

  def installed_collations do
    [
      %AshPostgres.CustomCollation{
        name: "natural_sort",
        provider: :icu,
        locale: "en-u-kn-true",
        deterministic: true
      }
    ]
  end
end

I added documentation and testcases and type validation.

Please reach out for any questions.
I would be happy if this will be merged :)

Greetings Johannes

@zachdaniel

Copy link
Copy Markdown
Contributor

This is very cool, but I think there would be potentially significant issues if this only generates migrations. For example, we often essentially implicitly (for better or for worse) rely on the fact that like 99% of the time the default collation in postgres databases ensures that our expressions behave the same way both in Elixir and in Postgres. If we decide to do this, for example, we probably need to actually handle this when we build references for doing things like sorts, or we need to in some way prevent expressions from running in Elixir. It's a fairly complex topic that would involve a fair bit of Ash core spelunking.

Could you share a bit more about your use case and how you plan on using it?

@weljoda

weljoda commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for your feedback and I can clearly see the limitations.

Use case: I have documents and containers labeled something like: "Box 2", "Box 10", and I want ordering that respects the numbers. A column collation like en-u-kn-true gives me that for plain ORDER BY, offset, and keyset pagination out of the box.

On the core concern: I don't think this introduces new behaviour, for two reasons:

  • I can already create this collation by hand (migration or psql). The feature just makes it declarative and tracked in the snapshot, so a collated column is visible in the resource. This is how I can easily remember that I created a collation on a column, instead of forgetting about it and wondering about some strange behaviour later.
  • The DB-vs-Elixir ordering divergence already exists today for any non-C default collation — this makes it explicit and opt-in rather than accidental.

The documentation should clearly document the scope and limitations of this feature.

It would be a nice addition when ash can show a warning for the edge cases. But as I said, because the issue already exists, I don't think that it's totally necessary. Do you see it as a necessity?

How tricky would it be to implement this in the ash package? I'm thinking of something like:

Ash core:

# data_layer.ex
@callback unportable_sort_fields(Ash.Resource.t()) :: [atom]
# actions/sort.ex
if is_atom(field) and field in Ash.DataLayer.unportable_sort_fields(resource) do
 Logger.warning("Sorting #{inspect(resource)}.#{field} in memory; its database " <>
 "collation is not reproduced by Elixir term ordering, so results may differ " <>
 "from a database sort.")
end
# ash_postgres:
def unportable_sort_fields(resource) do
  Enum.map(AshPostgres.DataLayer.Info.collations(resource), & &1.attribute)
end

@frankdugan3 mentioned on Discord he'd thought about a feature like this too. Anything from your side worth folding in?

@frankdugan3

Copy link
Copy Markdown
Contributor

Waaay back 2022, I was using a collation to sort job numbers (naturalsort via icu). At the time, also adding distinct caused it to fail to actually use the sort. You can see the conversation here: https://discord.com/channels/711271361523351632/833816594417057833/971781243987456010

At the moment, I don't currently do any custom collations, so I'm not sure if this is still an issue or not. But it does point to Zach's concern that this could have side effects if not properly integrated into Ash's logic at large.

@frankdugan3

frankdugan3 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Also, your point about Postgres using the system C collation is entirely valid, I've run into Elixir and Postgres sorting differences before, can't remember the example unfortunately.

@zachdaniel

Copy link
Copy Markdown
Contributor

Hm...yeah I think there is a way to do it, not necessarily a small project though. What we can do is add some kind of logic to expression evaluation in Ash that checks to see if a field ref is "safe" to use in some way before actually evaluating it. Or perhaps what operators are "safe" to use with a given sub-expression. If it's not safe to evaluate in elixir then it would return :error which would force data layer evaluation.

But the best way to do that would likely be with a custom type instead of with a DSL, something like this:

attribute :foo, {AshPostgres.Type.CollatedString, collation: "collation"}

That custom type can use our operator overloads method to indicate the type of expressions that use it for example. Pretty complex.

@weljoda

weljoda commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Thanks a lot for taking the time to think this through and for the detailed feedback.

I can definitely see the concerns around consistency between Elixir and Postgres evaluation, and how this quickly becomes a deeper topic than just generating migrations. Even though I still feel that the declarative approach could provide some practical value on its own. But I also understand the hesitation in introducing something that might lead to subtle inconsistencies in behavior.

Given that, I’m totally fine with not moving this PR forward in its current form. I also really appreciate the emphasis on keeping the framework consistent and predictable.

Your suggestion around a custom type and tighter integration with expression evaluation is very interesting. I might take a look at that approach when I have some time, although I suspect it’s a bit beyond my current experience level.

Thanks again for your effort.

@zachdaniel zachdaniel closed this Jun 10, 2026
@zachdaniel

Copy link
Copy Markdown
Contributor

Thank you for the submission! Its something we definitely need to solve for long term!

@frankdugan3

frankdugan3 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@weljoda It may be wothwhile to distill the use case, discovered complications and proposed ideas in to a neat, self-contained issue for future efforts. That way we don't have to redo all the archaeological research later, and others that run into the lack of collation support can chime in, possibly creating a workable solution.

Edit: Issue #650 is focused on something similar to Zach's suggestion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants