Change type chars and document complex by seberg · Pull Request #357 · jax-ml/ml_dtypes

seberg · 2026-02-05T11:36:37Z

This may be more for discussion. If we want unique type characters, things get annoying. Not that type characters matter much, so long as we don't collide with NumPy's.

This is a suggestion to change E to mean complex32, following the NumPy pattern for this one, but that means changing bfloat16 to something else: right now r (and R for bcomplex32).
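
For readers unfamiliar with the pattern being referred to, here is a minimal sketch of how I read it: in NumPy's built-in type characters, the uppercase letter names the complex type whose components are the lowercase real type. It uses only standard NumPy calls; the helper name `check_pair` is made up for illustration. Since `e` is float16, `E` would be the natural slot for complex32 under this reading.

```python
import numpy as np

# Built-in real/complex type-character pairs in NumPy.
PAIRS = [("f", "F"), ("d", "D"), ("g", "G")]


def check_pair(real_char, complex_char):
    """Return True if complex_char is the complex counterpart of real_char."""
    real = np.dtype(real_char)
    cplx = np.dtype(complex_char)
    return (
        real.kind == "f"
        and cplx.kind == "c"
        and cplx.itemsize == 2 * real.itemsize
        and complex_char == real_char.upper()
    )


results = {pair: check_pair(*pair) for pair in PAIRS}

# "e" is NumPy's float16, so by the same pattern "E" would mean complex32.
half = np.dtype("e")
```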

(The truth is, there is no good choice, beyond just setting it to \0 at some point and maybe defaulting in NumPy to return something else then, or doing it here via the new API.)

ml_dtypes/_src/dtypes.cc

seberg · 2026-02-06T07:52:36Z

I stumbled on gh-216 where it was discussed that it may be good to do a major version for such a change. But also, if we do the E change, that may be one to include here (if we do that version bump).

So I guess we have two options: punt on it for now and use anything, or actually modify the type code.

Honestly, I suspect the only reason we need this at all is the hashing. I hope there isn't any other legacy cruft left in NumPy besides that.
In a sense, fortunately, hashing is actually a solved problem with new DTypes (since you can just implement your own hash function). But of course starting to use that is annoying.

We could also change the hash for user-dtypes to be computed differently (e.g. via id(descr) + byteorder + itemsize basically assuming that user-dtypes are never equivalent).
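
A rough sketch of what such an identity-based hash could look like, in pure Python. `FakeDescr` and `identity_hash` are hypothetical stand-ins, not NumPy or ml_dtypes API; the real change would live in NumPy's C code. The sketch assumes each user dtype has a single long-lived descriptor instance, since `id()` is only stable while the object is alive.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class FakeDescr:
    """Hypothetical stand-in for a user-dtype descriptor."""
    name: str
    byteorder: str
    itemsize: int


def identity_hash(descr):
    # Combine object identity with byteorder and itemsize, assuming
    # that two distinct user dtypes are never equivalent.
    return hash((id(descr), descr.byteorder, descr.itemsize))


bfloat16 = FakeDescr("bfloat16", "=", 2)
other16 = FakeDescr("other16", "=", 2)  # hypothetical second 2-byte type

key_a = (id(bfloat16), bfloat16.byteorder, bfloat16.itemsize)
key_b = (id(other16), other16.byteorder, other16.itemsize)
```

No type character is involved here, so same-sized user dtypes would no longer need distinct codes just to hash differently.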

(Since we can rely on there being a custom type(dtype(bfloat16)), etc., I am almost wondering whether we could monkey-patch the C type/class to fix the descriptor in the meantime, if it really turns out that hashing is the only reason we can't use .kind == f. That said, it likely isn't the only one, because e.g. CuPy may also have one or two code paths that assume (kind, itemsize) is sufficiently exact. I'll assume that only matters for the builtin kinds like f, though.)

leofang · 2026-02-09T21:54:53Z

> I stumbled on gh-216 where it was discussed that it may be good to do a major version for such a change. But also, if we do the E change, that may be one to include here (if we do that version bump).
>
> So I guess we have two options: punt on it for now and use anything, or actually modify the type code.

I would love to make E reserved for complex32 for consistency, but yeah I am not sure how this would impact existing Jax and other users... Also, I think ml_dtypes is still on 0.x (0.5.4 is the latest), and I am not sure if this would mean that we need to start shipping 1.x. If SemVer is used in this repo, then 0.x means we can still break (across minor releases)?

@jakevdp @hawkinsp thoughts?

jakevdp · 2026-02-09T22:12:15Z

Pinging @hawkinsp – we need to think about this one.

hawkinsp · 2026-02-09T22:23:54Z

Yes. I think this is a case where there are no good options because the idea of statically allocated type codes doesn't work that well.

I have no strong objection to changing it, but no reliable way to tell who it might break. One thing we can do is run our various test suites on that change and it might turn something up, but that's only weak evidence that it is safe.

Under NumPy 2's custom type API, do we still have to assign a type code? If not, then I wonder if it would be more conservative to:
a) freeze the current type codes where they are, and
b) add a variant of this module that uses NumPy 2's API for NumPy 2.4+ say.

i.e., don't break anyone existing, and do something that will fix the problem comprehensively.
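
To make option (b) concrete, here is a minimal sketch of version-gated selection between the two variants. The backend names are hypothetical, and the (2, 4) cutoff is only the "2.4+ say" figure floated above, not a decided threshold.

```python
import re

# "NumPy 2.4+ say" from the discussion; not a decided cutoff.
NEW_API_MIN = (2, 4)


def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def select_backend(numpy_version):
    """Pick which (hypothetical) implementation variant to load."""
    if parse_version(numpy_version) >= NEW_API_MIN:
        return "new_dtype_api"
    return "legacy_type_codes"
```

In practice the argument would be `numpy.__version__`, and the frozen type codes would stay untouched on the legacy path.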

seberg · 2026-02-10T09:50:39Z

OK, sounds like we lean towards just punting and not changing things for now. The E meaning is a bit annoying, but maybe not the end of the world. I.e. for non-builtin NumPy types it is better to not rely on type-codes anyway.

There are two reasons why type-codes currently kinda matter:

- I believe that if casts between same-sized types (I am looking at the float family) were implemented, current NumPy might consider them identical. For hashing purposes it already does, but because dt1 == dt2 fails, the hash collision doesn't really matter much.
- Users might use them and think comparing .char is as good as comparing dtypes. Those are also the users we would break by changing things, but OK.
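
Why a hash collision is harmless while equality still fails can be shown with a small pure-Python model. The `Descr` class below is a hypothetical stand-in with a deliberately coarse hash; it is not how NumPy computes dtype hashes.

```python
class Descr:
    """Hypothetical descriptor with a deliberately coarse hash."""

    def __init__(self, name, kind, itemsize):
        self.name = name
        self.kind = kind
        self.itemsize = itemsize

    def __hash__(self):
        # Coarse on purpose: same-sized types of one kind collide.
        return hash((self.kind, self.itemsize))

    def __eq__(self, other):
        # Equality still tells the two types apart.
        return isinstance(other, Descr) and self.name == other.name


float16 = Descr("float16", "f", 2)
bfloat16 = Descr("bfloat16", "f", 2)

# Both keys land in the same hash bucket, but the dict keeps them
# separate because the equality check fails.
table = {float16: "IEEE half", bfloat16: "brain float"}
```

The collision only costs an extra comparison on lookup. It would turn into a correctness problem only if equality also started returning True, which is the concern about implementing casts between same-sized types.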

Long term, I suspect we should at some point just break users. That is, for .kind we actually could just use f and c (i.e. sensible kinds), while for the character code I might just go to \0 (in the C-API), and either here or in NumPy we could map that to an error (or so) for dtype.char.

The problem with .kind is just that the old promotion/cast-safety code also relies on it. I forgot how much of a mess that was... But, that is all completely replaced with the new API.
(Of course users might still rely on kind+itemsize+byteorder or char being unique, but it seems unlikely that many do that and already use ml_dtypes, and it isn't even true right now: Most ml_dtypes use kind="V".)
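
As a small illustration that char alone is not a unique identifier even for builtin NumPy types, the sketch below compares two dtypes that differ only in byte order. It uses only standard NumPy attributes.

```python
import numpy as np

little = np.dtype("<i4")  # 4-byte integer, little-endian
big = np.dtype(">i4")     # 4-byte integer, big-endian

# The type character does not encode byte order, so it matches...
same_char = little.char == big.char
# ...while the dtypes themselves compare as different.
same_dtype = little == big
```

Comparing dtype objects directly (`dt1 == dt2`) avoids this kind of ambiguity.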

leofang · 2026-02-13T17:17:16Z

OK then @seberg let's revert the type code changes for now and merge the doc updates.

> One thing we can do is run our various test suites on that change and it might turn something up, but that's only weak evidence that it is safe.

I think this is still a good idea given that there is really no better alternative. @hawkinsp how would this work? If we draft a PR in this repo, would it be possible to trigger some form of automation and run it through the test suites?

In any case, sounds like regardless of using NumPy 2 dtype APIs or not this is always a breaking change. Can we discuss a rough plan (version number, timeline, possible mitigations, ...) for how this could be executed?

hawkinsp · 2026-02-13T17:28:43Z

> OK then @seberg let's revert the type code changes for now and merge the doc updates.
>
> > One thing we can do is run our various test suites on that change and it might turn something up, but that's only weak evidence that it is safe.
>
> I think this is still a good idea given that there is really no better alternative. @hawkinsp how would this work? If we draft a PR in this repo, would it be possible to trigger some form of automation and run it through the test suites?
>
> In any case, sounds like regardless of using NumPy 2 dtype APIs or not this is always a breaking change. Can we discuss a rough plan (version number, timeline, possible mitigations, ...) for how this could be executed?

I've tried porting this code to NumPy 2's APIs a couple of times, although I got stuck on NumPy bugs each time because the dtype APIs are relatively new. The time might be ripe for another attempt, particularly if someone is able to review the results. I can dredge up my last attempt.

seberg · 2026-02-17T17:23:13Z

Sorry, at some point I was staring at it and had no ideas for characters and gave up... In the end it doesn't matter it is not J and K, but happy to change.
I did leave it as 'c' for the complex32 kind since it is technically correct and 'f' is used also (even if there is an issue about possibly changing it).

> The time might be ripe for another attempt, particularly if someone is able to review the results. I can dredge up my last attempt.

If you can even just put up a silly branch, I can have a look. I think most holes were stuffed (because you can just set the original functions). If there is still something where that doesn't work, we can probably also monkey-patch things (i.e. to "backport" fixes we do in NumPy).

leofang · 2026-02-18T02:51:53Z

> In the end it doesn't matter it is not J and K, but happy to change.

This is hilarious lol

hawkinsp · 2026-02-24T08:05:11Z

Here, I AI-slopped my way to a numpy 2 version: #360

(Not for submission, but it may give us a path forward with some work.)

jakevdp reviewed Feb 6, 2026


jakevdp self-assigned this Feb 9, 2026

seberg force-pushed the complex-followup branch from 4cc63bf to d86817f Compare February 17, 2026 17:20

seberg force-pushed the complex-followup branch from d86817f to e89be3a Compare March 5, 2026 14:47

seberg wants to merge 1 commit into jax-ml:main from seberg:complex-followup
