This repository was archived by the owner on Mar 12, 2021. It is now read-only.

mapreduce (sum, prod, etc.) fail in some cases when given a dims argument. #583

@Sleort

Describe the bug
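mapreduce(f, op, A...; dims = dims) and the reductions built on it (sum(f, A; dims = dims), prod(f, A; dims = dims), ...) fail for many, but not all, functions f when dims is anything other than : (the default). For example, f = abs and f = abs2 work, while f = cos, sin or sqrt fail during kernel compilation with an LLVM "Cannot select" error.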

To Reproduce
The Minimal Working Example (MWE) for this bug:

julia> x = cu(rand(3,3))
3×3 CuArray{Float32,2,Nothing}:
 0.849469  0.625782  0.38785  
 0.877458  0.295448  0.0183218
 0.285424  0.496025  0.0742507

julia> sum(abs, x, dims=1) #Okay also when f = abs2
1×3 CuArray{Float32,2,Nothing}:
 2.01235  1.41725  0.480422

julia> sum(cos, x) #This is fine when dims = :
7.8284926f0

julia> sum(cos, x, dims = 1) #Fails for f ∈ (sin, sqrt, ...) as well...
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│    Stacktrace:
│     [1] cos at special/trig.jl:100
│     [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
┌ Warning: calls to Base intrinsics might be GPU incompatible
│   exception =
│    You called cos(x::T) where T<:Union{Float32, Float64} in Base.Math at special/trig.jl:100, maybe you intended to call cos(x::Float32) in CUDAnative at /home/troels/.julia/packages/CUDAnative/KWTMt/src/device/cuda/math.jl:6 instead?
│    Stacktrace:
│     [1] cos at special/trig.jl:100
│     [2] mapreducedim_kernel_parallel at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:20
└ @ CUDAnative ~/.julia/packages/CUDAnative/KWTMt/src/compiler/irgen.jl:111
ERROR: LLVM error: Cannot select: 0x690cc50: i64,glue = sube Constant:i64<0>, 0x690cb80, 0x690cbe8:1
  0x6909ec8: i64 = Constant<0>
  0x690cb80: i64 = add 0x6909d28, 0x690cb18
    0x6909d28: i64 = add 0x690a4e0, 0x690a208
      0x690a4e0: i64 = mul 0x6909d90, 0x690a0d0
        0x6909d90: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %13
          0x690bce0: i64 = Register %13
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
      0x690a208: i64 = mulhu 0x690be80, 0x690a0d0
        0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
          0x690a750: i64 = Register %14
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
    0x690cb18: i64 = select 0x690cab0, Constant:i64<1>, 0x690a7b8
      0x690cab0: i1 = setcc 0x690c978, 0x690c910, setult:ch
        0x690c978: i64 = add 0x690a5b0, 0x690c910
          0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
            0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
              0x690a750: i64 = Register %14
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
          0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
            0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
              0x690c088: i64 = Register %15
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
        0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
          0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
            0x690c088: i64 = Register %15
          0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
            0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
              0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                0x6909df8: i64 = Register %0
              0x690a680: i64 = Constant<4503599627370495>
            0x6909cc0: i64 = Constant<4503599627370496>
      0x690ab60: i64 = Constant<1>
      0x690a7b8: i64 = zero_extend 0x690c9e0
        0x690c9e0: i1 = setcc 0x690c978, 0x690a5b0, setult:ch
          0x690c978: i64 = add 0x690a5b0, 0x690c910
            0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
              0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
                0x690a750: i64 = Register %14
              0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
                0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                  0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0

                  0x690a680: i64 = Constant<4503599627370495>
                0x6909cc0: i64 = Constant<4503599627370496>
            0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
              0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
                0x690c088: i64 = Register %15
              0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
                0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                  0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0

                  0x690a680: i64 = Constant<4503599627370495>
                0x6909cc0: i64 = Constant<4503599627370496>
          0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
            0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
              0x690a750: i64 = Register %14
            0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
              0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
                0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
                  0x6909df8: i64 = Register %0
                0x690a680: i64 = Constant<4503599627370495>
              0x6909cc0: i64 = Constant<4503599627370496>
  0x690cbe8: i64,glue = subc Constant:i64<0>, 0x690c978
    0x6909ec8: i64 = Constant<0>
    0x690c978: i64 = add 0x690a5b0, 0x690c910
      0x690a5b0: i64 = mul 0x690be80, 0x690a0d0
        0x690be80: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %14
          0x690a750: i64 = Register %14
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
      0x690c910: i64 = mulhu 0x690a138, 0x690a0d0
        0x690a138: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %15
          0x690c088: i64 = Register %15
        0x690a0d0: i64 = or 0x690aa28, Constant:i64<4503599627370496>
          0x690aa28: i64 = and 0x690c020, Constant:i64<4503599627370495>
            0x690c020: i64,ch = CopyFromReg 0x2cda6e0, Register:i64 %0
              0x6909df8: i64 = Register %0
            0x690a680: i64 = Constant<4503599627370495>
          0x6909cc0: i64 = Constant<4503599627370496>
In function: julia_paynehanek_18536
Stacktrace:
 [1] handle_error(::Cstring) at /home/troels/.julia/packages/LLVM/DAnFH/src/core/context.jl:103
 [2] macro expansion at /home/troels/.julia/packages/LLVM/DAnFH/src/base.jl:18 [inlined]
 [3] LLVMTargetMachineEmitToMemoryBuffer at /home/troels/.julia/packages/LLVM/DAnFH/lib/6.0/libLLVM_h.jl:2726 [inlined]
 [4] emit(::LLVM.TargetMachine, ::LLVM.Module, ::LLVM.API.LLVMCodeGenFileType) at /home/troels/.julia/packages/LLVM/DAnFH/src/targetmachine.jl:42
 [5] mcgen(::CUDAnative.CompilerJob, ::LLVM.Module, ::LLVM.Function) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/mcgen.jl:87
 [6] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [7] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:209 [inlined]
 [8] macro expansion at /home/troels/.julia/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:228 [inlined]
 [9] #codegen#154(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:206
 [10] #codegen at ./none:0 [inlined]
 [11] #compile#153(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:52
 [12] #compile at ./none:0 [inlined]
 [13] #compile#152 at /home/troels/.julia/packages/CUDAnative/KWTMt/src/compiler/driver.jl:33 [inlined]
 [14] #compile at ./none:0 [inlined] (repeats 2 times)
 [15] macro expansion at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:393 [inlined]
 [16] #cufunction#198(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::typeof(CuArrays.mapreducedim_kernel_parallel), ::Type{Tuple{typeof(cos),typeof(Base.add_sum),CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Int64,Int64}}) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
 [17] cufunction(::Function, ::Type) at /home/troels/.julia/packages/CUDAnative/KWTMt/src/execution.jl:360
 [18] macro expansion at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:61 [inlined]
 [19] macro expansion at ./gcutils.jl:91 [inlined]
 [20] _mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at /home/troels/.julia/packages/CuArrays/OiLYC/src/mapreduce.jl:58
 [21] mapreducedim!(::Function, ::Function, ::CuArray{Float32,2,Nothing}, ::CuArray{Float32,2,Nothing}) at ./reducedim.jl:274
 [22] _mapreduce_dim(::Function, ::Function, ::NamedTuple{(),Tuple{}}, ::CuArray{Float32,2,Nothing}, ::Int64) at ./reducedim.jl:317
 [23] mapreduce_impl at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:78 [inlined]
 [24] #mapreduce#29 at /home/troels/.julia/packages/GPUArrays/dhirJ/src/host/mapreduce.jl:64 [inlined]
 [25] #mapreduce at ./none:0 [inlined]
 [26] _sum at ./reducedim.jl:679 [inlined]
 [27] #sum#588 at ./reducedim.jl:653 [inlined]
 [28] (::Base.var"#kw##sum")(::NamedTuple{(:dims,),Tuple{Int64}}, ::typeof(sum), ::Function, ::CuArray{Float32,2,Nothing}) at ./none:0
 [29] top-level scope at REPL[4]:1
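
For reference, here is a minimal CPU-only sketch of what the failing call is expected to return. It uses a plain Array filled with the values printed for x above, so it runs without a GPU. The names A, fused and unfused are illustrative only. The GPU workaround mentioned in the closing comment is an assumption and was not verified on the setup described in this report.

```julia
# CPU reference for the reduction in the report; no GPU required.
# A holds the values printed for x above (rounded as displayed).
A = Float32[0.849469 0.625782 0.38785;
            0.877458 0.295448 0.0183218;
            0.285424 0.496025 0.0742507]

# Fused form: f is applied inside the reduction.
# This is the call shape that fails when A is a CuArray.
fused = sum(cos, A, dims = 1)

# Unfused form: broadcast f first, then reduce the materialized array.
unfused = sum(cos.(A), dims = 1)

@assert size(fused) == (1, 3)
@assert fused ≈ unfused

# Assumption, not verified here: on a CuArray the unfused form
#     sum(cos.(x), dims = 1)
# may sidestep the error, at the cost of a temporary array, provided the
# broadcast compiles cos for the device.
```

Both forms should agree on the CPU, which gives a baseline to compare against once the GPU path compiles.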

Environment details
Details on Julia:

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Julia packages:

(v1.3) pkg> st CuArrays
    Status `~/.julia/environments/v1.3/Project.toml`
  [79e6a3ab] Adapt v1.0.0
  [fa961155] CEnum v0.2.0
  [3895d2a7] CUDAapi v2.1.0 #master (https://github.com/JuliaGPU/CUDAapi.jl.git)
  [c5f51814] CUDAdrv v5.0.1 #master (https://github.com/JuliaGPU/CUDAdrv.jl.git)
  [be33ccc6] CUDAnative v2.9.1 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)
  [3a865a2d] CuArrays v1.7.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)
  [864edb3b] DataStructures v0.17.9
  [0c68f7d7] GPUArrays v2.0.1 #master (https://github.com/JuliaGPU/GPUArrays.jl.git)
  [1914dd2f] MacroTools v0.5.3
  [872c559c] NNlib v0.6.4
  [189a3867] Reexport v0.2.0

CUDA: toolkit and driver version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
