M1/M2: Large matrix multiplications can contain NaNs #381

Open
chengchingwen opened this issue Jul 4, 2024 · 30 comments
Labels
upstream Out of our hands

Comments

@chengchingwen
Contributor

chengchingwen commented Jul 4, 2024

MWE:

julia> a = Metal.randn(10000, 10000);

julia> b = Metal.randn(10000, 10000);

julia> c = a * b';

julia> for i in 1:10
           C = Metal.zeros(Float32, size(a))
           mul!(C, a, b')
           @assert C ≈ c "$i"
       end
ERROR: AssertionError: 1
Stacktrace:
 [1] top-level scope
   @ ./REPL[58]:4

julia> for i in 1:10
           C = Metal.zeros(Float32, size(a))
           mul!(C, a, b')
           @assert C ≈ c "$i"
       end
ERROR: AssertionError: 8
Stacktrace:
 [1] top-level scope
   @ ./REPL[58]:4

julia> for i in 1:10
           @assert a * b' ≈ c "$i"
       end
ERROR: AssertionError: 3
Stacktrace:
 [1] top-level scope
   @ ./REPL[59]:2

julia> for i in 1:10
           @assert a * b' ≈ c "$i"
       end
ERROR: AssertionError: 8
Stacktrace:
 [1] top-level scope
   @ ./REPL[59]:2
@chengchingwen
Contributor Author

Adding wait_completed on matmul!'s command buffer does not help.

@christiangnrd
Contributor

christiangnrd commented Jul 4, 2024

Adding Metal.@sync to the mul! also does not help. However, I cannot reproduce when calling MPS.matmul! directly.

@maleadt
Member

maleadt commented Jul 5, 2024

I cannot reproduce at all on Metal.jl#master using an M3 Pro, but it does seem reproducible on an M1 Pro.

I wonder if this is a problem with mapreduce, since you're calling isapprox on GPU arrays. Can you test if calling @assert Array(C) ≈ Array(c) makes things pass? It does here, at least.

@tgymnich
Member

tgymnich commented Jul 5, 2024

I can reproduce the issue on M1 master. It also looks like all the tasks run on the same queue.

@chengchingwen
Contributor Author

The issue was found on an M2 Max. The MWE only fails if the arrays are large enough. It seems to be launching the subsequent kernel before the matmul has finished. Is it possible that mapreduce is not checking the availability of the input arrays?

p.s. I'm about to board the plane to JuliaCon so I won't be able to test it soon.

@maleadt
Member

maleadt commented Jul 5, 2024

> I wonder if this is a problem with mapreduce, since you're calling isapprox on GPU arrays. Can you test if calling @assert Array(C) ≈ Array(c) makes things pass? It does here, at least.

It also reproduces when comparing on the CPU, just much less likely, so this isn't a mapreduce issue.
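
A single NaN is enough to make an array-level approximate comparison fail, which is why the assertion trips regardless of where the comparison runs. The sketch below is a simplified model of a norm-based check, not the actual isapprox implementation; approx_equal is a hypothetical helper and rtol is an illustrative value rather than any library default.

```python
import math

def norm(xs):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))

def approx_equal(x, y, rtol=1e-3):
    """Norm-based approximate equality (hypothetical helper, illustrative rtol)."""
    diff = norm([a - b for a, b in zip(x, y)])
    # Any ordering comparison involving NaN is False, so one NaN fails the check.
    return diff <= rtol * max(norm(x), norm(y))

good = [1.0, 2.0, 3.0]
close = [1.0, 2.0, 3.0001]
bad = [1.0, float("nan"), 3.0]

print(approx_equal(good, close))  # True
print(approx_equal(good, bad))    # False
```

Under this model the comparison itself behaves correctly; the failure only reports that the result already contains a NaN.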

@maleadt
Member

maleadt commented Jul 5, 2024

Looks like a bunch of NaNs in the second matrix.

@christiangnrd
Contributor

christiangnrd commented Jul 5, 2024

My current MWE is:

using Metal, LinearAlgebra; begin
    n = 10000
    a = mtl(randn(Float32,n,n))
    b = mtl(randn(Float32,n,n))
    C = Metal.zeros(Float32, size(a))
    for i in 1:10
        C = Metal.zeros(Float32, size(a))
        mul!(C,a,b)
        @assert !any(isnan.(C)) "$i"
    end
end

I define C outside the loop so I can access it afterwards. When I had C .= ... in the loop instead of C = ..., it only ever happened at iteration 1. I suspect it has to do with the location of the array in memory.

@maleadt
Member

maleadt commented Jul 5, 2024

> I cannot reproduce when calling MPS.matmul! directly

I can:

using Metal, LinearAlgebra

function main(T=Float32, N=10000)
    a = Metal.rand(T, N, N)
    b = Metal.rand(T, N, N)
    c = a * b'
    synchronize()

    for i in 1:100
        println("Iteration $i")
        d = Metal.zeros(T, size(a))
        MPS.matmul!(d, a, b, #=alpha=#true, #=beta=#false,
                    #=transpose_a=#false, #=transpose_b=#true)
        @assert !any(isnan.(Array(d))) "NaN in iteration $i"

        # XXX: this redundant check is needed, or the failure never occurs
        @assert !any(isnan.(d))
    end
end

isinteractive() || main()

The need for a secondary kernel is very weird.

@maleadt maleadt changed the title from "matrix multiplication not always synchronized" to "M1/M1: Large matrix multiplications can contain NaNs" Jul 5, 2024
@tgymnich
Member

tgymnich commented Jul 5, 2024

It is not MPS related:

for i in 1:10
    C = Metal.zeros(Float32, size(a))
    GPUArrays.generic_matmatmul!(C, a, b, MulAddMul())
    @assert C ≈ c "$i"
end

@maleadt
Member

maleadt commented Jul 5, 2024

> GPUArrays.generic_matmatmul!(C, a, b, MulAddMul())

I don't see how that's related; it's an entirely different kernel. Does it contain NaNs in similar places?
The generic matmatmul kernel, while being extraordinarily slow, doesn't introduce NaNs here.
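
To answer whether NaNs show up in similar places across kernels or runs, the result can be copied to the CPU and its NaN indices mapped back to (row, column) pairs. This is a minimal sketch that assumes a flat row-major buffer; nan_positions and nan_rows are hypothetical helpers, and for a column-major array the roles of the two indices would swap.

```python
import math

def nan_positions(buf, n_cols):
    """Return (row, col) for every NaN in a flat row-major buffer."""
    return [divmod(i, n_cols) for i, v in enumerate(buf) if math.isnan(v)]

def nan_rows(buf, n_cols):
    """Count NaNs per row; clustered hits hint at a tile or block pattern."""
    counts = {}
    for row, _ in nan_positions(buf, n_cols):
        counts[row] = counts.get(row, 0) + 1
    return counts

nan = float("nan")
# Toy 3x4 result with NaNs in the last two columns of row 1.
result = [1.0, 1.0, 1.0, 1.0,
          1.0, 1.0, nan, nan,
          1.0, 1.0, 1.0, 1.0]

print(nan_positions(result, 4))  # [(1, 2), (1, 3)]
print(nan_rows(result, 4))       # {1: 2}
```

Comparing these position lists between a failing MPS run and a generic-kernel run would show whether the bad entries are clustered or scattered.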

@tgymnich
Member

tgymnich commented Jul 5, 2024

Just wanted to confirm that it's MPS rather than the synchronisation between kernel launches.

@tgymnich
Member

tgymnich commented Jul 5, 2024

I've been seeing the NaN issues with large arrays for a long time in #145

MLX seems fine:

import mlx.core as mx

a = mx.random.normal((10000, 10000))
b = mx.random.normal((10000, 10000))
c = a @ b.T


for i in range(0,10):
    C = a @ b.T
    assert(mx.allclose(C,c))

@christiangnrd christiangnrd changed the title from "M1/M1: Large matrix multiplications can contain NaNs" to "M1/M2: Large matrix multiplications can contain NaNs" Jul 5, 2024
@christiangnrd
Contributor

christiangnrd commented Jul 12, 2024

I would love for someone to review my code because I'm not a Swift expert by any means, but I was able to reproduce this in the Swift REPL.

Swift MWE

import Metal 
import MetalPerformanceShaders
 
func main(T: Float.Type = Float32.self, N: Int = 10000) { 
    guard let device = MTLCreateSystemDefaultDevice(), 
          let commandQueue = device.makeCommandQueue() else { 
        fatalError("Metal device or command queue could not be created") 
          } 
     
    print("Initializing a & b") 
    // Fill NxN matrices with ones (constant input, not random)
    var a = [Float](repeating: 1, count: N * N) 
    var b = [Float](repeating: 1, count: N * N) 
 
    print("a and b created\n") 
    // Metal buffers for matrices 
    let aBuffer = device.makeBuffer(bytes: &a, length: MemoryLayout<Float>.size * N * N, options: []) 
    let bBuffer = device.makeBuffer(bytes: &b, length: MemoryLayout<Float>.size * N * N, options: []) 
 
    print("Starting matmul\n") 
    for i in 1...10 { 
        print(i) 
        print("\n") 
        // Create MPSMatrices 
        let aMatrixDescriptor = MPSMatrixDescriptor(rows: N, columns: N, rowBytes: MemoryLayout<Float>.size * N, dataType: .float32) 
        let bMatrixDescriptor = MPSMatrixDescriptor(rows: N, columns: N, rowBytes: MemoryLayout<Float>.size * N, dataType: .float32) 
 
 
        let aMatrix = MPSMatrix(buffer: aBuffer!, descriptor: aMatrixDescriptor) 
        let bMatrix = MPSMatrix(buffer: bBuffer!, descriptor: bMatrixDescriptor) 
 
        // Matrix multiplication using MPSMatrixMultiplication 
        let matrixMultiplication = MPSMatrixMultiplication(device: device, 
        transposeLeft: false, 
        transposeRight: false, 
        resultRows: N, 
        resultColumns: N, 
        interiorColumns: N, 
        alpha: 1.0, 
        beta: 0.0) 
        let cBuffer = device.makeBuffer(length: MemoryLayout<Float>.size * N * N, options: []) 
        let cMatrixDescriptor = MPSMatrixDescriptor(rows: N, columns: N, rowBytes: MemoryLayout<Float>.size * N, dataType: .float32) 
        let cMatrix = MPSMatrix(buffer: cBuffer!, descriptor: cMatrixDescriptor) 
         
        let commandBuffer = commandQueue.makeCommandBuffer()! 
        matrixMultiplication.encode(commandBuffer: commandBuffer, 
        leftMatrix: aMatrix, 
        rightMatrix: bMatrix, 
        resultMatrix: cMatrix) 
        commandBuffer.commit() 
        commandBuffer.waitUntilCompleted() 

        // Check for NaNs in the result matrix 
        let cPointer = cBuffer!.contents().bindMemory(to: Float.self, capacity: N * N) 
        var j = 0
        while j < N*N {
            if cPointer[j].isNaN {
                fatalError("NaN in iteration \(i)")
            }
            j += 1
        }
    } 
}
 
Output:
Initializing a & b
a and b created

Starting matmul

1


2


3


4


__lldb_expr_3/repl.swift:56: Fatal error: NaN in iteration 4
2024-07-12 17:58:38.583349-0300 repl_swift[1500:21665] __lldb_expr_3/repl.swift:56: Fatal error: NaN in iteration 4
Execution interrupted. Enter code to recover and continue.
Enter LLDB commands to investigate (type :help for assistance.)

@tgymnich
Member

@christiangnrd Your Swift code looks good to me. It turns out MLX doesn’t even use MPS.

@christiangnrd christiangnrd added the upstream Out of our hands label Jul 12, 2024
@maleadt
Member

maleadt commented Jul 13, 2024

Haven't been able to look into this, but here's the ObjC version:

#import <Foundation/Foundation.h>
#import <Metal/Metal.h>
#import <MetalPerformanceShaders/MetalPerformanceShaders.h>

void performMatrixMultiplication(NSInteger N) {
    if (N == 0) {
        N = 10000;
    }
    
    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    id<MTLCommandQueue> commandQueue = [device newCommandQueue];
    
    if (!device || !commandQueue) {
        NSLog(@"Metal device or command queue could not be created");
        return;
    }
    
    NSLog(@"Initializing a & b");
    // Fill NxN matrices with ones (constant input, not random)
    float *a = calloc(N * N, sizeof(float));
    float *b = calloc(N * N, sizeof(float));
    
    for (NSInteger i = 0; i < N * N; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
    }
    
    NSLog(@"a and b created\n");
    // Metal buffers for matrices
    id<MTLBuffer> aBuffer = [device newBufferWithBytes:a length:sizeof(float) * N * N options:MTLResourceStorageModeShared];
    id<MTLBuffer> bBuffer = [device newBufferWithBytes:b length:sizeof(float) * N * N options:MTLResourceStorageModeShared];
    
    NSLog(@"Starting matmul\n");
    for (NSInteger i = 1; i <= 10; i++) {
        NSLog(@"%ld\n", (long)i);
        
        // Create MPSMatrices
        MPSMatrixDescriptor *aMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                       columns:N
                                                                                      rowBytes:sizeof(float) * N
                                                                                      dataType:MPSDataTypeFloat32];
        MPSMatrixDescriptor *bMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                       columns:N
                                                                                      rowBytes:sizeof(float) * N
                                                                                      dataType:MPSDataTypeFloat32];
        
        MPSMatrix *aMatrix = [[MPSMatrix alloc] initWithBuffer:aBuffer descriptor:aMatrixDescriptor];
        MPSMatrix *bMatrix = [[MPSMatrix alloc] initWithBuffer:bBuffer descriptor:bMatrixDescriptor];
        
        // Matrix multiplication using MPSMatrixMultiplication
        MPSMatrixMultiplication *matrixMultiplication = [[MPSMatrixMultiplication alloc] initWithDevice:device
                                                                                          transposeLeft:NO
                                                                                         transposeRight:NO
                                                                                             resultRows:N
                                                                                          resultColumns:N
                                                                                       interiorColumns:N
                                                                                                 alpha:1.0
                                                                                                  beta:0.0];
        
        id<MTLBuffer> cBuffer = [device newBufferWithLength:sizeof(float) * N * N options:MTLResourceStorageModeShared];
        MPSMatrixDescriptor *cMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                       columns:N
                                                                                      rowBytes:sizeof(float) * N
                                                                                      dataType:MPSDataTypeFloat32];
        MPSMatrix *cMatrix = [[MPSMatrix alloc] initWithBuffer:cBuffer descriptor:cMatrixDescriptor];
        
        id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
        [matrixMultiplication encodeToCommandBuffer:commandBuffer
                                         leftMatrix:aMatrix
                                        rightMatrix:bMatrix
                                       resultMatrix:cMatrix];
        [commandBuffer commit];
        [commandBuffer waitUntilCompleted];
        
        // Check for NaNs in the result matrix
        float *cPointer = cBuffer.contents;
        for (NSInteger j = 0; j < N * N; j++) {
            if (isnan(cPointer[j])) {
                NSLog(@"NaN in iteration %ld", (long)i);
                free(a);
                free(b);
                return;
            }
        }
    }
    
    free(a);
    free(b);
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSInteger N = 10000;
        if (argc > 1) {
            N = atoi(argv[1]);
        }
        performMatrixMultiplication(N);
    }
    return 0;
}
❯ clang mps.m -o mps -framework Foundation -framework Metal -framework MetalPerformanceShaders -fobjc-arc -mmacosx-version-min=10.13

❯ ./mps
2024-07-13 12:23:11.771 mps[54256:2493528] Initializing a & b
2024-07-13 12:23:11.931 mps[54256:2493528] a and b created
2024-07-13 12:23:12.001 mps[54256:2493528] Starting matmul
2024-07-13 12:23:12.001 mps[54256:2493528] 1
2024-07-13 12:23:13.933 mps[54256:2493528] 2
2024-07-13 12:23:15.477 mps[54256:2493528] 3
2024-07-13 12:23:16.997 mps[54256:2493528] 4
2024-07-13 12:23:18.440 mps[54256:2493528] NaN in iteration 4
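
Because both inputs in the Swift and ObjC reproducers are filled with ones, every entry of the product should equal N exactly, so the check could also catch wrong finite values and not only NaNs. The following is a small CPU-only sketch of that stricter check; matmul and first_bad_entry are hypothetical helpers, and a tiny N stands in for 10000.

```python
import math

def matmul(a, b, n):
    """Naive n x n product of flat row-major lists."""
    return [sum(a[i * n + k] * b[k * n + j] for k in range(n))
            for i in range(n) for j in range(n)]

def first_bad_entry(c, n):
    """Index of the first entry that is NaN or differs from n, else None."""
    for idx, v in enumerate(c):
        if math.isnan(v) or v != float(n):
            return idx
    return None

n = 8
ones = [1.0] * (n * n)
c = matmul(ones, ones, n)
print(first_bad_entry(c, n))  # None

c[5] = float("nan")  # simulate a corrupted entry
print(first_bad_entry(c, n))  # 5
```

The exact comparison is safe here because N = 10000 and all partial sums are small integers that Float32 represents exactly.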

@tgymnich
Member

Should we just file a radar / feedback?

@maleadt
Member

maleadt commented Jul 13, 2024

I'll have a better look first and forward it to our Apple contact.

@maleadt
Member

maleadt commented Aug 28, 2024

Apparently this looks like an ARC bug. Curiously, the ObjC reproducer is "fixed" by adding an @autoreleasepool around the for loop body, but the same doesn't hold in Julia (in fact, the original issue was calling into mul! which is already marked @autoreleasepool).

Of course, the Julia MWE is more complex, as the @assert !any(isnan.(d)) involves two additional kernels...

Still broken Julia MWE
using Metal, LinearAlgebra
using ObjectiveC, .Foundation

function main(T=Float32, N=10000)
    a = Metal.rand(T, N, N)
    b = Metal.rand(T, N, N)
    synchronize()

    for i in 1:100
        @autoreleasepool begin
            println("Iteration $i")
            d = Metal.zeros(T, size(a))
            MPS.matmul!(d, a, b, #=alpha=#true, #=beta=#false,
                        #=transpose_a=#false, #=transpose_b=#false)
            @assert !any(isnan.(Array(d))) "NaN in iteration $i"

            # XXX: this redundant check is needed, or the failure never occurs
            @assert !any(isnan.(d))
        end
    end
end

isinteractive() || main()
"Fixed" ObjC MWE
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>
#import <MetalPerformanceShaders/MetalPerformanceShaders.h>

void performMatrixMultiplication(NSInteger N) {
    if (N == 0) {
        N = 10000;
    }

    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    id<MTLCommandQueue> commandQueue = [device newCommandQueue];

    if (!device || !commandQueue) {
        NSLog(@"Metal device or command queue could not be created");
        return;
    }

    NSLog(@"Initializing a & b");
    // Fill NxN matrices with ones (constant input, not random)
    float *a = calloc(N * N, sizeof(float));
    float *b = calloc(N * N, sizeof(float));

    for (NSInteger i = 0; i < N * N; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
    }

    NSLog(@"a and b created\n");
    // Metal buffers for matrices
    id<MTLBuffer> aBuffer = [device newBufferWithBytes:a length:sizeof(float) * N * N options:MTLResourceStorageModeShared];
    id<MTLBuffer> bBuffer = [device newBufferWithBytes:b length:sizeof(float) * N * N options:MTLResourceStorageModeShared];

    NSLog(@"Starting matmul\n");
    for (NSInteger i = 1; i <= 100; i++) {
        @autoreleasepool {
            NSLog(@"Iteration %ld\n", (long)i);

            // Create MPSMatrices
            MPSMatrixDescriptor *aMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                           columns:N
                                                                                          rowBytes:sizeof(float) * N
                                                                                          dataType:MPSDataTypeFloat32];
            MPSMatrixDescriptor *bMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                           columns:N
                                                                                          rowBytes:sizeof(float) * N
                                                                                          dataType:MPSDataTypeFloat32];

            MPSMatrix *aMatrix = [[MPSMatrix alloc] initWithBuffer:aBuffer descriptor:aMatrixDescriptor];
            MPSMatrix *bMatrix = [[MPSMatrix alloc] initWithBuffer:bBuffer descriptor:bMatrixDescriptor];

            // Matrix multiplication using MPSMatrixMultiplication
            MPSMatrixMultiplication *matrixMultiplication = [[MPSMatrixMultiplication alloc] initWithDevice:device
                                                                                              transposeLeft:NO
                                                                                             transposeRight:NO
                                                                                                 resultRows:N
                                                                                              resultColumns:N
                                                                                           interiorColumns:N
                                                                                                     alpha:1.0
                                                                                                      beta:0.0];

            id<MTLBuffer> cBuffer = [device newBufferWithLength:sizeof(float) * N * N options:MTLResourceStorageModeShared];
            MPSMatrixDescriptor *cMatrixDescriptor = [MPSMatrixDescriptor matrixDescriptorWithRows:N
                                                                                           columns:N
                                                                                          rowBytes:sizeof(float) * N
                                                                                          dataType:MPSDataTypeFloat32];
            MPSMatrix *cMatrix = [[MPSMatrix alloc] initWithBuffer:cBuffer descriptor:cMatrixDescriptor];

            id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
            [matrixMultiplication encodeToCommandBuffer:commandBuffer
                                             leftMatrix:aMatrix
                                            rightMatrix:bMatrix
                                           resultMatrix:cMatrix];
            [commandBuffer commit];
            [commandBuffer waitUntilCompleted];

            // Check for NaNs in the result matrix
            float *cPointer = cBuffer.contents;
            for (NSInteger j = 0; j < N * N; j++) {
                if (isnan(cPointer[j])) {
                    NSLog(@"NaN in iteration %ld", (long)i);
                    free(a);
                    free(b);
                    return;
                }
            }
        }
    }

    free(a);
    free(b);
}

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSInteger N = 10000;
        if (argc > 1) {
            N = atoi(argv[1]);
        }
        performMatrixMultiplication(N);
    }
    return 0;
}
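
What the per-iteration @autoreleasepool changes is when autoreleased objects are actually released: at the end of each loop body instead of when the outer pool in main drains. The toy model below only illustrates that timing difference; it is not Apple's runtime, the class and function names are hypothetical, and it does not explain why the NaNs appear.

```python
from contextlib import nullcontext

class AutoreleasePool:
    """Toy autorelease pool: releases are deferred until the pool exits."""
    stack = []  # innermost active pool is last

    def __init__(self):
        self.pending = []

    def __enter__(self):
        AutoreleasePool.stack.append(self)
        return self

    def __exit__(self, *exc):
        AutoreleasePool.stack.pop()
        self.pending.clear()  # "drain": release everything deferred here
        return False

def autorelease(obj):
    """Defer the release of obj to the innermost active pool."""
    AutoreleasePool.stack[-1].pending.append(obj)
    return obj

def pending_total():
    """Number of deferred releases across all active pools."""
    return sum(len(p.pending) for p in AutoreleasePool.stack)

def run(iterations, pool_per_iteration):
    """Return the peak number of pending releases during the loop."""
    peak = 0
    with AutoreleasePool():  # outer pool, like the one in main()
        for i in range(iterations):
            inner = AutoreleasePool() if pool_per_iteration else nullcontext()
            with inner:
                autorelease(f"descriptor {i}")
                autorelease(f"command buffer {i}")
                peak = max(peak, pending_total())
    return peak

print(run(10, pool_per_iteration=False))  # 20
print(run(10, pool_per_iteration=True))   # 2
```

With only the outer pool, deferred releases pile up across iterations; with a pool per iteration, they are bounded by what one loop body creates.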

@tgymnich
Member

Couldn't reproduce the Objective-C case today, with or without autoreleasepool.
Swift and Julia were still reproducible.

@christiangnrd
Contributor

christiangnrd commented Aug 28, 2024

I can reproduce the error in both Swift and Objective-C, and it goes away when surrounded by an autoreleasepool block in both languages.

@tgymnich
Member

Oops. I just overlooked the second autoreleasepool. The first one is actually not necessary (at least to hide our bug.)

@christiangnrd
Contributor

christiangnrd commented Aug 28, 2024

By "the first one" do you mean the autoreleasepool in main?

@christiangnrd
Contributor

I'm able to reproduce this without the second redundant check.

Still broken simpler Julia MWE
using Metal, LinearAlgebra
using ObjectiveC, .Foundation

function main(T=Float32, N=10000)
    a = Metal.rand(T, N, N)
    b = Metal.rand(T, N, N)
    synchronize()

    for i in 1:100
        # @autoreleasepool begin
        begin
            println("Iteration $i")
            d = Metal.zeros(T, size(a))
            MPS.matmul!(d, a, b, #=alpha=#true, #=beta=#false,
                        #=transpose_a=#false, #=transpose_b=#false)
            @assert !any(isnan.(Array(d))) "NaN in iteration $i"
        end
    end
end

isinteractive() || main()

@tgymnich
Member

tgymnich commented Sep 25, 2024

Our NSAutoreleasePool seems to contain roughly the same objects before the NaN check as the objc version above. The most obvious difference is that the correct objc version has a CaptureMTLDevice and an AGXG13XFamilyComputeContext, whereas we have an AGXG13XFamilyCommandBuffer (could be debug / Xcode related).

iteration 1
objc[6905]: ##############
objc[6905]: AUTORELEASE POOLS for thread 0x203b9b240
objc[6905]: 77 releases pending.
objc[6905]: [0x14300d000]  ................  PAGE  (hot) (cold)
objc[6905]: [0x14300d038]  ################  POOL 0x14300d038
objc[6905]: [0x14300d040]    0x6000004c4860  __NSSingleEntryDictionaryI
objc[6905]: [0x14300d048]    0x6000027ccd20  NSBundle  autorelease count 2
objc[6905]: [0x14300d050]    0x6000004cfd40  __NSDictionaryM  autorelease count 2
objc[6905]: [0x14300d058]    0x600002fc8690  MTLCommandQueueDescriptorInternal
objc[6905]: [0x14300d060]    0x600000ac0090  NSUserDefaults  autorelease count 4
objc[6905]: [0x14300d068]    0x6000004c4b20  __NSSingleEntryDictionaryI
objc[6905]: [0x14300d070]    0x6000004d4660  __NSSingleEntryDictionaryI
objc[6905]: [0x14300d078]    0x6000004d4220  __NSSingleEntryDictionaryI
objc[6905]: [0x14300d080]    0x6000004d46a0  __NSSingleEntryDictionaryI
objc[6905]: [0x14300d088]  ################  POOL 0x14300d088
objc[6905]: [0x14300d090]    0x6000011ce180  MPSMatrixDescriptor
objc[6905]: [0x14300d098]    0x6000011cde00  MPSMatrixDescriptor
objc[6905]: [0x14300d0a0]       0x145809000  AGXG13XDevice  autorelease count 15
objc[6905]: [0x14300d0a8]       0x144105550  CaptureMTLDevice  autorelease count 4
objc[6905]: [0x14300d0b0]    0x6000011cc540  __NSCFString
objc[6905]: [0x14300d0b8]    0x600002ac8d80  __NSCFString
objc[6905]: [0x14300d0c0]    0x600003dcc540  NSPathStore2
objc[6905]: [0x14300d0c8]    0x6000011cc600  __NSBundleTables  autorelease count 3
objc[6905]: [0x14300d0d0]    0x6000027dc140  NSBundle  autorelease count 2
objc[6905]: [0x14300d0d8]    0x6000027cd0e0  NSBundle
objc[6905]: [0x14300d0e0]    0x6000020cc480  NSURL
objc[6905]: [0x14300d0e8]    0x6000035cc500  __NSCFString
objc[6905]: [0x14300d0f0]    0x6000004dd4e0  NSFileManager
objc[6905]: [0x14300d0f8]    0x6000020cc5a0  NSURL
objc[6905]: [0x14300d100]    0x6000035cc280  __NSCFString
objc[6905]: [0x14300d108]    0x6000035cc6e0  __NSCFString  autorelease count 2
objc[6905]: [0x14300d110]    0x6000004df3e0  NSConcreteData
objc[6905]: [0x14300d118]    0x6000027cd810  Swift.__StringStorage
objc[6905]: [0x14300d120]    0x6000027cd860  Swift.__StringStorage
objc[6905]: [0x14300d128]    0x6000027cd8b0  Swift.__StringStorage
objc[6905]: [0x14300d130]    0x6000027cd900  Swift.__StringStorage
objc[6905]: [0x14300d138]    0x6000027cd950  Swift.__StringStorage
objc[6905]: [0x14300d140]    0x6000027cd9a0  Swift.__StringStorage
objc[6905]: [0x14300d148]    0x6000035cc6e0  __NSCFString  autorelease count 6
objc[6905]: [0x14300d150]    0x6000011d4980  MPSMatrixDescriptor
objc[6905]: [0x14300d158]       0x144105550  CaptureMTLDevice  autorelease count 2
objc[6905]: [0x14300d160]    0x6000036ceeb0  AGXG13XFamilyComputeContext
objc[6905]: [0x14300d168]    0x6000011d4b80  __NSCFString
objc[6905]: [0x14300d170]    0x600002acaf80  __NSCFString
objc[6905]: [0x14300d178]    0x600003dcc2a0  NSPathStore2
objc[6905]: [0x14300d180]    0x6000011cc600  __NSBundleTables  autorelease count 3
objc[6905]: [0x14300d188]    0x6000027cd0e0  NSBundle
objc[6905]: [0x14300d190]    0x6000027dc140  NSBundle
objc[6905]: [0x14300d198]    0x6000027cda90  NSBundle  autorelease count 2
objc[6905]: [0x14300d1a0]    0x6000020cc600  NSURL
objc[6905]: [0x14300d1a8]    0x6000035cc8c0  __NSCFString
objc[6905]: [0x14300d1b0]    0x6000004df980  NSFileManager
objc[6905]: [0x14300d1b8]    0x6000020cc6c0  NSURL
objc[6905]: [0x14300d1c0]    0x6000035ccb40  __NSCFString
objc[6905]: [0x14300d1c8]    0x6000035cc960  __NSCFString  autorelease count 2
objc[6905]: [0x14300d1d0]    0x6000004d1e40  NSConcreteData
objc[6905]: [0x14300d1d8]    0x6000027cdbd0  Swift.__StringStorage
objc[6905]: [0x14300d1e0]    0x6000027cdc20  Swift.__StringStorage
objc[6905]: [0x14300d1e8]    0x6000027cdc70  Swift.__StringStorage
objc[6905]: [0x14300d1f0]    0x6000027cdcc0  Swift.__StringStorage
objc[6905]: [0x14300d1f8]    0x6000027cdd10  Swift.__StringStorage
objc[6905]: [0x14300d200]    0x6000027cdd60  Swift.__StringStorage
objc[6905]: [0x14300d208]    0x6000035cc960  __NSCFString  autorelease count 6
objc[6905]: [0x14300d210]       0x144105550  CaptureMTLDevice  autorelease count 2
objc[6905]: [0x14300d218]    0x600000a80330  __NSArrayM
objc[6905]: [0x14300d220]    0x600000a80360  __NSArrayM
objc[6905]: [0x14300d228]    0x6000004d2f40  __NSCFString
objc[6905]: [0x14300d230]    0x6000004d2ec0  __NSCFString
objc[6905]: [0x14300d238]    0x6000004d2ee0  __NSCFString
objc[6905]: [0x14300d240]    0x6000004d2f00  __NSCFString
objc[6905]: [0x14300d248]    0x6000008e7240  __NSCFString
objc[6905]: [0x14300d250]    0x6000008e7090  __NSCFString
objc[6905]: [0x14300d258]    0x6000008e70c0  __NSCFString
objc[6905]: [0x14300d260]    0x6000008e70f0  __NSCFString
objc[6905]: [0x14300d268]    0x6000004d2f20  __NSCFString
objc[6905]: [0x14300d270]    0x6000004d2ea0  __NSCFString
objc[6905]: [0x14300d278]    0x6000004d2fc0  __NSCFString
objc[6905]: [0x14300d280]    0x6000008e6e20  __NSArrayM
objc[6905]: [0x14300d288]    0x6000004d3140  __NSCFNumber
objc[6905]: [0x14300d290]       0x14304b800  __NSCFString
objc[6905]: [0x14300d298]    0x6000020cc780  MTLComputePipelineReflectionInternal
objc[6905]: ##############
iteration 2
objc[36563]: ##############
objc[36563]: AUTORELEASE POOLS for thread 0x203b9b240
objc[36563]: 16 releases pending.
objc[36563]: [0x14080a000]  ................  PAGE  (hot) (cold)
objc[36563]: [0x14080a038]  ################  POOL 0x14080a038
objc[36563]: [0x14080a040]    0x600001f3c5a0  __NSSingleEntryDictionaryI
objc[36563]: [0x14080a048]    0x600003c202d0  NSBundle  autorelease count 2
objc[36563]: [0x14080a050]    0x600001f2e7a0  __NSDictionaryM  autorelease count 2
objc[36563]: [0x14080a058]    0x60000342c0e0  MTLCommandQueueDescriptorInternal
objc[36563]: [0x14080a060]    0x60000112c2a0  NSUserDefaults  autorelease count 4
objc[36563]: [0x14080a068]    0x600001f3cb00  __NSSingleEntryDictionaryI
objc[36563]: [0x14080a070]    0x600001f3c4e0  __NSSingleEntryDictionaryI
objc[36563]: [0x14080a078]    0x600001f3cac0  __NSSingleEntryDictionaryI
objc[36563]: [0x14080a080]    0x600001f3cae0  __NSSingleEntryDictionaryI
objc[36563]: [0x14080a088]  ################  POOL 0x14080a088
objc[36563]: [0x14080a090]    0x600000aa2740  MPSMatrixDescriptor
objc[36563]: [0x14080a098]    0x600000aa2780  MPSMatrixDescriptor
objc[36563]: [0x14080a0a0]    0x600000a21040  MPSMatrixDescriptor
objc[36563]: [0x14080a0a8]       0x141005410  CaptureMTLDevice  autorelease count 6
objc[36563]: [0x14080a0b0]    0x600002d24510  AGXG13XFamilyComputeContext
objc[36563]: ##############
Iteration 1
objc[6186]: ##############
objc[6186]: AUTORELEASE POOLS for thread 0x203b9b240
objc[6186]: 20 releases pending.
objc[6186]: [0x12e00b000]  ................  PAGE  (hot) (cold)
objc[6186]: [0x12e00b038]       0x12d20c0f0  _NSSwiftProcessInfo
objc[6186]: [0x12e00b040]       0x12d304d20  Swift.__SwiftDeferredNSArray
objc[6186]: [0x12e00b048]       0x12d304f30  __NSCFCharacterSet
objc[6186]: [0x12e00b050]       0x12d3061e0  __NSCFString
objc[6186]: [0x12e00b058]       0x12c64cdf0  __NSCFString
objc[6186]: [0x12e00b060]       0x12c79d6b0  __NSCFString
objc[6186]: [0x12e00b068]  ################  POOL 0x12e00b068
objc[6186]: [0x12e00b070]       0x11c635370  __NSCFString
objc[6186]: [0x12e00b078]       0x141619730  MPSMatrixDescriptor
objc[6186]: [0x12e00b080]       0x1491eabe0  MPSMatrixDescriptor
objc[6186]: [0x12e00b088]       0x14911b550  MPSMatrixDescriptor
objc[6186]: [0x12e00b090]       0x1496f2b40  __NSCFString
objc[6186]: [0x12e00b098]       0x1491a0d80  __NSCFString
objc[6186]: [0x12e00b0a0]       0x13b718b50  __NSBundleTables
objc[6186]: [0x12e00b0a8]       0x12d33d8e0  NSBundle  autorelease count 3
objc[6186]: [0x12e00b0b0]       0x149152250  NSURL
objc[6186]: [0x12e00b0b8]       0x149111be0  __NSCFString
objc[6186]: [0x12e00b0c0]       0x14913f620  AGXG13XFamilyCommandBuffer
objc[6186]: [0x12e00b0c8]       0x14977a970  __NSArrayM
objc[6186]: [0x12e00b0d0]       0x14978b090  __NSArrayM
objc[6186]: ##############
Iteration 2
objc[6186]: ##############
objc[6186]: AUTORELEASE POOLS for thread 0x203b9b240
objc[6186]: 12 releases pending.
objc[6186]: [0x12e00b000]  ................  PAGE  (hot) (cold)
objc[6186]: [0x12e00b038]       0x12d20c0f0  _NSSwiftProcessInfo
objc[6186]: [0x12e00b040]       0x12d304d20  Swift.__SwiftDeferredNSArray
objc[6186]: [0x12e00b048]       0x12d304f30  __NSCFCharacterSet
objc[6186]: [0x12e00b050]       0x12d3061e0  __NSCFString
objc[6186]: [0x12e00b058]       0x12c64cdf0  __NSCFString
objc[6186]: [0x12e00b060]       0x12c79d6b0  __NSCFString
objc[6186]: [0x12e00b068]  ################  POOL 0x12e00b068
objc[6186]: [0x12e00b070]       0x12c7ff7d0  __NSCFString
objc[6186]: [0x12e00b078]       0x13b7c3da0  MPSMatrixDescriptor
objc[6186]: [0x12e00b080]       0x13b7fc3b0  MPSMatrixDescriptor
objc[6186]: [0x12e00b088]       0x13b714930  MPSMatrixDescriptor
objc[6186]: [0x12e00b090]       0x148cb99a0  AGXG13XFamilyCommandBuffer
objc[6186]: ##############
[NSAutoreleasePool showPools]

@christiangnrd
Contributor

> Apparently this looks like an ARC bug.

Are we using ARC in Julia?

@tgymnich
Member

We don’t use ARC, but the libraries we are using might have been compiled with ARC enabled.

@christiangnrd
Contributor

christiangnrd commented Sep 26, 2024

When I turned off ARC in Xcode for the objc version, the NaNs show up even with the @autoreleasepool blocks.

@maleadt
Member

maleadt commented Sep 26, 2024

> When I turned off ARC in Xcode for the objc version, even with the @autoreleasepool blocks the NaNs show up.

AFAIU -fobjc-arc makes the compiler automatically insert release/retain/autorelease calls, and doesn't affect how precompiled libraries like MPS may behave.

@christiangnrd
Contributor

> > When I turned off ARC in Xcode for the objc version, even with the @autoreleasepool blocks the NaNs show up.
>
> AFAIU -fobjc-arc makes the compiler automatically insert release/retain/autorelease calls, and doesn't affect how precompiled libraries like MPS may behave.

That's my understanding too. However, from what I understand about our implementation of the @autoreleasepool macro, we're using an NSAutoreleasePool object and a [pool release]; statement at the end, which according to the documentation, isn't possible with ARC on. By turning ARC off for the objc version, I was trying to reproduce the conditions of the failing Julia code.

The only thing is that I don't know if this information is actually helpful.

@tgymnich tgymnich removed the bug label Oct 18, 2024