Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: hipMalloc() allocations over 4GB redirected to shared GPU memory #3644

Open
7shi opened this issue Oct 20, 2024 · 6 comments
Open

Comments

@7shi
Copy link

7shi commented Oct 20, 2024

Problem Description

While testing memory allocation using hipMalloc() in a HIP environment, I observed an unexpected behavior when attempting to allocate large chunks of memory.

Key Observations:

  1. Allocations up to 4GB appear to use GPU VRAM as expected.
  2. Attempts to allocate more than 4GB in a single hipMalloc() call result in the allocation being redirected to shared GPU memory instead of dedicated VRAM.

Implications:

This behavior could significantly impact performance for applications requiring large memory allocations, as shared GPU memory is typically slower than dedicated VRAM.

Operating System

Windows 11 Home (10.0.22631)

CPU

AMD Ryzen 5 5600X 6-Core Processor

GPU

AMD Radeon RX 7600 XT

ROCm Version

ROCm 6.1.0

ROCm Component

HIP

Steps to Reproduce

I used a C++ program that incrementally allocates memory using hipMalloc(), starting from 512MB and increasing by 512MB increments up to 4GB and beyond. The program monitors the success of each allocation and the type of memory used.

#include <hip/hip_runtime.h>
#include <iostream>
#include <vector>
#include <string>
#include <thread>

int main() {
    hipDevice_t device;
    hipDeviceProp_t props;
    hipGetDevice(&device);
    hipGetDeviceProperties(&props, device);

    uint64_t totalVRAM = static_cast<uint64_t>(props.totalGlobalMem);
    size_t totalVRAM_MB = static_cast<size_t>(totalVRAM >> 20);
    size_t maxAllocationMB = 4096;

    std::cout << "Total VRAM: " << totalVRAM_MB << "MB" << std::endl;
    std::cout << "Will try to allocate up to: " << maxAllocationMB << "MB" << std::endl;

    for (size_t i = 512; i <= maxAllocationMB; i += 512) {
        void* ptr;
        hipError_t status = hipMalloc(&ptr, i << 20);
        if (status != hipSuccess) {
            std::cout << "Allocation failed: " << hipGetErrorString(status) << std::endl;
            break;
        }
        std::cout << "Allocated " << i << "MB" << std::endl;
        if (i < maxAllocationMB) {
            std::this_thread::sleep_for(std::chrono::seconds(5));
        } else {
            // Pause before freeing memory
            std::cout << "Press Enter to free memory and exit..." << std::endl;
            std::string dummy;
            std::getline(std::cin, dummy);
        }
        hipFree(ptr);
    }
}

Result:
image

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

hipconfig --full

HIP version  : 6.1.40252-53f3e11ac

== hipconfig
HIP_PATH     : C:/Program Files/AMD/ROCm/6.1/
ROCM_PATH    : /opt/rocm
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I"C:/Program Files/AMD/ROCm/6.1//include" -I"C:\Program Files\AMD\ROCm\6.1\lib\clang\19
"

== hip-clang
HIP_CLANG_PATH   : C:/Program Files/AMD/ROCm/6.1//bin
clang version 19.0.0git ([email protected]:Compute-Mirrors/llvm-project b3dbdf4f03718d63a3292f784216fddb3e73d521)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\AMD\ROCm\6.1\\bin
AOMP-18.0-12 (http://github.com/ROCm-Developer-Tools/aomp):
 Source ID:18.0-12-ce1873ac686bb90ddec72bb99889a4e80e2de382
  LLVM version 19.0.0git
  Optimized build.
  Default target: x86_64-pc-windows-msvc
  Host CPU: znver3

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -O3 
hip-clang-ldflags  : --driver-mode=g++ -O3 -fuse-ld=lld --ld-path="C:\Program Files\AMD\ROCm\6.1\bin/lld-link.exe" -Llib --hip-link 

=== Environment Variables
PATH=C:\Program Files\AMD\ROCm\6.1\bin;C:\Program Files\ninja;C:\VulkanSDK\1.3.296.0\Bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\Git\cmd;C:\Program Files\CMake\bin;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Strawberry\c\bin;C:\Strawberry\perl\site\bin;C:\Strawberry\perl\bin;C:\Users\7shi\.cargo\bin;C:\Users\7shi\AppData\Local\Microsoft\WindowsApps;;C:\Users\7shi\AppData\Local\Programs\Ollama;C:\Users\7shi\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\7shi\.dotnet\tools
HIPCC="C:\Program Files\AMD\ROCm\6.1\bin\hipcc"
HIPCONFIG="C:\Program Files\AMD\ROCm\6.1\bin\hipconfig"
HIP_PATH=C:\Program Files\AMD\ROCm\6.1\
HIP_PATH_61=C:\Program Files\AMD\ROCm\6.1\

== Windows Display Drivers
Hostname     : rx7600xt
Advanced Micro Devices, Inc.  C:\Windows\System32\DriverStore\FileRepository\u0407465.inf_amd64_25af0ae2eb807985\B407987\atidx9loader64.dll,C:\Windows\System32\DriverStore\FileRepository\u0407465.inf_amd64_25af0ae2eb807985\B407987\amdxx64.dll,C:\Windows\System32\DriverStore\FileRepository\u0407465.inf_amd64_25af0ae2eb807985\B407987\amdxx64.dll,C:\Windows\System32\DriverStore\FileRepository\u0407465.inf_amd64_25af0ae2eb807985\B407987\amdxc64.dll  AMD Radeon RX 7600 XT  
@harkgill-amd
Copy link

Hi @7shi, an internal ticket has been created to investigate this. Thanks!

@zichguan-amd
Copy link

Hi @7shi, I cannot reproduce the problem. Can you verify that you have the latest version 24.10.16 of the display driver?

@7shi
Copy link
Author

7shi commented Oct 22, 2024

Hi @zichguan-amd, Thank you for your response.

I updated my driver from 24.9.1 to 24.10.1 (latest available version), but the issue still persists. Note that I couldn't locate version 24.10.16 you mentioned.

For background context (outside the scope of this issue): I initially encountered this while investigating LLM performance issues in my environment. After investigation, I traced the root cause to this hipMalloc behavior: when running 8B models in Ollama, it forces the use of shared GPU memory despite having available VRAM, resulting in slower performance than CPU-only computation.

image

@7shi
Copy link
Author

7shi commented Oct 23, 2024

I was informed that this issue doesn't occur in 24.8.1, but appears in both 24.9.1 and 24.10.1. The version 24.10.16 you mentioned doesn't seem to be publicly available yet, but I hope it will be fixed in that release.
ollama/ollama#7107

@zichguan-amd
Copy link

I can confirm this issue with Adrenalin 24.10.1 and 24.9.1. I'll look into it. Changing the driver should fix this. You can either downgrade to Adrenalin 24.8.1, or use the latest Pro edition driver that comes bundled with HIP sdk, which is 24.Q2 or 24.10.16 that I mentioned in the previous comment.

@7shi
Copy link
Author

7shi commented Oct 23, 2024

Thank you for investigating this issue and providing the workarounds.

I've confirmed that hipMalloc() works correctly with Adrenalin 24.8.1. Given that Ollama also works properly with this version, it reinforces that the hipMalloc() behavior was indeed the root cause.

I wasn't aware that Pro drivers could be used with Radeon RX. I've tested 24.Q2 and can confirm it works correctly. While I couldn't figure out how to obtain 24.10.16, I plan to use 24.Q2 for now.

Since most users would typically install Adrenalin, I hope this issue will be resolved in future versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants