Jetson Orin Nano Super

Notes on the SUB board and the official board.

Yahboom SUB version, 8 GB Jetson Orin Nano

Super upgrade

  1. Upgrade to JetPack 6.2
  2. Prepare the SSD
  3. Expand the file system

After logging in

  • Credentials (user/password): jetson/yahboom

  • No browser is installed:

    sudo apt-get install firefox
  • No Wi-Fi driver is installed:

    sudo apt-get install iwlwifi-modules
    # reboot

NVIDIA examples

NVIDIA Jetson AI Lab

Testing

Switching to the overclocked (Super) power mode roughly doubles compute throughput.

Power Mode

Jtop

Overview
CUDA and related details

Ollama

The Yahboom image ships with Ollama preinstalled.

cuda-samples

cuda-samples/Samples/1_Utilities/bandwidthTest

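Change required because of a compute capability mismatch (the stock list names architectures 100, 101, and 120, which the installed CUDA toolkit does not recognize):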

diff --git a/Samples/1_Utilities/bandwidthTest/CMakeLists.txt b/Samples/1_Utilities/bandwidthTest/CMakeLists.txt
index 04375e50..ff3ee2bb 100644
--- a/Samples/1_Utilities/bandwidthTest/CMakeLists.txt
+++ b/Samples/1_Utilities/bandwidthTest/CMakeLists.txt
@@ -8,7 +8,7 @@ find_package(CUDAToolkit REQUIRED)
 
 set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 
-set(CMAKE_CUDA_ARCHITECTURES 50 52 60 61 70 72 75 80 86 87 89 90 100 101 120)
+set(CMAKE_CUDA_ARCHITECTURES 50 52 60 61 70 72 75 80 86 87 89 90)
 set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")
 if(CMAKE_BUILD_TYPE STREQUAL "Debug")
     # set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -G") # enable cuda-gdb (expensive)
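
The edit to the architecture list amounts to dropping every entry above what the toolkit can target. A minimal sketch of that filtering follows, assuming 90 is the highest supported value (it is the highest entry kept in the patched list). `filter_architectures` is a hypothetical helper for illustration and not part of cuda-samples.

```python
# Architecture list from the unpatched CMakeLists.txt.
ORIGINAL_ARCHS = [50, 52, 60, 61, 70, 72, 75, 80, 86, 87, 89, 90, 100, 101, 120]

# Assumption: the installed toolkit tops out at compute capability 9.0.
MAX_SUPPORTED_ARCH = 90


def filter_architectures(archs, max_supported):
    """Keep only the architectures the toolkit is assumed to support."""
    return [arch for arch in archs if arch <= max_supported]


def as_cmake_value(archs):
    """Format the list the way CMAKE_CUDA_ARCHITECTURES expects it."""
    return " ".join(str(arch) for arch in archs)


if __name__ == "__main__":
    kept = filter_architectures(ORIGINAL_ARCHS, MAX_SUPPORTED_ARCH)
    print(f"set(CMAKE_CUDA_ARCHITECTURES {as_cmake_value(kept)})")
```

Since the Orin GPU is compute capability 8.7, a list containing only 87 should also be enough when building solely for this board.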

Test run:

./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: Orin
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 18.0

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 16.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 49.8

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
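
To compare runs (for example, before and after changing the power mode), the three figures can be pulled out of the output with a short script. This is a minimal sketch, assuming the quick-mode layout shown above. `parse_bandwidth_test` is a hypothetical helper name, not something cuda-samples provides.

```python
import re

# Excerpt of the bandwidthTest output from this run.
SAMPLE_OUTPUT = """\
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 18.0

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 16.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 49.8
"""

HEADER_RE = re.compile(r"\s*(.+?) Bandwidth, \d+ Device\(s\)")
ROW_RE = re.compile(r"\s*(\d+)\s+([\d.]+)\s*$")


def parse_bandwidth_test(text):
    """Return {direction: GB/s} for each section of the quick-mode output."""
    results = {}
    direction = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # Remember which section the next data row belongs to.
            direction = header.group(1)
            continue
        row = ROW_RE.match(line)
        if row and direction is not None:
            results[direction] = float(row.group(2))
            direction = None
    return results


if __name__ == "__main__":
    for name, gbps in parse_bandwidth_test(SAMPLE_OUTPUT).items():
        print(f"{name}: {gbps} GB/s")
```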

NVBandwidthTest

Changes required because of the platform mismatch (NVML support is limited on Jetson):

diff --git a/inline_common.h b/inline_common.h
index 76cccb3..d16c0ce 100644
--- a/inline_common.h
+++ b/inline_common.h
@@ -115,8 +115,8 @@ std::ostream &operator<<(std::ostream &o, const PeerValueMatrix<T> &matrix) {
 
 // NUMA optimal affinity
 inline void setOptimalCpuAffinity(int cudaDeviceID) {
-#ifdef _WIN32
-    // NVML doesn't support setting affinity on Windows
+#if defined(_WIN32) || defined(__aarch64__)
+    // NVML doesn't support setting affinity on Windows or Jetson
     return;
 #endif
     if (disableAffinity) {
diff --git a/nvbandwidth.cpp b/nvbandwidth.cpp
index ade37c1..93e666c 100644
--- a/nvbandwidth.cpp
+++ b/nvbandwidth.cpp
@@ -270,8 +270,10 @@ int main(int argc, char **argv) {
 
     CU_ASSERT(cuDriverGetVersion(&cudaVersion));
 
-    char driverVersion[NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE];
-    NVML_ASSERT(nvmlSystemGetDriverVersion(driverVersion, NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE));
+    // char driverVersion[NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE];
+    // NVML_ASSERT(nvmlSystemGetDriverVersion(driverVersion, NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE));
+    char driverVersion[NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE] = {0};
+    nvmlSystemGetDriverVersion(driverVersion, NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE);
 
     output->addCudaAndDriverInfo(cudaVersion, driverVersion);

Test results:

./nvbandwidth -t host_to_device_memcpy_ce
nvbandwidth Version: v0.7
Built from Git version: v0.7

CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 540.4.0

Device 0: Orin (00000000:00:00)

Running host_to_device_memcpy_ce.
memcpy CE CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 28.47

./nvbandwidth -t host_to_device_memcpy_sm
nvbandwidth Version: v0.7
Built from Git version: v0.7

CUDA Runtime Version: 12060
CUDA Driver Version: 12060
Driver Version: 540.4.0

Device 0: Orin (00000000:00:00)

Running host_to_device_memcpy_sm.
memcpy SM CPU(row) -> GPU(column) bandwidth (GB/s)
0
0 40.44
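
The two host-to-device figures can be put side by side with a little arithmetic. The sketch below compares the copy-engine (CE) and SM-kernel results from these runs and relates them to a peak memory bandwidth. The 102 GB/s peak is an assumption (the figure NVIDIA publishes for the Orin Nano 8 GB in Super mode), not something measured here, and the helper names are illustrative only.

```python
# Figures copied from the two nvbandwidth runs (GB/s).
CE_GBPS = 28.47  # host_to_device_memcpy_ce: copy-engine transfer
SM_GBPS = 40.44  # host_to_device_memcpy_sm: copy performed by an SM kernel

# Assumption: published peak memory bandwidth for this module in Super mode.
ASSUMED_PEAK_GBPS = 102.0


def speedup(fast, slow):
    """How many times faster one result is than the other."""
    return fast / slow


def fraction_of_peak(measured, peak):
    """Measured bandwidth as a fraction of the assumed peak."""
    return measured / peak


if __name__ == "__main__":
    print(f"SM vs CE: {speedup(SM_GBPS, CE_GBPS):.2f}x")
    for label, value in (("CE", CE_GBPS), ("SM", SM_GBPS)):
        share = fraction_of_peak(value, ASSUMED_PEAK_GBPS)
        print(f"{label}: {value} GB/s, {share:.1%} of assumed peak")
```

The fraction-of-peak number is only a rough indicator: on a shared-memory device such as Orin, a copy both reads from and writes to the same DRAM.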

Reference: Support for Jetson GPUs (Jetpack 6.0+, Orin)