Analysis of llama.cpp

This blog analyzes the code structure of the llama.cpp project.

Compilation

The project can be built either with a Makefile or with CMakeLists.txt. Since I don't understand Makefiles well, I'll use the CMake configuration to understand how the project is compiled.
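
For context, a typical CMake build looks roughly like this. Treat it as a sketch: the GGML_CUDA=ON switch is my assumption based on the Makefile flag used later in this post, and older versions used different flag names.

# configure into a separate build directory, then compile
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j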

The most important example provided by this LLM deployment framework is llama-cli. Its usage is documented at https://github.com/ggerganov/llama.cpp/tree/master/examples/main.

Its CMake configuration is in examples/main/CMakeLists.txt:

set(TARGET llama-cli)
add_executable(${TARGET} main.cpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)

The source file for llama-cli is examples/main/main.cpp, and the configuration shows that the executable links against the common and llama libraries.

The llama library is easy to find in src/CMakeLists.txt: it is built from the core backend source file src/llama.cpp and the public header include/llama.h. I may analyze how the backend works later; in this blog I mainly focus on the logic of the llama-cli executable, which I then want to modify. The common library lives in the common folder, and from its CMake code we can see that it provides the project's basic utility functions.
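
To see exactly which source files go into these two libraries, a quick grep over their CMake files is enough (paths taken from the directory layout above):

# locate the library definitions (add -A 10 to also see the source file lists)
grep -n "add_library" src/CMakeLists.txt common/CMakeLists.txt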

main.cpp

Let's dive into main.cpp to find out how it works.

The chat_llama3.sh script shows a typical invocation of llama-cli:

FIRST_INSTRUCTION=$2
SYSTEM_PROMPT="You are a helpful assistant."

./llama-cli -m $1 --color -i \
-c 0 -t 6 --temp 0.2 --repeat_penalty 1.1 -ngl 40 \
-r '<|eot_id|>' \
--in-prefix '<|start_header_id|>user<|end_header_id|>\n\n' \
--in-suffix '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' \
-p "<|start_header_id|>system<|end_header_id|>\n\n$SYSTEM_PROMPT\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n$FIRST_INSTRUCTION\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

Debug main.cpp

First, we need to compile in debug mode. With LLAMA_DEBUG=1 we can see -g in the CXXFLAGS.

make clean
make GGML_CUDA=1 LLAMA_DEBUG=1 -j32
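
If you build with CMake instead of the Makefile, the rough equivalent of a debug build would be the following; as above, the GGML_CUDA flag name is an assumption, so check the flags for your checkout.

# the Debug build type makes CMake pass -g and no optimization flags by default
cmake -B build-debug -DCMAKE_BUILD_TYPE=Debug -DGGML_CUDA=ON
cmake --build build-debug -j 32

Note that the Makefile build puts llama-cli in the repository root, which is the path the launch.json below points at; a CMake build would typically place it under build-debug/bin/ instead.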

Then, create a launch.json so we can debug with the VSCode UI:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "gdb",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceFolder}/llama-cli",
            "args": [
                "-m",
                "/home/sjl/work/llama.cpp/model/lllama-3-chinese-8b-instruct-v3-ggml-model-q8_0.gguf",
                "--color",
                "-i",
                "-c",
                "0",
                "-t",
                "6",
                "--temp",
                "0.2",
                "--repeat_penalty",
                "1.1",
                "-ngl",
                "999",
                "-r",
                "'<|eot_id|>'",
                "--in-prefix",
                "'<|start_header_id|>user<|end_header_id|>\\n\\n'",
                "--in-suffix",
                "'<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n'",
                "-p",
                "'<|start_header_id|>system<|end_header_id|>\\n\\nYou are a helpful assistant.\\n\\n<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n介绍一下北京\\n\\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n'",
            ],
            "stopAtEntry": true,
            "cwd": "${fileDirname}",
            "environment": [],
            "externalConsole": false,
            "MIMode": "gdb",
        }
    ]
}
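
If you prefer plain gdb over the VSCode UI, the same setup boils down to something like this minimal sketch; I drop the -r/--in-prefix/--in-suffix/-p arguments here for brevity.

# start llama-cli under gdb with the same core arguments
gdb --args ./llama-cli \
    -m /home/sjl/work/llama.cpp/model/lllama-3-chinese-8b-instruct-v3-ggml-model-q8_0.gguf \
    --color -i -c 0 -t 6 --temp 0.2 --repeat_penalty 1.1 -ngl 999
# then inside gdb: "break main" and "run"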

Reference