The LLVM compiler infrastructure is powerful because of its modular design, flexibility, and rich intermediate representation (IR), which enables deep analysis and transformation of code. Unlike traditional compilers, LLVM separates the front end (language parsing) from the back end (code generation), allowing developers to support multiple languages and targets with minimal duplication. Its IR is language-agnostic and designed for optimization, making it ideal for building advanced tooling such as static analyzers, custom code generators, obfuscators, and JIT compilers. Additionally, LLVM’s extensive ecosystem, including Clang, LLD, and MLIR (Multi-Level Intermediate Representation), makes it a robust foundation for research, experimentation, and production-grade compiler development.
Clang/Clang++ is a frontend built on top of LLVM that parses and compiles source code written in C, C++, Objective-C, and Objective-C++. It translates this code into LLVM IR, which is then processed by LLVM’s backend. Together, Clang and LLVM form a flexible and extensible toolchain capable of supporting a wide range of languages, platforms, and custom compiler features.
In this post, we will customize the LLVM compiler infrastructure to build a solution that enables self-masking capabilities for ordinary user-defined functions in a C++ source file. Self-masking means that a function remains in a masked (obfuscated or encrypted) state until it is invoked. Once execution enters the function, it is temporarily unmasked, and upon returning, it reverts back to its masked state.
The Clang compiler, built on top of this customized LLVM, consistently generates binaries in which all functions are equipped with this masking behavior. Although the functions are not masked at compile time, the runtime design ensures that they are expected to be in a masked state during execution. This approach enhances binary protection and obfuscation without altering the original control flow or affecting the handling of arguments and return values.
To impart self-masking capabilities to ordinary functions, we must manipulate the program’s control flow by injecting custom code during compilation.
It is crucial that these modifications do not alter the original control flow or interfere with the handling of function arguments and return values.
In this section, we will explore how to implement custom prologue and epilogue stubs for each function within a compilation unit. These stubs will manage execution flow and invoke a dedicated masking handler responsible for performing masking and unmasking operations.
The custom Clang compiler we develop will integrate these transformations directly into the compiled binary. However, functions within the binary will remain unmasked at compile time. Instead, the design assumes that all functions are in a masked state at runtime.
To enforce this, we introduce an additional component that handles entry point redirection and performs pre-CRT execution logic. This ensures that the original function bodies and their epilogues remain masked throughout the binary’s lifetime in memory.
The PE file generated by our custom Clang++ compiler includes two custom sections, as illustrated in the image below:

The .funcmeta section stores the XOR key along with the metadata required to identify the start and length of each function. We will explore the structure and purpose of this section in detail in the following sections. A high-level data layout is illustrated in the image below:
The .stub section contains shellcode that serves as the entry point for the PE file and is executed immediately upon program launch.
Our custom LLVM implementation does not indiscriminately mask every function within a compilation unit; instead, it selectively applies modifications only to those functions explicitly chosen by the user. We need to provide users with an intuitive and efficient mechanism to register functions that require masking.
A custom section named .funcmeta is created to hold the XOR key and the function metadata.
__attribute__((section(".funcmeta")))
uint32_t myfuncsec_key = 0x12345678;
The function metadata consists of the function’s start address and the length of its body. We encapsulate this information in a structure named FunctionMetaData, as shown below. For each registered function, an instance of FunctionMetaData is placed in the .funcmeta section.
struct FunctionMetaData
{
void* func;
uint32_t len;
};
Let’s define a specialized macro function named REGISTER_FUNCTION(), as shown below, which inserts function metadata into the .funcmeta section. Since the length of the function body cannot be determined at compile time, we use a placeholder value of 0xDEADBEEF. The macro accepts the name of the function to be masked, and its pointer is recorded in the custom section. To distinguish registered functions from regular ones, we follow a naming convention where each registered function begins with the prefix REG_, allowing LLVM to identify them during compilation.
// Macro to register a function
#define REGISTER_FUNCTION(fn) \
__attribute__((section(".funcmeta"))) \
struct FunctionMetaData fn##_entry = { (void*)fn, 0xDEADBEEF }
void REG_foo()
{
...
}
REGISTER_FUNCTION(REG_foo);
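The layout of these records matters later, because the assembly stubs step through .funcmeta in fixed 16-byte strides. The sketch below is only an illustration under stated assumptions: it mirrors FunctionMetaData, checks the size a 64-bit target is expected to give it, and adds a hypothetical helper, findEntry (not part of the original code), that walks the table the way the handler does. It assumes the table ends with an entry whose func pointer is null, which is how the handler's loop appears to terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the FunctionMetaData record placed in .funcmeta.
struct FunctionMetaData {
    void*    func;  // start address of the registered function
    uint32_t len;   // body length; 0xDEADBEEF until the initialization phase fills it in
};

// On a 64-bit target: 8-byte pointer + 4-byte length + 4 bytes of padding.
static_assert(sizeof(void*) != 8 || sizeof(FunctionMetaData) == 16,
              "expected a 16-byte record on 64-bit targets");
static_assert(offsetof(FunctionMetaData, len) == sizeof(void*),
              "len is expected to follow func directly");

// Hypothetical helper: return the record whose func equals `start`, or nullptr.
// Assumes the table is terminated by an entry with a null func pointer.
inline FunctionMetaData* findEntry(FunctionMetaData* table, const void* start) {
    for (FunctionMetaData* e = table; e->func != nullptr; ++e) {
        if (e->func == start) {
            return e;
        }
    }
    return nullptr;
}
```

The 16-byte record size is what allows the assembly to advance with a constant stride, and the 8-byte alignment of the first record is presumably why the 4-byte key is skipped with an 8-byte offset.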
Before we dive into the details, we need to discuss the initialization phase. As mentioned earlier, each function body in the compiled binary has custom prologue and epilogue code attached to it that redirects the flow to the handler code. The prologue expects the original function body, together with the epilogue code appended to it, to already be in a masked state, so that it can instruct the handler to unmask it before the function body executes. It is the duty of the epilogue stub to mask the code again after the function body has executed. This requires the function body to be masked by the time control first reaches the prologue. We therefore introduce an “initialization phase” that runs before the program resumes normal execution by invoking the CRT. Both the prologue and the epilogue code must be able to distinguish between normal execution flow and the initialization phase. This is discussed in the following section.
Prologue
The prologue code works as follows:
- Compute the function's start address and store it in the UserReserved[1] member of TEB for later use.
- Read the UserReserved[0] member of TEB. If the value is 0x80000001 then we are in the initialization phase, and we simply jump to the epilogue stub appended to the function body. If we are not in the initialization phase, we invoke the handler directly from the prologue to unmask the function body and resume normal execution.
- Preserve RCX/RDX/R8/R9, non-volatile registers and stack in the prologue code.
;
;DO NOT modify RCX/RDX/R8/R9 as this will corrupt function params and preserve
;non volatile registers - RBX/RBP/RDI/RSI/RSP/R12/R13/R14/R15
;
;
; gs:[0xE8] => TEB -> UserReserved[0] => init_phase status value : 0x80000001
; gs:[0xF0] => TEB -> UserReserved[1] => Target function start address
; gs:[0xF8] => TEB -> UserReserved[2] => VirtualProtect address
;
BITS 64
prologue:
call get_rip ; Calculate start address
get_rip:
pop rax
push rcx ; Save rcx
push rdx ; Save rdx
lea rcx, [rel get_rip]
lea rdx, [rel prologue]
sub rcx, rdx
sub rax, rcx ; Function start address
pop rdx ; Restore rdx
pop rcx ; Restore rcx
;
;Read TEB -> UserReserved[0] to check if we are in initialization phase
;Value 0x80000001 indicates initialization phase
;
mov gs:[0xF0], rax ; Store the function start address in TEB -> UserReserved[1]
mov r10, gs:[0xE8] ; TEB -> UserReserved[0]
xor rax, rax
mov rax, 0x80000001
cmp r10, rax
;
; Initialization phase,
;
je epilogue
;
; Normal execution flow
; Handler is invoked to unmask the code
;
call handler
Epilogue
The epilogue code works as follows:
- Preserve RAX and non-volatile registers here.
;
; DO NOT modify rax here
;
;
; gs:[0xE8] => TEB -> UserReserved[0] => init_phase status value : 0x80000001
; gs:[0xF0] => TEB -> UserReserved[1] => Target function start address
; gs:[0xF8] => TEB -> UserReserved[2] => VirtualProtect address
;
BITS 64
epilog:
mov rdx, gs:[0xE8] ;init_phase check, TEB -> UserReserved[0]
xor rcx, rcx
mov rcx, 0x80000001
cmp rdx, rcx
je init_stage ;initialization phase
jmp handler ;use jmp here so that we can return to original caller of this function from the handler
init_stage: ;Initialization phase
call handler ;The ret addr pushed will be used to compute func size
ret ;function boundary
The handler code works as follows:
- The prologue has stored the target function's start address in TEB.UserReserved[1], and the initialization code has placed the address of the VirtualProtect API in TEB.UserReserved[2].
BITS 64
struc FunctionMetaData
.FunctionStartAddress resq 1
.FunctionLength resd 1
endstruc
;
; gs:[0xE8] => TEB -> UserReserved[0] => init_phase status value : 0x80000001
; gs:[0xF0] => TEB -> UserReserved[1] => Target function start address
; gs:[0xF8] => TEB -> UserReserved[2] => VirtualProtect address
;
;
;Save caller state
;
push rcx
push rdx
push r8
push r9
push rax
push rbx
push rsi
push rdi
push rbp
push r12
push r13
push r14
push r15
;
;Fetch runtime ImageBase from PEB
;
mov rax, gs:[0x60] ; Get PEB base
mov rsi, [rax + 0x10] ; ImageBaseAddress
;
;Fetch function start address from TEB -> UserReserved[1]
;
mov rax, gs:[0xF0]
find_pe_header:
;
; PE file validation, optional
;
cmp word [rsi], 0x5A4D ; 'MZ'
jne done
mov edi, [rsi + 0x3C] ; e_lfanew
add rdi, rsi
cmp dword [rdi], 0x00004550 ;'PE\0\0'
jne done
;
;Locate section table
;
movzx ecx, word [rdi + 0x6] ; Number of sections (16-bit field, zero-extended)
xor rbx,rbx
mov bx, [rdi + 0x14] ; Size of optional header
add rbx, rdi ; Make sure it doesnt alter CF state
add rbx, 0x18 ; Section table starts here
section_lookup:
cmp dword [rbx], 0x6E75662E ; ".fun"
jne next_section
cmp dword [rbx + 4], 0x74656D63 ; "cmet"
jne next_section
mov edx, dword [rbx + 0x0C] ; RVA
add rdx, rsi ; Convert to VA
jmp fetch_metadata
next_section:
add rbx, 0x28 ; IMAGE_SECTION_HEADER size
loop section_lookup
init_phase:
;
;RAX/R8 contains func start address
;RSP contains epilog ret address
;RBX contains ptr to FunctionMetaData[i]
;RSI peb::ImageBase
;xor encoder expects func length in rdx
;
xor r9, r9
mov rcx, rsp
add rcx, 0x68
mov r9, [rcx] ;Fetch epi ret address, function boundary
sub r9, rax ;Function size
mov [rbx + FunctionMetaData.FunctionLength], r9
mov rdx, r9
jmp xor_encoder
fetch_metadata:
xor r9, r9
xor r10,r10
mov r9d, dword [rdx] ; r9 = xor key
add rdx, 0x8
mov rbx, rdx ; RBX = points to metadata struct in .funcmeta
process_metadata_entries:
mov r8, [rbx + FunctionMetaData.FunctionStartAddress]
cmp r8, 0
je done
mov rdx, [rbx + FunctionMetaData.FunctionLength]
xor rsi,rsi ;Clear data
mov rsi, r9 ;Place xor key in rsi
cmp rax, r8 ;Check if caller's start address is same as metadata entry
je function_found
add rbx, 0x10
jmp process_metadata_entries
function_found:
;
;Read TEB -> UserReserved[0] to check if we are in initialization phase
;Value 0x80000001 indicates initialization phase
;
mov r10, gs:[0xE8]
xor rcx, rcx
mov rcx, 0x80000001
cmp r10, rcx ;init phase check
je init_phase
;
;RSI -->key
;R8 -->Target Memory
;RDX -->Length
;
xor_encoder:
add r8, 0x46 ; Prologue length
; Delta between prologue stub and LLVM emitted code - 0xB.
; Total length = prologue stub length + delta
sub rdx, 0x46 ; Update length (Function length - prologue length)
test rdx, rdx ; check if length is zero
jz done ; if zero, exit
xor rbx,rbx
mov r11, gs:[0xF8] ;TEB -> UserReserved[2]
push rdx
push r8
mov rcx, r8
mov r8, 0x40
sub rsp, 8 ; Reserve 8 bytes
mov qword [rsp], 0 ; Optional: zero it
mov r9, rsp ; R9 = pointer to old protection
sub rsp, 0x20 ; Shadow space
call r11 ; VirtualProtect()
add rsp, 0x28 ; Stack clean-up
;
; Restore data
;
pop r8
pop rdx
;
;Save data for future VirtualProtect call
;
push r8
push rdx
encode_loop:
mov bl, [r8] ; Load target instruction
xor bl, sil ; perform xor on instruction
mov [r8], bl ; store encoded byte back
inc r8 ; move to next byte
dec rdx ; decrement function length
jnz encode_loop
pop rdx ; dwSize
pop rcx ; lpAddress
mov r8, 0x20 ; flNewProtect
sub rsp, 8
mov qword [rsp], 0
mov r9, rsp ; lpflOldProtect
sub rsp, 0x20 ; Shadow space allocation
mov r11, gs:[0xF8] ; Fetch VirtualProtect address from TEB -> UserReserved[2]
call r11 ; VirtualProtect()
add rsp, 0x28 ; Stack clean-up
done:
;
;Restore caller state
;
pop r15
pop r14
pop r13
pop r12
pop rbp
pop rdi
pop rsi
pop rbx
pop rax
pop r9
pop r8
pop rdx
pop rcx
ret
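To make the masking step easier to follow, the xor_encoder and encode_loop portion of the handler can be modelled in a few lines of C++. This is a sketch only, assuming the function is available as a byte buffer; toggleMask and kPrologueLength are illustrative names, and the VirtualProtect calls that make the page writable are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Prologue stub (0x3B bytes) plus the je/call pair emitted by LLVM (0xB bytes).
constexpr std::size_t kPrologueLength = 0x46;

// Model of xor_encoder/encode_loop: the prologue is left untouched, and every
// byte after it is XORed with the low byte of the key (the assembly uses SIL).
inline void toggleMask(std::vector<uint8_t>& function, uint32_t key) {
    if (function.size() <= kPrologueLength) {
        return;  // nothing beyond the prologue to mask
    }
    const uint8_t k = static_cast<uint8_t>(key & 0xFF);
    for (std::size_t i = kPrologueLength; i < function.size(); ++i) {
        function[i] ^= k;
    }
}
```

Because XOR is its own inverse, the same routine both masks and unmasks, which is why the prologue and the epilogue can share a single handler. Note that only the lowest byte of the 32-bit key takes part in the operation.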
Once the binary has been compiled with our custom LLVM implementation, it contains the handler code embedded in the .text section along with the prologue/epilogue code attached to every registered function. One additional step remains: embedding an entry point stub into the binary. To direct control to our custom stub, we patch the AddressOfEntryPoint member of the PE OptionalHeader. The entry point stub is responsible for executing the initialization phase and serves two primary purposes: first, to compute the size of each registered function; and second, to mask each of them before normal execution resumes.
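The entry point patch itself is a small header edit. The sketch below shows, under stated assumptions, where AddressOfEntryPoint lives in a PE image held in memory as a byte buffer. The helper names (readU32, writeU32, patchEntryPoint) are hypothetical, and this is a C++ stand-in for illustration rather than the external script the post refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Little-endian 32-bit accessors with bounds checks.
inline uint32_t readU32(const std::vector<uint8_t>& image, std::size_t off) {
    if (off + 4 > image.size()) throw std::out_of_range("read past end of image");
    return static_cast<uint32_t>(image[off]) |
           (static_cast<uint32_t>(image[off + 1]) << 8) |
           (static_cast<uint32_t>(image[off + 2]) << 16) |
           (static_cast<uint32_t>(image[off + 3]) << 24);
}

inline void writeU32(std::vector<uint8_t>& image, std::size_t off, uint32_t v) {
    if (off + 4 > image.size()) throw std::out_of_range("write past end of image");
    for (int i = 0; i < 4; ++i) {
        image[off + i] = static_cast<uint8_t>(v >> (8 * i));
    }
}

// Hypothetical helper: point AddressOfEntryPoint at the stub's RVA and return
// the original RVA, which would replace the 0x12345678 placeholder in the stub.
inline uint32_t patchEntryPoint(std::vector<uint8_t>& image, uint32_t stubRva) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') {
        throw std::runtime_error("missing MZ signature");
    }
    const std::size_t nt = readU32(image, 0x3C);  // e_lfanew
    if (readU32(image, nt) != 0x00004550) {       // "PE\0\0"
        throw std::runtime_error("missing PE signature");
    }
    // Signature (4) + IMAGE_FILE_HEADER (20) = 0x18, then 0x10 into the optional header.
    const std::size_t entryOff = nt + 0x18 + 0x10;
    const uint32_t original = readU32(image, entryOff);
    writeU32(image, entryOff, stubRva);
    return original;
}
```

The returned value is an RVA, which matches how the stub later adds the image base to it before jumping.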
The code below can be summarized in the following points:
- Store 0x80000001 in TEB.UserReserved[0] to let the prologue, epilogue and handler know about the initialization phase.
- The placeholder 0x12345678 is used to store the original entry point. It is patched externally using a Python script, which we will discuss in detail later.
struct FunctionMetaData
{
void* func;
uint32_t len;
};
- The initialize_loop label takes care of the initialization phase by calling into each registered function and placing the size of each function in the .funcmeta section, as discussed above. Before we call into each function we will
- Store 0x00000000 in TEB.UserReserved[0] to indicate that initialization is done, then simply jump to the original entry point address (CRT).
BITS 64
struc FunctionMetaData
.FunctionStartAddress resq 1
.FunctionLength resd 1
endstruc
;
;Fetch runtime ImageBase from PEB
;
mov rax, gs:[0x60] ; Get PEB base
mov rsi, [rax + 0x10] ; ImageBaseAddress
;
;
;IAT Parser begin
;
;
mov rbx, rsi
mov eax, dword [rbx + 0x3C] ; e_lfanew
add rbx, rax ; NT Headers
; Get Optional Header
add rbx, 0x18 ; skip Signature + FileHeader
mov rdx, rbx ; Optional Header
; Get RVA of Import Directory
mov eax, dword [rdx + 0x78] ; DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress
test eax, eax
jz not_found
add rax, rsi ; Import Directory VA
mov rdi, rax ; IMAGE_IMPORT_DESCRIPTOR
find_kernel32:
mov eax, dword [rdi] ; Check if descriptor is null
test eax, eax
jz not_found
; Get DLL name
mov eax, dword [rdi + 0x0C] ; Name RVA
add rax, rsi
mov r8, rax ; DLL name
mov rcx, 0x32334C454E52454B ; "KERNEL32"
cmp qword [r8], rcx
jne next_descriptor
mov ecx, dword [r8 + 8]
cmp ecx, 0x6C6C642E ; ".dll" (lowercase, bytes 2E 64 6C 6C)
jne next_descriptor
; Found kernel32.dll
xor rbx, rbx
mov ebx, dword [rdi + 0x10] ; FirstThunk RVA
add rbx, rsi ; rbx = IAT
mov eax, dword [rdi + 0x00] ; OriginalFirstThunk RVA (32-bit field, zero-extended into rax)
add rax, rsi ; rax = INT
mov rcx, rbx ; rcx = IAT
mov rdx, rax ; rdx = INT
loop_thunks:
mov r8, [rdx]
test r8, r8
jz not_found
; Check if import by ordinal
mov rax, 0x8000000000000000
test r8, rax
jnz next_thunk
; Get function name
add r8, rsi
add r8, 2
mov r9, r8 ;r9 = function name
;
; Compare with "VirtualProtect"
;
mov rax, 0x506c617574726956
cmp qword [r9], rax
jne next_thunk
mov eax, dword [r9 + 8]
cmp eax, 0x65746f72
jne next_thunk
;
; Found thunk for VirtualProtect
;
mov rax, [rcx] ; Get resolved address of VirtualProtect from IAT
mov gs:[0xF8], rax
jmp iat_parsing_success
next_thunk:
add rcx, 8 ; Next IAT entry
add rdx, 8 ; Next INT entry
jmp loop_thunks
next_descriptor:
add rdi, 0x14 ; Next IMAGE_IMPORT_DESCRIPTOR
jmp find_kernel32
not_found:
xor rax, rax
ret ; Return to NTDLL
;
;
;IAT Parser end
;
;
iat_parsing_success:
;
;Store the value 0x80000001 in TEB -> UserReserved[0] to indicate initialization phase.
;
xor rbx, rbx
mov rbx, 0x80000001
mov gs:[0xE8], rbx
;
;Fetch AddressOfEntryPoint - DWORD offset
;
xor r14, r14
mov r14, 0x12345678 ; Placeholder 0x12345678
find_pe_header:
cmp word [rsi], 0x5A4D ; 'MZ'
jne done
mov edi, [rsi + 0x3C] ; e_lfanew
add rdi, rsi
cmp dword [rdi], 0x00004550 ;'PE\0\0'
jne done
;
; Junk instructions
;
nop
xor eax, eax
inc eax
dec eax
;
; Locate section table
;
movzx ecx, word [rdi + 0x6] ; Number of sections (16-bit field, zero-extended)
xor rbx, rbx
mov bx, [rdi + 0x14] ; Size of optional header
add rbx, rdi
add rbx, 0x18 ; Section table starts here
;
; Junk instructions start
;
push rax
pop rax
mov rax, rax
nop
;
; Junk instructions end
;
section_lookup:
cmp dword [rbx], 0x6E75662E ; ".fun"
jne next_section
;
; Junk instructions start
;
xor r8, r8
test r8, r8
jz .skip1
.skip1:
;
; Junk instructions end
;
cmp dword [rbx + 4], 0x74656D63 ; "cmet"
jne next_section
;
; Custom section .funcmet found
;
mov edx, dword [rbx + 0x0C] ; RVA
add rdx, rsi ; Convert to VA
jmp fetch_metadata
next_section:
add rbx, 0x28 ; IMAGE_SECTION_HEADER size
;
; Junk instructions start
;
xor r9, r9
mov r9, r9
loop section_lookup
;
; Junk instructions end
;
fetch_metadata:
add rdx, 0x8 ; Skip xor-key
mov rbx, rdx
;
; Junk instructions start
;
nop
pushfq
popfq
;
; Junk instructions end
;
;
; Perform initialization - Mask registered functions before execution of Main()
;
initialize_loop:
mov r13, [rbx + FunctionMetaData.FunctionStartAddress]
cmp r13, 0
je done
;
; Junk instructions start
;
xor r10, r10
test r10, r10
jz .skip2
.skip2:
;
; Junk instructions end
;
call r13 ; Call registered function
add rbx, 16 ; Move to next FunctionMetaData entry
jmp initialize_loop ; Repeat until a null FunctionStartAddress is reached
done:
;
; Initialization finished
; Make sure we change the value 0x80000001 in TEB->UserReserved[0] to 0
;
xor eax, eax
mov gs:[0xE8], eax
;
; Junk instructions
;
nop
mov rcx, rcx
push rdx
pop rdx
;
; Junk instructions end
;
;
; Execute original entry point (CRT)
;
add r14, rsi ; AddressOfEntryPoint DWORD offset + ImageBaseAddress
jmp r14
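The magic numbers compared in the stub are ASCII strings read as little-endian integers. As a quick sanity check, the sketch below packs a string the way a cmp against memory sees it and verifies the constants used above. packLE is an illustrative helper, not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Pack up to 8 ASCII characters into a little-endian integer: the first
// character ends up in the lowest byte, as it does in memory on x86-64.
constexpr uint64_t packLE(const char* s) {
    uint64_t v = 0;
    for (int i = 0; i < 8 && s[i] != '\0'; ++i) {
        v |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return v;
}

// Import descriptor name check.
static_assert(packLE("KERNEL32") == 0x32334C454E52454BULL, "KERNEL32");
static_assert(packLE(".dll") == 0x6C6C642EULL, ".dll");
// Import name check: the first 12 characters of "VirtualProtect".
static_assert(packLE("VirtualP") == 0x506C617574726956ULL, "VirtualP");
static_assert(packLE("rote") == 0x65746F72ULL, "rote");
// Section name check: ".funcmeta" truncated to the 8-byte Name field.
static_assert(packLE(".fun") == 0x6E75662EULL, ".fun");
static_assert(packLE("cmet") == 0x74656D63ULL, "cmet");
```

Two consequences follow from the constants themselves: the DLL name comparison is case-sensitive and expects "KERNEL32.dll" exactly, and the import name comparison covers only "VirtualProte", so any other import sharing that prefix and listed earlier would also match.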
LLVM compilation is organized into several phases, each responsible for transforming source code into optimized machine code. Here’s a breakdown of the key phases:
- Front end: parses the source code and lowers it to LLVM IR (.ll or .bc).
- Optimizer: runs analysis and transformation passes over the IR.
- Back end and linking: generate target machine code and produce .o, .obj, .exe, .dll, etc.
For this project, we do not interact with LLVM’s IR-level code, meaning no modifications are required at that stage. Instead, our focus is on attaching a custom prologue and epilogue to the beginning and end of each registered function, respectively, and embedding a handler stub within the .text section. These transformations must occur during the backend phase. To emit the stub code correctly, we will need to modify specific components of LLVM’s code generation infrastructure.
Before diving into the backend modifications, we must first address a critical issue—patching return instructions. Each registered function typically ends with a return instruction, which interferes with our plan to append a custom epilogue stub. To resolve this, we need to remove all return instructions prior to inserting the epilogue during the backend phase. This requires implementing a custom backend pass that scans the body of each registered function, identifies all return instructions, and safely erases them.
To create a custom machine function pass, let’s declare a subclass X86RetModPass that inherits from the MachineFunctionPass class. We need to override the LLVM routine runOnMachineFunction declared in the superclass.
#ifndef LLVM_LIB_TARGET_X86_X86RETMODPASS_H
#define LLVM_LIB_TARGET_X86_X86RETMODPASS_H
#include "llvm/CodeGen/MachineFunctionPass.h"
namespace llvm {
class X86RetModPass : public MachineFunctionPass {
public:
static char ID;
X86RetModPass();
bool runOnMachineFunction(MachineFunction &MF) override;
StringRef getPassName() const override;
};
} // end namespace llvm
#endif // LLVM_LIB_TARGET_X86_X86RETMODPASS_H
Let’s implement runOnMachineFunction as a machine function pass that performs the following tasks:
- Check whether the demangled function name contains REG_; if it does, apply the pass. Otherwise, skip to the next function.
bool X86RetModPass::runOnMachineFunction(MachineFunction &MF) {
const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
MCContext &Ctx = MF.getContext();
// Demangle function name
std::string MangledName = MF.getName().str();
std::string FuncName = llvm::demangle(MangledName);
// Skip transformation if function name doesn't contain "REG_"
if (FuncName.find("REG_") == std::string::npos) {
return false; // Nothing was modified
}
// Find the last RET instruction in the function
MachineInstr *LastRetInstr = nullptr;
for (auto &MBB : llvm::reverse(MF)) {
for (auto &MI : llvm::reverse(MBB)) {
if (MI.isReturn()) {
LastRetInstr = &MI;
break;
}
}
if (LastRetInstr) break;
}
for (auto &MBB : MF) {
for (auto MI = MBB.begin(); MI != MBB.end(); ) {
if (MI->isReturn()) {
DebugLoc DL = MI->getDebugLoc();
if (&*MI == LastRetInstr) {
MI = MBB.erase(MI); // Erase last RET
} else {
MCSymbol *Sym = Ctx.getOrCreateSymbol("handler");
//const MCExpr *Expr = MCSymbolRefExpr::create(Sym, Ctx);
BuildMI(MBB, MI, DL, TII->get(X86::JMP_1)).addSym(Sym);
MI = MBB.erase(MI);
}
} else {
++MI;
}
}
}
return true;
}
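The rewriting rule is easier to see without the LLVM types. The toy model below represents a function as blocks of mnemonic strings and applies the same policy: the last return in layout order is erased so that execution falls through into the appended epilogue stub, and every earlier return becomes a jump to the handler. Block and rewriteReturns are illustrative names, and nothing here uses the real MachineFunction API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine basic block: one mnemonic per instruction.
using Block = std::vector<std::string>;

// Erase the final "ret" and turn any earlier "ret" into "jmp handler".
// Returns true only if the function was changed.
inline bool rewriteReturns(std::vector<Block>& function) {
    bool found = false;
    std::size_t lastBlock = 0, lastIndex = 0;
    // Locate the last return in layout order.
    for (std::size_t b = 0; b < function.size(); ++b) {
        for (std::size_t i = 0; i < function[b].size(); ++i) {
            if (function[b][i] == "ret") {
                found = true;
                lastBlock = b;
                lastIndex = i;
            }
        }
    }
    if (!found) {
        return false;
    }
    for (std::size_t b = 0; b < function.size(); ++b) {
        Block& block = function[b];
        for (std::size_t i = 0; i < block.size();) {
            if (block[i] != "ret") {
                ++i;
            } else if (b == lastBlock && i == lastIndex) {
                block.erase(block.begin() + static_cast<std::ptrdiff_t>(i));
            } else {
                block[i] = "jmp handler";
                ++i;
            }
        }
    }
    return true;
}
```

Returning false when nothing changed mirrors the MachineFunctionPass convention, where the return value tells the pass manager whether the function was modified.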
Registering a pass is the process of informing LLVM about our custom transformation so it can be integrated into the compilation pipeline. As discussed earlier, it’s crucial to choose the appropriate phase for registration. Since our work does not involve IR-level transformations, we want LLVM to execute our pass during the Pre-Emit phase. This phase is ideal for performing low-level code transformations just before the machine instructions are emitted for the target architecture.
The LLVM X86 target provides several hook points for injecting custom passes, as outlined below. For our use case, we utilize the addPreEmitPass() hook to register our X86RetModPass, ensuring it runs just before machine code emission.
//source--> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86TargetMachine.cpp
void addIRPasses() override;
bool addInstSelector() override;
bool addIRTranslator() override;
bool addLegalizeMachineIR() override;
bool addRegBankSelect() override;
bool addGlobalInstructionSelect() override;
bool addILPOpts() override;
bool addPreISel() override;
void addMachineSSAOptimization() override;
void addPreRegAlloc() override;
bool addPostFastRegAllocRewrite() override;
void addPostRegAlloc() override;
void addPreEmitPass() override;
void addPreEmitPass2() override;
void addPreSched2() override;
bool addRegAssignAndRewriteOptimized() override;
To register our pass, pass a new instance of the class to LLVM’s addPass() function, as outlined below.
void X86PassConfig::addPreEmitPass()
{
/*
DO NOT Modify existing code here in this function, only append your code!
*/
addPass(new X86RetModPass());
}
The MC Layer (Machine Code Layer) in LLVM is a critical part of the backend responsible for emitting machine code, assembly, and object files. It acts as the final stage in the compilation pipeline, translating MachineInstr representations into actual binary or textual output.
Key Responsibilities of the MC Layer:
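- Instruction encoding: converts MachineInstr into binary opcodes and uses MCCodeEmitter to encode instructions for the target architecture.
- Assembly emission: produces textual assembly through AsmPrinter.
- Object file emission: uses MCObjectStreamer to write .o or .obj files.
- Section and symbol management: handles .text, .data, .bss, and custom sections using MCSection, MCSymbol, and MCContext.
- Debug information: emits .debug_* sections and symbol annotations.
Core Components in the MC Layer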
A solid understanding of the various components within LLVM’s MC Layer is essential if you intend to manipulate the code generation process effectively. These components form the backbone of instruction encoding, section management, and final output emission, making them critical for any low-level backend customization.
| Component | Role |
|---|---|
| MCStreamer | Abstract interface for emitting code (assembly or object). |
| MCObjectStreamer | Emits object files using target-specific formats (ELF, COFF, Mach-O). |
| MCAsmStreamer | Emits textual assembly output. |
| MCCodeEmitter | Encodes instructions into binary form. |
| MCInst | Target-independent representation of a machine instruction. |
| MCContext | Manages symbols, sections, and other state. |
| MCSection | Represents a section in the output file. |
| MCSymbol | Represents labels and symbols in code. |
| AsmPrinter | Bridges MachineInstr and MCInst, emits assembly or object code. |
To accomplish our goal, we will modify X86AsmPrinter, a subclass of the AsmPrinter component, so that each registered function receives a custom prologue and epilogue. Additionally, we will inject a handler stub into the .text section. These modifications leverage X86AsmPrinter’s role as the bridge between MachineInstr and the MC Layer, allowing us to control how instructions and auxiliary code are emitted during the final stages of code generation.
Before proceeding, it’s important to clearly delineate the responsibilities between our custom prologue/epilogue code and LLVM’s built-in infrastructure. Specifically, we need to decide which parts of the code generation process will be handled by our implementation, and which aspects will rely on support from LLVM’s AsmPrinter. This separation ensures that our custom logic—such as injecting prologues, epilogues, and handler stubs—is integrated seamlessly with LLVM’s existing emission pipeline.
The commented-out instructions will be dynamically generated by the X86AsmPrinter. The remaining code, however, must be explicitly provided by us and passed to X86AsmPrinter for emission.
Modified Prologue
; .\nasm.exe -f bin -o .\out.bin prologue.asm (Windows)
BITS 64
prologue:
call get_rip
get_rip:
pop rax
push rcx
push rdx
lea rcx, [rel get_rip]
lea rdx, [rel prologue]
sub rcx, rdx
sub rax, rcx
pop rdx
pop rcx
mov gs:[0xF0], rax
mov r10, gs:[0xE8]
xor rax, rax
mov rax, 0x80000001
cmp r10, rax
;
; LLVM will emit below instruction
;
;je epilogue
;call handler
Modified Epilogue
; .\nasm.exe -f bin -o .\out.bin epilogue.asm (Windows)
BITS 64
epilog:
mov rdx, gs:[0xE8]
xor rcx, rcx
mov rcx, 0x80000001
cmp rdx, rcx
;
; LLVM will emit below instructions
;
;je init_stage
;jmp handler
;init_stage:
;call handler
;ret
The X86AsmPrinter class is a subclass of LLVM’s AsmPrinter, and it exposes several key functions that can be customized. In our case, we will modify three of these functions to enable X86AsmPrinter to emit a custom prologue and epilogue for each registered function, as well as inject our handler stub into the .text section during code generation.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86AsmPrinter.cpp
void emitFunctionBodyStart() override;
void emitFunctionBodyEnd() override;
void emitEndOfAsmFile(Module &M) override;
Modifying emitFunctionBodyStart and emitFunctionBodyEnd
During the code generation process, LLVM invokes emitFunctionBodyStart for each function in the compilation unit to emit the beginning of its machine-level representation. This makes it an ideal insertion point for our custom prologue code, allowing us to seamlessly integrate additional logic at the start of registered functions.
void X86AsmPrinter::emitFunctionBodyStart() {
//Our custom code starts here..
llvm::StringRef Mangled = CurrentFnSym->getName();
std::string Demangled = llvm::demangle(Mangled.str());
llvm::errs() << Demangled << "\n";
if(Demangled.find("REG_") != std::string::npos )
{
/* Prologue stub
0: e8 00 00 00 00 call 0x5
5: 58 pop rax
6: 51 push rcx
7: 52 push rdx
8: 48 8d 0d f6 ff ff ff lea rcx,[rip+0xfffffffffffffff6] # 0x5
f: 48 8d 15 ea ff ff ff lea rdx,[rip+0xffffffffffffffea] # 0x0
16: 48 29 d1 sub rcx,rdx
19: 48 29 c8 sub rax,rcx
1c: 5a pop rdx
1d: 59 pop rcx
1e: 65 48 89 04 25 f0 00 mov QWORD PTR gs:0xf0,rax
25: 00 00
27: 65 4c 8b 14 25 e8 00 mov r10,QWORD PTR gs:0xe8
2e: 00 00
30: 48 31 c0 xor rax,rax
33: b8 01 00 00 80 mov eax,0x80000001
38: 49 39 c2 cmp r10,rax
<rest emitted by LLVM>
*/
static const uint8_t PrologueStub[] = {
0xE8, 0x00, 0x00, 0x00, 0x00, 0x58, 0x51, 0x52, 0x48, 0x8D, 0x0D, 0xF6, 0xFF, 0xFF, 0xFF,
0x48, 0x8D, 0x15, 0xEA, 0xFF, 0xFF, 0xFF, 0x48, 0x29, 0xD1, 0x48, 0x29, 0xC8, 0x5A, 0x59,
0x65, 0x48, 0x89, 0x04, 0x25, 0xF0, 0x00, 0x00, 0x00, 0x65, 0x4C, 0x8B, 0x14, 0x25, 0xE8,
0x00, 0x00, 0x00, 0x48, 0x31, 0xC0, 0xB8, 0x01, 0x00, 0x00, 0x80, 0x49, 0x39, 0xC2
};
for (uint8_t Byte : PrologueStub) {
OutStreamer->emitIntValue(Byte, 1);
}
/*
Rest of the prologue code is emitted here by LLVM
je epilogue
call handler
*/
EpilogueStubSymbol = OutContext.getOrCreateSymbol(Twine(CurrentFnSym->getName()) +"_epilogue_stub");
MCSymbol *HandlerSym = OutContext.getOrCreateSymbol("handler");
MCSymbol *AfterJE = OutContext.createTempSymbol();
MCInst CallInst;
CallInst.setOpcode(X86::CALL64pcrel32);
CallInst.addOperand(MCOperand::createExpr(MCSymbolRefExpr::create(HandlerSym, OutContext)));
// je epilogue
OutStreamer->emitBytes("\x0F\x84");
// Emit 4-byte displacement placeholder for epilogue_stub start address
const MCExpr *RelExpr = MCBinaryExpr::createSub(
MCSymbolRefExpr::create(EpilogueStubSymbol, OutContext),
MCSymbolRefExpr::create(AfterJE, OutContext),
OutContext
);
OutStreamer->emitValue(RelExpr, 4);
OutStreamer->emitLabel(AfterJE);
//call handler
OutStreamer->emitInstruction(CallInst, *TM.getMCSubtargetInfo());
}
//Our custom code ends here..
if (EmitFPOData) {
auto *XTS =
static_cast<X86TargetStreamer *>(OutStreamer->getTargetStreamer());
XTS->emitFPOProc(
CurrentFnSym,
MF->getInfo<X86MachineFunctionInfo>()->getArgumentStackSize());
}
}
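The handler skips a fixed 0x46 bytes before masking, so the emitted prologue must have exactly that length. The check below is a minimal sketch that recomputes the figure from the stub bytes plus the two instructions LLVM appends. The byte array is copied from the listing above; the constant and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte sequence copied from PrologueStub[] in emitFunctionBodyStart().
static const uint8_t PrologueStub[] = {
    0xE8, 0x00, 0x00, 0x00, 0x00, 0x58, 0x51, 0x52, 0x48, 0x8D, 0x0D, 0xF6, 0xFF, 0xFF, 0xFF,
    0x48, 0x8D, 0x15, 0xEA, 0xFF, 0xFF, 0xFF, 0x48, 0x29, 0xD1, 0x48, 0x29, 0xC8, 0x5A, 0x59,
    0x65, 0x48, 0x89, 0x04, 0x25, 0xF0, 0x00, 0x00, 0x00, 0x65, 0x4C, 0x8B, 0x14, 0x25, 0xE8,
    0x00, 0x00, 0x00, 0x48, 0x31, 0xC0, 0xB8, 0x01, 0x00, 0x00, 0x80, 0x49, 0x39, 0xC2
};

constexpr std::size_t kJeRel32Size   = 6;  // 0F 84 followed by a 4-byte displacement
constexpr std::size_t kCallRel32Size = 5;  // E8 followed by a 4-byte displacement

// Total number of bytes in front of the original function body.
constexpr std::size_t prologueLength() {
    return sizeof(PrologueStub) + kJeRel32Size + kCallRel32Size;
}

static_assert(sizeof(PrologueStub) == 0x3B, "stub is 0x3B bytes");
static_assert(prologueLength() == 0x46, "matches the handler's add r8, 0x46");
```

If the stub is ever reassembled with different instructions, the constant in the handler has to change with it, so a compile-time check of this kind can catch the mismatch early.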
Emitting je epilogue and call handler instructions
- PrologueStub[] contains the assembled position-independent code (PIC) stub generated from the modified prologue code discussed earlier.
- The je in the prologue must target the epilogue stub, which is only emitted later in X86AsmPrinter::emitFunctionBodyEnd(), a separate event in the compilation pipeline. To resolve this, we can define a shared symbol, EpilogueStubSymbol, which can be referenced by both emitFunctionBodyStart() and emitFunctionBodyEnd() during function emission. This approach requires updating the X86AsmPrinter class definition to ensure the symbol is properly declared and accessible across both emission stages.
private:
MCSymbol *EpilogueStubSymbol = nullptr;
- Next, we emit a je (jump if equal) instruction that correctly targets the epilogue stub. Since this form of je takes a 32-bit displacement relative to the address of the next instruction, we need to compute the offset to EpilogueStubSymbol. Our approach involves emitting the raw je opcode (\x0F\x84) using the emitBytes() method. We then calculate the delta between the label placed immediately after the je instruction (referred to as AfterJE) and the address of EpilogueStubSymbol, storing this offset in RelExpr. Finally, we emit the RelExpr value, which resolves to the correct relative offset, and call emitLabel(AfterJE), ensuring the je instruction correctly jumps to the epilogue stub.
MCSymbol *AfterJE = OutContext.createTempSymbol();
//je <ip relative offset to epilogue stub>
OutStreamer->emitBytes("\x0F\x84");
const MCExpr *RelExpr = MCBinaryExpr::createSub(
MCSymbolRefExpr::create(EpilogueStubSymbol, OutContext),
MCSymbolRefExpr::create(AfterJE, OutContext),
OutContext
);
OutStreamer->emitValue(RelExpr, 4);
OutStreamer->emitLabel(AfterJE);
- Finally, we emit a call to HandlerSym, which will later be emitted as a label in the emitEndOfAsmFile() method and associated with the handler stub. To represent the call instruction, we instantiate an MCInst object named CallInst. We then configure it by invoking setOpcode() to specify the call operation, followed by addOperand() to add the target operand, our handler symbol. This effectively constructs a call to the handler.
MCSymbol *HandlerSym = OutContext.getOrCreateSymbol("handler");
MCInst CallInst;
CallInst.setOpcode(X86::CALL64pcrel32);
CallInst.addOperand(MCOperand::createExpr(MCSymbolRefExpr::create(HandlerSym, OutContext)));
OutStreamer->emitInstruction(CallInst, *TM.getMCSubtargetInfo());
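The displacement arithmetic behind the hand-emitted je can be checked in isolation. The sketch below encodes a near je for given addresses, using the same rule as RelExpr: the displacement is the target minus the address of the byte following the instruction. encodeJe is a hypothetical helper for illustration; it does not verify that the target is within the signed 32-bit range.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode `je target` for an instruction located at `address`.
// Layout: 0F 84 followed by a little-endian 32-bit displacement.
inline std::array<uint8_t, 6> encodeJe(uint64_t address, uint64_t target) {
    const uint64_t next = address + 6;  // plays the role of the AfterJE label
    // Unsigned wrap-around yields the two's complement value for backward jumps.
    const uint32_t rel = static_cast<uint32_t>(target - next);
    std::array<uint8_t, 6> bytes = {{
        0x0F, 0x84,
        static_cast<uint8_t>(rel),
        static_cast<uint8_t>(rel >> 8),
        static_cast<uint8_t>(rel >> 16),
        static_cast<uint8_t>(rel >> 24)
    }};
    return bytes;
}
```

In the AsmPrinter code the subtraction is expressed symbolically, so the assembler resolves it once both labels have known offsets in the section.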
We will apply the same strategy described above to emit the epilogue code, as demonstrated below. The EpilogueStub[] contains the assembled position-independent code (PIC) stub generated from the modified epilogue code discussed earlier.
void X86AsmPrinter::emitFunctionBodyEnd() {
llvm::StringRef Mangled = CurrentFnSym->getName();
std::string Demangled = llvm::demangle(Mangled.str());
//llvm::errs() << Demangled << "\n";
if(Demangled.find("REG_") != std::string::npos )
{
MCSymbol *InitStageSym = OutContext.getOrCreateSymbol(Twine(CurrentFnSym->getName()) + "init_stage");
MCSymbol *AfterJE = OutContext.createTempSymbol(); // Marks address after full JE instruction
MCSymbol *HandlerSym = OutContext.getOrCreateSymbol("handler");
MCInst JumpInst;
JumpInst.setOpcode(X86::JMP_1);
const MCExpr *TargetExpr = MCSymbolRefExpr::create(HandlerSym, OutContext);
JumpInst.addOperand(MCOperand::createExpr(TargetExpr));
MCInst CallInst;
CallInst.setOpcode(X86::CALL64pcrel32);
CallInst.addOperand(MCOperand::createExpr(MCSymbolRefExpr::create(HandlerSym, OutContext)));
/*
Epilogue stub
0: 65 48 8b 14 25 e8 00 mov rdx,QWORD PTR gs:0xe8
7: 00 00
9: 48 31 c9 xor rcx,rcx
c: b9 01 00 00 80 mov ecx,0x80000001
11: 48 39 ca cmp rdx,rcx
<rest emitted by llvm>
*/
OutStreamer->emitLabel(EpilogueStubSymbol);
static const uint8_t EpilogueStub[] = {
0x65, 0x48, 0x8B, 0x14, 0x25, 0xE8, 0x00, 0x00, 0x00,
0x48, 0x31, 0xC9, 0xB9, 0x01, 0x00, 0x00, 0x80,
0x48, 0x39, 0xCA
};
for (uint8_t Byte : EpilogueStub) {
OutStreamer->emitIntValue(Byte, 1);
}
/*
Rest of the epilogue code is emitted here by LLVM
je init_stage
jmp handler
init_stage:
call handler
ret
*/
// Emit JE init_stage
OutStreamer->emitBytes("\x0F\x84");
// Emit the 4-byte displacement to init_stage, relative to the address after the JE
const MCExpr *RelExpr = MCBinaryExpr::createSub(
MCSymbolRefExpr::create(InitStageSym, OutContext),
MCSymbolRefExpr::create(AfterJE, OutContext),
OutContext
);
OutStreamer->emitValue(RelExpr, 4);
OutStreamer->emitLabel(AfterJE);
//jump handler
OutStreamer->emitInstruction(JumpInst, *TM.getMCSubtargetInfo());
// init_stage:
OutStreamer->emitLabel(InitStageSym);
//call handler
OutStreamer->emitInstruction(CallInst, *TM.getMCSubtargetInfo());
//ret
OutStreamer->emitBytes("\xC3");
}
if (EmitFPOData) {
auto *XTS =
static_cast<X86TargetStreamer *>(OutStreamer->getTargetStreamer());
XTS->emitFPOEndProc();
}
}
Modifying emitEndOfAsmFile
Finally, we need to emit the assembly stub for the handler logic, as previously discussed. This includes emitting the handler label to ensure that all call and jump instructions correctly transfer control to the handler stub. The symbol is emitted using emitLabel(StubSym), and the handler’s position-independent code stub is emitted byte-by-byte using emitIntValue(Byte, 1).
void X86AsmPrinter::emitEndOfAsmFile(Module &M)
{
OutStreamer->switchSection(getObjFileLowering().getTextSection());
MCSymbol *StubSym = OutContext.getOrCreateSymbol("handler");
OutStreamer->emitLabel(StubSym);
/*
0: 51 push rcx
1: 52 push rdx
2: 41 50 push r8
4: 41 51 push r9
6: 50 push rax
7: 53 push rbx
8: 56 push rsi
9: 57 push rdi
a: 55 push rbp
b: 41 54 push r12
d: 41 55 push r13
f: 41 56 push r14
11: 41 57 push r15
13: 65 48 8b 04 25 60 00 mov rax,QWORD PTR gs:0x60
1a: 00 00
1c: 48 8b 70 10 mov rsi,QWORD PTR [rax+0x10]
20: 65 48 8b 04 25 f0 00 mov rax,QWORD PTR gs:0xf0
27: 00 00
29: 66 81 3e 4d 5a cmp WORD PTR [rsi],0x5a4d
2e: 0f 85 24 01 00 00 jne 0x158
34: 8b 7e 3c mov edi,DWORD PTR [rsi+0x3c]
37: 48 01 f7 add rdi,rsi
3a: 81 3f 50 45 00 00 cmp DWORD PTR [rdi],0x4550
40: 0f 85 12 01 00 00 jne 0x158
46: 8b 4f 06 mov ecx,DWORD PTR [rdi+0x6]
49: 48 31 db xor rbx,rbx
4c: 66 8b 5f 14 mov bx,WORD PTR [rdi+0x14]
50: 48 01 fb add rbx,rdi
53: 48 83 c3 18 add rbx,0x18
57: 81 3b 2e 66 75 6e cmp DWORD PTR [rbx],0x6e75662e
5d: 75 11 jne 0x70
5f: 81 7b 04 63 6d 65 74 cmp DWORD PTR [rbx+0x4],0x74656d63
66: 75 08 jne 0x70
68: 8b 53 0c mov edx,DWORD PTR [rbx+0xc]
6b: 48 01 f2 add rdx,rsi
6e: eb 1f jmp 0x8f
70: 48 83 c3 28 add rbx,0x28
74: e2 e1 loop 0x57
76: 4d 31 c9 xor r9,r9
79: 48 89 e1 mov rcx,rsp
7c: 48 83 c1 68 add rcx,0x68
80: 4c 8b 09 mov r9,QWORD PTR [rcx]
83: 49 29 c1 sub r9,rax
86: 4c 89 4b 08 mov QWORD PTR [rbx+0x8],r9
8a: 4c 89 ca mov rdx,r9
8d: eb 48 jmp 0xd7
8f: 4d 31 c9 xor r9,r9
92: 4d 31 d2 xor r10,r10
95: 44 8b 0a mov r9d,DWORD PTR [rdx]
98: 48 83 c2 08 add rdx,0x8
9c: 48 89 d3 mov rbx,rdx
9f: 4c 8b 03 mov r8,QWORD PTR [rbx]
a2: 49 83 f8 00 cmp r8,0x0
a6: 0f 84 ac 00 00 00 je 0x158
ac: 48 8b 53 08 mov rdx,QWORD PTR [rbx+0x8]
b0: 48 31 f6 xor rsi,rsi
b3: 4c 89 ce mov rsi,r9
b6: 4c 39 c0 cmp rax,r8
b9: 74 06 je 0xc1
bb: 48 83 c3 10 add rbx,0x10
bf: eb de jmp 0x9f
c1: 65 4c 8b 14 25 e8 00 mov r10,QWORD PTR gs:0xe8
c8: 00 00
ca: 48 31 c9 xor rcx,rcx
cd: b9 01 00 00 80 mov ecx,0x80000001
d2: 49 39 ca cmp r10,rcx
d5: 74 9f je 0x76
d7: 49 83 c0 46 add r8,0x46
db: 48 83 ea 46 sub rdx,0x46
df: 48 85 d2 test rdx,rdx
e2: 74 74 je 0x158
e4: 48 31 db xor rbx,rbx
e7: 65 4c 8b 1c 25 f8 00 mov r11,QWORD PTR gs:0xf8
ee: 00 00
f0: 52 push rdx
f1: 41 50 push r8
f3: 4c 89 c1 mov rcx,r8
f6: 41 b8 40 00 00 00 mov r8d,0x40
fc: 48 83 ec 08 sub rsp,0x8
100: 48 c7 04 24 00 00 00 mov QWORD PTR [rsp],0x0
107: 00
108: 49 89 e1 mov r9,rsp
10b: 48 83 ec 20 sub rsp,0x20
10f: 41 ff d3 call r11
112: 48 83 c4 28 add rsp,0x28
116: 41 58 pop r8
118: 5a pop rdx
119: 41 50 push r8
11b: 52 push rdx
11c: 41 8a 18 mov bl,BYTE PTR [r8]
11f: 40 30 f3 xor bl,sil
122: 41 88 18 mov BYTE PTR [r8],bl
125: 49 ff c0 inc r8
128: 48 ff ca dec rdx
12b: 75 ef jne 0x11c
12d: 5a pop rdx
12e: 59 pop rcx
12f: 41 b8 20 00 00 00 mov r8d,0x20
135: 48 83 ec 08 sub rsp,0x8
139: 48 c7 04 24 00 00 00 mov QWORD PTR [rsp],0x0
140: 00
141: 49 89 e1 mov r9,rsp
144: 48 83 ec 20 sub rsp,0x20
148: 65 4c 8b 1c 25 f8 00 mov r11,QWORD PTR gs:0xf8
14f: 00 00
151: 41 ff d3 call r11
154: 48 83 c4 28 add rsp,0x28
158: 41 5f pop r15
15a: 41 5e pop r14
15c: 41 5d pop r13
15e: 41 5c pop r12
160: 5d pop rbp
161: 5f pop rdi
162: 5e pop rsi
163: 5b pop rbx
164: 58 pop rax
165: 41 59 pop r9
167: 41 58 pop r8
169: 5a pop rdx
16a: 59 pop rcx
16b: c3 ret
*/
uint8_t HandlerStub[] = {
0x51, 0x52, 0x41, 0x50, 0x41, 0x51, 0x50, 0x53, 0x56, 0x57, 0x55, 0x41, 0x54, 0x41, 0x55, 0x41, 0x56, 0x41, 0x57,
0x65, 0x48, 0x8B, 0x04, 0x25, 0x60, 0x00, 0x00, 0x00, 0x48, 0x8B, 0x70, 0x10, 0x65, 0x48, 0x8B, 0x04, 0x25, 0xF0,
0x00, 0x00, 0x00, 0x66, 0x81, 0x3E, 0x4D, 0x5A, 0x0F, 0x85, 0x24, 0x01, 0x00, 0x00, 0x8B, 0x7E, 0x3C, 0x48, 0x01,
0xF7, 0x81, 0x3F, 0x50, 0x45, 0x00, 0x00, 0x0F, 0x85, 0x12, 0x01, 0x00, 0x00, 0x8B, 0x4F, 0x06, 0x48, 0x31, 0xDB,
0x66, 0x8B, 0x5F, 0x14, 0x48, 0x01, 0xFB, 0x48, 0x83, 0xC3, 0x18, 0x81, 0x3B, 0x2E, 0x66, 0x75, 0x6E, 0x75, 0x11,
0x81, 0x7B, 0x04, 0x63, 0x6D, 0x65, 0x74, 0x75, 0x08, 0x8B, 0x53, 0x0C, 0x48, 0x01, 0xF2, 0xEB, 0x1F, 0x48, 0x83,
0xC3, 0x28, 0xE2, 0xE1, 0x4D, 0x31, 0xC9, 0x48, 0x89, 0xE1, 0x48, 0x83, 0xC1, 0x68, 0x4C, 0x8B, 0x09, 0x49, 0x29,
0xC1, 0x4C, 0x89, 0x4B, 0x08, 0x4C, 0x89, 0xCA, 0xEB, 0x48, 0x4D, 0x31, 0xC9, 0x4D, 0x31, 0xD2, 0x44, 0x8B, 0x0A,
0x48, 0x83, 0xC2, 0x08, 0x48, 0x89, 0xD3, 0x4C, 0x8B, 0x03, 0x49, 0x83, 0xF8, 0x00, 0x0F, 0x84, 0xAC, 0x00, 0x00,
0x00, 0x48, 0x8B, 0x53, 0x08, 0x48, 0x31, 0xF6, 0x4C, 0x89, 0xCE, 0x4C, 0x39, 0xC0, 0x74, 0x06, 0x48, 0x83, 0xC3,
0x10, 0xEB, 0xDE, 0x65, 0x4C, 0x8B, 0x14, 0x25, 0xE8, 0x00, 0x00, 0x00, 0x48, 0x31, 0xC9, 0xB9, 0x01, 0x00, 0x00,
0x80, 0x49, 0x39, 0xCA, 0x74, 0x9F, 0x49, 0x83, 0xC0, 0x46, 0x48, 0x83, 0xEA, 0x46, 0x48, 0x85, 0xD2, 0x74, 0x74,
0x48, 0x31, 0xDB, 0x65, 0x4C, 0x8B, 0x1C, 0x25, 0xF8, 0x00, 0x00, 0x00, 0x52, 0x41, 0x50, 0x4C, 0x89, 0xC1, 0x41,
0xB8, 0x40, 0x00, 0x00, 0x00, 0x48, 0x83, 0xEC, 0x08, 0x48, 0xC7, 0x04, 0x24, 0x00, 0x00, 0x00, 0x00, 0x49, 0x89,
0xE1, 0x48, 0x83, 0xEC, 0x20, 0x41, 0xFF, 0xD3, 0x48, 0x83, 0xC4, 0x28, 0x41, 0x58, 0x5A, 0x41, 0x50, 0x52, 0x41,
0x8A, 0x18, 0x40, 0x30, 0xF3, 0x41, 0x88, 0x18, 0x49, 0xFF, 0xC0, 0x48, 0xFF, 0xCA, 0x75, 0xEF, 0x5A, 0x59, 0x41,
0xB8, 0x20, 0x00, 0x00, 0x00, 0x48, 0x83, 0xEC, 0x08, 0x48, 0xC7, 0x04, 0x24, 0x00, 0x00, 0x00, 0x00, 0x49, 0x89,
0xE1, 0x48, 0x83, 0xEC, 0x20, 0x65, 0x4C, 0x8B, 0x1C, 0x25, 0xF8, 0x00, 0x00, 0x00, 0x41, 0xFF, 0xD3, 0x48, 0x83,
0xC4, 0x28, 0x41, 0x5F, 0x41, 0x5E, 0x41, 0x5D, 0x41, 0x5C, 0x5D, 0x5F, 0x5E, 0x5B, 0x58, 0x41, 0x59, 0x41, 0x58,
0x5A, 0x59, 0xC3
};
for (uint8_t Byte : HandlerStub) {
OutStreamer->emitIntValue(Byte, 1);
}
}
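Reading the listing above, the core of the handler is a byte-wise XOR loop (offsets 0x11c to 0x12b) that skips the first 0x46 bytes of the function and uses only the low byte of the key stored at the start of .funcmeta. Because XOR is its own inverse, the same loop both masks and unmasks. Below is a minimal Python model of that loop, assuming this reading of the disassembly is right; toggle_mask and PROLOGUE_SKIP are illustrative names that do not appear in the original code.

```python
PROLOGUE_SKIP = 0x46  # bytes left untouched at the start of each function


def toggle_mask(code: bytes, key: int, skip: int = PROLOGUE_SKIP) -> bytes:
    """XOR every byte after the first `skip` bytes with the low byte of `key`."""
    key_byte = key & 0xFF  # the loop XORs with `sil`, i.e. only the low 8 bits
    if len(code) <= skip:
        return bytes(code)  # nothing to mask (a choice made for this sketch)
    head, body = code[:skip], code[skip:]
    return head + bytes(b ^ key_byte for b in body)


original = bytes(range(0x60))
masked = toggle_mask(original, 0x12345678)
assert masked[:PROLOGUE_SKIP] == original[:PROLOGUE_SKIP]  # prologue stays readable
assert masked != original                                  # body is changed
assert toggle_mask(masked, 0x12345678) == original         # second pass restores it
```

In the real stub the loop runs between two calls through the pointer saved at gs:0xf8, first with protection value 0x40 and then with 0x20, so the code pages are writable only while the bytes are being toggled.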
We inject a new section named .stub into the final PE file generated by our custom LLVM-based Clang++ compiler.
For convenience, this is done externally using a Python script. As discussed in the Redirecting Entrypoint and Initialization section of this post, the .stub section embeds the assembly code responsible for handling entry point redirection and pre-CRT execution logic.
The Python script shown below demonstrates how to create a new section and embed position-independent shellcode into it. The final binary generated by the script has all components seamlessly integrated and is fully prepared for execution.
import pefile
import struct
import mmap
import argparse
def add_section_and_modify_entry(pe_path, shellcode, output_path):
    pe = pefile.PE(pe_path)
    # Patch shellcode with original entry point
    ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint
    ep_little_endian = struct.pack("<I", ep)
    print(ep_little_endian.hex())
    placeholder = b"\x78\x56\x34\x12"
    modified_stub = shellcode.replace(placeholder, ep_little_endian)
    escaped = ''.join(f'\\x{b:02x}' for b in modified_stub)
    print(escaped)
    # Section setup
    new_section_name = b'.stub\x00\x00\x00'
    new_section_size = len(modified_stub)
    file_alignment = pe.OPTIONAL_HEADER.FileAlignment
    section_alignment = pe.OPTIONAL_HEADER.SectionAlignment
    aligned_raw_size = (new_section_size + file_alignment - 1) & ~(file_alignment - 1)
    aligned_virtual_size = (new_section_size + section_alignment - 1) & ~(section_alignment - 1)
    # Calculate safe placement
    last_raw_end = max(s.PointerToRawData + s.SizeOfRawData for s in pe.sections)
    last_virtual_end = max(s.VirtualAddress + s.Misc_VirtualSize for s in pe.sections)
    new_section_raw_address = (last_raw_end + file_alignment - 1) & ~(file_alignment - 1)
    new_section_virtual_address = (last_virtual_end + section_alignment - 1) & ~(section_alignment - 1)
    # Ensure raw data doesn't overwrite headers
    if new_section_raw_address < pe.OPTIONAL_HEADER.SizeOfHeaders:
        raise RuntimeError("New section raw data would overwrite PE headers.")
    # Ensure there's space for another section header.
    # The section table starts after the PE signature (4 bytes), the file header
    # (20 bytes) and the optional header, whose size differs between PE32 and PE32+.
    section_table_offset = pe.DOS_HEADER.e_lfanew + 4 + 20 + pe.FILE_HEADER.SizeOfOptionalHeader
    max_section_headers = (pe.OPTIONAL_HEADER.SizeOfHeaders - section_table_offset) // 40
    if pe.FILE_HEADER.NumberOfSections >= max_section_headers:
        raise RuntimeError("Not enough space in PE header for new section header.")
    # Create new section header and set its file offset
    new_section = pefile.SectionStructure(pe.__IMAGE_SECTION_HEADER_format__)
    last_section_header_offset = pe.sections[-1].get_file_offset()
    new_section.set_file_offset(last_section_header_offset + 40)
    new_section.Name = new_section_name
    new_section.Misc = new_section.Misc_VirtualSize = aligned_virtual_size
    new_section.VirtualAddress = new_section_virtual_address
    new_section.SizeOfRawData = aligned_raw_size
    new_section.PointerToRawData = new_section_raw_address
    new_section.PointerToRelocations = 0
    new_section.PointerToLinenumbers = 0
    new_section.NumberOfRelocations = 0
    new_section.NumberOfLinenumbers = 0
    new_section.Characteristics = 0x60000020  # Read + Execute + Code
    # Inject section
    pe.__structures__.append(new_section)
    pe.sections.append(new_section)
    # Update headers
    pe.FILE_HEADER.NumberOfSections += 1
    pe.OPTIONAL_HEADER.SizeOfImage = new_section.VirtualAddress + aligned_virtual_size
    pe.OPTIONAL_HEADER.AddressOfEntryPoint = new_section.VirtualAddress
    required_size = new_section_raw_address + aligned_raw_size
    if isinstance(pe.__data__, mmap.mmap):
        pe.__data__ = bytearray(pe.__data__)
    if len(pe.__data__) < required_size:
        pe.__data__.extend(b'\x00' * (required_size - len(pe.__data__)))
    # Write shellcode
    pe.set_bytes_at_offset(new_section_raw_address, modified_stub.ljust(aligned_raw_size, b'\x00'))
    # Save modified PE
    pe.write(output_path)
    print(f"Modified PE saved to {output_path}")
parser = argparse.ArgumentParser()
parser.add_argument("input", help="Path to the input file")
parser.add_argument("output", help="Path to the output file")
args = parser.parse_args()
'''
shellcode_stub
0: 65 48 8b 04 25 60 00 mov rax,QWORD PTR gs:0x60
7: 00 00
9: 48 8b 70 10 mov rsi,QWORD PTR [rax+0x10]
d: 48 89 f3 mov rbx,rsi
10: 8b 43 3c mov eax,DWORD PTR [rbx+0x3c]
13: 48 01 c3 add rbx,rax
16: 48 83 c3 18 add rbx,0x18
1a: 48 89 da mov rdx,rbx
1d: 8b 42 78 mov eax,DWORD PTR [rdx+0x78]
20: 85 c0 test eax,eax
22: 0f 84 a5 00 00 00 je 0xcd
28: 48 01 f0 add rax,rsi
2b: 48 89 c7 mov rdi,rax
2e: 8b 07 mov eax,DWORD PTR [rdi]
30: 85 c0 test eax,eax
32: 0f 84 95 00 00 00 je 0xcd
38: 8b 47 0c mov eax,DWORD PTR [rdi+0xc]
3b: 48 01 f0 add rax,rsi
3e: 49 89 c0 mov r8,rax
41: 48 b9 4b 45 52 4e 45 movabs rcx,0x32334c454e52454b
48: 4c 33 32
4b: 49 39 08 cmp QWORD PTR [r8],rcx
4e: 75 74 jne 0xc4
50: 41 8b 48 08 mov ecx,DWORD PTR [r8+0x8]
54: 81 f9 2e 64 6c 6c cmp ecx,0x6c6c642e
5a: 75 68 jne 0xc4
5c: 48 31 db xor rbx,rbx
5f: 8b 5f 10 mov ebx,DWORD PTR [rdi+0x10]
62: 48 01 f3 add rbx,rsi
65: 48 8b 07 mov rax,QWORD PTR [rdi]
68: 48 01 f0 add rax,rsi
6b: 48 89 d9 mov rcx,rbx
6e: 48 89 c2 mov rdx,rax
71: 4c 8b 02 mov r8,QWORD PTR [rdx]
74: 4d 85 c0 test r8,r8
77: 74 54 je 0xcd
79: 48 b8 00 00 00 00 00 movabs rax,0x8000000000000000
80: 00 00 80
83: 49 85 c0 test r8,rax
86: 75 32 jne 0xba
88: 49 01 f0 add r8,rsi
8b: 49 83 c0 02 add r8,0x2
8f: 4d 89 c1 mov r9,r8
92: 48 b8 56 69 72 74 75 movabs rax,0x506c617574726956
99: 61 6c 50
9c: 49 39 01 cmp QWORD PTR [r9],rax
9f: 75 19 jne 0xba
a1: 41 8b 41 08 mov eax,DWORD PTR [r9+0x8]
a5: 3d 72 6f 74 65 cmp eax,0x65746f72
aa: 75 0e jne 0xba
ac: 48 8b 01 mov rax,QWORD PTR [rcx]
af: 65 48 89 04 25 f8 00 mov QWORD PTR gs:0xf8,rax
b6: 00 00
b8: eb 17 jmp 0xd1
ba: 48 83 c1 08 add rcx,0x8
be: 48 83 c2 08 add rdx,0x8
c2: eb ad jmp 0x71
c4: 48 83 c7 14 add rdi,0x14
c8: e9 61 ff ff ff jmp 0x2e
cd: 48 31 c0 xor rax,rax
d0: c3 ret
d1: 48 31 db xor rbx,rbx
d4: bb 01 00 00 80 mov ebx,0x80000001
d9: 65 48 89 1c 25 e8 00 mov QWORD PTR gs:0xe8,rbx
e0: 00 00
e2: 4d 31 f6 xor r14,r14
e5: 41 be 78 56 34 12 mov r14d,0x12345678
eb: 66 81 3e 4d 5a cmp WORD PTR [rsi],0x5a4d
f0: 75 7d jne 0x16f
f2: 8b 7e 3c mov edi,DWORD PTR [rsi+0x3c]
f5: 48 01 f7 add rdi,rsi
f8: 81 3f 50 45 00 00 cmp DWORD PTR [rdi],0x4550
fe: 75 6f jne 0x16f
100: 90 nop
101: 31 c0 xor eax,eax
103: ff c0 inc eax
105: ff c8 dec eax
107: 8b 4f 06 mov ecx,DWORD PTR [rdi+0x6]
10a: 48 31 db xor rbx,rbx
10d: 66 8b 5f 14 mov bx,WORD PTR [rdi+0x14]
111: 48 01 fb add rbx,rdi
114: 48 83 c3 18 add rbx,0x18
118: 50 push rax
119: 58 pop rax
11a: 48 89 c0 mov rax,rax
11d: 90 nop
11e: 81 3b 2e 66 75 6e cmp DWORD PTR [rbx],0x6e75662e
124: 75 19 jne 0x13f
126: 4d 31 c0 xor r8,r8
129: 4d 85 c0 test r8,r8
12c: 74 00 je 0x12e
12e: 81 7b 04 63 6d 65 74 cmp DWORD PTR [rbx+0x4],0x74656d63
135: 75 08 jne 0x13f
137: 8b 53 0c mov edx,DWORD PTR [rbx+0xc]
13a: 48 01 f2 add rdx,rsi
13d: eb 0c jmp 0x14b
13f: 48 83 c3 28 add rbx,0x28
143: 4d 31 c9 xor r9,r9
146: 4d 89 c9 mov r9,r9
149: e2 d3 loop 0x11e
14b: 48 83 c2 08 add rdx,0x8
14f: 48 89 d3 mov rbx,rdx
152: 90 nop
153: 9c pushf
154: 9d popf
155: 4c 8b 2b mov r13,QWORD PTR [rbx]
158: 49 83 fd 00 cmp r13,0x0
15c: 74 11 je 0x16f
15e: 4d 31 d2 xor r10,r10
161: 4d 85 d2 test r10,r10
164: 74 00 je 0x166
166: 41 ff d5 call r13
169: 48 83 c3 10 add rbx,0x10
16d: e2 e6 loop 0x155
16f: 31 c0 xor eax,eax
171: 65 89 04 25 e8 00 00 mov DWORD PTR gs:0xe8,eax
178: 00
179: 90 nop
17a: 48 89 c9 mov rcx,rcx
17d: 52 push rdx
17e: 5a pop rdx
17f: 49 01 f6 add r14,rsi
182: 41 ff e6 jmp r14
'''
shellcode_stub =b"\x65\x48\x8B\x04\x25\x60\x00\x00\x00\x48\x8B\x70\x10\x48\x89\xF3\x8B\x43\x3C\x48\x01\xC3\x48\x83\xC3\x18\x48\x89\xDA\x8B\x42\x78\x85\xC0\x0F\x84\xA5\x00\x00\x00\x48\x01\xF0\x48\x89\xC7\x8B\x07\x85\xC0\x0F\x84\x95\x00\x00\x00\x8B\x47\x0C\x48\x01\xF0\x49\x89\xC0\x48\xB9\x4B\x45\x52\x4E\x45\x4C\x33\x32\x49\x39\x08\x75\x74\x41\x8B\x48\x08\x81\xF9\x2E\x64\x6C\x6C\x75\x68\x48\x31\xDB\x8B\x5F\x10\x48\x01\xF3\x48\x8B\x07\x48\x01\xF0\x48\x89\xD9\x48\x89\xC2\x4C\x8B\x02\x4D\x85\xC0\x74\x54\x48\xB8\x00\x00\x00\x00\x00\x00\x00\x80\x49\x85\xC0\x75\x32\x49\x01\xF0\x49\x83\xC0\x02\x4D\x89\xC1\x48\xB8\x56\x69\x72\x74\x75\x61\x6C\x50\x49\x39\x01\x75\x19\x41\x8B\x41\x08\x3D\x72\x6F\x74\x65\x75\x0E\x48\x8B\x01\x65\x48\x89\x04\x25\xF8\x00\x00\x00\xEB\x17\x48\x83\xC1\x08\x48\x83\xC2\x08\xEB\xAD\x48\x83\xC7\x14\xE9\x61\xFF\xFF\xFF\x48\x31\xC0\xC3\x48\x31\xDB\xBB\x01\x00\x00\x80\x65\x48\x89\x1C\x25\xE8\x00\x00\x00\x4D\x31\xF6\x41\xBE\x78\x56\x34\x12\x66\x81\x3E\x4D\x5A\x75\x7D\x8B\x7E\x3C\x48\x01\xF7\x81\x3F\x50\x45\x00\x00\x75\x6F\x90\x31\xC0\xFF\xC0\xFF\xC8\x8B\x4F\x06\x48\x31\xDB\x66\x8B\x5F\x14\x48\x01\xFB\x48\x83\xC3\x18\x50\x58\x48\x89\xC0\x90\x81\x3B\x2E\x66\x75\x6E\x75\x19\x4D\x31\xC0\x4D\x85\xC0\x74\x00\x81\x7B\x04\x63\x6D\x65\x74\x75\x08\x8B\x53\x0C\x48\x01\xF2\xEB\x0C\x48\x83\xC3\x28\x4D\x31\xC9\x4D\x89\xC9\xE2\xD3\x48\x83\xC2\x08\x48\x89\xD3\x90\x9C\x9D\x4C\x8B\x2B\x49\x83\xFD\x00\x74\x11\x4D\x31\xD2\x4D\x85\xD2\x74\x00\x41\xFF\xD5\x48\x83\xC3\x10\xE2\xE6\x31\xC0\x65\x89\x04\x25\xE8\x00\x00\x00\x90\x48\x89\xC9\x52\x5A\x49\x01\xF6\x41\xFF\xE6"
add_section_and_modify_entry(args.input, shellcode_stub, args.output)
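Two small pieces of arithmetic in the script carry most of the weight: rounding sizes up to the PE alignment values, and patching the original entry point RVA over the 0x12345678 placeholder. The sketch below isolates both using only the standard library. The names align_up and patch_placeholder are hypothetical helpers, the single-occurrence check is an extra safeguard that the script above does not perform, and the alignment values 0x200 and 0x1000 are assumed typical defaults rather than values read from a real binary.

```python
import struct

PLACEHOLDER = b"\x78\x56\x34\x12"  # little-endian 0x12345678, as used in the stub


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (alignment must be a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def patch_placeholder(stub: bytes, entry_point_rva: int) -> bytes:
    """Replace the placeholder with the entry point RVA in little-endian form."""
    if stub.count(PLACEHOLDER) != 1:
        # bytes.replace() would silently patch every match, so fail loudly instead.
        raise ValueError("expected exactly one placeholder in the stub")
    return stub.replace(PLACEHOLDER, struct.pack("<I", entry_point_rva))


# `mov r14d, 0x12345678` from the stub, patched with an example RVA of 0x1400
patched = patch_placeholder(b"\x41\xbe\x78\x56\x34\x12", 0x1400)
print(patched.hex())                 # 41be00140000
print(hex(align_up(0x185, 0x200)))   # 0x200
print(hex(align_up(0x185, 0x1000)))  # 0x1000
```

The value 0x185 is the stub length implied by the listing above, whose last instruction starts at offset 0x182 and is three bytes long.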
// test.cpp
#include <windows.h>
#include <iostream>
#include <stdint.h>
struct MyStruct
{
void* func;
uint32_t len;
};
// Key at the start of the section
__attribute__((section(".funcmeta")))
uint32_t myfuncsec_key = 0x12345678;
// Macro to register a function
#define REGISTER_FUNCTION(fn) \
__attribute__((section(".funcmeta"))) \
struct MyStruct fn##_entry = { (void*)fn, 0xDEADBEEF };
void REG_foo2()
{
std::cout << "\nhello from foo2";
}
int REG_foo(int a, int b, int c, int d, int e)
{
int i = 0;
int x = a + b + c + d + e;
std::cout << "\n" << x;
MessageBoxA(NULL, "Hello from foo", "Test", MB_OK);
if (i)
{
return 0;
}
else{
i++;
}
return x;
}
REGISTER_FUNCTION(REG_foo)
REGISTER_FUNCTION(REG_foo2)
int main()
{
// Dummy call that forces VirtualProtect into the import table, where the .stub code looks it up
bool p = VirtualProtect(0,0,0,0);
std::cout << "MAIN here";
int a = REG_foo(1,2,3,4,5);
std::cout << "\n Ret val :" << a;
REG_foo2();
return 0;
}
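The handler listing earlier walks .funcmeta as a 4-byte key padded to 8 bytes, followed by 16-byte entries (an 8-byte function pointer, then a length), and stops at a null pointer. The sketch below parses a raw dump with that assumed layout. It matches the test program only if the compiler places the key first and pads MyStruct to 16 bytes on x64; parse_funcmeta is a hypothetical helper for inspecting a dumped section and is not part of the original tooling.

```python
import struct

KEY_SLOT_SIZE = 8               # uint32 key plus 4 bytes of assumed padding
ENTRY = struct.Struct("<QI4x")  # void* func, uint32_t len, 4 bytes of padding


def parse_funcmeta(blob: bytes):
    """Return (key, [(func_address, length), ...]) from a raw .funcmeta dump."""
    (key,) = struct.unpack_from("<I", blob, 0)
    entries = []
    offset = KEY_SLOT_SIZE
    while offset + ENTRY.size <= len(blob):
        func, length = ENTRY.unpack_from(blob, offset)
        if func == 0:           # the handler also stops at a null pointer
            break
        entries.append((func, length))
        offset += ENTRY.size
    return key, entries


# Build a fake section: key, two entries, and a zeroed terminator entry.
blob = struct.pack("<I4x", 0x12345678)
blob += ENTRY.pack(0x140001000, 0xDEADBEEF)
blob += ENTRY.pack(0x140001100, 0xDEADBEEF)
blob += bytes(ENTRY.size)

key, entries = parse_funcmeta(blob)
print(hex(key), [(hex(f), hex(n)) for f, n in entries])
```

The addresses in the fake section are made-up values. The 0xDEADBEEF length is the placeholder that REGISTER_FUNCTION writes, which the init-stage path of the handler overwrites with the computed function length.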
The image below illustrates the state of the REG_foo function prior to the initialization phase, where the function body remains unmasked. You can clearly observe the custom prologue and epilogue code stubs attached to the function body, which are responsible for managing execution flow and preparing for masking operations.

After the initialization phase, the REG_foo function becomes masked along with its epilogue code—only the prologue remains visible, as shown in the image below. This reflects the intended runtime state where the function body and epilogue are protected, ensuring that masking is active throughout the binary’s execution.

This post was written by saab_sec.
You can find the companion code to this release on the MDSec GitHub.