&& && &&
&&
Note: If you encounter error such as
| );
Try adding
#include<limits>
in the header file and re-building the project
Note: For subsequent builds just run
cd build && ninja -j 4
insideCSE601
directory
Note: For testing the pass run the following command
cd tests/PA1/ && make clean && make -B
insideCSE601
directory
There are mainly 4 instructions that may lead to segmentation faults while dereferencing null pointers.
Naïve solution: Adding nullchecks before every instructions of these four types. But it will lead to multiple compare and conditional branch instructions.
Better solution for this is to use forward data flow analysis to track every pointer and add nullchecks only when we are unsure of it being null safe. (Similar to constant propagation data flow analysis)
std::unordered_map<Value *, std::unordered_set<Value *>> childPointers;
std::unordered_map<Value *, bool> unsafePointers;
childPointers
maps each pointer to a set of its child pointers. Or in other words every child pointer’s value is dependent upon its parent pointer.
%a = load i32*, i32** %a.addr;
In this example
%a
is the child pointer of%a.addr
. As if the value of%a.addr
changes the value present in%a
is corrupted and new value may not be null safe.
unsafePointers
just tells whether a pointer present in the map is unsafe or not.
The
childPointers
map is preserved through out the function, but theunsafePointers
map resets at each basic block. As from any basic block, one can reach any basic block. Thus any basic block could have modified the pointer thus making them unsafe, so we can’t rely on previous observations.
includeInst
and processInst
.includeInst
takes in the instruction and both maps. And outputs whether we should add nullchecks for this instruction.processInst
adds nullchecks for the given instruction.includeInst
.Function definition an be found @
Adding nullchecks for current instruction will depend upon unsafePointers[basePointer]
; if the basePointer
i.e. %a.addr
present in the unsafePointers
then the value itself otherwise the basePointer
is unsafe by default.
%a.addr
thus making it unsafe when using it to load %a
.basePointer
i.e. the loaded value to the childPointers
. Then as the loaded pointer is always safe, we’ll write that fact into unsafePointers
.basePointer
in this case is just the function pointer itself. Reasoning behind making the function pointer safe is explained later.unsafePointers
map. Reasoning explained above…Before returning we will always make the
basePointer
mark as safe. Reasoning is that if the current instructions executes after getting through the nullchecks, it means that the pointer is not null. And is safe to use for the coming instruction using that pointer.
This does not do anything as childPointers
is preserved through out the function. And unsafePointers
is empty at the start of each basic block, or in other words it also does not depend upon the incoming unsafePointers
map from predecessor basic blocks.
At the start of function childPointers
is 0 initialized and unsafePointers
is also 0 initialized at the start of each basic block.
void
, int
, int *
. As the pass is adding default return values as the transmission instructions for the NullBB
(exhaustive declaration of return instructions).NotNullBB
. LLVM IR files are present inside nullchecks_opt_ll.SafeGC implements a bump allocator that always returns an 8-byte address.
Safe GC has three components: allocator, mark and sweep.
[obj, obj + sizeof(obj)]
in its address space.Safe GC implements a bump allocator and a stop-the-world conservative garbage collector. The allocator maintains a list of segments. All objects are allocated from a segment. A segment is a 4-GB contiguous memory area.
Physical pages to the segment memory are allocated on demand. ReservePtr
points to the end of the segment. CommitPtr
points to a memory area until which the physical pages have been allocated. AllocPtr
is the head of the bump allocator. The first 2-MB of the segment is reserved for metadata. The rest of the segment is used by the bump allocator. DataPtr
points to the first byte in the data region of a segment.
typedef struct ObjHeader ObjHeader;
SafeGC adds an object header to each object. An object header contains the size, status, and type of the object. SafeGC doesn’t use the type field of the object header; reserved for future use. The status field contains the validity of an object. When an object is freed, the status filed of the object is set to FREE
. The status field can also be used by the mark phase to MARK
reachable objects. The bump allocator never reuses a virtual address. When a segment is full, a new segment is created for future allocations. myfree
is called to free an object. myfree reclaims the physical page associated with a virtual page when all the objects on the virtual pages are freed. A virtual page is a 4-KB contiguous memory area in the segment address space. A virtual page is also aligned to 4-KB size. Future accesses to a freed virtual pages result in segmentation faults. SafeGC reserves two bytes metadata corresponding to each virtual page in the segment. The metadata is stored in a contiguous 2-MB memory area at the top of the segment. The virtual page metadata tracks the number of free bytes on a virtual page. When the number of free bytes becomes equal to the size of a virtual page, the corresponding physical page is reclaimed. When the allocation size of an object is less than 4-KB (page size), SafeGC ensures that the object always lies on a single page. The top of a virtual page always contains the starting address (address of object header) of an object. Allocations of size more than 4-KB allocations are called big allocations. SafeGC implements a different allocation scheme for big allocations. For big allocations, the allocation size is adjusted to the nearest multiple of the page size. For these allocations, myfree immediately reclaims the physical pages. The metadata corresponding to the first page of a big allocation is set to one to identify the first byte of these objects. myfree sets the metadata corresponding to all pages of a big allocation to the page size. The metadata can be used by the mark and sweep phase to check the validity of a page before access, and also for the finding the object headers of big allocations.
Scanning roots: It scans the heap and stack for the 8-bytes aligned valid heap address. After encountering a valid heap address it finds the object header corresponding to that address and MARK
it if not adready marked and add it to the unscanned objects list.
After scanning of the roots is complete. It iterates over each heap object in the unscanned objects list and check for references to valid heap addresses in that object. After encountering a valid heap address it proceeds to do the same as mentioned in the scan roots phase.
It iterates over all the valid heap addresses and fetch their corresponding object header if an object not marked and not freed is found it proceeds to free them using myfree
routine.
Warning: Due to viewing all the value which have holds a valid heap address as a pointer to the heap memory; a conservative garbage collector may incorrectly identify unreachable objects (whose addresses matches with an integer) as reachable, which could lead to a memory leak. Fortunately, in most applications, such cases are few, and thus it is practical to implement conservative garbage collection for them.
Run tests using make test
)
)
) )
)
)
)
)
)
)
)
)
)
This pass aims to enforce spatial and a weaker type safety for the C language via disallowing out-of-bound pointer accesses and having pointers with invalid addresses.
&& && &&
&&
Note: If you encounter error such as
| );
Try adding
#include<limits>
in the header file and re-building the project
Note: For subsequent builds just run
cd build && ninja -j 4
insideCSE601
directory
Note: For testing the pass run the following command
cd tests/PA1/ && make clean && make -B
insideCSE601
directory
C allows pointer typecasting to non-pointer types and pointer arithmetic. Thus, directly enforcing type checks at runtime is not feasible. To address this, we utilize the mymalloc
routine, which tracks objects’ size and type information, enabling dynamic enforcement of memory safety by storing object metadata just before the object itself.
More than just keeping track of the base pointer is required; as in C, a pointer can be typecasted to unsigned long long and passed to a function. Thus, we must keep track of all the variations (child pointers) of the base pointer (parent pointer).
int *ptr = ;
unsigned long long a = ptr;
%ptr_addr = alloca i32*, align 8
%a = alloca i64, align 8
%mymalloc_call = call noalias i8* @mymalloc(i64 4) #2
%mymalloc_output = bitcast i8* %mymalloc_call to i32*
store i32* %0, i32** %ptr_addr, align 8
%1 = load i32*, i32** %ptr_addr, align 8
%2 = ptrtoint i32* %1 to i64
store i64 %2, i64* %a, align 8
As
a
is not a pointer but dynamically contains a pointer value. Thus we need to keep track of it as well. For this we need to track the pointer going through different instructions such as bitCastInst, getElementPtrInst, etc. In this examplea
is a child pointer of the parent pointerptr
.
The first step is to replace every alloca instruction (stack allocation) and malloc call instruction (heap allocation) with a mymalloc call instruction (heap allocation of object and object metadata) so that we have access to the object metadata. For the alloca instruction, we need to determine the size of the requested object, which is done using the custom getSizeOfAlloca routine. Once the size of the alloca instruction is calculated, we insert a mymalloc API call and include the myfree (equivalent to the free routine) after the last use of the alloca instruction found in the original LLVM IR.
This is handled by the replaceAllocaToMymalloc API.
Next step is to disallow out-of-bounds pointer accesses which is handled by the disallowOutOfBoundsPtr API. It works by finding all the pointer accesses and adds a call instruction to a custom isSafeToEscapeFunc before the current pointer access. The isSafeToEscapeFunc
routine works by finding the closest base pointer of the given pointer and checks whether the given pointer lies in the bounds, i.e. [basePointer, basePointer+baseSize)
of the base pointer.
int *arr = ;
arr = 1;
arr = 1; // OOB access
* = 1; // OOB access
; // OOB access
Demonstrations of Out-of-bounds pointer accesses.
int *arr = ;
; // Pass
arr = 1;
; // `Error: invalid pointer\nIssue: pointer out of bounds of base pointer\n`
arr = 1;
; // `Error: invalid pointer\nIssue: pointer out of bounds of base pointer\n`
* = 1;
; // `Error: invalid pointer\nIssue: pointer out of bounds of base pointer\n`
;
C equivalent of the updated LLVM IR after passing through the
disallowOutOfBoundsPtr
API.
Lastly, we add write barriers via addWriteBarriers API to identify instances where invalid heap addresses are getting stored into pointers. This API works by getting the pointer operand from all the store instructions and passing it to a custom checkWriteBarrier routine which validates the heap address. This API also works if we store to an object which contains a pointer operand.
int *ptr = ; // valid address
int *a = 0; // invalid address
LinkedListNode node = ; // invalid address
Demonstration of invalid address. The address 0 is getting stored in the variable
a
, which is invalid. The same is true for the variablenode
, where address 0 is getting stored in theNext
field, which is invalid.
int *ptr = ;
; // Pass
;
int *a = 0; // `Error: invalid pointer (int *)0 found inside (int *)0\n`
;
// `Error: invalid pointer (LinkedListNode *)0 found inside (LinkedListNode){.Value=1, .Next=(LinkedListNode *)0}\n`
LinkedListNode node = ;
C equivalent of the updated LLVM IR after passing through the
addWriteBarriers
API.
Note: We can also use a custom null pointer dereference detection compiler pass in conjunction with this pass for added type safety for C lang. Which can be found here.