MI-NUR Runtime systems - NULOG - Non-Useful Language Of Gods

Team members:

The goal of the work

The goals of our work are:

design programming language,
design compiler to a bytecode,
design stack-based bytecode virtual machine.

NULOG - language

The programming language's syntax is defined by a grammar.

We have used the ANTLRWorks(http://www.antlr.org/works/index.html) software for creating a grammar for our language and for generating the AST. This tool is able to create a Lexer and Parser classes in your preferred language based on the specified grammar. As our favourite programming language is C#, we have used the CSharp3 extension for this tool.

Our language supports basic elements needed for implementing algorithmic problems such as the Knapsack problem.

That is:

Classes and inheritance.
Class fields.
Functions with parameters and native functions.
Local variables.
Integers, strings, arrays and objects.
Key word var, because our language is type-less.
If-else blocks (only with one optional else block).
While cycles (they can simulate all other cycles like for, or repeat).
Calling instance functions, object's functions and accessing fields.
Length property of arrays.
Basic arithmetic operations and comparison operators ( +, -, *, /, %, >, >=, <, ⇐, ==, != ).

List of reserved key words

native
true
false
new
var
function
class
while
if
else

Compiler

Our compiler consists of three base parts.

Class loader

Class loader scans the work directory for files with .nulog extension and for classes inside of those files. It checks whether there is any Main function in all the classes and whether there are no more than one Main functions in each one of the classes.

It also gathers vital information about each class:

Class name.
Base class - inheritance.
Fields and their initial values.
Function names and arguments count.

ScopeTracker

Instances of this object represent a scope in the program. We have three levels of scope in our language. Those are global scope, class scope and function scope. Each scope tracks declared functions, local variables and objects in order to provide the compiler information needed to perform some compile-time checks such as:

declaring variables with the same name in one scope,
creating an instance of unknown classes,
calling non declared instance functions,
accessing not initialized objects or arrays.

The ScopeTracker also stores the compiled bytecode.

Compiler itself

The compiler first runs the ClassLoader in order to obtain information about classes that are going to be compiled. Than it runs Lexer for tokenization of the input and Parser for creating the Abstract Syntaxt Tree. When the AST is created without errors, the compiler starts processing the tree for each class found.

The result of the compiler run is a list of bytecode instructions, that can be processed by the NULOG Virtual Machine.

Exceptions

The compiler throws two types of exception:

TokenException, is thrown if the Lexer or Parser cannot recognize the input, or when the code is breaking some syntax rules.
CompilerException, is thrown when the code breaks any of the compile-time checked rules, such as declaring a variable with the same name ect.

Virtual machine

Our virtual machine is written in C#. Because both autors are C# programmers. In design implementation we were inspired by java virtual machine, because its implementation is well described. Another source of information are books listed in the literature in the bottom of this page.
We describe here only the base principles of our implementation. For deeper understanding are attached source codes.
In our our virtual machine we implemented:

Stack

Stack is used for storing pointers to heap of the objects used in current scope.
We implemented stack as a sliding frame on the integer array. For each scope (function call) is created new frame, arguments from stack are copied to LOCALS and stack pointer is set on next position. When function ends, base pointer is set to previos frame (parent base pointer) and last value from the current stack is copied to the parent frame stack.
Each frame contains:
1. PBP - Parent Base Pointer - pointer to the start of previos frame,
2. SP - Stack Pointer - pointer to top of the stack,
3. FS - Function Symbol - not used in final implementation,
4. LOCALS - it is part where are stored arguments and local variables,
5. STACK - stack for current scope.

        ___________ __________
        |   PBP   |1
        |   SP    |1 HEAD
        |   FS    |1__      FRAME
        |  LOCALS |?  
        |  STACK  |?  
        ----------------------
        |         |
            ...

Heap

Heap is used for storing data of all objects. It is implemented as a byte array.
Byte array was chosen because is better for storing various data types than a integer array.
Heap contains:
1. MARKER - used for identifying start of new object (for debugging)
2. TYPE - type of object (OBJECT = 1, INT = 2, STRING = 3, ARRAY = 4)
3. LENGTH - length of object (allocated space)
4. CLASS - class of the object, it points to class table
5. Data - data for int and string, pointers to object table for fields and array

        ______________ __________
        |   MARKER   |1
        |   TYPE     |1  HEAD
        |   LENGTH   |4   10   OBJECT
        |   CLASS    |4__  
        |   Data     |?   
        -------------------------
        |            |
             ...

Integers caching

Because most of the objects on the heap are small integers, because in our implementation even integers are objects, we create on the heap integer cache in which we store it. When program needs integer in cached range, integer is returned from the cache. Garbage collector doesn't work on this cache, that means integers are stored during whole running time. One integer takes 14B of space, when we store 1000 integers it is only 14kB on the heap.

Class table

Class table contains definition of classes and functions.
Class table contains:
1. name of the class,
2. super class (default is object),
3. table of fields,
4. table of functions.

Function lookup

When function call is invoked, virtual machine looks to the function table of the object for appropriate function, if the function is not found in function table, virtual machine looks for the function in the parent object function table from which is object inherited.

Object table

Object table is structure for storing addresses of objects in the heap. Using object table comes with some benefits for garbage collecting. If we change address of the object in the heap, we just change one value in object table. If we store addresses directly in the heap, we have to go through the whole heap and replace all addresses of the object. But with this benefit, object tabe comes with one issue, which we have to solve. It is the growth of object table. After grabage collector run, objec table has most of its fields empty. We should shrink the object table (it is slow operation and we lose the advantage of object table) or remember empty fields and reuse.

Integer caching

Because most of the object on the heap are integers, we used integer caching (described inthe heap section) for solve the problem of the big object table we wrote about.
Caching integers vastly decreases the growth ot the object table. (In our application.)

Garbage collector

Garbage collector removes unusable objects from the heap (garbage). Object is garbage if the last reference to it is destroyed. For garbage collector run, we devided heap into two parts, always only one part is in use. If heap is full (half of the heap), garbage collector runs and copies to second part all objects, which have any reference.
Algorithm of finding referenced objects work as follows:
1. find a root set of objects - objects referenced on the stack
2. for each object in root set do:
3. copy object to second part of the heap
4. using depth first search algorithm copy to second part of the heap referenced objects, which are not copied

Native functions

Native functions are used when an application cannot be written directly in our programming language. For example by native functions we solve reading from files, or writing to, printing to console and so on.
For native functions we have native function table and special instruction in byte code call_native.

Literature

T. Parr Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, , 2009. :Pragmatic Bookshelf
SMITH, J.E., AND NAIR, R. Virtual machines: versatile platforms for systems and processes. Morgan Kaufmann Publishers, San Francisco, CA, USA, 2005

Source code

minur_borovto1_hublales.zip

Use of the program

run.bat

NULOGVirtualMachine.exe FinalKnapsack\ Main -args FinalKnapsack\knap_15.in knap_15.out

NULOGVirtualMachine.exe

List of all parameters:

Mandatory:
ClassPath - path to directory with classes to run.
MainClass - Name of the class containing the entrypoint function Main.

Optional:
-hs iteger value in range from 1 to 2147483647  - The size of the heap in MB.
-ss iteger value in range from 1 to 2147483647 - The size of the stack size in kB.
-ics iteger value in range from 1 to 2147483647 - The maximum size of integers to cache.
-args Arguments to be passed to the program. This switch must be last of the parameters!

Example using only mandatory parameters: NULOGVirtualMachine.exe C:\MyClasses\ MyMainClass

Example using optional parameters : NULOGVirtualMachine.exe C:\MyClasses\ MyMainClass -hs 100000 -ss 10000

Example pasing arguments to the program: NULOGVirtualMachine.exe C:\MyClasses\ MyMainClass -args myParam1 myParam2 -mySwitch value

Table of Contents