Matt J. Gumbley | Website: http://www.gumbley.me.uk
Blog: On The Edge of Occam's Razor

EPOC Mixed-Language Programming

Abstract

Matt's notes on developing programs in C, Assembler and C++ for the EPOC platform. How to call one language from another. Synopsis of name mangling conventions.


1 Introduction

This project or information is dead. I cannot release any code or further information on it, other than what you'll find on this page. All communication regarding it will be silently deleted. Apologies.

C++ is the "official" language in which one develops software for EPOC. Symbian's OS is OO from the ground up, and C++ calls to methods in DLLs are how the OS is accessed. There are also OPL and Java implementations for EPOC, but I won't be covering them here. Although Symbian provide a C STDLIB implementation, they stress that this isn't the "C API for EPOC".

In some cases, C++ isn't sufficient. Some people claim that it is slow and produces large programs. To some extent this is true: virtual function calls (made when a member function is called through a pointer or reference, so that the object's actual class isn't known at compile time) go through a table lookup, which can be expensive. This is one reason why inheritance is, IMHO, oversold. On the other hand, much of the code reused by an EPOC program is in ROM, so programs are liable to be small.
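
To make that cost concrete: in a typical implementation (the exact layout is compiler-specific), each object carries a pointer to a per-class table of function pointers. A virtual call loads that pointer, loads the right slot from the table, and then calls through it. The following C sketch hand-builds that arrangement. The names shape, shape_vtbl, rect_area and so on are invented for illustration; this is not how any particular compiler lays out its tables.

```c
/* Hand-built analogue of a C++ vtable (illustrative only). */
struct shape;

/* One table per "class", holding pointers to its "virtual" functions. */
struct shape_vtbl {
    int (*area)(const struct shape *self);
};

/* Every object starts with a pointer to its class's table. */
struct shape {
    const struct shape_vtbl *vptr;
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

static const struct shape_vtbl rect_vtbl = { rect_area };
static const struct shape_vtbl tri_vtbl  = { tri_area };

/* A "virtual call": load vptr, load the area slot, then call indirectly. */
int shape_area(const struct shape *s)
{
    return s->vptr->area(s);
}
```

A non-virtual call would be a single direct bl to a known address. The two extra loads, and the fact that the compiler generally cannot inline through the function pointer, are where the overhead comes from.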

Some people simply want to code in C or assembler: it may be what they know, not everyone loves C++, or they may need assembler to make the processor jump through hoops that the rigidity of a high-level language will not allow. My own case, developing a port of the hForth system for EPOC, means that I have to code in assembler: Forth engines can be coded in high-level languages (see GNU Forth (gforth) for an example), but may well be larger than their assembler counterparts.

There is (according to comments read on the EPOC World public newsgroups) a small amount of assembler in EPOC itself, mainly the bootstrap code. The rest of the assembler is embedded in C++ source, using the asm("...") directive. The Symbian build tool makmake does not support building assembler source files, so embedding assembler in C/C++ source in this way is the easiest approach.

Debugging an EPOC program is usually done by building it under the WINS environment, using Microsoft Visual C++, and testing it in the emulator - a port of EPOC that sits atop Win32. If it works there, it'll probably work on MARM - the version that runs on the Series 5/5mx/7 hardware.

Of course, this is of no use whatsoever to those who write assembler! Nor is it of any comfort to those who eschew Microsoft tools and operating systems. Development of MARM applications under alternative systems such as Linux is not covered here. Suffice it to say that it is possible, using some of the Symbian tools run under Wine, the Windows emulator. I am working on a GNU Remote Debug Stub for EPOC programs, so that they may be debugged remotely from a PC running the GNU debugger. This may help with debugging MARM software.

How then, does one start up a program written in C or assembler for EPOC? There are two possibilities:

  • A "proper" EPOC C++ program has an entry point of E32Main. The first thing it should do is create a cleanup stack. It could read the command line with RProcess::CommandLine(), but most programs are invoked from the system GUI, rather than a command line.
  • A C program, written to use EPOC's STDLIB, could have a main(argc, argv[]), as usual. Such programs need to be linked with EPOC's C startup code, ecrt0.o, which was written in C++. ecrt0.o has its own startup called E32Main (objdump -lxd ecrt0.o gives a disassembly of this code). You don't have to link with ecrt0.o; you can provide an E32Main() in your C code, but you'll then have to set up a CTrapCleanup (the cleanup stack) yourself, since some STDLIB functions require one.

I found it useful/necessary to write some initialisation code in C/C++, and then call the main body of my code, which was written in assembler. For example, hForth initialises its console objects in C++, then calls the hForth kernel, which is written in assembler. The GDB Stub initialises the Serial I/O system in C++, provides wrapper functions for this in C, and has several routines that are in assembler.
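
That layering can be sketched portably in C. Everything here is hypothetical: kernel_context, wrapper_put_char, kernel_main and start_program are invented names, the console set-up is reduced to a flag, and kernel_main is given a C body only so that the sketch runs anywhere. In the real arrangement it would be an assembler routine, and the C side would need nothing more than the extern declaration shown.

```c
/* Hypothetical state that the C (or C++) initialisation layer prepares
 * before handing control to the assembler kernel. */
struct kernel_context {
    int console_ready;        /* stand-in for real console/serial set-up */
    int (*put_char)(int c);   /* C wrapper the kernel can call back into */
};

/* Counts characters "written", so the sketch has no I/O dependency. */
static int chars_written = 0;

/* C wrapper exposed to the kernel; on EPOC this would forward to C++ code. */
static int wrapper_put_char(int c)
{
    chars_written++;
    return c;
}

/* In the real layout this lives in assembler; the extern declaration is
 * all the C side needs in order to call it. */
extern int kernel_main(struct kernel_context *ctx);

/* C stand-in for the assembler kernel, so the sketch is self-contained. */
int kernel_main(struct kernel_context *ctx)
{
    if (!ctx->console_ready)
        return -1;
    ctx->put_char('o');
    ctx->put_char('k');
    return 0;
}

/* Initialise in C, then call the main body of the program. */
int start_program(void)
{
    struct kernel_context ctx = { 1, wrapper_put_char };
    return kernel_main(&ctx);
}
```

Passing a table of wrapper pointers, rather than having the kernel call C++ directly, keeps the assembler free of mangled names.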

To do this requires knowledge of the differences between C, C++ and assembler calling conventions, function naming, and return handling.


2 Calling language x from y

The possibilities are as follows, listed as "caller calls callee":
  • Asm calls Asm: The caller does a bl externfunc and, providing the called function doesn't trash the link register, it returns with mov pc,lr.
  • Asm calls C: Same as for Asm-Asm.
  • Asm calls C++: Same as for Asm-Asm, but the caller must know the C++ function's mangled name.
  • C calls Asm: Declare the called Asm function as extern.
  • C calls C: Declare the called C function as extern.
  • C calls C++: Declare the called C++ function as extern, but you must know it by its mangled name.
  • C++ calls Asm: I expect you could just declare the called Asm function inside an extern "C" { ... } section, but I haven't tried this yet.
  • C++ calls C: Declare the called C function inside an extern "C" { ... } section.
  • C++ calls C++: Declare the called C++ function as extern.
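
For the C++-calls-C case (and, untested, C++-calls-Asm), a common way to avoid repeating the extern "C" block in every C++ file is to put it in a shared header, guarded by __cplusplus so that the same header also compiles as plain C. A sketch, with a hypothetical function kernel_entry. It is given a C body here so the example is self-contained; an assembler routine labelled kernel_entry and marked .globl should satisfy the same declaration.

```c
/* ---- kernel_api.h (hypothetical shared header) ---- */
#ifndef KERNEL_API_H
#define KERNEL_API_H

#ifdef __cplusplus
extern "C" {          /* C++ callers: don't mangle these names */
#endif

int kernel_entry(int arg);

#ifdef __cplusplus
}
#endif

#endif /* KERNEL_API_H */

/* ---- kernel.c: a C body for the declaration above ---- */
int kernel_entry(int arg)
{
    return arg * 2;
}
```

A C file and a C++ file can both include kernel_api.h, and both will then refer to the same unmangled symbol kernel_entry.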


3 Writing Assembler Using asm(...) Directives

This is the easiest way of writing assembler for EPOC. makmake will create a makefile that compiles a C or C++ source file, but it won't do assembler source.

The following observations were made by examining the output of gcc, when compiling simple functions.

3.1 Functions

All functions must reside in a .text section and be marked as .globl. Here is a simple assembler function:
/* Declaration for C callers; in a C++ source file, declare it extern "C" so the name isn't mangled */
extern int func();
asm("     .text 0");
asm("     .align 0");
asm("     .globl func");
asm("func:");
asm("     mov    r0, #5");
asm("     mov    pc, lr");
This simply returns the value 5.

3.2 Parameter passing

A function that takes up to four 32-bit integers as its input will receive these in the r0, r1, r2, and r3 registers, so void func(int a, int b, int c, int d) receives a in r0, b in r1, c in r2 and d in r3.

If you pass more than four arguments, the remainder are passed on the stack, as the following C and its assembler counterpart illustrate:

static int j;

int funky(int a, int b, int c, int d, int e)
{
  j=a+b+c+d+e;
  return j*5;
}

        .text
        .align  0
        .global funky
funky:
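        stmfd   sp!, {lr}          @ save return address; sp drops by 4
        ldr     ip, [sp, #4]       @ e, the fifth argument (at [sp] on entry)
        ldr     lr, .L2            @ lr = address of j, from the literal pool
        add     r1, r0, r1         @ r1 = a + b
        add     r1, r1, r2         @ ... + c
        add     r1, r1, r3         @ ... + d
        add     r1, r1, ip         @ ... + e
        str     r1, [lr, #0]       @ j = a + b + c + d + e
        add     r0, r1, r1, asl #2 @ return value r0 = j + 4*j = j*5
        ldmfd   sp!, {pc}          @ return by popping the saved lr into pc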
.L3:
        .align  0
.L2:
        .word   j

        .bss
j:      .space  4

        .text 
If you write your assembler function in a C++ source file as a class member function, then r0 will hold this, with r1, r2 and r3 holding the first three declared parameters (any further ones go on the stack).
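
The rule for word-sized arguments can be written down as a small C helper. This sketch covers only 32-bit integer and pointer arguments (64-bit values, structures and floating point follow further rules not discussed here), and apcs_arg_reg and apcs_arg_stack_offset are hypothetical names. Stack offsets are relative to sp on entry, before the callee pushes anything. This is why funky, having already pushed lr, finds its fifth argument at [sp, #4] rather than [sp, #0].

```c
/* 0-based argument index -> register number (0..3 for r0..r3), or -1 if
 * the argument is passed on the stack. For a C++ member function, pass
 * is_member = 1: the hidden `this` pointer takes r0 and the declared
 * arguments move up one slot. */
int apcs_arg_reg(int index, int is_member)
{
    int slot = index + (is_member ? 1 : 0);
    return slot < 4 ? slot : -1;
}

/* Byte offset from sp (as it was on entry) of a stack-passed argument,
 * or -1 if the argument arrives in a register. */
int apcs_arg_stack_offset(int index, int is_member)
{
    int slot = index + (is_member ? 1 : 0);
    return slot < 4 ? -1 : (slot - 4) * 4;
}
```

For funky, apcs_arg_stack_offset(4, 0) is 0: e sits at [sp, #0] on entry and at [sp, #4] once lr has been pushed.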

To see the assembler that gcc generates for a C function like this, simply write your C as normal in a stand-alone .c file and compile it with the CCS.CMD script found in the EPOC SDK. I have a slightly modified version of this which doesn't require the unix2dos utility and doesn't remove the intermediate assembly stages.

3.3 Return values

32-bit and smaller return values are simply stored in r0. 64-bit return values (e.g. long long int) are stored with the most significant word in r1, and the least significant word in r0.
long long int func()
{
  return 5;
}

        .text 
        .align 0
        .global func
func:
        mov     r1, #0
        mov     r0, #5
        mov     pc, lr
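
When assembler has to produce or consume a 64-bit value, it can help to check which word goes where. This host-side C sketch mirrors the convention described above: the least significant word goes in r0 and the most significant word in r1. split_u64 and join_u64 are hypothetical helper names. A signed long long is split the same way, bit for bit.

```c
#include <stdint.h>

/* Split a 64-bit return value into the two words used on ARM:
 * r0 receives the least significant word, r1 the most significant. */
void split_u64(uint64_t value, uint32_t *r0, uint32_t *r1)
{
    *r0 = (uint32_t)(value & 0xFFFFFFFFu);
    *r1 = (uint32_t)(value >> 32);
}

/* Reassemble r1:r0 into the 64-bit value the caller sees. */
uint64_t join_u64(uint32_t r0, uint32_t r1)
{
    return ((uint64_t)r1 << 32) | r0;
}
```

For the func above, split_u64(5, ...) gives r0 = 5 and r1 = 0, matching the two mov instructions.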

3.4 Global and Static Variables

Variable access is performed indirectly. Space for the variable is declared in the .bss section (if the variable is not initialised) or the .data section (if the variable is initialised). Where the variable is accessed, the compiler also places a word holding the variable's address (a literal pool entry) in the .text section near the code; the code loads this address with a PC-relative ldr and then accesses the variable through it:

static int j=201;

        .data
        .align 0
j:
        .word  201
        .text
        ... code ...
        ldr    r3, .L2
        str    r0, [r3, #0]
        ... code ...
.L3:    .align 0
.L2:    .word  j
For an uninitialised variable:

static int j;

        .bss
        .align 0
j:
        .space 4
        .text
        ... code ...
        ldr    r3, .L2
        str    r0, [r3, #0]
        ... code ...
.L3:    .align 0
.L2:    .word  j
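If the variable is not declared static, a .globl variablename directive precedes the definition (and the same indirect reference is generated, as above). Depending on the compiler version and options, an uninitialised non-static variable may instead be emitted as a common symbol with a .comm directive.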
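
The same indirection can be mimicked in C, which may make the assembler easier to follow. In this sketch, literal_L2 plays the part of the .L2: .word j literal pool entry, and store_j mirrors the ldr/str pair. The names are invented, and a C compiler is of course free to generate different code for it.

```c
/* C analogue of gcc's indirect access to a static variable. */
static int j = 201;                      /* .data:  j:   .word 201 */
static int *const literal_L2 = &j;       /* .text:  .L2: .word j   */

void store_j(int value)                  /* value arrives in r0    */
{
    int *r3 = literal_L2;                /* ldr r3, .L2            */
    r3[0] = value;                       /* str r0, [r3, #0]       */
}

int load_j(void)
{
    return *literal_L2;
}
```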

3.5 Register naming

The gas manual is unfortunately rather vague about the register names it accepts for the ARM processor. They are listed in this excerpt from the gas source:
  /* Processor Register Numbers */
  {"r0", 0},    {"r1", 1},    {"r2", 2},    {"r3", 3},
  {"r4", 4},    {"r5", 5},    {"r6", 6},    {"r7", 7},
  {"r8", 8},    {"r9", 9},    {"r10", 10},  {"r11", 11},
  {"r12", 12},  {"r13", 13},  {"r14", 14},  {"r15", REG_PC},
  /* APCS conventions */
  {"a1", 0},    {"a2", 1},    {"a3", 2},    {"a4", 3},
  {"v1", 4},    {"v2", 5},    {"v3", 6},    {"v4", 7},     {"v5", 8},
  {"v6", 9},    {"sb", 9},    {"v7", 10},   {"sl", 10},
  {"fp", 11},   {"ip", 12},   {"sp", 13},   {"lr", 14},    {"pc", REG_PC},
  /* FP Registers */
  {"f0", 16},   {"f1", 17},   {"f2", 18},   {"f3", 19},
  {"f4", 20},   {"f5", 21},   {"f6", 22},   {"f7", 23},
  {"c0", 32},   {"c1", 33},   {"c2", 34},   {"c3", 35},
  {"c4", 36},   {"c5", 37},   {"c6", 38},   {"c7", 39},
  {"c8", 40},   {"c9", 41},   {"c10", 42},  {"c11", 43},
  {"c12", 44},  {"c13", 45},  {"c14", 46},  {"c15", 47},
  {"cr0", 32},  {"cr1", 33},  {"cr2", 34},  {"cr3", 35},
  {"cr4", 36},  {"cr5", 37},  {"cr6", 38},  {"cr7", 39},
  {"cr8", 40},  {"cr9", 41},  {"cr10", 42}, {"cr11", 43},
  {"cr12", 44}, {"cr13", 45}, {"cr14", 46}, {"cr15", 47},


4 C++ Name Mangling

In C++, the compiler can distinguish between functions that share a name but have different parameter lists (e.g. given the definitions func(int i) and func(double j), the call func(3) selects the first and func(3.0) the second). Such overloaded definitions cause a problem for the linker, which must resolve every reference unambiguously. The C++ compiler therefore mangles function names by appending coded information about their parameters.
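
As a concrete illustration of appending coded parameter information, here is a toy encoder in C that produces names in the style used by old GNU g++ 2.x for plain (non-member) functions, e.g. func__Fi for func(int) and func__Fd for func(double). It is a sketch only: toy_mangle and toy_type are made-up names, it knows just four builtin types, and it ignores classes, pointers, qualifiers, member functions and the back-reference codes real g++ uses for repeated types. Whether EPOC's gcc produces exactly these names is an assumption; the compiler's own .s output is the authority.

```c
#include <stddef.h>
#include <string.h>

/* A few builtin parameter types (illustrative subset only). */
enum toy_type { TOY_INT, TOY_DOUBLE, TOY_CHAR, TOY_LONG };

/* Single-letter code for each type, in the g++ 2.x style. */
static char toy_code(enum toy_type t)
{
    switch (t) {
    case TOY_INT:    return 'i';
    case TOY_DOUBLE: return 'd';
    case TOY_CHAR:   return 'c';
    case TOY_LONG:   return 'l';
    }
    return '?';
}

/* Build "<name>__F<codes>", using "v" for an empty parameter list.
 * Returns 0 on success, -1 if the output buffer is too small. */
int toy_mangle(char *out, size_t len, const char *name,
               const enum toy_type *params, size_t count)
{
    size_t n = strlen(name);
    size_t need = n + 3 + (count ? count : 1) + 1;  /* name, "__F", codes, NUL */
    size_t i;

    if (need > len)
        return -1;
    memcpy(out, name, n);
    memcpy(out + n, "__F", 3);
    n += 3;
    if (count == 0)
        out[n++] = 'v';
    for (i = 0; i < count; i++)
        out[n++] = toy_code(params[i]);
    out[n] = '\0';
    return 0;
}
```

Because the two overloads of func get different link names, the linker can tell them apart; a C or assembler caller, however, has to spell out the encoded name itself.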

The format of this code is not defined in the C++ specification; it is peculiar to the compiler. For MARM that compiler is gcc, but if you're building software for the Emulator (WINS), you're relying on Visual C++'s name mangling rules. WINS is a port of EPOC built using Visual C++, and hence Visual C++'s mangling rules are hard-coded into the name of every function in the WINS libraries. It is for this reason that you cannot use any other C++ compiler to generate code for the Emulator, and likewise why you can't use any other compiler to generate code for MARM (though why you'd want to use anything other than gcc anyway is a mystery to me!).

The name mangling is transparent, if you're writing wholly in C++. If you write in C or assembler, it'll bite you.

The best way I've found so far to discover a function's mangled name is to run CCS.CMD on the C++ source, then look through the resultant .s file.

To defeat name mangling in a C++ program, declare your functions as extern "C". I use this in wrappers, where a C program has to interface with a C++ program: for example, the C/Asm GDB Stub code calls functions in a C++ wrapper that are declared extern "C", and these in turn call the C++ code in my Serial I/O module.

I seem to recall trying to call C++ directly from C/Asm using its mangled name, but ran into linking problems. I'll re-investigate this for this document...

I'll try to provide a table of common mangled names in due course; until then, CCS.CMD is your friend.


5 Accessing EPOC

What is the mechanism by which an assembler/C program might call EPOC? Is it via a SWI (software interrupt instruction of the ARM processor) or some other mechanism? Initial investigations show that the ARM's exception vectors are all directed to the same handler. Need a disassembler/monitor running on the ARM...

6 Misc

What effect does __declspec(naked) have on compiled source? From the gcc/config/arm/arm.c code:
  • "return" instructions are not used
  • don't output any prologue/epilogue code: the user is assumed to do the right thing

... to be continued
(C) Matt J. Gumbley 1998-2005 - All Tights Reversed