From justin.fletcher@ntlworld.com Mon Apr 19 22:55:26 2004
Date: Wed, 3 Sep 2003 05:14:39 +0100
From: Justin Fletcher <justin.fletcher@ntlworld.com>
Newsgroups: comp.sys.acorn.programmer
Subject: Justin's super-quick guide to compiling C for use from other
    languages

On Tue, 2 Sep 2003, Andrew Pullan wrote:

> On 31 Aug Justin Fletcher wrote:
>
> > > > Although all this is probably not much help if you are still using
> > > > BASIC.
> > >
> > > Actually once I've got the data structures fixed, I can then do it in ARM
> > > code. WXL already has several bits of ARM code to scan data structures,
> > > so another one or two won't make much difference :-)
> >
> > Surely that's just making more work for yourself ?
> >
> > Write it in C, compile it up and link it as bin, then call the bits you
> > need from your BASIC as you see fit.
>
> Didn't think you could call C from basic. Unless it was a module or a
> separate application.
>
> Sounds like a good idea, but I've no idea how to do it so I'm not inclined to
> experiment. I crash the machine enough as it is!

Compiled C's just normal machine code.

Justin's super-quick guide to creating code to use from other languages in
C... [probably inaccurate in a lot of places and glossing over specific
details in many others - be prepared to skip dull or plainly wrong
sections]

APCS allows for a number of variants. Lack of stack limits and stack
checking is one of them. This is what we want to use for our example C
code because it's a lot easier to work with from other languages (and
because you know what you're doing if you're linking in from other
languages). As a switch to the compiler this is -apcs 3/noswst (no
software stack checking). And because we want our code to be nicely
compatible with all systems, we build it as 32bit, so the switch becomes
-apcs 3/32/noswst.

Now, the 32bit switch means many things which you already know - mostly
that you're not preserving the state of flags over the call to your C
code from outside. It also means that internally it's not being maintained
over the function calls, but the bits inside that code you're not caring
about because... well that's the Compiler's business and you assume that
it knows what it's doing.

For the purposes of this quick guide, I'm going to assume that you're in
USR mode for your code. It doesn't really change much but just use -fz if
you're building for use in SVC mode (don't worry, it's not likely to bite
you but it's just a little safer should you ever be clever).

So you're nearly set on /how/ to compile the code. You need the normal C
compiler command...
  cc -c -apcs 3/32/noswst -o o.code c.output

Notice that I'm not using an include path here. The default include path
for system includes will still be present and we're not really all that
worried about that. However, you're NOT going to be using the C library.
Well, you might. But you'd have to re-write it, find another, or do
cunning things to link with SharedCLibrary.

If at this point you're asking yourself why you're not using the C
library, you may be looking at the problem from the wrong point of view.
The purpose here, is to produce some code in C that takes the work out of
maintaining it. C is easier than assembler to maintain. It's faster than
BASIC. The C library provides a lot of useful stuff, but you don't need a
lot if you're just manipulating algorithms.

So, then... let's look at what the C code has to be...

The C code itself has to do what you want it to do. That's the crux of
this. So, let's assume for the sake of argument that you're wanting a
little routine that checks whether you're inside a polygon or not. Why am
I giving this as an example ? Because that's exactly what I use in my
little ImageMap checking module and I've got the code to hand. I'm going
to cite the entire code in fact. It's small and anyone who wants to bitch
about posting code to the group can just fire away...

-------- c.poly
int InPolygon(int polygon[],int mx,int my,int points)
{
  int lastx=polygon[points*2];
  int lasty=polygon[points*2+1];
  int yflag0=(lasty >= my);
  int thisx,thisy;
  int j,inside_flag,yflag1;

  inside_flag = 0;
  for (j=0; j<points+1; j++)
  {
    thisx=polygon[j*2];
    thisy=polygon[j*2+1];
    yflag1 = (thisy >= my);
    if (yflag0 != yflag1)
      if ( ((thisy-my) * (lastx-thisx) >=
            (thisx-mx) * (lasty-thisy)) == yflag1 )
        inside_flag = !inside_flag;

    yflag0 = yflag1;
    lastx=thisx; lasty=thisy;
  }

  return (inside_flag);
}
--------

Um... that's not commented, I'm afraid. Basically you pass in four
parameters.
  polygon =  A pointer to an array of points (x then y) which describe the
             polygon we're checking
  mx = mouse x position that we're checking
  my = mouse y position that we're checking
  points = number of points in the polygon - 1
... and you get back a flag to indicate if you're inside or not.

The maths behind this algorithm aren't amazingly complex but I couldn't
explain them - consult a text book or online guide for more details, but
this isn't really relevant to our example here.
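
As a quick sanity check of how the parameters fit together, here's a
minimal sketch that feeds the routine a square. The routine is repeated
so that the fragment compiles on its own; the 'square' data and the
'in_square' helper are made up for illustration and aren't part of the
original code. Note that the last corner isn't a repeat of the first, and
that 'points' is one less than the number of corners.

```c
/* Copy of the routine from c.poly, repeated so this compiles on its own. */
int InPolygon(int polygon[],int mx,int my,int points)
{
  int lastx=polygon[points*2];
  int lasty=polygon[points*2+1];
  int yflag0=(lasty >= my);
  int thisx,thisy;
  int j,inside_flag,yflag1;

  inside_flag = 0;
  for (j=0; j<points+1; j++)
  {
    thisx=polygon[j*2];
    thisy=polygon[j*2+1];
    yflag1 = (thisy >= my);
    if (yflag0 != yflag1)
      if ( ((thisy-my) * (lastx-thisx) >=
            (thisx-mx) * (lasty-thisy)) == yflag1 )
        inside_flag = !inside_flag;

    yflag0 = yflag1;
    lastx=thisx; lasty=thisy;
  }

  return (inside_flag);
}

/* Hypothetical test data: a 10x10 square, corners stored as x,y pairs.
 * Initialised explicitly, so it isn't a zero-init area. */
static int square[] = { 0,0,  10,0,  10,10,  0,10 };

/* Hypothetical helper: 4 corners, so 'points' is 4 - 1 = 3. */
int in_square(int mx, int my)
{
  return InPolygon(square, mx, my, 3);
}
```

With that data, in_square(5,5) comes back as 1, while in_square(15,5) and
in_square(-5,5) both come back as 0.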

So, you've got your little chunk of code and you've compiled it up. Using
something like :

cc -c -apcs 3/32/noswst -depend !Depend  -throwback -o o.poly c.poly

And then we need to get this code to a state that we can use it. So we
create a binary image of the code...

link -bin -o bin o.poly

And from this we have a binary file that we can call directly. You can
look at the bin file at this point and see what it's compiled to and
whether you could have written it better by hand. If you're reasonable at
assembler the answer should be 'yes' at this point. But you should also be
thinking to yourself how much easier the C is. If you're not seeing the
point then you might want to wander off and do something else at this
point. If not then let's roll on...

The code to this point can be found at :

  http://homepage.ntlworld.com/justin.fletcher/CGuide/1/

which you might want to look at to see how this pans out rather than doing
it all yourself.

====


We're not wanting to call this code directly though. Because we're in
BASIC. Here's where you might need to know your APCS bits.

APCS (or ATPCS if you're wanting to bring Thumb into this, which I'm not)
defines the register bindings used by compiled code in order to interwork
different compiled code lumps. For each variant of the APCS standard the
code is (generally) incompatible. There's a few that are compatible so
long as you observe some restrictions but you'll want to go and read the
right specs if you're that bothered by what is and what isn't compatible.

Skip the next paragraph if you don't care about the description of the
APCS variant we use.

I mentioned above that we're not using stack checking. That changes the
variant of the APCS we use. We're also using the 32bit variant which makes
it easier to move the code around. We're implicitly using the
non-reentrant form of the APCS standard. If you want to know more about
that, again the right specs will help, but suffice to say that the
reentrant form allows a single block of code to be safely used by
multiple clients. For
*this* example, I'm not dealing with the reentrant variant because it
makes the whole process a lot more fun and this is really only intended as
a quick guide. We're also using the variant of the FP code that the
compiler is shipped with (that's FPE2 for CC's up to around 5.30 or
something, and FPE3 for later ones; version numbers might be a little out,
off the top of my head). In this example we don't use FP code so it's not
an issue.  Depending on who you target, you might want to retain FPE2
compatibility, or only support the later FPE3 code. That's not something
you should worry about right now though. We're using the frame pointer,
because it makes the code a little easier to follow (if you would prefer
to use the stack pointer instead, /nofp will do this - I've not tried this
so I can't say how readable or reliable the code produced is in this
form). And we're using the standard of passing floating point arguments on
the stack. This is the default for normal C code so it makes sense to
follow it. Again, if you're not using FP then it doesn't matter.

So, that's the APCS we're using.

What's that mean ?

Well... in terms of the registers we have the first 4 registers as general
purpose registers. These are used within the compiled code for lots of
transfers and calculations, and they're also the registers that are used
to pass parameters to functions. More about this later. It is important to
know that these registers are corrupted when a function returns, though.

The next 7 registers (r4-r10) are treated as register variables within the
function. These are preserved over function calls and can therefore be
used as the compiler wishes for preserving results over the calls, etc.

r11 is the first register with a dedicated role for this variant of APCS.
It is called the frame pointer and is used to reference arguments passed
on the stack and local storage on the stack.

r12 is a general purpose register, like r0-r3 and can also be corrupt on
return from a function call.

r13 is your stack pointer as you might expect.

r14 is the link address to return to, or a general purpose register. As
with most other assembler, it is assumed that this is corrupt on return
from function calls.

r15 is your program counter. As normal.

In APCS the registers have different names (r0-r3 => a1-a4, r4-r10 =>
v1-v7, etc) but you shouldn't worry too much at this second. You'll find
docs on these in good time.

So, there you have your registers that your C code uses. Let's quickly run
over what the APCS does about calling functions from a machine code point
of view...

If you want to pass no arguments there's nothing you need do. You just
call the function. And you get your result back in R0. For all versions of
the APCS you get your result back in R0. If you're returning a structure
things change slightly but we'll ignore that for this quick guide.

If you want to pass 1-4 arguments you put them in r0-r3. So a function
that passed two parameters would put the first in r0 and the second in r1.

If you want to pass more than 4 arguments then you pass the first 4 in
r0-r3 and the remainder are placed on the stack. So, let's say you're
passing 6 arguments. You place args 1-4 in r0-r3. You drop the stack by 8
bytes (4 * 2 because you're passing two more registers). You store arg 5
at sp+0 and arg 6 at sp+4. Then you call the function. When it returns,
you increment the stack by 8 to reclaim the space (of course you could
leave the stack lowered so that you can use it for other things, but
that's up to you).
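
The arithmetic in that description can be written down as a few lines of
C. This is only a sketch of the rule as described above, and it assumes
that every argument is a single 32bit word (an int or a pointer). The
function and macro names are made up for illustration.

```c
/* Assumption: every argument is one 32bit word (int or pointer). */
#define APCS_REG_ARGS 4   /* arguments 1-4 travel in r0-r3 */
#define APCS_WORD     4   /* bytes per stacked argument */

/* Bytes the caller drops the stack by before making the call. */
int apcs_stack_bytes(int nargs)
{
  return (nargs > APCS_REG_ARGS) ? (nargs - APCS_REG_ARGS) * APCS_WORD : 0;
}

/* Register number (0-3) holding argument 'argno' (counting from 1),
 * or -1 if that argument lives on the stack instead. */
int apcs_arg_register(int argno)
{
  return (argno <= APCS_REG_ARGS) ? argno - 1 : -1;
}

/* Offset from sp (at the point of the call) of argument 'argno',
 * or -1 if that argument is passed in a register. */
int apcs_arg_stack_offset(int argno)
{
  return (argno > APCS_REG_ARGS) ? (argno - APCS_REG_ARGS - 1) * APCS_WORD
                                 : -1;
}
```

For the six argument case, apcs_stack_bytes(6) is 8, argument 5 is at
offset 0 and argument 6 is at offset 4, which matches the description.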

For the sake of sanity, pass all your arguments as pointers in your
external-facing C interfaces. You can pass structures to the code if you
want, but you're going to have to do more work. Same as if you return
structures - easier to pass a pointer to a structure to fill in. Inter-C
calls can pass structures around quite happily and you can examine the
code produced to see how to do this, but don't worry for now.

Similarly, variadic functions may make your life tricky for external
facing code so don't bother for now.

So we've covered how to call the C code from outside and what will
happen when it returns. Basically:
  Stick your parameters in r0-r3 (and extra args on the stack).
  Call the function.
  Get back your result in r0.
  r1,r2,r3,r12,r14 corrupt.

If you look back at the binary output you got when you compiled it
(or downloaded from my site) you'll be able to see some parts of this
description in place for the entry and exit sequences.

However, we do want to use this with BASIC because, after all, we've
written our web browser in BASIC so obviously that's our preferred
language :-)

So we need to look at what BASIC gives us when we call it. There's two
ways of calling machine code from BASIC. One is CALL. The other is USR.
CALL can pass parameters to the routines using variables, strings, etc- I
wouldn't worry about this because it's not as useful to the C coder as it
is to the assembler coder (you could always write routines to access the
variables from the C code once you're handy with integrating C and
assembler). USR can get a result back. Both can pass in 8 registers in
r0-r7.

The registers passed in by BASIC are :

R0-R7   A% - H%
R8      Pointer to BASIC's workspace
R9      Pointer to list of parameters (for CALL)
R10     Number of parameters (for CALL)
R11     Pointer to BASIC's string accumulator
R12     BASIC's LINE pointer (points to the current statement)
R13     Pointer to BASIC's full, descending stack
R14     Link back to BASIC and environment information pointer

As you can see, this is a little different to that expected by the C code.
What we can do here is to write a tiny little veneer to call our C code
from the BASIC. Because we know our little routine has 4 parameters and we
can pass them very easily in A%-D%, that's what we'll do. Which means that
we just have to preserve the flags around the call to ensure that we
return with V clear. If V is set on return an error will be generated and
it is easier for us if this doesn't happen.

So we have a nice simple veneer...

   IMPORT InPolygon
call_InPolygon
   STMFD  sp!,{r14}
   BL     InPolygon
   CMP    r0,r0        ; clear V (don't care about other flags)
   LDMFD  sp!,{pc}

So, that's simple enough.  The only bit that might be unclear is the use
of the 'IMPORT'. This ensures that the symbol 'InPolygon' which is
provided by the C code is able to be used in this file. If you don't
import the symbol, you can't use it. Similarly if you wanted to use a
routine in this file somewhere else in the C code, you'd have to EXPORT
it. We'll come to that later.

That's just one routine which was obviously quite simple. As a quick
example, say you had a function that took 6 arguments and you wanted to
pass them in A%-F%...

   IMPORT SixParamsFunc
call_SixParamsFunc
   STMFD  sp!,{r4,r5,r14} ; push the 5th and 6th argument on to stack
   BL     SixParamsFunc
   ADD    sp, sp, #8
   CMP    r0,r0
   LDMFD  sp!,{pc}

We have to put this in an assembler file so that it can be called and that
means we need to use objasm. The parameters for objasm are very similar to
those for cc actually...

objasm -apcs 3/32/noswst -depend !Depend -throwback -o o.asm_poly s.asm_poly

However, if you try this with the snippet above you'll find it doesn't
work. Firstly objasm has to have an 'END' command at the end of the file
to ensure it doesn't just run off the end (that's just what it needs,
ok?). And secondly it needs to know what the stuff it's assembling is for
the output file, and what it's 'called'.

The object files we produce - AOF files, that is - can contain a
number of areas within them which are named and have particular
attributes. The attributes we have to give our code are 'CODE' and
'READONLY' (well, because it is!). These attributes, together with the
name determine where in the output file the data will be placed. The names
are sorted alphabetically, and the C compiler puts code in 'C$$Code'. We
need (for reasons that will be obvious in a moment) to place our assembled
code before the C code. So we call the area '!!!!First' by placing the
following at the top of the file :

   AREA   |!!!!First|, CODE, READONLY

The | characters are to ensure that the name is treated as a string rather
than being terminated at the first non-symbol character. Just use them
like that for now.

So at this point you may want to compile and assemble the code and link it
with something like :

link -o bin -bin o.poly o.asm_poly

If you look at the bin file you'll see that this contains both the C code
and our little assembler veneer. And that the veneer is first in the file
- that's why we used the alphabetically earlier name for the area.

If you're lazy, skip to :

http://homepage.ntlworld.com/justin.fletcher/CGuide/2/

and you can save yourself actually doing any of that lot.

====

Now you've got your code to the point at which you can see how to call it.
All that remains to complete this little example is to actually do that.

First we want to set up a random shape to use as our test... That's nice
and easy because it's just a number of points stored in a block. We'll
even draw it so that we can see what we're doing...

----
REM Setup and draw a polygon
points%=5
DIM poly% points%*8
FORI=0TOpoints%-1
 x%=RND(1024):y%=RND(1024)
 poly%!(I*8)=x%
 poly%!(I*8+4)=y%
 IF I=0 THEN MOVE x%,y% ELSE DRAW x%,y%
NEXT
DRAW poly%!0,poly%!4
-----

All that does is just populate the memory block with a number of different
random positions so that we can pass it to the polygon routine. In a real
situation you'd actually put real data in there, otherwise you might
as well use RND(2)-1 instead of this polygon routine.

We load the code into memory from our binary file...
-----
REM Load code
SYS "OS_File",5,"bin" TO ,,,,len%
DIM mc% len%
SYS "OS_File",255,"bin",mc%,1<<31
-----

The little thing that might not be obvious there is the use of 1<<31 to
say that this file is Code so it should be synchronised. This ensures that
we don't try executing what was there before hand on a StrongARM (and
similar) system. Refer to SH for the definitions of the above SWIs (read
file info and load file).

Finally we use the stonkingly simple code to read where the mouse is and
then call the nice code we loaded to process it.

-----
REM Now, let's test the code
REPEAT
 MOUSE B%,C%,b%
 A%=poly%
 D%=points%-1
 in%=USR mc%
 VDU30:PRINT"Inside polygon: ";in%
UNTIL 0
-----

And that, as they say, is it.

Well, for that example anyhow.

http://homepage.ntlworld.com/justin.fletcher/CGuide/3/

will catch you up at this point.

======

Now you need to know some things about what you can and can't do in C code
like this.

* You can't call the C library. Ok, so that may seem restrictive, but is
it really ? Not in general. You're using C as effectively a mechanism for
getting maintainable code and not just to get C library support. You want
standard C functions, you implement them.

What do you want ? strcpy ? well that's just a set of byte operations.
Write it in C or write it in assembler and call it... either way it's not
too complicated to implement.

printf ? Well, you're on ropier grounds here, but you can easily knock up
a little assembler routine to call XOS_WriteC or whatever and use that.
We'll come to SWIs in a little bit.

malloc ? yeah, that's a tricky one. Create a heap at the top of your BASIC
stack (hell, JFShared does this and it's not so complicated). If you're
writing a web browser then you've probably got your own little assembler
veneers for memory allocation so you just reuse them - you probably don't
even have to wrap them up in much; they'll probably already be compatible
unless they return results in r14 or r3 or something.


* You can't just reuse the names of standard C functions for something
different. If you think that redefining memcpy to draw a rectangle is a
good idea, you may find that calling it still copies some memory. The
compiler can
make some assumptions. If you find things going a little odd and you've
redefined a standard function differently, check this before going on.


* You may have to provide some 'standard' functions yourself if you want
to do certain things. Division, for example. This is implemented by a
separate function, so you'll have to provide that function if you want
to divide two numbers. No big deal for a lot of things 'cos you can easily
avoid division when doing a lot of things in your code anyhow. Similarly,
you may find that structure assignments (copying a structure to another
structure) will invoke a function (_memcpy, IIRC) if the copy is large
(otherwise it'll do the copy inline).


* You don't get abort handling. If you access invalid memory, you get a
data abort. You don't get a stack backtrace. If you want it, you write it.
But that shouldn't matter too much, 'cos if you didn't use the C you'd be
using assembler and be in the same state. You could always track back -
the method is the same as if you were in C; there's some old backtrace
bits I posted up at

  http://homepage.ntlworld.com/justin.fletcher/SCL-BackTrace/

which I put up for Chris Bazley quite a few moons ago. You'd need to rip
out bits from that, but you could have something there.


* You don't get allocations of zero-init data areas. This may not mean
much to you, but avoid leaving areas unassigned if possible. That is, a
global variable declaration such as :

  char buffer[256];

would allocate 256 bytes of zero-initialised data. However when you get
your binary image this space won't actually be allocated in the area, so
it'll corrupt anything after the allocated space if you tried to use it.
In this case, you could use something like :

  char buffer[256]="";

to avoid this. Just remember to initialise your declared
non-local variables and you should avoid this problem.
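
To give a flavour of the 'you want it, you implement it' approach from
the list above, here's a minimal sketch of two stand-ins: a string copy,
and an unsigned divide done by shift-and-subtract so that the '/'
operator never appears. The names my_strcpy and my_udiv are made up, and
this is not the routine the compiler itself would call for a division; it
is just something you could call explicitly. It assumes 8 bit bytes.

```c
/* Hypothetical stand-ins for library routines; the names are made up. */

/* Copy a zero-terminated string, returning dst (like strcpy does). */
char *my_strcpy(char *dst, const char *src)
{
  char *d = dst;
  while ((*d++ = *src++) != '\0')
    ;
  return dst;
}

/* Unsigned divide by shift-and-subtract, avoiding the '/' operator.
 * Returns the quotient and, if 'rem' is non-null, stores the remainder.
 * Dividing by zero returns 0 with the remainder set to 'num'; that is
 * simply a choice made for this sketch. */
unsigned int my_udiv(unsigned int num, unsigned int den, unsigned int *rem)
{
  unsigned int quot = 0;
  unsigned int r = 0;
  int i;

  if (den == 0)
  {
    if (rem) *rem = num;
    return 0;
  }

  /* Work down from the top bit of 'num', one bit at a time */
  for (i = (int)(sizeof(unsigned int) * 8) - 1; i >= 0; i--)
  {
    r = (r << 1) | ((num >> i) & 1u);
    if (r >= den)
    {
      r -= den;
      quot |= 1u << i;
    }
  }

  if (rem) *rem = r;
  return quot;
}
```

So my_udiv(100, 7, &r) gives 14 and leaves 2 in r.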


So having gone through what you can't do, let's look at a problem that
we've caused by building the code as '-bin'.

Consider the simple code fragment :

----
static const char *version="v1.00 (03 Sep 2003)";

const char *get_version(void)
{
  return version;
}
----

Nice and easy, don't you think ? Yes ? Well. It is, but this is binary
code we've built. So it's all based around being run from 0 (or from -base
if you had specified that when linking). Look at the bin produced by this
code.

http://homepage.ntlworld.com/justin.fletcher/CGuide/4/

has this bit done for you. Note that this is without the polygon bits I
was doing earlier.

You'll notice first that the code isn't at the start of the file. That's
one minor thing - if you used a little veneer like the BASIC example we
had then this won't matter so let's ignore that for now (we'll throw one
around it in a second). However, the code that was generated merely
returns the value at address &24 back to the caller. This isn't right -
none of the code has been relocated.

What's this mean ? Surely ARM is generally relocatable ? Well, most of it
is, but absolute references to memory aren't. Any references to global
or static data will be referenced through absolute addresses. So in this
case, the absolute address is &24 because we've not relocated it for where
it will be loaded in memory.

You probably won't care about the mechanisms and the method in which
addresses are constructed so I'm not going to go in to this. It's not
hard, but not worth it either. Suffice to say we need a relocation to be
performed before the code can be used. And this can be achieved quite
easily. We pretend the code is a relocatable module. Relocatable modules
have their relocation code appended to their end, along with the logical
addresses at which they will be relocated (a list of /what/ needs
relocating). This is quite simple to achieve because it just means we need
to link it as a module and then call the relocation code before we use
the rest of the object.
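
If it helps to picture what a relocation does, here's a purely conceptual
model in C. It is NOT the real relocation code or the real table format
(those are the linker's business); it just assumes a list of byte offsets
of words that hold absolute addresses, and adds the load address to each
of them. All the names are made up for illustration.

```c
/* Conceptual model only - not the linker's actual table format.
 *   image     = the loaded image, viewed as an array of words
 *   offsets   = byte offsets of the words that hold absolute addresses
 *   count     = number of entries in 'offsets'
 *   load_addr = where the image (linked to run at 0) was really loaded
 */
void relocate_words(unsigned int *image, const unsigned int *offsets,
                    int count, unsigned int load_addr)
{
  int i;
  for (i = 0; i < count; i++)
  {
    /* Byte offset to word index by shifting, to avoid a division */
    image[offsets[i] >> 2] += load_addr;
  }
}
```

So a word that held &24 in an image loaded at &8000 would end up holding
&8024, which is where the data actually is.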

Which is about the time at which we need to rethink how we provide our
assembler. If we want to jump to lots of little lumps of C code then it will
be easier if we have a table of branch points at the start of our image.
This way we can just do something like...

  CALL mc%+8

(or even have the 8 as a symbolic constant which will be compressed away
by the BASIC compression tools) to call the 3rd entry point in our C
code. So, that's what we'll do. We'll make the following entry points
available ...

+0 - relocation entry point for our code
+4 - return the version of this code

So, we have a little bit of assembler that does this :

----- s.asm
;
; Veneer for our C code
;

   AREA   |!!!!First|, CODE, READONLY

; Start of the output file
   B      doreloc
   B      call_get_version

   IMPORT __RelocCode
doreloc
   STMFD  sp!,{r14}
   BL     __RelocCode
   CMP    r0,r0
   LDMFD  sp!,{pc}

   IMPORT get_version
call_get_version
   STMFD  sp!,{r14}
   BL     get_version
   CMP    r0,r0        ; clear V (don't care about other flags)
   LDMFD  sp!,{pc}

   END
-----

This is really quite simple - notice it's the same veneer for the BASIC
interface as the one we used earlier. The branch points have been
placed at the start so that they're based at the beginning of the image.

Once we link these bits of code together we get an object that is typed
as a module. In my example makefiles I settype this as Data so that if
it's double-clicked you get a message telling you that nothing handles it
(or in my case, it loads into Zap), rather than that the module is
corrupt. Not an important thing, but it might be frustrating if the
header for the module didn't look like it was corrupt and it blew the
machine up.

In order to test this little bit of code, we can reuse the code that
loaded our bin object. Immediately after that load, we call the relocation
code.

-----
REM Relocate it
CALL mc%
-----

and finally print out the message :

-----
REM Now, let's test the code
msg%=USR (mc%+4)
PRINT"Version: ";
REM Remember, this is a 0-terminated string, not CR-terminated
WHILE ?msg% <>0
 VDU ?msg%
 msg%+=1
ENDWHILE
PRINT
-----

The note is important - you'll probably be handling strings that are
zero-terminated in C. This makes life easier should you finally transfer
the bulk of the code to C-proper eventually. However, it may make things
more complex for your BASIC that expects CR terminated strings.
Alternatively you could deal with all strings as CR terminated in your C
code and make BASIC's life easier. That's really a design decision you'll
have to make yourself.
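
If you go for the second option, the conversion on the C side is tiny.
This is a minimal sketch; the name to_cr_string is made up, and it's up
to the caller to make sure the destination has room for the string plus
its terminator.

```c
/* Hypothetical helper: copy a zero-terminated C string into 'dst' as a
 * CR-terminated string, which is what BASIC's $ indirection expects.
 * Returns the length, not counting the terminator.
 * 'dst' must have room for the length + 1 bytes. */
int to_cr_string(char *dst, const char *src)
{
  int n = 0;
  while (src[n] != '\0')
  {
    dst[n] = src[n];
    n++;
  }
  dst[n] = 13;   /* CR terminator in place of the 0 */
  return n;
}
```

From BASIC you could then read the result with $ rather than looping
over the bytes yourself.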

Yup, you've guessed it... you can find the bits up to now at :

http://homepage.ntlworld.com/justin.fletcher/CGuide/5/

=====

Let's quickly look at the assembler veneers we've got there... they're
bulky and awkward. So, we'll make a little macro of them. We can do this
quite simply with a little thought. It's important to make the veneers
simple so that we're not going to end up getting more tied up in ourselves
than we need be. I've taken a slightly complicated, but (I think) quite
sensible route of allowing the veneers to take up to 8 parameters (A%-H%)
and to pass these on to the C code as necessary. Rather than cite it here
and then give you a blow-by-blow description of it, just look at :

http://homepage.ntlworld.com/justin.fletcher/CGuide/6/

It's not much different to /5/ but it makes the assembler simpler and will
be much more usable if you were to have quite a few veneers to provide.

=====

At this point you're probably getting to the point where you'd wished I'd
not bothered with this 'quick' guide. However, there's a couple of further
things to point out about the use of C like this. If you've got your hands
on the ARM cookbook then you may have seen it before. The __swi extension
is specific to the Norcroft compiler and can be used for debugging and
real code to save time. It must be used with care, but it can save you a
lot of effort in re-writing veneers. It's not useful for all cases of SWIs
so it doesn't remove the effort completely but it makes things a lot
easier in some cases.

The syntax for this extension is simple :

<return type> __swi(<number>) <function name>(<parameters>);

That is, you declare the function just like any other, except for the use
of the __swi extension. Instead of calling the named function, the
compiler will just inline the SWI you specified. Right there. To hell
with the side effects. So you'd better be sure that there aren't any
that'll affect APCS. Because you're using 32bit APCS you don't have to
worry about the flags being corrupted. You will probably want to call the non-X
variant versions of the SWIs however. This will cause an error to be
generated instead of V being set. Why is this useful ? Well, you can't
tell if V was set on return had you set the X bit.

-----
void __swi(0x3) os_newline(void);
void __swi(0x104) vdu4(void);
void __swi(0x2) os_write0(const char *str);

void debug0(const char *str)
{
  vdu4();
  os_write0(str);
  os_newline();
}
----

Now if you call debug0 with a string it'll print it to the VDU4 cursor
followed by a newline. Printing to VDU4 is just my way of debugging, 'cos
it's all captured to my !Console application which makes it nice. For me.
Obviously you'll want to do something else. Well, probably. That's just an
example anyhow.

http://homepage.ntlworld.com/justin.fletcher/CGuide/7/

has the bits for this, showing a debug message printed when get_version is
called. Go on, have a look at the little function debug0. You should
notice the 3 SWIs placed inline.

Now, the limitations of __swi. As you know, you shouldn't have side
effects. That's not too hard to ensure. More difficult, though are the
input and output restrictions. Because it just sticks the SWI in place of
a Branch Link instruction, you are constrained by APCS. The only return
parameter is in r0. The only input parameters are in r0-r3. This means
that if you want to get out an answer from r1, you can't. Or if you need
to pass 5 parameters, you can't.

However, there is an alternative solution to this problem. Well an
alternative other than to write your own veneers, that is. I don't like
OSLib. The main reason I don't like it is the friggin' huge length of time
it takes to build. Twice. And then the time it takes to link anything
against it. But, for this purpose it's utterly ideal and wonderful to use.
Even given its huge size when you link against it. Really, it's nice. OSLib
provides you with those SWI veneers you can directly link against. Those
which it can inline it will. You can't ask for more :-) Well you could,
but Peace On Earth and A Safe Full of Gold isn't going to happen.

You're going to have to think back to about 2:10 am when I wrote about the
-fz switch now... If you were compiling this code for SVC mode those SWI
calls would be corrupting R14. By default, the compiler 'knows' that SWIs
don't corrupt R14. However, -fz tells it that they do. This ensures that
the compiler preserves R14 around the SWIs. This may not be too important for
many functions, but if the function were to be optimised without this
knowledge you might end up with something like :
   SWI OS_Write0
   MOV pc,r14

which would be fine in USR mode, but fatal in SVC mode. With -fz, r14 will
be preserved around such usage. All my examples use -fz just to ensure
that this is safe for whatever use you put it to.

So there you have simple SWI calls. You probably won't care too much about
SWIs most of the time, because you're dealing in algorithms, etc. Unless
of course you're implementing (say) a redraw loop in C. Then you need a
few SWI calls for the actual operations you do during that loop - plotting
text, locating sprites to render, etc, etc.

=====

Finally - really finally - there's the issue of global data. As I've
mentioned, the global data requires that you use relocation. That's all
great and that but if you're not keeping your 'main' data in the C then
that doesn't help you much. Invariably you have a lot of BASIC structures
that you keep assigned together and manipulate from BASIC alone. The
simplest way to make your structures from BASIC available to C is to place
them in a block of memory and pass this to the C code as a parameter.

Let's for the sake of argument say that we've got a linked list of
'things' that we want to pass to the C code. Usually it lives in a
variable 'list%'. We want to move this into the C so that we don't have to
mess in the C code as much. We have two choices. Change all occurrences of
'list%' to directly manipulate a structure that we pass to C. Or assign
the value of list% into that structure before we call the C code that uses
it. The former can be painful if there's a lot of use in the BASIC. The
latter however can be slower if a lot of variables need to migrate across
to the C representation.

This is just an example of how you might implement a function to sort the
list through C. We'll assume that the C code needs the global memory for
something else as well, although in this case we'll just use it to return
the number of entries as a side effect.

----
DIM globalmem% 8
REM +0 = list% value
REM +4 = number of items in list (after sorting)

DEFPROCsort_list(RETURN list%)
REM RETURN so that the new list head gets back to the caller's variable
!globalmem%=list%:REM Set up the global memory for the C code
A%=globalmem%
CALL mc%+mc_sort%
list%=!globalmem%:REM Read back the value from the C code
PRINT "Sorted ";globalmem%!4;" items"
ENDPROC
----

Strictly the return parameters would have been better returned using USR
but I'm trying to demonstrate something, ok ?
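
For comparison, a rough sketch of the USR way. Whatever a C function
returns comes back in r0, and r0 is what USR hands back to BASIC. The
function name count_list and the node layout are made up for illustration
(listdata_t is never spelled out in this post), so treat it as a sketch
rather than the real thing.

```c
#include <stddef.h>

/* Hypothetical node type; substitute whatever your list really holds */
typedef struct listdata_s {
  struct listdata_s *next;
} listdata_t;

/* Count the entries in the list. The int result is returned in r0,
 * so BASIC can pick it up as the value of USR. */
int count_list(const listdata_t *list)
{
  int counted = 0;
  while (list != NULL) {
    counted++;
    list = list->next;
  }
  return counted;
}
```

From BASIC you would set A% to the list pointer and then do something like
n%=USR(mc%+mc_count%), where mc_count% is a hypothetical offset in the same
style as mc_sort%. The rest of this section sticks with the memory block.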

The C code would look something like this :

----
typedef struct globalmem_s {
  listdata_t *list;   /* +0 = list% value */
  int nitems;         /* +4 = number of items in list (after sorting) */
} globalmem_t;

void sort(globalmem_t *globalmem)
{
  int counted=0;
  /* Stuff... */
  globalmem->nitems = counted;
}
----
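
To put something in place of 'Stuff...', here is a minimal sketch of what
sort might do, assuming a singly linked list with a 'next' pointer and an
integer key. The listdata_t layout is invented for illustration and the
insertion sort is just the shortest thing that works, not a recommendation.
On 32bit ARM the pointer and the int are 4 bytes each, so the structure
matches the +0/+4 layout in the BASIC; on a 64bit desktop compiler the
offsets will differ.

```c
#include <stddef.h>

/* Hypothetical node type: a link and a key to sort on */
typedef struct listdata_s {
  struct listdata_s *next;
  int value;
} listdata_t;

typedef struct globalmem_s {
  listdata_t *list;   /* head of the list, read and written back */
  int nitems;         /* number of items, filled in by sort */
} globalmem_t;

/* Insertion sort into ascending order of value, counting as we go */
void sort(globalmem_t *globalmem)
{
  listdata_t *sorted = NULL;
  listdata_t *node = globalmem->list;
  int counted = 0;

  while (node != NULL) {
    listdata_t *next = node->next;   /* remember the rest of the input */
    listdata_t **link = &sorted;

    /* Find the first sorted entry with a larger value than this one */
    while (*link != NULL && (*link)->value <= node->value)
      link = &(*link)->next;

    /* Splice the node in before it */
    node->next = *link;
    *link = node;

    counted++;
    node = next;
  }

  globalmem->list = sorted;
  globalmem->nitems = counted;
}
```

The new head is written back into the block because the old first entry
may no longer be first, which is why the BASIC reads !globalmem% again
after the CALL.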

This is simple but can be made even simpler. Or more complex, depending on
how you think of it. The Norcroft extension __global_reg can be used to
declare that a register /always/ contains a variable value. Code compiled
like this can be more complicated in some regards but it can get you some
performance gains as well as making it look cleaner to the eye reading the
code produced.

In our case, you might want r7 to always contain a pointer to some global
structure. CALL loads A% to H% into r0 to r7, so that's H% from your BASIC
code. That restricts the number of parameters you can pass, but you also
shouldn't need as many parameters.

You might achieve this something like this :

----
DIM globalmem% 8
H%=globalmem%
REM +0 = list% value
REM +4 = number of items in list (after sorting)

DEFPROCsort_list(RETURN list%)
REM RETURN so that the new list head gets back to the caller's variable
!globalmem%=list%:REM Set up the global memory for the C code
CALL mc%+mc_sort%
list%=!globalmem%:REM Read back the value from the C code
PRINT "Sorted ";globalmem%!4;" items"
ENDPROC
----

Notice that H% is initialised at the start of the code and we assume it is
never modified in the BASIC.

----
typedef struct globalmem_s {
  listdata_t *list;   /* +0 = list% value */
  int nitems;         /* +4 = number of items in list (after sorting) */
} globalmem_t;

globalmem_t __global_reg(4) *globalmem;

void sort(void)
{
  int counted=0;
  /* Stuff... */
  globalmem->nitems = counted;
}
----

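Here you see that there isn't any parameter passed to sort; globalmem is
treated as a global value which does not need to be loaded from memory.
The register is used directly instead. (__global_reg counts the variable
registers from v1, which is r4, so __global_reg(4) is v4, or r7.)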

I've not done you an example for this one. You might like to play with it
yourself.
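
If you do want to play with it, one approach is to hide the extension
behind a macro so the same source also builds on a compiler that has never
heard of __global_reg. This is only a sketch: the GLOBAL_REG macro and the
node type are invented here, the fallback (an ordinary global pointer) is
an assumption of mine and only behaves the same in that the function body
doesn't change, and the Norcroft branch simply reuses the declaration
order shown above.

```c
#include <stddef.h>

#ifdef __CC_NORCROFT
#define GLOBAL_REG(n) __global_reg(n)   /* variable lives in a register */
#else
#define GLOBAL_REG(n)                   /* assumption: plain global elsewhere */
#endif

/* Hypothetical node type; substitute your own */
typedef struct listdata_s {
  struct listdata_s *next;
} listdata_t;

typedef struct globalmem_s {
  listdata_t *list;
  int nitems;
} globalmem_t;

/* Under Norcroft this is v4 (r7), set up from H% by the BASIC */
globalmem_t GLOBAL_REG(4) *globalmem;

void sort(void)
{
  int counted = 0;
  listdata_t *node;

  /* Only the counting is shown; the sorting itself is left out */
  for (node = globalmem->list; node != NULL; node = node->next)
    counted++;

  globalmem->nitems = counted;
}
```

On the fallback build you have to point globalmem at a block yourself
before calling sort, which is the job H% does for you from BASIC.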

There is a further method of sharing the data between BASIC and C -
implement a way to call back into BASIC's own routines, which it makes
available through the table that r14 points at on entry, to read the
variables directly. However this is a little more complex and deserves a
whole article in its own right :-)

=====

Ok, I lied. This is the finally. When you're debugging your code you might
find that leaving the function signatures active is a good idea - you can
see which function is where and what you're doing when things go bang.
When you're done, I'd like to suggest that leaving them in is useful even
in the field. You might disagree, so use -ff to remove them from
production code. Up to you. I prefer them being left in.

The methods described in this document also apply to linking C code in to
assembler-based code. You just write it in your C with nice structures and
then stick a few veneers between it and your BASIC or assembler to make it
all gel nicely.

=====

Obviously this post will be complained at for being overlong and I
wouldn't doubt that there are technical inaccuracies in it, so if you want
to shoot it down over these things, feel free. Probably typos and
grammatical mistakes too. You could complain about them as well, but you'd
do better to complain about the technical bits, 'cos otherwise it's a bit
pointless this being csa.programmer and it might as well be csa.pedants.

Apologies to all who didn't care... I ramble and I don't describe things
well, so feel free to complain about that, too :-)

-- 
Gerph {djf0-.3w6e2w2.226,6q6w2q2,2.3,2m4}
URL: http://www.movspclr.co.uk/
... So I will light a candle for you. Keep it burning in the night.
    And pray that you are alright.