From justin.fletcher@ntlworld.com Mon Apr 19 22:55:26 2004
Date: Wed, 3 Sep 2003 05:14:39 +0100
From: Justin Fletcher
Newsgroups: comp.sys.acorn.programmer
Subject: Justin's super-quick guide to compiling C for use from other languages

On Tue, 2 Sep 2003, Andrew Pullan wrote:

> On 31 Aug Justin Fletcher wrote:
>
> > > > Ahthough all this is probably not much help if you are still using
> > > > BASIC.
> > >
> > > Actually once I've got the data structures fixed, I can then do it in ARM
> > > code. WXL already has several bits of ARM code to scan data structures,
> > > so another one or two wont make much differance :-)
> >
> > Surely that's just making more work for yourself ?
> >
> > Write it in C, compile it up and link it as bin, then call the bits you
> > need from your BASIC as you see fit.
>
> Didn't think you could call C from basic. Unless it was a module or a
> separate application.
>
> Sounds like a goo idea, but I've no idea how to do it so I'm not inclined to
> experiment. I crash the machin enough as it is!

Compiled C's just normal machine code.

Justin's super-quick guide to creating code to use from other languages in C...
[probably inaccurate in a lot of places and glossing over specific details
in many others - be prepared to skip dull or plainly wrong sections]

APCS allows for a number of variants. Lack of stack limits and stack
checking is one of them. This is what we want to use for our example C code
because it's a lot easier to work with from other languages (and because you
know what you're doing if you're linking in from other languages). As a
switch to the compiler this is -apcs 3/noswst (no software stack checking).
And because we want our code to be nicely compatible with all systems, we
build it as 32bit, so the switch becomes -apcs 3/32/noswst.

Now, the 32bit switch means many things which you already know - mostly that
you're not preserving the state of flags over the call to your C code from
outside.
It also means that internally it's not being maintained over the function
calls, but the bits inside that code you're not caring about because... well
that's the compiler's business and you assume that it knows what it's doing.

For the purposes of this quick guide, I'm going to assume that you're in USR
mode for your code. It doesn't really change much but just use -fz if you're
building for use in SVC mode (don't worry, it's not likely to bite you but
it's just a little safer should you ever be clever).

So you're nearly set on /how/ to compile the code. You need the normal C
compiler command...

  cc -c -apcs 3/32/noswst -o o.code c.output

Notice that I'm not using an include path here. The default include path for
system includes will still be present and we're not really all that worried
about that. However, you're NOT going to be using the C library. Well, you
might. But you'd have to re-write it, find another, or do cunning things to
link with SharedCLibrary.

If at this point you're asking yourself why you're not using the C library,
you may be looking at the problem from the wrong point of view. The purpose
here is to produce some code in C that takes the work out of maintaining it.
C is easier than assembler to maintain. It's faster than BASIC. The C library
provides a lot of useful stuff, but you don't need a lot if you're just
manipulating algorithms.

So, then... let's look at what the C code has to be...

The C code itself has to do what you want it to do. That's the crux of this.
So, let's assume for the sake of argument that you're wanting a little
routine that checks whether you're inside a polygon or not. Why am I giving
this as an example ? Because that's exactly what I use in my little ImageMap
checking module and I've got the code to hand. I'm going to cite the entire
code in fact. It's small and anyone who wants to bitch about posting code to
the group can just fire away...

-------- c.poly
int InPolygon(int polygon[],int mx,int my,int points)
{
  int lastx=polygon[points*2];
  int lasty=polygon[points*2+1];
  int yflag0=(lasty >= my);
  int thisx,thisy;
  int j,inside_flag,yflag1;

  inside_flag = 0;
  for (j=0; j<=points; j++)
  {
    thisx=polygon[j*2];
    thisy=polygon[j*2+1];
    yflag1=(thisy >= my);
    if (yflag0 != yflag1)
      if ( ((thisy-my) * (lastx-thisx) >=
            (thisx-mx) * (lasty-thisy)) == yflag1 )
        inside_flag = !inside_flag;
    yflag0 = yflag1;
    lastx=thisx;
    lasty=thisy;
  }
  return (inside_flag);
}
--------

Um... that's not commented, I'm afraid. Basically you pass in four
parameters.

  polygon = A pointer to an array of points (x then y) which describe
            the polygon we're checking
  mx      = mouse x position that we're checking
  my      = mouse y position that we're checking
  points  = number of points in the polygon - 1

... and you get back a flag to indicate if you're inside or not. The maths
behind this algorithm aren't amazingly complex but I couldn't explain them -
consult a text book or online guide for more details, but this isn't really
relevant to our example here.

So, you've got your little chunk of code and you've compiled it up. Using
something like :

  cc -c -apcs 3/32/noswst -depend !Depend -throwback -o o.poly c.poly

And then we need to get this code to a state that we can use it. So we
create a binary image of the code...

  link -bin -o bin o.poly

And from this we have a binary file that we can call directly. You can look
at the bin file at this point and see what it's compiled to and whether you
could have written it better by hand. If you're reasonable at assembler the
answer should be 'yes' at this point. But you should also be thinking to
yourself how much easier the C is. If you're not seeing the point then you
might want to wander off and do something else at this point. If not then
let's roll on...

The code to this point can be found at :
  http://homepage.ntlworld.com/justin.fletcher/CGuide/1/
which you might want to look at to see how this pans out rather than doing
it all yourself.
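
If you'd like to convince yourself that the routine behaves before going
anywhere near BASIC, something like the following can be built with any
hosted C compiler. This is only a sketch: square_contains is a made-up
helper, the square is made-up data, and the routine is repeated here with
its loop visiting every vertex from 0 to 'points' inclusive, which is what
the 'number of points - 1' convention implies.

```c
/* Host-side check of the polygon routine.
 * square_contains and its data are hypothetical, for illustration only. */

int InPolygon(int polygon[], int mx, int my, int points)
{
  int lastx = polygon[points*2];      /* start from the last vertex */
  int lasty = polygon[points*2+1];
  int yflag0 = (lasty >= my);
  int thisx, thisy;
  int j, inside_flag, yflag1;

  inside_flag = 0;
  for (j = 0; j <= points; j++)       /* assumed: vertices 0..points */
  {
    thisx = polygon[j*2];
    thisy = polygon[j*2+1];
    yflag1 = (thisy >= my);
    if (yflag0 != yflag1)             /* edge straddles the test line */
      if ( ((thisy-my) * (lastx-thisx) >=
            (thisx-mx) * (lasty-thisy)) == yflag1 )
        inside_flag = !inside_flag;
    yflag0 = yflag1;
    lastx = thisx;
    lasty = thisy;
  }
  return inside_flag;
}

/* A 10x10 square stored as x,y pairs: four vertices, so points = 4 - 1. */
int square_contains(int mx, int my)
{
  int square[] = { 0,0, 10,0, 10,10, 0,10 };
  return InPolygon(square, mx, my, 3);
}
```

With this data, square_contains(5,5) gives 1 and square_contains(15,5)
gives 0. Note that the array holds points+1 vertices, which is the same
layout the BASIC test program builds.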

====

We're not wanting to call this code directly though. Because we're in BASIC.
Here's where you might need to know your APCS bits. APCS (or ATPCS if you're
wanting to bring Thumb into this, which I'm not) defines the register
bindings used by compiled code in order to interwork different compiled code
lumps.

For each variant of the APCS standard the code is (generally) incompatible.
There's a few that are compatible so long as you observe some restrictions
but you'll want to go and read the right specs if you're that bothered by
what is and what isn't compatible. Skip the next paragraph if you don't care
about the description of the APCS variant we use.

I mentioned above that we're not using stack checking. That changes the
variant of the APCS we use. We're also using the 32bit variant which makes
it easier to move the code around.

We're implicitly using the non-reentrant form of the APCS standard. If you
want to know more about that, again, the right specs will help but suffice
to say that the reentrant form allows a single block of code to be safely
used by multiple clients. For *this* example, I'm not dealing with the
reentrant variant because it makes the whole process a lot more fun and this
is really only intended as a quick guide.

We're also using the variant of the FP code that the compiler is shipped
with (that's FPE2 for CC's up to around 5.30 or something, and FPE3 for
later ones; version numbers might be a little out, off the top of my head).
In this example we don't use FP code so it's not an issue. Depending on who
you target, you might want to retain FPE2 compatibility, or only support the
later FPE3 code. That's not something you should worry about right now
though.

We're using the frame pointer, because it makes the code a little easier to
follow (if you would prefer to use the stack pointer instead, /nofp will do
this - I've not tried this so I can't say how readable or reliable the code
produced is in this form).

And we're using the standard of passing floating point arguments on the
stack. This is the default for normal C code so it makes sense to follow it.
Again, if you're not using FP then it doesn't matter.

So, that's the APCS we're using. What's that mean ?

Well... in terms of the registers we have the first 4 registers as general
purpose registers. These are used within the compiled code for lots of
transfers and calculations, and they're also the registers that are used to
pass parameters to functions. More about this later. It is important to know
that these registers are corrupted when a function returns, though.

The next 7 registers (r4-r10) are treated as register variables within the
function. These are preserved over function calls and can therefore be used
as the compiler wishes for preserving results over the calls, etc.

r11 is the first register with a dedicated role for this variant of APCS. It
is called the frame pointer and is used to reference arguments passed on the
stack and local storage on the stack.

r12 is a general purpose register, like r0-r3 and can also be corrupt on
return from a function call.

r13 is your stack pointer as you might expect.

r14 is the link address to return to, or a general purpose register. As with
most other assembler, it is assumed that this is corrupt on return from
function calls.

r15 is your program counter. As normal.

In APCS the registers have different names (r0-r3 => a1-a4, r4-r10 => v1-v7,
etc) but you shouldn't worry too much at this second. You'll find docs on
these in good time.

So, there you have your registers that your C code uses. Let's quickly run
over what the APCS does about calling functions from a machine code point of
view...

If you want to pass no arguments there's nothing you need do. You just call
the function. And you get your result back in R0. For all versions of the
APCS you get your result back in R0. If you're returning a structure things
change slightly but we'll ignore that for this quick guide.

If you want to pass 1-4 arguments you put them in r0-r3. So a function that
passed two parameters would put the first in r0 and the second in r1.

If you want to pass more than 4 arguments then you pass the first 4 in r0-r3
and the remainder are placed on the stack. So, let's say you're passing 6
arguments. You place args 1-4 in r0-r3. You drop the stack by 8 bytes (4 * 2
because you're passing two more arguments). You store arg 5 at sp+0 and arg
6 at sp+4. Then you call the function. When it returns, you increment the
stack by 8 to reclaim the space (of course you could leave the stack lowered
so that you can use it for other things, but that's up to you).

For the sake of sanity, pass all your arguments as pointers in your
external-facing C interfaces. You can pass structures to the code if you
want, but you're going to have to do more work. Same as if you return
structures - easier to pass a pointer to a structure to fill in. Inter-C
calls can pass structures around quite happily and you can examine the code
produced to see how to do this, but don't worry for now. Similarly, variadic
functions may make your life tricky for external facing code so don't bother
for now.

So we've covered how to call the C code from outside and what will happen
when it returns. Basically:

  Stick your parameters in r0-r3 (and extra args on the stack).
  Call the function.
  Get back your result in r0.
  r1,r2,r3,r12,r14 corrupt.

If you look back at the binary output you got when you compiled it (or
downloaded from my site) you'll be able to see some parts of this
description in place for the entry and exit sequences.

However, we do want to use this with BASIC because, after all, we've written
our web browser in BASIC so obviously that's our preferred language :-)

So we need to look at what BASIC gives us when we call machine code. There's
two ways of calling machine code from BASIC. One is CALL. The other is USR.
CALL can pass parameters to the routines using variables, strings, etc - I
wouldn't worry about this because it's not as useful to the C coder as it is
to the assembler coder (you could always write routines to access the
variables from the C code once you're handy with integrating C and
assembler). USR can get a result back. Both can pass in 8 registers in
r0-r7.

The registers passed in by BASIC are :

  R0-R7  A% - H%
  R8     Pointer to BASIC's workspace
  R9     Pointer to list of parameters (for CALL)
  R10    Number of parameters (for CALL)
  R11    Pointer to BASIC's string accumulator
  R12    BASIC's LINE pointer (points to the current statement)
  R13    Pointer to BASIC's full, descending stack
  R14    Link back to BASIC and environment information pointer

As you can see, this is a little different to that expected by the C code.
What we can do here is to write a tiny little veneer to call our C code from
the BASIC. Because we know our little routine has 4 parameters and we can
pass them very easily in A%-D%, that's what we'll do. Which means that all
we have to deal with is the flags, to ensure that we return with V clear. If
V is set on return an error will be generated and it is easier for us if
this doesn't happen. So we have a nice simple veneer...

        IMPORT  InPolygon
call_InPolygon
        STMFD   sp!,{r14}
        BL      InPolygon
        CMP     r0,r0           ; clear V (don't care about other flags)
        LDMFD   sp!,{pc}

So, that's simple enough. The only bit that might be unclear is the use of
the 'IMPORT'. This ensures that the symbol 'InPolygon' which is provided by
the C code is able to be used in this file. If you don't import the symbol,
you can't use it. Similarly if you wanted to use a routine in this file
somewhere else in the C code, you'd have to EXPORT it. We'll come to that
later.

That's just one routine which was obviously quite simple. As a quick
example, say you had a function that took 6 arguments and you wanted to pass
them in A%-F%...

        IMPORT  SixParamsFunc
call_SixParamsFunc
        STMFD   sp!,{r4,r5,r14} ; push the 5th and 6th argument on to stack
        BL      SixParamsFunc
        ADD     sp, sp, #8      ; drop the two stacked arguments
        CMP     r0,r0           ; clear V
        LDMFD   sp!,{pc}

We have to put this in an assembler file so that it can be called and that
means we need to use objasm. The parameters for objasm are very similar to
those for cc actually...

  objasm -apcs 3/32/noswst -depend !Depend -throwback -o o.asm_poly s.asm_poly

However, if you try this with the snippet above you'll find it doesn't work.
Firstly objasm has to have an 'END' command at the end of the file to ensure
it doesn't just run off the end (that's just what it needs, ok?). And
secondly it needs to know what the stuff it's assembling is for the output
file, and what it's 'called'.

The object files we produce - the AOF files, that is - can contain a number
of areas within them which are named and have particular attributes. The
attributes we have to give our code are 'CODE' and 'READONLY' (well, because
it is!). These attributes, together with the name, determine where in the
output file the data will be placed. The names are sorted alphabetically,
and the C compiler puts code in 'C$$Code'. We need (for reasons that will be
obvious in a moment) to place our assembled code before the C code. So we
call the area '!!!!First' by placing the following at the top of the file :

        AREA    |!!!!First|, CODE, READONLY

The | characters are to ensure that the name is treated as a string rather
than being terminated at the first non-symbol character. Just use them like
that for now.

So at this point you may want to compile and assemble the code and link it
with something like :

  link -o bin -bin o.poly o.asm_poly

If you look at the bin file you'll see that this contains both the C code
and our little assembler veneer. And that the veneer is first in the file -
that's why we used the alphabetically earlier name for the area.
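
The ordering argument is easy to check for yourself. The sketch below
assumes that the linker's 'alphabetical' sort amounts to a plain
byte-by-byte comparison of the area names (an assumption for illustration;
the real linker may well take the attributes into account too), and
area_sorts_before is a hypothetical helper, not anything the tools provide.

```c
/* Sketch: does one area name sort ahead of another?
 * Assumes a plain byte-wise comparison of the names;
 * area_sorts_before is a made-up helper for illustration. */
#include <string.h>

int area_sorts_before(const char *a, const char *b)
{
  /* '!' is character 33 and 'C' is character 67, so any name
   * starting with '!' compares lower than 'C$$Code'. */
  return strcmp(a, b) < 0;
}
```

Under that assumption area_sorts_before("!!!!First", "C$$Code") is true,
which is the whole reason for the odd-looking area name.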

If you're lazy, skip to :
  http://homepage.ntlworld.com/justin.fletcher/CGuide/2/
and you can save yourself actually doing any of that lot.

====

Now you've got your code to the point at which you can see how to call it.
All that remains to complete this little example is to actually do that.

First we want to set up a random shape to use as our test... That's nice and
easy because it's just a number of points stored in a block. We'll even draw
it so that we can see what we're doing...

----
REM Setup and draw a polygon
points%=5
DIM poly% points%*8
FORI=0TOpoints%-1
 x%=RND(1024):y%=RND(1024)
 poly%!(I*8)=x%
 poly%!(I*8+4)=y%
 IF I=0 THEN MOVE x%,y% ELSE DRAW x%,y%
NEXT
DRAW poly%!0,poly%!4
-----

All that does is just populate the memory block with a number of different
random positions so that we can pass it to the polygon routine. In a real
situation you'd actually put real data in there, otherwise you might as well
use RND(2)-1 instead of this polygon routine.

We load the code into memory from our binary file...

-----
REM Load code
SYS "OS_File",5,"bin" TO ,,,,len%
DIM mc% len%
SYS "OS_File",255,"bin",mc%,1<<31
-----

The little thing that might not be obvious there is the use of 1<<31 to say
that this file is Code so it should be synchronised. This ensures that we
don't try executing what was there before hand on a StrongARM (and similar)
system. Refer to SH for the definitions of the above SWIs (read file info
and load file).

Finally we use the stonkingly simple code to read where the mouse is and
then call the nice code we loaded to process it.

-----
REM Now, lets test the code
REPEAT
 MOUSE B%,C%,b%
 A%=poly%
 D%=points%-1
 in%=USR mc%
 VDU30:PRINT"Inside polygon: ";in%
UNTIL 0
-----

And that, as they say, is it. Well, for that example anyhow.
  http://homepage.ntlworld.com/justin.fletcher/CGuide/3/
will catch you up at this point.

======

Now you need to know some things about what you can and can't do in C code
like this.

* You can't call the C library.
  Ok, so that may seem restrictive, but is it really ? Not in general.
  You're using C as effectively a mechanism for getting maintainable code
  and not just to get C library support. You want standard C functions, you
  implement them.

  What do you want ?

  strcpy ? well that's just a set of byte operations. Write it in C or write
  it in assembler and call it... either way it's not too complicated to
  implement.

  printf ? Well, you're on ropier grounds here, but you can easily knock up
  a little assembler routine to call XOS_WriteC or whatever and use that.
  We'll come to SWIs in a little bit.

  malloc ? yeah, that's a tricky one. Create a heap at the top of your BASIC
  stack (hell, JFShared does this and it's not so complicated). If you're
  writing a web browser then you've probably got your own little assembler
  veneers for memory allocation so you just reuse them - you probably don't
  even have to wrap them up in much; they'll probably already be compatible
  unless they return results in r14 or r3 or something.

* You can't just reuse functions from the standard C usage in a different
  form.
  If you think that redefining memcpy to draw a rectangle is a good idea,
  you may find that calling it still copies some memory. The compiler can
  make some assumptions. If you find things going a little odd and you've
  redefined a standard function differently, check this before going on.

* You may have to provide some 'standard' functions yourself if you want to
  do certain things.
  Division, for example. This is implemented by a separate function, so
  you'll have to provide that function if you want to divide two numbers. No
  big deal for a lot of things 'cos you can easily avoid division when doing
  a lot of things in your code anyhow. Similarly, you may find that
  structure assignments (copying a structure to another structure) will
  invoke a function (_memcpy, IIRC) if the copy is large (otherwise it'll do
  the copy inline).

* You don't get abort handling.
  If you access invalid memory, you get a data abort.
  You don't get a stack backtrace. If you want it, you write it. But that
  shouldn't matter too much, 'cos if you didn't use the C you'd be using
  assembler and be in the same state. You could always track back - the
  method is the same as if you were in C; there's some old backtrace bits I
  posted up at
    http://homepage.ntlworld.com/justin.fletcher/SCL-BackTrace/
  which I put up for Chris Bazley quite a few moons ago. You'd need to rip
  out bits from that, but you could have something there.

* You don't get allocations of zero-init data areas.
  This may not mean much to you, but avoid leaving areas unassigned if
  possible. That is, a global variable declaration such as :
    char buffer[256];
  would allocate 256 bytes of zero-initialised data. However when you get
  your binary image this space won't actually be allocated in the area, so
  it'll corrupt anything after the allocated space if you tried to use it.
  In this case, you could use something like :
    char buffer[256]="";
  to avoid this. Just remember to initialise your declared non-local
  variables and you should avoid this problem.

So having gone through what you can't do, let's look at a problem that we've
caused by building the code as '-bin'. Consider the simple code fragment :

----
static const char *version="v1.00 (03 Sep 2003)";

const char *get_version(void)
{
  return version;
}
----

Nice and easy, don't you think ? Yes ? Well. It is, but this is binary code
we've built. So it's all based around being run from 0 (or from -base if you
had specified that when linking). Look at the bin produced by this code.
  http://homepage.ntlworld.com/justin.fletcher/CGuide/4/
has this bit done for you. Note that this is without the polygon bits I was
doing earlier.

You'll notice first that the code isn't at the start of the file. That's one
minor thing - if you used a little veneer like the BASIC example we had then
this won't matter so let's ignore that for now (we'll throw one around it in
a second).
However, the code that loads the instruction merely returns the value at
address &24 back to the caller. This isn't right - none of the code is
relocated. What's this mean ? Surely ARM is generally relocatable ? Well,
most of it is, but absolute references to memory aren't. Any references to
global or static data will be referenced through absolute addresses. So in
this case, the absolute address is &24 because we've not relocated it for
where it will be loaded in memory.

You probably won't care about the mechanisms and the method in which
addresses are constructed so I'm not going to go in to this. It's not hard,
but not worth it either. Suffice to say we need a relocation to be performed
before the code can be used. And this can be achieved quite easily. We
pretend the code is a relocatable module. Relocatable modules have their
relocation code appended to their end, along with the logical addresses at
which they will be relocated (a list of /what/ needs relocating).

This is quite simple to achieve because it just means we need to link it as
a module and then call the relocation code before we use the rest of the
object. Which is about the time at which we need to rethink how we provide
our assembler. If we want to jump to lots of little lumps of C code then it
will be easier if we have a table of branch points at the start of our
image. This way we can just do something like...

  CALL mc%+8

(or even have the 8 as a symbolic constant which will be compressed away by
the BASIC compression tools) to call the 3rd entry point in our C code.

So, that's what we'll do. We'll make the following entry points available
...

  +0 - relocation entry point for our code
  +4 - return the version of this code

So, we have a little bit of assembler that does this :

----- s.asm
;
; Veneer for our C code
;

        AREA    |!!!!First|, CODE, READONLY

; Start of the output file
        B       doreloc
        B       call_get_version

        IMPORT  __RelocCode
doreloc
        STMFD   sp!,{r14}
        BL      __RelocCode
        CMP     r0,r0
        LDMFD   sp!,{pc}

        IMPORT  get_version
call_get_version
        STMFD   sp!,{r14}
        BL      get_version
        CMP     r0,r0           ; clear V (don't care about other flags)
        LDMFD   sp!,{pc}

        END
-----

This is really quite simple - notice the same veneer for the BASIC interface
to that which we used earlier. The branch points have been placed at the
start so that they're based at the beginning of the image.

Once we link these bits of code together we get an object that is typed as a
module. In my example makefiles I settype this as Data so that if it's
double clicked you get a message telling you that nothing handles it (or in
my case, it loads into Zap), rather than that the module is corrupt. Not an
important thing, but it might be frustrating if the header for the module
didn't look like it was corrupt and it blew the machine up.

In order to test this little bit of code, we can reuse the code that loaded
our bin object. Immediately after that load, we call the relocation code.

-----
REM Relocate it
CALL mc%
-----

and finally print out the message :

-----
REM Now, lets test the code
msg%=USR (mc%+4)
PRINT"Version: ";
REM Remember, this is a 0-terminated string, not CR-terminated
WHILE ?msg% <>0
 VDU ?msg%
 msg%+=1
ENDWHILE
PRINT
-----

The note is important - you'll probably be handling strings that are
zero-terminated in C. This makes life easier should you finally transfer the
bulk of the code to C-proper eventually. However, it may make things more
complex for your BASIC that expects CR terminated strings. Alternatively you
could deal with all strings as CR terminated in your C code and make BASIC's
life easier. That's really a design decision you'll have to make yourself.
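
If you go the CR-terminated route, the conversion is only a few lines and
needs nothing from the C library. A minimal sketch, with to_cr_string being
a name invented here rather than anything from the example archives:

```c
/* Sketch: copy a 0-terminated C string into a caller-supplied buffer
 * as a CR-terminated string for BASIC to read.
 * to_cr_string is a hypothetical name; no C library calls are used.
 * Returns the number of characters copied (not counting the CR),
 * or -1 if the buffer has no room even for the terminator. */
int to_cr_string(char *dest, int destsize, const char *src)
{
  int i = 0;

  if (destsize < 1)
    return -1;

  /* Leave one byte spare for the terminator; truncate if need be. */
  while (src[i] != '\0' && i < destsize - 1)
  {
    dest[i] = src[i];
    i++;
  }
  dest[i] = 13;   /* CR, which is what BASIC's $ indirection expects */
  return i;
}
```

The buffer is passed in by the caller on purpose: it sidesteps both the
missing malloc and the zero-init data problem described earlier, because
the BASIC side can DIM the block and hand its address over in a register.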

Yup, you've guessed it... you can find the bits up to now at :
  http://homepage.ntlworld.com/justin.fletcher/CGuide/5/

=====

Let's quickly look at the assembler veneers we've got there... they're bulky
and awkward. So, we'll make a little macro of them. We can do this quite
simply with a little thought. It's important to make the veneers simple so
that we're not going to end up getting more tied up in ourselves than we
need be. I've taken a slightly complicated, but (I think) quite sensible
route of allowing the veneers to take up to 8 parameters (A%-H%) and to pass
these on to the C code as necessary. Rather than cite it here and then give
you a blow-by-blow description of it, just look at :
  http://homepage.ntlworld.com/justin.fletcher/CGuide/6/

It's not much different to /5/ but it makes the assembler simpler and will
be much more usable if you were to have quite a few veneers to provide.

=====

At this point you're probably getting to the point where you'd wished I'd
not bothered with this 'quick' guide. However, there's a couple of further
things to point out about the use of C like this. If you've got your hands
on the ARM cookbook then you may have seen it before.

The __swi extension is specific to the Norcroft compiler and can be used for
debugging and real code to save time. It must be used with care, but it can
save you a lot of effort in re-writing veneers. It's not useful for all
cases of SWIs so it doesn't remove the effort completely but it makes things
a lot easier in some cases. The syntax for this extension is simple :

  __swi() ( [...]

[text missing from this copy, up to the tail end of a C example]

  [...] nitems = counted;
}
----

This is simple but can be made even simpler. Or more complex, depending on
how you think of it. The Norcroft extension __global_reg can be used to
declare that a register /always/ contains a variable value. Code compiled
like this can be more complicated in some regards but it can get you some
performance gains as well as making it look cleaner to the eye reading the
code produced.

In our case, you might want r7 to always contain a pointer to some global
structure. That's H% from your BASIC code. That restricts the number of
parameters you can pass, but you also shouldn't need as many parameters. You
might achieve this something like this :

----
DIM globalmem% 8
H%=globalmem%
REM +0 = list% value
REM +4 = number of items in list (after sorting)

DEFPROCsort_list(list%)
!globalmem%=list%:REM Set up the global memory for the C code
CALL mc%+mc_sort%
list%=!globalmem%:REM Read back the value from the C code
PRINT "Sorted ";globalmem%!4;" items"
ENDPROC
----

Notice that H% is initialised at the start of the code and we assume it is
never modified in the BASIC.

----
typedef struct globalmem_s {
  listdata_t *list;
  int nitems;
} globalmem_t;

globalmem_t __global_reg(4) *globalmem;

void sort(void)
{
  int counted=0;
  /* Stuff... */
  globalmem->nitems = counted;
}
----

Here you see that there isn't any parameter passed to sort; globalmem is
treated as a global value which does not need to be loaded from memory. The
register is used directly instead.

I've not done you an example for this one. You might like to play with it
yourself. It's really quite

There is a further method of sharing the data between BASIC and C -
implement a way to call the BASIC functions off the r14 to read the
variables directly. However this is a little more complex and deserves a
whole article in its own right :-)

=====

Ok, I lied. This is the finally.

When you're debugging your code you might find that leaving the function
signatures active is a good idea - you can see which function is where and
what you're doing when things go bang. When you're done, I'd like to suggest
that leaving them in is useful even in the field. You might disagree, so use
-ff to remove them from production code. Up to you. I prefer them being left
in.

The methods described in this document also apply to linking C code in to
assembler-based code.
You just write it in your C with nice structures and then stick a few
veneers between your BASIC and Assembler to make it all gel nicely.

=====

Obviously this post will be complained at for being overlong and I wouldn't
doubt that there are technical inaccuracies in it, so if you want to shoot
it down over these things, feel free. Probably typos and grammatical
mistakes too. You could complain about them as well, but you'd do better to
complain about the technical bits, 'cos otherwise it's a bit pointless this
being csa.programmer and it might as well be csa.pedants.

Apologies to all who didn't care... I ramble and I don't describe things
well, so feel free to complain about that, too :-)

-- 
Gerph {djf0-.3w6e2w2.226,6q6w2q2,2.3,2m4}
URL: http://www.movspclr.co.uk/
... So I will light a candle for you. Keep it burning in the night.
    And pray that you are alright.