DASM
DASM - DCPU-16 assembler

Introduction

DASM is a DCPU-16 assembler, disassembler and emulator kit. It implements the DCPU-16 instruction set v1.1, with a few extensions. It is not a macro assembler, yet. The assembler accepts the syntax used in the examples in the specification, again with a few extensions.

Usage

To assemble a file, use:

./dasm.pl -o <output>.dump <input>.dasm

The dump format produced is a linked textual dump, in the same format as that given in the DCPU-16 specification.

To disassemble a file, use:

./dasm.pl -o <output>.dasm <input>.dump

The output disassembly file should be able to be reassembled. The argument to -o may be just an extension, prefixed by a '.' character, to display the output in that format rather than write it to a file.

For example, to display a disassembly of a file, use:

./dasm.pl -o .dasm <input>.dump

The internal structures built from the input file can be displayed by omitting the -o option:

./dasm.pl <input>.dasm

To execute the input files, add the '-run' option to the command line, for example:

./dasm.pl -run <input>.dasm

Execution of the code begins at address 0.

There may be multiple input files on the command line which will be assembled in order.

Source assembly file format

The source assembly format is given as an example in the DCPU-16 specification. Operands (called 'values' in the specification) take the forms given there, with the exception that the [SP] type specification is not supported (PUSH, PEEK and POP are far clearer).

The following format is expected:

Directives

Directives indicate to the assembler how the code should be processed. The following directives are supported at present:

Special opcodes

Special 'opcodes' provide shortcuts or necessary additions to the assembly format. The 'opcodes' supported at present are:

Expressions

Anywhere that a value is required, an expression may be used. Expressions can include brackets, but they do not obey operator precedence; only brackets control the order of evaluation. (Poor code on my part - maybe I should have thought more carefully about it). Symbols may be used as operands of an addition. For example, the following code fragment is valid:

    SET pc, label+1
:label
    DAT 0
    SET pc, pc        ; continues from here, skipping the 0
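The evaluation rule described above can be sketched in Python (not the Perl that DASM itself is written in). This is only an illustration under stated assumptions: operators of equal standing are applied strictly left to right, results are masked to 16 bits, and the function names `tokenize` and `evaluate` are made up for this example. It handles the arithmetic operators listed in this section.

```python
import re

# One token: hex literal, decimal literal, symbol name, shift operator,
# or a single-character operator or bracket.
TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|\d+|[A-Za-z_]\w*|<<|>>|[-+*/%&|^()])")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,
    "%": lambda a, b: a % b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}


def tokenize(text):
    """Split an expression into tokens, rejecting unknown characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"bad expression near {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, symbols=None):
    """Evaluate left to right with no precedence; brackets group."""
    symbols = symbols or {}
    tokens = tokenize(text)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in OPS or tok == ")":
            raise ValueError(f"unexpected {tok!r}")
        if tok[0].isdigit():
            return int(tok, 16) if tok.lower().startswith("0x") else int(tok)
        return symbols[tok]  # assumed: symbols resolve to plain integers

    def sequence():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            pos += 1
            # Apply each operator as soon as its right operand is known.
            value = OPS[op](value, operand()) & 0xFFFF
        return value

    result = sequence()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

Under these assumptions `evaluate("2+3*4")` gives 20 rather than 14, because the addition is applied first; writing `2+(3*4)` restores the conventional result.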

Expressions can use the following arithmetic operators:

Operator  Meaning
+         Addition
-         Subtraction
*         Multiplication
/         Division
%         Modulo
&         Binary AND
|         Binary OR
^         Binary Exclusive OR
<<        Left shift
>>        Logical right shift

Address file format

Address file format is a simple memory dump format which can be used for input or output. It is based on the format used in the DCPU-16 specification. The start of any line can be specified as:

<address> :

This indicates the address that assigned values start at. Where no address is supplied, the values to be stored will continue from the previous address, or 0 if no address is given at the start of the dump.

Following any address, a whitespace separated list of hex values may be given, to be stored in ascending order from the address specified.
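A reader for this format might look like the following Python sketch. It is not DASM's own loader: the function name `parse_dump` is hypothetical, and it assumes that addresses are written in hex, as they are in the dumps in the DCPU-16 specification.

```python
def parse_dump(text):
    """Parse an address-format dump into a {address: word} dictionary."""
    memory = {}
    address = 0  # used when the dump does not start with an address
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        if sep:
            # "<address> :" prefix: restart storing at that address.
            address = int(head.strip(), 16)
            line = tail
        for word in line.split():
            memory[address] = int(word, 16)
            address += 1
    return memory
```

A line without an address simply continues from wherever the previous line stopped.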

When output, values are written in blocks of at most 8 words, with each block ending at an address that is a multiple of 8 words.
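The blocking rule for output can be sketched in Python as follows. The exact column layout (4-digit lowercase hex, a colon and single spaces) is an assumption modelled on the specification's dumps, and `format_dump` is a hypothetical name, not a DASM function.

```python
def format_dump(words, start=0):
    """Format words as dump lines that break at 8-word address multiples."""
    lines = []
    address = start
    index = 0
    while index < len(words):
        # Room left before the next address that is a multiple of 8.
        room = 8 - (address % 8)
        chunk = words[index:index + room]
        body = " ".join(f"{word:04x}" for word in chunk)
        lines.append(f"{address:04x}: {body}")
        address += len(chunk)
        index += len(chunk)
    return lines
```

For data starting at address 5, the first line therefore holds only 3 words, and every later line starts on an 8-word boundary.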

This format is used as part of the regression test to check that values are assembled correctly.

Hardware profiles

Hardware profiles define the hardware devices which are attached to the CPU.

The hardware profile source file is a text file with lines in the following format:

<address> <device> [ ( <argument> [ , <argument> ]* ) ]

# <comment>
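A parser for the two line forms above could be sketched in Python like this. It is an illustration only: `parse_profile` is a made-up name, the address and arguments are kept as uninterpreted text because their exact syntax is not described here, and the device names used with it are placeholders.

```python
import re

# <address> <device> [ ( <argument> [ , <argument> ]* ) ]
PROFILE_LINE = re.compile(r"^\s*(\S+)\s+([^\s(]+)\s*(?:\((.*)\))?\s*$")


def parse_profile(text):
    """Return a list of (address, device, arguments) tuples."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        match = PROFILE_LINE.match(line)
        if not match:
            raise ValueError(f"line {number}: cannot parse {raw!r}")
        address, device, args = match.groups()
        if args is not None and args.strip():
            arguments = [arg.strip() for arg in args.split(",")]
        else:
            arguments = []
        entries.append((address, device, arguments))
    return entries
```

For instance, a line such as `0x8000 screen (32, 12)` (a hypothetical device and arguments) would yield `("0x8000", "screen", ["32", "12"])`.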

Hardware devices

Hardware devices are implemented in the DExecHW directory as modules. Device modules are never aware of the addresses they have been mapped to by the hardware profile. They only deal in offsets from their base. This allows them to be relocated by the hardware profile as necessary. Each module provides an object interface with the following methods:

window()

read($offset, $dexec)

write($offset, $dexec, $value)

symbol($offset)

start($dexec)

stop($dexec)

poll($dexec)
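The offset-only design can be illustrated with a small Python sketch (the real modules are Perl). Everything here is hypothetical: the `ScratchDevice` and `Bus` classes are invented for the example, and it assumes that window() reports how many words the device occupies.

```python
class ScratchDevice:
    """Hypothetical device: a few words of storage, addressed by offset."""

    def __init__(self, size):
        self.words = [0] * size

    def window(self):
        # Assumed meaning: the number of words the device occupies.
        return len(self.words)

    def read(self, offset, dexec):
        return self.words[offset]

    def write(self, offset, dexec, value):
        self.words[offset] = value & 0xFFFF


class Bus:
    """Maps absolute addresses to (device, offset) pairs."""

    def __init__(self):
        self.mappings = []  # list of (base address, device)

    def attach(self, base, device):
        self.mappings.append((base, device))

    def _find(self, address):
        for base, device in self.mappings:
            if base <= address < base + device.window():
                return device, address - base
        return None, None

    def read(self, address, dexec=None):
        device, offset = self._find(address)
        return device.read(offset, dexec) if device else None

    def write(self, address, value, dexec=None):
        device, offset = self._find(address)
        if device:
            device.write(offset, dexec, value)
```

Because the device only ever sees offsets, attaching the same class at a different base needs no change to the device code.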

When loaded, each device is assigned a symbol which may be used by code later. Symbols take the form:

<device name>_<symbol name>

Each offset within the window is iterated over for the device and symbols generated for the offsets.
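Symbol generation along these lines can be sketched in Python. The assumptions are that a device's symbol lookup returns a name for offsets that have one and nothing otherwise, and that each symbol's value is the absolute address of that offset. `device_symbols` and the names in the usage note are invented for this example.

```python
def device_symbols(device_name, base, window, symbol_for_offset):
    """Build {"<device name>_<symbol name>": address} for one device."""
    symbols = {}
    for offset in range(window):
        name = symbol_for_offset(offset)
        if name is not None:
            # Assumed: the symbol's value is the mapped absolute address.
            symbols[f"{device_name}_{name}"] = base + offset
    return symbols
```

With a hypothetical device named `screen` mapped at 0x8000 whose offset 0 is called `start`, this would produce the symbol `screen_start` with value 0x8000.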

Emulator

The emulator is invoked when the -run switch is added to the command line. Execution (at present) begins at address 0. All registers are initialised to 0. Execution terminates when the PC remains constant from one instruction to the next (e.g. after ':here SET PC, here').
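The termination rule can be sketched in Python as a loop around a single-instruction step function. This is not the emulator's code: `run` and `step` are hypothetical names, the state is a plain dictionary, and the `max_instructions` safety limit is an addition for the example.

```python
def run(step, state, max_instructions=100000):
    """Step until the PC is unchanged by an instruction; return the count."""
    executed = 0
    while executed < max_instructions:
        previous_pc = state["PC"]
        step(state)  # executes exactly one instruction, updating state
        executed += 1
        if state["PC"] == previous_pc:
            break  # PC stayed constant: treat as end of program
    return executed


def make_toy_step(jump_table):
    """Toy 'CPU': the instruction at each address just sets PC."""
    def step(state):
        state["PC"] = jump_table[state["PC"]]
    return step
```

With the toy jump table `[1, 2, 2]` execution passes through addresses 0 and 1 and stops after the instruction at address 2 leaves the PC at 2.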

At the end of execution the number of cycles, instructions and timings are printed.

Option     Meaning
-notimes   Disables the display of timings (mostly for the regression tests, as timings would obscure the diffs)
-nocounts  Disables the display of cycle and instruction counts (mostly for the regression tests that generate a display)
-showregs  Adds the final registers to the output. Using the option twice will force the registers to be displayed before each instruction is executed.

Register dump

The register dump includes the CPU number, which in the present implementation will always be 0. For example:

#0  A: b520  B: 0000  C: 0000  X: 0000  Y: 0000  Z: 0000  I: 6ff1  J: b520
#0  O: 0000 SP: ffff PC: 0015 [01e1]
#0 STK> [0011]

This shows the registers for CPU 0. Immediately after the PC is printed, the value of the next instruction is shown. This is followed by NX if the instruction will not be executed.

The final lines show the values on the stack, from lowest location upwards, up to the top of memory (0xffff).
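The layout of the dump can be reproduced with a short Python sketch. The spacing is inferred from the example above; where NX is placed and how several stack words are separated are guesses, and `format_register_dump` is a hypothetical name.

```python
def format_register_dump(cpu, regs, next_word, stack, skipped=False):
    """Return the three dump lines for one CPU.

    regs maps register names to 16-bit values; stack lists the words
    from the lowest stack location up to the top of memory.
    """
    prefix = f"#{cpu}"

    def field(name):
        # Names are right-aligned to two characters, e.g. " A" and "SP".
        return f"{name:>2}: {regs[name]:04x}"

    general = " ".join([prefix] + [field(name) for name in "ABCXYZIJ"])
    special = " ".join([prefix] + [field(name) for name in ("O", "SP", "PC")])
    special += f" [{next_word:04x}]"
    if skipped:
        special += " NX"  # assumed placement of the NX marker
    stack_line = f"{prefix} STK> " + " ".join(f"[{word:04x}]" for word in stack)
    return [general, special, stack_line]
```

Feeding it the register values from the example reproduces the three lines shown there.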

Structure

The assembler and disassembler are IO handlers provided in the DASMIO directory. These modules are able to manipulate the core in the DASM objects in order to provide details. There's a lot of collusion in the code and I've not tidied it up as much as I'd like. Ideally the DASM module should provide methods to access its innards, rather than the IO modules poking them directly.

Additional file formats can be created by adding new modules in the same directory. In order to read or write a different file format, the module should store values in (or read them from) the core and update the symbol table and relocations hashrefs to contain relevant data. I've only supported very simple additive relocation, which should be sufficient for partial linking of relocatable code segments (however this hasn't been started yet - it might need more thought).

Although the intention is that symbols be able to be relocated (for address symbols) or fixed (for constant symbols), the relocation has not been implemented, and the definition of constant symbols is not directly usable.
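Since relocation is not implemented, the following Python sketch only illustrates what simple additive relocation means in general. The data layout is hypothetical (a dictionary for the core and a set of addresses to relocate) and does not describe DASM's actual hashrefs.

```python
def relocate(core, relocations, base):
    """Add base to every word whose address is listed in relocations.

    core maps addresses to 16-bit words; relocations is a collection of
    addresses holding values that are relative to the segment's start.
    """
    for address in relocations:
        core[address] = (core[address] + base) & 0xFFFF
    return core
```

A word holding the segment-relative address 0x0002 would become 0x0102 once the segment is placed at base 0x0100, while words not listed as relocations are left unchanged.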

The nesting of include files is tracked so that errors are reported against both the file that caused them and the files that included it.

The internal implementation of the emulator allows for multiple CPUs to be executed, however this is not yet exposed completely to the command line interface.

Example code

Some small example code is provided in the 'test-*' directories, which provide a regression test for the assembler, disassembler, dumper, binary file loader and emulator. In order to test the code, use:

   make tests

The individual tests can be executed with:

   make tests-<type>

See the Makefile for more details.

'Comparison' tests

In particular, some of the 'comparison' tests from:

http://0x10cwiki.com/wiki/Comparison_of_Developer_Tools

have been included in the test-comparison directory. These are not covered by the license of this assembler. The script test-comparison/test.pl invokes these. It includes, as part of its file prologue comment, a description of the initial failures which were found due to those tests, and the changes made as a result.

In particular, I disagree strongly that the division by zero is a fault. See the talk page for details:

http://0x10cwiki.com/wiki/Talk:Comparison_of_Developer_Tools

Future developments

A future version of the assembler may provide the following features:

I have only looked sparingly at other implementations, intentionally so that I can concentrate on what I feel might be interesting or useful.

License

The DASM assembler kit is released under the simplified BSD license because it provides the greatest degree of freedom in using the source, unlike more restrictive licenses such as the GPL. Hopefully others will feel the same way and keep the code free.

Copyright (c) 2012, Justin Fletcher
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met: 

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer. 
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution. 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.