Chapter 1: Processor and Assembler Overview
===========================================
  
  
   Basic ROM structure
   -------------------
    
    We need to get familiar with the assembler and rom layout. The assembler has special
    directives that help facilitate the rom layout. This tutorial assumes you know some
    65x or 6280 assembly, or at least some assembly language in general. If not, it would
    be a good idea to have a document with all the 'opcodes' handy to follow along.
    
    Be prepared to go back and forth between the intro section and here to get
    familiar with the architecture and quirks. The first thing we need to do is set up our
    boot code for bank $00. This is because it's the only bank that is mapped on startup.
    
    A quick run down of some symbols:
    
      '$' denotes a hex value
      '#' denotes an immediate value
      ';' used for comments
      ':' used to create an address label
      '.' used for marking a directive
      '=' shorthand for equate
      '%' denotes a binary value
      '<' denotes a ZP (zero page) register
      
    Don't worry if you don't know all the terminology, I'll cover some of this as we go on.
        
    
    
    
    ;#listing1.asm
    ;------------
    
      .bank $00     ;tell the assembler the following code/data will be in bank 00
      
      .code         ;this directive tells the assembler that we are specifying code in this area.
                    ;It's not terribly important to use this directive, but why not.
      
      .org  $e010   ;set the assembler to use this address. This also affects the
                    ;address in the rom.
                    
    start_up:   
    
      sei           ;temporarily disable ALL interrupts. SEI = set interrupt disable flag
                      
      lda #$00      ;these aren't doing anything, just some example instructions. Filler as it were.
      ldx #$01      ;
      ldy #$02      ;
      
      csh           ;set the CPU to 7.16MHz mode. The default on boot is 1.79MHz mode.
                    
    loop:           ;create a label in the address range
      jmp loop      ; 'jump' to loop. This is just an infinite loop.
        
    ;end
    
      .org $FFFE

      .dw $E010




    Bank $00 tells the assembler that we are working in external address range $0000-1fff. ORG
    tells the assembler where in CPU logical memory this code(and bank) will be mapped to. If
    you don't specify a logical address, the assembler will use either $0000 or some continuation
    of a previous address. 
    
    'ORG' does something else, too. While it does set up the logical address, it also affects the
    address within the BANK range. Remember how the PCE uses banks of $2000 bytes? ORG $E010 will
    put our example code at $0010 in the rom. So the value of ORG, logically ANDed with $1FFF, also
    sets the destination address in the rom (or CDRAM if this were a CD project).
    
    Here's another example of ORG:
    
      .bank $04
      .org $D123
      
    Where would the code or data start? BANK $04 is address $8000 ($04 x $2000). Take the ORG address
    of $D123 and AND it with $1FFF ($D123 AND $1FFF). We get $1123, add this to the external address,
    and we get $9123. This is where our code or data will start. At this point it's not important
    to know the offset in the rom that ORG affects, just that it does affect more than just the
    logical address range. It's something to keep in mind. For now, we're really just interested in the
    logical address range since we are creating labels and such.
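
    As one more worked example of the same arithmetic (the bank and address below are made up
    purely for illustration), here's a sketch with the math spelled out in the comments:

      .bank $02     ;bank $02 starts at rom address $4000 ($02 x $2000)
      .org  $A040   ;logical address $A040. $A040 AND $1FFF = $0040,
                    ;so the code/data lands at $4000 + $0040 = $4040 in the rom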

    Onto the rest of the explanation. The CODE directive just tells the assembler to expect code from
    this point onward. Alternatively, there's also the DATA directive, which does just the opposite. For
    the most part, it doesn't have a great effect on the operation of the assembler until you start
    'including' binary data into a project and such.

    Our first instruction disables all interrupts for the moment. We don't know if any are pending and
    sure don't want them to initiate before we've had a chance to set everything up - most importantly
    RAM ;)
    
    The next three instructions just load some values in the cpu registers. Nothing relevant, just some
    filler code to make it look prettier. The 'CSH' instruction sets the CPU to high speed and 'JMP'
    just does an infinite loop onto itself. This is technically a legal rom, but nothing is initialized.
    Not ram, stack, video, interrupts, or sound.
    
    A quick note on the three filler instructions. Notice the use of '#' symbol. This means we are loading
    an immediate value into a register. What's an immediate value? A hardcoded number that is assembled
    into the rom for the instruction. Since the three general registers are 8bit, you can load a value
    from $00 to $ff into them. If you don't specify a '#' in front of the value, the assembler thinks you
    are trying to load a value from 'memory'. 
    
    If I had used LDX $01, the assembler would think I was trying to load a value from $0001 - a memory
    location of the CPU's logical address range. It's not a good idea to use that shorthand for $0001,
    so make sure *if* you're loading from an address, to write the whole address out. A quick rundown
    of the load register instruction:
    
      ldx #$01    ;load the immediate value of $01 into the register
      ldx $01     ;load a byte from the address of $0001 (value unknown - whatever is at $0001 at the time)
      ldx $0001   ;same as above, but the correct way to write it ;)
      ldx <$01    ;load a byte from ZP register $01 (value unknown - whatever is at ZP reg $01 at the time)


    Now for JMP loop. The JMP instruction can jump anywhere in the 65536 byte CPU address range. The value
    for the jump instruction is an immediate, but we've used a label instead. The assembler will convert
    that label into a 16bit (2 byte) immediate for the instruction at assemble time. This takes a load off
    our backs since we don't need to sit there and count bytes in order to know where to jump to. Having
    the assembler create a label as a mnemonic for a cpu address and then allowing us to use it in an
    instruction is... well, it keeps us from going insane.
    
    Labels aren't just used for code either, you can use them for data as well. You can also use them with
    instructions like 'load register' and use them to define an address in ram (you know, to write something
    to). A label pointing to ram basically becomes a 'variable', if you're familiar with other programming
    languages. A label at a rom address is like a read only static variable in C.
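
    Here's a rough sketch of both ideas. The names 'my_var' and 'my_table' and the address $2200
    are made up for this example, and it assumes ram has already been mapped to $2000-3fff
    (we'll get to that shortly):

      my_var = $2200      ;an equate: 'my_var' is now a name for ram address $2200

        lda my_table      ;load the first byte of the table ($10) into A
        sta my_var        ;store it in our 'variable'

        .
        .

      my_table:           ;a label pointing to data in rom
        .db $10,$20,$30   ;'db' declares byte values in the rom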
    
    Now, we have a second ORG usage at the end of the listing. I'm basically telling the assembler to skip
    to the end of the bank by specifying $FFFE ($1FFE in the rom). The 'dw' directive tells the assembler
    to declare a "word" value in the rom. $FFFE (or $1FFE) is a special location for both rom and cpu
    address range. This is where the 'reset vector' lives. The cpu needs this for boot or reset. I put
    value $e010 because that's the address I want the cpu to jump to on startup. Now for something a bit
    more confusing; I could have used a label instead. See that label "start_up:"? I could have put that 
    label after .dw instead of the actual address. This would make things more automated and allow me to
    make changes easier - if they affect the start/boot address.
    
  
    So let's run through the example code real quick.
    
      -On startup: the cpu loads bank $00 to MPR7 (address range $E000-FFFF)
    
      -MPR7 = 00, all other MPR's are setup with random values.
    
      -The cpu 'jumps' to the address located at $FFFE which is $e010.
    
      -The first instruction at $e010 is SEI. Disable interrupts.
    
      -next load some values into registers A,X, and Y.
      
      -next set cpu speed to high
      
      -last execute jump instruction, which jumps to itself
      
      
    That's it. To assemble the listing, use the following: "pceas listing1.asm". You can
    run the rom through mednafen's debugger and see it in action. To do this, open the rom
    with mednafen, press alt+D, press "s", press "F10". This will put you at the beginning
    of the rom. Use "s" to single step through the instructions.


   
   
   Initializing the system
   -----------------------

    So we've created a rom that jumps to an infinite loop, but we really haven't set up the
    CPU yet. To do this, we need to cover the CPU basics: stack, ram, hardware bank.
    
    First the stack. The stack is a special area in RAM that values are saved to. One can
    manually save to the stack and other times the CPU saves to the stack. What's actually
    saved to this area of memory? For the most part CPU registers. Let's look at an example.
    
    
      lda #$10      ;load the A register with the value of $10
      pha           ;PUSH the value in register a onto the stack
      lda #$20      ;load register a with the value $20
      pla           ;POP the latest value from the stack and drop it into the A register 
      .             ;Register A now contains $10, not $20. Value $20 is lost forever... ;)
      .
      
    While that code is totally useless (it doesn't really do anything), let's look at what's
    happening in more depth. The stack is 256 bytes long. The stack 'pointer' is an 8bit
    register. When a value is 'pushed' onto the stack, the cpu writes the value to the area
    in memory for the stack, pointed to by the stack register. The stack register is then updated
    to point to the next place in the stack ram or buffer. PHA wrote the value in the A register
    to the stack. We did this because we wanted to quickly save that value, and then when we
    wanted to retrieve that value from the stack, we used PLA. How does PLA know what value to
    pull from the stack? The answer is LIFO (last in, first out).
    
    
    The stack is at address range $2100-$21ff. The stack pointer starts at the top and works
    its way down. The stack pointer is initialized at $21ff. If we push register A onto the
    stack, then the stack pointer is decremented to $21fe. It's decremented by one since reg A is
    only 1 byte.
    
          ------
    $21ff |$10 |  <- we push value of reg A onto the stack
          ------
    $21fe |    |  <- now the stack pointer register is decremented and points here.
          ------
    $21fd |    |
          ------
    $21fc |    |
          ------
    $21fb |    |
          ------
            .
            .
            .
            .
            .
            .
          ------
    $2100 |    |
          ------
            
              
    The old analogy for explaining the stack is to visualize a stack of dishes/plates. For this
    analogy to work, let's flip the stack upside down. Think of $21ff as the bottom, and $2100
    as the top. Every time we want to store a value, we put a plate on the stack. And when we
    want that value back, we pull the plate off the stack. The tricky thing about the stack
    is that you have to pull the plates back off the stack in the opposite order you put them
    on there.
    
    Let's push two different values onto the stack. If we want to get back the first value we 
    pushed onto the stack, we need to pop off the second/last value before we can get to it.
    That seems a bit absurd, doesn't it? And it might be, but that's how it works. The programmer
    needs to be careful when manually saving values to the stack and keep track of the *order* of
    its usage.
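
    Here's that idea as a small sketch (the values are arbitrary):

        lda #$11      ;first value
        pha           ;push it. The stack now holds: $11
        lda #$22      ;second value
        pha           ;push it. The stack now holds: $11, $22

        pla           ;A = $22. The last value pushed is the first one back
        pla           ;A = $11. Now we can get to the first value we pushed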
    
    So we know a little about the stack and how it works (hopefully), but how do we initialize
    the stack? The stack pointer, referred to as SP from here on, can be 'transferred' back and
    forth between the X register and itself. We use the X register to manually change the SP
    value. In advanced programming, one could manipulate the stack for different usage, but we'll
    keep it simple.
    
    Here's how we initialize the stack:
    
        ldx #$ff
        txs
        
    We load the X register with the immediate value of $FF. TXS is "transfer X to stack pointer"
    and it does exactly that. SP is now #$FF. The stack is fixed at cpu location $2100. Since the
    SP is an 8bit value, it's added to $2100 to make the full address range of $2100-21ff. We call
    this 'indexed'. More on this later ;)
    
    We covered manually using the stack, but there's an even more important task of the stack: the
    automated usage of the stack. Interrupts and subroutine calls. Both of these *need* a functioning
    stack in order to operate. They handle pushing and popping values on the stack themselves. Let's
    move on to setting up ram.
    
    Since we're working in a hucard project, we need to map the only available ram in the system to
    a special address range of the cpu. If you guessed the area of the stack, you guessed correctly :)
    
    Base ram on the PCE is in the external address range $1F0000-1F1FFF. That doesn't really help us
    at all. We need the bank number. $1F0000 / $2000 = $F8.. or I could've just told you $F8 to begin
    with. Yes, bank $F8 is the system ram - all 8k of it. To map it to the address range we need it in,
    we use MPR1.
    
    Let's look at the CPU address range and the MPR pages again:
    
      $0000-1fff  MPR0
      ----------      
      $2000-3fff  MPR1  
      ----------
      $4000-5fff  MPR2
      ----------
      $6000-7fff  MPR3
      ----------
      $8000-9fff  MPR4
      ----------
      $a000-bfff  MPR5
      ----------
      $c000-dfff  MPR6
      ----------
      $e000-ffff  MPR7
    
    There are eight MPR registers. MPR7 is mapped for us on startup to BANK $00. We're good to go on that,
    but we need to setup the rest of them. The first MPR we're going to setup is RAM BANK $F8.
    
    
    Here's how:
    
      lda #$f8
      tam #$01
      
      
    Simple, right? I knew you'd think so ;) #$F8 is loaded into the A register, then transferred to MPR1
    with TAM (transfer A to MPR reg). Now ram is mapped to $2000-3fff. Excellent. There's another step to
    setting up ram (isn't there always?). We need to clear it. You see, there is no bios in the PCE to
    do such things. Introducing TII, the block transfer instruction. We're going to use it to zero out
    the ram area.
    
      
      lda #$00
      sta $2000
      tii $2000,$2001,$1fff
    
    
    Two new instructions. STA is store A register. This stores the value in the A register to a memory
    location and $2000 is the very first byte in ram. TII is a block transfer instruction. TII is Transfer
    Increment Increment. The first address is the source, the second is the destination, and the third is
    the length of bytes to copy.
    
    Quick TII explanation.
    
      Grab byte from source ($2000)
      store byte to destination ($2001)
      add 1 to source and destination
      subtract 1 from length, if length has reached 0000 then stop, else continue.
    
    The TII instruction writes all 00's to the 8k of ram. Ram is now initialized.
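
    TII isn't only for clearing memory. As a sketch of a plain copy (the label 'rom_data' and the
    destination $2200 are assumptions for illustration), this copies 4 bytes from rom into ram:

        tii rom_data,$2200,$0004   ;source = rom_data, destination = $2200, length = 4 bytes

        .
        .

      rom_data:
        .db $01,$02,$03,$04        ;the 4 bytes to copy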
    
    
    The last thing on the list to initialize(for now) is the hardware bank $FF. The hardware bank is the area
    of memory that Hudson reserved for mapping ports to memory (don't worry if you don't currently know what
    that means). If we want to access the other hardware of the system, we'll need to map this bank. This is 
    handled the same as what we did for the RAM bank.
    
      
      Mapping hardware bank:
      
        lda #$ff
        tam #$00
        
    It's customary, but not necessary, to map the hardware bank to $0000 PAGE. Let's look at our memory map
    now.
    
    
      $0000-1fff  MPR0 - bank $FF (ext address $1FE000)
      ----------      
      $2000-3fff  MPR1 - bank $F8 (ext address $1F0000)
      ----------
      $4000-5fff  MPR2 - random value
      ----------
      $6000-7fff  MPR3 - random value
      ----------
      $8000-9fff  MPR4 - random value
      ----------
      $a000-bfff  MPR5 - random value
      ----------
      $c000-dfff  MPR6 - random value
      ----------
      $e000-ffff  MPR7 - Bank $00 (ext address $000000)
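
    The 'random value' pages get set up the same way when we need them. For example, assuming the
    rom had a bank $01 with code or data in it (ours doesn't yet), this sketch would map it to the
    $4000-5fff page:

        lda #$01      ;bank number we want to map
        tam #$02      ;transfer A to MPR2. Bank $01 now shows up at $4000-5fff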
    
    
   
    
    Now let's put this all together.
    
    
    ;#listing2.asm
    ;------------
    
    
       .bank $00
       .org $e000
       
     start_up:
     
       sei                         ;disable interrupts
       
       ldx #$ff                    ;initialize SP to $FF
       txs
       
       lda #$ff                    ;map hardware bank to MPR0
       tam #$00
      
       lda #$f8                    ;map ram bank to MPR1
       tam #$01
       
       lda #$00                    ;clear the first byte in ram
       sta $2000
      
       tii $2000, $2001, $1fff     ;zero out the rest of the bytes in ram 
      
     loop:
      
       jmp loop                    ;do our infinite wait loop
      
      
      
       .org $fffe                  ;skip to the end of bank $00
    
       .dw start_up                ;setup the reset vector to point to our start position
                                   ;using our convenient label.
      
     ;#end
    
  



   HuC6280 instructions 
   --------------------

    It's probably a good idea to go over some of the CPU's instructions and registers. This
    is by no means a replacement for a 65x or 6280 instruction doc. We'll review some of the
    common instructions and how they translate into opcodes.


  (PC REGISTER)

    I've talked about the SP register and a little about the A/X/Y registers, but we need to start
    from the beginning - the PC register.
        
    The PC (program counter) register keeps track of where the processor is in the 64k address
    range. That is, where the processor is executing code from. This register is 16bit (hence
    64k address range) and can not be directly written or read. There isn't really a need to
    do this, but with some clever code it is possible to obtain its value.
    
    The PC register points to an address in the 64k logical address range. Each instruction
    is made up of a series of bytes. When the processor executes an instruction, the number
    of bytes is added to the PC register. This moves the processor along to the next instruction,
    and so on and so forth.
    
     Let's look at how some instructions affect the PC register:
     
        PC=$e000: sei           ; SEI opcode is 1 byte in length, so 1 is added to the PC
        PC=$e001: lda #$f8      ; LDA immd opcode is 2 bytes, so inc the PC by 2
        PC=$e003: tam #$01      ; TAM immd opcode is 2 bytes, inc PC by 2
        PC=$e005: lda #$FF      ; LDA immd opcode 2 bytes, inc PC by 2
        PC=$e007: tam #$00      ; etc
     
     
    We can see the PC being incremented as it loads the opcodes. An opcode is an instruction in
    binary form. You can view them in hex form as well. Opcodes are the actual CPU instructions
    converted by the assembler. With PCEAS opcodes and instructions are 99.98% 1:1. This means the
    mnemonic we use in the assembler almost always translates directly to the cpu opcode. This isn't
    always the case with other assemblers. Some assemblers have pseudo instructions that, when
    assembled, are converted to two or more opcodes. Thankfully we don't have to worry about that.
    A mnemonic is the text form of an opcode that we use in an assembler. Usually they're
    abbreviations or shortened english words.
      
      Mnemonic    Opcode
      --------    ------
      lda #$ff    $A9 $FF
      sei         $78
      jmp $e010   $4C $10 $E0 
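
    To tie the PC and opcodes together, here's roughly what listing1 looks like once assembled.
    This was worked out by hand from the opcode lengths, so treat it as a sketch rather than real
    assembler output:

      Address   Opcode        Mnemonic
      -------   ------        --------
      $e010     $78           sei
      $e011     $A9 $00       lda #$00
      $e013     $A2 $01       ldx #$01
      $e015     $A0 $02       ldy #$02
      $e017     $D4           csh
      $e018     $4C $18 $E0   jmp loop      ; the label 'loop' became address $e018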

    
    Just look at those babies :D Assembly language is a beautiful thing. Now onto branch instructions.
    
    There are two methods of jumping off course - so to speak. The first is branching. Branch
    instructions allow us to make small jumps of up to 128 bytes backward or 127 bytes forward in the
    CPU address range. A long branch, labeled as jump, allows the processor to jump anywhere in the
    entire logical address range.
    
    The second method is 'calls'. Calls allow the processor to jump to another area, execute some
    code, and return right back to where it was originally. The code being called is referred to as a
    subroutine and is 'jumped' to with the JSR instruction (Jump to SubRoutine). JSR is just like JMP,
    but it also saves a return address to the stack (clever, I know). The return address works out to
    the PC + 3. Why +3? Because that's the length of the JSR opcode - 3 bytes. When you 'return' from
    a subroutine, you want to return to the next instruction after JSR. (Strictly speaking, the CPU
    pushes PC + 2 and RTS adds the last 1 when it pulls the address back, but the end result is the
    same.) See how that works? Let's have an example.
    
        
        lda #$50            ; load reg A with immediate value $50
        jsr put_value       ; jump to a subroutine label "put_value"
        
        lda #$20            ; A = #$20
        jsr put_value       ; call subroutine
        
        lda #$99            ; A = #$99
        jsr put_value       ; call subroutine
      
      loop:                 ; our infinite loop label
        jmp loop            ; do that infinite wait loop
        
        .
        .
        .
        .
        
        
      put_value:            ; our subroutine label
        sta $2001           ; all our subroutine does is store whatever is in A to address $2001
        rts                 ; Ahh.. a new instruction


    The JSR tells the processor to jump to put_value address. The code in the put_value routine
    doesn't do much since we haven't explored some of the other instructions. Notice the RTS
    instruction. RTS is ReTurn from Subroutine. This instruction pops the saved PC address from
    the stack and loads it into the PC register.


    Let's review. The JMP instruction tells the processor to jump to a different address - anywhere
    in the 16bit address range. JSR tells the processor to jump to a different address, again
    anywhere in the 16bit address range, and then return back with the RTS (return) instruction.
    There's something to be cautious of. All JSR calls must have an RTS instruction down the line.
    If not, the SP won't be restored to its original index/position from before the JSR call
    and you also run the risk of 'overflowing' the stack, i.e. "stack overflow". That would be bad.
    As your programs and projects grow and become more complicated, you'll have multiple layers of
    JSR calls or 'nested' calls. Using an RTS when a JSR wasn't issued is just as bad: RTS will pull
    whatever two bytes happen to be on the stack and send the processor off to that address. In other
    words, you can't use a JMP and then an RTS, and every JSR executed requires an RTS.
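
    Here's a small sketch of nested calls (the label names are made up). Every JSR is matched by
    exactly one RTS:

        jsr outer         ; return address #1 is pushed onto the stack

      done:
        jmp done          ; we end up here after 'outer' returns

      outer:
        jsr inner         ; return address #2 is pushed on top of #1
        rts               ; pulls return address #1, back to the main code

      inner:
        lda #$01          ; do some work
        rts               ; pulls return address #2, back into 'outer'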



  (STATUS REGISTER)

    Next we'll look at the status register. This register contains 8 conditional flags. Each flag
    is set depending on a specific condition that happens in the processor. Not very descriptive,
    is it? It's probably a good idea not to go too in depth with this register. And on that, let's
    look at the most commonly used flags.

    STATUS register, or 'P' for processor, is an 8bit register. Here's a layout of the register:
    
      D7 D6 D5 D4 D3 D2 D1 D0
      ----------------------- 
      N  V  T  B  D  I  Z  C
      
          
          C = carry flag
          Z = zero flag
          I = interrupt enable/disable flag
          D = decimal mode flag
          B = software interrupt flag
          T = special register mode flag
          V = overflow flag
          N = negative flag

    The C and Z flag are the ones we're mostly going to be discussing. We can't really talk about
    these flags without bringing in some other cpu instructions first. The Z flag is probably the
    easiest one to understand so we'll start with that. 
    
    The Z flag is set when a value (usually in a register, but not always) equals zero, hence the
    name. To show how this flag works, we'll need to bring in some arithmetic instructions. Let's
    start with INC(increment) and DEC(decrement). INC and DEC add 1 or subtract 1 from a register
    *or* value in memory. We'll use an example with a register for simplicity.
    
        
        lda #$01      ; load A with immediate value $01
        dec a         ; subtract 1 from the value in A and store the result in A
                      ; A now is 00
                    
    When register A went from 01 to 00, the Z flag was set. Let's see INC in action.
    
    
        lda #$ff      ; load A with immediate value $FF
        inc a         ; add 1 to A and store it in A
                      ; A is now 00
        
    This might require a bit more explaining. Whenever a register increments to more than what it
    can hold, it rolls over. Like an odometer in a car, it can only hold so many digits before it
    rolls over. The largest possible value for 8bit is $FF, thus FF + 1 = 00 and the Z flag is set.
    
    Let's do some more examples. (?=unknown to us)
     
                      ; Z = ?    
        lda #$01      ; Z = 0
        dec a         ; Z = 1 

                      ; Z = ?
        lda #$ff      ; Z = 0
        inc a         ; Z = 1

                      ; Z = ?    
        lda #$01      ; Z = 0
        inc a         ; Z = 0

                      ; Z = ?
        lda #$ff      ; Z = 0
        dec a         ; Z = 0

                      ; Z = ?
        lda #$00      ; Z = 1
        inc a         ; Z = 0

                      ; Z = ?
        lda #$00      ; Z = 1
        dec a         ; Z = 0

    When a flag is 1 we call this 'set' and when the flag is 0 we call this 'clear'. If you look
    closely at the examples, you'll notice that even loading register A with a value immediately
    affects the Z flag. Having a chart/list of all the cpu instructions along with what flags
    they set is a must.
    
    Let's look at some other instructions that affect the Z flag. In programming, there needs
    to be a way to 'compare' one value against another. The Z flag takes on a new meaning. The
    instructions we're going to look at are CMP (compare) and BEQ/BNE (branch if equal/not equal).
    
    The CMP instruction takes the value from the A register and compares it to another value.
    If this value is the same, the Z flag is set, if not then the Z flag is cleared.
    
      CMP example:
                  
                      ; Z = ?
        lda #$05      ; Z = 0
        cmp #$05      ; Z = 1
        beq true      ; jump to 'true' because Z = 1 
    
          .
          .
          .
          
      true:   
          ...

    CMP compares the value in register A with the immediate value of $05. What CMP actually
    does is take the value from A, subtract the compare value, and discard the result. A is
    not affected, but the P register is set accordingly. BEQ is branch if equal, but really it's
    branch if Z flag is set. The branch if equal mnemonic is just easier to process for our human
    brains. These Bxx or 'branch conditional' instructions are limited to "jumping" 128 bytes
    backward or 127 bytes forward. Not very far in the address range, unfortunately.
      
    I'm feeling confident. Let's try our first 'loop'. This will be an increment loop. I hope
    you're as excited as I am :D
      
      Loop example:
      
                      ; Z = ?
        lda #$00      ; Z = 1
        
      loop:
        inc a         ; Z = 0
        cmp #$05      ; Z = 0 if A != $05, Z = 1 if A = $05
        bne loop      ; jump to 'loop' if Z = 0

      done:
        jmp done      ; We're done.

    
    We load A with $00, then we increment it by +1, test to see if A has reached the value of 5
    yet, if not then jump back and increment A again. We do this until A equals 5, then the code
    passes on to the next instruction - the infinite jump loop. Notice the new conditional
    branch instruction? BNE is Branch Not Equal and branches if the Z flag is cleared. To recap,
    CMP sets or clears the Z flag depending on whether the compare was true or false. BNE/BEQ
    branch/jump depending on the state of the Z flag set by CMP.
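
    Since instructions like DEC set the Z flag on their own, a loop that counts down to zero doesn't
    even need the CMP. A small sketch, using the X register and its DEX (decrement X) instruction:

        ldx #$05          ; X = 5, Z = 0

      count_down:
        dex               ; subtract 1 from X. Z = 1 only when X reaches 00
        bne count_down    ; jump to 'count_down' if Z = 0

                          ; we get here after 5 passes through the loop, with X = 00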


    Now that we've got an idea of the Z flag, time to move on to the C flag. Like the Z flag, the
    C flag also has multiple purposes/functions. INC and DEC aren't the only add and subtract
    instructions in the cpu's instruction set. ADC (add with carry) and SBC (subtract with carry)
    are used for when more than '1' needs to be added or subtracted from a register. The CPU doesn't
    have a straight add or sub instruction, so we need to manually set the condition of the carry
    flag *before* using ADC/SBC. The two instructions for setting or clearing the C flag are SEC
    (set carry) and CLC (clear carry). Example time.
    
      Add with carry example
      ----------------------
      
                      ;8bit + 8bit arithmetic
      
        lda #$05      ; load 5 into A
        clc           ; Clear the carry flag
        adc #$05      ; add 5 to register A and store it back to register A

    
    When using ADC to add an 8bit value, the C flag is added into the mix. We don't always know
    the state of the C flag, so clear it to make sure. The logic of the above code looks like
    this: A=5+5+0. If the carry flag was set, it would look like: A=5+5+1. Hopefully that isn't
    too confusing. If the state of the C flag is set, then 1 is added into the arithmetic and 
    0 if cleared. Another tidbit about ADC is that not only does it include the C flag value
    in the 8bit arithmetic, but it also *sets* or *clears* the C flag based on the result.
    
    
                      
        lda #$05      ; C = ?
        clc           ; C = 0
        adc #$05      ; C = 0, A = $0A
            
        
        
        lda #$f0      ; C = ?
        clc           ; C = 0
        adc #$10      ; C = 1, A = $00

        lda #$ff      ; C = ?
        clc           ; C = 0
        adc #$05      ; C = 1, A = $04
        
        

    The C flag is set if the arithmetic result is greater than 8bit. C is the carry over from
    the 8bit+8bit ADD. We use the same 'carry' system when we do simple decimal addition.
    
      (1) 
        9
       +9
       --
        8

    The 9+9 result is larger than the 'ones' place can hold, so the 1 is the carry. So are we limited
    to 8bit results and variables? No. What do you do with the carry from 9+9? You add it to the
    'tens' place. We do the same on the processor. In this example we're going to store our results
    back into memory. (awesome!)
    
    
        lda #$ff      ; C = ?, A = $ff, $2000 = ??, $2001 = ??
        clc           ; C = 0, A = $ff, $2000 = ??, $2001 = ??
        adc #$05      ; C = 1, A = $04, $2000 = ??, $2001 = ??
        sta $2000     ; C = 1, A = $04, $2000 = 04, $2001 = ??
        lda #$00      ; C = 1, A = $00, $2000 = 04, $2001 = ??
        adc #$00      ; C = 0, A = $01, $2000 = 04, $2001 = ??
        sta $2001     ; C = 0, A = $01, $2000 = 04, $2001 = 01


    We have an 8bit + 8bit addition operation with a 16bit result. We stored the 16bit result
    in $2000/$2001. I guess it would be a good time to tell you that I'm *assuming* you know what
    hexadecimal is and how bits relate to bytes/words/dwords and vice versa. If not, you're
    probably having a harder time following along. I would suggest obtaining some docs or
    tutorials on hex/bits/bytes/words/etc and then coming back to this tutorial. Sorry 'bout that.
    For the rest of you that are somewhat and/or barely following along, onward march.

    Our 'carry' happens when the addition result is greater than 8bit (yeah, I keep mentioning
    that). If you look at the second addition part, you'll notice that we are adding 00+00!?
    Ahh, but we are also adding the carry flag. Since C is set from the carry over, we have
    00+00+C (or 00+00+1). We store this in the MSB. MSB is the high/upper byte of a word or 16bit
    value. 
    
    Because this is an 8bit processor and optimized for 8bit data elements, 16bit and 
    greater values are stored as multiple 8bit/byte values. So we label the lower half of the
    16bit value the LSB or least significant byte and the upper half MSB or most significant
    byte. It's somewhat rare to need a value larger than 16bit, but ADC allows you to support
    addition on a scale larger than 8bit: 8, 16, 24, 32, 40, 48, 56, 64bit, etc.
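
    As a minimal sketch of the same idea scaled up, here's a 16bit + 16bit addition where both
    values already sit in memory. The addresses are made up for this example: the first value
    is assumed to be at $2000/$2001, the second at $2002/$2003, and the result goes to
    $2004/$2005 (low byte first in each pair). Note that CLC is only used once, before the
    low bytes. The second ADC needs whatever carry the first one produced.

        clc           ; C = 0, only before the lowest byte
        lda $2000     ; low byte of the first value
        adc $2002     ; add low byte of the second value, C = carry out of the low bytes
        sta $2004     ; store low byte of the result
        lda $2001     ; high byte of the first value (LDA doesn't touch C)
        adc $2003     ; add high byte of the second value plus the carry
        sta $2005     ; store high byte of the result

    For a 24bit or 32bit value you'd keep repeating the LDA/ADC/STA group for each extra byte,
    still without another CLC.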

    Subtraction is the same process. The only difference with SBC is that we have to 'set' the
    C flag before our instruction. Like so:
    
        lda #$00      ; C = ?
        sec           ; C = 1
        sbc #$05      ; C = 0
        sta $2000     ; C = 0, $2000 = FB
        lda #$01      ; C = 0
        sbc #$00      ; C = 1
        sta $2001     ; C = 1, $2001 = 00

    For SBC, the C flag is reversed in function. Explaining the mechanism behind SBC is probably
    beyond the target of this tutorial. Just think that SBC is the opposite of ADC, so the C
    flag value should be as well ;) If you don't really understand, it's not that important at 
    this stage.
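
    If it helps, here's a multi-byte sketch of the same thing: a 16bit - 16bit subtraction with
    both values in memory. The addresses are just assumptions for the example: $2000/$2001
    holds the value we subtract from, $2002/$2003 holds the amount to subtract, and
    $2004/$2005 gets the result (low byte first in each pair).

        sec           ; C = 1, only before the lowest byte
        lda $2000     ; low byte of the first value
        sbc $2002     ; subtract low byte, C = 0 if we had to borrow
        sta $2004     ; store low byte of the result
        lda $2001     ; high byte of the first value
        sbc $2003     ; subtract high byte, plus one more if the low bytes borrowed
        sta $2005     ; store high byte of the result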
    
    We've seen how ADC and SBC are affected by the C flag. Now let's look at CMP. Yup, CMP also
    affects the C flag. Remember when we used CMP to see if one value equaled another? With the
    carry flag, we can see if the value is less than or greater than. How convenient :D
    
                            ; C = ?
        lda #$01            ; C = ?
        cmp #$02            ; C = 0
        bcc less_than       ; C = 0, branch if C = 0
        bcs greater_equal   ; C = 0, branch if C = 1

    Remember, CMP takes its 'compare' value and subtracts that from the value in A (the result is
    thrown away, only the flags are kept). In decimal arithmetic you have to borrow from the next
    group if the amount you are subtracting is greater than the target.
    
        (1)
        13
        -9
        --
         4

    So CMP clears the C flag on a subtraction 'borrow', and sets it for no 'borrow'. If C = 0
    then the value in A is less than, if C = 1 then the value is equal_or_greater than. In all
    this setting of the C flag, the Z flag is still being affected. Let's look again.
    
                            ; C = ?, Z = ?
        lda #$01            ; C = ?, Z = 0
        cmp #$02            ; C = 0, Z = 0
      * bcc less_than       ; C = 0, Z = 0
        beq equal_to        ; C = 0, Z = 0
        bcs greater_than    ; C = 0, Z = 0

                            ; C = ?, Z = ?
        lda #$02            ; C = ?, Z = 0
        cmp #$02            ; C = 1, Z = 1
        bcc less_than       ; C = 1, Z = 1
      * beq equal_to        ; C = 1, Z = 1
        bcs greater_than    ; C = 1, Z = 1

                            ; C = ?, Z = ?
        lda #$03            ; C = ?, Z = 0
        cmp #$02            ; C = 1, Z = 0
        bcc less_than       ; C = 1, Z = 0
        beq equal_to        ; C = 1, Z = 0
      * bcs greater_than    ; C = 1, Z = 0


    The asterisks in each example show you which branch will jump. If we remember our branch 
    conditional logic, the branch instruction will 'jump' to an address depending on what the 
    instruction expects the state of 'x' flag to be (either set or clear). If the flag's state
    doesn't meet the requirements for the 'jump', then the branch (jump) is not taken and the
    cpu goes onto the next instruction. See how we have them setup in series? The processor
    will 'fall' through conditional branch instructions until it reaches one that triggers
    or 'jumps'. Hopefully by this point you understand that the terms branch and jump are
    interchangeable. From here on, I'll probably refer to it as branch or branching most of the time. 
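
    Chaining branches like this also lets us test for a range. A small sketch, assuming we
    want to know whether the byte at $2000 (an address picked just for this example) is
    between $10 and $1f. The label names are made up too.

        lda $2000         ; value to test
        cmp #$10          ; C = 0 if A is less than $10
        bcc out_of_range  ; too small, branch
        cmp #$20          ; C = 1 if A is $20 or greater
        bcs out_of_range  ; too big, branch
      in_range:
        lda #$01          ; A was $10-$1f
        jmp done          ; skip over the out_of_range code
      out_of_range:
        lda #$00          ; A was below $10 or above $1f
      done:
        sta $2001         ; $2001 = 01 if in range, 00 if not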
    
    We've covered the two most important flags. You can make quite complex projects without even
    touching the other flags. That's not to say you shouldn't learn them at a later point in time.
    With PC, P, and SP registers out of the way, let's move on shall we?



  (A/X/Y REGISTERS)

    We've used the A register quite a bit. We've done addition, subtraction, and compare examples.
    There's a reason for this. The A register is known as the 'Acc' or Accumulator register. All 
    the *main* arithmetic functions must be done with this register. The X and Y registers are 
    known as the 'Index' registers. They can do a little more than just 'index', but that is their
    main function on the processor. Let's see a quick run down of the registers.
    
      A reg: handles (larger) arithmetic, shifting, and logic operations.
      X reg: handles inc/dec arithmetic and memory indexing
      Y reg: handles inc/dec arithmetic and memory indexing 

    All registers can use a CMP instruction. For the X and Y registers, the instruction
    name/mnemonic changes to CPX & CPY. This is important to remember. Using CPY and CPX will not
    compare a value to Acc. Example:
    
        ldx #$01
        cpx #$02
        bcc less_than
        
        ldy #$01
        cpy #$02
        bcc less_than
    
        lda #$01
        cmp #$02
        bcc less_than

    Time to learn about indexing. But first, we need to go over bytes and the cpu address range.
    As I've mentioned before, the PC is 16bits. A 16bit value can hold 65536 different values.
    On the 6280, the smallest element is a single byte (8bits). There are 65536 bytes that the
    cpu can 'address'. A processor opcode (instruction) is 1, 2, or 3 individual bytes (in series
    of course). Some are longer than that. When the processor moves forward or jumps to a different
    location, this is done in alignment of bytes. 
    
        address $0005 is 5 bytes from zero (the start or bottom)
        address $0001 is 1 byte from zero  
        address $1000 is 4096 bytes from zero
        
    Data or code(opcodes) are located in memory by offsets of bytes. Often when we look at memory,
    we organize it into lines of bytes - like a hex editor.
    
        address $0000: 00 10 00 12 00 00 00 23 44 78 a9 01 4c 10 e0 ea
        address $0010: ea 00 00 00 a9 01 60 ea ea ff ff ff ff ff ff ff
    
    But to the processor, it's just one byte after the other from 0 to 65535. The processor knows
    no difference between 'code' and 'data' that's located in memory. If you make a wrong jump
    in your code into an area of data, the processor will interpret it as 'code' and execute it :D
    Your program will more than likely be unable to recover and crash. Crash and burn.
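
    One way to keep the processor out of your data is to make sure the code in front of it
    always ends with something that leaves, like RTS or JMP. A small sketch, assuming the
    assembler's .db directive for raw bytes and a made-up 'my_table' label:

        lda my_table  ; read the first byte of the table as data
        sta $2000     ; and keep a copy of it in ram
        rts           ; return here, so execution never runs into the table
      my_table:
        .db $00,$10,$00,$12   ; raw data bytes, not meant to be executed

    Without the RTS, the processor would walk right into my_table and try to execute
    $00 $10 $00 $12 as if they were opcodes.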
    
    So the CPU has the PC register to keep track of where we are (or where the processor is, really) in this
    vast series of 65536 bytes or 64k (kilobytes). How does indexing fit into all of this? Say
    we wanted to move some bytes from one area to another. We can do this:
    
        lda $2000     ; get a byte from $2000
        sta $2800     ; store it at $2800
        lda $2001     ; get a byte from $2001
        sta $2801     ; store it at $2801
        lda $2002     ; etc and so forth and so on
        sta $2802
        lda $2003
        sta $2803
        lda $2004
        sta $2804
    
    Not only is this tedious, but it wastefully uses space to store all this code. If we wanted to
    copy $50 bytes, that would be a lot of instructions to write out, let alone wasting memory.
    In comes the 'indexing' method. Indexing uses a base 'address' that is static (doesn't change)
    and temporarily adds an 'offset' to it. Example:
    
        ldy #$00      ; y = 00
        lda $2000,y   ; y = 00, $2000+00, get a byte from $2000+00
        sta $2800,y   ; y = 00, store the byte in Acc to $2800+00
        iny           ; y = 01. INY = increment Y register
        lda $2000,y   ; y = 01, get a byte from $2000+$01
        sta $2800,y   ; y = 01, store byte to $2800+$01
        iny           ; y = 02, (increment y)
        lda $2000,y   ; y = 02, get byte from $2000+$02 ($2002)
        sta $2800,y   ; y = 02, store byte to $2800+$02 ($2802)
        
    The Y register acts as the 'indexer'. The value in the Y register is added to the address of
    the instruction. If Y is $14 and the base address of the load instruction is $4000, then the
    complete address for loading a byte is $4014. Both X and Y are 8bit registers and can only
    hold a value from 0 to 255. You can only index up to 256 bytes at a time. If you compare the
    two code examples, you'll notice that the indexing example actually requires an extra instruction
    per step of copying a byte. Indexing looks great, but why would we want to increase the amount
    of work? It looks even more tedious than the first example. Indexing allows something that the
    first example doesn't. Looping.
    
        ldy #$00      ; initialize our indexer 
      loop:
        lda $2000,y   ; get a byte from $2000+y
        sta $2800,y   ; store it at $2800+y
        iny           ; increment Y
        cpy #$50      ; see if Y has reached the value $50
        bcc loop      ; if 'less_than', then jump back to 'loop' address/label
        
    Y is the indexer and is added to both $2000 and $2800 address in the load/store instructions.
    We're copying one byte at a time, so we increment the Y register by 1. We check to see if Y
    has reached the value of $50, if it is less than $50 then we branch back to the 'loop' label. 
    X register works the same. Let's do an example with both X and Y indexing.

        ldy #$00      ; initialize Y indexer 
        ldx #$00      ; initialize X indexer
      loop:
        lda $2000,y   ; get a byte from $2000+y
        sta $2800,x   ; store it at $2800+X
        inx           ; increment X
        inx           ; increment X
        iny           ; increment Y
        cpy #$50      ; see if Y has reached the value $50
        bcc loop      ; if 'less_than', then jump back to 'loop' address/label

    I spiced it up a little to make things interesting. For every 'cycle' of the loop, Y is 
    incremented by 1 and X is incremented by 2 since we have two INX instructions. From the start:
    load a byte from $2000, store it at $2800, load a byte from $2001, store it at $2802, load
    a byte from $2002, store it at $2804, etc. Let's do an example where we copy the bytes in
    reverse order.

        ldy #$00      ; initialize Y indexer 
        ldx #$50      ; initialize X indexer
      loop:
        lda $2000,y   ; get a byte from $2000+y
        iny           ; increment Y
        dex           ; decrement X
        sta $2800,x   ; store it at $2800+X
        cpx #$00      ; see if X has reached the value $00
        bne loop      ; if 'not_equal', then jump back to 'loop'

    This time we used CPX for the counter. Since we're counting down with X, we want to compare
    it to $00. This presents a problem with the BCC instruction. No value of X is ever less than
    $00, so CPX #$00 always leaves C set and a 'less_than' branch would never be taken. That would
    be known as a 'bug'. It would fail the conditional test we need and just 'fall' through
    without looping. That's why we branch on 'not_equal' (BNE) instead.
    
    Also, notice the DEX right before the indexed store instruction? That is an important placement
    of the decrement instruction. If we think about the logic of this operation in the example, we
    know we need to copy all $50 bytes from $2000-$204f to $284f-$2800. If DEX was placed after
    sta $2800,x , then we wouldn't get the last byte copied over to $2800+00.
    
                      ; X = 1
        sta $2800,x   ; X = 1, address = $2801              
        dex           ; X = 0
        cpx #$00      ; X = 0  Z = 1
        bne loop      ; X = 0, Z = 1, no branch is taken, processor falls through to the next 
                      ;               instruction.
                      
    We see that the last byte never gets copied to $2800 because of the placement of the decrement
    instruction. Are you slightly confused? I sure hope so ;)
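
    As a side note, here's a slightly shorter sketch of the same reverse copy. It leans on
    two facts about the 65x family: DEX updates the Z flag by itself, and STA doesn't change
    the Z flag. That means the BNE can test the result of the DEX directly and the CPX #$00
    can be dropped.

        ldy #$00      ; initialize Y indexer 
        ldx #$50      ; initialize X indexer/counter
      loop:
        lda $2000,y   ; get a byte from $2000+y
        iny           ; increment Y
        dex           ; decrement X, Z = 1 when X reaches $00
        sta $2800,x   ; store it at $2800+X (Z is left alone)
        bne loop      ; Z still holds the DEX result, loop until X = $00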



  (MEMORY ADDRESS RANGE)

    Now that we've got a few instructions under our belt and basic understanding of the primary
    registers, let's cover the internal address range VS the external address range. Then we can
    move on to writing some full examples.
    
    I've mentioned that the CPU address range is 16bits, because of the PC being 16bit. 64k doesn't
    sound like much memory to work with (and it's not). Externally, the CPU has 21 address lines.
    This means the external address range is 21bits or 2048k. How does the CPU access all 2048k? It
    uses a paging system.
    
    The processor divides the 64k address range into smaller segments. These segments or PAGEs are
    8k in size. There are 8 pages total (8 x 8k = 64k). The CPU can take any 8k segment from the 
    external address range and map it into the internal address range. An 8k segment of the external
    memory is referred to as a BANK. A PAGE is an 8k slot of the *internal* address range, a BANK is
    an 8k segment of the *external* address range.  

    The external address range is $000000-$1FFFFF. To get a BANK #, you divide the external address
    by $2000.
    
        $000000-$001fff = bank $00
        $002000-$003fff = bank $01
              .
              .
              .
        $028000-$029fff = bank $14
              .
              .
              .
        $1fe000-$1fffff = bank $ff
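
    A quick worked example, using a made-up external address of $029234. The bank is the
    address divided by $2000, and whatever is left over is the offset inside that 8k bank:

        bank   = $029234 / $2000 = $14   (remainder dropped)
        offset = $029234 - ($14 * $2000) = $029234 - $028000 = $1234

    So external address $029234 is byte $1234 of bank $14.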
        
    As we've covered previously, there are eight MPR registers. These registers map sections of
    external memory into the CPU's internal address range.
    
        $0000-$1fff = $1fe000-$1fffff, MPR0 = bank $ff 
        $2000-$3fff = $1f0000-$1f1fff, MPR1 = bank $f8
        $4000-$5fff = $002000-$003fff, MPR2 = bank $01
        $6000-$7fff = $004000-$005fff, MPR3 = bank $02
        $8000-$9fff = $006000-$007fff, MPR4 = bank $03
        $a000-$bfff = $008000-$009fff, MPR5 = bank $04
        $c000-$dfff = $00a000-$00bfff, MPR6 = bank $05
        $e000-$ffff = $000000-$001fff, MPR7 = bank $00

    The MPR registers are written/read via the TAM/TMA instructions. These instructions transfer the
    value in the Acc register to the corresponding MPR register and vice versa. One thing to keep note
    here is that hucards use the lower half of the external address range. The upper half is reserved
    for external peripherals. It's not that you *can't* map rom or ram into the upper range, all 21
    address lines are on the hucard port, but no hucard does because it was considered to be reserved
    for future upgrades. This means your ROM projects are limited to 1024k or 8megabits (with the 
    exception of the Street Fighter 2 mapper).
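
    A minimal sketch of what using TAM/TMA looks like in code. The operand form used here (the
    MPR number as an immediate, 0 through 7) is an assumption about how the assembler wants it,
    so check your assembler's docs. Bank $14 and MPR4 are just picked for the example.

        lda #$14      ; the bank number we want to map
        tam #4        ; MPR4 = bank $14, now visible to the CPU at $8000-$9fff
        tma #4        ; read MPR4 back, A = $14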

    The last thing to talk about for the external address range is memory mapped ports. The 6280 CPU,
    and other variants, do not have direct PORTS to read and write from. Ports are external lines
    that the processor can use to talk with other devices. They're simple and easy to interface with,
    both in hardware and in software. To get around this issue, ports need to be mapped to memory addresses.
    This requires some extra circuitry that we really don't care about at the moment (or maybe ever),
    but we are interested in where these ports are mapped. Thankfully Hudson decided on a single bank
    to map these ports to. The very last bank. The hardware bank.
    
    Bank $FF - the hardware bank. It's customary, but not necessary, to map this bank to the first
    MPR register. All official ROMs do and we might as well adhere to the same. All the external
    processors are mapped to this bank. This includes the graphics processor, the video generator,
    the I/O ports, the TIMER, the interrupt controller, and the audio unit. Hudson designed all the 
    processors and the custom cpu core. They included special 'transfer' instructions for moving large
    blocks of data to and from these other processors. These instructions are akin to DMA controllers.
    They are known as the Txx instructions. We used one of them to initialize the system ram.
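
    To give a rough idea, here's the $50 byte copy from the indexing examples done with one
    of the Txx instructions instead of a loop. This is only a sketch: TII is the
    'increment source, increment destination' transfer, and the operand order assumed here
    is source, destination, length.

        tii $2000,$2800,$0050   ; copy $50 bytes from $2000-$204f to $2800-$284f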
    
    Hmm. I guess we should look at indirect addressing. I wasn't originally going to cover it, as you
    the reader should be familiar with some 65x instructions and addressing modes already, but since 
    I've covered some of the other basic features of the 65x/6280 - I might as well. If you are
    unfamiliar with indirect addressing or pointers in general, it might be a bit confusing at first.
    That might even be an understatement. 
    
    Ok, so far we've been working with direct addressing. Here's an example:
    
          lda $2000
          sta $2800
          
    The operand (data part of the instruction, $2000 for LDA and $2800 for STA) directly correlates to
    an address in processor memory, i.e. direct addressing. With indirect addressing, the operand
    is the address that 'contains' the address we want to read/write. We use brackets to
    signify indirect addressing. It looks like this:
    
          lda [$2000]
        
    We're going to need some more visual references to understand how this works. First the goal. For
    this exercise, we need to read a byte from address $7108. We won't directly read from this address.
    Instead we are going to store that address number($7108) in address location $2000 and then load
    a byte indirectly. 
    
                        ; in this exercise value $ff is stored at $7108
    
          lda #$08      ; take the lower half of $7108
          sta $2000     ; and store it at $2000
          lda #$71      ; and the upper half of $7108
          sta $2001     ; stored at $2001
          
                        ; $2000/$2001 now contains a 16bit address
          
          lda [$2000]   ; get the 16bit address from $2000/$2001, load it, then get the byte from the
                        ; loaded address ($7108)
                        ; A now contains $ff

    In indirect addressing, we point to the lower half of the address value in memory. The processor
    automatically fetches the upper half from the next address (which is address+1). You might be
    wondering why the 16bit address isn't stored in memory as '7108' instead of '0871'. The 65x and
    its variants (in our case the 6280) are little endian processors.
    
    This means any 'WORD' or 16bit values read/written by the processor have the two halves reversed
    in memory. The lower byte is always stored first, followed by the upper byte. It's the same with
    'reading' a WORD. Being an 8bit processor, there isn't much reading/writing of WORDs or 16bit values
    as a single element, but when it does happen it's in little endian format. 
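
    You can see the byte order for yourself with the assembler's data directives. A small
    sketch, assuming .dw (define word) and .db (define byte) behave the usual way and using
    made-up label names. Both lines below put the exact same two bytes into the ROM:

      pointer_as_word:
        .dw $7108       ; one 16bit value, stored as 08 71
      pointer_as_bytes:
        .db $08,$71     ; the same thing written out a byte at a time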

    As the programmer, you must be careful to align the indirect instruction's operand address to the
    low byte. If you don't and you align with the high byte, the processor will interpret the address
    in memory differently than what you have in mind. It's common to have the address(es) already
    loaded into memory beforehand, because you're more than likely going to be reading or writing
    to that address quite often and usually by multiple subroutines. Let's take one more look at
    indirect addressing in action.
    
        ; here are the values prep'd in memory for the current state
    
        $2000/$2001 = 08 71 (7108)
        $2002/$2003 = 93 ff (ff93)
        $2004/$2005 = 00 28 (2800)
        $2006/$2007 = 00 30 (3000)

        ; the values at memory location before the code manipulation

        $7108 = $FF
        $FF93 = $1f
        $2800 = $00
        $3000 = $01
        
        lda [$2000]     ; $2000/2001 = $7108. $7108 = $ff. load byte from $7108 and store in Acc
                        ; A = $ff
        sta [$2004]     ; $2004/2005 = $2800. $2800 = 00. store $ff to $2800. $2800 = $ff
        
        lda [$2002]     ; $2002/2003 = $ff93. $ff93 = $1f. load byte from $ff93 and store in Acc
                        ; A = $1f
        sta [$2006]     ; $2006/2007 = $3000. $3000 = $01. store $1f to $3000. $3000 = $1f


    If that doesn't clear things up, then I don't know. Here's hoping you at least partially understand
    all that. We need to make *one* alteration. For the sake of simplifying the example and making things 
    a little more clear, I used the incorrect instruction format. If you remember, $2000-20ff is the
    address range of ZEROPAGE. In the intro I stated that the PCE has 128 16bit memory registers (that
    can also be treated as 256 8bit general/data registers). They are permanently mapped to that address
    range. They are also the *only* way the processor can perform 'indirect addressing'. The instruction
    for indirect addressing doesn't need the *whole* 16bit operand/address. Since the upper half of the 
    address is fixed to '$20', there's no need for the processor to waste cycles and 'fetch' it. So the
    instruction looks like this:
    
        lda [$00]
        sta [$04]
        lda [$02]
        sta [$06]
        
    See? The upper '$20' part has been dropped. This saves one byte and 1 cycle for the instruction. If
    you're going to use any of the ZP(zeropage) registers as 8bit 'general' registers, you need to use
    the proper 'symbol' so that the assembler knows that you are using ZP 8bit registers. 
    
        lda <$00  ; loads a byte from direct address $2000 or ZP reg #00
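
    Putting the two forms together, here's a sketch of the earlier $7108 exercise written the
    'proper' way. The ZP symbol is used for the 8bit stores that build the pointer, and the
    short operand is used for the indirect load. As before, $ff is assumed to be sitting at $7108.

        lda #$08      ; lower half of $7108
        sta <$00      ; store it in ZP reg $00 (address $2000)
        lda #$71      ; upper half of $7108
        sta <$01      ; store it in ZP reg $01 (address $2001)
        lda [$00]     ; fetch the address from ZP $00/$01, then load the byte from $7108
                      ; A now contains $ff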
        
    Loading and storing of ZP registers is faster than standard memory. You can also perform shifting and
    some other 'bit' manipulating instructions that you can't do in X or Y or even Acc. There's even a mode
    to disable Acc and replace it with any one of 256 8bit general registers. For more information, you'll
    need to read a proper document on the 6280 ;). This tutorial is mostly for coding in ASM for the PCE.
    
    
    
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,   
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////////////////////////////////    
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''