ukmuiik

Question

Thinking about machines such as the PDP-7, PDP-10, CDC-160 and so on¹, these are machines which do not have a stack pointer which can quickly and easily change to point to a new stack frame, like the x86 or ARM machines we have today.

The C standard does not say anything about a stack, since C actually doesn't need one. I can see that local variables may in most cases be placed somewhere in memory for most functions. The functions for which that doesn't work are the recursive ones and the mutually recursive ones, (which cannot be converted to iterative routines by an optimisation stage). But the linker won't know anything about that, will it?

If there is a known compiler which does anything other than use or implement a stack, my question is to see a description of what it does "instead", ideally with a link to source code.

^{1: Yes, the Parallax Propeller also fits in this category of machines, and I'm sure many microcontrollers do, and so this question is not purely about Retrocomputing. But still I think the question belongs here because early C porting efforts will have the answer to the question.}

Comments are not for extended discussion; this conversation has been moved to chat. — Aug 8 at 14:26
MIPS for example doesn't have any hardware stack support, and the stack is manipulated manually by load/store relatively to a pointer. So the title is actually incorrect or at least unclear. It should be something like "... had no indirect load/store instruction" since if you can store a memory address and load from that you can have a stack — Aug 9 at 2:48

score 71 · Answer 1 · 2018-08-12 22:12:33Z

How was C ported to architectures that had no hardware stack?

Simply by implementing a dynamic memory management for subroutine calling, parameter passing and local memory.

If there is a known compiler which does anything other than use or implement a stack,

Now this is a completely different question than the one made in the headline.

Implementing a stack isn't anything special to porting C, nor is a stack at all an invention of C. C inherited its usage from ALGOL, which was first implemented (*1) by Friedrich Bauer and Klaus Samelson (among others) - the same men who 'invented' the stack in 1955 (*2) and got a related patent in 1957.

A CPU stack as we know it today (and is used by C) is just a very primitive implementation of the stack principle defined by them. It's a variant optimized for subroutine return address handling - not necessarily good for parameter passing or local variables.

All languages that allow local variables and/or reentrant subroutine calling (*3) use some form of stack (*4). This includes such dinosaurs as Fortran or COBOL (*5). Bauer was eventually the most influential single person in early compiler development, writing several papers that defined the foundation for almost every language developed thereafter.

In the beginning the stack was mainly seen as a software issue: a rather complex data structure that needs to suit the language, nothing to be supplied by the hardware. How much this is true can be illuminated by the /360 design. While designed entirely after the stack had been introduced in programming languages and finished in 1965, it does not feature a stack pointer or any auto-indexing instructions that could be used for a stack. The hardware developers didn't think it would be of any good to implement such, especially as it would have been a quite complex instruction (*6), considering the different ways stacks were handled by different languages.

my question is to see a description of what it does "instead", ideally with a link to source code.

In addition, focusing on the stack here is in fact not just about the data structure, but the wider area of calling conventions. So let's split this up a bit between storage and parameter passing.

The /360 is a prime example for a widely used architecture without, so let's use it.

Local Storage for Register Saving and Variables.

Static Storage

The closest that comes to an 'instead' are maybe calling convention used on IBM /360. They were ubiquitous and survived with assembler code way into the 1980s.

Here each subroutine also had a piece of private memory where all register content of the caller (*7) as well as local variables were stored. The register storage area was traditionally called SAVEAREA. There was no separate handling of the return address, as it is stored in one of the /360 registers. On return the registers were restored and program flow was transferred back to the caller. As a side effect, in these implementations all local variables are static and available for the next program run.

Stack Oriented Storage

(To understand this, it is useful to remember that the narrowed down version of a hardware supported word (or byte) orientated stack is just a special case of the general definition.)

Another, more elaborated was the usage of a stack. For example, the COLUMBUS implementation (*8). Here a register, usually R13, called R@STACK, was pointing to the frame of the actual nesting level. It had to be word aligned (*9). The first (lowest) 16 words (64 bytes) where kept free as storage for the register set of the next level, again called SAVEAREA, while everything above was used for local variables (*10). The stack was handled by the callee, not the caller. Thus a subroutine (function) call was simple:

L R15,=V(FUBAR) * Linkable address of the function to be called
BALR R14,R15 * Subroutine call (*11)

The subroutine's entry code depended on the way stack underflow was handled. For this example we assume the usage of guard pages (*12) to issue an addressing error when exceeding the available memory. So everything to be done is saving the caller's register content and establishing a new stack frame:

STM R14,R13,0(R@STACK) * Store all registers in the save area (At R13)
SH R@STACK,=Y(LLOCAL+64) * Move the stack register down the the amount
 * needed local storage plus room for a new SAVEAREA

Now the subroutine follows.

At the end, all cleanup is done automagically by restoring the registers of the caller:

LM R14,R13,LLOCAL+64(R13) * Restore registers (*13)
BR R14

And we're back.

Parameter Passing

Static Procedures

Conventions for parameter passing and return codes where rather sparse on this and next to every program made up their own variations. In general R0 and R1 where used for parameter passing, often pointing to parameter blocks (*14), then again it was quite common to have a different list of parameters loaded into many registers with no structure at all.

Similar, there were no conventions about parameter returns. Often R15 was used, but not really consistent.

Dynamic, Stack Using

While the COLUMBUS package did introduce a stack to assembler (and other languages), its calling convention was not necessarily stack based. In general, parameters where passed to a function as a memory reference. It was up to the programmer to build up the parameter list (*15), and pass its address in R1.

L R1,PARLIST_FOR_FUBAR

The called function now can move it wherever needed or address the parameters relative to R1.

Much like the stack frame itself, the parameter list was free form. If it was dynamically build, one would usually reserve some stack space (local variable) to hold it. If static, it could as well be located in some constant area (unless it includes a return value). Or as a combination of both - having a preformatted list in a constant area and move it over to the stack when needed and modified before calling

Not relying on a push/pop mechanic offered a lot more flexibility and performance. In many function calls several, or maybe even all parameters are constant. With preformatted parameter blocks, only real variable parameters need to be stored at their position, reducing the amount of needed instructions thus runtime. The most high performance instruction is still one that doesn't get executed at all :))

If the function was supposed to give a return code, this had to end in R15 of the caller - which was simply done by storing the return code into the save location of R15 within the SAVEAREA. It's like having an automatic declared local return code variable that can be manupulated thruout the function and will be automatic loaded on return without any additional space requirement or runtime cost :=)

Remarks:

In real programming the assembly programmer never touched these instructions, as all was covered in macros. A real program looked more like this:

@ENTR TYPE=M,NAME=MAIN
...
@PASS NAME=FUBAR,PLIST=(FCB,RECORD)
@IF NE
LTR R15,R15 * Return Code Not Zero?
@...
...
@EXIT RC=0 * finish Program with Return Code Zero
@END

@ENTR TYPE=L,NAME=FUBAR,PLIST=(FCB,RECORD)
...
(do whatever needed with FCB and RECORD)
...
@EXIT RC=0 * finish Function with Return Code Zero
@END

And yes, that's assembly language - real one that is :)))

*1 - First named ZMMD on a Zuse Z22 in 1958

*2 - Back then called Kellerspeicher, which may be literally translated as basement-storage. I always guess an American would have called it rather attic-storage as they fill the same niche in the US, where basements are way less common.

*3 - Meaning the same subroutine can be called more than once without prior exiting it.

*4 - With only static variables and non-reentrant calling conventions it is possible to come up with calling conventions using static memory. Here each subroutine gets its own memory to save registers and handle local variables - usually as part of the program memory, right after the code. Which is how FORTRAN or COBOL started out, before quite early switching for stack usage.

*5 - Around 1980 Siemens implemented two different sets of stack instructions in their /370 compatible mainframes (X-CPU). Nothing like a simple wordwise PUSH/PULL and one of them more like hardware implementations of linked list.

*6 - As comments by James Large and T.E.D. have underlined, the situation is a bit more complex, especially with FORTRAN. Recursion and thus reentrant enabled functions and non static local variables (i.e.on a stack) where not part of the language definition until FORTRANÂ 90 came along. In fact, many programs did rely on function variables to be static between calls. FORTRAN further issued a compile time error if a function was called within itself.

But even before FORTRANÂ 90 there were compilers that did allow recursion and dynamic allocated variable space. Just not as standard.

Then again, at least since FORTRANÂ 77, recursion was possible by using function pointers. Here the compiler couldn't check the calling tree and no run time checks did hinder the usage. I have a bookmark of a nice page describing how it was done and what a programmer had to keep in mind.

*7 - At least the ones that were to be overwritten.

*8 - There were others - after all, software is ... well ... soft and allows many ways of implementation.

*9 - At least. Aligning to doublewords of better was standard - and there's a nice story how alignment here did improve performance in a macroscopic way - but that's a different story.

*10 - Depending on the configuration another 32 bytes were reserved for debugging information. In this case the entry consisted of a runtime call where frame information was checked, the new frame was build and debug information like the name of the called function was stored in the additional fields. Having function names as readable EBCDIC at the begin of each stack frame was just great when reading a dump.

We had this feature enabled by default, as it also allows a function to return to some known caller within the calling sequence, bypassing all intermediate ones. Extremely handy in error handling. That way, one doesn't have to schlep some error from a function way down through all the whole calling sequence, with (more or less) specific error handling in each of them.

*11 - Well, BAL R15,FUBAR could as well, if FUBAR is within the address range of the current module. In everything but small demo programs it isn't.

*12 - That's simply one read protected page above and below the available stack page. Accessing this as in case of over- or underrun will result in an address violation which in turn can be handles by the runtime according to what is intended. Like extending the stack memory, or cancelling the program run, or whatsoever.

*13 - This implementation allows only a maximum stack size of 4 KiB minus 64 bytes. Which is natural as stack addressing based on the stack register can only address 4 KiB anyway. There are other implementations that allow more stack, but then again will also require more complex addressing. If a procedure needs that much memory it is more useful to get it from the heap anyway.

*14 - Like having R0 point to a FCB and R1 to a record buffer when doing a READ or WRITE call for a file.

*15 - There where support functions for building parameter lists from an argument list, but that doesn't change the basic workings, so let's skip that for simplicity.

Re, "This includes such dinosaurs as FORTRAN..." FORTRAN is a family of languages. I spent some time writing programs in FORTRAN II, which allowed me to declare variables with local scope, but it statically allocated function activation records (i.e., using the System/360 calling convention that you describe above.) Recursive calls and other reentrant calls were not possible. — Aug 6 at 18:26
About #3 and 4: What is written is technically correct with the caveats given. However, Fortran is of a similar vintage as Algol and the stack itself (53-57), and thus initially did not assume or typically use a stack, and indeed its subroutines were generally not reentrant. Due to the extreme importance of backward compatibility, I'm not sure there was official support in the language until Fortran 90. — Aug 6 at 18:29
You're both right (@jameslarge , T.E.D.), it's a complex issue with a lot of variables (sic!) . Fortran didn't directly allow recursion before FORTRAN90, still recursion was possible by using function pointers. More important here, there where compilers prior to FORTRAN90 that did allow the use of reentrand functions and stack managed variables. Just not standard. — Aug 6 at 18:50
@jameslarge: From what I've seen, newer versions of Fortran seem like they would be superior to "only-define-behaviors-mandated-by-the-Standard C" for many purposes. I've not used any version of Fortran since FORTRAN-77, but Fortran seems to have been becoming suitable for more purposes while "only-define-behaviors-mandated-by-the-Standard C" has gone the other way. — Aug 6 at 21:26

score 16 · Answer 2 · 2018-08-07 20:35:18Z

I've seen three common approaches to handling that:

The first approach is to have the compiler statically allocate an area of memory for use as a stack, and maintain a global pointer to the current "stack pointer". On a platform with a subroutine call/return stack that's too small to accommodate expected function nesting, something like:

int foo(int a, int b);
...
foo(1,2);
...
void bar(int);
int foo(int a, int b);

 int temp=a+b;
 a>>=1;
 bar(a);
 bar(temp);

may be handled, absent optimization as:

 *compilerSP++ = 1; // Push first arg
 *compilerSP++ = 2; // Push second arg
 CALL _foo // Stores return address in a special register
 compilerSP -= 2; // Remove pushed args
 ...
_foo:
 *compilerSP++ = RETURN_ADDR; // Get called function
 compilerSP++; // Reserve space for temp
 compilerSP[-1] = compilerSP[-4]+compilerSP[-3]; // temp = a+b
 compilerSP[-4] >>+ 1; // a>>=1;
 *compilerSP++ = compilerSP[-4]; // Push a
 call _bar
 compilerSP--; // Remove pushed arg
 *compilerSP++ = compilerSP[-1]; // Push temp
 call _bar
 compilerSP--; // Remove pushed arg
 compilerSP--; // Remove space for temp
 RETURN_ADDR = *++compilerSP;
 RETURN;

Performance of this approach is generally pretty bad, but it will support recursion and re-entrant functions without difficulty.

A second approach which is not conforming but is generally more useful is to forbid recursion, compute what the worst-case stack usage would be for each function, and then have the linker statically allocate space for all arguments, local variables, and any temporary variables the compiler needed to make things work. Beyond the fact that such an approach greatly improves performance on systems without stacks, it has a second major advantage even for systems with stacks - such a great advantage, in fact, that I think the Standard should have recognized a subset of C without recursion. If recursion is disallowed, programs can be statically guaranteed never to have any kind of "stack overflow" at run-time. If the linker can't produce a static allocation that's guaranteed to work, it will reject program at link time without it running at all. Although some kinds of program need support for recursion and re-entrancy, waiving such support would allow implementations for safety-critical systems to guarantee that combination of inputs could bomb the stack--something that would be useful even on systems that use stacks.

A third approach which is conforming, but still performs better than the first, is to statically allocate space for local variables and arguments whose address isn't taken, but also keep for each function a flag to indicate its level of nested invocations. If the function is being invoked when it is called again, the second invocation should copy all of its associated automatic objects to a compiler-maintained "pseudo-stack" like the one used in the first approach before allowing it to be invoked recursively (variables whose address is taken would need to be allocated on the pseudo-stack whether code is invoked recursively or not). When a function returns after having been invoked recursively, it should restore all saved automatic objects from that pseudo-stack. This approach, unlike the second, conforms to the Standard, but will run slower because of the need to check whether functions are being invoked recursively. Although I've used a compiler (Archimedes V4) that could be configured to employ this approach, I've never enabled it since it adds extra overhead to all code whether or not it uses recursion. The overhead isn't as severe as with the first approach, but unless an application needs recursion, there's no reason to accept any overhead.

"A third approach which is conforming" i'm not convinced this is conforming, in particular it seems like it would be broken by taking pointers to local variables. — Aug 7 at 20:15
@PeterGreen: Hmm... I hadn't thought of that. A compiler could handle that by allocating pseudo-stack space for any automatic object whose address is taken, at the cost of making accesses to that object much slower than they otherwise would be. As I've said, I would have preferred to see the Standard make recursion optional, especially since it doesn't specify any level of nesting for which it has to work without blowing the stack. Recursion support is sufficiently impractical on some platforms that even if an implementation could support it, programmers would avoid using it at all costs. — Aug 7 at 20:33
@PeterGreen: On a platform like the 8051 where I've seen such an approach used, code which is adapted to fit the limitations of the CPU can often be smaller by a factor of 3 or more, and faster by an order of magnitude, than code which is written in more "typical" fashion. Tolerating the limitations of the CPU in exchange for such huge improvements in efficiency seems like a useful trade to me. — Aug 7 at 20:37
@PeterGreen: As a simple example, if x is an automatic char whose address has not been taken, x++ would be inc FUNCTIONNAME?x, a 2-byte 1-cycle instruction. If it were an automatic char whose address had been taken, code would likely be mov a,_stacklo / add #offX / mov dpl,a / mov a,_stackhi / adc #0 / mov dph,a / movx a,@dptr / inc a / movx @dptr,a. 15 bytes and 11 cycles. — Aug 7 at 20:45

Peter Mortensen 1315 · Answer 3 · 2018-08-11 21:19:23Z

Funny enough, some contemporary architectures, like ARMv4 do not have a hardware stack either. Any subroutine call/return is done through the link register that temporarily holds the return address.

However, what would be called a 'software stack' is still present: some other register is assigned the role of stack pointer: you save the link register where it points as soon as the subroutine is entered, you push and pop registers there through the use of multi-register moves that also pre- or post- increment or decrement register used as a stack pointer (however those instructions use equally any other register as a pointer).

Also worth noting is that a 'hardware stack' has also the meaning of a truly hardware, limited-size stack (made of shift registers or small memory block) that is used nowadays in 'big' CPUs as the predictor of the return address within the speculative execution framework.

I think it's a matter of perspective. R13 is designated for use as a stack pointer, with the ARM docs noting, "In many instructions, you can use R13 as a general-purpose register, but the architecture deprecates this use of R13 in most instructions. For more information see the ARM Architecture Reference Manual." I guess the distinction you're making is that ARMv4 has no instructions that implicitly use SP, just general-purpose instructions that implement a stack, which by convention target R13 when used for procedure entry/exit. — Aug 7 at 14:51
@fadden: What changed with the ARM was that older versions of the architecture handled interrupts by having more than one R14 and R15, and the ability to switch between them, but newer ones push store registers to memory at the descending address using R13. — Aug 7 at 18:03
Arm has hardware stacks (indirect auto inc/dec), but not hardware call stack. The link register is used to avoid bus contention (two accesses to memory at the same time) — Aug 8 at 6:51

berendi 1,109512 · Answer 4 · 2018-08-06 15:02:39Z

A program stack is a very simple data structure that requires only

indexed or indirect memory addressing (pointers)

indirect or computed jumps

a designated register or global variable holding the stack pointer/index

basic arithmetic, i.e. increment and decrement operations

Pushing a value means decrement the index, then store.

Popping is loading a value at the index, then increment the index.

(Of course one can swap decrement and increment to have an upward growing stack)

Calling a subroutine is pushing the return address (program counter plus some offset). If there is no other way to access the PC, the compiler could emit a direct register load followed by a push, and the linker can fix up the loaded value with the return address.

Returning from a subroutine is popping the return address from the stack, and jump to it.

Modern processors tend to implement these operations in single instructions simply because they are often used together, and doing so would save code space and speed up execution.

No, modern processors implement these because old processors did. These instructions are pretty much useless for modern compilers, because matching their effects against the abstract representation of the program is nontrivial when you also want to be able to reorder instructions for better efficiency, keep the invariant that the stack pointer points to an unused address (for exception handling) and emit usable debug information for the program that allows the debugger to find any variable on the stack. If you see push/pop emitted, that is a hardcoded pre-/postamble bypassing the optimizer. — Aug 6 at 17:16
@SimonRichter - I think the answer was more referring to instructions like the 80186 ENTER and LEAVE rather than PUSH or POP, although it's still true that push and pop are commonly used (although generally only for argument passing and not temporary storage, for the reasons you cite). — Aug 6 at 18:36
I suspect that berendi and @SimonRichter has different ideas about when the divide between "old" and "modern" processors occurred. (About 1970 vs. about 2000?) The answer would be better if it explicitly stated what it mean by "modern processors" — Aug 8 at 7:50
@StigHemmer exactly, I was thinking something like "modern processors like the 8080..." — Aug 8 at 10:58
@SimonRichter: all modern compilers targeting x86 use call instead of push return_addr / jmp, and use ret instead of pop rcx / jmp rcx, because it's smaller and much faster. (There's hardware support to make it efficient, like return-address predictor stack for matched call/ret. See blog.stuffedcow.net/2018/04/ras-microbenchmarks. And for stack ops in general, a stack-engine that eliminates the updates to RSP in the out-of-order core, avoiding dependency chains through RSP and making push/pop single-uop. They were slow and avoided on P5 Pentium, but no longer.) — Aug 8 at 12:16

Michael Kay 22315 · Answer 5 · 2018-08-13 08:08:08Z

On very early computers there was no hardware support for a stack, because it was not until they started to program it that they realised that they needed one. The solution was the Wheeler jump Ã¢Â€Â” invented by David Wheeler.

The Wheeler jump

This involves modifying the program as it runs. So will work on a Von Neumann architecture, but not Harvard. (Harvard came latter, and will always have mechanism to implement a stack).

Before a subroutine is jumped to (branch/goto, type jump), the last instruction of the routine that is called is changed to a branch to the next instruction. This last instruction was left blank, for this use. The programmer, or loading tools, need to know the length of the routine that is called. It is a big pain.

Recursion is impossible, because there is only one place (per routine) where the return address is stored. However it is possible to have a subroutine, call as subroutine, call a subroutine. This is all that most programmers do anyway, and most recursion can be replaced by iteration.

This answer is about earlier times, than those mentioned in the question, because it gives some context, and shows what can be done, in relation to the question, with poor hardware support.

Can you point to a C implementation which used the Wheeler jump? I can't picture it. — Aug 7 at 13:10
This was way before C. Everything was done in assembler. They had a loader called initial instructions, it would do some translation of the program as it was loaded, including relocation. I don't know if it did wheeler jump adaptation. After this every CPU had stack support, I am pretty sure that the ones that you listed, do as well. — Aug 7 at 15:29
As you say, recursion doesn't work with the Wheeler jump, so it really doesn't replace a stack - it replaces e.g. a subroutine jump where the current PC is stored in a register. That's still far from a stack. The "callee stores return address in local space" technique stayed common for a long time, e.g. for the PDP-8, and, as shown above, also for the IBM/360. I also doubt that the "next computer" (which? EDSAC II?) "they" (who?) built had a hardware stack - there wasn't enough memory to make that work effectively. Do you have any references for that claim? — Aug 7 at 16:49
If you try hard enough you can replace all recursion with iteration, not just most. It is mostly just not considered practically useful. This is the essence of the Church-Turing Thesis — Aug 11 at 14:14

Peter Mortensen 1315 · Answer 6 · 2018-08-11 21:18:21Z

Even nasty recursive stuff can be converted to iteration. For example, take a look at this:

Reflection and refraction impossible without recursive ray tracing?

Which does exactly that on a platform where recursion is not allowed probably due to the same reasons you are asking about (not sure if a GPU have stack pointer or not).

Anyway, if a computer has memory access with variable addressing (like by register) then you can implement a stack on your own. It is easy, just a few lines of code ... For example, you can do basic stack emulation stuff in Z80 assembly like this:

StackPtr: dw 0

push_a:
 ld hl,(StackPtr)
 dec hl
 ld (StackPtr),hl
 ld (hl),a

pop_a:
 ld hl,(StackPtr)
 ld a,(hl)
 inc hl
 ld (StackPtr),hl

If you do not have addressing by register then you need to do it by self-modifying code where you overwrite the memory access instruction address on a fixed memory position which is then used ... The same goes for call,ret jumping.

Ackermann function

As I mentioned before, even nasty recursive stuff can be converted to an iterative process. Here for example, the Ackermann function in C++:

//---------------------------------------------------------------------------
// https://en.wikipedia.org/wiki/Ackermann_function
//---------------------------------------------------------------------------
DWORD ackermann_r(DWORD m,DWORD n)
 
 if (!m) return n+1;
 if (!n) return ackermann_r(m-1,DWORD(1));
 return ackermann_r(m-1,ackermann_r(m,n-1));
 
//---------------------------------------------------------------------------
DWORD ackermann_i(DWORD m,DWORD n)
 
 const int sz=128; // stack size
 int i=0; // stack pointer
 DWORD a[sz]; // stack
 for (;;)
 
 if (!m)
 
 n++;
 if (!i) return n; // final result
 i--; m=a[i]; // previous recursion level
 continue;
 
 if (!n) m--; n=1; continue; 
 if (i>=sz) return 0; // recursion too deep
 a[i]=m-1; i++; // new recursion level
 n--;
 
 
//---------------------------------------------------------------------------

The _r means recursive and _i means iterative. Of course, without bignum arithmetics we are limited to very small m, n input here comparison of the two functions:

[recursive Ackermann(m,n)]
 mn 0 1 2 3 4 
 0 1 2 3 4 5 
 1 2 3 4 5 6 
 2 3 5 7 9 11 
 3 5 13 29 61 125 

[iterative Ackermann(m,n)]
 mn 0 1 2 3 4 
 0 1 2 3 4 5 
 1 2 3 4 5 6 
 2 3 5 7 9 11 
 3 5 13 29 61 125

The DWORD is an unsigned 32 bit integer... As you can see, I am using my own stack/heap implementation inside the iterative function (but that does not make it recursive!!!) nor is it using stack-related instructions, just memory access ...

The drawback is that you are limited with heap/stack size (this stuff grows really fast), but if you realize you can have a global heap/stack for all the functions even this limitation partially disappears ... But do not befouled; even recursion is limited in the same way as the stack/heap of a commonly compiled program is usually limited too...

Another option is to compute/estimate the needed stack size for example by some series) and use dynamic allocation ... It's also doable without the estimation if we use a dynamic-linked list instead...

While testing, beware this stuff grows crazy. For example:

ackermann_i(4,1) = 65533

took 9.550399 seconds to compute and used sz=65531 DWORDs (while the recursive version stack overflowed already as iterative functions usually need less space than their recursive counterparts).

Not all recursion can be converted to iteration, though it is rare to come across it. — Aug 7 at 12:40
The 'self-modifying' remark in the answer holds true about the way PDP-8 made its subroutine calls. The calling instruction stored the return address in the first word of the subroutine body and continued execution from the second word. — Aug 7 at 14:13
Note the Z80 has a hardware stack. And that code is not stack emulation, it is an implementation of a stack. A stack is a data structure, not a hardware feature. — Aug 8 at 6:39
@ctrl-alt-delor was curious so I tried to encode the Ackermann function both recursively and iteratively and both works so it looks I was right (until someone create another horribility that really can not be converted to iteration but I am afraid such stuff would require construct/syntax even nowadays recursive languages do not allow for now... — Aug 8 at 8:08
Note that ackermann_i is recursive, it just does not use recursive function calls. However it implements its own recursion. This shows you don't need recursive function calls, but you do need to be able to implement a stack. — Aug 8 at 10:41

Steve J 771 · Answer 7 · 2018-08-07 04:28:52Z

up vote
2
down vote

Firstly the PDP-7 had autoincrement capability for some of its memory registers (addressing was direct or memory indirect); the PDP-11 also has autoincrement registers. Note that many CPUs of that era did not have a stack for subroutine calls but instead used a "frame pointer", saving the old PC as an index to the calling routine's parameter data. Second, hardware features can be implemented in software, both in the original Assembly language version of Unix as well as Unix written in the C language.
The Unix clone, Minix, was written for the i8086 originally and has a software Virtual Memory Manager (most newer CPUs have a hardware one).
PDP-7 instruction set:
http://simh.trailing-edge.com/docs/architecture18b.pdf

answered Aug 7 at 4:28

Steve J

771

2

Yes, but how does it answer the question?
â€“Â Wilson
Aug 7 at 4:53

is that a link-register, or frame-pointer?
â€“Â ctrl-alt-delor
Aug 7 at 12:39

Note that the most popular Ã¢Â‰Â¥32bit CPU of this era (the Arm). Does not have a hardware call stack, it too has a Ã¢Â€Âœlink registerÃ¢Â€Â (not a frame pointer, a flame pointer is used to automate stack un-rolling, is stored on the stack, and can be used with or without a link register). In CPUs that use a link register, it is usually possible to push the link-register on to a stack.
â€“Â ctrl-alt-delor
Aug 8 at 6:46

@ctrl-alt-delor: Some RISC ISAs don't have a dedicated "stack register", but the ISA (with plenty of regs and convenient instructions) makes it efficient to dedicate any of the GPRs as a stack pointer. So by software convention you do have a normal call stack on MIPS or PowerPC, for example, even if they never modify it on their own even for exceptions / interrupts. (I know ARM has register banks, but I thought there were some interrupt conditions that did make it automatically use the stack asynchronously, at least on some flavours of ARM. ARM is less RISCy than the full-on RISC ISAs.)
â€“Â Peter Cordes
Aug 8 at 11:49

On Arm (at least originals), there are 14 general purpose registers. Plus program-counter, and link-register. The link-register & one other register are banked, depending on mode (the intention is that you use this other banker register as your call stack, but there is nothing else special about it.). Interrupts put the return address onto the link-register of interrupt mode, interrupts are then disabled, the interrupt handler must save the link-register [on the stack], and re-enable interrupts. (so very similar to mips/power). I think only non-risk instructions are load/store multiple.
â€“Â ctrl-alt-delor
Aug 8 at 14:14

Â |Â
show 1 more comment

Polluks 413 · Answer 8 · 2018-08-08 16:16:30Z

The 6502 has only a stack of 256 bytes, so cc65 implements a stack in software.

Peter Mortensen 1315 · Answer 9 · 2018-08-12 09:35:10Z

up vote
1
down vote

Looking at the manual for PDP-7 (1964), it has:

Jump and copy program counter to accumulator (jsp)

It also has memory indirect addresses, dac i deposit indirect, lac i load indirect.

With up to 8 of these being auto incrementing (depending on memory size).

These last two can be used to implement a stack.

The first uses the accumulator as a link register (the arm has a dedicated link register). The link register (accumulator) can be copied to/from the stack.

edited Aug 12 at 9:35

Peter Mortensen

1315

answered Aug 7 at 16:02

ctrl-alt-delor

2034

Can you point to an implementation which does this?
â€“Â Wilson
Aug 7 at 17:59

@Wilson I have only looked at the manual. I did wonder after writing this, how would a pop be done. This needs decrement, so possibly with more instructions.
â€“Â ctrl-alt-delor
Aug 8 at 6:18

@ctrl-alt-delor why two answers? I would merge them into one ...
â€“Â Spektre
Aug 8 at 8:46

What do you mean by "the arm has a dedicated link register"? Do you mean "the RAM has a dedicated link register"?
â€“Â Peter Mortensen
Aug 11 at 20:13

@PeterMortensen No, it is not in RAM. It is one of the core 16 registers (R14). Most of these 16 are general purpose, expect LR=R14, PC=R15, SP=R13 (R13 is less specialist, just that the OS may save your context on this stack. And like R14=LR and the condition code register , it is hardware banked across all modes.)
â€“Â ctrl-alt-delor
Aug 12 at 10:09

add a commentÂ |Â

R.. 1434 · Answer 10 · 2018-08-07 01:34:46Z

up vote
-3
down vote

There is no such thing as a "hardware stack"; what you're calling a hardware stack is presumably just push/pop (and possibly call/ret) instructions that operate on a specific fixed register and memory addressed by it.

Unlike some other answers claim, no "dynamic memory management" is needed in the absence of such instructions. You just do the same thing you always do to implement call frames with a moving stack pointer, choosing a particular register for use as the stack pointer (at least at call time; if you need to support interrupts/asynchronous events, it probably has to be permentantly-reserved rather than just at call time. Then you just increment/decrement it and store/load at addresses based on it just like you would with a register that the ISA specifically designates as a "stack pointer".

answered Aug 7 at 1:34

R..

1434

"Hardware stack: A common use of stacks at the architecture level is as a means of allocating and accessing memory." en.wikipedia.org/wiki/Stack_(abstract_data_type)#Hardware_stack
â€“Â Bruce Abbott
Aug 7 at 6:05

@BruceAbbott: As far as I can tell from reading the article, it's exactly what I said: a misleading term for a dedicated-purpose register and CISCy stack manipulation instructions in the ISA.
â€“Â R..
Aug 7 at 22:58

Not misleading. A hardware stack is maintained by hardware with minimal (ie. invoking single opcodes) to no (eg. interrupt vectors) software involvement. A RISC machine that requires sequences of several instructions to implement a stack is using software to do it. For example the RCA CDP1802 needs ~26 instructions to implement standard Call and Return functions that are single instructions in other CPUs (which may use random logic or dedicated microcode to perform the actual stack manipulations, but that is hardware not software).
â€“Â Bruce Abbott
Aug 8 at 9:23

1

Some ISAs, like x86, do in fact have a stack as part of the architecture. Interrupt handling asynchronously pushes stuff onto the kernel stack (What happens in the x86 architecture when an interrupt occurs?), so it's not optional to have the kernel RSP pointing to valid memory. You can implement functions without using call/ret or push/pop, have a red-zone for userspace, or even use a different register as the user-space stack pointer, but you can't avoid having a valid kernel stack pointer in RSP unless you run with interrupts disabled.
â€“Â Peter Cordes
Aug 8 at 12:07

1

But sure, for the purposes of this question, the important thing is having enough registers to dedicate one as a stack pointer, and efficient instructions for manipulating it in software. It doesn't actually matter if HW uses it implicitly as a stack pointer or not, because x86-64 and MIPS aren't fundamentally different wrt. how C is implemented. Non-leaf callees pushing a link register achieves the same goal as call pushing a return address.
â€“Â Peter Cordes
Aug 8 at 12:10

Â |Â
show 1 more comment

score 71 · Answer 11 · 2018-08-12 22:12:33Z

How was C ported to architectures that had no hardware stack?

Simply by implementing a dynamic memory management for subroutine calling, parameter passing and local memory.

If there is a known compiler which does anything other than use or implement a stack,

Now this is a completely different question than the one made in the headline.

Implementing a stack isn't anything special to porting C, nor is a stack at all an invention of C. C inherited its usage from ALGOL, which was first implemented (*1) by Friedrich Bauer and Klaus Samelson (among others) - the same men who 'invented' the stack in 1955 (*2) and got a related patent in 1957.

A CPU stack as we know it today (and is used by C) is just a very primitive implementation of the stack principle defined by them. It's a variant optimized for subroutine return address handling - not necessarily good for parameter passing or local variables.

All languages that allow local variables and/or reentrant subroutine calling (*3) use some form of stack (*4). This includes such dinosaurs as Fortran or COBOL (*5). Bauer was eventually the most influential single person in early compiler development, writing several papers that defined the foundation for almost every language developed thereafter.

In the beginning the stack was mainly seen as a software issue: a rather complex data structure that needs to suit the language, nothing to be supplied by the hardware. How much this is true can be illuminated by the /360 design. While designed entirely after the stack had been introduced in programming languages and finished in 1965, it does not feature a stack pointer or any auto-indexing instructions that could be used for a stack. The hardware developers didn't think it would be of any good to implement such, especially as it would have been a quite complex instruction (*6), considering the different ways stacks were handled by different languages.

my question is to see a description of what it does "instead", ideally with a link to source code.

In addition, focusing on the stack here is in fact not just about the data structure, but the wider area of calling conventions. So let's split this up a bit between storage and parameter passing.

The /360 is a prime example for a widely used architecture without, so let's use it.

Local Storage for Register Saving and Variables.

Static Storage

The closest that comes to an 'instead' are maybe calling convention used on IBM /360. They were ubiquitous and survived with assembler code way into the 1980s.

Here each subroutine also had a piece of private memory where all register content of the caller (*7) as well as local variables were stored. The register storage area was traditionally called SAVEAREA. There was no separate handling of the return address, as it is stored in one of the /360 registers. On return the registers were restored and program flow was transferred back to the caller. As a side effect, in these implementations all local variables are static and available for the next program run.

Stack Oriented Storage

(To understand this, it is useful to remember that the narrowed down version of a hardware supported word (or byte) orientated stack is just a special case of the general definition.)

Another, more elaborated was the usage of a stack. For example, the COLUMBUS implementation (*8). Here a register, usually R13, called R@STACK, was pointing to the frame of the actual nesting level. It had to be word aligned (*9). The first (lowest) 16 words (64 bytes) where kept free as storage for the register set of the next level, again called SAVEAREA, while everything above was used for local variables (*10). The stack was handled by the callee, not the caller. Thus a subroutine (function) call was simple:

L R15,=V(FUBAR) * Linkable address of the function to be called
BALR R14,R15 * Subroutine call (*11)

The subroutine's entry code depended on the way stack underflow was handled. For this example we assume the usage of guard pages (*12) to issue an addressing error when exceeding the available memory. So everything to be done is saving the caller's register content and establishing a new stack frame:

STM R14,R13,0(R@STACK) * Store all registers in the save area (At R13)
SH R@STACK,=Y(LLOCAL+64) * Move the stack register down the the amount
 * needed local storage plus room for a new SAVEAREA

Now the subroutine follows.

At the end, all cleanup is done automagically by restoring the registers of the caller:

LM R14,R13,LLOCAL+64(R13) * Restore registers (*13)
BR R14

And we're back.

Parameter Passing

Static Procedures

Conventions for parameter passing and return codes where rather sparse on this and next to every program made up their own variations. In general R0 and R1 where used for parameter passing, often pointing to parameter blocks (*14), then again it was quite common to have a different list of parameters loaded into many registers with no structure at all.

Similar, there were no conventions about parameter returns. Often R15 was used, but not really consistent.

Dynamic, Stack Using

While the COLUMBUS package did introduce a stack to assembler (and other languages), its calling convention was not necessarily stack based. In general, parameters where passed to a function as a memory reference. It was up to the programmer to build up the parameter list (*15), and pass its address in R1.

L R1,PARLIST_FOR_FUBAR

The called function now can move it wherever needed or address the parameters relative to R1.

Much like the stack frame itself, the parameter list was free form. If it was dynamically build, one would usually reserve some stack space (local variable) to hold it. If static, it could as well be located in some constant area (unless it includes a return value). Or as a combination of both - having a preformatted list in a constant area and move it over to the stack when needed and modified before calling

Not relying on a push/pop mechanic offered a lot more flexibility and performance. In many function calls several, or maybe even all parameters are constant. With preformatted parameter blocks, only real variable parameters need to be stored at their position, reducing the amount of needed instructions thus runtime. The most high performance instruction is still one that doesn't get executed at all :))

If the function was supposed to give a return code, this had to end in R15 of the caller - which was simply done by storing the return code into the save location of R15 within the SAVEAREA. It's like having an automatic declared local return code variable that can be manupulated thruout the function and will be automatic loaded on return without any additional space requirement or runtime cost :=)

Remarks:

In real programming the assembly programmer never touched these instructions, as all was covered in macros. A real program looked more like this:

@ENTR TYPE=M,NAME=MAIN
...
@PASS NAME=FUBAR,PLIST=(FCB,RECORD)
@IF NE
LTR R15,R15 * Return Code Not Zero?
@...
...
@EXIT RC=0 * finish Program with Return Code Zero
@END

@ENTR TYPE=L,NAME=FUBAR,PLIST=(FCB,RECORD)
...
(do whatever needed with FCB and RECORD)
...
@EXIT RC=0 * finish Function with Return Code Zero
@END