Mustard and Orange Peels

July 21, 2008

My Blag has moved!

to here... http://cmalabs.com/blog

March 08, 2008

Generating permutations

I came across a programming problem today that required me to generate all permutations of a set of objects. For example, given the set ["bob", "cat", "apple"], my program should be able to generate the following:

  • "bob", "cat", "apple"
  • "bob", "apple", "cat"
  • "cat", "bob", "apple"
  • "cat", "apple", "bob"
  • "apple", "bob", "cat"
  • "apple", "cat", "bob"
So I wrote one in php:


//prints out all permutations of elements in an array
function perm($beg=array(), $end=array()){
if(sizeof($end) == 0){ //print the combination
foreach($beg as $value){
echo $value.' ';
}
echo '<br />';
return;
} else {
foreach($end as $index=>$elem){
$_beg_=$beg; //copy to tmp var so the next step doesn't alter the $beg array since the $beg array needs to be the same for each iteration
$_beg_[]=$end[$index]; //add element to beginning
$_end_=$end; //copy to tmp var so the next step doesn't alter the $end array since the $end array needs to be the same for each iteration
unset($_end_[$index]);//delete element from ending string
perm($_beg_, $_end_); //recurse
}
}
}

March 02, 2008

Levels of Abstraction: Part 1 Computers

[first draft warning: This post may be edited later because it's very long and error prone.]

As a computer scientist in training, I am having to build increasingly more sophisticated systems. This naturally leads to the questions, how long before computer systems become too complex for humans to build and maintain? Are we close to hitting a limit for practicality in maintaining this growth in complexity? How are we even able to understand and improve upon the computer systems we have today?

I'm not going to talk about the ultimate limit to computation where every particle in the system is used as a processor and memory. That is nonetheless a very interesting topic and Seth Lloyd talks about it in this paper here. I also don't claim to know much if anything at all about the practical limits of computation, and anyone with any training in this topic would know more than I do. However, I am going to attempt to discuss what I think I know about the topic and some questions that I find interesting to ponder.

Computer systems have come a long way since Charles Babbage's difference machine, Alan Turing's theoretical framework for computing systems, and advances in physics and material science that made more complicated Turing machines practical.

Modern computers operate by manipulating sets of binary switches, and those switches are represented in binary numbers 1 and 0. 1 normally represents 'on' and 0 'off' (there is no law that says that has to be the case, it's just an isomorphic mapping of one set of objects to another, but it's more natural for us to think of 1 as something being on and nothing or 0 as something being off, or perhaps thats just the convention instilled into my brain over time. That's another topic worthy of it's own discussion and I digress).

These switches are normally manipulated in sets of powers of 2. For example, with 8bits or 1 byte, 00000001 might represent the start of an instruction. This tells the processor to start reading the instruction that follows until it sees 00000000 which might represent 'stop reading the instruction'. 00001010 might represent something like take the memory at address at the location set at the next 1byte address and load it into register 1. Another byte might say 'add the data at register 1 to the data at register 2 and store it in the location to which the next byte of memory points; and yet another might say jump to the memory address to which the next byte of memory points and start reading instructions from there. There are much better and more detailed explanations to how computers work at the register/processor level, but that's not important here.

Note: As someone with some training in computer science will know, there is no distinction between instruction and data, so it's possible to 'execute' data as
instructions and manipulate instructions as data, so it's very easy to make garbage programs that destroys itself upon first execution, and this is also what makes buffer overflow hacks possible. This also makes interesting things like self reproducing programs possible. Again, I digress.

So how do we program simple calculations in binary circuits into modern operating systems that control multiple hardware devices, the flow of memory, and the millions of calculations performed on a processor every second that makes double clicking an icon to start up firefox possible? The answer is higher levels of abstraction.

Alan Turing proved that you can perform just about any mathematical calculation using a turing machine (Kurt Godel later proved that there exist problems that can not ever be solved using a computer). Some very smart people set out to try and do exactly that. They soon realized it would be much easier to represent binary instructions in symbolic language instead, so 're r1' might stand for read the next memory location and store it in register 1, 'bng' might signal the beginning of the program, 'end' might signal the end. This led to the development of assembly languages which required an 'assembler' to translate the symbolic language to binary for computers to execute. Before long, more mathematical, programming, and logical constructs like addition, multiplication, 'if...then...', 'and', 'or', loops, etc where added to the language and it too required translating into binary. Once these constructs where developed, programmers could program more sophisticated programs without having to re-implement simple operations like addition or multiplication in binary. This was the first level of abstraction in programming languages. No longer do we have to think about where memory is stored or retrieved while multiplying 2 numbers, and no longer do we have to think about where the program needs to jump to execute a conditional statement 'if...then...', or to loop through or repeat a series of instructions n times. (A simple explanation of how compilers work can be found here. A more detailed and a much more realistic than my example of how assemblers work here).

Note: It's interesting how compilers are programs that build more programs, and that the first compiler of a higher level language would have to have been built from a lower level language.

Assembly languages made it much easier to program computers, but it was very error prone because much of the time, it was still necessary to deal with memory locations directly, and programs had to be written sequentially to work, and "hacks" to make programs repeat operations like the GOTO statement can cause many problems if misused. To solve this problem, higher levels of abstraction were added to programming languages. Although these new programming languages like BASIC or C still allowed programmers to manipulate computer memory directly, there was a much greater focus on programming style and structured programming which formalized the use of programming constructs like functions, loops, and conditional statements in programming design. This allowed programmers to group together operations that could be repeated into methods or functions that can be called whenever that set of operations needs to be performed.

As computers became easier to program, computer scientists began to focus more on algorithms and problem solving techniques. Many mathematical problems involve objects. For example the mathematical object "graph" consists of two types of objects: nodes and edges that connect the nodes. A graph can be used to model train stations and train tracks. A problem associated with graphs could be what is the best way to build a railroad network such that the material used is minimum and getting from one station to another require a minimal number of stops. These kinds of problems drove the development of object oriented programming. Initially there were very loose structures that represented objects. For example, a graph could be represented by a 2 dimensional array of memory where each row and corresponding column represents a node and the value at a particular intersecting row and column represents the length of the edge connecting the nodes that the said rows and columns represent. To help solve real world problems, certain functions could be defined to operate on a graph. One function might connect or disconnect two nodes, another might find the shortest path from one node to another. As people tried to solve increasingly complex problems with increasingly complex objects, it became necessary to group methods and functions together into a single object, and even have objects grouped into larger objects. Implementation of such grouping added another layer of abstraction to programming languages. Modern languages like C++, python and Java all support this form of grouping in the form of "classes", these are called Object oriented(OO) programming languages.

Note: Another obstacle that made problem solving difficult was memory management. Before Java popularized it's automatic memory dump handling, objects had to be allocated memory before use and they must be deleted once the program no longer needs them. Newer programming languages allows automatic handling of allocating and deallocating memory at the compiler level which trances the use of variables and objects in a program while it's compiled and allocates/deallocates memory accordingly.

Note2: There are a couple down sides to higher levels of abstraction in programming languages. One is that because many operations must be performed by a compiler, and more abstract structures like objects are very complex at a machine level, it means that programs written in a higher level programming language are generally not as fast nor as efficient as the same program written (more painfully without these added levels of abstraction that make programming easier) in a lower level language. Another problem is that some abstractions do not completely conceal their lower level components. A string comparison for example can be thought of as one operation, but it is much slower than a integer comparison(which can also be thought of as one operation) because strings must be compared character by character. JoelOnSoftware has a more in depth and interesting look at "leaky abstractions". Luckily in the case of the former problem, Moores law delegates that computer speeds are growing at such a fast rate that the slowdown in program efficiency is greatly offset by that growth and the benefit that higher levels of programming abstraction provides. However, efficiency in algorithms(worst case, and average case complexity) is still important for programs to scale well, and smaller computers like ones inside mobile phones are far less powerful than desktop computers, so there is still a great demand to make very efficient programming languages.

The abstraction that resulted from structured programming and object oriented programming
allowed programmers to focus on higher level problems without worrying about lower operations of a computer. This allowed them to build more complicated systems like operating systems which has an Application Programming Interface(API) that group together operations like create a new window, or print off this document.

Note: Security has also driven the development of higher levels of abstraction. Abstraction exists to hide implementation details of a computer system, it can also be used to hide vulnerable details like preventing a user from executing data as instructions by forcing users to perform a particular task using a predefined method or API.

And that, boys and girls, is what I think I know about how higher level programming languages evolved to help make more complex systems possible. In part 2(which I may never write or finish knowing the frequency at which I update this blog) I'll think about and write about levels of abstraction in more down to earth activities like playing a game or reading a book. I didn't talk about whether we can just keep adding more and more layers of abstraction to sustain this growth in complexity, especially in the advent of Moore's law beginning to fail because of the limits of particle physics, but this post is getting too long and it's not like I know anything about the subject anyway.

January 17, 2008

midnight quirk 1

You can string together "help+[person or personal pronoun]" to create an indefinitely long sentence.

Help me help you
Help me help you help them
Help me help you help them help George help us help each other help the world.