http://www.realworldtech.com/forums/index.cfm?action=detail&id=67226&threadid=66595&roomid=2
This link describes _exactly_ what I think is the future of OS design. Andrew Tanenbaum has said about Minix 3, "a bug in a driver [...] cannot bring down the entire OS." That's a big claim, and these types of claims are what make microkernels sound attractive.
But as Linus has so often argued, a microkernel turns the OS into a distributed system, and distributed algorithms are not easy; that is why microkernels will never supplant traditional kernels.
But there may be a better way. Get rid of the idea that the only way to protect the kernel is to run untrusted code in a user process. New idea: untrusted code is written in a language that can, either statically or at runtime, restrict that code from doing anything really bad.
When such code is loaded into the kernel, the kernel's "trusty" (no pun intended) compiler combs through it, adding any necessary bounds checks on pointer dereferences, and bingo: you get the same guarantee from a traditional-ish kernel that you get from a microkernel. (The only non-traditional thing about this new type of kernel would be the inclusion of said compiler.)
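To make that concrete, here is a minimal sketch (mine alone, not Minix's or anyone's actual mechanism) of the kind of check such a compiler might inject around every pointer store in untrusted driver code. The buffer type and the abort_untrusted_module containment hook are hypothetical names for illustration:

#include <cstddef>
#include <cstdio>

struct buffer { char* data; std::size_t len; };

// Hypothetical containment hook: a real kernel would unload the offending
// driver here instead of letting it corrupt memory or panic the machine.
static void abort_untrusted_module() { std::puts("driver contained"); }

// What "buf.data[i] = v;" in the driver's source might become after the
// trusted compiler instruments it:
static void checked_store(buffer& buf, std::size_t i, char v) {
    if (i >= buf.len) {            // injected bounds check
        abort_untrusted_module();
        return;
    }
    buf.data[i] = v;
}

int main() {
    char storage[4] = {};
    buffer buf{storage, sizeof storage};
    checked_store(buf, 2, 'x');    // in bounds: the store proceeds
    checked_store(buf, 9, 'x');    // out of bounds: contained, no damage done
}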
Tuesday, September 22, 2009
Wednesday, June 24, 2009
Appropriate OO in Multi-Person Projects
After my last couple of posts, it hit me that it kinda sounds like I'm advocating no OO whatsoever in multi-person projects. That is not true. I just think it shouldn't be the organizing principle on such projects.
But, I do think that little entities that live a life of their own and are not integral to the program can be encapsulated as objects.
In other words, if I were part of a team writing a word processor, I wouldn't create a class called "document." I would, however, decide how a document is stored in memory (it should be a data structure that everyone on the team understands well), but I wouldn't use a class. I would implement a collection of functions to operate on different parts of a document, but these wouldn't be object methods.
On the other hand, some small things have a limited amount of information associated with them and are somewhat peripheral to the primary concerns of the program. Take a "string," for instance. It just has an array of characters and a length (and possibly a handful of other metadata). It may, therefore, be appropriate to have a "string" class.
I guess my rule of thumb would be: if it is one of the data structures that defines the application, don't hide it in a class. If it is a data structure for which a CPAN module might exist (Perl), then it is probably appropriate to use a class.
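A minimal sketch of that rule of thumb (all names here are mine, purely illustrative): the document is an open struct that the whole team operates on with plain functions, while the peripheral string is free to be a class.

#include <cstddef>
#include <string>
#include <vector>

// Peripheral helper: hiding its internals hurts nobody, so a class is fine.
class string_t {
public:
    explicit string_t(std::string s) : chars_(std::move(s)) {}
    std::size_t length() const { return chars_.size(); }
private:
    std::string chars_;   // the array of characters plus its metadata
};

// Core data structures: kept open and well understood by everyone.
struct paragraph { std::vector<string_t> words; };
struct document  { std::vector<paragraph> paragraphs; };

// Plain functions, not methods, operate on the document.
std::size_t word_count(const document& d) {
    std::size_t n = 0;
    for (const paragraph& p : d.paragraphs) n += p.words.size();
    return n;
}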
Tuesday, June 23, 2009
Here are a couple of super-lame graphics to illustrate the difference between separating programmers on a large project using OO isolation mechanisms vs. letting everyone know and hack on a single data structure (using code review and testing for protection rather than isolation).
The circles are programs or portions of a program, and the squares are programmers.
First, OO:
[diagram: each programmer walled off behind class interfaces]
Next, better:
[diagram: all programmers working against shared, well-understood data structures]
In the OO case, you have portions of the program that are hard-separated (separated via the OO system). It is difficult to change the interactions between them, even when they are wrong. And the communication necessary to do so is expensive.
In the better case, the program is more monolithic, but has well understood data structures. Everybody hacks on the same code base (their own copy of it, of course). (Functions, files and namespaces may be used to hierarchically separate portions.) Each programmer submits his changes to a code-reviewer (more senior programmer). If he is interacting correctly with the data structures, his changes are incorporated. The project can organically become whatever it needs to become.
Multi-Person Projects
I've heard the claim that OO helps manage large multi-programmer projects. False (usually).
In a previous entry, you may have noticed that I argued that inheritance reduces flexibility. I will now take that one step further and say that most aspects of a rigid OO system reduce flexibility.
Aside: I concede that OO can provide a lot of code savings, as well as a higher-level way to talk about your program (which may help readability). But while this higher-level treatment helps organize the code, it does nothing to ensure sane data structures.
But there is a fundamental problem with OO (which, incidentally, may also be its biggest strength). You can hide implementations. In fact (and this is scary), you can hide data structures. As long as you provide the promised interface, you can really screw up the internals and get away with it.
Oh, and what happens when you need to change the interface of your class? Big teams that use OO usually divide programmers and teams along class interfaces. That way, one programmer can work on one side of an interface, another can work on the other side, and the two never have to speak (which is good, because programmers are introverts). But what happens when they do have to speak? What happens when you invariably discover that the interface was not quite right? Communication about changing an interface is usually not cheap. And the nature of OO is such that interfaces must be decided up front.
There is a better way. Do you want a project that many people can work on without unresolvable conflicts? Do this.
(1) Get the data structures right. Your application should be defined by its data structures. If the data structures change, you have a different application than the original. If the code changes, but the data structures remain unchanged, then it's still the same program.
(2) Teach the data structures to all programmers so that everyone can work with them confidently.
(3) Use code-review to make sure someone isn't improperly mucking with some data structure. (This is in contrast to using a pre-encoded class structure to force everyone to play nicely.)
This keeps the sanity and structure of the program at the human level, and out of the code. That is, the compiler is not used to enforce any rigid structure. You don't end up making ridiculous decisions because you chose some class interface incorrectly and lack the resources to change it.
In support of these arguments, look at the code in the Git SCM and the Linux kernel. The data structures are well understood by those working on the projects. The code can get a bit ugly, but it stays flexible. The only things that can hold such a project back are (1) the intelligence (and number) of those working on it, and (2) the quality of the data structures, not arbitrary class definitions.
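For a flavor of what that looks like in practice, here is a sketch modeled loosely on the Linux kernel's struct list_head (simplified from the real thing, with helper names abbreviated): the data structure is completely open, and every subsystem manipulates it through a few well-known functions rather than through a class interface.

// A doubly linked, circular, intrusive list: nothing is hidden.
struct list_head { list_head* prev; list_head* next; };

inline void list_init(list_head* h) { h->prev = h->next = h; }

// Insert entry immediately after head.
inline void list_add(list_head* entry, list_head* head) {
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

// Any subsystem embeds the link directly in its own struct.
struct task {
    int pid;
    list_head run_queue;   // links this task into a scheduler queue
};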
Now, having said all this, I will concede that I use OO all the time on my one-man projects because of the code savings. Since I'm the only one working on all sides of any class interface, it is cheap to change anything that needs to change. So these arguments really apply to multi-programmer projects.
Friday, September 26, 2008
On Orthogonality
An orthogonal set is a minimalistic set of attributes which can be combined (at different values) to create any object in a category of objects defined by said attributes.
Ok, I just kinda vomited that up. Lemme see if I can write that in a way that I'll be able to understand it next month when I read my own blog. I know, I'll give a couple examples.
There exists a category (called two-dimensional space) of objects (called "points") that are fully defined by two attributes ("x-coordinate" and "y-coordinate"). Substituting these specific words into my definition above, we find that the x-coordinate and the y-coordinate make up an orthogonal set. Don't believe me? Watch:
The x- and y-coordinates are a minimalistic set of attributes which can be combined (at different values) to create any point in two-dimensional space.
So, if some attributes make up an orthogonal set, then each such attribute is said to be orthogonal to each other such attribute. Hence, the x-coordinate is orthogonal to the y-coordinate.
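Restated in standard linear-algebra terms (my gloss, not part of the definition above): every point decomposes uniquely along the two axes, and neither coordinate constrains the other:

(x, y) = x·(1, 0) + y·(0, 1), where (1, 0)·(0, 1) = 0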
This isn't the whole story on orthogonality ... I'm still digesting the notion. But this gets me far enough to talk about orthogonality in programming languages.
Orthogonal languages try to implement a minimalistic set of features which can be combined to create any semantic in the range of the language.
Now Larry and Matz (of Perl and Ruby, respectively) have both (at different times) expressed a dislike for languages that try to be very orthogonal (like Java, C, etc.). Orthogonal languages are small (in features) and often easy to learn completely. But they also often yield very verbose code that people of the non-orthogonal persuasion would call unreadable.
Non-orthogonal languages like Perl and Ruby are large (especially Perl). They give you thirty ways to do the same thing. The idea is that the best way to do X in one place is not necessarily the best way to do it in another place. As a result, you can write very terse code. A person who knows a lot of the language finds this terse code super-readable.
The drawback to non-orthogonality is a higher barrier to entry. Perl, for instance, can take a long time to learn. On the flip side, you can begin to be productive with Perl even if you understand only a very small subset of the language. Larry likes to use this fact in a comparison to natural languages. If all you know of Spanish is "te quiero", you already know enough to get a date, even though there is much more of the language to learn. (That is my example, not Larry's.)
Wednesday, August 27, 2008
Effective Optimization
I'll start with a simple premise: spend more time optimizing repeated activities to get the biggest bang for your buck.
For example, if you wash dishes every day, but only mow the lawn once a month, you will do better to spend some time improving your dish washing skills, rather than spend that same time improving your lawn mowing skills.
Similarly, if you are coding and you run across the need for a loop, you should expend your effort optimizing the loop (especially if it undergoes many iterations) rather than optimizing code that only executes once.
The reasoning behind this rationale is simple. If you shave a second off of a task that is executed once, you have saved a second. But if you shave a second off of an operation that occurs 3600 times, you have saved an hour.
So, as a programmer, it seems natural to spend some time up-front learning your tools (you do that once), in order to save time using your tools (you do that lots).
I would gladly spend an hour learning a new time-saving feature of my editor (namely Vim) if I will have the opportunity to use that feature to save a second ten million times before I die. (Ten million seconds is well over a year's worth of 40-hour work weeks.) That would be a really good use of an hour. Shoot, you could justify that even if it took a week to learn the feature.
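If you want to check that claim, the arithmetic fits in a few lines (using the post's figures of one saved second and ten million repetitions, plus my assumption of 52 working weeks per year):

#include <cstdio>

int main() {
    const double saved_seconds  = 1.0e7;              // 1 second, 10 million times
    const double work_year_secs = 40.0 * 3600 * 52;   // one year of 40-hour weeks
    std::printf("%.2f work-years saved\n", saved_seconds / work_year_secs);
    // prints: 1.34 work-years saved
}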
To sum up, I might say that an effective programmer should spend time learning to:
1. Type rapidly,
2. Edit text quickly,
3. Know the languages that he/she uses well, and
4. Be able to quickly and effectively use his/her entire tool chain.
After all, just a few well-targeted optimizations in the form of these basic programmer skills can mean years and years of savings over the course of your career.
Monday, June 30, 2008
Implicit information via types / inheritance
There exists a broad class of objects in the real world called vehicles. A car is a type of vehicle. A Toyota Camry is a type of car. A 2006 Toyota Camry is a type of Toyota Camry.
These types of relationships would be easy to implement in classes, using subclassing. Class vehicle implements some really basic stuff. Class car implements some more specific stuff, and so on.
Or, you could say a car is a vehicle with the "type" attribute set to "car".
vehicle* c = new vehicle;   // the vehicle class (with its "type" member) is defined below
c->type = "car";
This provides greater flexibility, but you forfeit some niceties.
For instance, the count_wheels method should return 4 for a car, and 18 for a semi. (In C++ terms, count_wheels would be a "pure virtual method" in the vehicle class, and overridden in subclasses.) If you make car be a subclass of vehicle, no conditional is needed when you call count_wheels ... the type system checks the type and calls the appropriate overridden method.
class vehicle { public: virtual int count_wheels() = 0; };   // pure virtual base
class car  : public vehicle { public: int count_wheels() { return 4;  } };
class semi : public vehicle { public: int count_wheels() { return 18; } };

vehicle* c = new car;
vehicle* s = new semi;
c->count_wheels(); // returns 4 -- the vtable picks the override, no conditional
s->count_wheels(); // returns 18
But if you make car be a vehicle with a member called "type" set to "car", you control the conditional (= greater flexibility).
#include <string>

class vehicle {
public:
    std::string type;          // "car", "semi", ...
    int count_wheels() {       // the conditional, under our control
        if (type == "car" ) return 4;
        if (type == "semi") return 18;
        return 0;              // unknown type
    }
};
Of course, if you use the type system to hold the implicit info (am I a car or a semi?), the dispatch is handled by the compiler and the vtable rather than by an explicit chain of comparisons ... so there *may* be a performance consideration, but beware of thinking this way.
So the moral is that anytime you use inheritance, you forfeit flexibility, but gain some "compiler-does-it-for-you" efficiencies. In fact, inheritance creates strong coupling between the parent class and the subclass, which is not usually a design goal! The only gain is less code (because the compiler does some stuff for you).
On a similar note, I've read some blogs where people claim that all inheritance should be replaced with composition, because composition doesn't require you to give up as much flexibility as does inheritance. So, for instance, a car is "composed of" a vehicle. The con here is you have to completely rewrite the interface for the car class. It doesn't come for free from the vehicle class like with inheritance.
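A sketch of that composition alternative, using the running example (my code, with illustrative names): the car holds a vehicle instead of inheriting from it, and the forwarding methods are exactly the interface you have to rewrite by hand.

class vehicle_impl {
public:
    int honk() { return 1; }            // some behavior worth sharing
};

class composed_car {
public:
    int honk() { return base_.honk(); } // forwarding boilerplate: the "con"
    int count_wheels() { return 4; }    // car-specific behavior
private:
    vehicle_impl base_;                 // composed, not inherited
};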
inheritance -> shorter :), less flexible :( code