Re: [stella] OT: Programming, CS theory

Subject: Re: [stella] OT: Programming, CS theory
From: Julian Squires <tek@xxxxxxx>
Date: Fri, 26 Oct 2001 15:54:19 -0230
On Fri, Oct 26, 2001 at 11:00:27AM -0500, Kennehan, Richard wrote:
> This is kinda harsh.  I wasn't trying to antagonize you.

Sorry. It's just that it seems every response so far has been
completely ignoring my point... this is not about coding style,
this is about CS theory, which has little to do with the
practice of programming. (TPoP by Kernighan and Pike is another
excellent book, which, along the lines of Programming Pearls,
shows the many places professional programmers are usually weak)

While I feel that BASIC sucks (except perhaps for rapid prototyping,
but hey, that's why we have perl), and gotos are usually unnecessary
except in asm. Gotos can also be used to improve the style of a program,
such as when they're used as a sort of in-function exception handling.
The whole point is, avoid dogma and make intelligent decisions about
everything. CS theory helps you make those decisions more intelligently.

> In the course I took, this "conversion to an iterative solution" was most
> likely glossed over, or mentioned in passing.  

This has to do with the quality of your instructors and (unfortunately)
generally the quality of undergrad CS in general. Don't forget about
memoization/dynamic programming, which is unfortunately rarely mentioned
more than once to students, unless they're participating in programming
competitions.

> I remember having to write a program that that found chess positions so that
> no queen could capture any other queen, using recursion, in Pascal.  
> 
> That assignment pointlessly emphasized the importance of recursion, to the
> point of teaching us that it should be used more often that it actually
> should be.  A MUCH better assignment would have been this:

Well, it showed how easy it is to solve certain problems with
recursion. If memoization had been added, it might actually have
almost been efficient.

> An evil computer programmer has used recursion in order to make his code so
> hard to read and understand, that only he can understand it, thus preventing
> him from ever getting fired.  However, the company decided to fire him when
> another programmer discovered his evil plan.  As a newly hired programmer,
> it is your job to CONVERT his recursive mess into an iterative solution that
> is easy for other programmers to understand and maintain.

If you find recursion difficult to understand, maybe it wasn't
taught well. I find recursion is very clear, and usually the ideal
for demonstrating a problem's solution. The iterative version is
usually less clear. A good practical example is the flood fill function
in many graphics programs.

> This would teach students how recursion works, AND how to convert it to an
> iterative solution, AND teach good coding style, AND teach a lesson about
> how trying to engineer your own job security makes life harder for the
> person who will eventually replace you.

This is the real problem with CS degree material -- almost nothing
about (once again) the practice of programming is taught, and the
range of caliber which comes out of a CS program is enormous.

> LISP is a great language for Artificial Intelligence, and recursion would be
> better taught in a class that focuses on LISP

Recursion is important at a conceptual level, not a language
level.

Haskell is a great language for a lot of applications, and is used
at Ericsson for that reason. Understanding recursion is essential
for its use.

But I'd have to agree, LISP is garbage for most real-world applications.
(the guys on lisp.org are going to come and kill me now)

> >How about: use the right algorithm and data structure for the job.
> 
> Again...do you have a specific example?  This is still pretty vague.

The classic one, which keeps coming up with people to whom I teach
programming, is using the right sorting algorithm for the job.

On one side of the picture, there was a guy who insisted on using
a poorly-written bubble sort for all his sorting. In a computational
geometry problem (having to do with finding closest points), there
was a lot of sorting. Even just changing his code to call qsort(3)
instead improved performance by leaps and bounds. Coding his own
heap sort then improved things even further.

Of course, I saw the other side of the coin once, too, where someone
was sorting small portions of data very frequently. She was using
the library qsort function, but moving to a simple insertion sort
improved performance significantly.

Dozens of cases seem to come up around the FFT, though, usually
when people are trying to write audio applications, or compression
applications, or bignum math applications. Either a) the person
has never heard of the FFT and tries to fake it, b) the person should
be doing a DCT or an FHT instead of a full FFT and ends up doing much
more work than necessary, c) the person doesn't understand the
constraints of the transforms and spends days (or weeks) debugging
the algorithm because ``it's producing garbage''.

> >I see a lot of people write inefficient (often non-deterministic)
> >solutions to problems they don't really understand. 
> 
> Still kind of vague.  Anything specific?

See the random numbers thing below for example, or the FFT example
above.

> >They often also don't understand how to estimate the efficiency of 
> >their code.
> 
> A little more specific?

[disclaimer: I advise the use of a profiler any time before trying
to second guess the performance of your program]

In the example above about the guy using bubble sort, he couldn't
grasp how it could be taking so much time, even though it is very
obviously O(n**2) and he was doing a lot of sorting.

Another example was someone using doubly linked lists to represent
each line in an editor. There was a lot of overhead in each node,
and he wondered why the program used so much memory as the input
grew.

> >How about people using random numbers instead of permutations? 
> 
> Like how?

Example: someone was writing a simple mp3 playlist frontend for
a command-line mp3 player. To shuffle the data they were picking
random numbers until every song was accounted for. They wondered
why it was taking so long. 

> >How about incredibly inefficient or broken representations of numbers? 
> 
> Example please?

See next example, but also people misimplementing fixed point arithmetic
is a classic.

> >For example, not understanding how positional number systems work is a 
> >big one
> 
> More detail please?

Someone was trying to write a set of calculation routines in a 16-bit
environment which allowed numbers up to 1e20. They couldn't grasp how to
do this efficiently, so they ended up storing the numbers as arrays of
decimal digits. (!)

> >I've seen a lot of problems which a quick glance at Knuth's TAOCP 
> >could solve.
> 
> Like what?

All of the above. Any problem with random number generators,
their construction, or similar. Lots of problems with sorting
algorithms, lots of problems implementing a variety of fundamental
data structures such as trees and lists.

People trying to write garbage collectors or memory allocation
systems, and dozens of other cases, should also read TAOCP.
The best thing is, most of the examples are in (a ficticious
machine's) assembly language, so you have no illusions as to
the cost or hidden difficulties of implementation.

> >Have you ever read Programming Pearls, by Jon Bentley? It's full of
> >excellent anecdotes which illustrate my point. 
> 
> This is a book I should probably own, and I will look for it the next time I
> go to Barnes & Noble.  But if they don't carry it, I would have to
> special-order it just to be able to browse through it.  Rather than wait 2
> weeks, could you be more specific to the above examples so that we can ALL
> improve our coding styles today?

Once again, this is _not_ about coding style, per se.

I don't have Pearls on hand, but a good example from Bentley's first
chapter is about the use of bitmaps to store large amounts of boolean
data -- fairly elementry, but I've seen many programmers do:
    typedef char boolean;
      ...
    boolean *values = malloc(666);
      ...
    values[i] = 1;
      ...
Et cetera.

He has a good chapter on binary search and why it is so frequently
misimplemented. (So frequently that programmers whose libraries
don't have a binary search function, or don't know about it, often
implement a linear search instead even though the data is sorted)

Later in Pearls (or possibly More Programming Pearls) there is an
excellent chapter on ``back of the envelope'' estimates. That is,
estimating on the fly with rough figures. It shows how this applies
to determing the feasability of a given data structure, network 
model, or algorithm in your program.

-- 
  -/ 
 |/|  Julian Squires <tek@xxxxxxx>
 /-

-
Archives (includes files) at http://www.biglist.com/lists/stella/archives/
Unsub & more at http://www.biglist.com/lists/stella/

Current Thread