Tuesday, June 26, 2007

ORNL informal meeting notes (2)

Met with Ken Roche, a physicist turned CS guy. We actually met yesterday, and I wasn't diligent about taking notes immediately afterward, so I'm sure I've forgotten things.

We had an interesting conversation that I, sadly, didn't follow all of. The meeting reminded me how little math I actually know. Nonetheless, I could at least follow the system-design aspects of what he was talking about. We spent some time talking about how in god's name you debug a program running on a supercomputer with hundreds to thousands of nodes. They have some cute visualization tools that let you view the MPI wait times for each node in the cluster. Turns out they've seen, for instance, outliers that limited the performance of the whole system. The result is actually the reverse of what you might naively expect: the outliers with very _low_ wait times are the ones holding up the pack. This makes immediate sense if you think about it: a node that's exceptionally slow at its own work spends almost no time waiting on anyone else, while everybody else sits there waiting on the slowpoke, who, of course, can continue immediately once he finishes with whatever he's doing.

They suspect the outlier may, in fact, be due to faulty hardware, particularly in the memory subsystem, which is interesting. I wouldn't have expected that the memory interface for just one node could go bad, and specifically that it could go bad in a way that wouldn't crash the program. A little jarring.
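For my own benefit, here's a minimal sketch of how you could collect per-rank wait times yourself, using MPI's standard PMPI profiling interface. This is just my illustration of the general idea, not whatever ORNL's actual tooling does; the interception of MPI_Wait and the dump at finalize are my own invention.

    /* Sketch: intercept MPI_Wait through the PMPI profiling shim and
     * report each rank's total wait time at shutdown.  A rank whose
     * total is far *below* the others is the likely slowpoke.
     * (My illustration only -- not ORNL's tool.) */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    static double wait_seconds = 0.0;  /* time this rank spent blocked in MPI_Wait */

    int MPI_Wait(MPI_Request *request, MPI_Status *status)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Wait(request, status);  /* forward to the real MPI_Wait */
        wait_seconds += MPI_Wtime() - t0;
        return rc;
    }

    int MPI_Finalize(void)
    {
        int rank, size;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        PMPI_Comm_size(MPI_COMM_WORLD, &size);

        double *all = NULL;
        if (rank == 0)
            all = malloc(size * sizeof *all);

        /* Pull every rank's total onto rank 0 and print the lot. */
        PMPI_Gather(&wait_seconds, 1, MPI_DOUBLE,
                    all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            for (int i = 0; i < size; i++)
                printf("rank %4d waited %10.3f s\n", i, all[i]);
            free(all);
        }
        return PMPI_Finalize();
    }

Link (or preload) this ahead of the MPI library and the application's MPI_Wait calls resolve to the wrapper, which forwards to the real implementation through the PMPI_ names. That's the whole trick behind most MPI profilers.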

Another interesting fact: right now, although there are dual-core processors in Jaguar (their hot-rod supercomputer), the physical memory attached to each processor is strictly partitioned between the cores so as to preserve the programming model (I think... I was a little unclear on why they do this). Moreover, interrupts are handled only by the "master" core, so for I/O, both cores may end up blocked. This seems horribly inefficient given that the whole point of multi-core is fast communication between the cores. They know it's inefficient. I'm not sure how they're going to fix it.

I think the issue is that they currently run this weird stripped-down, lean-and-mean OS known as Catamount (which amounts to Linux stripped of just about everything that makes it Linux) that doesn't have, for instance, virtual memory. I think this is what necessitates the strict partitioning.

Weird aside: there are apparently these physics guys at MIT (whose names I of course can't remember) who have designed their quantum mechanical simulators _from scratch_, _in assembly code_, in order to get every last ounce of performance out of the fuckers. That scares me. Those fuckers are crazy. Especially since they aren't even CS people. As Ken said, the computers were just an obstacle in the way of getting their physics results. I feel really, really dumb right about now...
