I write this primarily to one-up these guys, and it's also like a public-access log of my life, if my life consisted only of isolated geeky incidents occurring about every 3 weeks.
Why this page is so simply formatted
Summer quarter is hard. I'm taking Computer Organization, Digital System Design, Programming Language Concepts, and Complex Variables. PLC is pretty easy--I think it's just a 2nd year Comp Sci class. Comp Org and DSD are a lot of work when you take them both at the same time, but not that bad. Complex Variables, though, is pure hell. It's calculus with complex numbers. Now, most of us computer engineers haven't really used calculus in about 2 years, so we don't exactly have 50 derivatives and integrals along with every calculus theorem ever floating at the top of our brains. However, the professor seems to think we do. Lecture is a lot of proofs (each going 5-15 minutes) and moderately helpful examples. The professor is constantly asking us about the next step or some particular property of calculus, but for God's sake WE DON'T KNOW. Not even the math majors remember all that crap. Weekly quizzes only add to the pain.
Anyway, this has been a very wild week. The DSD lab is due Friday, and as usual the combination of VHDL, Xilinx, hardware, and the RIT computer systems (goddamn virus scanner) makes it damn near impossible to get anything working; this means we're spending a LOT of time in the lab throughout the week. Comp Org test and Complex Variables quiz on Monday. DSD exam on Wednesday; Comp Org project due Wednesday. There was also supposed to be a Complex Variables exam on Wednesday, but through some amazing divine intervention or something the professor pushed it to next Monday. Ugh.
Yes, this has been an entire post of me just complaining about passing crap that's going on.
Edit 28 July 2008: I totally destroyed the Computer Organization and Digital System Design exams, but having just taken the Complex Variables exam today... I think I got owned. Here's hoping the retake goes better.
It seems like several of my friends who are coming back from co-ops have had the same kind of bad experiences with IT departments as I did. At Sandia, we spent about 6 months trying to get a simple, self-administered, unrestricted network for our group. The current networks were all too highly filtered and difficult to work with, and since we did a lot of work with networking systems, that was extremely annoying. We proposed our network and the IT department proceeded to change the requirements, set up strawman networks to show how "misled" we were to want this, lie to managers, and basically do everything possible to avoid doing any real work. The reason? The network guys had set up basically the same network three different times and called them the Open Network, the Restricted Network, and the Classified Network. There isn't much of a difference between those three networks, and I don't think IT could actually figure out how to make a different one. All we asked for was a pure unfiltered line to the Internet and permission to run our own LAN, but they kept adding in the same port locking, port filtering, and traffic monitoring that made the other networks so unusable. Our managers all the way up the chain were on our side, but the network guys twisted the situation and lied outright, basically so they wouldn't have to do more work. In the end, we just told them, "Ok, you win". I'm off-site now, so I don't really know what the current situation is, but when I left there was talk of renting an office in town or getting another lab with better networking people to run a line for us.
With that background story, and keeping in mind that my friends seem to have had similar experiences of IT obstructionism, we come to the point. It looks like people in IT have forgotten their purpose. The IT department falls into the same category as security, maintenance, and staff--they exist to provide essential services so the scientists and engineers who develop things can get work done. Their job is not to make their own days easier by blocking all ports except 80 and calling the network secure; they need to keep the network & computers secure WHILE allowing the people using the systems to do things with minimal hassle. If I'm a researcher in the Scalable Computing lab and I call them to get port 17500 opened for one of my computers, they shouldn't be asking "Why do you need it opened? Do you really need it?"; they should be opening the port RIGHT NOW, because by God I know what I want and I expect to get it now, without wasting a bunch of government money explaining Plan 9 and authentication to some server monkey with a superiority complex. There are exceptions; giving the accounting department their own standalone connection (without security measures installed) would be foolish, but when a lab of highly experienced Ph.D.s with many years of networking behind them asks for something, it's probably a good idea to give it to them.
The solution? Managers should be more prepared to stand up to IT guys when people complain. Our managers were on our side, but they weren't tough enough on the point. When they see these people LYING and blatantly obstructing useful work, they should be ready to apply pressure all the way up to threatening termination AND acting upon that threat. There are a lot of good networking guys, but these bad ones have really left a foul taste in my mouth, and I'm appalled that they keep getting away with it.
Somebody posted a Craigslist ad to one of my mailing lists about some ADM3A terminals (here for more info). I responded and was lucky enough to get the lot. Now, I didn't want 7 terminals, just one, so I went over to pick them up with Sellam Ismail. We brought them back, tested them, and found one that worked. I have that one and he is holding on to the rest. It's a great terminal, with a very classic design and a pretty cool keyboard; pick one up if you get the chance. Here's what it looks like when you log in.
This is also my last week of co-op here at Sandia. On Saturday I start the trip home. I'll spend two weeks in Washington, then drive back to Rochester to start the summer quarter. The best news is that I'll be able to continue working for Sandia part-time in Rochester. They're sending me back with a computer and a couple of iPaqs to hack around with, which should be a lot of fun at a WAY better wage than on-campus employment ($20/hour beats $7.50/hour, hands down. Plus, I don't have to grade CS 1 labs). I might even get a certain really cool device that may or may not be publicly announced yet... so I won't say more for fear of violating an NDA.
Digging around in your documents sometimes results in interesting things... here's the story of some stuff I rediscovered today.
Like all good geeks, I liked D&D in high school. Admittedly it was usually a lot more fun to just read the manuals than to actually play it; as Sartre said, "Hell is other people", and it holds for role playing games too. Anyway, my friends and I would sometimes want to try playing it on the cross country bus because we had a long stretch of time and nothing to do. The problem with D&D, though, is that you need a lot of space to roll dice, draw diagrams, set manuals, and work on character sheets. We didn't get much gaming done.
In an effort to alleviate our boredom, we tried various iterations of simplified games. From the start we decided that flipping coins was a much more bus-friendly random number generator, so we used that in almost everything. Experiments ranged in complexity; sometimes it was a simple player-vs-player tournament, flipping coins to see who 'kills' the other first. Once, I created a moderately complex dungeon on a scrap of paper and decided on some simple rules and items, then DM'ed my friends through it. I was actually rather pleased with that one; since everything came from my own head, there was less time-consuming rulebook checking, the characters were easier to deal with, and the combat went faster. We didn't really play anything more than once, but I got the idea in my head to write something that could be played on the road yet had enough complexity to make it interesting.
Today I came across the text files that I wrote for FBS, the Fast Battle System. It's really nothing new; players have a collection of gladiators that they pit against each other, using coin flips to determine the outcome. The game includes your standard fantasy fare of armor, weapons, and spells, along with potions to drink (regain your hit points!). Anyway, the rules document is here and the items list can be found here. If you actually play this, let me know. I think it would probably be best suited to 10-14 year olds, old enough to want a game where all you do is fight your friends, but not old enough to demand any sort of plot or story. We never actually played this game; between books and girlfriends, we passed the bus trips in more traditional ways and saved the D&D for after-school sessions.
First full day of the workshop, and it's been pretty slow. Except for a presentation about the new Ranger supercomputer in Texas, about the only interesting thing was Ron's talk on booting over InfiniBand; everything else seemed more focused on people who are either working directly with the technology or trying to sell it. Ron's talk was a winner because, when asked about IPMI (Intelligent Platform Management Interface), his response was along the lines of, "IPMI? I don't like IPMI, to put it very politely. IPMI should die."
I'm sitting in on the second day now; we'll see how it goes.
I have arrived at the 2008 OpenFabrics Alliance Workshop in Sonoma, CA, and I must say that it's beautiful out here. OpenFabrics is all about the networks that glue the nodes of a supercomputer together. Tonight we only had one speaker, opening up the conference with some talk about the directions HPC and networking are taking.
First up was a summary of the state of High Performance Computing. At a time when traditional IT is reducing the number of computers in use through virtualization, HPC continues to grow and require more and more processors. Currently, there does not appear to be any upper limit to the HPC market, so INVEST.
The next section was on InfiniBand. If you don't know what InfiniBand is, educate yourself. Basically, InfiniBand (hereafter referred to as IB) is big in supercomputing because it is *fast* and supercomputing people are reasonably willing to write new software to use IB. Three of the Top 5, 7 of the Top 20, and 125 of the Top 500 supercomputers are built around IB. The projection is that IB should have sub-microsecond latencies and 100Gb/s bandwidth by 2010. Since InfiniBand doesn't drop packets (unlike Ethernet), people are finding uses for it as a transport for storage protocols.
More on the commodity side of things is 10Gb Ethernet. It's already here, but the hardware is still pretty expensive and is mostly being used for backbones. Expect all new servers to ship with 10GbE in about a year. Since 10GbE, unlike InfiniBand, is a drop-in substitute for existing Ethernet, many sites will make the switch transparently and are expected to do so when the price of hardware drops to about 2 or 3 times that of 1GbE. The IEEE also has roadmaps for 40GbE by 2009 and 100GbE by 2010, so Get Ready.
In summary: expect to see InfiniBand making continuing inroads in the cluster/HPC market, while 10GbE and its successors replace 1GbE in more traditional IT scenarios. Now, please note that all the information above is based on a presentation from one man, albeit a leader in the field, so don't sue me if you invest in InfiniBand companies and they all go bust.
I'm going to leave aside the wisdom of executing arbitrary code just by visiting website XYZ ("unwise"), or whether we really want to push everything into the restrictions of the Web paradigm ("no"). I'm just going to look at a few old statements about software that also hold up in Web applications.
Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can. Go ahead and modify this to "expand until it can read your Yahoo/MSN/Gmail contacts and invite everyone" and you have basically every Web 2.0 system.
You can write FORTRAN in any language. Let's make that "you can make any website into Myspace" (see Facebook). 5+ years ago, this was "you can make any website into a crappy portal".
Slowness can be fixed in hardware. This is at least as true with web apps as with traditional programs. Can I view YouTube videos on my 900 MHz laptop? Barely. Playing low-quality videos is not a complicated thing, but thanks to the wonders of Web 2.0 I need hundreds of megabytes of RAM!
That's all for now. There are more things like this, but my head hurts with rage and stuff, and it's late.
Ok, my brilliant idea is to put together some sort of Plan 9-based data server when I get back to Rochester. Why Plan 9? Because I like it, and because if I put my documents on there I will never delete something by accident. Why? The Fossil and Venti filesystems (best used together) create permanent snapshots every day. I figure I'll take a decently large hard drive and stick it in one of my spare x86 computers, then shove that somewhere out of the way. I was doing some testing today... it looks like a remote Plan 9 filesystem, mounted under Linux, can serve data fast enough to stream music handily while still allowing you to use drawterm and other stuff.
I don't think movies are a great idea to store on the server, at least at first, for two reasons. One, I'm just not sure I want to rely on the ability of my server to stream data fast enough all the time to watch a movie. Two, video takes up a lot of space, and since Venti is permanent, it's not really possible to clear out old videos. I think that my old 80GB hard drive would be a good choice for the server; my music only takes up about 16 GB, so I would have enough room for a lot more. Access would be available for the other people in the house (I believe Plan 9 can serve NFS and SMB), and with the scripts I wrote to automatically rip CDs, I should be able to fill things up pretty nicely. If it works well, I'll write about it here. If it doesn't, there will never be another mention of it!
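To make the "never delete by accident" and "can't clear out old videos" points concrete, here is a toy, in-memory sketch of the write-once, content-addressed idea behind Venti: each block is stored under the SHA-1 hash of its contents (Venti calls this the block's score), identical blocks are stored only once, and there is no delete operation. The class and method names are hypothetical, made up for illustration; this is not Venti's actual API or on-disk format.

```python
import hashlib

class WriteOnceStore:
    """Toy content-addressed store: a block's address is the SHA-1 of its data."""

    def __init__(self):
        self._blocks = {}  # score (hex SHA-1) -> block contents

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Writing a block that already exists costs no extra space.
        self._blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        return self._blocks[score]

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self._blocks.values())

    # Deliberately no delete(): once archived, a block is permanent.

store = WriteOnceStore()
day1 = store.write(b"playlist v1")   # file as archived in the first daily snapshot
day2 = store.write(b"playlist v1")   # unchanged file in the next day's snapshot
movie = store.write(b"x" * 1000)     # a "video" we later regret archiving
```

Here day1 and day2 are the same score, so the second snapshot costs nothing extra; the movie's 1000 bytes, though, stay in stored_bytes() forever, because the store offers no way to reclaim them.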
Some code and stuff I've written is at http://plan9.bell-labs.com/sources/contrib/john/, by the way.
I didn't write anything while I was in Austin; it was very busy and I was pretty tired come evening. We wrapped up on Friday and came back to Livermore. That's not what this post is about. This is about the work I got done over the week.
My job was primarily to port some benchmarks to the BlueGene/L; we thought it would be useful to have some performance metrics. First I brought over the FTQ (Fixed Time Quantum) benchmark, which basically tries to do as much work as possible in a certain amount of time, over and over, then gives some numbers you can play with. The results help you judge how much operating system "noise" there is; if your OS has a lot of daemons running, is high-overhead, etc., you'll see a lot of noise. There's a Python script to plot the data nicely--why they chose Python for a task that is essentially all number-crunching, I don't know. Anyway, as we expected, there isn't a lot of noise on Plan 9, especially on the BlueGene/L nodes, which run a pretty minimal version.
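The FTQ idea is easy to sketch: chop time into equal, back-to-back quanta and count how many units of work finish inside each one; whenever the OS steals the CPU, that quantum's count drops. Below is a rough Python illustration, not the real FTQ code (which is C). In Python the interpreter's own overhead will swamp small effects, and both the one-increment "work unit" and the noise_summary measure (each quantum's shortfall relative to the best quantum seen) are simplifications assumed for this sketch.

```python
import time

def ftq(quantum_ns=1_000_000, samples=100):
    """Count work units completed in each of `samples` fixed-length time quanta."""
    counts = []
    deadline = time.perf_counter_ns() + quantum_ns
    for _ in range(samples):
        count = 0
        while time.perf_counter_ns() < deadline:
            count += 1  # one "unit of work"; a real benchmark does a small fixed computation
        counts.append(count)
        deadline += quantum_ns  # quanta are back-to-back, so a long interruption shows up as low counts
    return counts

def noise_summary(counts):
    """Return (best count, mean fraction of work lost per quantum relative to the best)."""
    peak = max(counts)
    if peak == 0:
        return 0, 1.0
    losses = [(peak - c) / peak for c in counts]
    return peak, sum(losses) / len(losses)
```

A quiet OS gives nearly flat counts and a mean loss close to zero; daemons and interrupts show up as dips in individual quanta.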
The next benchmark, lmbench, was a bit harder. lmbench is actually a suite of "micro-benchmarks" that each test a small part of the system, like memory bandwidth, pipe latency, context switch latency, etc. I ported the memory benchmark first and had a lot of trouble converting the underlying timing code to Plan 9. Luckily, I didn't have to mess with the Macros From Hell--a set of 4 huge C macros defined in bench.h to take care of running and timing--quite yet. After a lot of poking, changing printf to print, exit(int) to exits(char *), and some more difficult things, I got the memory benchmark working. With the groundwork done, I was able to quickly bring over several more benchmarks. Then they decided that milliseconds weren't good enough; they'd like to see the number of cycles needed to complete a test as well. This required a largish rewrite of quite a few things and some poking around in the Macros From Hell... not the best experience, because when a macro breaks, the compiler can't point you to a specific line for the error; the whole expanded macro sits on one line. I eventually got that working and finished up. I ported a total of 8 lmbench benchmarks to Plan 9. The lmbench code is available for download from http://plan9.bell-labs.com/sources/contrib/john/9bench.tgz or /n/sources/contrib/john/9bench.tgz if you have Plan 9 installed.
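For readers who haven't seen it, the kind of job those timing macros handle looks roughly like this: repeat the operation enough times that timer resolution doesn't matter, keep the best of several runs, and report time per operation, optionally converted to cycles. This is a loose Python sketch under stated assumptions, not lmbench's actual logic. The function names are hypothetical, and the cycle figure is simply wall-clock time multiplied by an assumed 700 MHz clock (the BlueGene/L's PowerPC 440 frequency); reading a hardware cycle counter directly would be more precise.

```python
import time

# Assumption: 700 MHz core clock; change this for the machine being measured.
CLOCK_HZ = 700_000_000

def ns_to_cycles(ns, clock_hz=CLOCK_HZ):
    """Convert a duration in nanoseconds to an estimated number of clock cycles."""
    return ns * clock_hz / 1e9

def _run(op, iterations):
    start = time.perf_counter_ns()
    for _ in range(iterations):
        op()
    return time.perf_counter_ns() - start

def bench(op, min_time_ns=10_000_000, repetitions=5):
    """Return (ns per call, estimated cycles per call) for op().

    Doubles the iteration count until one run lasts at least min_time_ns,
    then keeps the fastest of several runs to filter out interference.
    """
    iterations = 1
    while _run(op, iterations) < min_time_ns:
        iterations *= 2
    best = min(_run(op, iterations) for _ in range(repetitions))
    per_call_ns = best / iterations
    return per_call_ns, ns_to_cycles(per_call_ns)
```

For example, ns_to_cycles(1000) gives 700.0: at 700 MHz, each nanosecond is 0.7 cycles.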
Arrived in Austin yesterday; I guess I didn't note it on here previously, but I'll be spending the week down here to work on Plan 9 at IBM's Austin Research Lab. Anyway, the flight was nice, the hotel is good, and today has been a fine day working. The area I'm working in seems to be in the style of a lot of Plan 9 work areas--a largish open area with a few tables, a lot of computers, and an espresso machine nearby. I've been working with Eric van Hensbergen, Ron Minnich, Charles Forsyth, and Jim McKie. Today, I ported a Linux benchmarking tool to Plan 9 on the Blue Gene/L and collected some data (we have Plan 9 running on a 32-node partition of the BG/L). The data indicates that Plan 9 creates very little "interference" when a program is running--in other words, low OS overhead. Even when running the benchmark on an I/O node which was serving 16 CPU nodes with files, I noticed relatively little interference as compared to Linux. Good stuff, and it does support the movement to use Plan 9 in supercomputing.
This is short, because I'm writing it from an iPaq PDA running Plan 9. I've imported my home directory and the binary directories, so now I have tons of storage and all the programs of a full Plan 9 install. Sweet.
I picked up an IBM Selectric III electric typewriter the other day (free, of course). It's dark blue, in good condition, has basically a new cartridge, and comes with an extra typeball. There is one small problem, though. After typing for a while, maybe 10 lines, 20 lines max, carriage return will just stop moving the typeball head back to the left margin. It slows down, then gets to where it'll go half way, then finally won't make any leftward move. If I then turn it off and let it "cool down" for a while, carriage return will work again for a while. Very odd. I've taken the platen out and removed the top half of the case, and WOW this is a complex piece of equipment. The only electric part is the motor and two small light bulbs; everything else is mechanical. Once I get my camera fixed (I hate Kodak), I plan to take some pictures and make a page about it. There's not much information about Selectric internals on the Internet, but it's still useful knowledge.
I mentioned it a couple posts ago; I've been working on a kernel profiling/tracing system for Plan 9. The idea of tracing is that you can compile the kernel with profiling enabled, which means that every function calls small assembly routines at entry and at exit. These routines then call other functions (which are not profiled, to avoid infinite loops) which record the entry or exit event. The end result is that you can see how long it takes any given function to complete.
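The kernel version is C and assembly, but the entry/exit idea maps neatly onto Python's sys.setprofile hook, which the interpreter calls on every Python function entry ('call') and exit ('return'); calls made from inside the hook itself are not reported back to it, much like the unprofiled recording functions. The sketch below is only an analogy, not the Plan 9 tracing code: trace_log, hook, durations, and the stand-in validaddr and syscall functions are hypothetical. It records timestamped events, then matches entries to exits with a stack to get each call's duration.

```python
import sys
import time

trace_log = []  # (event, function name, timestamp in ns)

def hook(frame, event, arg):
    # Record only Python-level entry/exit events; C-function events are ignored.
    if event in ("call", "return"):
        trace_log.append((event, frame.f_code.co_name, time.perf_counter_ns()))

def durations(log):
    """Pair each exit with the most recent unmatched entry; return (name, elapsed ns)."""
    stack, result = [], []
    for event, name, ts in log:
        if event == "call":
            stack.append((name, ts))
        elif stack:
            entered, start = stack.pop()
            result.append((entered, ts - start))
    return result

def validaddr(addr):  # hypothetical stand-in for the kernel function
    return addr % 8 == 0

def syscall(addr):    # hypothetical caller
    return validaddr(addr)

sys.setprofile(hook)
syscall(0x1000)
sys.setprofile(None)

for name, ns in durations(trace_log):
    print(f"{name}: {ns} ns")
```

Because inner calls finish first, validaddr's duration is reported before syscall's, and syscall's includes it.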
My boss, Ron Minnich, and a previous intern, Aki, implemented the tracing code and drivers for Plan 9 under the x86 architecture. It has been my task to tweak that and make it work better on x86, add the ability to watch only function calls from a specific process, refine the tracing controls, and, perhaps most importantly, port it to the amd64 processor. I think I've managed to achieve these things pretty well; here's an example of how you might use it to watch all calls to the "validaddr" function, which we'll assume for this example is located in memory between 0xffffffff80115000 and 0xffffffff80115500. % represents the prompt.
% bind -a '#T' /dev
% echo trace 115000 115500 new a > /dev/tracectl #this creates a new traced section
% echo trace a on > /dev/tracectl #enable trace probe a
% echo start > /dev/tracectl; sleep 2; echo stop > /dev/tracectl #turn on tracing for 2 seconds to collect data
% cat /dev/trace #reap the fruits of our labors
E ffffffff80115000 0000005fd2dac075 0000000000000066 ffffffff80256d10 0000000000000000 0000000000000000 0000000000000000
X ffffffff80115500 0000005fd2dac170 0000000000000066 0000000000000001 0000000000000000 0000000000000000 0000000000000000
...
Yes, that all looks like a bunch of line noise to you, but if you know the syntax, it makes sense. The first line of output says that the process with PID 0x66 entered a traced area at address 0xffffffff80115000 (the start of validaddr) at timestamp 0x0000005fd2dac075 with the single argument 0xffffffff80256d10--so we're checking whether 0xffffffff80256d10 is a valid address. Later, at time 0x0000005fd2dac170, it exits the area (function) with the PC set to 0xffffffff80115500; the result is 0x0000000000000001, so the address was valid.
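Decoding those lines by hand gets old fast. Here is a small Python sketch of a parser for them. The field layout (kind, PC, timestamp, PID, then the argument words on entry or the result on exit) is inferred from the explanation above rather than from the driver source, the names are hypothetical, and pairing each exit with the latest open entry of the same PID assumes calls nest cleanly within a process. Timestamps are left in raw units, since their clock source isn't specified here.

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    kind: str        # "E" = entered a traced area, "X" = exited it
    pc: int          # address where the event was recorded
    timestamp: int   # raw timestamp, units depend on the kernel's clock source
    pid: int
    words: list      # arguments on entry; the result comes first on exit

def parse(line):
    """Split one /dev/trace line; every field after the first is hexadecimal."""
    kind, *fields = line.split()
    pc, timestamp, pid, *words = (int(f, 16) for f in fields)
    return TraceRecord(kind, pc, timestamp, pid, words)

def pair_calls(records):
    """Yield (entry, exit, elapsed), matching each exit to its PID's latest open entry."""
    open_entries = {}
    for rec in records:
        if rec.kind == "E":
            open_entries.setdefault(rec.pid, []).append(rec)
        elif rec.kind == "X" and open_entries.get(rec.pid):
            entry = open_entries[rec.pid].pop()
            yield entry, rec, rec.timestamp - entry.timestamp

lines = [
    "E ffffffff80115000 0000005fd2dac075 0000000000000066 ffffffff80256d10 0000000000000000 0000000000000000 0000000000000000",
    "X ffffffff80115500 0000005fd2dac170 0000000000000066 0000000000000001 0000000000000000 0000000000000000 0000000000000000",
]
for entry, exit_rec, elapsed in pair_calls(parse(line) for line in lines):
    print(f"pid {entry.pid:#x}: arg {entry.words[0]:#x} -> result {exit_rec.words[0]}, {elapsed} units")
```

For the two records above, the elapsed difference is 0x170 - 0x075 = 0xfb, i.e. 251 timestamp units.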
The big idea here is that we can use this to analyze where the computer is spending the most time. Even the simplest process can make many many calls to various lock functions, validaddr, memory functions, etc. By finding the biggest time hogs and speeding them up, we can make Plan 9 more efficient for supercomputer use. And I don't know how much I should say about it, but the next step looks to involve kernel-level currying/synthesis of system calls. More on that later.
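Once entries and exits are paired, finding the biggest time hogs is a matter of summing durations per function. A minimal sketch, assuming calls have already been turned into (function name, elapsed time) pairs; hot_spots is a hypothetical helper, and the sample numbers are invented purely to show how many cheap lock calls can outweigh one expensive call.

```python
from collections import defaultdict

def hot_spots(calls, top=3):
    """Rank functions by total time spent; return (name, call count, total time)."""
    totals = defaultdict(lambda: [0, 0])  # name -> [call count, total elapsed]
    for name, elapsed in calls:
        totals[name][0] += 1
        totals[name][1] += elapsed
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(name, count, total) for name, (count, total) in ranked[:top]]

# Invented sample data: (function, elapsed timestamp units) for each traced call.
calls = [("lock", 40), ("validaddr", 251), ("lock", 35), ("memmove", 120),
         ("lock", 38), ("lock", 42), ("lock", 45), ("lock", 50), ("lock", 39)]
for name, count, total in hot_spots(calls):
    print(f"{name:10} {count:4} calls {total:6} total")
```

With this made-up data, lock comes out on top (7 calls, 289 units) even though each individual call is far cheaper than validaddr's 251.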
Wow, been over a month since I've put anything down here. In the time since the last post (which I wrote from Murray Hill), I've returned to Washington, driven down to California, started work, driven back to Washington for Christmas, and now returned to California to continue my work. Sandia is a really great place to work, at least in my section. I work in the Scalable Computing group; everyone is rather casual, but work does get done. I've set up a Plan 9 server and a terminal of my own. Currently, I'm working on getting a new profiled kernel running; previously, I've worked on an ssh v.2 port, fiddled with graphics stuff, and other projects.
Unfortunately, we are in possession of one of the worst email systems ever. First off, it's Microsoft Outlook Web Access, the same crappy thing RIT provides. And yes, it will log you out after 3 or 4 minutes of inactivity. Normally that wouldn't be so bad; I'd just set my browser to remember the password. However, Sandia has decided to go two-factor on all authentication that it possibly can convert. That means that, in order to check my email, I have to pull out my Cryptocard, type in my PIN, and then type in the 8-digit passcode that it comes up with, which looks something like VN@%aVRp. Sad. Besides the crappy email system, we're also stuck behind Sandia's various firewalling and NAT stuff, which means that we can't really run servers as easily as we'd like, and many ports are blocked. Hopefully, though, we'll get the XRN (eXternal Research Network) that we've been pushing for, which will give us essentially a straight connection to the outside world, allowing us to run whatever servers we please.