OCO crashes and burns

Well, about 6 hours ago (as of writing time), a Taurus XL rocket built by Orbital Sciences Corporation of Virginia ignited and launched from Vandenberg Air Force Base. It was carrying the Orbiting Carbon Observatory, a new NASA satellite designed to detect carbon dioxide very accurately (and sensitively). The goal was to better describe the sources and sinks of atmospheric carbon dioxide. However, that mission will now fall more heavily than planned on the similar Japanese satellite called Ibuki, because a few minutes after launch the Taurus XL's payload fairing failed to separate, leaving part of the rocket attached to the 3rd and 4th stages, weight that could not be accommodated. The rocket crashed into the Southern Ocean, currently a large sink for atmospheric carbon dioxide. A few more details at NYTimes.com [LINK]


Code development in the cloud

Well, perhaps this diversion into the world of cloud-computing and "Web 2.0" and emerging technology is more than just a diversion. Maybe it is a full-fledged series. Today, I have just learned about a Mozilla Labs project called Bespin, which fits so neatly into my overall picture of how computational science should be done that I felt compelled to throw it up here to make a record of it.

First of all, Bespin is a Star Wars reference. It's the gas-giant planet where Cloud City is, which you can see in Star Wars: The Empire Strikes Back. Remember? Lando is in charge of Cloud City. Anyway....

The software Bespin is a code editing environment implemented in JavaScript that runs within a modern web browser. The idea is that you open your browser, presumably from any computer you are sitting in front of, and navigate to your code. I'm not sure of the details of where the code resides, whether you have a Bespin-related repository or have to provide your own web-accessible storage (I'll check), but in any case, you have some code somewhere in the cloud. Bespin provides a unified front end to access and edit the code, including a file browser that they are calling a dashboard. Select your file, and it opens within the Bespin environment, and you have an actual text editor, where you can modify or write code and save. They're also working on collaboration tools, wherein you'd be editing a file and your friend could be editing the same file and each of you can see what the other is doing. Combined with Skype or another chat client, this could be a really useful way for people to work on code together without constant e-mails and attachments, or the possibility of getting divergent forks in a small project.

The editor is supposed to be very flexible. What this means in reality is yet to be seen. I'm not sure if you can just click a button to use Emacs key bindings versus Vi, say, or if you are stuck learning a new editor and figuring out how to customize it. This could be the downfall of Bespin. People are so hung up on their editors (yes, I use Emacs and don't want to learn another set of keyboard commands) that they will forgo a new useful tool in favor of their comfortable, customized, "efficient" (really, probably not) system. But we shall see.

How does this fit into my vision of science? Well, first let's establish that in several branches of science there are very large computational projects that are used and modified by a fairly large number of people. Let's also put out there now that, at least in atmospheric science, these projects are climate models or the components thereof, and that the people modifying the code are usually NOT developers, at least in the sense that they don't have backgrounds in computer science, they don't keep up with trends in software development, and they don't necessarily know a lot about proper coding. They are, instead, scientists who want the model to do something differently than some other scientist designed it to do. Some equation is changed, or a new process is introduced, or some grad student just wants to run a sensitivity study by changing some parameter. So, each of these users is going to be working with their own copy of the code, with their own modifications. Maybe a group at some university wants to take an established model and tinker with it collectively, so they'd want their own centralized repository of code that they can all access, but that isn't accessible to the general community working with that model. There are lots of permutations and combinations that need to be handled transparently by something like Bespin. And more importantly, these people (and I'm definitely throwing myself into this group) need to be able to get into Bespin and forget that they are using it. They need to log in, find their code and start working, with minimal investment in modifying and customizing and tweaking and generally fiddling. It needs to "just work."

Now, back to the point. It would be terrific to have the code accessible from anywhere for all users (or a subset of users, or your own code just for you). This could be especially useful for code that is going to be run on powerful remote computers (just downstairs maybe, but maybe in another state, or across the globe) because it would reduce the overhead of just getting to the code. Many of us have a multistep process to log in to these computers, and then we are stuck with the environment that is set up on the computer (which varies from computer to computer even within the same institution). Having a secure log-in to a website, where all our files would be sitting immediately, always in the same format with the same permissions and color scheme and shortcuts/aliases/keyboard macros/etc., would increase productivity overnight. The possibility that one could use a computer in San Diego or Illinois or Virginia, with the only difference being the URL that you navigate to, would blow people's minds. Sure, there are some technical details I'm glossing over, like how you compile and run the code from Bespin (that's not what it does, as far as I know), but just getting things set up in the code from a uniform environment would be life altering.

The impediments to this vision are several-fold. The first is inertia. We've had thirty years of using nearly the same simple environment, and for half that time almost exactly the same methods (from an end-user perspective) for accessing computers, setting up our user accounts, customizing our editors and other software tools, and learning how to move things around and get code to run. Change comes slowly to people set in their ways. Second, I think this needs to be adopted by large institutions, such as the supercomputing centers, and promoted very strongly as the "right way" to edit files on those machines. Similarly, the big science codes, like the Community Climate System Model, should adopt this as a preferred method of modifying code within the development process, and that will spread to the research community. Third, these centers/projects/etc. have to address security concerns in a reasonable way, with users in mind. This might be a separate thread from promoting a more efficient way of editing and sharing code, but there need to be some sensible security procedures for getting into the big computers without a bevy of passwords and a pocket full of devices for accessing different computers.

Note that in the impediments, I never used the word Bespin, because it does not have to be Bespin. The scientific community is in need of new tools that are powerful, flexible, and easy to use. It does not matter what they are, as long as they can be used across platforms and computing environments. Walls need to be broken down, code needs to be cleaned up, and resources need to be better used. Something like Bespin might provide one more piece of the puzzle to getting some of these things done.

For more about Bespin, check out their web site, and/or watch this screencast:

Introducing Bespin from Dion Almaer on Vimeo.


Files flying around the cloud

Related to my previous post about cloud computing and such, I've been trolling the interwebs and trying to learn something. In doing so, I have come across various "Web 2.0" technologies, some of which seem neat, others not so much. No, I haven't made any progress toward finding the massive storage and high-performance, massively-parallel computing solutions that I want for climate science purposes, but I'd like to share two (and a half) software+service technologies that I'm now using daily.

First and foremost, I have wholeheartedly adopted Dropbox (getdropbox.com), which is a simple, elegant, and multifaceted tool for syncing files across multiple computers. The idea is that you install a little software package, which makes a folder called Dropbox in some obvious place (your home directory in OS X). Now whatever file you put into that folder gets silently sent to the Dropbox servers -- out in the cloud. That alone, plus a simple (maybe too simple) web interface on the Dropbox site, provides a very useful form of backup/archiving for these files. But for me the most amazing, wonderful, life-changing thing about Dropbox is that when I go to another computer (with Dropbox similarly installed), those files are there in the Dropbox folder. They aren't just available for download from the Dropbox site; they are local copies of the files, right there in a folder. This provides another backup for the files, as they now exist on the first computer, the Dropbox servers, and the second computer. It also allows me to simply navigate to that folder to continue working on whatever I put there, and when I press "save" the file is then sent back to the Dropbox server, and sent on to the first computer. Amazing. For those who don't move files around between computers, it might seem silly, but it is so, SO much better than e-mailing files to myself, or putting files on some server, or using (s)ftp to move the files around, or carrying a thumb drive around, etc. For those who are intrigued, go check out the screencast. Oh, and it's 2 GB of storage for free, $99 for 50 GB for a year, and I think they will introduce a more flexible pricing system in the future. And yes, you can share a folder between users! Amazing.
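For the curious, here is a toy sketch of the basic idea in Python: walk a folder, compare each file's contents against a mirror copy, and copy over anything new or changed. To be clear, this is my own illustration and not how Dropbox actually works (I have no idea what they do internally); the names `file_digest` and `sync_folder` are made up, the "server" is just a second local folder, and it only syncs in one direction and never deletes anything.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_folder(source: Path, mirror: Path) -> list[str]:
    """Copy new or changed files from source into mirror (one-way).

    Returns the relative paths that were copied. Hypothetical helper:
    a real sync service also handles deletions, conflicts, and networks.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = mirror / rel
        # Skip files whose contents already match the mirror copy.
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Calling `sync_folder(Path("Dropbox"), Path("backup"))` twice in a row copies everything the first time and nothing the second, which is the "silent" part: only files that actually changed get moved.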

Second, and less new and impressive, is simply delicious.com, which we used to know as del.icio.us. It's a "social bookmarking" system. For me this just means that when I want to bookmark something at work, like a paper I want to read later, I do it using a different browser button, and then I can add a description and/or tags to the bookmark. Then I go somewhere else, home or anywhere else with an internet connection, and I can go to my delicious home page and find the link to that paper. The social part is that you can share bookmarks and whatnot. It is not as life-altering as Dropbox, but it is actually quite convenient. You've probably seen little delicious symbols on blogs and news sites (there might be one right below this post). That provides a way to promote a blog or web page using your delicious account, essentially advertising it to other delicious users, and possibly making a list of things you've been reading for your friends. The Firefox plugin for delicious is great, and I hope they bring such a simple interface to Safari.

Similarly, Digg.com is a bookmarking tool. I use this to try to promote news stories that I've read and like. I am not in love with digg, but I have kept using it for months. My main complaint is that when I hit a digg button from a page, like NYTimes.com or LATimes.com, it then makes me log in, looks for that article, suggests it might be a duplicate, asks me to write a summary and pick a category if no one else has dugg it, etc. It can be an ordeal. I want to digg something and not have to interact with the digg machine at all, and that hasn't happened very often.