Sunday, December 23, 2012

2 Weeks With The Kinesis Advantage Keyboard

For the past two weeks I've been working with the ergonomic Kinesis Advantage keyboard. For those of you who don't know what it is, well, the picture pretty much sums it up... it's a keyboard with a rather unorthodox design meant to be highly ergonomic. The basic premise is that it forces one to use the correct hand placement when typing because all of the keys are recessed into the keyboard itself. This way, one "naturally" has all fingers below the wrist and minimizes the risk of developing every programmer's worst enemy, RSI (lack of sleep, lack of caffeine, and bugs in code are not far behind).

First Impressions: RTFM

I must say, this is the first keyboard I've ever used for which I actually read the instruction manual. As I work on a Mac, the first thing I had to do was switch the keyboard to the Mac layout -- not very hard, just a simple key combination that is on page one of the instructions. Hats off to the folks at Kinesis for actually supplying extra keys to swap out the default Windows keys for the ones that correspond to the Mac layout. They are even nice enough to include a little plastic doohickey that makes the removal of the keys a snap. In addition, the keyboard comes with some foam pads that can be attached to the keyboard to make a comfortably padded area for your wrists. After that initial setup (and reading about how to access the various functions), I was ready to go.

Although you are free to make your own judgment, stylistically the keyboard is nowhere near as sleek as the Apple keyboards (or the Microsoft mobile keyboard). It's most definitely an imposing piece of hardware that can make even the most spartan desk look like Times Square on New Year's Eve. I don't fault them for this, since there really isn't another way to have keys recessed in a keyboard without having a big keyboard. (Although they could potentially shave a couple millimeters from various locations.) Also, by no means is this a quiet keyboard. If you had 10 people using these simultaneously, you'd probably think you were working on an assembly line in an early 1900s factory. But I guess that's the price you pay for keeping RSI at bay.

Day 1: Wow, I can't type

The first thing that I noticed, and you probably will too if/when you use a Kinesis keyboard for the first time, was that I couldn't type. The enter key was no longer by my left pinky. The delete key was now accessed using my left thumb. Boy, oh boy, was this going to be an adventure. My words-per-minute dropped down to probably about 10 (it's a good thing I had no code due immediately). I was hunting and pecking as if I were back in middle school trying to make that little turtle draw a doughnut in Logo (those were the days!).

Day 2: Things are looking up

There was definite improvement in my typing skills on the second day. I could now type most things without having to double-check them on the keyboard. That being said, auto-correct and auto-complete were still my best friends. One thing that you'll also become accustomed to is random passersby gawking at your keyboard. You'll undoubtedly come up with some sort of witty remark as to why you are using this thing, like, "It's how I ensure that no one asks to use my computer."

Two Weeks In: I can type again!

While the first couple of days were certainly rough, I am back to my original typing speed for most things. The placement of the arrow and bracket keys, however, is not ideal given how much I use them. Due to the layout of the keys, the largest drawback to using this keyboard is that you are forced to have two hands on the keyboard at all times - it's pretty difficult to, for example, use your mouse with one hand and the keyboard with the other. Of course, having the keys separated and recessed into the keyboard is the whole point of this keyboard.

Wrap Up

Am I keeping the keyboard? Yes. By and large, it is the most comfortable keyboard that I have worked with. Although the first couple of days were rough, it was worth fighting through the pain and getting accustomed to it. I am currently at the point where I sometimes have trouble using a regular keyboard due to the differences in layout. So, go ahead, treat yourself (and your wrists) to the "La-Z-Boy" of keyboards and enjoy the wondrous stares you'll receive at the office when you show up with your Kinesis keyboard. For those of you who don't want to restrict the fun of typing to your hands (or want to feel like a drummer in a rock band), the keyboard even comes with an optional foot pedal that can be mapped to commonly used keys!

Wednesday, December 19, 2012

Celebrating with Lasers

To celebrate 2.5 weeks at our new startup, we visited the machine shop in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT and used the laser cutter to create a first pass at a sign for our company. Seeing that we are working in co-working space at the Harvard Business School Innovation Lab (i-Lab), being able to hang a sign above our desks is, well, a good thing.

With the lasers changed from stun to cut, we placed some acrylic in the machine, and hit the green "Go" button. Since our settings weren't right the first time around, we had to do a couple of passes for it to cut all the way through the plastic, but it didn't take long until we could take all the bits and pieces and glue them together. Of course, talk is cheap, so, how about a quick movie? Yes, you say? Ok, here it is.

As we build this company up from three people to, hopefully, plenty more, I'm going to try my best to keep this creative, fun-loving culture alive so that work isn't just work, but is also interspersed with fun (and lasers!).

Saturday, December 15, 2012

(JSON) Serializing (SQLAlchemy) Objects

A common task when building a web application (or REST API) is to take some data from a database and then ship it over the wire in some serialized format. Although the concepts of this post apply to pretty much any sort of serialization task, I am going to be using Python and SQLAlchemy to illustrate my current preferred solution.

The first thing that I tried for this was to add a to_json() method to all of my SQLAlchemy models. So, as an example, a User model may look something like
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    first_name = Column(String, nullable=False)
    last_name = Column(String, nullable=False)

    def to_json(self):
        return dict(id=self.id,
                    first_name=self.first_name,
                    last_name=self.last_name)
And for simple situations this works perfectly. However, let's add a little twist. Now let's assume that our IDs are not auto-increment integers but some binary value (e.g. a UUID of sorts), and that we also have a field that contains the user's date of birth (dob). The problem we face now is that we can't just return the binary value for the ID and the DateTime object for the date of birth, because Python's JSONEncoder doesn't know what to do with those. So, now, we have a class that looks something like this:
import uuid

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.types import BINARY
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(BINARY(16), primary_key=True, autoincrement=False)
    first_name = Column(String, nullable=False)
    last_name = Column(String, nullable=False)
    dob = Column(DateTime)

    def to_json(self):
        return dict(id=uuid.UUID(bytes=self.id).hex,
                    first_name=self.first_name,
                    last_name=self.last_name,
                    dob=self.dob.isoformat() if self.dob else None)
Will this work? Assuming there are no bugs in my code, then yes, this will most definitely work. However, as I see it, there are two main flaws with this solution:
  1. In most decent-sized projects you will end up having quite a few models, and many of them will have quite a few more attributes (database table columns) than the aforementioned User with only 4 fields. Thus, you end up having to write this to_json() method, with all of its attributes, over and over again, wasting time and increasing the chance of a bug.
  2. If you want to change the format of any of the values (say you moved from UUID4s to UUID1s for the IDs of all of your models), you have to go through every to_json() method and make the appropriate changes. Again, a waste of time and highly error prone.
As such, here is the solution that I have come up with that is working well so far. It is based on these two threads (1, 2) on StackOverflow, but instead of using mixins, I create a separate Serializer class that takes the object to serialize as a parameter. You'll see why I do this shortly.

First, let's define the serializer. Don't be scared, this serializer only has to be created once (and you can just copy and paste it), and after that serializing any object becomes a piece of cake! Trust me!
import uuid

import dateutil.parser

class JsonSerializer(object):
    """A serializer that provides methods to serialize and deserialize JSON.

    Note, one of the assumptions this serializer makes is that all objects that
    it is used to deserialize have a constructor that can take all of the
    attribute arguments. I.e. If you have an object with 3 attributes, the
    constructor needs to take those three attributes as keyword arguments.
    """

    __attributes__ = None
    """The attributes to be serialized by the serializer.
    The implementor needs to provide these."""

    __required__ = None
    """The attributes that are required when deserializing.
    The implementor needs to provide these."""

    __attribute_serializer__ = None
    """The serializer to use for a specified attribute. If an attribute is not
    included here, no special serializer will be used.
    The implementor needs to provide these."""

    __object_class__ = None
    """The class that the deserializer should generate.
    The implementor needs to provide these."""

    serializers = dict(
        id=dict(
            serialize=lambda x: uuid.UUID(bytes=x).hex,
            deserialize=lambda x: uuid.UUID(hex=x).bytes,
        ),
        date=dict(
            serialize=lambda x: x.isoformat(),
            deserialize=lambda x: dateutil.parser.parse(x),
        ),
    )

    def deserialize(self, json, **kwargs):
        """Deserialize a JSON dictionary and return a populated object.

        This takes the JSON data, and deserializes it appropriately and then
        calls the constructor of the object to be created with all of the
        attributes.

        Args:
            json: The JSON dict with all of the data
            **kwargs: Optional values that can be used as defaults if they are
                not present in the JSON data

        Returns:
            The deserialized object.

        Raises:
            ValueError: If any of the required attributes are not present
        """
        d = dict()
        for attr in self.__attributes__:
            if attr in json:
                val = json[attr]
            elif attr in self.__required__:
                try:
                    val = kwargs[attr]
                except KeyError:
                    raise ValueError("{} must be set".format(attr))
            else:
                continue

            serializer = self.__attribute_serializer__.get(attr)
            if serializer:
                d[attr] = self.serializers[serializer]['deserialize'](val)
            else:
                d[attr] = val

        return self.__object_class__(**d)

    def serialize(self, obj):
        """Serialize an object to a dictionary.

        Take all of the attributes defined in self.__attributes__ and create
        a dictionary containing those values.

        Args:
            obj: The object to serialize

        Returns:
            A dictionary containing all of the serialized data from the object.
        """
        d = dict()
        for attr in self.__attributes__:
            val = getattr(obj, attr)
            if val is None:
                continue
            serializer = self.__attribute_serializer__.get(attr)
            if serializer:
                d[attr] = self.serializers[serializer]['serialize'](val)
            else:
                d[attr] = val

        return d
Now, assuming there are no bugs in the code above from when I adapted it from our production code, you can create a serializer for your User object by simply doing something like this:
class UserJsonSerializer(JsonSerializer):
    __attributes__ = ['id', 'first_name', 'last_name', 'dob']
    __required__ = ['id', 'first_name', 'last_name']
    __attribute_serializer__ = dict(id='id', dob='date')
    __object_class__ = User
The best part is, for any new object that you create, all you have to do is create one of these serializers and you are good to go. No more writing of to_json() in each model. And to get it to do some serialization, just do:
my_json = UserJsonSerializer().serialize(user)
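To make the pattern concrete, here is a condensed, self-contained sketch of the same idea (simplified, Python 3, plain classes instead of SQLAlchemy models, and stdlib-only so dateutil isn't required -- names here are illustrative, not the production code):

```python
import uuid
from datetime import datetime

class User(object):
    def __init__(self, id=None, first_name=None, last_name=None, dob=None):
        self.id = id
        self.first_name = first_name
        self.last_name = last_name
        self.dob = dob

class JsonSerializer(object):
    # Condensed version of the serializer described above: each entry maps a
    # serializer name to a pair of serialize/deserialize callables.
    serializers = {
        'id': {'serialize': lambda x: uuid.UUID(bytes=x).hex,
               'deserialize': lambda x: uuid.UUID(hex=x).bytes},
        'date': {'serialize': lambda x: x.isoformat(),
                 'deserialize': lambda x: datetime.fromisoformat(x)},
    }

    def serialize(self, obj):
        d = {}
        for attr in self.__attributes__:
            val = getattr(obj, attr)
            if val is None:
                continue
            s = self.__attribute_serializer__.get(attr)
            d[attr] = self.serializers[s]['serialize'](val) if s else val
        return d

    def deserialize(self, data):
        d = {}
        for attr in self.__attributes__:
            if attr not in data:
                continue
            s = self.__attribute_serializer__.get(attr)
            d[attr] = self.serializers[s]['deserialize'](data[attr]) if s else data[attr]
        return self.__object_class__(**d)

class UserJsonSerializer(JsonSerializer):
    __attributes__ = ['id', 'first_name', 'last_name', 'dob']
    __attribute_serializer__ = {'id': 'id', 'dob': 'date'}
    __object_class__ = User

# Round-trip a user through the serializer.
user = User(id=uuid.uuid4().bytes, first_name='Ada', last_name='Lovelace',
            dob=datetime(1815, 12, 10))
data = UserJsonSerializer().serialize(user)
round_tripped = UserJsonSerializer().deserialize(data)
```

The serialized dict contains only JSON-friendly values (a hex string for the ID, an ISO 8601 string for the date), and deserializing reconstructs an equivalent User.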
As it currently stands, this can be used as a mixin, and we could add JsonSerializer as one of the parent classes of our User model. The trouble with going that route is that you can't pass arguments to the serializer class. For example, in our system we store all dates as UTC-formatted dates, but need to convert them to the local timezone of the current user. As the serializer currently stands, there is no way to pass it a timezone parameter. To get around this, our JsonSerializer has a constructor that takes a timezone parameter that is then used in the serialization of dates. So, for example:
class JsonSerializer(object):
    ... all code that was here before ...

    def __init__(self, timezone):
        self.tz = timezone
Make sense? As an added benefit, we can also add more serializers to our default list of serializers in the constructor. For example, let's say our User object references a list of Email objects and we want to serialize those as well. First we'd create an EmailJsonSerializer just like we did for the User, but then add this email serializer to the serializers in UserJsonSerializer. Ok, that was a bit convoluted, so here is what I mean:
class EmailJsonSerializer(JsonSerializer):
    __attributes__ = ['user_id', 'email']
    __required__ = ['user_id', 'email']
    __attribute_serializer__ = dict(user_id='id')
    __object_class__ = Email

class UserJsonSerializer(JsonSerializer):
    __attributes__ = ['id', 'first_name', 'last_name', 'dob', 'emails']
    __required__ = ['id', 'first_name', 'last_name']
    __attribute_serializer__ = dict(id='id', dob='date', emails='emails')
    __object_class__ = User

    def __init__(self, timezone):
        super(UserJsonSerializer, self).__init__(timezone)
        self.serializers['emails'] = dict(
            serialize=lambda x:
                [EmailJsonSerializer(timezone).serialize(xx) for xx in x],
            deserialize=lambda x:
                [EmailJsonSerializer(timezone).deserialize(xx) for xx in x],
        )
Now when we call the serializer, it will not only serialize the contents of the User object, but also the contents of any and all Email objects associated with it (assuming you set up the one-to-many relationship properly in your SQLAlchemy models).

Again, while I used SQLAlchemy models to illustrate this pattern, it can work for pretty much any object going to and from any type of serialized format. Happy coding!

Friday, December 7, 2012

To Cubicle, Or Not To Cubicle

Many companies have either already adopted (Google, Facebook, Dropbox, Path, Locu, etc.) or are experimenting with (e.g. athenaHealth) the open office layout for their development teams. As we start thinking about the office environment at our startup, we are weighing the options of an open layout compared with a "traditional" Office Space-esque cubicle layout. We're currently only a small team, so it's less of an issue, but when we start to grow, this can become a serious topic of conversation.

As with most things, there are pros and cons to both options. Joel Spolsky points out that developers need to have a quiet location where they can bang out their code. Nothing is worse than when you are in the middle of writing some code (or trying to find that deeply buried bug) and someone comes over to you and interrupts your train of thought. Once that train leaves the station, it takes several minutes (at least it does for me) before I'm back in the groove and making the same mental connections I was making earlier. If those interruptions happen multiple times a day, that's quite a few minutes that are wasted just trying to get back on the proverbial horse. Not to mention, it may lead to the introduction of bugs or other errors caused by not being in that "coding nirvana" state of mind.

On the flip side, cubicles, and other such designated quiet rooms, isolate people. As many have said, proximity and interaction yield innovation (take a look at the first couple of chapters about the "adjacent possible" and "liquid networks" in Where Good Ideas Come From for a decent overview). Innovation is exactly what every (software) company strives for. Without innovation, products become dated, poor design decisions are made, and ultimately the product (or even the company) may fail. Even more business-oriented books, such as businessThink, make a point of saying that companies need to "create curiosity." Employees, no matter their level, need to ask questions about why things are being done the way they are, and be able to propose alternate (possibly better) solutions -- something that is markedly harder in the traditional office rife with grey dividers and neon overhead lights.

I've had the "pleasure" of having worked in both environments. During my graduate career I was predominantly in a 6 person "office" that was setup like an open layout -- there were 6 desks, 2 large windows, and everyone could see everyone (and the sun outside!). At my postdoc it was the exact opposite. I was in a 2 person office with harsh neon lighting, while many of the other employees were working in cubicles in the middle of the building with hardly any access to any natural light (take a look at this regarding the effect of natural light on building occupants). If it weren't for the weekly group meetings (or trips to the coffee machine), I doubt most people would have interacted with each other. If it takes effort to do something, most people won't be bothered to exert the effort. Oh, and worst of all, there were hardly any whiteboards that we could use to sketch out ideas.

So, what is better, a quiet space for each developer where there are little distractions or a more free-form layout where interaction is encouraged? While some may beg to differ, I prefer (and our startup will adopt) the open layout. It allows for easy reconfiguration of desks for flexible team arrangement, encourages collaboration, creates an open feel that makes even a small office look and feel much larger, and reduces office politics by eliminating things like, "Who gets the cube by the window?" Yes, there need to be rooms designated for meetings and for occasional quiet work for those that are in the "zone", but those are the exception, not the rule. Importantly, we also want to foster a collaborative community where everyone feels comfortable asking anyone a question about anything -- there is no such thing as a stupid question.

That being said, I urge anyone thinking of switching to the open layout to make the best investment they can for each of their devs: a great pair of headphones. A $100 pair of headphones (I personally use, and love, my Audio-Technica M50s) is not only cheaper than a single set of cubicle walls, but also creates that "quiet" space for each employee without isolating them from their colleagues. As an added benefit, it's cool to tell potential employees that when they start working at your company they will get a free pair of headphones -- all perks are good perks.

So, be like Peter Gibbons in Office Space and knock down those dividers and start innovating!

Friday, November 16, 2012

Context Managers Are Your Friend

Anytime there is built-in support for functionality that reduces the potential for bugs (and the number of lines of code) while simultaneously improving code legibility, that functionality should be used with reckless abandon. Similar to function decorators, which can be used to elegantly wrap functions with additional functionality, context managers allow often-repeated blocks of code to be written elegantly.

Let's take a common database transaction use-case as an example. The (pseudo) code usually ends up looking something like this:

try:
    ... do some stuff ...
    db.session.add(some_object)
except:
    db.session.rollback()
    raise
else:
    db.session.commit()

As you can see, there is nothing special about the (pseudo) code above; it is simply some code to insert some objects into a database. There isn't really anything wrong with the above code, but using context managers we can do this in a much more elegant way:
from contextlib import contextmanager

@contextmanager
def transaction(db):
    try:
        yield
    except:
        db.session.rollback()
        raise
    else:
        db.session.commit()

with transaction(db):
    ... do some stuff ...

with transaction(db):
    ... do some other stuff ...
Ok, ok, that's about the same number of lines of code as the original, but now, every time you want to commit something to the database, you never need to write that try-except-else block to make sure that things are either committed or rolled back - you merely need to use the with transaction(db) statement and you're good to go! And since you only wrote the commit logic once, if there is a bug, it only needs to be fixed in one location. Similarly, if after you've written a bunch more code you decide that you want to log every time there is an exception, you simply have to add the logging code in the except block in your transaction function.
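Since the snippets above are pseudocode, here is a fully runnable version of the same idea using the standard library's sqlite3 in place of SQLAlchemy (the table and connection are just for the demo):

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Commit on success, roll back on any exception."""
    try:
        yield
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')

# This block succeeds, so it gets committed.
with transaction(conn):
    conn.execute("INSERT INTO users VALUES ('alice')")

# This block raises, so the insert gets rolled back.
try:
    with transaction(conn):
        conn.execute("INSERT INTO users VALUES ('bob')")
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

names = [row[0] for row in conn.execute('SELECT name FROM users')]
print(names)  # only 'alice' survives; 'bob' was rolled back
```

The commit/rollback boilerplate lives in exactly one place, which is the whole point.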

Now that you've gotten a taste of the power of context managers, you will probably want to know more about how they actually work. As there are great explanations here and here, I'm not going to cover that in this post. You may also check out the Python docs on @contextmanager.

Sunday, November 11, 2012

Create A Netflix-like Page for Your Movies

A while back I undertook the project of converting all my DVDs to files that I could store on a computer. Although having all movies in a single directory is very convenient for 'movie night', it's not great for browsing. While there is admittedly no way to replicate the feeling of actually browsing DVDs sitting on a shelf, I decided to create a simple webpage with all of the movie cover images, along the lines of what Netflix and Amazon have. Of course, the result isn't identical (nor did I feel like spending the time to make it absolutely perfect), but here is what it basically looks like (and yes, our movie collection is eclectic).

In case anyone wants to do the same thing with their movies, you can check out the script that I put together that generates this page. One big caveat: it assumes that your movie files are named 'nicely' and are of the form MOVIE_NAME (YEAR).avi (e.g. Battle Los Angeles (2011).avi). The year is optional, however. Once you have the script, you can run it as follows:
python [movie_dir] [output_dir]
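In case you only care about the naming convention, the filename parsing the script depends on can be sketched with a regular expression along these lines (this is my own reconstruction for illustration, not necessarily the script's actual code):

```python
import re

# Matches 'Movie Name (2011).avi' as well as 'Movie Name.avi';
# the year group is optional, mirroring the naming convention above.
PATTERN = re.compile(r'^(?P<name>.+?)(?:\s+\((?P<year>\d{4})\))?\.(?:avi|mkv|mp4)$')

def parse_movie_filename(filename):
    m = PATTERN.match(filename)
    if m is None:
        return None
    return m.group('name'), m.group('year')

print(parse_movie_filename('Battle Los Angeles (2011).avi'))  # ('Battle Los Angeles', '2011')
print(parse_movie_filename('Drive.avi'))                      # ('Drive', None)
```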
In case you don't feel like going over to github to take a look at the code, here it is in full. Before you complain, this isn't exactly the best code that I have ever written (and it was written while watching Drive on a Saturday night), but it should be good enough to get the job done.

Sunday, October 28, 2012

Python Function Decorators

One of the (many) neat things about Python is the fact that functions are first class objects (like in JavaScript). As the first class nature of functions has been covered many times elsewhere (take a look at this and this), I shall not talk about what that means. However, one of the things one can do with first class functions is pass them as an argument to another function. The ability to do this is what enables the use of function decorators. Again, if you really want to understand how decorators work, why they work, etc., then I suggest you read this excellent explanation. If however, you just want to see an example so that you can use it in your own code, keep reading.

There are two types of decorators, those without arguments (e.g. @mydecorator) and those with arguments (e.g. @mydecorator('foo', 'bar')). For the case with no arguments, the code for your decorator will look as follows:
import functools

def mydecorator(f):
    """A simple decorator."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        print 'Before %s in wrapper' % f.__name__
        result = f(*args, **kwargs)
        print 'After %s in wrapper' % f.__name__
        return result
    return wrapper

@mydecorator
def myfunc():
    print 'Inside myfunc'

In case you are curious, we don't technically need to use the @functools.wraps decorator in our decorator, but it performs some very useful things for us. Take a look at this post if you want to know why we 'need' it.
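For the curious, here is a quick self-contained demonstration (Python 3 syntax) of what @functools.wraps actually buys you:

```python
import functools

def plain(f):
    # No functools.wraps: the wrapper's own metadata leaks through.
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

def wrapped(f):
    # With functools.wraps: the wrapped function's metadata is preserved.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

@plain
def foo():
    """Docstring of foo."""

@wrapped
def bar():
    """Docstring of bar."""

print(foo.__name__)  # wrapper -- the original name is lost
print(bar.__name__)  # bar -- functools.wraps preserved it
```

This matters for debugging, introspection, and anything that reads __name__ or __doc__ (help(), documentation generators, etc.).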

If you want to pass arguments to your decorator, then your code will need to be something like this:
import functools

def mydecorator(arg1, arg2):
    """A simple decorator with arguments."""
    def wrap(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            print 'Decorator arguments: %s, %s' % (arg1, arg2)
            print 'Before %s in wrapper' % f.__name__
            result = f(*args, **kwargs)
            print 'After %s in wrapper' % f.__name__
            return result
        return wrapper
    return wrap

@mydecorator('foo', 'bar')
def myfunc():
    print 'Inside myfunc'

It is very important to note that the arguments passed to the decorator are only evaluated once, at the moment the decorated function is defined. This essentially means that you cannot change the parameters to your decorator at runtime: if arg1 was 'foo', then it will be 'foo' until your program terminates. If you can look at the code above and understand why that is at first glance, then that is admirable. If you are like the rest of us, check out the great explanation of decorators here.
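To see that "evaluated once" behavior for yourself, here is a small self-contained experiment (Python 3 syntax; the tracking list is my own addition for illustration):

```python
calls = []

def mydecorator(arg):
    # This body runs at decoration time, not at call time.
    calls.append(arg)
    def wrap(f):
        def wrapper(*args, **kwargs):
            return (arg, f(*args, **kwargs))
        return wrapper
    return wrap

@mydecorator('foo')
def myfunc():
    return 42

# The decorator argument was evaluated exactly once, when myfunc was defined:
print(calls)     # ['foo']
myfunc()
myfunc()
print(calls)     # still ['foo'] -- calling myfunc does not re-evaluate it
print(myfunc())  # ('foo', 42)
```

The outer mydecorator('foo') call happens while the module is being loaded; every subsequent myfunc() call only runs wrapper, which has already closed over arg.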

Thursday, October 25, 2012

Python Project Directory Structure

The "proper" way to setup a Python project is (unfortunately) subject to some debate (just check out these Stackoverflow threads here, here, and here or search for "python project directory structure"). One of the biggest points of contention is where to put the unit tests -- should they go under the project source tree or should the be separate?

Although arguments can be made for either organizational structure, I feel that following the lead of the Django project and separating the tests from the rest of the source code is much cleaner. Doing so clearly differentiates between the actual code of the project and the testing of that code (which can be thought of as a consumer of the project code). Also, make sure not to call the directory "test", as there is an internal Python module named test.
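For concreteness, the kind of layout I have in mind looks roughly like this (the file names are just placeholders):

```text
myproject/
    myproject/
        __init__.py
        models.py
        views.py
    tests/
        __init__.py
        test_models.py
        test_views.py
    setup.py
    README
```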


I'm sure that there are many who will say that I'm wrong, but at least I can fall back on the fact that Django does it this way. If you don't believe me, just check out their source code.

Monday, October 22, 2012

Splitting Python Packages Into Multiple Projects

I recently had to split up a Python package into multiple separate projects while still keeping the topmost namespace the same. I.e. I had to create "namespace packages". So, a project that was initially set up like this

myproject/
    mypackage/
        __init__.py
        module1.py
        module2.py
        module3/
            __init__.py
            baz.py

had to be split up into something like

project1/
    mypackage/
        __init__.py
        module1.py
        module2.py

project2/
    mypackage/
        __init__.py
        module3/
            __init__.py
            baz.py

Although I was able to do this rather simply following the suggestions found here, I put together some example files so that you can quickly see how it works. Basically, the one thing that needs to be done is that project2/mypackage/__init__.py needs to contain
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
Once this is done you can now distribute each of the projects separately while still making use of all modules and classes. For example, if we are in the directory that contains project1 and project2, we can run something like the following:
# Add the two projects to the path
import sys
sys.path.append('project1')
sys.path.append('project2')

# Import the modules from the two packages
import mypackage.module1
import mypackage.module2
import mypackage.module3

if __name__ == '__main__':
    # Create some objects from the first package
    p1_foo = mypackage.module1.Foo()
    p1_bar = mypackage.module1.Bar()
    p1_SomeClass = mypackage.module2.SomeClass()

    # Create some objects from the second package
    p2_bar = mypackage.module3.Bar()
    p2_baz = mypackage.module3.baz.Baz()

    # Show that they all work
    print p1_foo, p1_bar, p1_SomeClass
    print p2_bar, p2_baz
You can download all of the above code from github.
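If you want to convince yourself that extend_path really does stitch the two projects together, here is a self-contained sketch (Python 3) that builds a miniature version of the split layout in a temporary directory and imports across it; all file, module, and function names here are placeholders:

```python
import os
import sys
import tempfile

# Build two projects, each contributing modules to the same 'mypackage' namespace.
root = tempfile.mkdtemp()
init = 'from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n'

for project, module, body in [
    ('project1', 'module1.py', 'def foo():\n    return "project1"\n'),
    ('project2', 'module3.py', 'def baz():\n    return "project2"\n'),
]:
    pkg_dir = os.path.join(root, project, 'mypackage')
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
        f.write(init)  # the crucial extend_path boilerplate
    with open(os.path.join(pkg_dir, module), 'w') as f:
        f.write(body)

sys.path.insert(0, os.path.join(root, 'project1'))
sys.path.insert(0, os.path.join(root, 'project2'))

# Both modules import under the single 'mypackage' namespace.
import mypackage.module1
import mypackage.module3

print(mypackage.module1.foo())  # project1
print(mypackage.module3.baz())  # project2
```

Whichever copy of mypackage/__init__.py is found first runs extend_path, which appends every other mypackage directory on sys.path to the package's __path__, so modules from both projects resolve.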

Wednesday, September 26, 2012

Yours Truly in CS50x Videos

While I can't say that I had anything to do with the filming and editing of these videos, I did put together the content for these two videos on Algorithms and Binary Search. Hopefully they're useful!

If you are interested in learning more about computer science, I urge you to check out all of the other videos that were put together for Harvard University's edX course, CS50x. You should also check out edX in general -- there's a lot of good stuff!

Wednesday, August 1, 2012

Door + Computers = Table

I ran into a problem the other day: we needed a comfortable place to sit, talk, and do research at the office that wasn't in a conference room. Where do people like to have conversations? Around coffee tables, of course! Yes, we could buy a table, or go to a coffee shop, but why not just use what we already have around us that we're not using?

Fortunately for me, I happened to have access to two old Mac Pro machines. They are probably about 5 or 6 years old and were just sitting in a corner collecting dust (actually, since they are so heavy, one of them was being used as an anchor to secure a laptop with a security cable). Also, being friendly with the higher-ups in charge of infrastructure of the Stata Center at MIT, I was able to get a free piece of glass. As you can see from the picture below, these pieces are normally used in the doors.

So, yes, we have a (geeky) coffee table that is both great looking (in my humble opinion) and functional (you can sit by it, and because the top is glass, we can use dry-erase markers to take notes on the table while talking). Of course, the amusing thing about this is that each of these computers was about $4,000 new... meaning we essentially have one of the most expensive coffee tables around... Oh, how quickly technology becomes obsolete!

Tuesday, July 17, 2012

Drive Across the US in Less Than 20 Minutes

During a cross-country drive from Massachusetts to California, I mounted a camera on the car's dashboard and made it take a photo about every 10 seconds. That's right, that little shooter took a total of 16,497 photos over the course of our 3,498-mile trip. Fortunately for you, those 16,497 photos have been distilled down to a hopefully manageable video for your viewing pleasure. Without further fanfare, here it is.

Oh, and in case you are wondering, I used an "old" Canon SD600 with hacked firmware to take these photos.

Sunday, July 1, 2012

OS X Fuse and SSHFS on a Mac

I do quite a bit of work where my files reside on a remote machine. Of course, I could log in to the remote machine and work on the files there, copy them to my local machine and then copy them back, use an editor that can read files from a remote machine, or many other methods. However, how about the cooler option of mounting the remote filesystem locally and then working on the files as if they were on my machine? Oh, and yes, since we want to be secure, we'll do all that over SSH.

To get this to work you'll need to install OS X Fuse. In case you care, OS X Fuse is an implementation of FUSE, which allows the creation of a filesystem in userspace. OS X Fuse only installs the necessary library for the creation of that userspace filesystem. To be able to mount a remote directory on your local machine over SSH, you'll also need to install the SSHFS libraries, which are conveniently located on the OS X Fuse page.

Once everything is installed, you can mount a remote file system using your terminal as follows:
local$ sshfs user@host:/path/to/dir  /local/path/to/mountpoint -ovolname=NameOfMountedVolume
This will take the directory found at "/path/to/dir" on the remote machine and mount it locally at "/local/path/to/mountpoint" and call the directory "NameOfMountedVolume". Now you can navigate to that directory using the terminal or Finder as if it were a local directory. To unmount it, simply run
local$ umount /local/path/to/mountpoint
Pretty cool, eh?

Sunday, May 20, 2012

You Git. No, I'm Not Insulting You!

Coming from having used SVN for many years, I had to do some reading to figure out what this Git thing was all about. Yes, there are many other "Git How-To's" out there, but I thought I'd share the notes that I took. Be forewarned, this only covers the basics and doesn't touch upon the important topics of branching and merging.

The Basics

For details on what Git is and all of the nitty gritty details, I would recommend you check out this page, along with the complete git reference. However, the first thing that you will likely need to do is install it. More complete instructions can be found here, but here are links to the OS X and Windows installers.
After you’ve installed git, you'll probably want to check that your details are correct, so that your name and email show up properly when you work with a team. To see what your name and email are currently set to, do (most likely these are not set, so they will return nothing):
local$ git config --get user.name
local$ git config --get user.email
If you want/need to make changes, do:
local$ git config --global user.name "Your Name"
local$ git config --global user.email you@somedomain.com
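The user.name and user.email keys are easy to mistype, so here is a self-contained sanity check you can run in a throwaway directory. The --local flag and the example name/email are my additions, so none of this touches your real --global settings:

```shell
# Create a scratch repository so we can set config without
# touching the global ~/.gitconfig.
tmp=$(mktemp -d)
cd "$tmp"
git init -q scratch
cd scratch

# Set name and email for this repository only (--local).
git config --local user.name "Jane Doe"
git config --local user.email "jane@example.com"

# Read the values back; each command prints the value we just set.
git config --get user.name
git config --get user.email
```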

Create a New Local Git Repository

Let’s say you just want to put the files on your machine in git to do some version control and don’t need/want to use an external server. For that, simply go to the directory that you want to put under version control and do:
local$ mkdir MyFirstGit
local$ cd MyFirstGit
local$ git init
When you take a look at what's in your MyFirstGit directory, you'll see a new .git directory. This is where git stores all of its metadata. (Unlike svn, which puts a .svn directory in every subdirectory, a git repository has a single .git directory at its top level.) So, fundamentally, a git repository is composed of two things:
  1. The data, i.e. the “working tree” of directories/files.
  2. A .git directory that contains all of the metadata
If you already have files in that directory that should be included in the repository, you'll want to add them:
local$ git add .
Assuming you had some files there to begin with, you’ll want to make an initial commit to save your starting point.
local$ git commit -m "Initial import"
Note, if you don't want to do "git add [filename]" for every file you've modified, you can combine the git add and git commit steps by adding the -a flag. Be careful, though: -a only stages changes to files git already tracks, so brand-new files still need an explicit git add.
local$ git commit -am "My message"
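To make that -a caveat concrete, here is a quick sketch you can run in a throwaway repository (the file names, identity, and messages are just examples):

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.name "Jane Doe"       # so commits work in a fresh repo
git config user.email "jane@example.com"

echo one > tracked.txt
git add tracked.txt
git commit -qm "Initial import"

echo two >> tracked.txt               # modify a tracked file
echo new > untracked.txt              # create a brand-new file
git commit -aqm "Commit with -a"

# The modified file was committed, but the new file is still untracked.
git status --porcelain                # prints: ?? untracked.txt
```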
Now, in many cases you may want to use a separate git server to keep track of all your projects. Note, since git is a distributed version control system, you technically don't need a git server; you can push/pull all of the data to/from your collaborators' computers directly. However, it is often easier to have a central place for the repositories so that the URL stays the same. If you want to set up a git repository server, read on.

Create a New Remote Git Repository

To set up a new git repository on a server (i.e. it's a new project you're starting and you don't already have a git repository on your local machine), you'll want to do the following (note, if you have an existing git repository you want to put on a server, see the next section):
local$ mkdir myProject.git
local$ cd myProject.git
local$ git init --bare
local$ scp -r myProject.git [user]@[remote]:~/path/to/repositories
As you can see, I am assuming that you have a server somewhere with SSH and/or SCP installed. The init --bare tells git to create a "bare" repository, one that has no working tree (no checked-out files); its top level holds only what would normally live inside the .git directory (for more info on why this matters, take a look at this page).
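You can see what "bare" means in practice without a server at all; a bare repository's top level looks like the inside of a .git directory. A quick look (the directory name is just an example):

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare myProject.git

# A bare repo holds the repository internals at its top level:
# HEAD, config, objects/, refs/, etc. -- and no working tree.
ls myProject.git
```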

Clone the Repository

Now that the repository is on the server, you can clone (the equivalent of "checkout" in svn speak) the repository.
local$ git clone ssh://[user]@[remote]/full/path/to/repositories/myProject.git

Create a Remote Git Repository For an Existing Local Repository

If you already have an existing git repository that you have been working with on your computer, you can do something very similar, except that instead of creating a new repository, you clone your existing repository (albeit making it a “bare” one):
local$ git clone --bare /path/to/myExistingProject.git myExistingProject.git
local$ scp -r myExistingProject.git [user]@[remote]:~/path/to/repositories

Making Your Repository Point to the Server

Now that you have the bare repository on the server, you’ll want to make your local repository point to the one on the server. Of course you could clone the remote repository as above, or you can simply add a “remote” location to your existing project. If your repository was cloned from somewhere else and already has a default origin, you’ll first want to remove that origin. To check if it is pointing somewhere else do:
local$ git remote -v
If you see an origin entry there, you’ll want to remove it
local$ git remote rm origin
Now you can add your new origin.
local$ git remote add origin ssh://[user]@[remote]/path/to/repositories/myExistingProject.git
Or, include the port if you are not using the default port for SSH (22).
local$ git remote add origin ssh://[user]@[remote]:[port]/path/to/repositories/myExistingProject.git
Of course, you can name this remote location anything you want (or add multiple ones). The “origin” is simply the alias that will be used when you push data (see below).
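Adding and listing remotes works the same whether or not the URL is actually reachable, so you can dry-run the commands above locally. In this sketch the URLs and aliases are placeholders of my own:

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# Add a remote under the conventional alias "origin"...
git remote add origin ssh://user@example.com/path/to/repositories/myProject.git

# ...and a second one under another alias, which is perfectly legal.
git remote add backup ssh://user@backup.example.com/repos/myProject.git

# List all configured remotes with their fetch/push URLs.
git remote -v
```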

Commit Changes to the Repository

Having cloned the repository you can now add/remove files and then commit them. Let’s assume that you are in some existing git repository and you create and add a new file:
local$ echo "a" > a.txt
local$ git add a.txt
local$ git commit -m "Added a new file, a.txt"
The files are now committed to your local repository. No one else will see these changes since you haven't told the server that you made them. To publish your changes, do:
local$ git push origin master
This tells git to push your data to “origin” (the place you cloned the data from) to the branch called “master” (the default branch). For a more detailed explanation on pushing and pulling data from a repository, take a look at this.
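The whole clone/commit/push loop can be rehearsed without a server by using a bare repository on the local disk as the "origin". All paths here are throwaway examples, and I push HEAD rather than master so the sketch works regardless of your default branch name (with the classic default, it's the same as pushing master):

```shell
tmp=$(mktemp -d)
cd "$tmp"

# 1. A bare repository stands in for the server.
git init -q --bare server.git

# 2. Clone it, just as you would over SSH.
git clone -q "$tmp/server.git" work
cd work
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# 3. Commit a change locally...
echo a > a.txt
git add a.txt
git commit -qm "Added a new file, a.txt"

# 4. ...and publish it to origin.
git push -q origin HEAD
```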


If you are using the Eclipse IDE, the EGit plugin will allow you to work with git repositories from within the IDE.

If you don't use Eclipse, but don’t like the command line for git related stuff (and like a nice UI), take a look at Tower. Sorry, that's OS X only...

By the way, since the above remote git repository requires SSH access to the server, every user that wants to work on that project needs to have SSH access to that machine. If you don't want to set up separate (SSH) accounts for every user, you can use a tool called gitosis or gitolite. However, if you have access to a machine on which all users already have SSH accounts, you probably don't need to do this (in case you care, I've never used either of those tools).

Friday, January 20, 2012

On the Offense: My Thesis Defense

After six and a half years at MIT, I was finally allowed to stand up in front of my thesis committee (and guests) and defend my PhD thesis. Although the road was long, and often very bumpy and winding, I reached the end unscathed. Well, I do have quite a few more grey hairs than when I started, but that's just a minor detail.

In case the above (long) video of the defense is not enough to make you fall asleep, here are the actual slides for the defense and the thesis document. I assure you, if you download the thesis, you will be one of a very select few to do so. If you read it, well, then you will be in an even more elite crowd. =)
Thank you to all the family and friends that made this all possible! I wouldn't be here if it weren't for all of your support!