Thursday, December 19, 2013

Viewing New Relic Audit Logs When Not Using Ruby

Although New Relic is great for viewing the status of running web applications, at our startup (which deals with patient data) we wanted to be sure that we weren't sending any private data to New Relic. For Ruby developers this is simple: just follow the instructions on the New Relic website. For the rest of us, however, there aren't any clear instructions. Fortunately, that doesn't mean that it is hard to do!

All you have to do is modify your newrelic.ini file as follows:
  1. Uncomment the line: log_file = /tmp/newrelic-python-agent.log
  2. Then set the log_level setting to debug
  3. Underneath the log_level line, add the following:
    debug.log_data_collector_payloads = True
    debug.log_agent_initialization = True
    debug.log_data_collector_calls = True
    debug.log_transaction_trace_payload = True
    debug.log_thread_profile_payload = True
    debug.log_raw_metric_data = True
    
  4. Restart your application that loads the New Relic agent.
You can now see all of the data that is being sent to New Relic by watching the log file:
tail -f /tmp/newrelic-python-agent.log
As you might have guessed, this will create a large log file, so make sure to turn off these settings when you are done with them (or ensure that you are properly rotating your logs).

According to New Relic support, the debug.log_data_collector_payloads setting is what will log every data message sent to the New Relic collector. As this data is encoded and compressed, the other settings above are what decode the data and print them out in human readable form.

If you are not seeing any logging output in your log file, make sure that you are not running into any issues due to logger conflicts. For example, if you disable existing loggers in your app, you won't see any output. For further details take a look at this article put together by the folks at New Relic.
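
The logger conflict described above can be reproduced with nothing but the standard library. The sketch below is only an illustration: the logger name vendor.agent is a made-up stand-in for whatever logger the agent creates, and your application may configure logging differently. It shows that logging.config.dictConfig disables loggers that already exist unless disable_existing_loggers is set to False.

```python
import logging
import logging.config

# Hypothetical stand-in for a logger that a third-party agent creates
# before the application configures logging.
agent_logger = logging.getLogger('vendor.agent')

# The default behaviour (disable_existing_loggers=True) silences it.
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': True,
})
disabled_by_default = agent_logger.disabled

# Passing False keeps loggers that already exist enabled.
logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
})
disabled_when_preserved = agent_logger.disabled
```

If your app configures logging with dictConfig or fileConfig, checking this one setting is a reasonable first step when the agent log stays empty.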

Wednesday, August 14, 2013

Converting a TEXT column to a VARCHAR column in MySQL

I ran into the issue of having to convert a column in a table from a TEXT type to a VARCHAR type. Before you do this, make sure that the length of your VARCHAR is long enough to hold the longest string that you have; otherwise you're going to end up truncating your strings! In our case, the strings were all pretty short, so this wasn't an issue. Due to an existing (and known) MySQL bug, we can't cast directly from a TEXT blob to a VARCHAR, but have to cast it to a CHAR first. As this ended up being several steps, I thought I'd share the steps I took. I am sure there are ways to do this with fewer steps, but this method allows you to perform the steps one by one, checking to make sure your data is correct as you go.

For simplicity, let's assume you have a table that looks something like this:
# Our original table
CREATE TABLE my_table (
    id INT NOT NULL AUTO_INCREMENT,
    my_col TEXT NOT NULL,
    PRIMARY KEY (id)
);
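
Before converting anything, it is worth confirming that the longest existing value really fits in the new column. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so the table definition is simplified and the helper name longest_value_length is hypothetical. Against MySQL you would run the same kind of query through your MySQL driver and use CHAR_LENGTH() so that characters, not bytes, are counted.

```python
import sqlite3


def longest_value_length(conn, table, column):
    """Return the length of the longest value in table.column (0 if empty).

    The identifiers are interpolated directly, so only pass trusted names.
    """
    query = 'SELECT MAX(LENGTH({0})) FROM {1}'.format(column, table)
    row = conn.execute(query).fetchone()
    return row[0] or 0


# In-memory stand-in for the real table.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_col TEXT NOT NULL)')
conn.executemany(
    'INSERT INTO my_table (my_col) VALUES (?)',
    [('short',), ('a bit longer',)])

max_len = longest_value_length(conn, 'my_table', 'my_col')
fits_in_varchar_255 = max_len <= 255
```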
What you want to do is convert the my_col column into a VARCHAR column. The following steps should get you from TEXT to VARCHAR:
# Create a temp table
CREATE TABLE tmpvchar (
    id INTEGER,
    my_col_vchar VARCHAR(255)
);

# Copy the data over and cast it to a character
INSERT INTO tmpvchar (id, my_col_vchar)
    SELECT id, CAST(my_col AS CHAR(255)) FROM my_table;

# Add a new column to the original table
ALTER TABLE my_table ADD my_col_vchar VARCHAR(255);

# Copy the data from the temp table
UPDATE my_table SET my_col_vchar = (SELECT my_col_vchar FROM tmpvchar WHERE my_table.id = tmpvchar.id);

# Remove the incorrect column
ALTER TABLE my_table DROP my_col;

# Rename the column back to the correct one 
ALTER TABLE my_table CHANGE my_col_vchar my_col VARCHAR (255) NOT NULL;

# Drop the temporary table
DROP TABLE tmpvchar;
I hope this ends up saving someone some time!

Tuesday, July 30, 2013

Parsing Arguments in Python with argparse

Parsing arguments from the command line is something that most of us have to do at some point. Unfortunately for us, when it comes to Python, most of the examples are for the old, deprecated library for parsing arguments (optparse). Since Python 2.7, the new library, argparse, has become the standard. Although argument parsing is well covered in the Python documentation, the most basic examples aren't near the top, so it seems much more complicated than it actually is. In any case, here are some quick, basic examples of how to use argparse.

The Simplest

First, let's start with just getting arguments from the command line.
# argparse1.py
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='An awesome program')
    parser.add_argument(
        'first_name', help='First Name')
    parser.add_argument(
        'last_name', help='Last Name')
    args = vars(parser.parse_args())
    print "{} {}".format(args['first_name'], args['last_name'])
If you run this as python argparse1.py, without any parameters, you should see an error message with a helpful usage message along the lines of:
# argparse1.py
usage: argparse1.py [-h] first_name last_name
argparse1.py: error: too few arguments
I.e., just by using the argparse parser and parsing the arguments, we get a built-in usage generator. Neat!

Now, if you run this with actual values, such as python argparse1.py John Smith, then, as you'd expect, this will work and print out "John Smith". One thing to note is that I used vars() to get the variables out of the Namespace that is created by the parser. If you want to, you can get the values directly out of the Namespace without using vars(), but I prefer the dictionary style access for my arguments. For more details on this, I'll refer you to the Python documentation.
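
To make the difference between the two access styles concrete, here is a small sketch that reads the same values both ways. It passes an explicit argument list to parse_args() instead of reading the real command line, purely so that it can be run anywhere.

```python
import argparse

parser = argparse.ArgumentParser(description='An awesome program')
parser.add_argument('first_name', help='First Name')
parser.add_argument('last_name', help='Last Name')

# An explicit list stands in for what would normally come from sys.argv.
namespace = parser.parse_args(['John', 'Smith'])

# Attribute access straight from the Namespace.
full_name = '{} {}'.format(namespace.first_name, namespace.last_name)

# Dictionary access after converting the Namespace with vars().
as_dict = vars(namespace)
```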

Named Parameters

While the above example works well for simple situations where all arguments are required and positional arguments make sense, it is often nice to allow the use of named (optional) parameters. For example, we can re-write the above using named parameters as follows:
# argparse2.py
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='An awesome program')
    parser.add_argument(
        '--first_name', required=True, help='First Name')
    parser.add_argument(
        '--last_name', required=True, help='Last Name')
    parser.add_argument(
        '--middle_name', required=False, help='Middle Name')

    args = vars(parser.parse_args())
    if args['middle_name']:
        print "{} {} {}".format(
            args['first_name'], args['middle_name'], args['last_name'])
    else:
        print "{} {}".format(args['first_name'], args['last_name'])
Unlike before, we now have to specify the argument name before using it, but we can get the same result as before by doing:
# argparse2.py
python argparse2.py --first_name John --last_name Smith
Just as before, if you don't provide first_name or last_name, we get a helpful error message. However, we also added a new optional argument for the middle name, which we can provide if we feel like it. E.g.
# argparse2.py
python argparse2.py --first_name John --middle_name Bob --last_name Smith
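
The check on args['middle_name'] in argparse2.py works because argparse stores None for an optional argument that was not supplied. A short sketch of that behaviour, again using explicit argument lists rather than the real command line (the build_parser helper is just for illustration):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='An awesome program')
    parser.add_argument('--first_name', required=True, help='First Name')
    parser.add_argument('--last_name', required=True, help='Last Name')
    parser.add_argument('--middle_name', required=False, help='Middle Name')
    return parser


# The optional argument is left out, so its value defaults to None.
without_middle = vars(build_parser().parse_args(
    ['--first_name', 'John', '--last_name', 'Smith']))

# The optional argument is supplied, so its value is the given string.
with_middle = vars(build_parser().parse_args(
    ['--first_name', 'John', '--middle_name', 'Bob', '--last_name', 'Smith']))
```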

Sub Commands (Sub-Parsers)

Ok, now that we know the basics, let's look at the case where we have a program that has two different sub commands.
# argparse3.py
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='An awesome program')
    subparsers = parser.add_subparsers(
        title='subcommands', description='valid subcommands',
        help='additional help')
    parser_create = subparsers.add_parser('create')
    parser_create.set_defaults(which='create')
    parser_create.add_argument(
        '--first_name', required=True, help='First Name')
    parser_create.add_argument(
        '--last_name', required=True, help='Last Name')

    parser_delete = subparsers.add_parser('delete')
    parser_delete.set_defaults(which='delete')
    parser_delete.add_argument(
        'id', help='Database ID')
    args = vars(parser.parse_args())

    if args['which'] == 'create':
        print "Creating {} {}".format(args['first_name'], args['last_name'])
    else:
        print "Deleting {}".format(args['id'])
Whoa! What do we have here? If you run this without any arguments, you will see a help message along the lines of:
# argparse3.py
usage: argparse3.py [-h] {create,delete} ...
argparse3.py: error: too few arguments
This is telling us that we have to provide one of the available subcommands "create" or "delete". So, let's try that by running python argparse3.py create:
# argparse3.py
usage: argparse3.py create [-h] --first_name FIRST_NAME --last_name LAST_NAME
argparse3.py create: error: argument --first_name is required
Now we get the helpful message saying exactly what the arguments are for the subcommand "create". Neat! If you actually provide it with valid inputs, you will see that the parser only returns the arguments for the subparser that was selected. In other words, continuing with the previous example, if we run the program as follows:
# argparse3.py
python argparse3.py create --first_name John --last_name Smith
Then we only get the arguments first_name and last_name (plus the which default described below); id will not be there since it is not one of the "create" sub-parser's arguments. As you also may have noticed, you can mix and match positional arguments and named arguments at will.

Unfortunately, the one thing that is lacking by default in the argument parsing when using subcommands is a way to get which subcommand was run. In the example above we could figure it out, since only the "create" subcommand has first_name and last_name, but what would we do if both subcommands had overlapping arguments? The solution to this (originally found here) is to provide a default argument that tells us which subcommand was chosen. This is why I added, for example, the line parser_create.set_defaults(which='create') to the first subparser. This allows us to get the argument "which" that we have added to tell us which subcommand was chosen.
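
Another option that argparse itself offers is the dest argument of add_subparsers(), which stores the chosen subcommand's name in the parsed result. The sketch below is a variation on the example above rather than the approach used in this post; the --name argument and the command key are illustrative choices. Both subcommands deliberately share the same argument to show that they can still be told apart.

```python
import argparse

parser = argparse.ArgumentParser(description='An awesome program')
# dest names the key under which the selected subcommand is stored.
subparsers = parser.add_subparsers(dest='command')

parser_create = subparsers.add_parser('create')
parser_create.add_argument('--name', required=True, help='Name')

parser_delete = subparsers.add_parser('delete')
parser_delete.add_argument('--name', required=True, help='Name')

created = vars(parser.parse_args(['create', '--name', 'John']))
deleted = vars(parser.parse_args(['delete', '--name', 'John']))
```

The set_defaults() approach remains handy when you want to attach something other than the name, such as a function to call for each subcommand.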

Going Farther

Well, that's it for the basics. If you want to do more than this, then I highly suggest you read the docs as they contain other examples. Hopefully, this little introduction has made it a bit easier to digest what is going on in that documentation page!

Friday, May 17, 2013

Unit testing your Flask REST Application

If you are already convinced that unit testing is the way to go, skip ahead to the section with the examples.

Unit Testing Is Not Optional

As it says on the Flask website, "something that is untested is broken." While this obviously isn't always true, there is no way for you to know whether or not you have bugs in your code unless you test it. When you initially start writing your application, you probably think one of the following:
  1. Even if it takes only 10 minutes to write the unit test, that time could be spent writing the next great feature.
  2. Why bother wasting time writing test cases when one can just test the application by visiting the site and clicking on the various pages? Even with a REST API that doesn't have a nice user interface, one could theoretically use something like the Advanced REST Client Google Chrome plugin to inspect the JSON that is produced.
Although this is true when you first start out (i.e. the first week or two), you will quickly realize how unmanageable it becomes. How can you be 100% sure that when you change some line of code it doesn't have a (negative) effect on some other, seemingly unrelated, bit of code? Enter unit testing.

Unit testing is the notion of testing the parts (units) of your code to ensure that they behave as expected. For example, if you have some code that verifies whether a password is correct for a given user, then the unit test should test all possible scenarios that this function may encounter. For example, what happens when you pass None to your function? What happens when you provide an integer password instead of a string password that you were expecting? What happens when you provide the correct password but for a different user in the system?
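
As a rough illustration of those scenarios, here is a sketch of such a test case. The check_password function and the USERS dictionary are hypothetical stand-ins invented for this example (a real application would look users up in a database and compare password hashes), and the code is written for Python 3.

```python
import unittest

# Hypothetical in-memory user store standing in for a real database.
USERS = {'alice': 'wonderland', 'bob': 'builder'}


def check_password(username, password):
    """Return True only if password is the string stored for username."""
    if not isinstance(password, str):
        return False
    return USERS.get(username) == password


class CheckPasswordTestCase(unittest.TestCase):

    def test_correct_password(self):
        self.assertTrue(check_password('alice', 'wonderland'))

    def test_none_password(self):
        self.assertFalse(check_password('alice', None))

    def test_integer_password(self):
        self.assertFalse(check_password('alice', 12345))

    def test_other_users_password(self):
        # bob's password is valid, just not for alice.
        self.assertFalse(check_password('alice', 'builder'))
```

Saved to a file, this can be run with python -m unittest followed by the module name.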

By testing each of your components independently, you can prove to yourself (and others!) that everything is behaving as expected. So, if in the future you then change one of the components (say you now have a faster way of searching for users in your database given their first name), you merely have to test it with the code that you have already written to verify that it is doing what is expected. If it returns the same results as before (assuming the new method doesn't change the ordering, of course!) then you know that even though the underlying search algorithm changed, it's still behaving the same way as before. Since it's still behaving the same way as before, any other bit of code that requires searching for users by first name will also work just like before. Now you can feel free to pat yourself on the back and sleep easy that night.

The key to good unit testing is to ensure that you truly test each unit independently. In other words, one test should not have an effect on another. For example, if you have a test for adding a new user to a database, that test should not affect the data being used for another test. Thus, it's imperative that you set up and initialize dependencies (like databases) in each of the tests so that you know exactly what the starting point of the test is. If this weren't the case, you would run into problems when you try to run, for instance, a test that counts the number of users in your database.
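
A minimal sketch of that idea, assuming an in-memory SQLite database as a stand-in for whatever database your application uses (the users table and email values are made up for the example). Because setUp() builds a fresh database before every test, the counting test sees exactly one user no matter what the adding test did.

```python
import sqlite3
import unittest


class UserTableTestCase(unittest.TestCase):

    def setUp(self):
        # Runs before every test: each test gets its own fresh database.
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute(
            'CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)')
        self.conn.execute(
            "INSERT INTO users (email) VALUES ('first@example.com')")

    def tearDown(self):
        # Runs after every test: throw the database away.
        self.conn.close()

    def count_users(self):
        return self.conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]

    def test_add_user(self):
        self.conn.execute(
            "INSERT INTO users (email) VALUES ('second@example.com')")
        self.assertEqual(self.count_users(), 2)

    def test_count_users(self):
        # Unaffected by test_add_user, which ran against its own database.
        self.assertEqual(self.count_users(), 1)
```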

Although the examples that I am providing below are in the context of Flask application development, the principles of unit testing are framework and language independent.

Now that I have hopefully convinced you that unit testing is not something that should be an afterthought, but something that should be part of your day-to-day programming habits, let's take a look at some unit testing for Flask applications.

Unit Testing Examples

To get started, I would highly recommend the great introduction to the basics of unit testing Flask applications on the Flask website. The examples that I have linked below are for slightly more complicated or specific things that I had to test.

Unit Testing Flask File Uploads Without Any Files

Uploading files is one of those things that pretty much all websites support. Whether it is to merely upload a profile picture, or something more complex like uploading the result of a biological experiment, the starting point is the same -- the file is on the client's computer and needs to end up on your server.

Since uploading files can be such a crucial part of a web application, it should be tested like any other part of the system. Unfortunately, performing this test without actually uploading a real file (i.e. simulating the entire thing) is something that isn't as straightforward as I initially expected (now that I know how to do it, it's really easy!).

In any case, the code below simulates uploading a file using StringIO and then simulates the FileStorage used by Flask (and Werkzeug) by returning a "mocked" TestingFileStorage of our choosing.
from StringIO import StringIO
import unittest

from flask import Request
from werkzeug.datastructures import FileStorage, MultiDict

# Import your Flask app from your module
from myapp import app

class FlaskAppUploadFileTestCase(unittest.TestCase):

    def setUp(self):

        app.config['TESTING'] = True
        app.config['CSRF_ENABLED'] = False
        self.app = app

        # .. setup any other stuff ..

    def runTest(self):

        # Loop over some files and the status codes that we are expecting
        for filename, status_code in \
                (('foo.png', 201), ('foo.pdf', 201), ('foo.doc', 201),
                 ('foo.py', 400), ('foo', 400)):

            # The reason why we are defining it in here and not outside
            # this method is that we are setting the filename of the
            # TestingFileStorage to be the one in the for loop. This way
            # we can ensure that the filename that we are "uploading"
            # is the same as the one being used by the application
            class TestingRequest(Request):
                """A testing request to use that will return a
                TestingFileStorage to test the uploading."""
                @property
                def files(self):
                    d = MultiDict()
                    d['file'] = TestingFileStorage(filename=filename)
                    return d

            self.app.request_class = TestingRequest
            test_client = self.app.test_client()

            rv = test_client.post(
                '/files',
                data=dict(
                    file=(StringIO('Foo bar baz'), filename),
                ))
            self.assertEqual(rv.status_code, status_code)
Let's take a look at this code in a bit more detail. The first thing we do in our runTest() method is loop over 5 different file types. The assumption is that our application accepts the first 3, while rejecting the last 2. In the loop, we create a class TestingRequest that we will use as our request class for our application. It overrides the files attribute to return a TestingFileStorage (defined below) instead of the FileStorage that is normally returned. As I mentioned in the comments, we are creating this class inside the for loop because we need to set the filename that is returned by the TestingFileStorage equal to the one that we are currently using in the loop.

Now that we have created our custom Request, we tell the Flask app to use ours instead and then create a TestClient. Note that you must set the request_class of the app before you create the TestClient. Using the patched TestClient, we can then POST a "file" as normal, except that instead of using a real file, we use a StringIO object so that we don't actually have to have any random files in our project for testing.

That's it, really. Using the above code (and the TestingFileStorage below) you should be able to test your file uploading routes without actually having to have any files on disk!

I left the implementation of the TestingFileStorage until the end because I copied and pasted it from the Flask-Uploads extension. So that you don't have to go digging around the source code there, I've copied it here for your reference. Enjoy.
class TestingFileStorage(FileStorage):
    """
    This is a helper for testing upload behavior in your application. You
    can manually create it, and its save method is overloaded to set `saved`
    to the name of the file it was saved to. All of these parameters are
    optional, so only bother setting the ones relevant to your application.

    This was copied from Flask-Uploads.

    :param stream: A stream. The default is an empty stream.
    :param filename: The filename uploaded from the client. The default is the
                     stream's name.
    :param name: The name of the form field it was loaded from. The default is
                 ``None``.
    :param content_type: The content type it was uploaded as. The default is
                         ``application/octet-stream``.
    :param content_length: How long it is. The default is -1.
    :param headers: Multipart headers as a `werkzeug.Headers`. The default is
                    ``None``.
    """
    def __init__(self, stream=None, filename=None, name=None,
                 content_type='application/octet-stream', content_length=-1,
                 headers=None):
        FileStorage.__init__(
            self, stream, filename, name=name,
            content_type=content_type, content_length=content_length,
            headers=headers)
        self.saved = None

    def save(self, dst, buffer_size=16384):
        """
        This marks the file as saved by setting the `saved` attribute to the
        name of the file it was saved to.

        :param dst: The file to save to.
        :param buffer_size: Ignored.
        """
        if isinstance(dst, basestring):
            self.saved = dst
        else:
            self.saved = dst.name

REST App Response Status Code Testing Harness

When creating a web service that has different types of users with different privileges, it is useful to be able to quickly test whether the route that you just created returns the correct response code (e.g. 200, 401, 403). While you can obviously copy and paste a lot of code to make this work, I ended up creating a base class that does the job of initializing the database, getting the various kinds of users, and then running all of the tests at once. In essence, I wanted to be able to write a TestCase that looks something like this:
class SomeRouteTestCase(FlaskAppRouteStatusCodeTestCase):
    """Test for /foo"""

    __GET_STATUS_CODES__ = dict(
        user=200,
        admin=200,
        super_user=200
    )

    __POST_STATUS_CODES__ = dict(
        user=403,
        admin=200,
        super_user=200
    )

    def get(self, user, test_data, db_data):
        self.login(user.email, user.password)
        rv = self.app.get('/foo')
        self.logout()
        return rv

    def post(self, user, test_data, db_data):
        self.login(user.email, user.password)
        rv = self.app.post(
            '/foo',
            data=json.dumps(dict(bar="barbar", bam="bambam")),
            content_type='application/json')
        self.logout()
        return rv

    @unittest.skip("No PATCH")
    def test_patch(self):
        """Override the PATCH tester since /foo can't be patched."""
        pass
What kind of magic is this? Well, not any kind, really. A TestCase for a particular route simply defines the correct status codes for the various HTTP methods and the code to make the calls. In the example above, anyone should be able to perform GET /foo (all of them have a response code of 200), while only admins and super_users are allowed to POST. Since this route doesn't accept the PATCH method, we are telling unittest to skip it with the @unittest.skip decorator. Using this framework, one can test any route with all types of users with minimal effort.
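
To give a feel for how a base class can drive this kind of comparison, here is a deliberately small sketch. It is not the author's FlaskAppRouteStatusCodeTestCase: every name in it (StatusCodeTestCase, GET_STATUS_CODES, the Response and User tuples, the fake route) is hypothetical, only GET is handled, and no Flask app or database is involved. It is written for Python 3.

```python
import unittest
from collections import namedtuple

# Hypothetical stand-ins for a real response object and user model.
Response = namedtuple('Response', ['status_code'])
User = namedtuple('User', ['role'])


class StatusCodeTestCase(unittest.TestCase):
    """Base class: subclasses set GET_STATUS_CODES and implement get()."""

    GET_STATUS_CODES = None

    def get_users(self):
        # A real harness would load these from a freshly initialized database.
        return [User('user'), User('admin'), User('super_user')]

    def get(self, user):
        raise NotImplementedError

    def test_get(self):
        if self.GET_STATUS_CODES is None:
            self.skipTest('no GET status codes defined')
        for user in self.get_users():
            rv = self.get(user)
            self.assertEqual(
                rv.status_code, self.GET_STATUS_CODES[user.role],
                'unexpected GET status for {}'.format(user.role))


class AdminOnlyRouteTestCase(StatusCodeTestCase):
    """Expected status codes for a fake admin-only route."""

    GET_STATUS_CODES = dict(user=403, admin=200, super_user=200)

    def get(self, user):
        # Fake route: plain users are refused, everyone else is allowed.
        return Response(403 if user.role == 'user' else 200)
```

The base class skips itself when no status codes are defined, so only the concrete route test cases contribute results.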

In order to make this all work, we have to define the FlaskAppRouteStatusCodeTestCase that this TestCase inherits from. Fortunately for you, the basic structure of it is in the gist below and pretty straightforward. If you simply fill out the methods that initialize the database and get the set of users to perform the testing on, you can create tests for your routes with reckless abandon.

Although I have only added the methods in FlaskAppRouteStatusCodeTestCase for GET, POST, and PATCH, it should be trivial to add in any other methods for things like DELETE, PUT, etc. For your viewing pleasure, I've included the full example here:

Friday, April 5, 2013

Using SQLAlchemy with Celery Tasks

If you are reading this, then you probably know what both SQLAlchemy and Celery are, but for the uninitiated, here is a (very) brief introduction. SQLAlchemy is a full-featured Python Object Relational Mapper (ORM) that lets one perform operations on a database using Python classes and methods instead of writing SQL. Celery is (in their own words) "an asynchronous task queue/job queue based on distributed message passing." If you haven't heard of, or haven't used, either of these, I highly suggest you give them a try.

Ok, and now on to the meat of this post. If you are using SQLAlchemy (or any other database connection method, for that matter) and want to execute Celery tasks that retrieve items from the database, you have to deal with properly handling database sessions. Fortunately for us, the designers of both of these projects know what they are doing and it isn't a complicated matter.

First, let's start with creating a simple task without any database requirements (from the Celery tutorial):
# file: proj/tasks.py
from __future__ import absolute_import
import celery

@celery.task
def add(x, y):
    return x + y
The one thing that is missing from this is setting up Celery so that it is connected to the broker and backend. Many of the tutorials that you find on the web show placing the configuration of the celery object (and the database object) in the same file as the tasks being created. While that works fine for examples, it doesn't portray how it should be done in "real life." I find the best way to set up these resources is to create separate modules for each of the resources that you are going to use. As such, we would have a module for setting up the celery object and another one for setting up the database connections. (The Celery tutorial has exactly such an example here.) Why in separate modules instead of just one big resources.py module? Well, if we put all of them in one file, then all of those resources are set up when we may only require one of them. For example, let's say we write a script that only ever needs to connect to the database and never runs any Celery tasks; why should that script create a celery object? So, on that note, let's create another file that sets up the celery object:
# file: proj/celery.py
from __future__ import absolute_import
from celery import Celery

celery = Celery(
    'tasks',
    broker='amqp://',
    backend='amqp',
    include=['proj.tasks']
)

if __name__ == '__main__':
    celery.start()
As you can see from that configuration, we tell Celery that all of our tasks live in the tasks module (proj/tasks.py). Now that we have Celery set up, let's create a very similar module for our SQLAlchemy (or any other database) connections.
# file: proj/db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    'sqlite:///:memory:', convert_unicode=True,
    pool_recycle=3600, pool_size=10)
db_session = scoped_session(sessionmaker(
    autocommit=False, autoflush=False, bind=engine))
Great, now we can connect to our database using SQLAlchemy, but how does this all work with Celery tasks? Let's update our tasks module with a new task that gets something from the database. For the sake of the example, let's assume that we have some User model that contains some user information.
# file: proj/tasks.py
from __future__ import absolute_import
# Use the configured Celery object from proj/celery.py
from proj.celery import celery

from proj.db import db_session

@celery.task
def get_from_db(user_id):
    # User is assumed to be your own model, imported from your project
    user = db_session.query(User).filter(User.id == user_id).one()
    # do something with the user
But wait! If we do this, what happens to the database connection? Don't we have to close it? Yes, we do! Since we are using a scoped_session, we'll want to make sure that we release our connection from the current thread and return it to the connection pool managed by SQLAlchemy. We could just place a db_session.remove() at the end of our task, but that seems a bit fragile. Fortunately for us, there is a way for us to subclass the default Celery Task object and make sure that all connections are returned to the pool auto-magically. Let's update our tasks module again.
# file: proj/tasks.py
from __future__ import absolute_import
# Use the configured Celery object from proj/celery.py
from proj.celery import celery

from proj.db import db_session

class SqlAlchemyTask(celery.Task):
    """An abstract Celery Task that ensures that the connection to the
    database is closed on task completion"""
    abstract = True

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        db_session.remove()


@celery.task(base=SqlAlchemyTask)
def get_from_db(user_id):
    # User is assumed to be your own model, imported from your project
    user = db_session.query(User).filter(User.id == user_id).one()
    # do something with the user
As you can see, we created a SqlAlchemyTask that implements the after_return handler that does the job of removing the session. The only change we had to make to our task was to make sure that we set the appropriate base Task type. Neat, huh? By the way, you can check out the other Task handlers here.

UPDATE: There was a question in a comment regarding dealing with data not being in the database when the celery task runs. There are two main ways you can handle this: delay the task or re-try the task (or both).

If you want to delay the task, you simply need to call your task using the apply_async() method with the countdown parameter set instead of the simpler delay() method. However, this is not ideal since you will never know with 100% certainty how long to wait, and since waiting, by definition, makes things run slower (from the user's point of view).

The better way to deal with this is to retry the task on failure. This way, if all goes well, it runs the first time, but if not, you fall back to trying again. Fortunately, this is easy to accomplish by updating the task decorator slightly and making the task retry on error.
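from sqlalchemy.orm.exc import NoResultFound

@celery.task(base=SqlAlchemyTask, max_retries=10, default_retry_delay=60)
def get_from_db(user_id):
    try:
        # .one() raises NoResultFound if the row does not exist yet
        user = db_session.query(User).filter(User.id == user_id).one()
    except NoResultFound as exc:
        # Re-queue this task; Celery gives up after max_retries attempts
        raise get_from_db.retry(exc=exc)
    # do something with the user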
In the example above, this task will retry 10 times and wait 60 seconds between subsequent retries. For this scenario this is most certainly overkill (I sure hope it doesn't take your main app 600 seconds to create a user object!), but in certain situations this might be appropriate (e.g. waiting for some analytics computation to complete). That's it. Now you have a Celery task that gets data from a database and releases the connection after completion. Happy coding!
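
For readers who want to see the retry-on-failure idea in isolation, here is a plain-Python sketch of the same pattern. It is not how Celery implements retries; the call_with_retries helper is hypothetical, LookupError stands in for NoResultFound, and the sleep function is injectable so that the example does not actually wait.

```python
import time


def call_with_retries(func, max_retries=10, retry_delay=60, sleep=time.sleep):
    """Call func(); on LookupError wait and retry, at most max_retries times."""
    retries = 0
    while True:
        try:
            return func()
        except LookupError:
            if retries >= max_retries:
                raise
            retries += 1
            sleep(retry_delay)


attempts = []


def flaky_lookup():
    # Fails on the first two calls, as if the row had not been committed yet.
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError('user not created yet')
    return 'user-42'


# Record the requested waits instead of sleeping.
waits = []
found = call_with_retries(flaky_lookup, retry_delay=60, sleep=waits.append)
```

With the values above, the lookup succeeds on the third attempt after two 60 second waits, which is why a generous retry limit costs nothing when the data shows up quickly.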

Thursday, March 7, 2013

Setting HTTP Request Values For Flask Unit Testing

I recently had to do some unit testing for Flask application code that looked at the user agent and IP address of the client. The first thing that you will realize when you try to get the User Agent from the request headers while using the test_client() provided by Flask is that the underlying Werkzeug library raises a KeyError saying that it can't find a value for the user agent.
For example, let's say that you have some code in your app that looks like this:
@app.route('/login', methods=['POST', 'OPTIONS'])
def login():
    print request.remote_addr
    print request.headers['User-Agent']
And then in your testing code you do something like:
client = app.test_client()
client.post('/login',
            data=json.dumps({
                'username': 'sheldon@cooper.com',
                'password': 'howimetyourmother'
            }), content_type='application/json')
Most likely, you will then get an error that looks something like:
File "/Users/leonard/app/rest/app.py", line 515, in login
    if request.headers['User-Agent']:
File "/Users/leonard/.virtualenvs/app/lib/python2.7/site-packages/werkzeug/datastructures.py", line 1229, in __getitem__
    return self.environ['HTTP_' + key]
    KeyError: 'HTTP_USER_AGENT'
As it turns out, there are two solutions to this problem that I found on StackOverflow (1, 2). The first option is to set the request environment variables on each call using the environ_base parameter provided by Werkzeug (more details on that here). For example, we can do something like:
client.post('/login',
            data=json.dumps({
                'username': 'sheldon@cooper.com',
                'password': 'howimetyourmother'
            }), content_type='application/json',
            environ_base={
                'HTTP_USER_AGENT': 'Chrome',
                'REMOTE_ADDR': '127.0.0.1'
            })
Although this method works perfectly well, it requires us to set the values on each call. If we have a really long unit testing script that makes many calls (as I do), then this becomes rather burdensome. The second solution is to create a proxy for the Flask app that overrides the __call__ method. So, the proxy looks something like:
class FlaskTestClientProxy(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['REMOTE_ADDR'] = environ.get('REMOTE_ADDR', '127.0.0.1')
        environ['HTTP_USER_AGENT'] = environ.get('HTTP_USER_AGENT', 'Chrome')
        return self.app(environ, start_response)
Then when we create our test client, we simply make sure to wrap the WSGI app with our proxy, and voilà, it works.
app.wsgi_app = FlaskTestClientProxy(app.wsgi_app)
client = app.test_client()
client.post('/login',
            data=json.dumps({
                'username': 'sheldon@cooper.com',
                'password': 'howimetyourmother'
            }), content_type='application/json')
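Because the proxy is plain WSGI middleware, its behavior can be checked without Flask at all. The following is a minimal sketch under that assumption: echo_app and call are hypothetical stand-ins invented for this illustration (they are not part of Flask or Werkzeug), and the proxy class is the same one shown above. It shows that the defaults are filled in when nothing is supplied, and that values set explicitly (for example through environ_base) still win because the proxy uses environ.get.

```python
class FlaskTestClientProxy(object):
    """Same proxy as above: fills in defaults only when a key is missing."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['REMOTE_ADDR'] = environ.get('REMOTE_ADDR', '127.0.0.1')
        environ['HTTP_USER_AGENT'] = environ.get('HTTP_USER_AGENT', 'Chrome')
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Hypothetical stand-in WSGI app that reports what it received."""
    body = '{0} {1}'.format(environ['REMOTE_ADDR'], environ['HTTP_USER_AGENT'])
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body.encode('utf-8')]


def call(app, environ):
    """Hypothetical helper: invoke a WSGI app and return its body as bytes."""
    def start_response(status, headers, exc_info=None):
        pass  # The status and headers are not needed for this check
    return b''.join(app(environ, start_response))


wrapped = FlaskTestClientProxy(echo_app)

# No values supplied: the proxy's defaults are used
default_body = call(wrapped, {})

# Values supplied by the caller: the proxy leaves them alone
custom_body = call(wrapped, {'HTTP_USER_AGENT': 'Firefox',
                             'REMOTE_ADDR': '10.0.0.5'})
```

In other words, wrapping the app does not stop an individual test from overriding the user agent or IP address when it needs to.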

Friday, February 22, 2013

Implementing Multiple Accounts With Single Sign On

One of the things that I came across recently was the need to design and implement a solution that allows users of an application to have multiple accounts such that the accounts are treated as completely independent in some situations, but grouped together in others. Since that probably doesn't illustrate the problem well enough, here are a couple of examples that may help clarify the situation.

Example 1: Let's say we are creating a web-based email system along the lines of GMail, YMail, or anything of that nature. Furthermore, let's assume that I am an employee of two organizations (say, as a contractor), and each organization is paying for a single-user license for me to use this email service. Also, as part of this corporate email service, we want users to be able to customize their accounts (job title, contact phone number, etc.). In essence, from the standpoint of the organizations that are paying for the system, they want me, the contractor, to have an account dedicated to their organization.

Example 2: As is the case with many applications, there are normal users and admins. Continuing with the email service example above, let's say that I also work for the company that is making this software (yes, in this example I'm a very busy person) and have administrative privileges on the site. Do I always want those admin privileges? Should I always have them? Obviously, the answer is no. (Yes, there are other ways to handle this sort of varied levels of privileges, but let's ignore that for now.)

Common Solution: The easy solution (and the one commonly implemented) is for users to have completely different usernames for different accounts. This makes it very easy to distinguish between the multiple roles/jobs that I have. For most applications, this is perfectly adequate as most users only ever have one account.



The trouble with this solution arises when many users of the system have multiple accounts. For example, let's say we want to deploy this in a hospital setting to be used by all of the doctors. It is not uncommon for doctors to have multiple appointments (they are employees of multiple hospitals), each with different roles, titles, email addresses, etc. Again, we could make them have multiple accounts, but what if we want to create a unified inbox (e.g. as is done by many desktop email clients) across all of their accounts from the various hospitals?

Improved Solution: In retrospect, this isn't anything clever; it is relatively easy to implement from the beginning, and it handles this scenario with ease. Of course, this doesn't mean that a single user cannot create two completely different accounts, but it allows for the potential of joined accounts.



The idea is to not just have a user of the system, but also include profiles for each user. In other words, the user identifies the unique individual, while the profile identifies the unique account that individual has. All details that are common to all profiles (name, gender, certain preference settings, etc.) are stored at the user level, while the details specific to one account are stored at the profile level.

Now, instead of storing user IDs for all of the records (e.g. the user ID associated with an email), we store the profile ID. This way we get the benefit of being able to create, for example, that unified inbox by getting all emails for all profiles for the particular user, but still have the ability to have completely separate "accounts" along the lines of the profiles.
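To make the user/profile split concrete, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and sample rows are all assumptions made up for this illustration; a real schema would carry many more fields. The point is only that emails reference a profile ID, and the unified inbox is a join from emails through profiles to the user.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    -- Details shared by every account the person has
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per account; account-specific details live here
    CREATE TABLE profiles (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(id),
        organization TEXT NOT NULL,
        job_title    TEXT
    );
    -- Records point at the profile, not the user
    CREATE TABLE emails (
        id         INTEGER PRIMARY KEY,
        profile_id INTEGER NOT NULL REFERENCES profiles(id),
        subject    TEXT NOT NULL
    );
""")

# Hypothetical sample data: one doctor with appointments at two hospitals
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Dr. Example')")
conn.executemany(
    "INSERT INTO profiles (id, user_id, organization, job_title) "
    "VALUES (?, ?, ?, ?)",
    [(10, 1, 'Hospital A', 'Cardiologist'),
     (11, 1, 'Hospital B', 'Consultant')])
conn.executemany(
    "INSERT INTO emails (profile_id, subject) VALUES (?, ?)",
    [(10, 'Rounds at 8'), (11, 'Budget review'), (10, 'Lab results')])


def inbox_for_profile(conn, profile_id):
    """Emails for one account only (the 'completely separate' view)."""
    rows = conn.execute(
        "SELECT subject FROM emails WHERE profile_id = ? ORDER BY id",
        (profile_id,))
    return [row[0] for row in rows]


def unified_inbox(conn, user_id):
    """Emails across every profile that belongs to the same user."""
    rows = conn.execute(
        "SELECT e.subject FROM emails e "
        "JOIN profiles p ON p.id = e.profile_id "
        "WHERE p.user_id = ? ORDER BY e.id",
        (user_id,))
    return [row[0] for row in rows]
```

Calling inbox_for_profile(conn, 10) returns only the Hospital A messages, while unified_inbox(conn, 1) returns the messages from both hospitals.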

(P.S. If you are wondering, those UML diagrams were created with yUML.me.)

Saturday, January 26, 2013

Multi-Threaded Unit Test for (Flask) REST APIs

I've been working on creating a RESTful API using the Flask microframework recently, and had to come up with a way to test it using multiple concurrent threads (i.e. the way it will actually be used when it goes live). Although this example does not preclude the use of the test client that should be used for unit testing the individual parts of your site, this method can be used in addition to your other tests to make sure that it can handle multiple simultaneous requests. Also, as with most unit tests, this is a simulation and does not exactly mirror all of the details of an actual deployment. In particular, this test makes use of the built in server, which is not intended to be a production server.

Apart from the imports that you will need to make your application work, we will be making use of the fabulous Requests package to handle all of the interaction with the web service from within our unit test. While one can do all of this using urllib and urllib2, Requests makes our lives quite a lot easier. For example, here I created some helper methods that allow me to log in and log out of the web service. (For those of you who think that this isn't RESTful since we have to keep some state on the server to allow for logging in/out, I find that this small deviation from a true RESTful interface is justified as it makes everything else much simpler and cleaner.)
def login(self, username, password):
    """Helper method to let us log in to our web service."""

    # Create a dictionary of login data
    login_data = json.dumps(dict(username=username, password=password))
 
    # Log in to our service
    return requests.post(SERVER_URL + "/login", data=login_data,
                                headers={'content-type': 'application/json'})
 
def logout(self, cookies):
    """Helper method to let us log out from our web service."""
    return requests.post(SERVER_URL + "/logout", cookies=cookies)
The real "magic" happens in the following code. We're simply setting up a thread that starts the Flask application. Note, you have to set threaded to True, to ensure that the built in web server code doesn't just run in a single thread.
def start_and_init_server(app):
    """A helper function to start our server in a thread.

    This could be done as a lambda function, but this way we can
    perform other setup functions if necessary.

    Args:
        app: The Flask app to run
    """
    app.run(threaded=True)
 
# Create a thread that will contain our running server
server_thread = Thread(target=start_and_init_server, args=(self.app, ))
server_thread.start()
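One caveat worth handling: start() returns immediately, but the server may need a moment before it accepts connections, so the first requests of a test can fail with a connection error. The helper below is a small sketch of one way to wait for it. wait_for_server is a hypothetical name introduced here, and the host and port in the usage comment assume the defaults that app.run() uses (127.0.0.1 and 5000).

```python
import socket
import time


def wait_for_server(host, port, timeout=5.0):
    """Poll until something accepts TCP connections on (host, port).

    Returns True once a connection succeeds, or False if `timeout`
    seconds pass without one.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            conn = socket.create_connection((host, port), timeout=0.5)
        except OSError:
            # Not listening yet; back off briefly and try again
            time.sleep(0.05)
        else:
            conn.close()
            return True
    return False


# Usage in the test setup, right after server_thread.start()
# (assumes app.run() was called with its default host and port):
#
#     assert wait_for_server('127.0.0.1', 5000)
```

It can also help to set server_thread.daemon = True before calling start(), so that the test process is able to exit even though the built-in server never returns.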
Now that the server has been started in its own thread, you can simply bombard it with as many requests as you'd like. For example, if we are to assume that this REST interface is for a blogging system that allows for me to POST a new blog post to /posts, then we can do something like the following:
for i in range(n_new_posts):
    t = Thread(target=post_data)
    t.start()
Of course, the above example assumes that you have some function called post_data that does the job of POSTing some data to the web service.
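The loop above starts the threads but never waits for them, so any assertions that follow may run before the requests have finished, and an exception raised inside a thread will not fail the test on its own. A minimal sketch of one way around both issues follows. run_concurrently is a hypothetical helper, and the post_data defined here is only a stand-in that records a value; a real test would POST to the web service instead.

```python
import threading


def run_concurrently(target, n_threads):
    """Run `target` in n_threads threads, wait for all of them, and
    return the list of exceptions raised (empty if all succeeded)."""
    errors = []
    errors_lock = threading.Lock()

    def worker():
        try:
            target()
        except Exception as exc:
            # Collect the failure so the main thread can assert on it
            with errors_lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # Block until every request has completed
    return errors


# Hypothetical stand-in for the real post_data function
posted = []
posted_lock = threading.Lock()


def post_data():
    with posted_lock:
        posted.append('new post')


errors = run_concurrently(post_data, n_threads=20)
```

Inside a unit test, asserting that errors is empty (and then checking the state of the service, for example the number of posts) gives the test something definite to verify.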

Again, this isn't a foolproof method to test your code, as some issues may only come up when you deploy it on your development server, but this should be a good start. And hey, you can always write yourself a test that fires up the actual server that you will be using instead of the one that is included with Flask (or your web framework of choice).

Now that you've seen the bits and pieces, you can find the full example as a gist on GitHub. As mentioned above, we're assuming that we are creating a simple blogging API and that there are a couple of REST endpoints such as /login, /logout, /posts, etc. You'll obviously want to change those endpoints to meet your needs.

In case you are too lazy to head over to github, I've also included the code here for your convenience.

Thursday, January 3, 2013

Memorable ID Sentences

I stumbled upon the really neat idea of converting integers into memorable sentences on the Asana blog. The crux of the problem is that when giving a user a random number, it is likely that they will get it wrong, or it will be annoying to repeat each of the numbers to a customer service representative. So, instead of providing them with a number, how about providing them with a sentence instead? For example, the following numbers and sentences are "equivalent" to each other:

2474137653 == 20 exultant leopards strode intensely
1934320125 == 16 excited platypuses babbled wearily
4225612387 == 33 flippant snakes traveled sadly

I won't go into the details of how it works, as it is covered in detail in the original blog post and the code below, but I thought I'd share my implementation of the idea.

The original work uses 32-bit integers, and I implemented it pretty much as it was described. However, in order to allow for the potential extension of this approach to other types of numbers (shorts, longs, etc.), I created a base class that does all of the "heavy lifting", followed by a subclass that merely provides it with the necessary bits of information to make it work. So, if you feel like it, you can create longer sentences for longs, or use fewer words for shorts, etc.
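As a rough illustration of that base class and subclass split, here is a toy sketch. It is not the code from the gist, and it will not reproduce the sentences shown above: the class names, the tiny four-word lists, and the resulting 13-bit width are all assumptions chosen to keep the example short. The general idea is that the integer is cut into bit fields, the top field becomes the leading count, and each remaining field indexes into one word list.

```python
class SentenceEncoder(object):
    """Base class: maps an integer to "<count> <word> <word> ..." and back.

    Subclasses supply `count_bits` and `word_lists`. Each word list is
    assumed to have a power-of-two length so it uses a whole number of bits.
    """
    count_bits = 0     # Bits used for the leading number
    count_offset = 2   # Added to the count so sentences start at "2 ..."
    word_lists = ()    # One list per word slot, in sentence order

    @staticmethod
    def _width(words):
        # A list of 4 words needs 2 bits, 64 words need 6 bits, etc.
        return len(words).bit_length() - 1

    def total_bits(self):
        return self.count_bits + sum(self._width(w) for w in self.word_lists)

    def encode(self, number):
        if not 0 <= number < (1 << self.total_bits()):
            raise ValueError('number does not fit in %d bits'
                             % self.total_bits())
        parts = []
        # Peel fields off the low end, so the last word slot comes first
        for words in reversed(self.word_lists):
            width = self._width(words)
            parts.append(words[number & ((1 << width) - 1)])
            number >>= width
        # Whatever is left is the count field
        parts.append(str(number + self.count_offset))
        return ' '.join(reversed(parts))

    def decode(self, sentence):
        parts = sentence.split()
        number = int(parts[0]) - self.count_offset
        for word, words in zip(parts[1:], self.word_lists):
            number = (number << self._width(words)) | words.index(word)
        return number


class ToySentenceEncoder(SentenceEncoder):
    """Toy subclass: 5 count bits + 4 slots of 2 bits each = 13 bits."""
    count_bits = 5
    word_lists = (
        ['exultant', 'excited', 'flippant', 'sleepy'],    # adjectives
        ['leopards', 'platypuses', 'snakes', 'otters'],   # nouns
        ['strode', 'babbled', 'traveled', 'danced'],      # verbs
        ['intensely', 'wearily', 'sadly', 'gladly'],      # adverbs
    )


encoder = ToySentenceEncoder()
sentence = encoder.encode(0)  # "2 exultant leopards strode intensely"
```

Supporting a wider integer type in this sketch only requires another subclass with longer word lists or more word slots; the base class does not change.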

And before you wonder where you can get the code, it's available as a gist and for your viewing pleasure is included here.