Watch Those Mocks

I like writing tests. That’s obviously not surprising these days. However, when it comes to writing tests, especially in larger applications where you might be interfacing with an external service, it often feels like it’s just a matter of time before you want to start using mock objects. Mocks are great, but I’ve found that they can lead to some problematic situations if you aren’t careful. One such situation I’ve seen is caused by how you use stub objects in your tests. In Mocha, a Ruby mocking framework, you can do something like this:

my_mock = stub(:foo => "stuff", :bar => "other stuff")

If you aren’t familiar with the above syntax, it simply creates an object that responds to the methods ‘foo’ and ‘bar’ by returning the strings ‘stuff’ and ‘other stuff’ respectively. This is a handy thing to have available. It’s concise, and it’s easy to throw around in your tests when you need to quickly stub out an interface. For instance, let’s say you have the following class:

class ComplexObject
  # ....methods and such....
  def method_that_does_stuff
    # doing some stuff
  end

  def another_method
    # more stuff
  end
end

Perhaps ComplexObject is an object that gets returned from some method that interacts with an external service, and it is populated with data from that service. In your test you might not want to actually go to the external service, so instead you decide to mock out that interaction. Great. To do this, you put the following in your test:


my_fake_response = stub(:method_that_does_stuff => "Mocked response", :another_method => "foo")
ServiceWrapper.expects(:get_complex_data).returns(my_fake_response)

In other words, you create a stub object which mocks out the interface on the ComplexObject class to simulate its behavior. I’ve seen this numerous times and have been a victim of it myself. However, this is problematic.

If you use a stub object like this, you open yourself up to a situation in which your tests continue to pass even if there are problems in the code. For example, what if you change the name of one of the methods on the ComplexObject class? With the stub object approach above, any test using that fake stub would continue to pass even though the interface you’re stubbing no longer represents the interface of the ComplexObject class. This means that during a refactoring, you could miss code that still uses the old method names. This is a small example, but even it feels like enough reason for me to approach this situation differently. Instead, when you mock out your interface, you should use an actual instance of the ComplexObject class:
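To see the failure mode concretely, here is a minimal plain-Ruby sketch. It swaps Mocha's `stub` for a `Struct` so it runs without Mocha, and it assumes a hypothetical refactoring in which `method_that_does_stuff` was renamed to `summarize`. The `render` method is a made-up stand-in for production code that missed the rename.

```ruby
# Hypothetical ComplexObject after a refactoring renamed
# method_that_does_stuff to summarize.
class ComplexObject
  def initialize(data)
    @data = data
  end

  def summarize
    "summary of #{@data}"
  end

  def another_method
    "foo"
  end
end

# Hypothetical production code that was missed during the rename.
def render(response)
  response.method_that_does_stuff
end

# A hand-built stub still answers to the old name, so the test stays green...
FakeResponse = Struct.new(:method_that_does_stuff, :another_method)
stale_stub = FakeResponse.new("Mocked response", "foo")
puts render(stale_stub) # prints "Mocked response"

# ...while a real instance exposes the stale call right away.
begin
  render(ComplexObject.new("fake data"))
rescue NoMethodError => e
  puts "caught: #{e.name}" # prints "caught: method_that_does_stuff"
end
```

The stale stub keeps the test passing, while the real `ComplexObject` fails with a `NoMethodError` that points straight at the missed call.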


my_fake_response = ComplexObject.new(fake_data)
ServiceWrapper.expects(:get_complex_data).returns(my_fake_response)

This means that, while you’re mocking out the interface to the service, you are still actually using the response object type that the code is really expecting. This way when you change the ComplexObject class, all your tests will reflect those changes even if you’re mocking out certain interactions.

I think one of the reasons I had leaned toward using stub objects at one point or another was that it might be kind of a pain in the ass to create the actual object type that would be returned (ComplexObject in this case). To avoid that pain in the tests, creating a simple stub just feels nicer sometimes. While it may be easier, I’ve started to feel like doing that just lets you hide some of the complexity in your data models. Instead, if it’s hard to create the objects you want in your tests, you should probably face that problem head-on and make it easier to write the tests you should be writing, so you avoid troublesome situations like the one above.
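One way to take that pain away is a small test-data builder. The sketch below is hypothetical: this `ComplexObject` constructor, its `name`/`items` attributes, and the `build_complex_object` helper are made up for illustration. The builder supplies sensible defaults so each test overrides only the fields it cares about, while still handing back a real instance.

```ruby
# Hypothetical ComplexObject whose constructor wants the whole service payload.
class ComplexObject
  attr_reader :name, :items

  def initialize(attributes)
    @name = attributes.fetch(:name)   # raises KeyError if a field is missing
    @items = attributes.fetch(:items)
  end

  def method_that_does_stuff
    "#{name} has #{items.size} items"
  end
end

# Hypothetical test helper: defaults for every field, overrides per test.
def build_complex_object(overrides = {})
  defaults = { name: "Test Account", items: [] }
  ComplexObject.new(defaults.merge(overrides))
end

response = build_complex_object(items: [1, 2, 3])
puts response.method_that_does_stuff # prints "Test Account has 3 items"
```

In a Mocha test, that would look something like `ServiceWrapper.expects(:get_complex_data).returns(build_complex_object(items: [1, 2, 3]))`.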

O(1) –> O(N) = You’re Screwed

As a software developer, you never want to create bugs in your software. That said, when bugs creep in, they can be a lot of fun to figure out. There has always been a part of me that enjoys figuring out tricky bugs, and recently I came across one of the strangest I’ve ever had to deal with.

It all started when one of the applications in a larger system started experiencing serious performance problems. I’m talking application-coming-to-a-complete-halt kind of performance problems. Requests were hanging and users were getting errors as every call to a key back-end service timed out. The logs showed that one user seemed to be the root cause of the issue, but we couldn’t fathom why their data set would cause this kind of issue. They had a good amount of data, but nothing outrageous, and we had done tons of testing with much larger data sets without any sort of performance problem. We did not have direct access to the production data, so at first we were stumped. Was there some issue with the structure of their data set? Was the data corrupt in some odd way? Was there an obscure bug in the back-end service somewhere?

We knew the general area where the time was being spent: the service was putting all of the user’s data into a hash table it maintains in memory. We looked at the code, and it seemed pretty straightforward. There was really only one spot that had any chance of slowing down: collision resolution, when more than one item hashes to the same location in the table. Sure, collisions can happen, but not to the degree we are talking about here. For this sort of problem, almost everything in the user’s data set would need to hash to the same location. Crazy, right? I mean, that should never happen!

Well, it happened…

We were finally able to recreate the user’s data set locally (size-wise, not data-wise) and saw what we couldn’t believe: every single entry was hashing to the same spot in the table. All of them. Remember, this is a hash table. Since collisions were handled by simply maintaining a list of all entries at a particular location, this meant we might as well have been storing the data in a linked list. Our insertion and retrieval times jumped from O(1) to O(n), and for this user that jump from constant to linear time was killing the application. It’s the picture-perfect worst-case scenario for the data structure.
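To make the slide from O(1) to O(n) concrete, here's a minimal sketch of a hash table that resolves collisions by chaining, as described above. It is not the service's actual code: `ChainedTable` is hypothetical, and both hash functions are deliberately trivial. The `bad` table's hash always returns a multiple of the table size, which mirrors what we saw: every entry lands in bucket 0.

```ruby
# Minimal separate-chaining hash table: each bucket holds a list of
# [key, value] pairs, and a lookup scans the list in the key's bucket.
class ChainedTable
  attr_reader :buckets

  def initialize(size, &hash_fn)
    @size = size
    @hash_fn = hash_fn
    @buckets = Array.new(size) { [] }
  end

  def index_for(key)
    @hash_fn.call(key) % @size
  end

  def []=(key, value)
    bucket = @buckets[index_for(key)]
    pair = bucket.find { |k, _| k == key }
    if pair
      pair[1] = value          # existing key: update in place
    else
      bucket << [key, value]   # new key: append to this bucket's chain
    end
  end

  # Returns [value, comparisons] so the cost of each lookup is visible.
  def lookup(key)
    comparisons = 0
    @buckets[index_for(key)].each do |k, v|
      comparisons += 1
      return [v, comparisons] if k == key
    end
    [nil, comparisons]
  end
end

good = ChainedTable.new(1_000) { |key| key }         # ids spread across buckets
bad  = ChainedTable.new(1_000) { |key| key * 1_000 } # every hash is a multiple of the size
(1..500).each do |id|
  good[id] = "row #{id}"
  bad[id] = "row #{id}"
end

p good.lookup(500) # prints ["row 500", 1]
p bad.lookup(500)  # prints ["row 500", 500]
```

Both tables hold the same 500 rows, but every lookup in `bad` has to walk one long chain.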

Obviously, this is the fault of the hashing algorithm being used. However, the weird thing was that the algorithm was based on what seemed to be a fairly well-known one: Dan Bernstein’s hashing algorithm. There are a few variations of it, and from the bit of searching around I’ve done, it seems to be known that this algorithm can slip into a “degenerate” situation. So this probably wasn’t the algorithm to use, but what the hell happened?
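For reference, here's a sketch of Bernstein's "times 33 with addition" hash (commonly called djb2). Ruby integers never overflow, so the sketch masks the result to a fixed width to emulate a C unsigned type. The `bits:` parameter, and the way the hypothetical `bucket_for` turns an integer id into bytes (via its decimal string), are assumptions for illustration; the exact way the service fed database ids into the hash isn't covered here.

```ruby
# Bernstein's hash: start at 5381, then hash = hash * 33 + byte for each byte.
# (hash << 5) + hash is the classic shift-and-add way of writing hash * 33.
# The mask emulates an unsigned integer of `bits` width (an assumption).
def djb2(bytes, bits: 32)
  mask = (1 << bits) - 1
  bytes.reduce(5381) { |hash, byte| ((hash << 5) + hash + byte) & mask }
end

# Hypothetical: turn a key into a bucket index for a table of `table_size`.
def bucket_for(key, table_size)
  djb2(key.to_s.bytes) % table_size
end

p djb2("a".bytes)          # prints 177670 (5381 * 33 + 97)
p bucket_for(42, 131_769)  # some index in 0...131_769
```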

Well, remember when I said we had tested data sets bigger than this user’s set? Why didn’t those show the problem? Because we needed a data set of exactly the size this user happened to create. Allow me to give the nitty-gritty details:

- The user had a data set with 123,577 entries of a particular type.
- The service in question allocated a hash table of size 131,769 (the user’s data size plus a bit extra).
- It then began hashing all of the user’s data, using the database id of each field as the key to the hash table.
- Given a value back from the hash, it then took that value modulo 131,769.
- The resulting value, which is supposed to be that entry’s location in the hash table, was 0. For every entry.

To put it simply: every hash value coming out of the algorithm had 131,769 as a factor, so every value hashed to location ‘0’. I don’t want to make bold statements about the root cause or nature of this behavior yet (that’s another post), but our initial experiments showed that this was always the case for any integer value going into the algorithm that fit within 32 bits but was stored in a 64-bit data type. We figure it has to do with the bit shifting used in the algorithm and the fact that it never shifted certain bits off the end (the extra 32 bits gave enough room). But again, more on that to come, I hope!

As an aside, since it exhibited this behavior for 131,769, it’s worth noting that the same behavior holds true for any factor of 131,769 as well (not that this should be surprising: a multiple of 131,769 is also a multiple of each of its factors). This was a good thing to remember, as it highlighted other values that could cause much worse performance than expected. I hope to get into more details of this situation and algorithm in the near future, but for now I just wanted to tell the story and point out the potential for chaos when using Bernstein’s times-33-with-addition hash!
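The factor point is easy to check with a few lines of Ruby. 131,769 happens to be 3² × 11⁴ (that is, 363²), so it has 15 divisors. The sketch below lists them and confirms that values which are multiples of 131,769 (hypothetical stand-ins for the hash outputs we saw) land in bucket 0 for a table of any of those sizes. It makes no claim about why the hash produced such values in the first place.

```ruby
TABLE_SIZE = 131_769 # = 3**2 * 11**4 = 363**2

# All divisors of n, by trial division up to the integer square root.
def divisors(n)
  small = (1..Integer.sqrt(n)).select { |d| (n % d).zero? }
  (small + small.map { |d| n / d }).uniq.sort
end

# Hypothetical stand-ins for hash outputs that all have 131,769 as a factor.
hashes = (1..5).map { |k| k * TABLE_SIZE }

divisors(TABLE_SIZE).each do |size|
  buckets = hashes.map { |h| h % size }.uniq
  puts "table size #{size}: buckets used #{buckets.inspect}" # always [0]
end
```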

Sometimes you just gotta prove it to yourself…

Not long ago I was in Las Vegas spending some time in casinos. Mostly I like playing poker, but deep down there’s a part of me that just likes to gamble a little bit. Now, most people are familiar with casino games and you should know that casinos are not dumb. Any table game played straight up is designed so that the house will win over the long run. That’s no secret. However, even knowing that, it is easy to convince yourself that you might be able to get a slight edge on the house. My friends and I briefly thought we had such a strategy when we were tossing around ideas to bring to the roulette table over a few drinks. Here was our reasoning:

Roulette consists of 38 distinct spots on the wheel: the numbers 1 through 36, 0, and 00. We decided we could bet $30 on each spin, covering a total of 20 spots on the board: 10 spots with $1 each and another 10 spots with $2 each. A straight-up bet pays 35 to 1 and you keep the chip that won, so if one of our $1 spots hit, we would get back $36, and if one of our $2 spots hit, we would get back $72. Notice that each case yields some sort of profit ($6 in the first and $42 in the second). Given that we are covering 20 of the 38 spots, that gives us about a 52.6% chance to win money on any given spin. So, for each spin, we have a slightly better than 50% chance to win money. That means that over the long run we’re bound to turn a profit, right?

Wrong.

I later explained the idea to a friend who knew it shouldn’t work out that way, but at first, neither of us could put our finger on exactly why it wouldn’t work. Both of us being programmers, we decided to settle it the obvious way: write a simulator and see how it plays out in the long run. We put 25 minutes on the clock and each hacked together our own simple simulator that let us try different strategies over a large number of spins of the wheel. Once the 25 minutes were up and our simulators were finished, we saw what anyone would expect: no matter how we organized the bets on the numbers, we were always losing. With the programs confirming it, we almost immediately saw what we had been missing, and my friend outlined the solution. The key is to look at the problem a bit differently and to really take the losses into account.

If we use the same strategy outlined above, we can look at the problem this way:
There are 10 spots which will yield a profit of $6: 10 × 6 = 60
There are 10 spots which will yield a profit of $42: 10 × 42 = 420
There are 18 spots which will yield a loss of $30: 18 × 30 = 540
Now take 480 - 540 = -60
Spread over the 38 equally likely spots, that is -60/38, or about -$1.58 on every $30 spin.

See how the number is negative? That’s not good for us. It shows that even though we have a 52.6% chance to win a profit on any given spin, those wins will not make up for the size of the losses we will endure the 47.4% of the time that we are bound to lose in the long run. In other words, the house wins again. While I had no real intention of proving otherwise, I found the whole exercise interesting, and it was cool to let the code do the talking when we couldn’t quite settle the issue.

The simulator code used can be seen here.
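The original simulator is linked above. As a separate minimal sketch (not that code), the following models the strategy with a fixed random seed and also computes the exact expected value. It counts the stake that comes back with a winning straight-up bet (paid at 35 to 1), so a $1 hit nets $6 and a $2 hit nets $42. Wheel spots are numbered 0 to 37 here, with 37 standing in for 00.

```ruby
WHEEL_SIZE = 38 # spots 0..37; 37 stands in for "00"
PAYOUT = 35     # a straight-up bet pays 35 to 1

# The strategy: $1 on each of 10 spots and $2 on each of 10 more.
BETS = Hash.new(0)
(0...10).each { |spot| BETS[spot] = 1 }
(10...20).each { |spot| BETS[spot] = 2 }
TOTAL_STAKE = BETS.values.sum # $30 on the table every spin

# Net result of one spin that lands on `spot`.
def net_for(spot)
  stake = BETS[spot]
  # A winning chip is paid 35x and comes back itself; all other chips are lost.
  returned = stake.zero? ? 0 : stake * PAYOUT + stake
  returned - TOTAL_STAKE
end

# Exact expected value: every spot is equally likely.
exact_ev = (0...WHEEL_SIZE).sum { |spot| net_for(spot) }.to_r / WHEEL_SIZE

# Monte Carlo check with a fixed seed so runs are repeatable.
rng = Random.new(42)
spins = 1_000_000
total = 0
spins.times { total += net_for(rng.rand(WHEEL_SIZE)) }

puts "exact EV per spin:  #{exact_ev} (about #{exact_ev.to_f.round(2)})"
puts "simulated per spin: #{(total.to_f / spins).round(2)}"
```

Both numbers come out around -$1.58 per $30 spin, i.e. (6·10 + 42·10 - 30·18) / 38 = -60/38, which is the familiar 5.26% house edge of a double-zero wheel. Rearranging which spots get $1 or $2 doesn't change that.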

What I’ve Learned: Data Hashing

I’m no security expert. I know the stuff that everyone knows, things like “don’t store your users’ passwords in plain text”. With that said, I recently found myself writing some simple authentication code for a side project. I wanted to do things right, so I made sure I never stored my users’ passwords as-is: I took the SHA2 hash of each incoming password and even used a password salt! I felt like I had things under control. However, soon after completing the task I came across some resources that made me think otherwise.

The interesting thing about most hashing algorithms is that they are designed to be fast. That seems simple enough, and at first glance I found myself wondering “how else would I want them to behave?”. But in actuality, this is the opposite of how you want an algorithm used to store sensitive information to behave. Why? Because the faster you can hash values, the faster someone can crack the information you are trying to keep safe. This is what I realized when I discovered Bcrypt.

Bcrypt is a hashing algorithm that is designed to be slow. And it isn’t just somewhat slow: you can configure how slow you want it to be. The algorithm has a built-in, configurable cost factor that determines how slowly values are hashed; the cost is a base-2 exponent, so each increment doubles the work. How does it work? Let’s take a look at the Ruby implementation in the bcrypt-ruby gem. For instance, if I wanted to create a password hash with a cost other than the default of 10, I could simply do:

require "bcrypt"
BCrypt::Password.create("mypassword", :cost => 20)

It kind of blows my mind how simple the idea is, and how nice the result is. I can literally tweak a number in my code and change how secure my users’ information is. It makes me wonder why I hadn’t heard of and used Bcrypt already, since it seems a little silly to me to use anything else right now. Regardless, moving forward, I think I’ll be using Bcrypt…

What I’ve Learned: TCP Sockets in Ruby

While I was in college studying Computer Science, communicating via sockets always felt like one of those odd low-level things that most programmers wouldn’t have to work with these days. I didn’t have any actual grounds for thinking that way, but I did. It’s fitting, then, that my first project out of school is one that involves working with this type of communication frequently. There are times when you might need to use a socket to communicate with another part of your system. For instance, you might have a back-end service you want to talk to in order to avoid doing some work yourself. Depending on your architecture, a TCP socket might be the way to do this. I’m going to share some of the things I’ve learned from working in a Rails app that needs to communicate with a variety of services over TCP sockets. If you have a lot of experience in this area, you probably won’t find much new here. However, if you have found yourself writing a bit of code involving sockets and you are new to it, my hope is to give you a couple of pointers, or things to look for in your code, to help keep tricky bugs from creeping into your software. So, let’s get started with an obvious one:

1. Timeouts

You should always wrap your socket operations in a timeout. If you need to interface with another service, you should program defensively and allow your application to maintain as much functionality as possible if that service becomes unavailable or begins to perform poorly. This is a general rule and doesn’t just apply to sockets, but sockets are an easy spot where your application could start to hang indefinitely if you start reading from a socket which might never have a response written to it. This type of problem could create serious issues in a production environment, but it’s also easy to fix. Ruby has a built-in timeout library, and there are also gems like SystemTimer that let you wrap bits of code in a timeout. NOTE: If you are using Ruby 1.8, you should NOT use the built-in Ruby library. In this case, just use something like SystemTimer.
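Here's a minimal sketch of wrapping a socket round trip with Ruby's standard `Timeout` module, assuming Ruby 1.9 or later (per the note above, on 1.8 you'd reach for SystemTimer instead). The `ask` helper and the deliberately silent local server are hypothetical, just enough to show the timeout firing instead of the read hanging forever.

```ruby
require "socket"
require "timeout"

# Hypothetical helper: open a connection, send one line, and wait at most
# `seconds` for a one-line reply. A stalled service raises Timeout::Error.
def ask(host, port, request, seconds: 2)
  Timeout.timeout(seconds) do
    socket = TCPSocket.new(host, port)
    begin
      socket.puts(request)
      socket.gets
    ensure
      socket.close # release the socket even when the timeout fires
    end
  end
end

# Demo: a local server that accepts connections but never answers.
server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.addr[1]
held = [] # keep accepted sockets referenced so they stay open
Thread.new { loop { held << server.accept } }

begin
  ask("127.0.0.1", port, "What is 5 + 5?", seconds: 0.5)
rescue Timeout::Error
  puts "service timed out; degrade gracefully here"
end
```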

2. Connection Types

It’s worth taking some time to figure out what type of connection you need to maintain with whatever you are communicating with. What’s important to keep in mind is whether your connection will be persistent, or whether a new connection will be opened for each request that needs to be made. For the sake of simplicity, I would highly recommend a non-persistent connection. Maintaining a persistent connection opens up a new level of complexity, and as a result, a new level of bugs has the chance to appear in your code. You may not really have a choice in the matter, though. If you find yourself in a situation where you need to maintain a persistent connection to a service, then you have another issue to keep in mind…

3. Connection Handling

As soon as you find yourself maintaining a connection across multiple requests, you need to start paying close attention to how that connection is handled in different situations. The most important thing to keep in mind is what happens when an exception is raised while you are using your socket. Since I’ve already discussed timeouts, let’s consider that example. What should happen when a timeout error is raised and you have a persistent connection to the service that timed out? You should close the connection! Why? Well, let’s take a closer look at what could happen if you don’t:

User A makes a request to our math service and asks a particularly complex question: “What is 5 + 5?” Our service begins working hard on the answer, but it takes too long. Being smart programmers, we have a timeout on our interactions with the service, so after a few seconds we raise an error and show it to User A. Now User B comes along and asks a question of the same math service, using the same instance of our application that User A used. User B asks “What is 10 * 5?” This time there isn’t a timeout. User B gets an answer almost immediately, and the answer given back is “10”.

What happened? User B got the response that was meant for User A! Since we didn’t close the connection to our service after we timed out, the service went on happily finishing User A’s question and put the response back on the pipe for us to read. But it wasn’t User A who got it, it was User B: the next user to read from that socket. So what happened to User B’s response? It’s probably waiting to be read by the next user who comes along to that instance of the application, which now has a very dangerous problem: the responses from the math service are in an “off by one” sort of state. Imagine if the responses weren’t trivial arithmetic answers but were instead your personal data or some other very sensitive information. Releasing one user’s sensitive information to a different user would be very bad, but even short of that, we have an instance of our application stuck in a constant error state that likely won’t correct itself anytime soon.

This is one example that illustrates a simple rule I’ve learned: close your socket connection in the event of any error caused by the use of your socket connection or the supporting code, and let the next request that comes along open a new connection. Notice the emphasis on the word “any”. Don’t be shy about rescuing exceptions in the code around your socket connections. In other areas, I’ve found it useful to avoid rescuing too broad a range of exceptions, because we don’t want to get into situations where we are masking or hiding exceptions by doing something like “rescue Exception” around all our code. However, when it comes to handling errors around a persistent socket connection, I would say almost the opposite: make sure you are rescuing any exception that could be raised so that your connection can be cleaned up. Obviously you should either re-raise the error you rescue or raise another exception in its place after logging any information from the original error, but making sure you have the connection handling in place is very important, and it only takes one missing exception type in your rescue block to get you into a dangerous state like the one described above.
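Here's a sketch of that rule applied to a persistent connection. `PersistentClient` and the toy math server are hypothetical: the server answers every line with `answer to <question>`, but stalls for a second on anything containing "slow". Because the client rescues every exception, closes the socket, and re-raises, User B's request opens a fresh connection and can never read the late answer meant for User A.

```ruby
require "socket"
require "timeout"

# Hypothetical client that keeps one connection open across requests.
class PersistentClient
  def initialize(host, port)
    @host = host
    @port = port
    @socket = nil
  end

  # Sends one line and returns the one-line reply (without the newline).
  def request(line, seconds: 1)
    Timeout.timeout(seconds) do
      @socket ||= TCPSocket.new(@host, @port) # reuse the open connection
      @socket.puts(line)
      reply = @socket.gets
      raise IOError, "service closed the connection" if reply.nil?
      reply.chomp
    end
  rescue Exception => e
    # Deliberately broad: after ANY failure (Timeout::Error included) we no
    # longer know what is sitting unread on the socket, so throw it away...
    reset!
    raise e # ...and still let the caller see the original error.
  end

  private

  def reset!
    @socket&.close
  rescue IOError, SystemCallError
    # closing an already-broken socket can fail; it's being discarded anyway
  ensure
    @socket = nil
  end
end

# Toy math service: replies "answer to <question>", but stalls on "slow" ones.
server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.addr[1]
Thread.new do
  loop do
    Thread.new(server.accept) do |conn|
      begin
        while (line = conn.gets)
          sleep 1.0 if line.include?("slow")
          conn.puts("answer to #{line.chomp}")
        end
      rescue IOError, SystemCallError
        # the client hung up mid-conversation
      ensure
        conn.close
      end
    end
  end
end

client = PersistentClient.new("127.0.0.1", port)
begin
  client.request("slow: what is 5 + 5?", seconds: 0.2) # User A
rescue Timeout::Error
  puts "User A: timed out"
end
puts "User B: #{client.request('what is 10 * 5?')}"
# prints "User B: answer to what is 10 * 5?" because a fresh connection was opened
```

Remove the `reset!` call and User B would instead wait for, and receive, User A's late answer.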

I’m intentionally keeping the path of error handling as simple as possible. This is partially in order to keep this post from growing too much but more so because I’ve found that doing this will likely reduce the number of bugs you will need to deal with. Perhaps I’ll add another post exploring this more and examining a larger list of error situations. However, I will mention that things like trying to reconnect to a service, retrying queries, or trying to flush the connection instead of closing it in the event of an error are all examples of things that add complexity and can get you into trouble very quickly when they aren’t done right.

For now, I’d like to wrap this up with a quick mention of…

4. Testing

You’re going to want to be able to test a variety of troublesome situations with your service to see how your code holds up. This can be tricky to do, depending on the behavior you are trying to bring out in your service. Luckily Dan Wellman, a colleague of mine, has written a Ruby gem to help you find and reproduce exactly the type of situations I’ve been discussing here. The gem is called Bane, and its source and basic documentation can be found here. I’ve used Bane on numerous occasions to reproduce error conditions locally and to do some exploratory testing, and I would recommend keeping it in your back pocket for when you come across a tricky situation in the future.

Cards pushed to GitHub!

A little while ago I came up with an idea for a side project. I decided I wanted to write my own Ruby library to use for poker hand analysis. I wanted to write something that would allow you to compare poker hands against each other and which would allow you to find the best possible hand given some arbitrary set of cards. To give an example, below are a couple of tests which might give a very basic idea of the type of behavior I’m describing:

def test_straight_flush_against_other_straight_flush
  losing_straight_flush = straight_flush_from("10 Clubs", "6 Clubs",
                                              "7 Clubs", "8 Clubs", "9 Clubs")
  winning_straight_flush = straight_flush_from("8 Clubs", "9 Clubs",
                                               "10 Clubs", "J Clubs", "Q Clubs")

  assert winning_straight_flush > losing_straight_flush
end

def test_should_get_best_straight_from_more_than_5_cards
  hand = best_possible_hand_from("7 Diamonds", "8 Spades", "10 Clubs",
                                 "J Spades", "9 Spades", "Q Hearts", "K Spades")

  assert_equal Game::Straight, hand.class
  assert_equal ["K", "Q", "J", 10, 9], hand.ranks
end
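The second test boils down to searching every 5-card combination for the best one. Here's a minimal sketch of that search for straights only, independent of the Cards library's actual code: `RANK_VALUES`, `rank_value`, `straight_high`, and `best_straight_from` are hypothetical names, and cards use the same "rank suit" strings as the tests.

```ruby
# Hypothetical sketch, not the Cards library's real implementation.
RANK_VALUES = { "J" => 11, "Q" => 12, "K" => 13, "A" => 14 }.freeze

# "10 Clubs" -> 10, "K Spades" -> 13
def rank_value(card)
  rank = card.split.first
  RANK_VALUES.fetch(rank) { Integer(rank) }
end

# High-card value of a 5-card straight, or nil if the values don't form one.
def straight_high(values)
  sorted = values.uniq.sort.reverse
  return nil unless sorted.size == 5     # a pair can't be part of a straight
  return 5 if sorted == [14, 5, 4, 3, 2] # ace-low "wheel"
  sorted.first - sorted.last == 4 ? sorted.first : nil
end

# Best straight among all 5-card subsets of the given cards, or nil.
def best_straight_from(*cards)
  straights = []
  cards.combination(5) do |five|
    high = straight_high(five.map { |card| rank_value(card) })
    straights << [high, five] if high
  end
  best = straights.max_by { |high, _| high }
  best && best.last
end

p best_straight_from("7 Diamonds", "8 Spades", "10 Clubs",
                     "J Spades", "9 Spades", "Q Hearts", "K Spades")
# prints ["10 Clubs", "J Spades", "9 Spades", "Q Hearts", "K Spades"]
```

Trying every `combination(5)` subset is brute force, but with 7 cards that's only 21 hands, so clarity wins over cleverness here.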

These behaviors (hand comparison and determining the best hand) are just pieces of the end goal for the things I’m interested in creating, but I’ve found the work I’ve done so far to be a lot of fun and interesting. As a result, this evening I pushed the current state of my work out to GitHub in the hope that other people will have the chance and interest to look at it in the time ahead and provide feedback and improvements.

There is no actual documentation yet but hopefully that will change as the use of the library is more clearly defined. For now, check out the code here and keep an eye on the project as I plan to continue regular work on it in the weeks ahead.

RawkX - version 0.2.0!

So, in my last post I discussed my new gem, RawkX. If you remember, to use it, I told you to open up the RawkX class and override a method called ‘parse_line’. This was because that method was called on each line of your log, and you could use it to tell RawkX how to parse your special format.

But let’s be honest, that’s a little awkward.

Why? Because I’m forcing you to see a bit more about how RawkX works than you really need to know. You shouldn’t care what the method for parsing a line is called. You shouldn’t need to know what class it’s in. What if I change the method name? Your parsing would break when you updated versions. It wouldn’t be your fault; it would just be the result of me exposing too much to the user.

The good news is version 0.2.0 is an attempt to fix, or at least improve, that issue.

Now, to specify how to parse your logs, you can simply pass RawkX a block containing the logic to handle your log. Let’s take a look at an example. Below is the same parsing logic from my previous example, made to work as version 0.2.0 intends:

require "rubygems"
require "rawkx"

RawkX.new do |line|
  fields = line.split
  [fields[1], fields[0]]
end

To me, this feels cleaner and safer. This block will be used in the same manner as the parse_line method was: it gets called for each line of the log, and it is expected to return a key/value pair (or nil if the line is to be ignored). As with the previous version, if you neither pass a block nor override the method, RawkX falls back to standard Rails log parsing.
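If you're curious how a class can take an optional block and fall back to a default, here's a small sketch of the pattern. `MiniRawk` is a hypothetical stand-in, not RawkX's actual source, and its `default_parse` is only a placeholder for real Rails log parsing. It feeds the sample log from the original RawkX post through the same block shown above and averages the times per key.

```ruby
# Hypothetical stand-in for RawkX's design; not its actual source.
class MiniRawk
  def initialize(&parser)
    # Use the caller's block if one was given, otherwise the default parser.
    @parser = parser || method(:default_parse)
    @times = Hash.new { |hash, key| hash[key] = [] }
  end

  def feed(lines)
    lines.each do |line|
      key, value = @parser.call(line)
      next if key.nil? # a nil result means "ignore this line"
      @times[key] << Float(value)
    end
    self
  end

  def averages
    @times.transform_values { |values| values.sum / values.size }
  end

  private

  # Placeholder for the real default (Rails log) parsing: ignores everything.
  def default_parse(_line)
    nil
  end
end

log = ["3.24 Read user1", "5.0 Write user1", "2.245 Read user2", "3.4 Read user3"]
report = MiniRawk.new do |line|
  fields = line.split
  [fields[1], fields[0]]
end.feed(log)
p report.averages # Read averages about 2.96 seconds, Write 5.0
```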

Version 0.2.0 has been released to rubygems.org, and, again, the source code for RawkX can be found here.

RawkX - An Extendable Rawk!

If you’re familiar with Rails, there is a good chance you’ve heard of or used Rawk. Rawk is a log analyzer that generates statistics about your application’s different controller actions. I won’t go into details here, but it can be a useful bit of feedback to see a listing of some of your worst requests and how long they take to complete compared to others.

Recently at work we decided we wanted a Rawk-like report for another service our application was using, which was written in-house, so we could observe its performance. I was mildly familiar with Rawk, so I brought up the Rawk source code and started looking through it to see what pieces we could reuse. Without much thought I began pulling pieces over, starting with the most obvious ones and working my way down. It didn’t take very long to get what I wanted, but when all was said and done, it sank in that I had ended up using almost all of Rawk’s original behavior and had simply changed how it parsed log lines. As any developer should, I realized I had created a significant amount of unnecessary duplication and felt compelled to find a way to avoid having to do it again.

Hence, the idea for RawkX was born, and a couple of nights later a first version was published.

RawkX is simple. It is a slight refactoring of Rawk to allow it to be easily extended with different parsing logic. In short, the goal is to give you an easy way to generate Rawk reports for any log format without needing to think about anything besides how to parse your log.

For me, RawkX is also something of an exercise, and a reason to play around with building a gem for something I have actually had some use for. So maybe some of you will find it useful too!

Enough of that, though. How do I get it, and how does it work?

There are no surprises or secrets to installing the gem: gem install rawkx

Next, let’s say you have a log of actions that looks something like this:

3.24 Read user1
5.0 Write user1
2.245 Read user2
3.4 Read user3

Not complicated, but perhaps you had a large number of these actions and you wanted to generate a report to see their performance. With RawkX, that’s pretty simple. Just create a Ruby file which does the following:

require "rubygems"
require "rawkx"

class RawkX
  def parse_line(line)
    fields = line.split
    return fields[1], fields[0]
  end
end
RawkX.new

Let’s look at this for a second. We are reopening the RawkX class and redefining the parse_line method. This method receives a line from the log and is expected to return a key/value pair. The key is the action or item you want to measure, and the value is the time it took to complete. Nothing impressive, but a few things to note:

  • The values returned must be in the order used above: key first, then the value/time
  • Only two values are expected to be returned currently.
  • If you want a particular line, or some type of line, to be ignored simply return nil and RawkX will move on to the next line without doing anything further.
  • The parse_line method is called for each line of the log. If you want to save values between lines, you can do so using something like instance variables to remember things from previous lines (see how the default implementation of parse_line is done).
  • If you use RawkX without overriding this method, it will use the default Rails log parsing.
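As an illustration of those rules, here's a slightly more defensive `parse_line` that skips lines it can't parse and keeps a count across calls in an instance variable. It lives in a hypothetical `LineParser` class so it can run without the gem; in practice you'd put the method inside the reopened `RawkX` class as shown above. It assumes lines start with the time followed by the action, as in the sample log.

```ruby
# Hypothetical stand-in class; with the gem you'd reopen RawkX instead.
class LineParser
  # A time like "3.24" or "5", whitespace, then the action name.
  LINE = /\A(\d+(?:\.\d+)?)\s+(\S+)/

  attr_reader :skipped

  def initialize
    @skipped = 0 # state kept between calls, as the notes above allow
  end

  def parse_line(line)
    match = LINE.match(line)
    unless match
      @skipped += 1
      return nil # blank, comment, or malformed line: RawkX moves on
    end
    [match[2], match[1]] # key (the action) first, then the time
  end
end

parser = LineParser.new
p parser.parse_line("3.24 Read user1") # prints ["Read", "3.24"]
p parser.parse_line("# a comment")     # prints nil
```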

You can now run this file and pass in your log information (either through stdin or using the -f parameter to specify a file):

ruby [filename] -f mylog.log

ruby [filename] < mylog.log

cat mylog.log mylog2.log … | ruby [filename]

Doing this should show a Rawk report for your log. I’m not going to go into details about the report. It should be fairly obvious and familiar if you’ve ever used Rawk.

Other options available when running RawkX reports are:

-w <count>

This is how many results you want displayed in the final “Worst Requests” list

-s <count>

This is how many results you want displayed in the results for the other report areas.

And, for the time being, that’s pretty much it. Version 0.1.0 has been pushed out and is available to download at rubygems.org. I have some other things I want to try with this tool, but this seemed like the minimum required to be useful, so I wanted to get it released so I would feel like I had accomplished something!

Also, the source code is posted on GitHub here.

As always, any feedback is welcome, and if you ever end up using it, let me know what you think!

Learning Ruby

So, it’s been a while since my last post. I recently graduated from college and accepted a job working for Cyrus Innovation. The first project I will be joining is a Ruby on Rails project, so I’ve spent some time lately learning a bit more about Ruby. To help with this, I’ve started making attempts at solving some Ruby Quiz problems, amongst other things. I decided to make a new page where I can start posting some of my solutions, and maybe other Ruby stuff, as I continue to learn. I’m very excited about everything that’s going on in my life right now, so hopefully I can start posting more frequently as I get settled. In the meantime, check out my solutions and tell me all the things I did wrong!

More JavaScript Fun

So, I’ve seen a variety of web pages with components that can be freely dragged around the page, like when Facebook used to let you freely drag and rearrange some of your profile boxes. Last night I somewhat randomly decided to find an easy way to do this kind of component dragging with JavaScript. Sure enough, I immediately came across a very simple jQuery plugin called EasyDrag. After finding it, I suddenly decided I wanted to make something with it. All I wanted to do tonight was create something simple enough that I could finish at least a very basic example in one quick sitting, so I decided to create a trivially simple puzzle. The puzzle itself is pretty self-explanatory. There are 4 colored triangles which start on the left side of the screen. The goal is to drag them onto their matching grey triangles in the middle of the screen. Once a triangle is correctly placed, it cannot be moved again until the page is refreshed. Yes, this is about on par with puzzles for your average 1-year-old. Oh well, I’ll create something more interesting once I get a bit more time.

The puzzle itself can be found here. My JavaScript code can be found here.