Online coding assessments are all the rage for certifying a software developer’s skills during the hiring process. Don’t get me wrong, this is a much better system than what we used to have. A decade ago, all we had were red-flag job ads, whiteboard coding, and buzzword bingo. Back then, oh, what I would have given for a chance to write some actual code, even on a toy problem, just to demonstrate my programming prowess. These systems really are an improvement, and we’re on the right track.
The best-known system is HackerRank’s, but there are several others. I’m picking on HackerRank in the title, but my comments apply to all online coding assessments, HackerRank included. They all work in roughly the same way: they usually involve a number of multiple-choice questions followed by a couple of programming challenges. In the programming challenges, you can generally choose from a list of programming languages, and your solutions will be judged against pre-written test cases, some of which you might not have access to.
These assessments are much better than what came before, but they still suffer from fundamental pitfalls that you ought to be aware of, whether you’re a job-seeker or a hiring manager. They’re saddled with the fallout from an educational system that teaches memorization and compliance (rather than innovation and initiative), and they probably don’t reflect what you want to be looking for in a software engineer.
Bugs in the test itself
Most fundamentally, there are often bugs in the test itself. I hope I don’t have to go into too much detail about why this is a problem.
These pop up with dismaying frequency. The designers of these evaluations are clearly not asking enough experienced engineers to solve each programming problem in each of the available languages before deploying the tests to production. One might argue that a programming problem is a programming problem, but this is not so: some problems are intrinsically easier to implement in some languages and intrinsically harder in others.
And requirements can be incredibly difficult to get right. In the real world, we get to go back to the customer, project manager, or business analyst and ask for clarification when requirements are vague, but in a programming assessment, the problem statement must be complete from the start. A surprising number of these problems fail to define edge cases or how the solution should behave with invalid data. One quiz problem I recently solved wanted me to return an empty string in certain invalid cases, and fortunately, the test failure clued me in to that undocumented requirement. If a behavior isn’t a stated requirement, any reasonable implementation of it should be accepted. But if a test case fails, that counts against you.
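Here is a minimal sketch of what a fully specified problem might look like. The function `initials` and its edge-case rules are hypothetical, invented for illustration; the point is only that the “return an empty string” behavior is written into the contract up front instead of being discovered through a failing test.

```python
def initials(full_name):
    """Return the uppercase initials of a name, e.g. "ada lovelace" -> "AL".

    Edge cases (hypothetical rules, stated up front):
    - None, an empty string, or whitespace only -> "" (empty string)
    - any word containing a non-letter character -> ""
    """
    if full_name is None:
        return ""
    words = full_name.split()
    if not words:
        return ""
    # Reject input such as "r2 d2": every word must be letters only.
    if not all(word.isalpha() for word in words):
        return ""
    return "".join(word[0].upper() for word in words)


print(initials("ada lovelace"))  # AL
print(repr(initials("r2 d2")))   # ''
```

With the invalid-input behavior in the docstring, a candidate can implement it deliberately, and a failing test points to a real mistake rather than a missing requirement.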
The problem of test bugs is especially pernicious in the multiple-choice–question portion of these evaluations, where questions are often ambiguous or answers are misspelled. One Git evaluation I recently took asked which path I would use to refer to a given branch explicitly. I think the correct answer was supposed to be refs/head/branch-name. (Hint: that won’t work if you’re actually using Git, where the path is refs/heads/branch-name.) When half of the questions already ask you to spell-check code by hand as if you were the compiler, typos like this critically damage the statistical power of these tests.
Assessments should be fully proofread, and challenges should be beta-tested by experienced developers in each available programming language.
If you’re recruiting a Chief Engineer who can figure out how to save the Starship Enterprise before it explodes (again), then maybe problems with time limits are a useful evaluation device. But we all know how poorly the Enterprise was designed; it seemed built to promote crisis and drama and to blow up easily. I mean, hell, the holodeck goes off the rails and threatens to kill people more times than I can count off the top of my head, and no one ever thought to outfit the stupid thing with an off switch.
As a job-seeker, if you want to be part of a team that promotes crisis and drama, then by all means, look for a team that considers a timed test essential.
But for most of us, timed tests just cause needless stress and frustration. These tests should state an expected time limit, as in “this won’t take more than an hour of your time.” Instead, they are designed to ding you points if you can’t read the author’s mind before the time elapses, and the problems often seem chosen to consume all the allotted time. On top of that, because of the way these problems are defined, there usually isn’t any way to build a partial implementation that tells you you’re on the right track. So you spend ⅔ of the time coding up your implementation and the other ⅓ trying to get it to work, because going back and starting from scratch is simply not an option.
In real life, you sometimes have deadlines (although most deadlines are artificial and can be easily negotiated). But you almost never have to get such-and-such an implementation working within the next 60 minutes or else people will die. Real-life systems and processes are in fact designed to avoid those sorts of dramatic crisis moments. If you’re a hiring manager, you might want to use some judgment in weighing the results from these tests, especially if an applicant runs out of time due to hidden or unreasonable requirements or test cases.
Programming assessments should include a time limit only as an expected maximum, and any reasonably competent programmer should be able to successfully solve the problem in a fraction of that time using any of the available programming language environments.
(And programmers like me spend the rest of that time cleaning up and commenting our code for extra credit.)
Sometimes the authors of these assessment questions and challenges simply can’t resist being too clever, and the assessment ends up testing cleverness more than development skill.
An extreme example of this is one of the Java practice questions on HackerRank, which purports to teach you the visitor pattern but in reality bogs you down in writing code to generate an arbitrarily wide tree from minimal input data (and do so within strict runtime limits). There’s no need for that, as it has zero to do with the visitor pattern. The challenge is 10% visitor pattern, 90% trying to be too clever.
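For contrast, the visitor pattern itself fits in a few lines. This is a minimal Python sketch with hypothetical class names (`Leaf`, `Node`, `SumVisitor`, `LeafCountVisitor`); it is not HackerRank’s exercise code. The tree is simply built by hand, which is all a practice problem about this pattern really needs.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    value: int

    def accept(self, visitor):
        # Double dispatch: the element tells the visitor what kind it is.
        return visitor.visit_leaf(self)


@dataclass
class Node:
    value: int
    children: list = field(default_factory=list)

    def accept(self, visitor):
        return visitor.visit_node(self)


class SumVisitor:
    """Adds up every value in the tree."""

    def visit_leaf(self, leaf):
        return leaf.value

    def visit_node(self, node):
        return node.value + sum(child.accept(self) for child in node.children)


class LeafCountVisitor:
    """Counts the leaves in the tree."""

    def visit_leaf(self, leaf):
        return 1

    def visit_node(self, node):
        return sum(child.accept(self) for child in node.children)


# A small tree built by hand: no input parsing required.
tree = Node(1, [Leaf(2), Node(3, [Leaf(4), Leaf(5)])])
print(tree.accept(SumVisitor()))        # 15
print(tree.accept(LeafCountVisitor()))  # 3
```

Adding a new operation means writing a new visitor class while the tree classes stay unchanged, and that separation is the lesson the exercise is nominally about.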
Fortunately, cleverness doesn’t often intrude into assessment challenges as significantly as it does into practice problems. But it does sometimes rear its ugly head. Usually, it shows up as overly complex problems, esoteric algorithms, undocumented requirements, or multiple parts or layers of logic. In the real world, we get to confer with colleagues to figure out complex algorithms, and we get to decompose independent parts and layers of a solution into separate classes or modules and develop them separately. None of this is practical within the confines of an online programming challenge, and being able to force-fit it into those confines does not prove that you’re a good developer.
Programmer assessments should not rely on esoteric algorithms, multi-layer architectures, or anything else that an average developer would consult with colleagues about in the normal course of work.
Hidden test cases
This is the scourge of these programming challenges, as all the assessments I’ve encountered include test cases that you simply cannot access. You don’t know what the test data was, what was expected, or what your code did in response. When these test cases fail, you’re left to guess what went wrong.
It’s as if someone submitted a bug report: “There’s something wrong with your code. I know exactly what it is, but I’m not going to tell you. By the way, I still expect you to fix it.”
If you ever find yourself in an organization that accepts bug reports of that caliber… Leave. Quickly.
In real life, we expect bug tickets to include test-case details. And when they don’t (as they often don’t, despite guidelines to the contrary), we can follow up with the submitter to flesh out the specifics of the alleged misbehavior. (It might not even be a bug. How often are these bug reports actually feature requests in disguise?)
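The gap between an actionable failure report and a hidden one is easy to show. In this minimal sketch, `run_case`, `hidden_report`, and the deliberately buggy `count_vowels` are all hypothetical; no real assessment platform’s API is implied.

```python
def count_vowels(s):
    # Deliberate bug: only lowercase vowels are counted.
    return sum(ch in "aeiou" for ch in s)


def run_case(solution, case_input, expected):
    """Run one test case and return a report a developer could act on."""
    actual = solution(case_input)
    return {
        "input": case_input,
        "expected": expected,
        "actual": actual,
        "passed": actual == expected,
    }


def hidden_report(report):
    """What a hidden test case tells you: only whether it passed."""
    return {"passed": report["passed"]}


report = run_case(count_vowels, "AEIOU", 5)
print(report)                 # input, expected 5, actual 0: the bug is obvious
print(hidden_report(report))  # {'passed': False}: good luck
```

The full report points straight at the uppercase-input bug; the hidden version gives you the same failure with none of the information needed to fix it.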
If you’re a job seeker, you just have to do the best you can under impossible circumstances. Failing a hidden test case might cause you to fail the assessment; however, I have failed a number of them and still “passed” the overall assessment (though presumably with a suitably dinged score).
If you’re a hiring manager who is being forced to use hidden test cases, try to understand the circumstances surrounding any test failures. If a whole list of hidden test cases failed, did they all fail because of a single bug? Would the developer have been able to fix the code had he been given a reasonable bug report?
Programming challenges should never include hidden test cases. Ever. Period.
I’ve mentioned undocumented requirements multiple times so far in this post. They are a subset of what I might call “unreasonable requirements.”
Of course, I didn’t know whether this was true. The failing test cases were hidden, and even if they weren’t, I didn’t have enough time during the challenge to delve into it any further. I could have, for example, tried running it in an environment that was not as resource-limited (like my laptop), to see whether it was really just taking a long time or whether there was an infinite loop somewhere. (But I’m pretty sure there was no infinite loop, as there was no loop in the code that could even theoretically fail to exit.) Likewise, re-implementing in another language would have simply taken too long.
All of this left me with an extremely bad feeling about the assessment, and I wondered to what extent those failed test cases dinged my score.
Whatever the requirements are for a challenge, they should be (1) fully documented, (2) reasonably easy to achieve in any of the available programming environments, and (3) pretty obvious to an experienced practitioner. This includes happy-path functionality, edge cases, error cases, and performance requirements.
(Addendum: I had optimized the hell out of the inner loop, but the outer loop was still O(nm) over n and m lines of input, with each operation reduced to a single string comparison. As I write this post, a little more research reveals that I could pre-sort the lines of input by their comparison strings, which is an O(n log n) operation. Then I would only need to compare adjacent strings in the sorted arrays of input. So it would have been possible to squeeze more performance out of the code, and that might have made those test cases pass. At this point, this coding challenge trips so many of the red flags above that I don’t even feel inadequate for not thinking to look into that earlier.)
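The addendum’s idea can be sketched as follows. The original problem isn’t described here, so this assumes a stand-in task: find the comparison keys that appear in both of two inputs. The function names and the `key` parameter are hypothetical. The naive version compares every pair of lines (O(nm)); the sorted version sorts each input by key and then walks both sorted lists in step, so each comparison only looks at the next candidate in each list (O(n log n + m log m)).

```python
def common_keys_naive(a, b, key):
    """O(n*m): compare every line of a against every line of b."""
    found = set()
    for x in a:
        kx = key(x)
        for y in b:
            if kx == key(y):
                found.add(kx)
    return sorted(found)


def common_keys_sorted(a, b, key):
    """O(n log n + m log m): sort both inputs by key, then walk them in step."""
    ka = sorted(key(x) for x in a)
    kb = sorted(key(y) for y in b)
    i = j = 0
    found = []
    while i < len(ka) and j < len(kb):
        if ka[i] < kb[j]:
            i += 1          # a's key is too small; advance in a
        elif ka[i] > kb[j]:
            j += 1          # b's key is too small; advance in b
        else:
            # Match; skip duplicates so each key is reported once.
            if not found or found[-1] != ka[i]:
                found.append(ka[i])
            i += 1
            j += 1
    return found


a = ["apple 1", "pear 2", "fig 3"]
b = ["fig 9", "kiwi 0", "apple 5"]
first_word = lambda line: line.split()[0]
print(common_keys_sorted(a, b, first_word))  # ['apple', 'fig']
```

Both functions return the same answer; only the number of comparisons changes, which is exactly what a hidden performance requirement would be measuring.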
Online coding assessments can be informative for hiring managers and enjoyable for job seekers, as they allow the hiring manager to see actual code and allow the job seeker to prove himself doing what he does best. In general, these assessments represent a vast improvement over the common practice of a decade ago. However, they can still be hobbled by the “gotcha” mentality that destroys so many hiring processes from within.
One last note: When I was working full-time at The Perl Shop, we developed an interview process that included a peer session with candidates going over their solution to the programming challenge. This is something I haven’t encountered anywhere else. What better way to learn how a developer programs and collaborates than to spend a session working on the actual code together with him?
And may all your bars turn from red to green.