
Online coding assessments are all the rage for certifying a software developer’s skills during the hiring process. Don’t get me wrong, this is a much better system than what we used to have. A decade ago, all we had were red-flag job ads, whiteboard coding, and buzzword bingo. Back then, oh what I would have done for a chance to develop some actual code—even on a toy problem—just to demonstrate my programming prowess. And these systems are absolutely an improvement. We’re on the right track.
The most well-known system is HackerRank’s, but there are several others as well. I’m picking on HackerRank in the title, but I’m actually commenting on all the online coding assessments (including HackerRank). They all work approximately the same: They usually involve a number of multiple-choice questions followed by a couple of programming challenges. In the programming challenges, you can generally choose from a list of programming languages, and your solutions will be judged against pre-written test cases, some of which you might not have access to.
These assessments are much better than what came before, but they still suffer from fundamental pitfalls that you ought to be aware of, whether you’re a job-seeker or a hiring manager. They’re saddled with the fallout from an educational system that teaches memorization and compliance (rather than innovation and initiative), and they probably don’t reflect what you want to be looking for in a software engineer.
Bugs in the test itself
Most fundamentally, there are often bugs in the test itself. I hope I don’t have to go into too much detail about why this is a problem.
These pop up with dismaying frequency. The designers of these evaluations are clearly not asking enough experienced engineers to solve each programming problem in each one of the candidate languages before deploying the tests to production. One might argue that a programming problem is a programming problem, but this is not so. Some are intrinsically easier to implement in some programming languages; some are intrinsically harder to implement in other languages.
And requirements can be incredibly difficult to get right. In the real world, we get to go back to the customer, project manager, or business analyst and ask for clarification when requirements are vague, but in a programming assessment, the problem statement must be complete to start with. A surprising number of these problems fail to define edge cases or how the solution should behave with invalid data. One quiz problem I recently solved wanted me to return an empty string in certain invalid cases, and fortunately, the test failure clued me in to the undocumented requirement. If a behavior isn’t a stated requirement, then any behavior should be acceptable. But if a test case fails, that counts against you.
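To make that concrete, here is a minimal Python sketch of what a fully specified edge case could look like. The function `initials` and its rules are hypothetical, invented purely for illustration and not taken from any real assessment. The point is that the empty-string behavior is written into the problem statement (here, the docstring) and backed by a visible test case, rather than discovered through a failure.

```python
def initials(full_name: str) -> str:
    """Return the upper-cased initials of a name, e.g. "Ada Lovelace" -> "AL".

    Documented edge cases (hypothetical spec):
      * Empty or whitespace-only input returns "".
      * If any word does not start with a letter, the input is invalid
        and the function returns "".
    """
    words = full_name.split()
    if not words or any(not word[0].isalpha() for word in words):
        return ""
    return "".join(word[0].upper() for word in words)


# Visible test cases: the happy path AND every documented edge case.
VISIBLE_CASES = [
    ("Ada Lovelace", "AL"),
    ("grace brewster hopper", "GBH"),
    ("", ""),               # empty input
    ("   ", ""),            # whitespace only
    ("4 Non Blondes", ""),  # a word that does not start with a letter
]

for given, expected in VISIBLE_CASES:
    assert initials(given) == expected, (given, expected)
```

With a statement like this, a failing test points at a bug in the solution, not at a gap in the requirements.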
The problem of test bugs is especially pernicious in the multiple-choice–question portion of these evaluations, where questions are often ambiguous or answers, misspelled. One Git evaluation I recently took asked me to identify which was the path I’d use to identify a given branch explicitly. I think the correct answer was supposed to be refs/head/branch-name. (Hint: that won’t work if you’re actually using Git.) When half of the questions are based on you manually spell-checking the compiler anyhow, typos like this critically damage the statistical power of these tests.
Test assessments should be fully proofread, and challenges should be beta-tested by experienced developers in each available programming language.
Time limits
If you’re recruiting a Chief Engineer who can figure out how to save the Starship Enterprise before it explodes (again), then maybe problems with time limits are a useful evaluation device. But we all know how poorly the Enterprise was designed; it seemed to be designed to promote crisis and drama and to be easy to blow up. I mean, hell, the holodeck goes off the rails and threatens to kill people more times than I can count off the top of my head, and no one ever thought to outfit the stupid thing with an off switch.
As a job-seeker, if you want to be part of a team that promotes crisis and drama, then indeed, you should look for the team for which a timed test is essential.
But for most of us, timed tests just provide useless stress and frustration. These tests should have an expected time limit, as in, “This won’t take any more than an hour of your time.” But these tests are designed to ding you points if you can’t read the author’s mind before the time elapses, and often the problems seem chosen to take up all the allotted time. On top of that, because of the nature of these problem definitions, there usually isn’t any way to do a partial implementation so that you know you’re on the right track. So you spend ⅔ of the time coding up your implementation and then the other ⅓ trying to get it to work, because going back and starting from scratch is simply not an option.
In real life, you sometimes have deadlines (although most deadlines are artificial and can be easily negotiated). But you almost never have to get such-and-such an implementation working within the next 60 minutes or else people will die. Real-life systems and processes are in fact designed to avoid those sorts of dramatic crisis moments. If you’re a hiring manager, you might want to use some judgment in weighing the results from these tests, especially if an applicant runs out of time due to hidden or unreasonable requirements or test cases.
Programming assessments should include a time limit only as an expected maximum, and any reasonably competent programmer should be able to successfully solve the problem in a fraction of that time using any of the available programming language environments.
(And for those programmers who are like me, we spend the other half of the time cleaning up and commenting our code for extra credit.)
Cleverness
Sometimes the authors of these assessment questions and challenges simply can’t resist trying to be too clever, so that the assessment wastes more effort testing cleverness than development skill.
An extreme example of this is one of the Java practice questions on HackerRank, which purports to teach you the visitor pattern but in reality bogs you down in writing code to generate an arbitrarily wide tree from minimal input data (and do so within strict runtime limits). There’s no need for that, as it has zero to do with the visitor pattern. The challenge is 10% visitor pattern, 90% trying to be too clever.
Fortunately, cleverness doesn’t often intrude into assessment challenges as significantly as it does into practice problems. But it does sometimes rear its ugly head. Usually, it shows up as overly complex problems, esoteric algorithms, undocumented requirements, or multiple parts or layers of logic. In the real world, we get to confer with colleagues to figure out complex algorithms, and we get to decompose independent parts and layers of a solution into separate classes or modules and develop them separately. None of this is practical within the confines of an online programming challenge, and being able to force-fit it into those confines does not prove that you’re a good developer.
Programmer assessments should not rely on esoteric algorithms, multi-layer architectures, or anything else that an average developer would consult with colleagues about in the normal course of work.
Hidden test cases
This is the scourge of these programming challenges, as all the assessments I’ve encountered include test cases that you simply cannot access. You don’t know what the test data was, what was expected, or what your code did in response. When these test cases fail, you’re left to guess what went wrong.
It’s as if someone submitted a bug report: “There’s something wrong with your code. I know exactly what it is, but I’m not going to tell you. By the way, I still expect you to fix it.”
If you ever find yourself in an organization that accepts bug reports of that caliber… Leave. Quickly.
In real life, we expect bug tickets to include test-case details. And when they don’t (as often they don’t, despite guidelines to the contrary), we are able to follow up with the submitter to flesh out the specifics of the alleged misbehavior. (It might not even be a bug. How often are these bug reports actually feature requests in disguise?)
If you’re a job seeker, you have to just do the best you can under impossible circumstances. Failing a hidden test case might cause you to fail the assessment; however, I have failed a number of them and have still “passed” the overall assessment (but presumably with a suitably dinged score).
If you’re a hiring manager who is being forced to use hidden test cases, try to understand the circumstances surrounding any test failures. If a whole list of hidden test cases failed, did they all fail because of a single bug? Would the developer have been able to fix the code had he been given a reasonable bug report?
Programming challenges should never include hidden test cases. Ever. Period.
Unreasonable requirements
I’ve mentioned undocumented requirements multiple times so far in this post. They are a subset of what I might call “unreasonable requirements.”
My solution to a recent, simple, basic-algorithm challenge failed a number of test cases because it would not complete in the allotted runtime. I optimized the hell out of the inner loop, moving from an O(n²) naive algorithm to an O(n) optimized algorithm that processed each and every input character only once—or in some cases of repeated input data, less than once. My code still didn’t run fast enough. I was left to assume that had I used a faster language than JavaScript (or one more adept at processing individual characters of input), I would have been able to complete the challenge and pass all test cases. In other words, I believe that the problem was impossible to solve, using the standard libraries, in the language I had chosen.
Of course, I didn’t know whether this was true. The failing test cases were hidden, and even if they weren’t, I didn’t have enough time during the challenge to delve into it any further. I could have, for example, attempted running it in an environment that was not as resource-limited (like my laptop), to see if it was really just taking a long time or whether there was an infinite loop somewhere. (But I’m pretty sure there was no infinite loop, as there was no loop in the code that could even theoretically fail to exit.) Likewise, re-implementing in another language would’ve simply taken too long.
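A local check of that kind could be sketched as follows. This assumes Python instead of the JavaScript I actually used, and `solve` is a hypothetical stand-in for the submitted solution, since the real challenge and its test data aren't reproduced here. The sketch times the function on synthetic inputs of doubling size. Runtime that roughly doubles with each step suggests linear behavior, runtime that roughly quadruples suggests quadratic behavior, and a call that never returns points to a loop that fails to exit.

```python
import time
from typing import Callable, List, Tuple


def solve(text: str) -> int:
    """Hypothetical stand-in for a submitted solution: one pass over the input."""
    return sum(1 for ch in text if ch == "a")


def time_at_sizes(
    func: Callable[[str], int], sizes: List[int]
) -> List[Tuple[int, float]]:
    """Run func on a synthetic input of each size; record wall-clock seconds."""
    results = []
    for n in sizes:
        data = "ab" * (n // 2)  # synthetic input; the real test data is unknown
        start = time.perf_counter()
        func(data)
        results.append((n, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    for n, seconds in time_at_sizes(solve, [100_000, 200_000, 400_000, 800_000]):
        print(f"n={n:>7}  {seconds:.4f}s")
```

Timings on a laptop won't match the assessment's sandbox, so only the growth pattern is meaningful, not the absolute numbers.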
All of this left an extremely bad feeling about the assessment, and I wondered to what extent those failed test cases dinged me points.
Whatever the requirements are for a challenge, they should be (1) fully documented, (2) reasonably easy to achieve in any of the available programming environments, and (3) pretty obvious to an experienced practitioner. This includes happy-path functionality, edge cases, error cases, and performance requirements.
(Addendum: I had optimized the hell out of the inner loop, but the outer loop was still O(nm) over n and m lines of input, in which each operation was reduced to a single string comparison. As I write this post, a little more research reveals that I could pre-sort the lines of input in order of their comparison strings, which would be an O(n log n) operation. Then we only need to compare adjacent strings in the sorted arrays of input. So it would be possible to squeeze a little more performance out of the code, and maybe that might’ve caused those test cases to pass. At this point, this coding challenge is raising so many of the red flags on the above list that I don’t even feel inadequate for not thinking of looking into that earlier.)
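The pre-sorting idea from the addendum can be sketched in Python. This is not the original challenge or my original code: the task here (counting pairs of lines from two inputs whose comparison strings are equal) and the `key` function are assumptions chosen only to show the technique. The naive version compares every line against every other; the sorted version sorts both lists of comparison strings and then walks them in step.

```python
from typing import Callable, List


def count_matches_naive(a: List[str], b: List[str], key: Callable[[str], str]) -> int:
    """O(n*m): compare every line of a against every line of b."""
    return sum(1 for x in a for y in b if key(x) == key(y))


def count_matches_sorted(a: List[str], b: List[str], key: Callable[[str], str]) -> int:
    """O(n log n + m log m): sort both key lists, then walk them in step."""
    ka = sorted(key(x) for x in a)
    kb = sorted(key(y) for y in b)
    i = j = total = 0
    while i < len(ka) and j < len(kb):
        if ka[i] < kb[j]:
            i += 1
        elif ka[i] > kb[j]:
            j += 1
        else:
            # Equal keys: measure the run on each side; every pairing matches.
            current = ka[i]
            run_a = 0
            while i < len(ka) and ka[i] == current:
                run_a += 1
                i += 1
            run_b = 0
            while j < len(kb) and kb[j] == current:
                run_b += 1
                j += 1
            total += run_a * run_b
    return total


# Hypothetical data: the comparison string is the lower-cased line.
first = ["Apple", "pear", "apple"]
second = ["APPLE", "fig", "Pear"]
assert count_matches_naive(first, second, str.lower) == 3
assert count_matches_sorted(first, second, str.lower) == 3
```

Both functions return the same count; the difference is that after sorting, equal comparison strings sit next to each other, so each list is traversed only once.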
Caveat emptor
Online coding assessments can be informative for hiring managers and enjoyable for job seekers, as they allow the hiring manager to see actual code and allow the job seeker to prove himself doing what he does best. In general, these assessments represent a vast improvement over the common practice of a decade ago. However, they still can be hobbled by the “gotcha” mentality that destroys so many hiring processes from within.
One last note: When I was working full-time at The Perl Shop, we developed an interview process that included a peer session with candidates going over their solution to the programming challenge. This is something I’ve not encountered anywhere else. What better way is there to learn how a developer programs and interacts than to spend a session with him working on the actual code together?
Still typing…
And may all your bars turn from red to green.
Tim
By Darren April 25, 2020 - 9:15 pm
Corollary to undocumented/unreasonable requirements: giving the candidate a problem that current staff hasn’t been able to figure out; i.e. an existing real-world problem.
By Tim King April 26, 2020 - 3:26 pm
Yeah, sometimes I wonder whether the existing staff has figured it out.
In a recent interview, they hit me with some deeply mind-bending problems that the on-staff domain experts had figured out. (I am not an expert… yet.) I found those delightfully challenging, and I felt as though they were truly interested in seeing how I approached a really difficult problem. But that’s different than what I experience with the online assessments. (Also than what I tend to experience with whiteboarding, which is a whole other conversation.)
By Chris Brossard February 4, 2022 - 5:56 pm
I tried HackerRank but gave up on it as I found that I was looking up the answers on the internet and then changing the names of the variables to evade the anti plagiarism software in HackerRank. I wasn’t learning anything. I also found that sorting questions had time limits, which is stupid because it forces everyone to use Quicksort all the time. The other thing I noticed is that there are 16 million members. There aren’t 16 million programming jobs in the world, so it’s unlikely that a company is going to offer you a job as a result of you completing one of their preparation programs. I think people might do better to work through the exercises in an algorithm textbook, but even then you won’t be sure that’s enough. The biggest problem with this website and software job testing in general is that it goes against the whole idea of university level courses where you are presented with some material and then are tested on it. In other words, at university you have some idea of what the test will be about. In a software job test, you haven’t a clue.
By SN March 29, 2022 - 12:12 am
I wonder those kids asking for these tests from senior developers can themselves do such wonderful but awfully timed tests in given time frame e.g. 90 to 120 minutes. questions are interesting but the time limit is nonsensical, interviewers have become lazy seems. I got invite for a crypto trading firm asking me to do 3 hours test over weekend, simulate a key part of trading system, how on earth can you do that in 3 hours even though given was 8 hours? Felt frustrated, embarrassed and gave up, I dont believe we code 8 hours at stretch, let alone design complex algorithm/ds based programs in college or professional life, its plain unproductive to sit 8 hours and produce great code. you punks will get old and face same and know the nonsense you put us through.