'Treat your test code like production code' is a really common saying. Let's tackle one facet of that rule: unit testing your e2e tests.
In August 2022, I asked on Twitter and LinkedIn whether people tested their test code. Specifically, I wanted to know whether they wrote unit tests for their test code.
At the time, I was running a team of SDETs at Promenade Group. We were building end-to-end (e2e) UI tests for our platform. One of my SDETs, David Ward, introduced unit testing to our custom test code. We continued the practice, which was my first time truly treating our test code like production code.
We had planned to write this article for a company blog, but the blog never happened and we both eventually left the company. Fast forward 10 months, and we are finally ready to share more about the concept. David ended up writing most of this tutorial, so props go to him. I was the instigator and copyeditor.
Below is the tutorial on unit testing your end-to-end test framework. You can also read David's version.
Treat your test code like production code
You've heard that phrase before: "Treat your test code like production code." Maybe once, maybe twice, maybe countless times.
Everyone says it's a good idea. Few describe what it means and even fewer describe how to do it.
For fun, let's hear what ChatGPT has to say about the topic:
"Treating test code like production code means applying the same level of care and attention to detail when writing and maintaining tests as you would for production code. This helps ensure that your tests are reliable, maintainable, and provide accurate feedback about the quality of your application."
Ultimately, treating test code like production code is about acknowledging the crucial role that testing plays within the software development process, by evaluating it in the same way that you would evaluate the application itself. By investing in high quality tests, you can improve the overall quality of your application, increase your confidence in its behavior, and ensure a better user experience for your customers.
Let's briefly go over what some criteria of "reliable, maintainable" tests might look like:
Stored in version control
Run in a CI/CD pipeline
Reusable code
Results can be diagnosed easily within CI
Are themselves covered by unit tests
"Wait... repeat that last one?"
That's right. Unit tests for your e2e tests. Tests testing your tests.
And before you ask, no, we won't go deeper into the abyss (i.e. no tests testing tests testing tests). At least not yet.
The benefits of testing your tests
Faster feedback
Unit tests are typically faster to run than e2e tests, as they don’t require the entire application to be set up and run. By writing unit tests for the individual components of your e2e tests, you can get feedback on code changes more quickly, which can help you catch and fix issues earlier in the development cycle.
Isolation of issues
If you encounter an issue in your e2e tests, it can be challenging to determine where the problem lies. By writing unit tests for the individual components of your e2e tests, you can isolate issues to specific parts of your test code, which makes them easier to debug and fix.
Increased confidence
Writing unit tests for your e2e tests can give you more confidence that your e2e tests are accurate and reliable. By verifying the individual components of your e2e tests with unit tests, you can ensure that your e2e tests are testing the right things and that they’re providing accurate feedback about the quality of your application.
Better maintainability
e2e tests can be complex and difficult to maintain, especially as your application grows and changes over time. By breaking down your e2e tests into smaller, more manageable components, you can make them easier to maintain and update as your application evolves.
"Wait, that sounds familiar..."
That's because if you take each point under "The benefits of testing your tests" and replace the phrase "e2e tests" with "product", "software project", "entire codebase", or what have you, it's almost a perfect match for what you'd probably say if you were consulting as a test engineer on a project without unit tests and needed to sell the idea to the engineering organization.
It all applies to us testers, as well.
Where to unit test your tests
As a baseline, you'll want to unit test code where you've added complexity or logic that might affect test results.
For example:
Data transformation: If you’re creating data to insert into the application, either from your own set of rules and/or based on input you’ve received from the application itself, ensuring that your data gets transformed properly is important.
API wrappers: Are you hitting an API to gather data for your tests? Make sure, for example, that your API authentication logic is covered by tests, so your tests don’t fail because you attempted to log in incorrectly.
Other utility functions: Date manipulation, math calculations (cart logic, coupon codes, etc.), string formatting, logic comparisons, and more all belong to a category of code that benefits greatly from unit test coverage.
"I'm still not sold"
Allow me to introduce you to some fun, then: an example project and a hypothetical situation that we'll play out together for the rest of the article.
The project
Our journey begins here. This is a project I've created that uses WebdriverIO to test a demo website I'm sure you've seen before: https://www.saucedemo.com
That link specifically targets a commit in the project, where we will begin our exercise.
Phase 1: Setting up
Clone the project: git clone git@github.com:gendelbendel/saucedemo-wdio.git (then cd saucedemo-wdio)
Check out the starting commit: git checkout a0874e2
Run npm i
Run npm run wdio:inventory
If that setup all worked... Well, the tests will fail. Dang! What seems to be the problem?
Let's first take a look at the spec file that seems to be failing and see what it does.
The e2e spec
The source of test/specs/e2e/inventory.e2e.js at our current commit can be found here:
Explanation of the e2e spec
If we walk through the code, this is what we see:
We enter beforeEach, open the login page, fill out the credentials, and log in as an authorized user.
We generate a list of similar tests we want to run. To do this, we have:
A value for sortBy, which is set to the corresponding sort option in the application. For example, "Name (A to Z)" serves as both the name of the test and the option used to sort the products on the page. We can see that here:
A predicate, which is a function that decides whether the values we received from the website are correct. For this entry, we set it to util.isSortedLowToHi.
An items property, which is how we retrieve the items to compare.
Repeat for each entry in the array, with its corresponding values.
We grab the list of items we need (either an array of names or an array of prices, depending on the test)
We run our predicate function against those items to determine if they are sorted in the manner we expect.
We assert on that result.
The key point here is that the logic deciding whether our data matches the expected criteria is custom code we wrote, in a lib/util.js file. Remember this for later.
Let's look at the product under test and see with our own eyes how it behaves, to find out whether our tests have found a bug or are buggy themselves.
Exploration
If you browse to https://saucedemo.com and log in with standard_user and secret_sauce as the username and password, you're greeted with the product inventory screen.
You'll notice all of the products are sorted by Name (A to Z) by default. If you read the product names from left to right, you'll see that this is indeed correct behavior.
Click on the sorting select menu and choose Price (high to low).
You'll notice the product ordering shifts to respect the new sorting, and if you pay close attention to each product's price, they are indeed ordered correctly, from the highest-priced item first to the lowest-priced item last.
So, the product under test appears good.
We've determined the app does indeed appear to be respecting the sorting behavior and there is no bug there, but our test says otherwise. That suggests our tests may in fact be the problem here.
So we should start by looking at what decides whether our test passes or fails. In our project, that decision is made by the functions inside lib/util.js.
So, we investigate.
Explanation of lib/util.js
The source of lib/util.js at our current commit can be found here.
These functions iterate over every item in the list and compare it to the previous one using JavaScript's built-in comparison operators.
Side note: these functions say they accept any[] as a data type, but that's not quite true. Don't use them with every data type, only with ones that support comparison with the built-in operators.
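Since the file itself isn't reproduced here, this is a sketch of what functions matching that description could look like, not the project's actual code. The names isSortedLowToHi and isSortedHiToLow come from the project; the Array.prototype.every body is an assumption. Each element is compared to the one before it with the strict < and > operators.

```javascript
/**
 * Sketch (assumed implementation): true if every element is greater
 * than the one before it.
 * @param {any[]} arr
 */
function isSortedLowToHi(arr) {
  return arr.every((val, index) => index === 0 || arr[index - 1] < val);
}

/**
 * Sketch (assumed implementation): true if every element is less
 * than the one before it.
 * @param {any[]} arr
 */
function isSortedHiToLow(arr) {
  return arr.every((val, index) => index === 0 || arr[index - 1] > val);
}

module.exports = { isSortedLowToHi, isSortedHiToLow };
```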
Our very first unit test
The source of our test/specs/unit/util.unit.js file at our current commit can be found here.
It is quite uneventful for now.
We require chai so we can use its expect assertions later. We have two contexts, one for isSortedHiToLow and one for isSortedLowToHi, which will contain the tests for each of these lib/util.js functions respectively.
Phase 2: Basic unit tests
Now we just want to see what it looks like to have unit tests and run them against our functions.
Feel free to git checkout 39edcca and then run npm run unit.
We've now added two basic tests for each function: one with an array of two values, and one with the same two values in reverse order.
If we run these tests, we'll see they all run against our lib/util.js code and pass. This is useful information: the basics work, so we can proceed with the next steps.
Now, we want to see why our function appears to be failing tests that, as far as we can see, should indeed be passing.
Let's first capture what our items actually are, to see whether the way we're getting items back is the problem. Then we'll write tests against that data.
We can do this with four simple steps:
Add a temporary console.log() to our spec
Change our wdio.conf.js logging level to error only, for easy visibility
Run the tests to get our output
Write the additional tests
1. Add a console.log
2. Update wdio.conf.js
3. Run the tests for data
In the console output, we get the following arrays (with added comments):
// Name (A - Z)
[
'Sauce Labs Backpack',
'Sauce Labs Bike Light',
'Sauce Labs Bolt T-Shirt',
'Sauce Labs Fleece Jacket',
'Sauce Labs Onesie',
'Test.allTheThings() T-Shirt (Red)',
]
// Name (Z - A)
[
'Test.allTheThings() T-Shirt (Red)',
'Sauce Labs Onesie',
'Sauce Labs Fleece Jacket',
'Sauce Labs Bolt T-Shirt',
'Sauce Labs Bike Light',
'Sauce Labs Backpack',
]
// Price (high to low)
[49.99, 29.99, 15.99, 15.99, 9.99, 7.99]
// Price (low to high)
[7.99, 9.99, 15.99, 15.99, 29.99, 49.99]
So with that data in hand, let's add a couple more tests!
4. Adding tests
Let's start just with the data that resulted in failing tests: the prices.
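As a plain-script sketch of what these new checks exercise (rather than the actual mocha tests), here are the captured price arrays run through strict-comparison versions of the two functions, which are an assumption about lib/util.js. Both arrays are correctly ordered, so a correct predicate should return true for each.

```javascript
// Assumed strict-comparison versions of the lib/util.js functions.
const isSortedLowToHi = (arr) => arr.every((val, i) => i === 0 || arr[i - 1] < val);
const isSortedHiToLow = (arr) => arr.every((val, i) => i === 0 || arr[i - 1] > val);

// Prices captured from the inventory page with the temporary console.log.
const priceHighToLow = [49.99, 29.99, 15.99, 15.99, 9.99, 7.99];
const priceLowToHigh = [7.99, 9.99, 15.99, 15.99, 29.99, 49.99];

// Both arrays are correctly ordered, so a correct predicate returns true.
const results = {
  "Price (high to low)": isSortedHiToLow(priceHighToLow),
  "Price (low to high)": isSortedLowToHi(priceLowToHigh),
};

for (const [sortBy, sorted] of Object.entries(results)) {
  console.log(`${sortBy}: expected true, got ${sorted}`);
}
```

With these assumed implementations, both lines report false.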
Now if we run npm run unit... Aha! We have observed failing results. We now have data that we know for sure is causing a problem.
And if we inspect these arrays, we observe that they are in fact in the correct order, which proves that something is wrong with our sort-checking functions. If you've been following along closely and take a close look at lib/util.js, you might already know how to fix it, but let's continue.
In our current tests, we are making a few assumptions as to what is acceptable data and what is not. Let's try to think about what assumptions we are making, so we can begin testing against them.
A non-comprehensive list of assumptions:
We were only using one data type: ints. Once we introduced floats, the issues appeared. Idea: Is the issue related to floats?
We know we use strings in this function, but we are not testing for it. Idea: Are strings actually safe?
We were only using arrays with a length of 2 elements. Once we introduced arrays with more than 2 elements, issues appeared. Idea: Is the issue related to array length?
We were only testing against arrays with unique elements. Once we introduced arrays with duplicate elements, issues appeared. Idea: Is the issue related to array uniqueness?
Let's begin testing against some of these assumptions, and look for failures.
Begin clearing up assumptions
You'll now see we have multiple branching contexts organizing our code, based on the function being used as well as the data type under test.
For each data type we think we should test against, we added a layer of tests for them, which includes some dummy data as well as the data we know we are receiving.
Now if we run npm run unit... We get no additional new failures.
Which assumptions did we clear up? Which ones have we still not tackled?
There's one more in that list we haven't covered yet: Duplicate elements.
Phase 5: Finalizing our list of assumptions
Do duplicate elements cause issues? Let's write some tests to find out.
Feel free to git checkout 06f2534 and npm run unit to see the results yourself.
If you run npm run unit now... You see a LOT of failures. And every single one is related to duplicate elements.
The bug has been isolated: for both functions, an array containing duplicate elements always returns false, reporting the array as unsorted even when it is sorted.
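The isolation can be reproduced in miniature. This sketch feeds an assumed strict-comparison isSortedLowToHi a set of ascending inputs, each varying one assumption from the list above, and prints the result for each. Only the inputs containing duplicates come back false.

```javascript
// Assumed strict-comparison version of the lib/util.js function.
const isSortedLowToHi = (arr) => arr.every((val, i) => i === 0 || arr[i - 1] < val);

// Every input is in ascending order; each varies a single assumption.
const inputs = {
  "two ints": [1, 2],
  "many ints": [1, 2, 3, 4, 5],
  "many floats": [7.99, 9.99, 15.99, 29.99],
  "many strings": ["a", "b", "c"],
  "duplicate ints": [1, 2, 2, 3],
  "duplicate floats": [7.99, 15.99, 15.99, 49.99],
  "duplicate strings": ["a", "b", "b", "c"],
};

for (const [name, arr] of Object.entries(inputs)) {
  console.log(`${name}: ${isSortedLowToHi(arr)}`);
}
```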
Phase 6: The bug fix
For reference, let's take a look at the lib/util.js file and see what might be causing the issue:
Run git checkout 0b6caa88 and npm run unit, followed by npm run wdio.
One thing that stands out is that in our comparisons, we are not allowing the arr[index - 1] or val values to be equal to each other; we are always using greater than or less than.
So, let's make a quick change and run our unit tests:
If we run npm run unit... All the unit tests pass!
Now... Do the e2e tests also pass?
Drumroll please.
npm run wdio:inventory
The pull request in full
While this article is quite lengthy, the code in question is more terse.
Find the pull request and each individual commit here.
Where to go from here?
This example may have been a trivial one, but we went from having zero unit test coverage to having 34 unique tests, and we really only covered a few assumptions that we had been making before. There's still a ton of room for more tests that could be written.
For example:
What happens if the array being supplied is of type boolean?
What happens if the array has multiple different types (like a mixture of floats, ints, and strings)?
What happens if the array is empty?
What happens if the array contains duplicates of different types (like [1, 1.0, "1", "1.0"])?
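One cheap way to start on these questions is to ask the functions what they currently do before deciding what they should do. This sketch probes assumed fixed (non-strict) versions of the functions; it records behavior rather than asserting what's correct. Note that in JavaScript 1 and 1.0 are the same number, so the last question's example really holds three distinct values. The numeric-strings probe is an extra illustration, not one of the questions above.

```javascript
// Assumed fixed (non-strict) versions of the lib/util.js functions.
const isSortedLowToHi = (arr) => arr.every((val, i) => i === 0 || arr[i - 1] <= val);
const isSortedHiToLow = (arr) => arr.every((val, i) => i === 0 || arr[i - 1] >= val);

// Inputs for the open questions above, plus numeric strings.
const probes = {
  "empty array": [],
  booleans: [false, true],
  "mixed ints, strings, floats": [1, "2", 3.5],
  "duplicates of different types": [1, 1.0, "1", "1.0"],
  "numeric strings": ["10", "9"],
};

for (const [name, arr] of Object.entries(probes)) {
  console.log(
    `${name}: lowToHi=${isSortedLowToHi(arr)} hiToLow=${isSortedHiToLow(arr)}`
  );
}
```

With these implementations, the empty array reports as sorted in both directions (every() returns true for an empty array), and ["10", "9"] reports as sorted low to high because strings compare character by character. That second one is the any[] caveat in action.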
We've really only scratched the surface... and these functions are, again, quite trivial.
Final notes
As your code grows in complexity, so does your potential for errors. Unit testing your code is one way to help cover for those issues, but it's not the only way.
There's a lot of duplicated code in our unit tests, which is one place where that complexity can creep in. The only things that really change are the input data and the expected results. Just as we generate tests within test/specs/e2e/inventory.e2e.js, we can also generate unit tests and simplify our code.
This mindset of testing your tests that are testing your tests can go as deep as you want to go. And perhaps, I'll write another post going into what that world looks like.
Subscribe to Shattered Illusion by Chris Kenst