Refactoring into Pure Functions

“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.” Martin Fowler

It’s very common to find some parts of our code that contain duplicate logic. In that case, we can think of applying the DRY principle to get all the benefits of having all our logic written in one place.

It’s also common that some parts of our code get too complex over time that we need to refactor them into smaller, simpler, and more readable functions.

There are several ways we can approach the code refactor. In this article, I want to explore some of them showing how can we achieve our goal with the help of pure functions. I want to show an example with a small problem and explore some possible ways to do code refactoring. And I will do so using JavaScript.

The Problem

Let’s have a small problem and that we need to solve. Say we have several points each of them is represented as (x, y) and we are required to find the closest 2 points to each other, and the furthest 2 points from each other.

While we can resort to some sophisticated optimized algorithms to find our required points, let’s just consider the simplest -brute force- solution, that works perfectly with a small number of points. Remember, our goal here is not the most optimized solution, but the most readable, maintainable one.

Let’s agree to represent each point as an object, with 2 fields: x and y. Then let’s define a simple function that calculates the distance between 2 given points…

const getDistance = (p1, p2) => {  
    return Math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2);
}

Now, let’s write the function to get the closest 2 points…

const getClosestPoints = points => {  
    let bestPair = null;
    let minDistance = Number.MAX_SAFE_INTEGER;

    for (let i = 0; i < points.length; ++i) {
        const p1 = points[i];

        for (let j = i + 1; j < points.length; ++j) {
            const p2 = points[j];
            const distance = getDistance(p1, p2);

            if (distance < minDistance) {
                minDistance = distance;
                bestPair = [p1, p2];
            }
        }
    }

    return bestPair;
}

A quick explanation of what we do: first, we set the minimum value to be the largest possible integer (value may depend on problem constraints) then we loop over the points and for each point, we consider each other point, then we calculate the distance between the 2 points, then if the distance is less than the minimum so far, we replace the minimum and set the 2 pairs in hand to be the best pair so far. And after we are done, we return the best pair we have found.

This will return the closest 2 points or will return null if the points array contained less than 2 points. Also, in case of having several points with the same minimum distance, this function will return the first pair of them that it finds.

Now, let’s write the function to find the furthest 2 points…

const getFurthestPoints = points => {
    let bestPair = null;
    let maxDistance = -1;

    for (let i = 0; i < points.length; ++i) {
        const p1 = points[i];

        for (let j = i + 1; j < points.length; ++j) {
            const p2 = points[j];
            const distance = getDistance(p1, p2);

            if (distance > maxDistance) {
                maxDistance = distance;
                bestPair = [p1, p2];
            }
        }
    }

    return bestPair;
}

This is almost identical to the last function, except that we keep track of the maximum distance instead of the minimum distance, so we initialize maxDistance to be -1, and we try to maximize the distance inside the nested for-loop.

Now if you look at the 2 functions, most of the code is common, which gives room for a good refactor, let’s consider some possible refactors…

The Refactor

“The problem is not the problem, the problem is your attitude about the problem.”

Imperative Refactor

At first glance, we may be tempted to refactor out the nested loops, and have them in a function, then pass the logic that is done inside the loops to that function. This will result in a function that looks something like the following…

const loopOverPoints = (points, iteratee) => {
    for (let i = 0; i < points.length; ++i) {
        const p1 = points[i];
        for (let j = i + 1; j < points.length; ++j) {
            const p2 = points[j];
            const distance = getDistance(p1, p2);
            iteratee(p1, p2, distance);
        }
    }
}

Here, we are looping all points in a nested loop to get all the pairs, from that point we are handing all the logic handling to the provided iteratee and we provide to that iteratee the pair of points and the distance between them. This will make our 2 functions look as follows…

const getClosestPoints = points => {
    let bestPair = null;
    let minDistance = Number.MAX_SAFE_INTEGER;
    loopOverPoints(
        points,
        (p1, p2, distance) => {
            if (distance < minDistance) {
                minDistance = distance;
                bestPair = [p1, p2];
            }
        }
    );
    return bestPair;
}

const getFurthestPoints = points => {
    let bestPair = null;
    let maxDistance = -1;
    loopOverPoints(
        points,
        (p1, p2, distance) => {
            if (distance > maxDistance) {
                maxDistance = distance;
                bestPair = [p1, p2];
            }
        }
    );
    return bestPair;
}

Things sure look a little better and more compact with all the loops encapsulated in the loopOverPoints function, but we have a small problem!

Although loopOverPoints function “looks like” it’s a pure function as it does not make any mutation to its outer scope, it actually isn’t pure, it makes mutations. Or -in other words- it needs to make mutations in order to be meaningful, let’s explore how that is true by remembering the meaning of pure functions.

A pure function is a function whose output is always the same for the same input, which is a result of having referential transparency. Also, its execution does not cause any side effects) nor is affected by any side causes (values from outside its scope).

If we examine loopOverPoints closely we will find that is designed to not return any value. So, the only way for this function to be useful to where it’s called is for the iteratee to do some logic with the provided parameters and save the result into some variable in the outer scope. So, in order for loopOverPoints to be useful, its iteratee must be an impure function, so loopOverPoints is -by extension- an impure function.

Besides the problem of having an impure helper, we also still have some code duplication that still needs refactoring. If we look closer, we can see that the structure of our 2 functions is almost the same, it just differs in how we initialize the saved distance, and how we compare it with the new distance.

We can make another refactor putting the 2 problems into consideration, and we can update our loopOverPoints function, to be getBestPair function, and that “best” part would be based on some criteria we specify, we may implement something that looks like the following…

const getBestPair = ({ points, initialDistance, isCurrentBetter }) => {
    let bestPair = null;
    let currentDistance = initialDistance;

    for (let i = 0; i < points.length; ++i) {
        const p1 = points[i];

        for (let j = i + 1; j < points.length; ++j) {
            const p2 = points[j];
            const distance = getDistance(p1, p2);

            if (!isCurrentBetter(currentDistance, distance)) {
                currentDistance = distance;
                bestPair = [p1, p2];
            }
        }
    }
return bestPair;
}

Now, we solved the first problem of impurity by returning some meaningful value after doing the logic, and we solved the second problem by doing more refactor and moving all the common structures to the helper. and now we depend on the given initial value for the distance, and how we choose to decide whether the current distance is better or worse than the new distance.

Our functions now will be…

const getClosestPoints = (points) => getBestPair({
    points,
    initialDistance: Number.MAX_SAFE_INTEGER,
    isCurrentBetter: (current, dist) => current < dist
});

const getFurthestPoints = (points) => getBestPair({
    points,
    initialDistance: 0,
    isCurrentBetter: (current, dist) => current > dist
});

And it’s obvious now that we cannot do any more refactoring to them, as they just make 1 call to the helper that performs all the common logic and does the required calculations and comparisons.

This refactor -however- is still flawed in some sense: it is very imperative. When we look into one of the 2 functions to debug or add a feature, we don’t quickly understand how they work or what they do, we will need to go read getBestPair helper and understand it well in order to understand each of the 2 functions, we might also need to run a simulation in our head to figure out what happens when we mess with the initial value or with isCurrentBetter function.

Let’s explore another path that we could have walked, that would give us better refactor and more declarative code.

Declarative Refactor

Let’s take a step back, and re-examine our getClosestPoints function, and notice the flow of the data through the function, here is the code again for reference…

const getClosestPoints = points => {
    let bestPair = null;
    let minDistance = Number.MAX_SAFE_INTEGER;

    for (let i = 0; i < points.length; ++i) {
        const p1 = points[i];

        for (let j = i + 1; j < points.length; ++j) {
            const p2 = points[j];
            const distance = getDistance(p1, p2);

            if (distance < minDistance) {
                minDistance = distance;
                bestPair = [p1, p2];
            }
        }
    }

return bestPair;
}

We can state what we are doing as follows:

Given some points, we get all possible pairs of points via the nested for-loop.
We calculate the distance between each of the pairs.
We pick the pair of points with the least distance.

This is a higher level examination of our code, that is more concerned with what happens more than how it happens. When you see this sentence, it usually means you should pay more attention to the data flow, rather than how that flow is achieved.

If we examine those previous 3 steps -or operations- on the data, we will notice that we do exactly the same for getting the furthest points, except that we maximize the distance instead of minimizing it. Given that information, we can now do some better refactor that focuses on the data flow instead of focusing on the exact actions and code lines that we do. This way, our getClosestPoints function would look like the following…

const getClosestPoints = points => {
    // Get all pairs from the given points.
    const pairs = getPairs(points);

    // Attach distance to each pair.
    const mapPair = ([p1, p2]) => ({
        pair: [p1, p2],
        distance: getDistance(p1, p2)
    });

    const pairsWithDistances = pairs.map(mapPair);

    // Get the pair with the minimum distance.
    const getClosestPair = (pair1, pair2) => pair1.distance < pair2.distance ?
        pair1 :
        pair2;

    return pairsWithDistances.reduce(getClosestPair).pair;
}

Now, it is very clear what we are doing in our function, when we look at the code we immediately know what is happening and how data flows through our function.

We use a function that we have not defined: getPairs it is a generic function that given an array of objects it returns all the possible pairs of those objects, in no specific order. Its implementation can be as follows…

const getPairs = arr => {
    const pairs = [];
    for (let i = 0; i < arr.length; ++i) {
        for (let j = i + 1; j < arr.length; ++j) {
            pairs.push([arr[i], arr[j]]);
        }
    }

    return pairs;
}

Now we can make a small check to keep our function’s signature in case of providing less than 2 points…

const getClosestPoints = points => {
    if (points.length < 2) { return null; }

    // Get all pairs from the given points.
    const pairs = getPairs(points);

    // Attach distance to each pair.
    const mapPair = ([p1, p2]) => ({
        pair: [p1, p2],
        distance: getDistance(p1, p2)
    });

    const pairsWithDistances = pairs.map(mapPair);

    // Get the pair with the minimum distance.
    const getClosestPair = (pair1, pair2) => pair1.distance < pair2.distance ?
        pair1 :
        pair2;

    return pairsWithDistances.reduce(getClosestPair).pair;
}

Now we can very easily write the furthest points function like the following…

const getFurthestPoints = points => {  
    if (points.length < 2) { return null; }

    // Get all pairs from the given points.
    const pairs = getPairs(points);

    // Attach distance to each pair.
    const mapPair = ([p1, p2]) => ({
        pair: [p1, p2],
        distance: getDistance(p1, p2)
    });
    const pairsWithDistances = pairs.map(mapPair);

    // Get the pair with the maximum distance.
    const getFurthestPair = (pair1, pair2) => pair1.distance > pair2.distance ? pair1 : pair2;

    return pairsWithDistances.reduce(getFurthestPair).pair;
}

If we want to do more refactoring, we can encapsulate the part of getting the pairs and getting the distances, so we can define the getPairsWithDistance function to do that job.

We could also create some generic function that reduces the pairs with the way we see fit, but instead, I would prefer to use lodash’s minBy and maxBy helpers to handle the minimization and maximization parts, as such a generic function that would just return the best pair, can be too generic and can actually be a drawback in readability.

So, now the resulting code will be…

const getClosestPoints = points => {
    const pairsWithDistances = getPairsWithDistance(points);
    return _.minBy(pairsWithDistances, 'distance')?.pair ?? null;
}

const getFurthest = points => {
    const pairsWithDistances = getPairsWithDistance(points);
    return _.maxBy(pairsWithDistances, 'distance')?.pair ?? null;
}

And all the other helpers that we use are as follows…

const getDistance = (p1, p2) => {
    return Math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2);
}

const getPairs = arr => {
    const pairs = [];
    for (let i = 0; i < arr.length; ++i) {
        for (let j = i + 1; j < arr.length; ++j) {
            pairs.push([arr[i], arr[j]]);
        }
    }
    return pairs;
}

const getPairsWithDistance = points => {
    const pairs = getPairs(points);
    const mapPair = ([p1, p2]) => ({ 
        pair: [p1, p2],
        distance: getDistance(p1, p2)
    });
    const pairsWithDistances = pairs.map(mapPair);
    return pairsWithDistances;
}

And this concludes our refactoring journey.

Conclusion

When we attempt functions’ refactoring -or design- we may be tempted to do the refactoring -or the design- based on how we are going to do our data manipulation instead of what manipulations we are going to do, and this may end up with some refactors that only add more code and files to the project that make things more complex instead of simplifying things.

As we saw, we need to keep an eye on the data flow in our functions, and try to design our helpers in such a way that they take some input and return a meaningful output that contributes to the function at hand and gets us -at least- one step closer to what we want to achieve from our function. We also need to always keep our helpers pure and avoid any possible side effects that we may cause.

This approach also results in more flexible refactors that don’t require us to write some very specific code for our exact problem with our exact form of data like we did with getBestPair. Instead, this allows for more room for enhancements and updates on the algorithm and the approach, and other functions could benefit from the helpers that we designed along our way.

Finally, it would be helpful to have a look at some functional and utility libraries such as lodash and ramda as things get much easier when we already know and have some tools that can help us in our journey.

Amr Saber