Exploration and Exploitation

I love going to jazz festivals. Listening to good jazz at home is a pleasure, but what’s missing are the vibes during a live performance. And it’s not the same when you listen to a recording of a concert. Everything changes when you are actually there, immersed, experiencing directly with all your senses. I guess it’s similar with other types of music. But what makes the difference between listening to a recording and being at a concert even bigger for jazz is that it is all about improvisation. And then the experience of single concerts versus festivals is also different. With concerts, you immerse yourself for a couple of hours in a world of magic and then go back to the normal world. But with jazz festivals, you relocate to live in a music village for a couple of days. This not only makes it a different experience, but also calls for different kinds of decisions.

Previously, when I learned of a new jazz festival or read the line-up of a familiar one, the way I decided whether to go was simple. I just checked who would be performing. If there were musicians that I liked, but hadn’t watched live, or some that I had, but wanted to see again, then I went. If not, I usually wouldn’t risk it.

Once I chose to go, this brought more things to consider. Jazz festivals usually have many stages, with parallel performances during the day and into the night. Last time I went to the North Sea Jazz Festival, there were over 80 performances in only a few days. So, there is a good chance that some of those you want to watch will clash, and you are forced to choose. And I kept applying the same low-risk strategy for choosing what to watch as I did for deciding if I should go at all.

Then one day, I arrived late to a festival just before two clashing sets were about to begin. I dashed into the closest hall with no clue as to what I would find. And there I experienced what turned out to be the best concert of the whole festival. I hadn’t heard of the group, and if I had read the description beforehand, I would have avoided their performance.

I realized then, that by only choosing concerts with familiar musicians, I was over-exploiting and under-exploring. My strategy was depriving me of learning opportunities and reducing the overall value I got from the festivals.

What happened at that festival changed the way I decide whether to go and which performances to see. Now I attend many more concerts by musicians previously unknown to me, and the absence of a familiar name in the line-up no longer stops me from buying tickets.

However, when the whole line-up of the festival is completely unknown, then going is all exploration. That’s highly risky. When there are no familiar musicians, I listen to recordings of previous concerts of some of the groups. If I like at least two of them, then I usually go to the festival, and once there, I will still check out a few acts I don’t know. That’s another way to balance exploitation and exploration.

When I am in control, “I restrict the world to what I can imagine or permit”, writes Ranulph Glanville. He gives the example of going to a restaurant with friends. If it’s always him who chooses the restaurant, the group will only go to the restaurants that he knows. They are limited by his taste and knowledge, or rather, as he admits, by his ignorance. Letting go of control by letting others choose not only expands his knowledge but often gives a better experience for everyone.

Having the wrong strategy when it comes to jazz festivals and restaurants reduces the pleasure, but in these examples, the decisions make such a small impact that they may not show how important this balance is. Yet, we make similar choices all the time. For example, you might decide to invest your time in getting better at what you currently do well while not allocating time to trying out new things. This may put you in a very unpleasant situation in times when there is no more demand for what you are skilled at, or when you need a change but have difficulty choosing because you haven’t tested many alternatives.

Throughout our lives, many of us realize that when making choices, we should have a balance of exploration and exploitation. We should let go of some control and not limit ourselves to what we already know. And that’s an important first step, but it’s not enough. It takes a greater effort to keep this awareness awake. And somehow, it’s also easier at a personal level. How so?

We live our lives and are experiencing every minute of every day. We absorb sounds, tastes, smells, and light and feel the air on our skin. Through evolution we are well equipped to receive a signal when there is even a small problem. We get a scratch and react right away. That’s not the case with organizations. They might be missing a whole limb or – and here the metaphor will fail to produce a feeling of exaggeration – a head, without noticing for years. And even if we have learned how to balance exploitation and exploration in our lives, the chances are we are working in organizations that haven’t. It’s not easy even to imagine what maintaining this balance means for an organization. We can’t really step into the shoes of one. What we can do instead is study this phenomenon a little closer, try to understand it better, and then, armed with a new pair of glasses, make the best of that knowledge while we keep learning from what happens. To understand how the balance between exploration and exploitation works in organizations, we’ll start with the problem of resource allocation, and then move to more complex situations.

Allocation of resources

You have discovered a gold deposit. How do you allocate your limited resources? How much should you invest in exploiting the deposit you have versus the amount spent on looking for new deposits? You don’t know how big the vein is and the amount of gold you can extract from the current mine and you also don’t know if you’ll find a new one. If you put all your efforts and resources in exploiting the current deposit, you might not have sufficient resources left to look for new ones when this one is exhausted. But the next one you find might be bigger or have higher quality. Yet it is just as likely that you invest a lot in exploring and you don’t find another deposit at all, so unlike in exploitation, all that resource will be wasted.

Companies face this dilemma all the time. For example, when there are a few successful markets (clients, consumer groups, or regions), should the company come up with new products and services to sell to the existing markets, or explore new territories with current or new offerings? Should it improve the current technologies or try new ones? This dilemma is present not only for marketing and production strategy but in almost every investment decision.

Thus, the basic understanding of Exploration–Exploitation is as an optimization problem. And indeed, it is extensively researched as such. In probability theory, it is known as the multi-armed bandit problem. A lot of optimization strategies and computer algorithms have been developed in the last 70 years for solving that problem. The applications are wide ranging, from clinical trials and financial portfolio design to minimizing delays in computer networks. The multi-armed bandit problem is the basis of one of the best-established areas in machine learning. Going through all these applications is beyond the objective of this chapter. It is sufficient to say that studying them is useful, as a lot of thinking, mathematics, modelling, and experimenting have been invested in them, with impressive results. The patterns, algorithms, and strategies which have been discovered can be useful, to some extent, in certain organizational contexts. But overall, all these studies work with a lot of assumptions and simplifications which don’t hold in real-life situations.
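To make the multi-armed bandit problem a little more tangible, here is a minimal sketch of one of the simplest strategies for it, usually called epsilon-greedy. It is only an illustration under simplifying assumptions, not a model of any real organization: the function name, the payout probabilities, and the parameter values are all made up for the example. Each "arm" pays out 1 with a hidden probability, and the player spends a fixed share of rounds (epsilon) trying a random arm and the rest on the arm that has looked best so far.

```python
import random


def epsilon_greedy(true_means, epsilon, rounds, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy strategy.

    true_means: hidden payout probability of each arm (illustrative values,
                unknown to the player).
    epsilon:    share of rounds spent exploring a random arm.
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms        # how often each arm has been tried
    estimates = [0.0] * n_arms  # running average payout observed per arm
    total_reward = 0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick any arm at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest-numbered arm).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return total_reward, pulls


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical payout probabilities
    for eps in (0.0, 0.1, 1.0):
        reward, pulls = epsilon_greedy(arms, eps, rounds=1000)
        print(f"epsilon={eps}: reward={reward}, pulls per arm={pulls}")
```

With epsilon set to 0, the player in this sketch never leaves the first arm it tries, which is the festival strategy of only watching familiar musicians. With epsilon set to 1, it never uses what it has learned. Like the studies mentioned above, the sketch assumes payouts that are fixed and independent, which rarely holds in real-life situations.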

A broader understanding

The exploitation-exploration dilemma is present everywhere in organizations: in strategy, marketing, sales, research, operations and projects. It appears in different guises and is communicated in various terms. Yet it is not very common to hear it explicitly discussed at meetings.

As James March wrote in 1991,

Exploration includes things captured by terms such as search, variation, risk taking, experimentation, play, flexibility, discovery, innovation. Exploitation includes such things as refinement, choice, production, efficiency, selection, implementation, execution. Adaptive systems that engage in exploration to the exclusion of exploitation are likely to find that they suffer the costs of experimentation without gaining many of its benefits. They exhibit too many undeveloped new ideas and too little distinctive competence. Conversely, systems that engage in exploitation to the exclusion of exploration are likely to find themselves trapped in suboptimal stable equilibria. As a result, maintaining an appropriate balance between exploration and exploitation is a primary factor in system survival and prosperity.

Exploration and exploitation compete for resources and so organizations have to make choices. Some of them are explicit, but most are implicit. The explicit choices are seen as decisions made comparing alternatives. A typical example is investment decisions. In comparison, implicit choices are – as March put it – “buried in many features of organizational forms and customs, for example, in organizational procedures for accumulating and reducing slack, in search rules and practices, in the ways in which targets are set and changed, and in incentive systems”.

Working with the exploration-exploitation balance does not only help in seeing the exploitation and exploration patterns in organizational communications and decisions. It also shifts the attitude to what’s going on, away from what is accepted as rational or intuitive.

For example, a quick-learning new employee starts actually contributing to the organization sooner. She does so by being able to absorb the organizational knowledge in a shorter time. That may be good for her and for the organization in the short term, but it might be bad in the long run. When a slow learner joins the organization, it will take longer for him to fit in, but that might actually improve the organizational knowledge and norms. And the same person, once well established, will be slow to absorb new knowledge. This would often be a healthy conservatism, as it would reduce the risk of investing in fads, as March pointed out.

Exploration and exploitation in time

Exploitation follows exploration. We first explore the menu, select, order, and then consume what we’ve ordered (or what we think we’ve ordered). A pharmaceutical company carries out a lot of experiments to come up with a new formula which will work against a certain disease. These experiments may be futile or fruitful or, in fact, come up with something that does not treat the intended disease but turns out to be useful against another. In the first case, that exploratory path does not end in exploitation, but in the other two, it does, in an expected or an unexpected way. What’s common is that exploration comes first and then exploitation.

Yet we can see the exploration and exploitation in such a sequence only if we focus on a particular element, such as choosing the food in a restaurant. But things are connected, and they interact all the time. If I go to explore a forest, I do that by exploiting my shoes. The pharmaceutical company carries out experiments exploiting laboratory equipment. Spacecraft explore the universe by exploiting various technologies.

In these examples, exploration and exploitation go in parallel, and they are coordinated, but the object of exploration and the object of exploitation are different. I’m exploring the forest, but exploiting my shoes when doing that, not the forest. The pharmaceutical company and the spacecraft also have different objects of exploration and exploitation. However, there are some cases when exploration and exploitation can have the same object and happen at the same time. And this can work pretty well.

One such case is Twitter. Sharing a tweet using the so-called “re-tweet” was neither designed nor planned as a feature in the initial releases of Twitter. People simply started tweeting others’ tweets, adding “RT” for re-tweet, and this was taken up and went viral. Then both Twitter and apps and services in the Twitter ecosystem added a lot of features around RT. It evolved this way because people were exploring Twitter, and at the same time they were exploiting it. That exploration produced many other ways of using it which did not catch on, or at least not to the point of becoming one of the core capabilities of the service.

Something similar happens in the mobile apps markets. By the first quarter of 2020, there were over 2.5 million apps on the Google Play store and over 1.8 million on the Apple app store. By making it easy for the app writers to release new versions and simple for the app users to install and update, an ecosystem was created that was quite different from the traditional software world. When releasing new apps and features, app writers explore the market while at the same time exploiting it. Each app and each feature work as an almost unbiased market survey. At the same time, they are actual businesses, actual exploitation, with revenue being generated either by ads or by selling the app.

It’s similar for the users. They don’t know what will fit their needs and preferences. While trying out new apps and features, app users are also using them, in this way exploring and exploiting at the same time.

Depending on the perspective (user or provider) and the zoom level (features, apps, or market participants), we can see different dynamics. At the level of a single app, this period of parallel exploration and exploitation evolves into exploitation only, but if we zoom out, we’ll see that they keep running in parallel. While being used to certain apps, users keep exploiting the app market. By utilizing new apps (a level up) or new features (a level down), they co-explore with the app writers.

App writers, both individuals and companies, compete by improving the existing capabilities and releasing new ones. Some of them develop new apps, entering again into a mode of parallel compressed exploration and exploitation. The balance can be seen at the next level as well, the level of the developers (entrepreneurs). New ones come and some grow, others go. The invisible hand of the market produces entrepreneurs who try out new things and stay (exploit) if they are successful and leave if not.

Going to jazz festivals taught me a lesson about the balance between exploration and exploitation. Observing app markets shows what it is to explore while exploiting. But for an example of the latter, I could’ve just stayed at the jazz festival. Jazz improvisation is where exploitation and exploration run in parallel to form a compressed and precarious balance. With classical music, exploration and exploitation are separated in time and space. A composer explores by trying out different harmonies and melodies, and later on, an orchestra exploits the composed fixed sequence of chords and notes with a prescribed length and manner of playing. In a jazz band, composing and performing happen at the same time. Each musician is exploring and exploiting the territory marked by the main theme, their own ideas and others’ inventions and provocations. It takes a long time and hard work to reach that level of awareness, intuition, and skill. Musicians put in many years of practice to produce just a few minutes of good unprepared music. The years of preparation supply jazz musicians with both an arsenal of patterns to use when short of ideas (exploit), and the skills necessary to break out of them (explore).

Twitter, the app market, and jazz improvisation show the benefits of compressed or simultaneous exploration/exploitation. That doesn’t mean it’s always good when the balance is achieved this way. To suggest so is to give a prescription. And you could easily come up with an everyday example challenging such a suggestion. When you search for a good wine and there is no tasting option, you have to buy and try bottle after bottle until you find one that you like. As with exploratory projects, which should be safe to fail, the size of the investment matters in the mode of simultaneous exploration and exploitation. A bottle is too big a chunk (and price to pay) for such an exploration. In other cases, there are other important factors besides size. Overall, sequential and simultaneous exploration/exploitation are applicable in different circumstances, and maybe, in the long run, should also be balanced.

There seems to be, however, a correlation between the length of the cycle of each mode and the level of uncertainty in the environment.

For organizations, over-exploiting will either exhaust the external resources they are exploiting or make them slow to adapt and less competitive, and eventually drive them out of the market. In contrast, over-exploring exhausts organizations’ own resources. Maintaining the balance is always necessary for survival. However, depending on the level of complexity and uncertainty, just maintaining it might not be sufficient: how the balance is kept becomes crucial.

In biological evolution, sight might be considered one of the most impressive achievements. For species that excel in seeing, such as an eagle, perception (exploitation) happens together with, and is complemented by, eye movement (exploration). The evolutionary advantage of not just balancing, but bringing exploration and exploitation closer together, applies to organizations as well, as the examples of Twitter and app markets show. While an eagle has some idea of what it will perceive (exploit) and discover (explore) as it soars above its territory, the app market has much more uncertainty, as the various players sell (exploit) and innovate (explore) all at the same time, creating and reshaping their territory by their actions.

The more uncertain the environment, the more beneficial it may be for exploration and exploitation to follow each other in shorter cycles, and the more organizations need to invent new business models where exploration and exploitation go together or are both characteristics of one and the same activity.



This is an excerpt from the third chapter of Essential Balances


