What can music composition tell us about business optimization?

Last year I reignited a long-standing musical interest while I was on vacation. Sixteenth-century counterpoint, to be exact.

For reasons that are too complicated to explain right now, in my student days I enrolled in a music composition module and took things in an unexpected direction by teaching the university’s Burroughs B7700 mainframe to compose, as my final project for a FORTRAN class.

I found that counterpoint interested me in its own right. But my vacation project last year had another motive. I had recently started working on a new solution called ‘digital annealing’ to optimize extraordinarily complex problems.

I’ve written about digital annealing elsewhere (see our blog series and the Fujitsu White Book of Quantum Computing). In essence, it finds the optimum solution, very quickly, to problems where the number of possible combinations is so large that conventional computers often cannot produce an answer in a practical amount of time.

There are some interesting experimental approaches underway to overcome the size/time challenge using prototype quantum computers. However, digital annealing is already producing results at scale, applying lessons learned from software developed for future-generation quantum computers to run these so-called ‘combinatorial optimization’ algorithms on current-generation digital architectures.

My ulterior motive was that I wanted to demonstrate a practical application of how digital annealing helps overcome limitations of data sampling in very large and extremely complex information spaces.

The importance of sampling

What we learned on the composition course is that there are centuries-old rules for composing ‘good’ counterpoint. Nine of them, in fact. To give you a flavor, rule one states that a rising cadence at the end of a section of music is the right way to go – without it the monks, apparently, got bored.
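
To make the idea of rule-checking concrete, here is a minimal sketch in Python (my own illustration, not the course material): notes are represented simply as pitch numbers, and a function tests whether a phrase ends with the rising cadence that rule one asks for.

```python
def has_rising_cadence(melody):
    """Return True if the final interval of the melody rises.

    `melody` is a list of pitches (e.g. MIDI note numbers), a simplified
    stand-in for rule one: a section should end on a rising cadence.
    """
    if len(melody) < 2:
        return False
    return melody[-1] > melody[-2]

# A short phrase ending with a rising step satisfies the rule.
print(has_rising_cadence([60, 62, 64, 62, 64]))  # True
```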

For each of these rules, there are myriad potential solutions. As is often the case with complex information spaces, there are far more possible places to look for the optimum data points than you realistically have time to consider, even using incredibly fast computers.

This is where sampling comes in. It’s a statistical technique that looks at individual data points within the total array of possible answers. Deciding which are the best possible answers within the data set is the tricky part. Somewhere between random sampling at the crudest end of the spectrum and the sheer brute force of testing every possibility, there are several statistical techniques for finding the best answer using the minimum amount of computation (and hence time, money, and energy).

You might have heard of one of the best-known methods – Monte Carlo simulation. This relies on repeated random sampling to solve problems that are, or might be, probabilistic in nature. It’s used where the number of possible outcomes runs to hundreds of millions, often billions, or more – things as diverse as oil field modeling, the set-up of racing cars, and the virtual screening of molecules for potential new drugs. These are all applications we are working on right now. Racing cars, for example, might have 30 possible settings, each with 10 levels, giving 10³⁰ possible set-ups – a number so enormous that it does not even have an ‘-illion’ name.
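
As a minimal sketch of what repeated random sampling looks like in practice, here is the racing-car example in Python. The lap_time scoring function is entirely invented for illustration; in a real application it would be an expensive simulation, which is exactly why you cannot afford to try all 10³⁰ set-ups.

```python
import random

N_SETTINGS = 30   # 30 adjustable settings on the car...
N_LEVELS = 10     # ...each with 10 levels: 10**30 possible set-ups in total

def lap_time(setup):
    """Fictitious objective function, purely for illustration.

    In practice this would be an expensive simulation or a real test run.
    """
    return sum((level - 5) ** 2 for level in setup)

def monte_carlo_search(n_samples=100_000, seed=42):
    """Repeated random sampling: evaluate random set-ups, keep the best seen."""
    rng = random.Random(seed)
    best_setup, best_time = None, float("inf")
    for _ in range(n_samples):
        setup = [rng.randrange(N_LEVELS) for _ in range(N_SETTINGS)]
        t = lap_time(setup)
        if t < best_time:
            best_setup, best_time = setup, t
    return best_setup, best_time

best_setup, best_time = monte_carlo_search()
print(best_time)  # the best of 100,000 samples, a vanishing fraction of 10**30
```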

Repeated random sampling of so many possibilities with Monte Carlo simulation is clearly going to be faster than chewing over each one in turn. But how do you know whether your random samples are the really interesting ones? Do you know where you are going to find the best answers and learn the most useful lessons? In short: the samples are random, so you don’t.

Finding the best solutions hidden in the cracks

Fortunately, another statistical technique can help here: simulated annealing. This approach seeks the ‘global optimum’ in large data sets (as opposed to ‘local optima’ within subsets of the data). Think of it as creating a topography: annealing models the total data set as a three-dimensional ‘landscape’ made up of peaks and valleys (see image). The most interesting data points will be at the bottom of those valleys. These become the non-random samples, which can then be further refined in an iterative process until you have the real diamonds.
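
For readers who like to see the mechanics, here is a minimal simulated-annealing sketch in Python over a toy one-dimensional ‘landscape’ (the energy function is invented purely for illustration). Early on, at high ‘temperature’, the search is allowed to climb out of local valleys; as the temperature falls, it settles into the deepest valley it has found.

```python
import math
import random

def energy(x):
    """A toy landscape: many shallow local valleys plus one deep global valley."""
    return 0.1 * x * x + 3.0 * math.sin(3.0 * x)

def simulated_annealing(steps=20_000, t_start=5.0, t_end=0.01, seed=1):
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    best_x, best_e = x, energy(x)
    for step in range(steps):
        # Temperature falls geometrically from t_start towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        candidate = x + rng.gauss(0.0, 1.0)
        delta = energy(candidate) - energy(x)
        # Always accept downhill moves; sometimes accept uphill ones,
        # which is what lets the search escape local valleys early on.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate
            if energy(x) < best_e:
                best_x, best_e = x, energy(x)
    return best_x, best_e

print(simulated_annealing())  # a point near the bottom of the deepest valley
```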

‘Quantum annealing’ attempts to use the quantum phenomenon of tunneling to move towards the global optimum without having to traverse the entire terrain. Digital annealing does not leverage tunneling but operates on digital circuits that can be used today. It is already being used for transformational uplifts in optimization performance of, for example, factory robots’ pathways, routing in warehousing and logistics, and risk reduction and performance maximization of financial portfolios.

So far so good. But for many annealing applications, things are not always so straightforward. The landscape might not have the pronounced contours of valleys and peaks one might hope for. What you might be looking at is more like a parched, flat mud surface with only shallow cracks (see the second image). This can often be the case in oil field modeling, for example. By identifying sample data in the ‘cracks’, digital annealing allows us to constrain and manage the search space.

Making AI Deep Learning more effective

As well as being inherently more effective at finding relevant data samples, digital annealing can also improve the performance of deep learning in artificial intelligence (AI). In a deep neural network (DNN), weights and biases are iteratively adjusted to train the model towards the desired outcome. Doing this, however, requires vast amounts of training data for the model to learn what works and what doesn’t. A shortage of usable training data is a common stumbling block in AI, particularly when you are looking for something unusual that, by definition, does not occur very often.

Digital annealing provides a more elegant, faster method: it uses a more complete understanding of the information space to generate the most promising sample instances from that space. A weather forecasting model does something similar. A single forecast generates billions of ‘sample’ weather events (say, one every 15 minutes, over every kilometer, at 50 different altitudes, for 14 days into the future). By comparison, the input data is tiny. Forecasts work as well as they do because the bulk of what describes the information space is embedded in the mathematical formulation that describes the dynamics of the weather. In digital annealing, the problem formulation – known as a QUBO, or quadratic unconstrained binary optimization – does the same job, generating data for training AIs.
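
As a sketch of what a QUBO looks like in code (the matrix values below are arbitrary, chosen only to show the shape of the formulation): the whole problem is captured by a matrix Q, and the task is to find the binary vector x that minimizes the energy xᵀQx. A digital annealer performs this search over thousands of variables; for three variables we can simply brute-force it.

```python
import itertools

# A QUBO is defined entirely by a matrix Q; the goal is to find the binary
# vector x minimizing the 'energy' x^T Q x. These values are illustrative only.
Q = [
    [-1.0,  2.0,  0.0],
    [ 0.0, -1.0,  2.0],
    [ 0.0,  0.0, -1.0],
]

def qubo_energy(x, Q):
    """Energy of binary vector x under QUBO matrix Q: sum of Q[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Brute force is fine for 3 variables (8 candidates); an annealer takes over
# when the variable count makes exhaustive search impossible.
best = min(itertools.product([0, 1], repeat=3), key=lambda x: qubo_energy(x, Q))
print(best, qubo_energy(best, Q))  # (1, 0, 1) with energy -2.0
```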

Better – but by how much?

That’s the setup. Imagine me now, on vacation in my sunny garden. I’m writing the code to use digital annealing to compose a “good” counterpoint.

This is what randomly generated notes sound like:

And this is what constraining the options to the rules of composition, using digital annealing, actually produced:

I agree it’s not going to win an Ivor Novello Award, but you can tell instantly that something has improved over the ragbag collection of random notes. Even for the short pieces here, there are 1.4 × 10³⁴ possible tunes. This is why the monks were taught the rules – some story about a room full of monkeys and typewriters comes to mind.

It’s actually possible to quantify what “good” looks like here, using something called the Hamming Distance. This is a coding metric that counts the number of bits in a binary string that you need to flip to arrive at another string (see fig. 3). Two completely random strings will tend to have a large Hamming Distance; the lower the Hamming Distance between two pieces, the more alike they are. If a large number of annealed samples are closer to each other on average than the same number of random samples are to each other, then our QUBO is doing a good job of complying with the rules. But if the number is too low, all you are demonstrating is a lack of diversity – or, in this context, creativity: the pieces are too similar, and possibly even the same piece of music. Note that the standard deviation is also still large enough to indicate lots of diversity. To get the optimum result, you need to be sampling deep in those cracks.
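
As a toy illustration of the metric itself (my own minimal sketch, not the code I used for the experiment), computing Hamming Distances and the average pairwise distance across a set of samples takes only a few lines of Python:

```python
import itertools
import random

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

def mean_pairwise_distance(samples):
    """Average Hamming Distance over all pairs in a set of bit strings."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)

# Twenty random 64-bit 'pieces' sit, on average, about 32 bits apart;
# a set of rule-following annealed pieces should score noticeably lower,
# while keeping a healthy spread (standard deviation) between them.
rng = random.Random(0)
random_pieces = [[rng.randint(0, 1) for _ in range(64)] for _ in range(20)]
print(mean_pairwise_distance(random_pieces))
```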

While the idea of sampling in simulated annealing is not new, the ability to perform it at scale definitely is. This means that relatively few people have so far put their minds to leveraging it for business advantage, although that situation is now rapidly changing. Right now, we are using digital annealer sampling to identify new therapeutic molecules, from billions of candidates, for treating Dengue Fever and COVID-19. Using conventional random sampling does not get you to usable molecules. Using digital annealing sampling does – very quickly – by focusing only on the sampled data that is likely to be fertile ground, reducing the task to a manageable scale.

Perhaps these scenarios apply to your situation too? If you are already using Monte Carlo simulations, for example, and they are not giving you meaningfully differentiated answers to your questions, a focused co-creation workshop can quickly identify whether there is value in further investigation. To set one up, contact me.