Making unnecessarily long sums with Roman numerals

Update November 2016: I recently tried to remake this more rigorously, and discovered I made at least one incorrect assumption in the details below (i.e. I’m pretty sure I got the carryover rules wrong). I’m not going to rewrite this, but just know that it’s not necessarily right – and, in what should have been my very first step anyway, the brute-force version is fast enough for this scale of numbers anyway and much easier to trust.

If you ignore less-common variations of particular digits (such as writing 4 as IIII instead of IV), there’s only one way to write any individual number* as a Roman numeral. And despite usually needing more than one ‘character’ per digit, unlike our current Arabic numeral system, it’s still base 10 and each magnitude of 10 is similarly self-contained. For example, 1776:

1776 = 1000 + 700 + 70 + 6
MDCCLXXVI = M + DCC + LXX + VI

Recently I was wondering about Roman numerals. There’s only one way to write any individual number, so the question of “what is the longest Roman numeral?” is trivial, and usually contains a lot of 8s (88 is the longest under 100, 888 is the longest under 1000, 3888 is the longest under 5000, and so on). But there’s an easy way to introduce variation, which is the question I ended up looking at:

For a given number A, what are the two numbers B and C such that A = B + C, and the number of characters in B and C when expressed in Roman numerals is greatest?

Alternatively:

What’s the longest way to write a number as the sum of two Roman numerals?

I’m sure I’m not the first, but I couldn’t find the answer after a quick search on the internet. And pleasantly it turns out there’s a much nicer way than brute forcing every possible combination. I actually approached this in a rather roundabout way, accidentally discovering a ‘shortcut’ and having to go back and verify it worked, but I’ll put it here in a more sensible order. Let’s start with single digits.

For 1 to 9 inclusive, there is at least one way of writing it as a sum of two numbers. Each of these variations has a different representation when written as Roman numerals, and can potentially vary in length, although most numbers have several results of the same length. If that’s confusing, maybe the actual tables will clear things up a little (or not – sorry! I’m not good enough with HTML/CSS to fix the formatting).

 1 1 + 0 I 1 2 2 + 0 II 2 1 + 1 I I 2 3 3 + 0 III 3 2 + 1 II I 3 4 4 + 0 IV 2 3 + 1 III I 4 2 + 2 II II 4 5 5 + 0 V 1 4 + 1 IV I 3 3 + 2 III II 5 6 6 + 0 VI 2 5 + 1 V I 2 4 + 2 IV II 4 3 + 3 III III 6 7 7 + 0 VII 3 6 + 1 VI I 3 5 + 2 V II 3 4 + 3 IV III 5 8 8 + 0 VIII 4 7 + 1 VII I 4 6 + 2 VI II 4 5 + 3 V III 4 4 + 4 IV IV 4 9 9 + 0 IX 2 8 + 1 VIII I 5 7 + 2 VII II 5 6 + 3 VI III 5 5 + 4 V IV 3

Importantly, while this is specifically written for 1-9, the same pattern holds at every magnitude – a “ones-digit” (I, X, C), a “five-digit” (V, L, D) and a “tens-digit” (X, C, M). I’m sure there’s some proper name for those terms.

So – since each magnitude of ten is separate, couldn’t you just pick the longest row (or one at random for digits with multiple equally-long rows), substitute the appropriate characters, and glue the values together to find a working sum? Well, no. There’s an omission in the table above, but with hindsight it’s trivial to see.

Remember that the longest Roman numeral under 1000 is 888 – DCCCLXXXVIII (12 characters). It stands to reason that 888 + 888 (24 characters) would thus be the longest sum for 1776, but if we use the above table we get 1776 = 1443 + 333 = MCDXLIII + CCCXXXIII (17 characters), which is much shorter. The error is that the tables aren’t complete – the best solution for 6 is 3 + 3, but the best solution for 16 is 8 + 8, two characters longer.

This makes things more complex, though – it might work for 1776, but is carrying over whenever possible always the better strategy? There might be some pairs of digits where ‘borrowing’ ten from the left digit loses more characters than the right digit gains. I think there are only two cases here – the vast majority of the time, where there are digits beyond that pair, meaning both can potentially be carried over, and the ‘left end’ of the number, where carrying over is eventually impossible. (The latter here isn’t quite equivalent to a pair of digits where the leftmost is 0 – 04 can’t be, but 404 is perfectly fine as 390 + 14.) But if there’s somewhere I’ve made a mistake, it would be here, or in the conclusions about carrying over gained from the next table.

This table is effectively a summary of the above and a similar version extended to 18 = 9 + 9; for each digit, it shows the longest sum length in Roman numerals when not allowing carryovers (<10) and when allowing them (<19). It’s probably obvious, but note that numbers ending in 9 can’t really be carried over – no two single-digit numbers are equal to 19.

 Value Best Sum (<10) Best Sum (<19) 0 0 6 1 1 7 2 2 6 3 3 5 4 4 6 5 5 7 6 6 8 7 5 6 8 4 4 9 5 –

If that’s extended into the various combinations (fairly straightforward in Excel or your spreadsheet program of choice), I determined that carrying over is almost always worthwhile, with only a few exceptions:

• For any pair of digits ending in 9, since as mentioned above carrying over 19 is impossible
• For most pairs of digits ending in 8, since there’s no improvement from 8 to 18 and making the left digit smaller usually shortens it
• Except for 88 and …08, which get longer

So, with those limits in mind, here’s the summary for the ‘best’ digits to use for each digit, carrying over or not.

 Value Longest variant(s) 0 0,0 1 1,0 2 1,1; 2,0 3 2,1; 3,0 4 2,2; 3,1 5 3,2 6 3,3 7 4,3 8 4,4; 5,3; 6,2; 7,1; 8,0 9 6,3; 7,2; 8,1 10 7,3; 8,2 11 8,3 12 8,4 13 7,6; 8,5 14 7,7; 8,6 15 8,7 16 8,8 17 9,8 18 9,9

Let’s try 1776 again:

1776 = 1760 + 16
= 1600 + 160 + 16
= 100*(8,8) + 10*(8,8) + (8,8)
= 888 + 888

That’s more like it! Let’s try another – how about 1000?

1000 = 990 + 10
(19 can’t be carried over, since it can’t be written as the sum of two digits)
= 900 + 90 + 10
(Multiple combinations, so let’s just choose any)
= 100*(6,3) + 10*(7,2) + (8,2)
= 678 + 322

What about 990? It’s not quite the same, because this time we haven’t yet ‘used’ the lowest digit.

990 = 980 + 10
= 800 + 180, but the resulting sum would actually be shorter than not doing it!
= 900 + 80 + 10
= 100*(8,1) + 10*(4,4) + 7,3)
= 847 + 143

And essentially, that’s it. Having determined the best for individual digits by hand, taking only a few minutes, and when not to carry over from the digit above, it’s trivial to find the best sum(s) without having to manually go through every combination.

To save you the trouble, if you’re curious: The numbers with the most equally-longest variations below 3999 are 3989 and 3998, both with 648 equal sums (incidentally both of 17 digits, though that mode doesn’t care how long they are). 3999 itself only has 432 variations. And as you might have suspected, the longest overall result is 3776 at 26 characters with 2 variations (1888 + 1888 and 2888 + 888).

As might be expected, if you’ve looked at anything else on this blog, I turned this into a program – in this case, a command-line Windows executable written in C#. I’m sure it contravenes just about every guideline for command-line programs, but as far as I can tell, it produces the right values. As well as doing the work for you, it can also find the number(s) with the longest sum, or the number(s) with the most equally-long sums.

The biggest issue with it (apart from barely resembling a ‘proper’ command-line program, and probable bad organisation) is that it’s limited to numbers between 2 and 3999. This isn’t a programming limitation – it’s just that Roman numerals for 4000 and above are written differently. I couldn’t easily output unicode to the Windows command-line to display them properly, and I didn’t want to invent ‘fake’ letters to use instead, so it just stops there. Here’s a slightly different way of looking at things (click for full-size). The horizontal axis is (are?) the actual numbers 2 to 3999. At the bottom of the graph in blue, using the left vertical axis, is ‘Count’ – the number of equally-maximally-long sums for each value; you can see the occasional high peaks near …98 and …89, with the two highest on the far-right of the graph, a little hard to see. The orange crosses represent the length of the ‘best’ solution for each value. It’s not a particularly good fit, and there’s probably some mathematical reason for the resemblance, but I added a few log-trendlines (with Excel doing the hard work of fitting them) – the solid orange line for all points, the yellow line for the lowest, and the grey line for the highest, with those last two roughly forming an ‘outline’, suggesting that the spread of lengths for values increases a little faster than the average once you get past 500 or so.

The rate of increase in the Count chart is a little less interesting – there’s generally not much pattern at this scale, but the peaks here are entirely linear because of the range covered: 776, 1776, 2776, 3776. The pattern would continue a little longer – 4776, 5776, 6776 – but perhaps a little ironically, 7776 would be shorter.