If you’ve fallen far enough down the Wordle rabbit hole you may have heard of Dordle, a version of Wordle where you solve two words at once. If you’re looking for more of a challenge, Merriam-Webster has you covered with Quordle, where you solve four words at once. Of course any Wordler worth their salt should be able to handle eight words, like in Britannica’s Octordle. And if you want to do sixteen words at once, you’re spoiled for choice between Sedecordle and Hexadecordle. And no, it doesn’t stop there.
Sexaginta-quattuordle isn’t real, it can’t hurt yo–
One logical extreme of this trend would be to take the list of 2315 valid secret words to create duomilia-trecenti-quindecordle, where each day the puzzle is a different permutation of those 2315¹ words. Despite how chaotic the user interface would need to be, this variant wouldn’t be much of a challenge: since the same guess is applied to all 2315 words every turn, entering each of the 2315 secret words in any order will always solve it with a perfect score of 2315 guesses.
But what if you could enter different guesses for each of the 2315 secrets each turn? I call this Hyper-Wordle since it can be viewed as an exponentially larger version of normal Wordle:
Normal Wordle | Hyper-Wordle |
---|---|
Secrets are chosen from the \(2315\) possible 5-letter secret words. | Secrets are chosen from the \(2315!\) possible permutations of 5-letter secret words. |
Guesses are chosen from \(12972\) possible 5-letter words. | Guesses are chosen from \(12972^{2315}\) permutations (with replacement) of possible 5-letter words. |
Feedback is given in the form of \(5\) colored squares. | Feedback is given in the form of \(5 \times 2315 = 11575\) colored squares. |
Your score is the number of 5-letter guesses needed to identify the secret. | Your score is the total number of 5-letter guesses needed to identify each word in the secret permutation. |
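
To get a feel for how much larger the Hyper-Wordle spaces in the table are, here is a minimal Python sketch that estimates how many decimal digits each count has. The helper names are made up for this illustration, and the estimates go through floating-point logarithms, so treat the digit counts as approximate rather than exact.

```python
import math

SECRETS = 2315   # secret words, from the table above
GUESSES = 12972  # allowed guess words, from the table above

def decimal_digits_of_factorial(n):
    # lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1

def decimal_digits_of_power(base, exponent):
    # log10(base ** exponent) = exponent * log10(base)
    return math.floor(exponent * math.log10(base)) + 1

secret_digits = decimal_digits_of_factorial(SECRETS)      # digits in 2315!
guess_digits = decimal_digits_of_power(GUESSES, SECRETS)  # digits in 12972^2315
squares = 5 * SECRETS                                     # feedback squares per turn

print(secret_digits, guess_digits, squares)
```

Both counts run to thousands of decimal digits, compared with the four and five digits needed to write 2315 and 12972.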
Believe it or not, this is a real Wordle variant I ran into back in 2022 as part of a competition to see who could write the best Wordle solving program. Originally, the competition tested programs against a sample of 1000 words chosen randomly with replacement. Since some secret words are easier to solve than others, you could spam submissions repeatedly with a suboptimal strategy and eventually get lucky enough to beat better strategies:
Central limit theorem in action.
Testing against permutations of the 2315 secret words without replacement seemed like
it might negate any chance of abusing variance.
For example, if you used the optimal² Wordle strategy starting with the word SALET
(average score of ≈3.4212 guesses) against every secret in the permutation,
submissions would score exactly \(3.4212 \times 2315 = 7920\) regardless of the permutation
since every potential secret word always appears exactly once. Despite this,
there were still ways to introduce variance:
The histogram above shows the score distribution when using the optimal Wordle strategy
starting with SALET
(score of 7920) on half of the words in the permutation, and using the second best
strategy starting with REAST
(score of 7923) on the other half. Variance comes
from the fact that each strategy has its own strengths and weaknesses. For example:
- SALET solves SAUTE in 2 guesses, while REAST solves it in 4.
- REAST solves ROUTE in 2 guesses, while SALET solves it in 4.
If you spam enough submissions, you can retry until you’re tested against permutations where each
strategy covers for the other’s weaknesses, i.e. words like SAUTE
end up in the SALET
half,
and words like ROUTE
end up in the REAST
half.
While we could intentionally inject variance like this and spam submit,
there isn’t much merit in being the contestant who submits the most times. In particular,
the mixed strategy scores ≈7921.5 on average (the average of SALET’s score and REAST’s
score), which is worse than SALET’s score of 7920 by itself.
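The trade-off can be seen in a two-word toy version that uses only the four scores quoted above (SAUTE and ROUTE under each opener). This is a minimal Python sketch with made-up helper names, not the competition harness: a pure strategy scores the same on every ordering, while the mixed strategy swings between a lucky and an unlucky ordering and matches the average of the two pure totals.

```python
from itertools import permutations
from statistics import mean

# Guesses each opener needs for each secret, as quoted in the text above.
SCORES = {
    "SALET": {"SAUTE": 2, "ROUTE": 4},
    "REAST": {"SAUTE": 4, "ROUTE": 2},
}

def total(order, openers):
    # Position i of the permutation is attacked with openers[i].
    return sum(SCORES[opener][secret] for opener, secret in zip(openers, order))

orders = list(permutations(["SAUTE", "ROUTE"]))
pure_salet = [total(order, ["SALET", "SALET"]) for order in orders]
pure_reast = [total(order, ["REAST", "REAST"]) for order in orders]
mixed = [total(order, ["SALET", "REAST"]) for order in orders]

print(pure_salet, pure_reast, mixed, mean(mixed))
```

In this toy case the two pure totals happen to be equal, so mixing costs nothing on average. With the real totals of 7920 and 7923, the mixed average lands between them, which is why mixing alone is a net loss.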
What if we could find a way to outperform the SALET
strategy on average? For example,
can we take advantage of the fact that the secrets are permuted without replacement to
gain extra information?
Wacky Trick Leaks Extra State
Before we try to solve a permutation of 2315 words, let’s consider a simpler scenario
where we’re solving a permutation of six secret words in parallel:
FIRST, DEUCE, THIRD, FORTH, FIFTH, SIXTH. Let’s take a look at a strategy
where LEAKS is our starting word:
Note this is a deterministic strategy, meaning our guess for each word is
based solely on feedback we’ve received for the word so far. For example, we
guess THIRD
in all three positions where the feedback from the first guess was five gray squares.
While we show the secret words in order here, since the strategy is deterministic it always
requires a total of 15 guesses to solve all the words regardless of how they’re permuted. Next,
consider a strategy with MAJOR
as our starting word:
Again, this deterministic strategy requires 15 guesses to solve any permutation of the 6 chosen secret words.
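To see why a deterministic strategy cannot tell certain secrets apart after its first guess, it helps to group the six secrets by the feedback each opener produces. Below is a minimal Python sketch using the standard Wordle coloring rules; `feedback` and `partition` are names invented for this example, with `g` for green, `y` for yellow, and `-` for gray.

```python
from collections import Counter, defaultdict

SECRETS = ["FIRST", "DEUCE", "THIRD", "FORTH", "FIFTH", "SIXTH"]

def feedback(guess, secret):
    """Return a 5-character string of g (green), y (yellow), - (gray)."""
    marks = ["-"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count the secret letters left unmatched.
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            marks[i] = "g"
        else:
            unmatched[s] += 1
    # Second pass: a non-green letter is yellow while unmatched copies remain.
    for i, g in enumerate(guess):
        if marks[i] == "-" and unmatched[g] > 0:
            marks[i] = "y"
            unmatched[g] -= 1
    return "".join(marks)

def partition(opener):
    """Group the secrets by the feedback they give for this opener."""
    groups = defaultdict(list)
    for secret in SECRETS:
        groups[feedback(opener, secret)].append(secret)
    return dict(groups)

for opener in ("LEAKS", "MAJOR"):
    print(opener, partition(opener))
```

Under LEAKS, the secrets THIRD, FORTH, and FIFTH all come back fully gray, which is why the strategy has to make the same second guess in all three of those positions.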
Neither the MAJOR strategy nor the LEAKS strategy is particularly impressive on its own. Let’s
try to solve an unknown permutation of our secret words while mixing the two starting words,
with MAJOR for the first three positions and LEAKS for the last three:
To put you in the mindset of the puzzle, the actual value of each secret word is kept, well, secret.
The only information you have is that each of the six secret words appears
only once, but can be in any order. The possibilities column lists the possible secret
words which can be in a position based on the feedback from guess 1,
using 1, 2, 3, 4, 5, and 6 as shorthand for FIRST, DEUCE, THIRD, FORTH,
FIFTH, and SIXTH respectively.
Before we make any more guesses, is there anything we can do to narrow down the values in
the possibilities column? Looking closely, we already know the position of 2: it
must be in the fourth position since it’s the only secret which matches that feedback pattern
for LEAKS. This allows us to remove 2 from the lists of possibilities in the first
two positions:
Now that we’ve removed 2 as a possibility in the first and second positions, we see
the first position must be either 5 or 6. Consider the following two scenarios:
- If 5 is in the first position, 6 must be in the second position since there would be no other option that could go there.
- If 6 is in the first position, 5 must be in the second position since there would be no other option that could go there.
In Sudoku³ puzzles this is known as a Naked Pair.
While we don’t know which of the two scenarios we’re in yet, in every scenario 5 and 6
must be in the first two positions, allowing us to rule them out from any other position:
After this deduction, we know 1 must be in the fifth position since it’s the only viable
option. This allows us to remove it from the list of possibilities for the third
position.
By the same logic, 3 must be in the third position, and we can remove it from the
possibilities for the sixth position.
Finally, we can deduce 4 must be in the sixth position. Initially we only knew the position
of 2, but after applying deductions we know the exact position of four out of the six
secrets! If we submit guesses tuned to take advantage of our updated knowledge:
We’re able to solve every word in a total of 13 guesses, an improvement over the 15 guesses
needed by both the MAJOR strategy and the LEAKS strategy on their own.
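The chain of deductions in this example can be sketched as a small constraint-propagation loop. This is a minimal Python illustration, not the Rust code used for the histograms, and `propagate` is a name invented here. It generalizes the Naked Pair idea: whenever some group of k positions can only hold k distinct secrets between them, those secrets are removed from every other position. The starting sets are the possibilities after guess 1 in the example (shorthand 1 to 6), reconstructed from the Wordle feedback rules. Checking every group of positions is exponential, so this approach only suits toy sizes like six secrets.

```python
from itertools import combinations

def propagate(candidates):
    """Shrink per-position candidate sets using uniqueness of each secret."""
    cands = [set(c) for c in candidates]
    n = len(cands)
    changed = True
    while changed:
        changed = False
        for k in range(1, n):
            for group in combinations(range(n), k):
                union = set().union(*(cands[i] for i in group))
                # k positions sharing exactly k candidates must use them all.
                if len(union) == k:
                    for j in range(n):
                        if j not in group and cands[j] & union:
                            cands[j] -= union
                            changed = True
    return cands

# MAJOR was guessed in positions 1-3 and LEAKS in positions 4-6.
after_guess_1 = [{2, 5, 6}, {2, 5, 6}, {1, 3}, {2}, {1, 6}, {3, 4, 5}]
print(propagate(after_guess_1))
```

The result pins down positions three through six and leaves only the 5/6 pair in the first two positions unresolved, matching the walkthrough.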
Taking a step back, where did this improvement come from? Like with the SALET/REAST
example from earlier, the individual MAJOR and LEAKS strategies each have their
own strengths and weaknesses:
- MAJOR always knows the location of FORTH after submitting guess 1, while LEAKS doesn’t find this out until after guess 2.
- LEAKS always knows the location of DEUCE after submitting guess 1, while MAJOR doesn’t find this out until after guess 2.
In the example we worked through above, notice how the first deduction uses information
from the LEAKS half of the puzzle to rule out the location of DEUCE (2) in
the MAJOR half of the puzzle earlier than it normally could. In other words, LEAKS’s
strengths cover for MAJOR’s weaknesses, which in turn gives MAJOR enough information
to cover LEAKS’s weaknesses.
By exploiting the asymmetry in the strengths and weaknesses of each strategy,
we’re able to iteratively refine both strategies to perform better than the sum of their parts!
We can brute force over all \(6! = 720\) possible permutations of our secret words to build up histograms showing how much improvement deduction gives us on average:
On the left, we have the result of mixing the two strategies without using any deduction
tricks. This produces a vaguely Gaussian looking distribution averaging a score of 15,
the same as using MAJOR
or LEAKS
on their own.
On the right, we have the result of mixing the two strategies and using
deduction tricks to refine our guesses with an average score of 13.9, a 1.1 point improvement!
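The full score histograms depend on the later guesses of each strategy, which are not reproduced here. A smaller question can be brute-forced directly: after guess 1, how many positions are pinned down on average, with and without deduction? The Python sketch below is an illustration with made-up helper names, not the original Rust code. It models "perfect" deduction by grouping all 720 permutations by the feedback they would produce and treating a position as pinned when every permutation in the group agrees on it.

```python
from collections import Counter, defaultdict
from itertools import permutations

SECRETS = ["FIRST", "DEUCE", "THIRD", "FORTH", "FIFTH", "SIXTH"]
OPENERS = ["MAJOR"] * 3 + ["LEAKS"] * 3  # first guess used in each position

def feedback(guess, secret):
    """Standard Wordle coloring: g (green), y (yellow), - (gray)."""
    marks = ["-"] * len(guess)
    unmatched = Counter()
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            marks[i] = "g"
        else:
            unmatched[s] += 1
    for i, g in enumerate(guess):
        if marks[i] == "-" and unmatched[g] > 0:
            marks[i] = "y"
            unmatched[g] -= 1
    return "".join(marks)

def signature(perm):
    # Everything the solver sees after guess 1.
    return tuple(feedback(g, s) for g, s in zip(OPENERS, perm))

def solo_pinned(perm):
    # No deduction: a position is pinned only if its own feedback is unique.
    return sum(
        sum(feedback(g, other) == feedback(g, s) for other in SECRETS) == 1
        for g, s in zip(OPENERS, perm)
    )

def pinned(perms):
    # With deduction: pinned if all consistent permutations agree there.
    return sum(len({p[i] for p in perms}) == 1 for i in range(len(SECRETS)))

all_perms = list(permutations(SECRETS))
by_signature = defaultdict(list)
for perm in all_perms:
    by_signature[signature(perm)].append(perm)

avg_plain = sum(solo_pinned(p) for p in all_perms) / len(all_perms)
avg_deduced = sum(
    pinned(group) * len(group) for group in by_signature.values()
) / len(all_perms)
print(avg_plain, avg_deduced)
```

Without deduction, exactly one position is pinned on average, since only FORTH under MAJOR and DEUCE under LEAKS give unique feedback. Deduction raises that average, which is where the improvement in the right-hand histogram comes from.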
Widen Scope
Now that we’ve seen this work with permutations of six secret words, let’s see how we
do against permutations of the complete list of 2315 secret words. We can start off with the
SALET/REAST mixed strategy we showed earlier:
The values on the right are the same as earlier, showing the score distribution of the
SALET/REAST mixed strategy on 1000 random permutations of the 2315 secret words.
On the left we have the results on the same 1000 permutations after eliminating possible
states via deduction each turn and refining our guessing strategy accordingly. Deduction takes our
average score from 7921.5 to 7768.8, an improvement of roughly 150 points!
SALET and REAST were chosen since they’re the top two deterministic Wordle strategies,
but what about mixing other strategies? During the competition, the best combination of
strategies I found was assigning 10% of the permutation to each of the top 10
Wordle starting words: SALET, REAST, CRATE, TRACE, SLATE, CRANE, CARLE, SLANE, CARTE,
and TORSE. Plotting this against the previous two histograms:
The top 10 mix with deduction is shown in green with an average
score of 7628.0, an additional improvement of roughly 140 points over the SALET/REAST
deduction strategy!
Trying to mix in more words (e.g. top 20) seems to have diminishing returns since introducing
less efficient starting words raises the expected score before deductions.
The top 10 mixed strategy is what I ultimately used in the competition mentioned earlier, winning
with a score of 7574, a 4.4% improvement over the optimal Wordle strategy by itself!
Final Words
If you want to tinker with ideas, I generated all the data for the strategy histograms in this post using this very ad hoc Rust code. Some interesting open questions are:
- What’s the best mix of two starting words (i.e. lowest average score)? Intuitively, some starting words might “synergize” with each other better than others if the structure of their decision trees tend to lead to more deductions.
- My winning strategy only behaves non-deterministically on the first turn, when we randomly use 10 different guesses despite every word having identical feedback at that point (i.e. no feedback). Can we get further improvements by behaving non-deterministically on later turns?
- The deduction strategy only removes words from the possibility pool when we’re certain they must be somewhere else. Is there a way to “fuzzily” refine our possibilities to values other than 0, e.g. “this word is likely to be in position A, so it’s less likely to be in position B”?
- In the version of Hyper-Wordle played in this writeup, guesses are permutations with replacement. What do strategies look like if we limit guesses to be permutations without replacement?
- Are there any other games/scenarios where combining multiple suboptimal strategies outcompetes a strategy which would normally be stronger?
- Should I find less convoluted things to do with my free time?
If you enjoyed reading this, this entire writeup was actually much longer before I broke it into three standalone parts. Excluding the one you’re reading right now, the other two are:
- The Sixteen Bottles of Wine Riddle – I thought of this riddle while trying to think of a simpler version of Hyper-Wordle to use as a toy example to introduce some concepts. Despite trying to make it as simple and symmetric as possible, it still ended up having a surprising amount of depth!
- Writing Wordle bots for fun and profit – This gives some context on some of the other stages of the Wordle strategy competition, which I also happened to win. No novel discoveries to share there, but it’s a fun story anyway if you’re into Wordle.
Anyway, thanks for reading!
1. 2315 was the original number of Wordle secret words. After the New York Times acquired Wordle, it was revised down to only 2309 secret words.
2. Leading with SALET was the best strategy in early 2022, but since the New York Times changed the word list and secret list after acquiring Wordle, it is no longer optimal.
3. You can think of the deduction steps between guesses as very oblong Sudoku puzzles, where instead of a 9x9 grid with uniqueness constraints on 1 to 9, you have a 2315x1 line with uniqueness constraints on 1 to 2315.