Solving Doublets in Python

July 11, 2015

This is a follow-up to an article I wrote a few years ago on Solving Doublets in Mathematica.

Doublets are a type of word puzzle invented by Lewis Carroll (author of “Alice in Wonderland”). The goal is to change one word into another by adding, removing, or changing one letter at a time. The tricky part is that each intermediate step must also be a valid word. For more information see this Wikipedia article on word ladders. Here’s an example from that page:

COLD → CORD → CARD → WARD → WARM

If we think of words as vertices in a graph then two words are connected by an edge if they differ by exactly one character. Once we have a graph of connected words we can use a shortest path algorithm to solve for doublets. We’ll solve this in Python 2.7 using the NetworkX graph library. I highly recommend using the Anaconda Python distribution which includes NetworkX and many other useful packages for data science.

First let’s import the packages we’ll need and define a function differ_by_one that returns True if two words differ by exactly one character and False otherwise. Note that we could simply check that the Levenshtein distance between the two words equals 1 but that algorithm is overkill for what we need here and runs much more slowly than the custom function below.

import re
import itertools
import urllib2
import networkx

def differ_by_one(word1, word2):
    # Make sure word1 is shorter or equal in length to word2
    if len(word2) < len(word1):
        word1, word2 = word2, word1
    if len(word2) - len(word1) > 1:
        # Words differ in length by 2 or more characters so return False
        return False
    elif len(word1) == len(word2):
        # Words are same length so check how many characters are different
        # and return True if exactly one
        n_chars_diff = sum(c1 != c2 for c1, c2 in zip(word1, word2))
        return n_chars_diff == 1
    else:
        # word2 is guaranteed to be one character longer than word1.
        # Chop out one character at a time from word2 and compare to word1.
        for i in range(len(word2)):
            word2_shortened = word2[:i] + word2[i + 1:]
            if word1 == word2_shortened:
                return True
        return False

Now let’s get a list of 5-letter words and compute which ones are connected to each other. If two words are connected we’ll add them to our graph.

# Compile list of all lowercase words with up to 5 letters.
response = urllib2.urlopen(
    'http://s3-us-west-1.amazonaws.com/cpeccei-public/words.txt')
words = [word for word in response.read().splitlines()
    if re.search('^[a-z]{,5}$', word)]

g = networkx.Graph()
for word1, word2 in itertools.combinations(words, 2):
    if differ_by_one(word1, word2):
        g.add_edge(word1, word2)

Now if we want to solve a doublet between two words we just do:

networkx.shortest_path(g, 'hello', 'world')

Which in this case returns:

['hello', 'hell', 'held', 'hold', 'cold', 'cord', 'word', 'world']

NetworkX makes this so nice and simple. For fun we can compute some doublets that turn words into their opposites (i.e. antonyms).

# From http://www.michigan-proficiency-exams.com/antonym-list.html
response = urllib2.urlopen(
    'http://s3-us-west-1.amazonaws.com/cpeccei-public/antonyms.txt')
antonyms = [line.split() for line in response.read().splitlines()]

antonym_paths = set()
for word1, word2 in antonyms:
    if word1 in g and word2 in g:
        try:
            path = networkx.shortest_path(g, word1, word2)
            antonym_paths.add(tuple(path))
        except networkx.exception.NetworkXNoPath:
            pass

with open('antonym_paths.html', 'w') as f:
    for p in sorted(antonym_paths, key=len, reverse=True):
        f.write('<p><b>{}</b> to <b>{}</b><br />{}</p>\n'.format(p[0], p[-1],
            ' &rarr; '.join(p)))

Which returns:

heavy to light
heavy → heady → beady → bead → beat → begat → begot → bigot → bight → light

noisy to quiet
noisy → nosy → nose → note → nite → site → suite → quite → quit → quiet

right to wrong
right → sight → sighs → signs → sins → sing → ring → wring → wrong

night to day
night → nigh → sigh → sign → sin → pin → pan → pay → day

tight to slack
tight → sight → sighs → signs → sins → sics → sacs → sack → slack

clear to vague
clear → clean → clan → can → cane → cage → age → ague → vague

dark to light
dark → dank → sank → sink → sins → signs → sighs → sight → light

light to dark
light → sight → sighs → signs → sins → sink → sank → dank → dark

fresh to stale
fresh → flesh → flash → flask → flak → flake → slake → stake → stale

below to above
below → blow → bow → boy → body → bode → abode → above

glad to sorry
glad → goad → good → wood → woody → wordy → worry → sorry

sober to drunk
sober → saber → saner → sane → sank → sunk → dunk → drunk

left to right
left → lest → best → besot → begot → bigot → bight → right

best to worst
best → lest → lost → host → hose → horse → worse → worst

blunt to sharp
blunt → bunt → hunt → hurt → hart → harp → sharp

bless to curse
bless → less → mess → muss → cuss → curs → curse

big to small
big → bid → mid → mil → mail → mall → small

alive to dead
alive → live → lie → lid → did → dad → dead

dull to clear
dull → dell → deal → dean → lean → clean → clear

tall to short
tall → tale → tare → hare → share → shore → short

rapid to slow
rapid → raid → said → slid → slit → slot → slow

low to high
low → sow → son → sin → sign → sigh → high

rich to poor
rich → rick → rock → rook → book → boor → poor

cruel to kind
cruel → creel → creed → reed → rend → rind → kind

dusk to dawn
dusk → dunk → dun → don → down → dawn

bold to timid
bold → told → toed → tied → timed → timid

stand to lie
stand → staid → said → slid → lid → lie

early to late
early → earl → ears → eats → lats → late

loss to find
loss → less → lens → lend → fend → find

long to short
long → lone → lore → sore → sort → short

up to down
up → uh → oh → on → own → down

fast to slow
fast → fat → flat → slat → slot → slow

new to old
new → now → nod → god → gold → old

found to lost
found → fount → font → foot → loot → lost

slim to thick
slim → shim → shin → thin → think → thick

open to shut
open → pen → pun → spun → shun → shut

happy to sad
happy → harpy → hardy → hard → had → sad

sour to sweet
sour → soar → sear → swear → sweat → sweet

odd to even
odd → ode → owe → ewe → eve → even

dry to wet
dry → cry → coy → cot → wot → wet

hard to soft
hard → hart → tart → tort → sort → soft

tame to wild
tame → time → tile → wile → wild

sow to reap
sow → sop → sap → rap → reap

out to in
out → but → bun → bin → in

few to many
few → fen → men → man → many

find to lose
find → fine → line → lone → lose

land to sea
land → lad → lead → lea → sea

peace to war
peace → pace → pare → par → war

fat to thin
fat → tat → tan → tin → thin

less to more
less → loss → lose → lore → more

guest to host
guest → gust → lust → lost → host

take to give
take → hake → hike → hive → give

come to go
come → code → cod → god → go

loud to soft
loud → lout → loft → soft

hate to love
hate → have → hove → love

mad to sane
mad → sad → sand → sane

last to first
last → list → fist → first

bad to good
bad → gad → god → good

cold to hot
cold → colt → cot → hot

cheap to dear
cheap → heap → hear → dear

first to last
first → fist → list → last

me to you
me → ye → yo → you

near to far
near → fear → far

thick to thin
thick → think → thin

wax to wane
wax → wan → wane

here to there
here → there