
Good and Bad Passwords How-To

Cracking "Good" Passwords With Custom Programmed Dictionaries

Passwords made from two unrelated short words and a non letter are often recommended as good. We show that all such passwords can be cracked in an afternoon on a low end desktop PC. The use of two non letters makes much stronger passwords but they are still within the reach of desktop PC technology. Some alternative ways of creating good passwords are examined, but our final conclusion is that humans just aren't very good at creating strong passwords and certainly not strong passwords that have a reasonable chance of being remembered.

Manual Passwords, Too Weak

We've now looked at a number of ways not to form passwords and in the process have eliminated nearly every method that most people like to use to make passwords. A common suggestion for creating good passwords is to take two short, unrelated words and combine them with one or more digits or punctuation or symbol characters. Sometimes the suggestion is explicit that there should be both a digit and a symbol (including punctuation). Sometimes both go between the words and sometimes one or the other goes at the beginning or end of the word. I don't recall seeing suggestions that sometimes both should go at the beginning or end of the words or that one should go at the front and the other the end so that the two words run together.

Using a '#' to stand for any non letter character, the suggested patterns look like:

worda#wordb
worda##wordb
#worda#wordb
worda#wordb#

I've used the linux.words dictionary and extracted all the 2, 3, 4 and 5 character words from it. Reviewing these lists, I'd say that most of the words are common but that a third to a fifth are not what I think of as common words. There are enough uncommon words that I think most people would have some difficulty thinking of short words that aren't in these lists so that for a significant majority of users, both words would come from these lists. I don't know whether it's 60% or 98%, but enough people following the previous advice will be using words from these lists that it's worth examining the feasibility of cracking these by creating custom dictionaries, or having the cracking tool generate such passwords.

The linux.words list contains 49 two character words, 536 three character words, 2236 four character words and 4174 five character words. Making every possible combination of two and three, two and four, two and five, three and three, three and four, and four and four character words results in a little over 8.3 million combinations. If you separate every combination with all 42 non letter characters (not including the space) there are just over 350 million combinations.
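The arithmetic above is easy to check. The sketch below is in Python rather than the Perl used for the script described in this article, and it assumes that word order matters (so that "catboat" and "boatcat" count separately), which is the reading that reproduces the 8.3 million figure.

```python
# Word counts by length, as reported for the linux.words list above.
WORDS_BY_LENGTH = {2: 49, 3: 536, 4: 2236, 5: 4174}

# Length pairings used in the text.
LENGTH_PAIRS = [(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)]

NON_LETTERS = 42  # printable ASCII non letters, space excluded


def ordered_word_pairs(counts=WORDS_BY_LENGTH, pairs=LENGTH_PAIRS):
    """Count worda/wordb combinations, treating order as significant."""
    total = 0
    for a, b in pairs:
        if a == b:
            total += counts[a] * counts[b]      # every ordered pair, repeats included
        else:
            total += 2 * counts[a] * counts[b]  # short-long and long-short
    return total


word_pairs = ordered_word_pairs()
print(f"{word_pairs:,} word pairs")
print(f"{word_pairs * NON_LETTERS:,} worda#wordb candidates")
```

With these counts the totals come to 8,364,692 word pairs and 351,317,064 single-separator candidates.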

In about a half hour, I wrote a Perl script to generate all of these. It took the script between 30 and 45 minutes to write these to disk, or 16 minutes in memory without outputting the created passwords. The saved file was 3.7 gigabytes. L0phtCrack takes about 15 minutes to read the file initially, and its counters overflow with this number of words in the dictionary file. Despite the counter overflow, after the initial read, it processed about 5 million words a minute, or somewhat over 70 minutes for the full list, and cracked two test accounts I'd created using passwords formed like this. A single non letter separating two words simply isn't good enough, not when a first attempt covered the most likely combinations in about three hours on a very modest machine (PIII 500). These can be cracked almost as easily as single dictionary word based passwords.
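A generator for the worda#wordb pattern might look something like the following. This is not the original Perl script; it is a minimal Python illustration, and the function name `candidates` and the tiny sample word list are assumptions made for the example. It streams candidates one at a time instead of writing a multi-gigabyte file.

```python
import itertools
import string

# The 42 printable ASCII non letters (10 digits + 32 symbols), space excluded.
NON_LETTERS = string.digits + string.punctuation

# Length pairings used in the text, stored shortest first.
ALLOWED_LENGTHS = {(2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (4, 4)}


def candidates(words, separators=NON_LETTERS):
    """Yield every worda#wordb candidate for the allowed length pairings."""
    for a, b in itertools.product(words, repeat=2):
        lengths = tuple(sorted((len(a), len(b))))
        if lengths not in ALLOWED_LENGTHS:
            continue
        for sep in separators:
            yield f"{a}{sep}{b}"


if __name__ == "__main__":
    sample = ["to", "cat", "boat", "lemon"]  # hypothetical stand-in for the word lists
    for password in itertools.islice(candidates(sample), 5):
        print(password)
```

Because the function is a generator, its output could be piped straight to a cracking tool that reads candidates from standard input, which avoids the disk space problem entirely.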

Manual Passwords, Better

The number of passwords jumps dramatically with two words and two non letter characters. 8.3 million is multiplied by 42 * 42 * 3 giving 44 billion. The three is for the three different arrangements of non letter characters. At this number, saving the list starts to become a serious issue for just about anyone or any organization as it will require half a terabyte of disk space. It's been my experience that, with loop and integer arithmetic intensive processes, compiled C programs run about 40 times faster than almost identically coded Perl scripts. Thus a C program could generate the simpler passwords at about 850 million per minute (on a PII 450). Even if the extra character imposed a significant overhead, 500 million a minute seems quite conservative, and at that rate 44 billion would take less than an hour and a half.

The cryptography side is more CPU intensive. Returning to the nice round 100,000 per second, generating the hashes for our 44 billion passwords will take about 5 days. But we also know that rate is just a crude estimate that could easily be off by more than an order of magnitude.
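Since the rate is so uncertain, it helps to have the division in a form that is easy to rerun. The sketch below is Python; the 100,000 hashes per second figure is the rough estimate used in the text, not a measurement, and the function name is only for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_to_exhaust(candidates, per_second):
    """Days needed to try every candidate at a constant rate."""
    return candidates / per_second / SECONDS_PER_DAY


# Word pairs * first non letter * second non letter * arrangements.
two_non_letter_set = 8_300_000 * 42 * 42 * 3
print(f"{two_non_letter_set:,} candidates")

for rate in (10_000, 100_000, 1_000_000):  # an order of magnitude either way
    days = days_to_exhaust(two_non_letter_set, rate)
    print(f"{rate:>9,} per second -> {days:6.1f} days")
```

An order of magnitude error in the rate moves the answer between roughly half a day and about seven weeks, which is why the 5 day figure should be read as a ballpark only.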

As long as I can remember, combining two unrelated words with two non letters has been one of the primary recommendations for creating strong passwords. I know if I were seriously cracking passwords belonging to others, that I would replace the brute force password generator with one that made passwords from short words. It's hard to see how it could be less productive than brute force and might pick up some root or administrator passwords that their owners thought were really good. Of course the yield or efficiency, due to the very large password universe, would be very poor and all the standard dictionary methods should be tried first.

At least one other factor should be covered. The 42 * 42 figures are based on any non letter character in either non letter character position. This also means allowing both characters to be the same and both to be numbers. 00 and 11 would be allowed in the middle of words and worda7wordb7 would also be OK. If we make the stipulation that each password must contain a digit and a symbol or punctuation, the correct multiplier is 8.3 million * 32 * 10 * 3 or about 8 billion passwords. This is significantly less than the 44 billion we were talking about. If we decide to allow two symbols but never two digits, the numbers are 8.3 million * 42 * 32 * 3 or just under 34 billion.

This shows how important it is that the cracker's assumptions match the methods by which the passwords were created in the file to be cracked. If the good guys use the broadest rules and thus create the largest password sets, there will necessarily be some relatively weak passwords that are acceptable. In this case a cracker working with a "small" password set of 2.5 billion that come from just the words and numbers will get some passwords but completely miss any with symbols or punctuation. If you use the compromise rule set allowing two symbols but not two digits, unless a cracker has inside information, they will have to search the larger universe, wasting processing time on the two number passwords that don't exist.

There are some additional ways to complicate the cracker's problem. There are some two word patterns that are never mentioned:

wordawordb##
#wordawordb#
##wordawordb

It's true that the words are run together but so what? Is this going to make them harder to remember? Can we not tell where one ends and the other begins? Does it matter? The fact is these new patterns double the size of the password universe a cracker has to match. There is one group that we do need to pay special attention to. In the word combinations of two three letter words and two four letter words, there will be duplicate words.

Though there is no evidence that they are currently doing so, cracking tools could get these by repeating the word and prepending or appending arbitrary character sequences. L0phtCrack can append arbitrary character sequences of arbitrary length but can't double words. Crack 5 and John the Ripper can double words and prepend or append short digit sequences. It's not clear their rule syntax provides a convenient method for making mixed symbol and digit sequences.

If you don't allow duplicate words and digit sequences, these passwords are much stronger than any that can be derived from single word dictionaries. Current tools cannot transform any single word to create these sequences. Separate words must be combined programmatically to obtain these sequences.

It's very important not to confuse some of the DON'Ts that apply to variations on single dictionary words with superficially similar character sequences that appear as part of programmatically generated patterns. abduct66 is fundamentally different from bitaid66. The former is a trivial transformation of a common dictionary word which cracking programs may get in a few minutes with current dictionaries and rules. The only ways to get the latter are brute force, which is unlikely to succeed even using a character set lacking symbols and punctuation, or a custom programmed dictionary where multiple words and digits have been combined for the express purpose of cracking passwords. Without insider information, it's unlikely that bitaid66 will ever be found because it does not fit any standard recommendations on forming good passwords.

Fortunately for the good guys, we haven't mentioned the case of letters as a factor in these passwords made from short words plus random non letter characters. If we start mixing the case of the letters in the words, we can add about two orders of magnitude of complexity to the cracker's problem.

While truly randomly mixing case on 8 letters will expand the possibilities by 128 times, there is nothing in passwords that I find harder to remember or type than truly arbitrary case words. There is an approach that significantly complicates the crackers' job while not being so demanding of our memory and fingers. This is to limit the capital letters to the first, last, inner or outer positions. Some examples will help: 1Bad&Tuba = first, *losT)baG = last, bolD^2Rug = inner, [3HateraT = outer.
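The four capitalization schemes can be written down as a pair of positions, one per word. The sketch below is a Python illustration with both positions capitalized, as in the examples; the names `SCHEMES` and `apply_scheme` are made up for this example.

```python
def capitalize_at(word, index):
    """Return word with the letter at index uppercased (negative indexes allowed)."""
    index %= len(word)
    return word[:index] + word[index].upper() + word[index + 1:]


# Position to capitalize in (first word, second word) for each scheme.
SCHEMES = {
    "first": (0, 0),    # Bad  Tuba
    "last": (-1, -1),   # losT baG
    "inner": (-1, 0),   # bolD Rug
    "outer": (0, -1),   # Hate raT
}


def apply_scheme(worda, wordb, scheme):
    """Capitalize one position in each word according to the named scheme."""
    pos_a, pos_b = SCHEMES[scheme]
    return capitalize_at(worda, pos_a), capitalize_at(wordb, pos_b)


print(apply_scheme("bad", "tuba", "first"))   # ('Bad', 'Tuba')
print(apply_scheme("lost", "bag", "last"))    # ('losT', 'baG')
print(apply_scheme("bold", "rug", "inner"))   # ('bolD', 'Rug')
print(apply_scheme("hate", "rat", "outer"))   # ('Hate', 'raT')
```

The non letters are then placed around the capitalized words exactly as before, so the case choice is independent of the placement pattern.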

In the examples, I always uppercased both positions to make what I meant clear. In practice, the choices should be between 1) either or both or 2) either, both or none. Keeping no upper case as an option once again creates more possibilities but means some individual passwords that are more likely to be found by a cracker looking at only lower case options. The either or both approach increases the choices by three and is the safer option.

But it's better than that. We don't tell the cracker which approach we use and different people use different approaches; the choices are now increased by 12. To get these with any efficiency a cracker needs three custom programmed dictionaries. The first has all lower case, the second has the first, last, inner and outer upper case combinations and the third all the other mixed case combinations. That's likely to be a pain to program; if we're lucky the cracker jumps from all lower case to full mixed case.

Let's review. There are 8.3 million word pairs. These are combined with two non letters; both cannot be digits. There are six patterns created by where the non letters are placed relative to the words. We're using 12 capitalization variations. This is 8.3 million * 42 * 32 * 6 * 12 which gives 809 billion possible passwords. This is a lot better than dictionary derived passwords, but if our encryption estimates are off by very much, or a cracker has lots of computing power or is willing to wait a while (it's 94 days at 100,000 passwords per second), it's still clearly within the realm of today's technology. If we really want our passwords to be safe we have to do better, a lot better.
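The review figure can be reproduced by multiplying the factors out. The Python sketch below uses the multipliers exactly as given in the text; the 809 billion total comes from the unrounded word pair count (8,364,692) rather than the rounded 8.3 million.

```python
from math import prod

SECONDS_PER_DAY = 86_400

# Ordered word pairs from the list sizes given earlier (49, 536, 2236 and 4174
# words of 2 to 5 letters), using the same length pairings as the text.
n2, n3, n4, n5 = 49, 536, 2236, 4174
word_pairs = 2 * n2 * (n3 + n4 + n5) + n3 * n3 + 2 * n3 * n4 + n4 * n4

factors = {
    "word pairs": word_pairs,
    "first non letter": 42,
    "second non letter": 32,  # multiplier used in the text for "not both digits"
    "placements": 6,
    "capitalization variants": 12,
}

total = prod(factors.values())
days = total / 100_000 / SECONDS_PER_DAY

print(f"{total:,} possible passwords")
print(f"{days:.0f} days at 100,000 per second")
```

Keeping the factors in a dictionary makes it simple to change one assumption, such as the number of placements, and see how much the total moves.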

Also, if decent frequency tables could be found for short words (like the Census name lists), it would be possible to build several smaller dictionaries and process the most likely word combinations first. This could dramatically alter the time it takes to get at least some of the passwords created as we have described.
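One way to picture this is to cut a frequency-ranked word list into tiers and visit the tier combinations from most to least likely. The Python sketch below is only an illustration of that ordering: the ranking is invented, the tier size is arbitrary, and no real frequency data for short words is assumed to exist.

```python
import itertools


def tiered_pairs(ranked_words, tier_size):
    """Yield (tier_a, tier_b, worda, wordb), likeliest tier combinations first.

    ranked_words must be sorted from most to least frequent. Words are cut
    into tiers of tier_size; tier pairs are visited in order of increasing
    tier_a + tier_b, so common-with-common pairs come out before rare ones.
    """
    tiers = [ranked_words[i:i + tier_size]
             for i in range(0, len(ranked_words), tier_size)]
    tier_pairs = sorted(itertools.product(range(len(tiers)), repeat=2),
                        key=lambda pair: (sum(pair), pair))
    for ta, tb in tier_pairs:
        for a, b in itertools.product(tiers[ta], tiers[tb]):
            yield ta, tb, a, b


# Hypothetical ranking, most frequent first.
ranked = ["the", "and", "cat", "dog", "yak", "gnu"]
for ta, tb, a, b in itertools.islice(tiered_pairs(ranked, 2), 6):
    print(ta, tb, a + b)
```

Each tier combination corresponds to one of the "smaller dictionaries", and a cracker could stop after the first few if enough passwords had already fallen.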

Alternative Manual Passwords

Among the discussions of how to create good passwords, there are sometimes suggestions for creating passwords from the first letters or initial few characters of words in sentences or phrases that the user can remember. No specific suggestions were included in the list of Common Password DO's because no single suggestion is nearly as common as combining multiple words with non letters and there is no easy way to phrase such suggestions that won't result in a method that can be programmed to create dictionaries to crack the resulting passwords. Unless the recommendation also includes explicit discussion of minimum password length and the use of mixed case and non letters, there is a good chance resulting passwords may fall to brute force attacks.

For the next few years, any method that helps a user remember passwords that are of a reasonable length and character diversity and do not contain dictionary words or simple transformations of dictionary words, will likely create passwords of moderate to good strength. The actual strength of the password will depend on the actual length and character diversity, as well as the avoidance of dictionary words. It's surprising just how obscure the result of any two of the following transformations of a dictionary word is likely to be: reverse, rotate, keyboard shift, collating sequence shift, drop or add a character. In other words, passwords created by these methods look almost random to a human, but are easily found by the cracking tools. To be fair, program generated passwords may contain such sequences by chance, as may sequences derived from phrases or sentences.
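A few lines of code show both how obscure the results look and how little effort they cost a program. The Python sketch below is not how any particular cracking tool implements its rules; the function names are invented, and "keyboard shift" is interpreted here as replacing each letter with its right-hand neighbour on a US QWERTY layout, which is an assumption.

```python
def reverse(word):
    return word[::-1]


def rotate(word, n=1):
    """Move the first n characters to the end."""
    n %= len(word)
    return word[n:] + word[:n]


def drop_char(word, index):
    """Remove the character at index."""
    return word[:index] + word[index + 1:]


# Letter rows of a US QWERTY keyboard; the last key in a row wraps to the first.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
SHIFT_RIGHT = {row[i]: row[(i + 1) % len(row)]
               for row in ROWS for i in range(len(row))}


def keyboard_shift(word):
    """Replace each lowercase letter with its right-hand neighbour."""
    return "".join(SHIFT_RIGHT.get(ch, ch) for ch in word)


word = "password"
print(reverse(rotate(word)))          # pdrowssa
print(keyboard_shift(reverse(word)))  # ftpeddsq
```

Applying every pair of such transformations to every dictionary word multiplies the dictionary by only a small constant, which is trivial next to the word pair sets discussed earlier.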

Passwords created via a personal algorithm that is more complex than multiple transformations of a single dictionary word are likely to be better than two short words and one non alpha character and depending on the algorithm, as good or better than two short words and two non alpha characters.

I'm going to suggest such a personal algorithm that deliberately violates some of the more common negative advice regarding passwords. A typical family has four or more members. A starting point for passwords might be the first two to four characters of the first and middle names of your family, avoiding any sequences that are by themselves dictionary words. Another component might be a variety of alphanumeric sequences that represent birth dates of family members but avoid using the "19" part of the year and the more common date formats and sometimes mix in one or two character month abbreviations instead of numbers. If you know the day of the week that family members were born on, the day abbreviation might sometimes substitute. If family members were born in different locations, city and state abbreviations might form an additional part. Putting these together in a variety of combinations would likely yield a number of not easily crackable passwords.

For a hypothetical family similar to mine, these might yield the following bits and pieces: ge geod ged geda gdav gda gd ph phi phl phly plly lly phlly pl sy sylv syli sli lit sl wa wac wacr wcr wcri cri wc 1222 de22 241222 dec22 d22 1224 2412 24d22 24de22 24dec22 0909 99 909 9928 90928 28909 280909 s9 se9 sep9 sep09 s28 se28 sep28 28s9 28se9 28sep9 28s09 28se09 28se09 718 0718 j18 ju18 jul18 j55 ju55 jul55 55718 550718 55ju 55jul 55j18 55ju18 18ju55 18j55 47 047 0407 4762 040762 40762 a7 ap7 apr7 apr07 a762 ap762 6247 620407 62a7 62ap7 62ap07 62apr7 fbhin bhin fthin infbh inftbh infth fnj fmnj mnj ftmnj monnj njft njftm njfm njftmo hpa pah paha pahar harpa penn hpen hapen wwv wvw whwv whewv wvwh weva wheewv.

Obviously, I wasn't completely consistent in terms of taking a set number of letters or using correct abbreviations but each piece is easily derived from names, birth dates and places of somewhat hypothetical family members. One of the birth places had a three piece name, parts of which were or were not used. If you start combining these, especially if you start changing case or using punctuation for separation there end up being quite a few possibilities, most of which would not fall to a dictionary attack. Most should be much easier to remember than arbitrary or random sequences of similar length.

I deliberately created a sample approach to creating passwords that violates some of the most common advice so that it won't be repeated elsewhere. If someone picks up on this and starts making passwords similar to these, based on their own family and are careful about resulting dictionary words, they are likely to have secure passwords especially if they add some personal variation to the method. If they tell someone else how they make their passwords, the method immediately becomes much less secure. If the method becomes publicized as an example of how to make good passwords it becomes still less secure.

If the cracker community believed that this had become a common way for users to create passwords, then much of any value it may have once had would be greatly diminished because the crackers could start trying to program dictionaries of such passwords. If enough information about a specific individual were obtained a focused attack could be launched but building a general purpose dictionary would seem more productive.

An attempt to cover all possibilities suggested by the method under discussion would make the programmed dictionaries discussed so far look small. The number of unique two to four character sequences derived from all common first and middle name combinations is several thousand. The number of possible birth dates in all formats is huge; by focusing on the most common formats and possibly limiting them to month and day without year the number is greatly reduced. For cities and states, by focusing on the couple hundred largest cities, plus cities and states in the immediate geographical vicinity of the targeted computer, this method is brought to a manageable size.

Regardless of the programmed dictionary details, exact pieces included and total size, the results would almost certainly be better than a brute force approach, that is assuming the method became a reasonably common method for creating passwords. If not, the efforts would be a complete waste, producing no positive results with typical password hash database sizes. Since the method depends on personally related information of the most obvious type, it is unlikely to ever be widely repeated as a suggested method for creating passwords. Still, used with knowledge of how password cracking tools work, it is in reality probably as good as any other manual method for creating passwords unless the user happened to have a family biography on-line from which all the pieces might be extracted.

My conclusion is that any method that creates passwords that the individual can remember, where the resulting passwords are reasonably long, have sufficient character diversity and cannot be derived from the transformation of a single dictionary word or the combination of multiple dictionary words, is likely to result in a reasonably strong password that is unlikely to be cracked by today's or near future technology. The more widely known any such method is and the more specific its instructions on creating passwords, the poorer it becomes. For these reasons, I won't repeat any advice related to creating passwords from sentences or phrases. Any specific password or passwords used to illustrate the use of any method for creating good passwords are immediately and for ever after poor passwords.

Humans Are Not Good Password Generators

People think much as they did in the early 70's and create their passwords from familiar parts while computing power increases exponentially. It's only a matter of time until human created passwords don't stand a chance against automated cracking tools.

By the very nature of the way our minds work, humans are not good password generators. Even well trained and intelligent people who understand the security issues of computer passwords, can't help but fall into predictable patterns and repetition when they create passwords manually. Perhaps before the time that computers can reliably crack any human created passwords, all computer authentication will have become biometric or some other non password based approach. I expect that the disappearance of passwords will be like the arrival of simple and reliable voice recognition, one of those things that keeps taking longer than everyone thinks it should.

We want passwords that are easy to remember and naturally select words and character strings that come to mind quickly. Even if it were our goal, humans can't mentally create random character sequences. Where people have shown themselves to be inventive regarding passwords is in finding ways to make easily cracked passwords that effectively circumvent system imposed limits that attempt to enforce strong passwords. For example, "Attack1" has mixed case and a digit, thus includes three of the four character types, and yet will fall to all three cracking tools discussed here.
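Composition rules of this kind are simple to express in code, which also makes it plain why they are simple to satisfy with a weak password. The policy in the Python sketch below (minimum length 7, at least three character types) is a hypothetical example, not one taken from any particular system.

```python
import string

CLASSES = {
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "digit": string.digits,
    "symbol": string.punctuation,
}


def character_classes(password):
    """Return the set of character class names present in password."""
    return {name for name, chars in CLASSES.items()
            if any(ch in chars for ch in password)}


def passes_naive_policy(password, min_length=7, min_classes=3):
    """A typical composition rule: long enough and enough character types."""
    return (len(password) >= min_length
            and len(character_classes(password)) >= min_classes)


print(sorted(character_classes("Attack1")))  # ['digit', 'lower', 'upper']
print(passes_naive_policy("Attack1"))        # True
```

The check says nothing about whether the password is a capitalized dictionary word with a digit appended, which is exactly the kind of variation dictionary-based cracking tools try early.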

As an experiment, I timed how long it took to think of and write 20 "good" passwords, similar to two short words and one or two non letter characters, but trying to avoid using actual words. It was just under 10 minutes. There were nine words and would have been more if I hadn't stuck an extra letter on some when I realized I'd written a word. I understand that random does not mean an even distribution and especially with small sample sizes you shouldn't expect an even distribution of characters. I suspect however that 18 t's when there were 3 letters that were used only once and 5 only twice was not a product of random variation but the start of a pattern. I also suspected that 12 a's and u's versus 4 e's might be part of a pattern. At the rate I was working it would have taken a little under a year, without stopping for food or sleep to create a million passwords.

If we make them pronounceable and thus relatively easy to remember there is a strong tendency for the result to be real words. Any password that we have to memorize and think through character by character without a pattern or pronounceability to help us is just too hard to remember for practical purposes.

Computers, on the other hand, are ideally suited for generating lots of good passwords. I had an early version of the password.pl password generator create a number of one million password lists. Each took a few minutes on 500 MHz computers. With default settings there were fewer than 30 duplicates within a million password list and slightly fewer between lists. By making minor changes to the settings, thus making the passwords somewhat harder and drawing them from a larger universe of possibilities, one million password lists could be generated with no duplicates.
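The duplicate check is straightforward to repeat with any generator. The Python sketch below is not password.pl, whose method and settings are not shown here; it simply draws each character independently from the 94 printable ASCII characters using the standard `secrets` module, and the function names are invented for the example.

```python
import secrets
import string
from collections import Counter

# 52 letters + 10 digits + 32 symbols = 94 characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=8, alphabet=ALPHABET):
    """Draw each character independently from a cryptographic random source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def count_duplicates(passwords):
    """Number of entries that repeat an earlier entry in the list."""
    counts = Counter(passwords)
    return sum(n - 1 for n in counts.values())


batch = [random_password() for _ in range(10_000)]
print(len(batch), "generated,", count_duplicates(batch), "duplicates")
```

With 8 characters drawn from 94 there are roughly 6 * 10^15 possible passwords, so a duplicate in a list of this size would be very surprising. A generator that draws from a smaller universe, for example to make passwords easier to remember, will show duplicates much sooner.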


Copyright © 2000 - 2006 by George Shaffer. This material may be distributed only subject to the terms and conditions set forth on http://GeodSoft.com/terms.htm. These terms are subject to change. Distribution is subject to the then current terms, or at the choice of the distributor, those defined in a verifiably dated printout or electronic copy of http://GeodSoft.com/terms.htm at the time of the distribution. Distribution of substantively modified versions of GeodSoft content is prohibited without the explicit permission of George Shaffer. Distribution of the work or derivatives of the work, in whole or in part, for commercial purposes is prohibited unless prior permission is obtained from George Shaffer. Distribution in accordance with these terms, for private, unrestricted and uncompensated public access, non profit, or internal company use is allowed.

 