The purpose of this Web site is to make the largest database of free association ever collected in the United States available to interested researchers and scholars. More than 6,000 participants produced nearly three-quarters of a million responses to 5,019 stimulus words. Participants were asked to write the first word that came to mind that was meaningfully related or strongly associated with the presented word on the blank shown next to each item. For example, if given BOOK _________, they might write READ on the blank next to it. This procedure is called a discrete association task because each participant is asked to produce only a single associate to each word.
We started collecting these norms in 1973 as a result of a desire to compare the effectiveness of rhymes and synonyms as retrieval cues (Nelson & Brooks, 1974; Nelson, Wheeler, Borden, & Brooks, 1974). Subjects studied individually presented words under various conditions and we prompted their recall using either rhyming words or meaningfully related words as cues or prompts. Because our interest was in determining what type of cue was more effective under different conditions of learning, we were concerned about pre-existing strength between the words used as test cues and the words that had just been studied. Deciding issues about the effectiveness of rhyme as compared to meaning cues made sense, at least to us, only after adjusting for initial differences in cue-to-target strength acquired prior to the laboratory experience. Free association data for rhyme and meaning appeared to provide a useful means for indexing pre-existing strength in the absence of a study trial. The normative data provided a large sample control group. At the outset of this research, we chose words that we thought would produce the studied words as responses in order to use them as test cues. We then collected norms after the experiment was completed to determine the probability that the normed word produced the studied word in free association. Hence, by using dictionaries and our own associative knowledge we tried to second-guess what words a group of students would produce under conditions of minimal constraint, a procedure that is still used by many researchers. Interestingly, as a result of collecting norms after the experiment, we discovered that we were often correct in our guesses but just as often we were not. Cues that we thought would work effectively sometimes worked so effectively that no one ever failed to recall the associated target whereas other cues did not work at all.
This work convinced us that there was a real need for additional normative data if we were going to continue cued recall research. The famous Jenkins and Palermo (1964) norms were useful, but too limited for our purposes because only 200 words were normed.
The finished product from all this effort was a mostly legible hand-written sheet of paper with the normed word appearing at the top of the page and the responses and tallies in a random order on the rest of the page. These pages were kept in neat piles organized around the booklets because it seemed like a good idea to know when the data were collected and with what other words. Eventually, we alphabetized the words within each pile, although the responses still appeared in random order of appearance. Of course, this kind of organization got to be more and more like the Federal government. To this day we are unsure about the cognitive skills involved in being able to find words sorted into such nice piles. Finding words took a long time and we began suffering from redundancy because students using the norms often could not find a word that they were looking for and then normed it again. Over 300 words were normed a second time by accident. Of course, accidents can provide data too, and our mistakes eventually provided the impetus for a reliability study (Nelson & Schreiber, 1992).
After norming about 1,500 words into the most advanced version of this organizational scheme we decided that it might be time to buy a computer and a database and try our luck with the machine age. This was in about 1986. The task of learning about computers and databases required a considerable period of time for DN for reasons that need not be specified and words that cannot be printed but the job was eventually completed with some success. At that time, our analysis of the situation led us to believe that the best way to get the information into the computer was to type it. This meant that each of our tabulated sheets had to be organized in order of strength and otherwise cleaned up to avoid making our typist suicidal. Each page was re-organized, re-written, re-counted, and re-checked by either DN or CM, and then typed into the computer by the blessed Charlotte Hall. All parts of this task were found to be genuinely tedious by all. If CM had not tested so high on the "clerical" part of the Strong-Campbell vocational interest inventory, we would surely have quit the task before midway. Inns in Vermont were frequently discussed, but by the early 90s we figured a way out of much of the drudgery. Pay someone else to do it. As additional norms were collected, heroic secretaries entered each cue and its responses directly from each booklet page into a spreadsheet and then this information was reduced, organized and counted by either DN or CM. The majority of the words appearing in the norms were organized on this basis because it provided a more efficient use of our time and because the resulting information could be directly imported into the database. Furthermore, because of this increase in efficiency more than one-half of the norms were collected during the 1990s.
Throughout the life of this project, all responses were classified by either DN or CM or by both when questions of classification arose. What is important about this point is that the responses for the majority of the words were not automatically included regardless of spelling, plural status, and so on. Spelling errors were corrected, and rules were developed to pool items that, in our judgment, should be put together. For example, the stimulus WOMAN produced MAN as the dominant response, but a few subjects wrote MEN. Instead of treating each of these responses as separate items the count for MEN was pooled with the count for MAN. Insofar as plurals were concerned, the rule was to pool minority responses with the majority when the same word stem was involved. Similar rules were used for tense and grammatical form. In general, pooling was used reluctantly and only when it seemed clearly justified, but clearly the responses were not simply counted as in a frequency count (Kucera & Francis, 1967). We engaged in this practice because our interest was in assessing the relative strength of a given response for use in cuing and priming studies and we assumed, rightly or wrongly, that a more accurate indication would be provided by pooled responses rather than by separate tabulations. However, as a result of this practice, scholars who are interested in specific forms of response should be especially careful in using these norms.
Many words, particularly in the last several years, were added because they completed or extended our ability to norm entire associative sets. In about 1989, DN set up an associative matrix in which a normed word and all of its associates were listed as column names and again as row names, thereby creating an n x n associative matrix. This procedure allowed us to count the mean number of connections running from associate-to-associate (what we call connectivity) and from each associate to the normed word (what we call resonance). Initially, all this work was done by looking up the associative connections in a printed copy of the norms, e.g., DINNER produces associates of supper, lunch, meal, and so on, and this procedure requires looking up supper to determine the strength of its connections to dinner, lunch, meal, and so on.
By looking up each associate, its connection to each of the other words making up the set could be determined and entered into the matrix. DN did the first several hundred of these matrices to get a feel for the job, and then handed the task over to Nancy Gee who was in her first week in graduate school. Fortunately, Nancy did not leave the lab for more interesting pursuits and, fortunately again, Tom Schreiber learned enough computer language to write a program that would do all these calculations automatically in a few seconds. His diabolical little program identified the associates that had been normed, looked up all the values, and printed the matrix in a clear and usable form (see Appendix C). The program also determined statistics of interest and what associates needed to be normed, Tom's so-called MIAs, which stands for missing-in-action. Many of the words in the norms were selected to change a word from MIA status to a "normed" status so that we could fill in a critical associate in the matrix. As of this writing, we have completed or nearly completed associative matrices for 4,097 words. Incidentally, this whole effort was initiated by DN after having forgotten Deese's (1965) seminal work that presaged our effort by nearly 30 years. Not surprisingly, DN cannot recall exactly what associative matrix he was working with when he remembered that he had seen one before but clearly Deese deserves the credit for suggesting that such matrices might be useful for exploring memory. If we are due any credit at all, then it would be for persistence in the face of intense tedium. This accolade may serve as a new definition for courage in the computer age. In publishing the norms our hope is that others may find their courage in different but equally worthwhile pursuits.
Given the presumed importance of prior associative knowledge in processing everyday experience, the study and understanding of this structure is a self-justified scientific goal. Geneticists are justified in mapping genes, cosmologists in mapping the galaxy, and geologists in mapping the earth, and psychologists are justified in mapping the connections among words learned as a result of everyday experience. However, this is not to say that structure itself should be the only or even the primary goal of psychological science. Knowing that two words are not directly connected but are connected through mediating links is like knowing that nails can be used to connect one piece of lumber to another. Knowing about the relationship between nails and lumber is important but in and of itself such knowledge will not get a wall built correctly.
Building a wall requires coordinated mental and physical acts that use this knowledge. So it is with studies of memory in which people are asked to study words and then recall or recognize them when given words as test cues. People engaged in this activity are apt to rely on several different kinds of mental acts or processes and identifying and understanding what these processes are has been an important part of psychological science for many years. Nevertheless, we believe that mental acts such as comprehension, elaboration, retrieval, and so on, cannot effectively be understood in isolation from the materials to which the acts are directed. Just as knowledge of the relation between nails and lumber is insufficient for producing a wall, mental and physical acts of themselves cannot produce a wall without knowledge of how nails and lumber are to be combined. At the least, such a wall is not likely to pass inspection. The point that mental acts depend upon knowledge has been made more eloquently and more completely by Jenkins (1979), but an examination of the current literature on memory and memory theories suggests that this important point is often forgotten or ignored. For us, the main justification for using normative data is that researchers will benefit from some knowledge of what this structure is before they go about selecting materials for their research.
Having concluded that free association is likely to be better than rating procedures, it is important to note that free association suffers shortcomings as well. The first is that it provides a relative index of strength, not an absolute index. Knowing that the response "read" is produced by 43% of the participants to the cue BOOK does not tell us how strong this response is in any absolute sense; it tells us only that this response is stronger than "study" which was produced by 5.5% of the participants. Unfortunately, free association norms like relatedness ratings provide only ordinal measures of strength of association but, as far as we know, there are no known measures of absolute strength. Furthermore, free association provides an index of connection strength that comes without a measure of dispersion or variance. It indicates or points to the probability that one word produces another under the free association instruction and given a particular sample size. Measures of dispersion require repeated measurement with either the same individual or with different groups of individuals. This shortcoming may limit the use of this index in some situations but two important facts ameliorate concern. First, as noted earlier, free association norms are reliable. Knowing that 43% of the participants in one group produce "read" to BOOK tells us that another similarly constituted group of equal size is likely to produce the same response at this level (e.g., Cramer, 1968; Nelson & Schreiber, 1992). Second, free association norms have strong predictive relationships to cued recall (e.g., Nelson, Schreiber, & McEvoy, 1992), feelings of knowing (e.g., Schreiber, 1998), priming (Canas, 1990), and to other types of performance that rely on memory (e.g., Cramer, 1968). Despite the absence of a measure of dispersion, the strength index has proven useful in predicting and controlling performance in psychologically important tasks.
Finally, two caveats for using free association norms must be mentioned, one concerning strength and one concerning generalizability. The concern over strength arises because only a single response was required in the discrete association task used for these norms. As a result of this restriction, the norms probably underestimate the strengths of very weak responses that are directly connected to the word being normed. Although the norms provide a reliable index for the strongest associates, they presumably underestimate the strengths of very weak associates and this point should be kept in mind when using norms to build materials for research. The concern over generalizability arises as a result of comparisons of our norms to those collected in other places. Insofar as we know, the largest free association database ever amassed was collected in Great Britain by Kiss, Armstrong, and Milroy (1972). When we were one-third of the way through the present norms, we discovered that they had norms for more than eight thousand words and like a fox in a chicken coop CM gratefully sank her teeth into them only to discover substantial differences between their values and ours. Differences in language experience between Great Britain and Florida are the most likely culprit, but such differences may also exist to some extent within the US as a result of how specific words are used in different regions of the country. For example, associates to APPLE may be different in Florida than in other locations where apple trees and traditions of apple pie are more frequent. Although Florida students surely know about apples, some have never seen let alone climbed an apple tree and therefore their most frequent responses to apple are "red" and "orange" with "tree" and "pie" given relatively infrequently. They are familiar with orange trees so they are not completely deprived. They just have a different experience and that experience is reflected in their responses.
Although the present norms have been used successfully in many places in the US, the important point is that free association norms, or norms of any kind, must be used with sensitivity to word usage in particular locations.
These considerations indicate that the free association procedure used for this book breeds its own devils. At best, it seems to be an imperfect tool. Ultimately, we may discover that other procedures such as continuous association, co-occurrence norms, or even relatedness ratings provide a superior means for assessing connection strengths between related words. The measurement issue begs for additional research because the importance of mapping word knowledge justifies such attention. Of course, one course of action would be to abandon the goal of mapping such knowledge on the grounds that the task is too difficult and too boring for great minds. Abandoning or ignoring the problem has historically been the mode of choice in memory research and in other fields as well, but relinquishing this effort is likely to come at the cost of creating a field that cannot effectively deal with one of the most fundamental questions about memory. How does word knowledge interact with ongoing memory performance (e.g., Kintsch, 1988; Nelson, McKinney, Gee & Janczura, 1998)? Another course of action, which is one that we advocate, is to compare procedures for measuring the strength of pre-existing connections and then decide which of the procedures has the fewest problems or which procedure seems to work best for implementing a particular aim. We cannot know which procedure is better until they are critically evaluated, and partly to this end, the present norms should prove useful.
The fields appearing in Appendix A are separated by commas in text format so that the document can be opened in a variety of different programs and databases, e.g., it can be opened in a column format in StatView, Excel, and other database programs. The files are labeled Cue-Target Pairs followed by a letter designation indicating that cues beginning with the designated letters can be found in this file, e.g., "Cue Target Pairs.A-B" means that normed words beginning with the letters A or B and their responses can be found in this file. In this format, data for 5,019 normed words and their 72,176 responses can be found. For each file, 31 data fields are presented so that the total matrix size when pooled across beginning letters is 31 columns by 72,176 rows. There are potential data entries for 2,237,456 cells in this matrix. A file containing the entire matrix was not provided because we thought that it would be too large to open on some computer systems. Instead, we provide smaller files based on 8 letter groupings, i.e., A-B, C, D-F, G-K, L-O, P-R, S, T-Z. Grouped in this way, the files are approximately the same size and this procedure was followed for the other appendices as well.
Data. The first column or field in each file presents the normed words or Cues listed in alphabetical order, and the second field presents their responses or Targets. In this format, the cues and their responses (targets) are presented as pairs. We refer to these items as cue-target pairs because of how such items are selected for use in research in our area of memory. Targets are selected as words to be studied in memory experiments, and cues are used to prompt their recall. Given the wide variation in word properties, the norms are used for constructing lists of pairs that systematically vary in some properties while holding other properties constant.
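As an illustration of the comma-separated layout, the following is a minimal sketch in Python that reads cue-target rows with the standard csv module. The helper name read_pairs is hypothetical, and the sketch assumes that the files carry no header row and that fields may be padded with spaces after the commas; neither point is confirmed here. The single sample row uses only the ABILITY CAPABILITY values reported in this document and is truncated to its first seven fields.

```python
import csv
import io

def read_pairs(handle):
    """Yield (cue, target, remaining_fields) for each comma-separated row."""
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cue, target = row[0].strip(), row[1].strip()
        yield cue, target, row[2:]

# One truncated sample row (cue, target, NORMED?, #G, #P, FSG, BSG).
# A real row would carry all 31 fields.
sample = io.StringIO("ABILITY, CAPABILITY, Y, 143, 17, .119, .282\n")
pairs = list(read_pairs(sample))
print(pairs[0][0], pairs[0][1])  # prints "ABILITY CAPABILITY"
```

To read an actual file, the StringIO object would be replaced by an opened file handle, for example one opened with newline="" as the csv module recommends.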
As a result of incorporating the norms into a database program, our list construction processes have entered the computer age and it is now feasible to control certain word attributes while varying others with greater degrees of rigor than ever before. For example, by imposing search restrictions on the targets in the pool, such as reporting only words that occur 50 or more times per million, that have a concreteness rating of 4.8 or greater, and that have no more than 16 and no fewer than 8 associates, all words whose associates are connected to an average of 3 other associates in the set can be reported. Instead of selecting words on only a single attribute such as frequency, they can be selected on the basis of a multitude of attributes while simultaneously holding other attributes constant. This capability also holds for pairs of related words. Instead of selecting attribute levels for manipulation blindly, the distribution of values can be plotted and then cutoffs marking extreme values can be set with full knowledge of the form of the distribution, its mean and its variance. Moreover, instead of selecting items to be representative of some particular dimension of interest, items can be selected randomly with normative values used after data collection to develop prediction equations for various tasks. In short, there may be no end to the uses to which a database of this sort might be applied. Our experience has been that list construction processes take more rather than less time since we created the database, but the final product is far superior because the "noise" resulting from uncontrolled factors can be substantially reduced. With less noise in the lists, more subtle main effects can be detected with greater ease and shy but theoretically interesting interaction effects become more bold. 
In general, Appendix A can be used for selecting pairs of related words that have been produced by two or more subjects in free association, but by incorporating the information in Appendix A into a database program the materials can be manipulated and selected in much more sophisticated ways.
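The kind of multi-attribute search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the WordRecord type, its attribute names, and the select_words helper are hypothetical, the sample values are invented placeholders rather than entries from the norms, and "an average of 3 other associates" is interpreted here as a minimum of 3, which is one possible reading.

```python
from dataclasses import dataclass

@dataclass
class WordRecord:
    word: str
    frequency: int       # occurrences per million (printed frequency)
    concreteness: float  # rating on the 1-7 scale
    set_size: int        # associates given by two or more participants
    connectivity: float  # mean connections per associate within the set

def select_words(records, min_freq=50, min_con=4.8,
                 min_set=8, max_set=16, min_connectivity=3.0):
    """Return the words that satisfy every restriction at once."""
    return [
        r.word for r in records
        if r.frequency >= min_freq
        and r.concreteness >= min_con
        and min_set <= r.set_size <= max_set
        and r.connectivity >= min_connectivity
    ]

# Invented placeholder records; none of these values come from the norms.
pool = [
    WordRecord("ALPHA", 120, 5.6, 12, 3.2),  # passes every restriction
    WordRecord("BETA", 30, 6.1, 10, 3.5),    # fails the frequency cutoff
    WordRecord("GAMMA", 75, 4.1, 9, 3.0),    # fails the concreteness cutoff
    WordRecord("DELTA", 60, 5.0, 20, 3.4),   # fails the set size range
]
selected = select_words(pool)
print(selected)  # prints ['ALPHA']
```

Keeping the cutoffs as keyword arguments makes it easy to vary one attribute while holding the others constant, which is the use the text has in mind.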
The remaining fields present information about the pairs or the individual words comprising them. The 3rd field, called NORMED?, indicates whether the target word in the pair has been normed by a separate group of participants. A Y stands for "yes" and indicates that the target has been normed and an N stands for "no" indicating that it has not been normed. Of the 72,176 responses or targets appearing in the database 8,557 have not been normed and therefore cells that depend on normative information for these items have been left blank. This means that data are provided for only 63,619 of the 72,176 responses. These responses comprise the 5,019 normed words produced redundantly by different cues, e.g., 18 different words produce ABILITY as a response. The Normed? field is particularly important for researchers wishing to select pairs with known forward (cue-to-target) and backward (from target-to-cue) strengths. Those tempted to infer the strength of the backward connection from the strength of the forward connection should beware. The correlation between forward and backward strength for cues whose targets have been normed is positive but not high, r = .29 (n = 63,619), and the chances of correctly guessing back strengths from knowledge of forward strengths are low.
The 4th field is called #G which stands for the number of participants serving in the group norming the word, and the 5th field is called #P for the number of participants producing a particular response. The 6th field is called FSG which stands for forward strength or what has sometimes been called cue-to-target strength. This value is calculated in the traditional way by dividing #P by #G which gives the proportion of subjects in the group who produce a particular target in the presence of the cue word. For example, for the word ABILITY, 17 out of the 143 participants in the group produced CAPABILITY as a response, so FSG for this pair is calculated to be .119. From this value we assume that it is reasonable to infer that the probability of producing CAPABILITY in the presence of ABILITY in the absence of studying either of these words in an experimental context is approximately .119. Each of the files in Appendix A was sorted first on the beginning letter of the normed cue word, then by FSG from highest to lowest, and then, within FSG, alphabetically by the target.
The 7th field is called BSG which stands for backward strength or target-to-cue strength. The word "backward" here is apt to be confusing to some because BSG is measured in the same way as forward strength, except the word appearing as the "target" now serves as the "cue" to be normed instead of the reverse. The term backward simply follows the conventional but admittedly misleading use of the term in memory research. If it is important for some purpose to know #G and #P for the index of BSG, look up the word serving as the target in a given pair as a cue. For example, for CAPABILITY in the above pairing, 35 out of a group of 124 participants produced ABILITY as a response, so BSG in the ABILITY CAPABILITY pairing is calculated at 35/124 = .282.
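The two direct strengths can be reproduced with the counts given above. The short Python sketch below uses a hypothetical helper named strength; it simply divides #P by #G.

```python
def strength(n_producing, n_group):
    """Proportion of the norming group that produced the response (#P / #G)."""
    if n_group <= 0:
        raise ValueError("group size must be positive")
    return n_producing / n_group

fsg = strength(17, 143)  # ABILITY as cue, CAPABILITY as response
bsg = strength(35, 124)  # CAPABILITY as cue, ABILITY as response
print(f"FSG = {fsg:.3f}, BSG = {bsg:.3f}")  # prints "FSG = 0.119, BSG = 0.282"
```

Note that the two values come from different norming groups (143 and 124 participants), which is why BSG cannot be derived from the FSG row alone.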
The next 6 fields index indirect connections between the word pairs. FSG and BSG represent measures of direct strength because one word directly produces the other as an associate in free association. Indirect connections index links between related words that occur through other words. Such connections are often ignored in research applications of normative data but they can be very strong and can have large effects on memory performance in certain tasks (Nelson, Bennett, & Leibert, 1997; Nelson et al., 1998). The 8th field is named MSG for mediated strength which is also sometimes called 2-step strength in the memory literature. For example, ABILITY produces competence as an associate with a probability of .06 which in turn produces capability as an associate with a probability of .08. The mediated strength of the ABILITY CAPABILITY pairing is calculated by cross multiplying the individual links and then summing the results across each link. Given that no other mediated links were detected for this pair MSG was calculated as .06 * .08 = .0048. This particular pair has one 2-step mediated link, but some word pairs have no such connections whereas others have as many as 17. The highest calculated MSG in this database is .66 and it should be noted that indirect strength as indexed by this procedure sometimes exceeds direct strength.
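The mediated strength calculation can be sketched in Python as follows. The nested dictionary layout and the function name mediated_strength are assumptions made for illustration, not the format of the database; the strengths are the ones quoted above for ABILITY, competence, and capability, and the dictionary is deliberately incomplete.

```python
def mediated_strength(strengths, cue, target):
    """Sum, over every mediator, of cue->mediator times mediator->target."""
    total = 0.0
    for mediator, cue_to_mediator in strengths.get(cue, {}).items():
        if mediator in (cue, target):
            continue  # the direct link is not a 2-step path
        mediator_to_target = strengths.get(mediator, {}).get(target, 0.0)
        total += cue_to_mediator * mediator_to_target
    return total

# strengths[a][b] is the forward strength from a to b (assumed layout).
strengths = {
    "ABILITY": {"COMPETENCE": 0.06, "CAPABILITY": 0.119},
    "COMPETENCE": {"CAPABILITY": 0.08},
}
msg = mediated_strength(strengths, "ABILITY", "CAPABILITY")
print(round(msg, 4))  # prints 0.0048
```

Mediators that have not been normed contribute nothing in this sketch, because their outgoing strengths are simply absent from the dictionary.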
The 9th field is named OSG for overlapping strength. Two words comprising a particular pair may also have associates in common, what have sometimes been called overlapping, convergent or shared associates. The cue word and the target word may produce some of the same words as associates. For example, both ABILITY and CAPABILITY produce the same 6 words as associates, including able, strength, talent, potential, capacity, and knowledge. The overlap strength for this pair is calculated as shown in Table 2. From this example, it should be clear that OSG is calculated like MSG in that the strengths of the individual connections are cross multiplied and then summed.
|Table 2. Example for calculating OSG.|
|Cue to Overlapping |
|Target to Overlapping |
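Overlapping strength follows the same cross-multiply-and-sum pattern, applied to the associates that the cue and the target share. The Python sketch below is illustrative only: the dictionary layout and the name overlap_strength are assumptions, and every strength value (as well as the non-shared associates SKILL and POWER) is an invented placeholder rather than a value from Table 2.

```python
def overlap_strength(strengths, cue, target):
    """Sum, over shared associates, of cue->shared times target->shared."""
    cue_assoc = strengths.get(cue, {})
    target_assoc = strengths.get(target, {})
    shared = set(cue_assoc) & set(target_assoc)
    return sum(cue_assoc[w] * target_assoc[w] for w in shared)

# Invented placeholder strengths; these are NOT the values in Table 2.
strengths = {
    "ABILITY": {"ABLE": 0.10, "TALENT": 0.05, "SKILL": 0.20},
    "CAPABILITY": {"ABLE": 0.30, "TALENT": 0.02, "POWER": 0.10},
}
osg = overlap_strength(strengths, "ABILITY", "CAPABILITY")
print(round(osg, 3))  # prints 0.031
```

Only ABLE and TALENT are shared in this toy data, so the result is 0.10 * 0.30 + 0.05 * 0.02.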
The next 9 fields provide information about the cue, information that is independent of its targets. Each field name contains the letter Q as an indication that the information presented is related to the cue or normed word. The 14th field provides a relative index of how many near neighbors the cue has, or what we generally call its cue set size, QSS. This index is calculated by counting the number of different responses or targets given by two or more participants in the normative sample. Some words have set sizes of 1.00 (e.g., LEFT) whereas others have set sizes of 30 or more different words (e.g., FARMER), and in general, set size closely approximates a normal distribution. The criterion of "two or more" participants was chosen many years ago on the assumption that idiosyncratic responses given by a single participant would tend to be "off the wall." The opinion was that such responses should not be counted as in the set because they would "vary with different walls" and would therefore be unreliable. However, after years of data collection it has become clearer that such responses make sense most of the time to an objective observer so most are not "off the wall" as the senior author once thought. They are, however, unreliable because re-normings of hundreds of the same words showed that a completely different set of idiosyncratic responses was produced each time the word was normed (Nelson & Schreiber, 1992). Words given by two or more subjects tend to be highly reliable, as is the number of different words produced by the cue, regardless of whether they are given by two or more participants or by a single participant. What is different between normings are the specific idiosyncratic responses produced by a single participant.
We now interpret these findings to mean that most words are linked to very large numbers of other words, links that presumably are created as a result of experience with words in spoken conversation, reading and thinking. Discrete free association norms, we believe, provide a reliable index of the number of strongest associates, or nearest neighbors in the sense of semantic distance. Even a response that is provided by only 2 out of 150 participants is regarded as a relatively strong associate. However, because idiosyncratic responses seem to be unreliable members of the set, we concluded that words are connected strongly to some of their associates and are very weakly connected to many other associates, associates that are produced out of context rarely and with some inconsistency. The lesson we take from these considerations is that discrete free association provides a very good indicator of the number of strong associates and a very poor indicator of the number of weak associates. Hence, we conclude that QSS provides a relative index of the set size of a particular word by providing a reliable measure of how many strong associates it has. Because it fails as an indicator of the number of weak associates, this index should not be construed as providing an index of absolute set size.
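The set size count itself is simple, as the following Python sketch suggests. The function name cue_set_size and the response tallies are hypothetical; the tallies are invented to show how single-participant responses are left out of the set.

```python
def cue_set_size(response_counts, min_participants=2):
    """Count the different responses given by at least min_participants people."""
    return sum(1 for n in response_counts.values() if n >= min_participants)

# Invented tallies for a hypothetical cue (response -> number of participants).
counts = {"RESP_A": 60, "RESP_B": 8, "RESP_C": 2, "RESP_D": 1, "RESP_E": 1}
qss = cue_set_size(counts)
print(qss)  # prints 3
```

Setting min_participants to 1 counts every different response, including the idiosyncratic ones that the text describes as unreliable members of the set.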
The 15th field presents the printed frequency of the cue, QFR, and these values were borrowed from the Kucera and Francis (1967) norms for the convenience of readers. The 16th field shows a concreteness rating on a scale of 1-7 for many of the words in the norms, QCON. Many but not all of these values were borrowed. First, we looked up a given word in the Paivio, Yuille and Madigan (1968) norms, and if the word was located, then its concreteness was entered into our database. If the word was not located in these norms, we then looked the word up in the Toglia and Battig norms (1978) and used this value. Finally, if the word was not in either source, we sometimes normed it ourselves using procedures described by Paivio et al. (1968). In this way, concreteness values are provided for 3,260 words for the convenience of readers (non-normed words have been left blank).
The 17th field provides information on whether the cue is a homograph, QH. The information was also borrowed from other databases that separate the associates into two or more classes on the basis of different meanings. A blank space indicates that the cue word under consideration is probably not classified as a homograph, and a single letter indicates that it is a homograph or that it is likely to be one. The letters refer to the first letter of the first author associated with the homograph norms so that interested readers can pursue the source if desired. This information is provided in Table 3, and it should be noted that, as with concreteness ratings, sources were used in a particular ordering. This ordering can be described by arranging the letters of the authors from first to last used: N, P, W, T, G and C. Other than selecting what was handy at the time, no particular rationale was used in determining this ordering but it does mean that some words will appear in more than one set of norms and this fact is not recognized here.
|Sources of homograph norms.|
|C||6||Cramer, P. (1970).|
|G||247||Not normed to our knowledge. Identified as likely homographs by Nancy Gee.|
|N||297||Nelson et al. (1980).|
|P||33||Perfetti et al. (1971).|
|T||167||Twilley et al. (1994).|
|W||48||Wollen et al. (1980).|
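The ordered lookup described above (consult the sources in the order N, P, W, T, G, C and record the first one that lists the word) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the placeholder words are hypothetical and are not taken from the norms.

```python
# Source letters in the order the text says they were consulted.
SOURCE_ORDER = ["N", "P", "W", "T", "G", "C"]

def homograph_code(word, norms_by_source):
    """Return the letter of the first source that lists `word`, or "" if none does."""
    for letter in SOURCE_ORDER:
        if word in norms_by_source.get(letter, set()):
            return letter
    return ""

# Hypothetical membership sets, for illustration only (not real entries).
example_sources = {
    "N": {"WORD_A"},
    "T": {"WORD_A", "WORD_B"},
    "G": {"WORD_C"},
}

for w in ["WORD_A", "WORD_B", "WORD_C", "WORD_D"]:
    print(w, repr(homograph_code(w, example_sources)))
```

Because the first matching source wins, a word listed in both N and T is coded N, which is why overlap between sets of norms is not visible in the QH field.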
The 18th field presents the part-of-speech classification of the cue word, QPS, which was determined by the first part-of-speech listing in The American Heritage Dictionary of the English Language (1980). Only a single entry is provided for each word, even when, for example, a word can be classified as either a noun or a verb. Part of speech is indicated by the first letter or by two letters for each classification, and Table 4 provides the definitions.
|Definitions of parts of speech.|
|Abbreviation||Part of Speech|
The 19th field provides an index of the mean connectivity among the associates of the normed word, QMC. This measure is obtained by norming the associates of the cue word with separate groups of participants, counting the number of connections among the associates in the set, and then dividing by the size of the set (minus MIAS if there are any). This index captures the density and in some sense the level of organization among the strongest associates of the cue. The 20th field provides an index of a related measure, QPR, which measures the probability that each associate in the set produces the normed cue as an associate. The P stands for probability and the R stands for resonance to recognize the fact that, if a resonant connection exists between the normed word and one of its associates, then there must be a connection in both directions. In activation models, activation can presumably resonate between the initiating stimulus and the back-connected associate. This index is calculated by simply counting the number of associates in the set that produce the cue word as an associate and then dividing by set size (minus MIAS if any). The 21st field provides a companion value called QRSG representing the resonance strength of the cue. This index is calculated by multiplying cue-to-associate strength by associate-to-cue strength for each associate in the set and then summing the products. Table 5 illustrates this calculation for the cue ABILITY. The table includes only resonating associates because associates that do not produce the cue word contribute nothing to the sum, i.e., they zero out.
|Example calculation of QRSG|
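Because resonance strength is a sum of products, the calculation is easy to sketch. The example below uses the DINNER strengths shown in Table 9 rather than the ABILITY values. The function names and the `n_missing` parameter are assumptions made for illustration, and the resulting sum is not claimed to match the QRSG entry stored in the database.

```python
# Forward strengths from the cue DINNER to its associates (first row of Table 9).
forward = {"SUPPER": 0.54, "EAT": 0.11, "LUNCH": 0.10, "FOOD": 0.09, "MEAL": 0.09}
# Backward strengths from each associate to DINNER (first column of Table 9).
# Associates that do not produce the cue are simply absent.
backward = {"SUPPER": 0.55, "LUNCH": 0.27, "MEAL": 0.21}

def resonance_probability(forward, backward, n_missing=0):
    """Share of associates that produce the cue; missing associates leave the divisor."""
    resonant = sum(1 for a in forward if backward.get(a, 0.0) > 0)
    return resonant / (len(forward) - n_missing)

def resonance_strength(forward, backward):
    """Sum of forward x backward strength; non-resonating associates add zero."""
    return sum(f * backward.get(a, 0.0) for a, f in forward.items())

qpr = resonance_probability(forward, backward)
qrsg = resonance_strength(forward, backward)
print(f"QPR = {qpr:.2f}, QRSG = {qrsg:.4f}")
```

Three of the five associates produce DINNER, so the probability comes out at 0.60, in line with the ResP value printed at the top of Table 9.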
The 22nd field provides what we call a Use Code value for the cue, QUC. QUC values are 1's or 0's depending on whether there is an important associate that has not yet been normed. For many of the cues given UC's of 1, all of their associates have been normed, but some cues having non-normed associates with strengths equal to or less than .04 have also been assigned UC's of 1. These were items that, in the senior author's opinion, could be used in experimentation because the missing associates were unlikely to alter the estimates of connectivity and resonance in a significant way. A QUC value of 0 indicates that many of the associates, or an important associate, were not normed. Such items should not be selected for experimentation when the purpose of the study is to investigate the influence of variables linked to the associative organization of the network, such as connectivity and resonance. In general, we recommend using items with UC's assigned a value of 1.
The next 9 fields, fields 23-31, provide information about the target itself, information that is independent of its cue. This information is parallel to that described for cues, so parallel names were created by substituting the letter T for target in front of each designated field. For example, TSS stands for Target Set Size, and this index of how many strong associates there are for a given target is calculated in the same way as it was for the cue. Hence, these designations include TSS, TFR, TCON, and so on.
Quick Reference. Table 6 provides a quick reference guide for the abbreviations appearing at the head of each data field:
|Abbreviations of terms and their equivalencies in Appendix A.|
|TARGET||Response to Normed Word|
|NORMED?||Is Response Normed?|
|#P||Number of Participants Producing Response|
|FSG||Forward Cue-to-Target Strength|
|BSG||Backward Target-to-Cue Strength|
|OSG||Overlapping Associate Strength|
|#M||Number of Mediators|
|MMIA||Number of Non-Normed Potential Mediators|
|#O||Number of Overlapping Associates|
|OMIA||Number of Non-Normed Overlapping Associates|
|QSS||Cue: Set Size|
|QH||Cue is a Homograph?|
|QPS||Cue: Part of Speech|
|QMC||Cue: Mean Connectivity Among Its Associates|
|QPR||Cue: Probability of a Resonant Connection|
|QRSG||Cue: Resonant Strength|
|QUC||Cue: Use Code|
|TSS||Target: Set Size|
|TH||Target is a Homograph?|
|TPS||Target: Part of Speech|
|TMC||Target: Mean Connectivity Among Its Associates|
|TPR||Target: Probability of a Resonant Connection|
|TRSG||Target: Resonant Strength|
|TUC||Target: Use Code|
Data. The data in Appendix B represent a special arrangement of the data available in Table 1. However, instead of presenting each normed word and each of its responses, each response is provided in alphabetical order and all the words from the norms that produce it as an associate are listed below it. This format will be particularly useful for anyone who has already selected target words, and is now looking for suitable cues in order to prime or cue these targets. For example, if ABILITY is selected as a target, Table 7 shows that the word CAPABILITY produces this word with a forward strength of 0.28, that COMPETENCE produces it with a probability of 0.17, and so on. The cues are listed in terms of the strength of the forward connection between the cue and the target.
Appendix B also provides additional information concerning the cue-target relationship as well as information about the cue and the target as individual words. A more complete description of the field names can be found in Appendix A. Finally, note that at the end of the listing of related cues, the number of such cues is reported. This number is interesting because it provides an index of how many words in the norms produce the target word as an associate. This number provides a rough index of the production frequency of a word which may be related to its general accessibility in memory (e.g., see Nelson & Xu, 1995). Appendix E compares this measure of accessibility with Kucera and Francis (1967) printed frequency for those who might be interested.
|Targets and the cues that produce them.|
|NO. OF CUES: 18|
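The rearrangement behind Appendix B amounts to inverting cue-to-target pairs into a target-to-cues index, with targets in alphabetical order and cues ordered by forward strength. A minimal sketch follows; the function name and tuple layout are assumptions. Only the two ABILITY pairs quoted in the text are included, so the count printed here is 2 rather than the 18 cues listed in the full norms.

```python
from collections import defaultdict

# (cue, target, forward strength) triples. Both rows are quoted in the text;
# a full run would read every cue-target pair from the norms.
pairs = [
    ("COMPETENCE", "ABILITY", 0.17),
    ("CAPABILITY", "ABILITY", 0.28),
]

def cues_by_target(pairs):
    """Group cues under each target: targets alphabetical, cues strongest first."""
    index = defaultdict(list)
    for cue, target, fsg in pairs:
        index[target].append((cue, fsg))
    return {t: sorted(index[t], key=lambda c: -c[1]) for t in sorted(index)}

for target, cues in cues_by_target(pairs).items():
    print(target)
    for cue, fsg in cues:
        print(f"  {cue:<12}{fsg:.2f}")
    print(f"  NO. OF CUES: {len(cues)}")
```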
Quick Reference. Table 8 provides a quick reference for the abbreviations heading the data fields for Appendix B:
|Abbreviations of terms and their equivalencies in Appendix B.|
|FSG||Forward Cue-to-Target Strength|
|BSG||Backward Target-to-Cue Strength|
|OSG||Overlapping Associate Strength|
|QSS||Cue: Set Size|
|TSS||Target: Set Size|
|QMC||Cue: Mean Connectivity Among Its Associates|
|TMC||Target: Mean Connectivity Among Its Associates|
|QUC||Cue: Use Code|
|TUC||Target: Use Code|
Data. Appendix C provides an alphabetical listing of the n x n associative matrices for the normed words along with a file for missing associates. The ALL MIAS file lists the normed words with associates that have not yet been normed, their set size, each missing associate, the rank of the missing associate in the set and, finally, its strength. In general, missing associates represent weak associates in the set of the normed word, and have a mean rank in the set of 12.62 (SD = 4.78) and a mean strength of connection to the normed word of .02 (SD = .01).
The files labeled Matrices.A-B, and so on, offer a two-dimensional view of the information in Appendix A. The matrices provide a concrete representation of associative structure for a given word and they can be useful when interest is focused on controlling or manipulating the number and pattern of connections between a word and its associates. For example, as shown in Table 9, the word DINNER has a set of five associates, including supper, eat, lunch, food, and meal. In this matrix and in all others, only the first three letters of each associate are shown on the columns to conserve space (each associate is printed completely on the rows).
|DINNER and the connections among its associates.|
DINNER  Mss:5  MssA:12.60  Conc:5.38  ConcA:5.46  Freq:91  ConnA:17  ConnM:3.40  ResP:0.60  UI:1.00
          |DIN|SUP|EAT|LUN|FOO|MEA|#Co|PrT|µSt|Mss|PrA|
__________|___|___|___|___|___|___|___|___|___|___|___|
DINNER    | - |.54|.11|.10|.09|.09| - | - | - | - | - |
SUPPER    |.55|   |.02|.03|.17|.01| 4 |.80|.06|11 |.36|
EAT       |   |   |   |   |.41|.02| 2 |.40|.21|11 |.18|
LUNCH     |.27|.02|.08|   |.20|.06| 4 |.80|.09|17 |.24|
FOOD      |   |   |.41|.01|   |.02| 3 |.60|.14|18 |.17|
MEAL      |.21|.06|.06|.06|.49|   | 4 |.80|.17| 6 |.67|
__________|___|___|___|___|___|___|___|___|___|___|___|
#Connect  |  3|  2|  4|  3|  4|  4|
ProbConnec|.60|.50|1.0|.75|1.0|1.0|
µ Strength|.34|.04|.14|.03|.32|.03|
To construct each matrix, each associate of the normed word was normed with separate groups of subjects, e.g., SUPPER was presented to one group, EAT to another, and so on. The matrices contain forward strengths and should be read along their rows from left to right. For example, when SUPPER was normed, it produced LUNCH as a target with a forward strength of .03. To determine backward strength, look up the pair in reverse order, e.g., for LUNCH-to-SUPPER look in the LUNCH row, which shows this value to be .02. The total number of matrices (4,095) presented in this appendix is smaller than the total number of normed words (5,019) because any normed word having a non-normed associate stronger than .04 was eliminated from the pool. Of the words comprising the pool, an average of 92% (SD = 8%) of their associates have been normed. The absence of a value in the matrix is interpreted as an indication that there is either no connection or that it is too weak to be measured by free association and therefore represents a negligible value that presumably can be ignored.
As can be seen by reading along the first column of the matrix, some of the words produced by DINNER also produce this word as a response (e.g., supper, lunch and meal each produce DINNER). The DINNER-to-supper-to-DINNER connection is an example of a 2-step link (.54 x .55), and for convenience of reference we refer to such links as resonant connections because they return to the target. Also note that there are associate-to-associate connections throughout the matrix, e.g., supper is connected to each of the other four associates in the set, eat is connected to food and meal, and so on. In our terms, such connections define the connectivity of the normed word.
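The connectivity and resonance summaries can be recomputed directly from the matrix. The sketch below encodes the forward strengths from the DINNER matrix in Table 9 as nested dictionaries and counts connections; the variable and function names are illustrative assumptions, and no adjustment for missing associates is made because this matrix has none (UI = 1.00).

```python
ASSOCIATES = ["SUPPER", "EAT", "LUNCH", "FOOD", "MEAL"]

# Forward strengths read along the rows of Table 9 (row word -> column word).
# Blank cells are omitted.
MATRIX = {
    "DINNER": {"SUPPER": .54, "EAT": .11, "LUNCH": .10, "FOOD": .09, "MEAL": .09},
    "SUPPER": {"DINNER": .55, "EAT": .02, "LUNCH": .03, "FOOD": .17, "MEAL": .01},
    "EAT":    {"FOOD": .41, "MEAL": .02},
    "LUNCH":  {"DINNER": .27, "SUPPER": .02, "EAT": .08, "FOOD": .20, "MEAL": .06},
    "FOOD":   {"EAT": .41, "LUNCH": .01, "MEAL": .02},
    "MEAL":   {"DINNER": .21, "SUPPER": .06, "EAT": .06, "LUNCH": .06, "FOOD": .49},
}

def connections(word):
    """Number of other associates that `word` produces (the #Co column)."""
    return sum(1 for other in ASSOCIATES if other != word and other in MATRIX[word])

conn_a = sum(connections(a) for a in ASSOCIATES)   # total associate-to-associate links
conn_m = conn_a / len(ASSOCIATES)                  # mean links per associate
res_p = sum("DINNER" in MATRIX[a] for a in ASSOCIATES) / len(ASSOCIATES)

print(conn_a, f"{conn_m:.2f}", f"{res_p:.2f}")  # prints: 17 3.40 0.60
```

These three values correspond to ConnA, ConnM and ResP in the header line of the DINNER matrix.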
Indices of both resonance and connectivity are reported in the printed version of the norms and in other appendices because they appear to affect cued recall and recognition (e.g., Nelson, Bennett, Gee, Schreiber, & McKinney, 1993; Nelson et al., 1998). They may be important in other tasks as well, and such values are reported in Appendices A and B with a USE CODE (UC) index of 1 or 0. In Appendix C all of the reported matrices have a UC index of 1. A UC of 1 indicates that all of the critical associates of a word have been normed. Given an interest in either resonance or connectivity as variables, only those items with a UC designation of 1 should be selected in building lists for experiments. An even more stringent criterion can be applied by selecting only those items with a Usability Index (UI) of 1.00. At the top of each matrix, the UI index indicates the proportion of associates normed.
Quick Reference. Each matrix provides some redundant as well as some new information about the normed word, listed on the same line as the normed word. It also provides a list of the missing associates under the matrix, if any, as well as summary calculations on the rows and columns that some may find useful. It should be noted that summary calculations appearing in the ProbConnec row have been adjusted for missing associates and self-connections (each was subtracted from the divisor). The information provided about the target is defined in Table 10 (see other Appendices for comparable statistics):
|Abbreviations of terms and their equivalencies in Appendix C.|
|Mss (also see QSS or TSS)||Meaning Set Size of Normed Word|
|MssA||Average Meaning Set Size of the Associates of the Normed Word|
|Conc (also see CON)||Concreteness Rating of the Normed Word|
|ConcA||Average Concreteness Rating of the Associates of the Normed Word|
|Freq (also see QFR & TFR)||Kucera & Francis (1967) Printed Frequency of the Normed Word|
|ConnA||Number of Connections Among the Associates of the Normed Word|
|ConnM (also see QMC & TMC)||Mean Number of Connections for Each Associate of the Normed Word|
|ResP (also see QPR & TPR)||Probability that the Associates Produce the Normed Word as an Associate|
Data. Appendix D provides the idiosyncratic responses for each normed word, that is, the responses given by only one subject. The file contains three columns of data. The first presents the cues, the second presents their idiosyncratic responses, and the third presents the probability of response production by a single participant. The number of idiosyncratic responses was calculated by subtracting the number of different responses produced by two or more subjects from the total number of different responses produced in the group (respectively, MSS and TSS in Table 1). Given this measure, participants produced 111,157 idiosyncratic responses, which comes to an average of 22.15 such responses per normed word. On average, more idiosyncratic responses are produced than responses given by two or more participants. This production was highly variable across different words, ranging from 1 to 73 responses with a standard deviation of 10 words. However, we hasten to note that only 111,026 idiosyncratic responses are reported in this appendix because some were missing as a result of errors of various types. Rather than spending weeks tracking down the errors, we are simply reporting what we have.
As noted earlier in this report, at the outset of this work we thought that idiosyncratic responses would tend to be "off the wall" so they were not included in the database. Specific idiosyncratic responses did turn out to be unreliable (Nelson & Schreiber, 1992), but interestingly our reliability studies indicated that the total number of idiosyncratic responses produced in response to a given word was highly reliable. In other words, when the same word was normed a second time, about the same number of idiosyncratic responses was produced each time, although they tended to be different words. We have now seen enough of these responses to believe that most are very weakly related responses. As noted earlier, the free association procedure seems to provide a reliable index of the strongest associates of a word but not of its weakest associates. In any case, idiosyncratic responses are provided for nearly all of the normed words in Appendix D in case someone wants to study or use them in research.
Data. Appendix E in the electronic file reports what we call the accessibility index, which consists of all the responses in the database ranked by how many normed words produced them as associates. The accessibility index is related to the data presented in Appendix B, but instead of presenting the responses alphabetically followed by the cues that produce them, they are presented by rank. For example, the responses FOOD, MONEY and WATER were produced as associates, respectively, by 324, 302 and 276 of the normed words appearing in the database. We refer to these values as an accessibility index because they provide a measure of the ease with which a given word comes to mind in free association to a variety of different cues. The assumption is that some words, such as FOOD, are more generally accessible in memory because they are produced by a greater variety of other words (e.g., Howes, 1957; Rubin & Friendly, 1986). The accessibility index is, in some ways, similar to measures of printed frequency, and we have added frequency values from Kucera & Francis (1967) for the sake of comparison.
Any estimate of accessibility is bound to be biased because it will depend to a great extent upon its source. Kucera and Francis selected 2,000 paragraphs of 500 words resulting in a sample of one million words, whereas the present norms were based on a semi-random sample of 5,019 words producing about 600,000 free association responses by two or more subjects. Despite these differences, the two measures are strongly related, r = .76, n = 10,470 (this correlation was computed on the log10 of each index, with zero values replaced with 1 before the logs were taken). Rubin and Friendly (1986) report similar results using other free association databases. Although our experience with cued recall suggests that printed frequency and production frequency appear to have similar effects (Nelson & Xu, 1995), Rubin and Friendly (1986) have shown that free recall is better predicted by production frequency or what we call accessibility. Regardless of the high correlation between these two measures, they may be capturing different aspects of experience.
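The transformation described for the correlation (replace zeros with 1, take log10 of each index, then correlate) can be sketched as follows. The counts below are hypothetical placeholders chosen only to exercise the code; they are not values from the norms or from Kucera and Francis, and the helper names are assumptions.

```python
import math

def log10_floor1(x):
    """log10 with zero counts replaced by 1, so a zero maps to 0.0 instead of failing."""
    return math.log10(x if x > 0 else 1)

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (accessibility, printed frequency) counts for five words.
accessibility = [300, 40, 7, 0, 2]
printed_freq = [150, 60, 0, 3, 12]

r = pearson_r([log10_floor1(a) for a in accessibility],
              [log10_floor1(f) for f in printed_freq])
print(f"r = {r:.2f}")
```

Replacing zeros with 1 keeps words that appear in only one of the two sources in the sample, since log10(1) = 0, at the cost of treating counts of 0 and 1 as equivalent.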
Data. Appendix F presents 2,883 words in the norms which were produced by at least one of several types of non-semantic cues. The cues producing these words consisted of beginning sounds, ending sounds, beginning fragment cues, or ending fragment cues. Examples of each type of cue for the word BEST are, respectively, BE read aloud, EST read aloud, and both BE _ _ and _ EST presented visually with spaces for missing letters indicated by the dashes. For some words, such as BEST, non-semantic cues were normed for each of the four types of cues whereas for others only 1-3 non-semantic cues were normed. The beginning sound norms were collected from two samples of subjects (n = 113 and n = 135). Each subject was given a booklet containing a list of blank lines and was told to write the first word they thought of when they heard each beginning sound. The sound was read to them over a tape recorder twice with a slight pause between repetitions, and they were asked to repeat it to themselves silently and then write the first word to come to mind that began with the same sound. Five seconds were allowed for writing each word and each person in each group was asked to respond to 90 beginning sounds, producing a total of 180 normed beginning sounds. These 180 sounds produced 1,296 of the target words appearing in the normative database. Of course, other words not appearing in the database were also produced but they are not represented here. The cues for targets produced by beginning sounds are not presented in Appendix F because such cues can be easily inferred from the target itself by pronouncing the initial letters up through the initial vowel sound, as in BE for BEST.
The ending sound norms were collected in the same manner. Given our greater interest in rhyme, these norms were actually collected first and in greater numbers. A total of 397 ending sounds were normed. In each of two samples (n = 184 and n = 201), 130 ending sounds that formed single rhymes (Woods, 1971) such as A, AB, ACH, and so on, were presented. A total of 123 of these sounds were unique to each group and 7 were repeated to provide a small sample for checking reliability, which averaged r = .79 according to a Spearman Rank Correlation. In two other samples (n = 153 and n = 242), 144 double rhymes such as A' BE, AB' IT and A' BER were normed. It is important to note that the single and double rhyme sounds are separated in Appendix F by placing an asterisk next to the generated word for only the double sounds. Hence, if there is no asterisk next to the target word listed, this should be taken to mean that the last few letters of this item beginning with the terminal vowel sound were used to form the sound cue, as with EST for BEST. The single and double rhyme sounds produced a total of 2,120 words from the database. Finally, the same female speaker (CM) read the beginning and ending sounds in all groups.
The word fragment cues were collected by presenting participants with printed letters and spaces for missing letters, as in BE _ _ and _ EST, in booklets. Letter fragments were defined in terms of the letters that were present in the cue, e.g., a beginning letter fragment has at least its first letter present in the fragment. Participants were asked to produce the first word to come to mind that fit with the letters and spaces provided as the cue. For example, as suggested above, some people responded with the word BEST to each of these non-semantic cues. Five different samples were involved in collecting these norms and the number of participants differed considerably (n = 148, n = 132, n = 79, n = 67 and n = 59). Totals of 279 and 283 beginning and ending fragment cues, respectively, were normed and they produced 1,274 and 1,110 of the words appearing in the normative database. Because fragment cues vary substantially, they are presented in Appendix F.
In addition to presenting the words produced by one or more non-semantic cues, Appendix F provides the set size associated with each cue as well as the probability of its production in the subject sample. For example, the sound produced by pronouncing the beginning letters BE produced a total of 14 words sharing this sound, with this information appearing in the column labeled BSSQ--which stands for beginning set size of the cue. The probability of generating the word BEST from this sound provides an index of cue-to-target strength from the sound BE to the word BEST. A value of 0.04 in this case appears in the column labeled BSGQ--which, in shorthand terms, stands for beginning strength of the cue in relation to the target. In this shorthand the term beginning simply indicates that participants were told that the non-semantic cue they heard consisted of the beginning letters (as opposed to ending letters).
The ending sound EST had a set size of 20 different words (see ESSQ) and the probability of producing BEST from this sound was .38 (see ESGQ). Similarly, the fragment cues BE _ _ and _ EST produced this word with respective set sizes of 19 and 7 and with respective strengths of 0.06 and 0.42. Hence, the non-semantic norms provide information concerning the number of readily available words generally given to four types of non-semantic cues as well as estimates of baseline cue-to-target strength in the absence of recent experimenter controlled study.
Quick Reference. Table 11 presents definitions for the abbreviations appearing on the columns in Appendix F. Abbreviations related to target characteristics are defined and described in Appendices A and B.
|Abbreviations of terms and their equivalencies in Appendix F.|
|TARGET||Response to Non-Semantic Cue|
|BSSQ||Beginning: Set Size of Cue|
|BSGQ||Beginning: Strength of the Cue|
|ESSQ||Ending: Set Size of Cue|
|ESGQ||Ending: Strength of the Cue|
|BFQ||Beginning Fragment Cue|
|BFSS||Beginning Fragment: Set Size|
|BFSG||Beginning Fragment: Strength of the Cue|
|EFQ||Ending Fragment Cue|
|EFSS||Ending Fragment: Set Size|
|EFSG||Ending Fragment: Strength of the Cue|
|TSS||Target: Set Size|
|TH||Target is a homograph?|
|TPS||Target: Part of Speech|
|TMC||Target: Mean Connectivity Among Its Associates|
|TPR||Target: Probability of a Resonant Connection|
|TRSG||Target: Resonant Strength|
|TUC||Target: Use Code|
|TUI||Target: Usability Index|
Canas, J. J. (1990). Associative strength effects in the lexical decision task. The Quarterly Journal of Experimental Psychology, 42, 121-145.
Cramer, P. (1970). A study of homographs. In L. Postman & G. Keppel (Eds.), Norms of Word Association. NY: Academic Press.
Cramer, P. (1968). Word Association. NY: Academic Press.
Deese, J. (1965). The Structure of Associations in Language and Thought. Baltimore, MD: The Johns Hopkins Press.
Jenkins, J. J. (1979). Four points to remember: A tetrahedral model of memory experiments. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of Processing in Human Memory. Hillsdale, NJ: Lawrence Erlbaum Associates.
Jenkins, J. J., & Palermo, D. S. (1964). Word Association Norms. Minn.: University of Minnesota Press.
Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95, 163-182.
Kiss, G. R., Armstrong, C. A., & Milroy, R. (1972). An associative thesaurus of English (microfilm version). Wakefield: E. P. Microforms.
Kucera, H., & Francis, W. N. (1967). Computational Analysis of Present-day American English. Providence, RI : Brown University Press.
Howes, D. A. (1957). On the relation between the probability of a word as an association and in general linguistic usage. Journal of Abnormal & Social Psychology, 54, 75-85.
McEvoy, C. L. (1988). Automatic and strategic processes in picture naming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 618-626.
McEvoy, C. L., & Nelson, D. L. (1982). Category name and instance norms for 106 categories of various sizes. American Journal of Psychology, 95, 581-634.
Nelson, D. L., Bennett, D. J., Gee, N. R., Schreiber, T. A., & McKinney, V. (1993). Implicit memory: Effects of network size and interconnectivity on cued recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 747-764.
Nelson, D. L., Bennett, D. J., & Leibert, T. W. (1997). One step is not enough: Making better use of association norms to predict cued recall. Memory & Cognition, 25, 785-796.
Nelson, D. L., & Brooks, D. H. (1974). Relative effectiveness of rhymes and synonyms as retrieval cues. Journal of Experimental Psychology, 102, 503-507.
Nelson, D. L., LaLomia, M., & Canas, J. J. (1991). Dissociative effects in different prime domains. Memory & Cognition, 19, 44-62.
Nelson, D. L., & McEvoy, C. L. (1979). Encoding context and set size. Journal of Experimental Psychology: Human Learning and Memory, 5, 292-314.
Nelson, D. L., McEvoy, C. L., Walling, J. W., & Wheeler, J. W. (1980). The University of South Florida homograph norms. Behavior Research Methods & Instrumentation, 12, 16-37.
Nelson, D. L., McKinney, V. M., Gee, N. R., & Janczura, G. A. (1998). Interpreting the influence of implicitly activated memories on recall and recognition. Psychological Review, 105, 299-324.
Nelson, D. L., & Schreiber, T. A. (1992). Word concreteness and word structure as independent determinants of recall. Journal of Memory and Language, 31, 237-260.
Nelson, D. L., Schreiber, T. A., & McEvoy, C. L. (1992). Processing implicit and explicit representations. Psychological Review, 99, 322-348.
Nelson, D. L., Schreiber, T. A., & Xu, Jie (in press). Cue set size effects: Sampling activated associates or cross-target interference? Memory & Cognition, 00, 000-000.
Nelson, D. L., Wheeler, J. W., Borden, R. C., & Brooks, D. H. (1974). Levels of processing and cuing: Sensory vs. meaning features. Journal of Experimental Psychology, 103, 971-977.
Nelson, D. L., & Xu, J. (1995). Effects of implicit memory on explicit recall: Set size and word frequency effects. Psychological Research, 57, 203-214.
Paivio, A., Yuille, J. C., & Madigan, S. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph Supplement, 76, (1, Pt. 2), 1-25.
Perfetti, C. A., Lindsey, R., & Garson, B. (1971). Association and Uncertainty: Norms of Association to Ambiguous Words. Learning Research and Development Center, University of Pittsburgh.
Rubin, D. C., & Friendly, M. (1986). Predicting which words get recalled: Measures of free recall, availability, goodness, emotionality, and pronunciability for 925 nouns. Memory & Cognition, 14, 79-94.
Schreiber, T. A. (1998). Effects of target set size on feelings of knowing and cued recall: Implications for Cue effectiveness and partial-retrieval processes. Memory & Cognition, 26, 553-571.
Toglia, M. P., & Battig, W. F. (1978). Handbook of Semantic Word Norms. Hillsdale, NJ: Erlbaum.
Twilley, L. C., Dixon, P., Taylor, D., & Clark, K. (1994). University of Alberta norms of relative meaning frequency for 566 homographs. Memory & Cognition, 22, 111-126.
Wollen, K. A., Cox, S. D., Coahran, M. M., Shea, D. S., & Kirby, R. F. (1980). Frequency of occurrence and concreteness ratings of homograph meanings. Behavior Research Methods & Instrumentation, 12, 8-15.
Woods, C. (1971). Wood's Rhyming Dictionary. NY: The World Publishing Company.
Correspondence concerning these norms should be addressed to Douglas L. Nelson, Department of Psychology, University of South Florida, Tampa, Florida, 33620-8200.
|The normed words with affiliated information, their responses, and the probabilities of these responses.|
|Abbreviations of terms for affiliated information and their equivalencies are shown below, and a more complete explanation of these terms can be found in Appendix A.|
|PS||Part of Speech|
|MSS||Meaning Set Size--number of different responses produced by 2 or more participants|
|TSS||Total Set Size--total number of different responses, including idiosyncratic responses|
|MC||Mean Connectivity among the responses (associates) of the normed word|
|PR||Probability of a Resonant Connection--probability that associates of the normed word produce it as a target|
|UC||Use Code--an index of suitability when connectivity or resonance are being varied|
|UI||Usability Index--proportion of associates in the set that have been normed|
|CON||Concreteness--a 1-7 rating of how well the word reminds someone of a sensory experience|
|FR||Frequency--Kucera & Francis printed frequency|
|NR||No Response--omission of a response|
|H||Homograph--presence of a letter indicates word is likely to be a homograph|