Methods of Lexicological Analysis

An important and promising trend in modern linguistics, one that has been making progress over the last few decades, is the quantitative study of language phenomena and the application of statistical methods to linguistic analysis.

The first requirement for a successful statistical study is that the objects counted be representative of the problem in question, that is, relevant from the linguistic point of view. The statistical approach has proved essential in selecting the vocabulary of a foreign language for teaching purposes.

It is widely held that very few people know more than 10% of the words of their mother tongue. It follows that, if we do not wish to waste time memorising vocabulary items that the learner is never likely to need, we must select only those lexical units that native speakers commonly use.
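Selecting commonly used units starts from a plain frequency count. The minimal Python sketch below, assuming naive tokenisation and a made-up sample text, counts word-forms and keeps the n most frequent; the function name `most_frequent_words` is hypothetical. It counts sound-forms only, not meanings.

```python
import re
from collections import Counter

def most_frequent_words(text: str, n: int) -> list[tuple[str, int]]:
    """Return the n most frequent word-forms in `text` with their counts."""
    # Naive tokenisation (an assumption): lower-case, keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Counter.most_common orders ties by first occurrence.
    return Counter(tokens).most_common(n)

# Made-up sample text, for illustration only.
sample = "The boy went home. The boy came home. The girl ran to the room."
print(most_frequent_words(sample, 3))  # [('the', 4), ('boy', 2), ('home', 2)]
```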

Clearly, to be useful in teaching, statistics must deal with meanings as well as with sound-forms, since not all word-meanings are equally frequent.

Moreover, the number of meanings far exceeds the number of words. The total number of different meanings recorded and illustrated in the Oxford English Dictionary for the first 500 words of the Thorndike Word List is 14,070; for the first thousand words it is nearly 25,000. Naturally, not all of these meanings should be included in a list of the two thousand most commonly used words. Statistical analysis of meaning frequencies resulted in the compilation of A General Service List of English Words with Semantic Frequencies. Its semantic count records how often the various senses of the 2,000 most frequent words occurred in a study of five million running words. The count is based on the differentiation of meanings in the OED, and the frequencies are expressed as percentages so that teachers and textbook writers can more easily understand and use the list. An example will make the procedure clear.


room ('space'): takes less room; not enough room to turn round (in); make room for (figurative); room for improvement – 12%

room ('part of a house'): come to my room; bedroom, sitting room, drawing room, bathroom – 83%

room (plural = 'suite, lodgings'): my room in college; to let rooms – 2%



The semantic count above shows that the meaning 'part of a house' (sitting room, drawing room, etc.) accounts for 83% of all occurrences of the word room and should be included in the list of meanings to be learned by beginners, whereas the meaning 'suite, lodgings' accounts for only 2% of all occurrences and is not essential.
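The same selection can be applied to senses rather than to words. The sketch below takes the percentages for room from the semantic count and keeps the senses at or above a cut-off. The 10% threshold and the function name `senses_for_beginners` are assumptions for illustration; the text itself does not fix a threshold.

```python
# Semantic count for "room" as given in the text (percent of all occurrences).
ROOM_SENSES = {
    "space": 12,
    "part of a house": 83,
    "suite, lodgings (plural)": 2,
}

def senses_for_beginners(senses: dict[str, int], threshold: int) -> list[str]:
    """Keep senses whose share is at least `threshold` percent, most frequent first.
    The threshold is a hypothetical teaching cut-off, not part of the source count."""
    kept = [sense for sense, pct in senses.items() if pct >= threshold]
    return sorted(kept, key=lambda sense: senses[sense], reverse=True)

print(senses_for_beginners(ROOM_SENSES, 10))  # ['part of a house', 'space']
```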

A similar count for Ukrainian words denoting a room:

Кімната (a separate space, mainly for living in, within a flat or house) – 41%

Хата (colloquial) – 17%

Покій, палата (obsolete; usually a luxurious, richly furnished room) – 3%

Світлиця, горниця (colloquial; usually a clean, formal room) – 7%

Вітальня (a room furnished for receiving guests) – 29%

Ванькир (a side room partitioned off from the main room, used as a bedroom and children's room) – 3%

One more specific feature must, however, be stressed here. All modern methods aim to be impersonal and objective, in the sense that they must lead to generalizations verifiable by any competent researcher. In seeking verifiable relationships among the typical contrastive shapes and arrangements of linguistic elements functioning in a system, the study of vocabulary has moved away from chance observation and made considerable scientific progress.

Thus, statistical analysis is applied in various branches of linguistics, including lexicology, as a means of verification and as a reliable criterion for selecting language data, provided that a qualitative description of the lexical items is available.


I.3. Immediate Constituents Analysis


The theory of Immediate Constituents (IC) was originally elaborated as an attempt to determine the ways in which lexical units are relevantly related to one another. It was discovered that combinations of such units are usually structured into hierarchically arranged sets of binary constructions. For example, in the word-group a black dress in severe style we do not relate a to black, black to dress, and dress to in, but set up a structure which may be represented as a black dress / in severe style. Thus the fundamental aim of IC analysis is to segment a set of lexical units into two maximally independent sequences, or ICs, thereby revealing the hierarchical structure of the set. Successive segmentation results in Ultimate Constituents (UCs), two-facet units that cannot be segmented into smaller units having both sound-form and meaning. The Ultimate Constituents of the word-group analysed above are: a | black | dress | in | severe | style.
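The hierarchy of binary cuts can be modelled as nested pairs whose leaves are the UCs. In this minimal sketch only the first cut (a black dress / in severe style) comes from the text; the lower cuts (a / black dress, black / dress, in / severe style, severe / style) and the helper names are assumptions for illustration.

```python
# An IC tree is either a single unit (a string) or a pair (left IC, right IC).
# First cut from the text; the deeper cuts are an illustrative assumption.
PHRASE = (("a", ("black", "dress")), ("in", ("severe", "style")))

def ultimate_constituents(tree) -> list[str]:
    """Segment successively until only unsegmentable units remain."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return ultimate_constituents(left) + ultimate_constituents(right)

def show(tree) -> str:
    """Render the tree with one bracket pair per binary cut."""
    if isinstance(tree, str):
        return tree
    return f"[{show(tree[0])} | {show(tree[1])}]"

print(show(PHRASE))  # [[a | [black | dress]] | [in | [severe | style]]]
print(" | ".join(ultimate_constituents(PHRASE)))  # a | black | dress | in | severe | style
```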

IC analysis is used in lexicological investigations mainly to discover the derivational structure of words. For example, the verb denationalise has both a prefix de- and a suffix -ise (-ize). To decide whether this word is a prefixal or a suffixal derivative, we must apply IC analysis. Binary segmentation of the string of morphemes making up the word shows that *denation and *denational cannot be treated as independent sequences, since there is no direct link between the prefix de- and nation or national; indeed, no such sound-forms function as independent units in modern English. The only possible binary segmentation is de | nationalise, so we may conclude that the word is a prefixal derivative.

There are also numerous cases in which an identical morphemic structure of different words is insufficient proof of an identical derivational structure, which can be revealed only by IC analysis. Comparing snow-covered and blue-eyed, we observe that both words contain two root-morphemes and one derivational morpheme. IC analysis, however, shows that whereas snow-covered may be treated as a compound of two stems, snow + covered, blue-eyed is a suffixal derivative, since its underlying structure is (blue + eye) + -ed.

Ukrainian examples, segmented into morphemes: без/совіс/ний 'unscrupulous', за/турк/ан/ий 'downtrodden', ні/куди/ш/ній 'good-for-nothing', без/пом/іч/н/ий 'helpless', зрад/н/ик 'traitor', за/прод/ан/ець 'turncoat', не/роз/суд/л/ив/ий 'imprudent', роз/важ/л/ив/ий 'prudent', без/перспектив/ний 'without prospects', не/гід/н/ик 'scoundrel', с/пад/ко/єм/ець 'heir'.
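The test applied to denationalise can be sketched as a search over cut points that keeps only cuts whose two halves are both acceptable units. The mini-lexicon `ATTESTED`, the affix sets, the rule that a single morpheme is always acceptable, and the "compound" fallback are simplifying assumptions for illustration, not a real word list or a complete formalisation of IC analysis.

```python
# Hypothetical mini-lexicon: sound-forms assumed to function as independent
# units in modern English (an illustration, not a real word list).
ATTESTED = {"nation", "national", "nationalise"}
PREFIXES = {"de"}
SUFFIXES = {"al", "ise"}

def acceptable(morphemes: list[str]) -> bool:
    """Assumption: a single morpheme is always an acceptable IC;
    a longer sequence is acceptable only if it is an attested unit."""
    return len(morphemes) == 1 or "".join(morphemes) in ATTESTED

def binary_cuts(morphemes: list[str]) -> list[tuple[str, str]]:
    """Return every cut at which both halves are acceptable ICs."""
    cuts = []
    for i in range(1, len(morphemes)):
        left, right = morphemes[:i], morphemes[i:]
        if acceptable(left) and acceptable(right):
            cuts.append(("".join(left), "".join(right)))
    return cuts

def classify(morphemes: list[str]) -> str:
    """Name the derivational type from the single admissible cut
    (anything else is treated as a compound, a simplification)."""
    cuts = binary_cuts(morphemes)
    if len(cuts) != 1:
        return "no unique segmentation"
    left, right = cuts[0]
    if left in PREFIXES:
        return "prefixal derivative"
    if right in SUFFIXES:
        return "suffixal derivative"
    return "compound"

word = ["de", "nation", "al", "ise"]
print(binary_cuts(word))  # [('de', 'nationalise')]
print(classify(word))     # prefixal derivative
```

With the same lexicon, nationalise (nation + al + ise) admits only the cut national | ise and is therefore classified as a suffixal derivative.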

It may be inferred from the examples discussed above that ICs represent the word-formation structure, while UCs show the morphemic structure of polymorphemic words.


I.4. Distributional Analysis and Co-occurrence


Distributional analysis in its various forms is commonly used nowadays by lexicologists of different schools of thought. By the term distribution we understand the occurrence of a lexical unit relative to other lexical units of the same level (words relative to words, morphemes relative to morphemes). In other words, distribution is the position which lexical units occupy or may occupy in the text or in the flow of speech. It is readily observed that a certain component of word-meaning is described when the word is identified distributionally. For example, in the sentence The boy ___ home, the missing word is easily identified as a verb: The boy went (came, ran) home. Thus the component of meaning that is identified distributionally is the part-of-speech meaning rather than the individual lexical meaning of the word under analysis. It is assumed that sameness or difference in distribution indicates sameness or difference in part-of-speech meaning.

According to Z. Harris, "The distribution of an element is the total of all environments in which it occurs, the sum of all the (different) positions (or occurrences) of an element relative to the occurrence of other elements". In Soviet linguistics this definition has been improved, applied on different levels and found fruitful in semasiology. The "total" mentioned by Z. Harris is replaced by configurations, combining generalized formulas of occurrence with valency. Defining word classes for distributional analysis depends on the structural use of the word in the sentence.
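Harris's notion of distribution as the total of a unit's environments can be illustrated on a toy corpus. This sketch assumes that an environment is simply the immediately preceding and following word (with # marking a sentence boundary) and groups words whose sets of environments coincide; the corpus is made up and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up toy corpus, already lower-cased and tokenised by spaces.
CORPUS = [
    "the boy went home",
    "the boy came home",
    "the boy ran home",
    "the girl went home",
    "the girl came home",
    "the girl ran home",
]

def environments(corpus: list[str]) -> dict[str, set[tuple[str, str]]]:
    """Map each word to the set of (left, right) neighbours it occurs between.
    Assumption: an environment consists only of the immediately adjacent words."""
    env = defaultdict(set)
    for sentence in corpus:
        tokens = ["#"] + sentence.split() + ["#"]  # '#' marks a sentence boundary
        for i in range(1, len(tokens) - 1):
            env[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return env

def distribution_classes(corpus: list[str]) -> list[list[str]]:
    """Group words whose sets of environments are identical."""
    classes = defaultdict(list)
    for word, envs in environments(corpus).items():
        classes[frozenset(envs)].append(word)
    return [sorted(words) for words in classes.values()]

print(distribution_classes(CORPUS))
# [['the'], ['boy', 'girl'], ['came', 'ran', 'went'], ['home']]
```

On this corpus went, came and ran fall into one class, as do boy and girl, which mirrors the part-of-speech identification in The boy ___ home.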

Observation is facilitated by coding, in which words are replaced by conventional word-class symbols. Each analyst suggests a variant suited to his particular purpose. One possible notation uses N for nouns and for words that can occupy the same position in the sentence, such as personal pronouns. Subscripts indicate the class to which a noun belongs: Np means a personal noun, Nm a material noun, Ncoll a collective noun, etc. V stands for verbs, A for adjectives and their equivalents, and D for adverbs and their equivalents. Prepositions and conjunctions are not coded.

Observation is further facilitated by simplifying the examples so that only words in direct syntactic connection with the head-word remain.

Thus, when the verb make is studied, the sentence The old man made Henry laugh aloud may be reduced to The man made Henry laugh.

Until recently the standard context was taken to be the sentence; now it is often reduced to a phrase, so that this last example may be rewritten as to make somebody laugh.

When everything but the head-word of the phrase is coded, we obtain the distributional formula make + Np + V.
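The coding step can be sketched as a lookup that replaces every word except the head-word by its class symbol and leaves uncoded words (articles, prepositions, conjunctions) unchanged. The `CODES` table is a hypothetical mini-lexicon, and the sketch expects the head-word in its dictionary form: it does no lemmatisation, so made would not match make.

```python
# Hypothetical coding lexicon using the notation from the text
# (Np personal noun or equivalent, N noun, V verb, A adjective).
CODES = {
    "henry": "Np", "somebody": "Np",
    "laugh": "V", "work": "V", "go": "V",
    "sure": "A", "good": "A",
    "coat": "N", "machine": "N", "decision": "N", "wife": "N",
}

def distributional_formula(phrase: str, head: str) -> str:
    """Code every word except the head-word; words missing from CODES
    (articles, prepositions, conjunctions) are kept as they are."""
    parts = []
    for word in phrase.lower().split():
        parts.append(word if word == head else CODES.get(word, word))
    return " + ".join(parts)

print(distributional_formula("make somebody laugh", "make"))  # make + Np + V
print(distributional_formula("make a good wife", "make"))     # make + a + A + N
```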

The examples collected are arranged according to their distributional formulas, which gives the analyst a complete picture of the environments in which the language uses the word in question. The list of structures characteristic of the word's distribution is accompanied by examples:

Make + a + N – make a coat, a machine, a decision

Make + (the) + N + V – make the machine go, make somebody work

Make + A – make sure

Make + a + A + N – make a good wife
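Arranging the collected examples by formula is a simple grouping operation. The sketch below uses the make examples listed above, paired with their formulas by hand; the function name `arrange_by_formula` is hypothetical.

```python
from collections import defaultdict

# The make examples from the text, each paired with its formula by hand.
EXAMPLES = [
    ("make a coat", "make + a + N"),
    ("make a machine", "make + a + N"),
    ("make a decision", "make + a + N"),
    ("make the machine go", "make + (the) + N + V"),
    ("make somebody work", "make + (the) + N + V"),
    ("make sure", "make + A"),
    ("make a good wife", "make + a + A + N"),
]

def arrange_by_formula(examples: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect examples under each distributional formula, in first-seen order."""
    table = defaultdict(list)
    for phrase, formula in examples:
        table[formula].append(phrase)
    return dict(table)

for formula, phrases in arrange_by_formula(EXAMPLES).items():
    print(f"{formula}: {', '.join(phrases)}")
```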

In each of these examples the meaning of make is different. Some of these patterns, however, may be used for several meanings of the word make, so that the differentiation of meanings is not complete. Compare, for instance, the following sentences, where the pattern make + N remains unchanged, although our intuition tells us that the meaning of make is not the same:

60 minutes make an hour.

60 people make a decision.

A phrase in which all elements, including the head-word, are coded is called a distributional pattern; for instance, to make somebody laugh is coded as to V1 Np V2.
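Coding the head-word as well turns a formula into a pattern. When a class symbol occurs more than once, as V does in to make somebody laugh, the sketch below numbers the occurrences (V1, V2). The numbering rule, the `CODES` table and the function name are assumptions made to reproduce the notation in the text.

```python
from collections import Counter

# Hypothetical coding lexicon; unlike a formula, the head-word (make) is coded too.
CODES = {"make": "V", "somebody": "Np", "laugh": "V"}
SYMBOLS = set(CODES.values())

def distributional_pattern(phrase: str) -> str:
    """Code every word; number a class symbol only when it recurs in the phrase.
    Uncoded words (here the infinitive particle 'to') are left as they are."""
    coded = [CODES.get(w, w) for w in phrase.lower().split()]
    totals = Counter(coded)
    seen = Counter()
    out = []
    for sym in coded:
        if sym in SYMBOLS and totals[sym] > 1:
            seen[sym] += 1
            out.append(f"{sym}{seen[sym]}")
        else:
            out.append(sym)
    return " ".join(out)

print(distributional_pattern("to make somebody laugh"))  # to V1 Np V2
```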

Another example:

Get + N (receive) – get a letter

Get + A (become) – get angry
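Where each pattern corresponds to a single meaning, as with get, the pattern can serve as a key for identifying the meaning. This minimal sketch codes the word after get (skipping articles), writing A for adjectives as in the coding notation introduced earlier, and looks the pattern up; the tables and the function name are hypothetical. As the make examples show, one pattern may cover several meanings, so such a lookup is only a first approximation.

```python
# Patterns for "get" from the text, each with the meaning it signals.
GET_PATTERNS = {
    ("get", "N"): "receive",
    ("get", "A"): "become",
}

# Hypothetical word-class codes (N noun, A adjective).
CODES = {"letter": "N", "book": "N", "angry": "A", "tired": "A"}
ARTICLES = {"a", "an", "the"}

def meaning_of_get(phrase: str) -> str:
    """Code the word after 'get' (skipping articles) and look up its pattern."""
    words = [w for w in phrase.lower().split() if w not in ARTICLES]
    if len(words) < 2 or words[0] != "get":
        return "unknown pattern"
    key = (words[0], CODES.get(words[1], "?"))
    return GET_PATTERNS.get(key, "unknown pattern")

for phrase in ["get a letter", "get angry", "get tired"]:
    print(phrase, "->", meaning_of_get(phrase))
```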
