Geoffrey Williams
Europe in Le Monde 2004
This research was funded by a grant from the INTUNE project (Integrated and United: A quest for Citizenship in an ever closer Europe) financed by the Sixth Framework Programme of the European Union, Priority 7, Citizens and Governance in a Knowledge Based Society (CIT3-CT-2005-513421)
1. Introduction
Corpus linguistics is an exploratory discipline, as has been shown so clearly by Alan Partington in his text ‘Aims, tools and practices of Corpus Linguistics’. The very nature of the discipline means that before starting any analysis it is necessary to build a corpus, something that is far from being simply a collection of texts. This means that although we can monitor the press on a daily basis, a real in-depth analysis requires time, as the keywords that will reveal linguistic choices must be discovered, not imposed.
There are two complementary approaches to data in corpus linguistics: corpus-based and corpus-driven. In the former, the corpus is used to test hypotheses, whilst the latter is entirely inductive. The danger with deductive approaches is that in looking for one thing you will miss other vital clues; this is why an inductive approach is generally preferred: the data is taken as a whole, and the regularities discovered open the paths of research. We seek to understand and describe what we see without eliminating elements that do not fit into a pre-established hypothesis.
The pilot study on identity presented in this paper is corpus-driven: it seeks to reveal factors of identity contained within a body of texts. It is based on the entirety of the 2004 edition of Le Monde, a French left-leaning daily newspaper. It must be stated that this is not really a corpus. A corpus is built following strict selection criteria so that it can be deemed representative of a language or language variety; Le Monde is thus just a very large collection of texts representing the style and content of one published source. However, as Le Monde is one of the two leading national papers, a study of identity in its pages can be very revealing as to the view of Europe given in France. The hypothesis here is that the choice of subjects covered will show not only events happening in the world at a given point of time, but also reveal the subject areas that are deemed of interest to readers of that newspaper. The subject areas that interest a readership are part of their world view, and by extension part of their identity.
The analysis carried out here is on a French corpus, but the main examples have all been translated. As work progresses we are also working on a bilingual lexicographical database which should help at least in understanding the French-English interpretation of the data. In order to illustrate the corpus-driven approach, this paper looks at two keywords, “identity” and “Europe”, in three ways. The first is the cognitive approach as used in the WordNet database. This is a formalised hierarchical view of concepts largely based on linguistic introspection. The second is to see how these units are treated in dictionaries. Lastly, we look at how the keywords can be seen through a rapid analysis of data from an inductive standpoint.
2. Identifying identity
2.1 Identity as a cognitive unit
Identity is central to human existence: it is what permits us to show that we exist, that we belong, and that we differ. It is, however, a highly diffuse and abstract area which, in linguistic terms, can be studied through a variety of means, from the introspective approaches of cognitive linguistics, which aim to get at the essence of meaning, to formal lexical analyses. The most famous of the cognitive studies is WordNet,[1] a project led by Princeton University that seeks to understand conceptual relationships through the building of an ontology.
Without going into details as to the construction of this database, it is sufficient to say that the structure relies heavily on hierarchical relations such as hyponymy and meronymy. The former places concepts in higher- and lower-level classes, whereas the latter is concerned with part-to-whole relationships. Words are grouped using lexical semantic relations into synonym sets, that is to say groups of concepts from the same semantic field, as these are not synonyms in the strict sense of one word being able to literally replace another within a given context. It is generally accepted in linguistics that true synonymy is extremely rare, if it exists at all. Each entry in the ontology is illustrated by a brief definition, but this is only to assist with natural language understanding and is not a full dictionary definition.
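The two hierarchical relations described above can be sketched with a toy concept graph. This is only an illustration: the entries below are invented for the example and are not taken from the WordNet database, and the names HYPERNYM, HOLONYM, hypernym_chain and is_part_of are hypothetical.

```python
# Toy concept graph, NOT WordNet data: the entries are invented for illustration.

# Hyponymy: each concept points to its more general class.
HYPERNYM = {
    "continent": "landmass",
    "landmass": "geographical area",
}

# Meronymy: each part points to the whole it belongs to.
HOLONYM = {
    "France": "Europe",
    "Europe": "Eurasia",
}


def hypernym_chain(concept):
    """Climb the class hierarchy and return every more general class."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def is_part_of(part, whole):
    """True if `part` belongs to `whole`, directly or through other wholes."""
    while part in HOLONYM:
        part = HOLONYM[part]
        if part == whole:
            return True
    return False
```

The point of the sketch is that the two relations are kept apart: a concept is classified through one chain and located within a whole through another, and neither chain says anything about how the word is actually used in text.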
WordNet gives four entries for the notion of identity. It may be perceived as:
1. (14) identity, personal identity, individuality - (the distinct personality of an individual regarded as a persisting entity; "you can lose your identity when you join the army")
2. (11) identity - (the individual characteristics by which a thing or person is recognized or known; "geneticists only recently discovered the identity of the gene that causes it"; "it was too dark to determine his identity"; "she guessed the identity of his lover")
3. (2) identity, identity element, identity operator - (an operator that leaves unchanged the element on which it operates; "the identity under numerical multiplication is 1")
4. (2) identity, identicalness, indistinguishability - (exact sameness; "they shared an identity of interests")
These do not give the characteristics that define identities, but show general senses for the concept. These meanings turn around one essential notion: what makes us what we are and therefore how we can share characteristics with another individual and also how we differentiate ourselves from others. The example in sense one shows that we have an individual identity, in fact probably several, and that identity is affected by collective identities. Thus what we are seeking in the INTUNE media group are the lexical characteristics that display collective identities. These will be seen in the quantitative analyses carried out on corpora. However, behind this notion of citizenship, the individual cannot be forgotten and will be seen through qualitative analyses of individual texts.
If identity is the interplay between what is shared and what differentiates, it should be possible to isolate characteristics of identity that will demonstrate the different types of belonging that arise from multiple collective identities.
As WordNet gives only formal relationships, we must next turn to a dictionary for a fuller picture of what identity can mean.
2.2. Identity as a lexicographical unit
An ontology is a list of concepts linked through formal semantic methods; it is not a dictionary. A dictionary will give a more pragmatic view of a word as it seeks to clarify and display meanings. Traditionally, dictionary writing has been largely based on the intuition of lexicographers working from collections of citations. Such a methodology is dangerous as intuitions and reality often fail to coincide. In addition, citation indexes tend to privilege the unusual whilst we are seeking the usual. The dangers of using dictionaries that have not been built from corpora are well known (Sinclair 1987, Church and Mercer 1993), which is why the examples used in this article come from two advanced learner’s dictionaries, the Macmillan English Dictionary for Advanced Learners (MEDAL) and the Oxford Advanced Learner’s Dictionary (OALD).
These two dictionaries have been chosen as being very recent, complete and based on corpora. Both have been built using the British National Corpus, a large balanced corpus of texts and spoken data in British English. A learner’s dictionary will attempt to define a word to help with comprehension, but also give assistance with production through the use of compound words, collocations and examples. Below are the entries for ‘identity’ in the electronic versions of these two dictionaries.
identity noun ***
1 [count or uncount] who you are or what your name is:
Do you have any proof of identity?
conceal/hide/protect your identity: He had managed to conceal his real identity.
reveal/disclose your identity: They refused to reveal the identity of the person who won the lottery.
mistaken identity (=when people wrongly think that someone is someone else): It was just a case of mistaken identity.
1a. the qualities that make someone or something what they are and different from other people:
You have to let the children develop a sense of their own identity.
The countries have kept their own distinct political and cultural identities.
corporate identity: A merger with the banking giant will lead to a loss of their corporate identity.
identity crisis (=not being certain about your identity): Lorna went through a bit of an identity crisis after her divorce.
2 [uncount] VERY FORMAL the fact of being exactly the same
(c) Macmillan Publishers Ltd. 2003
iden·tity noun (pl. -ies)
1 [C, U] (abbr. ID) who or what sb/sth is: The police are trying to discover the identity of the killer. Their identities were kept secret. She is innocent; it was a case of mistaken identity. Do you have any proof of identity? The thief used a false identity. She went through an identity crisis in her teens (= was not sure of who she was or of her place in society).
2 [C, U] the characteristics, feelings or beliefs that distinguish people from others: a sense of national / cultural / personal / group identity; a plan to strengthen the corporate identity of the company
3 [U] identity (with sb/sth) | identity (between A and B) the state or feeling of being very similar to and able to understand sb/sth: an identity of interests; There’s a close identity between fans and their team.
(c) Oxford Advanced Learner’s Dictionary 2005
Once again we remain in a world of generalities. Both dictionaries see personal identity as the most meaningful sense. This largely takes the form of civil identity as given on an identity card or passport. This is something attained by right of birth or naturalisation and says nothing about how a person views that identity. However, definition 1a in MEDAL and the example in OALD of an identity crisis display a very different type of identity from that carried on a card: this is how a person sees him or herself. This, like the example of corporate identity, is about image building. It is this aspect, built on in the second definition of OALD, that will lead to the multiple layers of identity that are part of citizenship. The difference between definitions 2 and 3 is simply one of position: number 2 says that characteristics distinguish, whilst number 3 recognises that what distinguishes may also be shared.
These are general meanings from a large reference corpus; they bring us closer to a notion of identity as belonging, but still do not show what those characteristics of membership may be.
2.3. Identity as a corpus unit
In a corpus a word is studied by using a concordance built from a Key Word in Context list calculated by a computer. The tool used here is WordSmith Tools,[2] a suite of lexical analysis programmes that allow the production of wordlists, concordances, the calculation of co-occurrences and other statistics.
A concordance output places the keyword in a list within a fixed span of words or characters to the left and right. The results can be sorted by words occurring to the left and right to reveal regularities. It is also possible to search for occurrences of the keyword and an accompanying word so as to limit the contexts being displayed. A search for “Europe”, for example, will give all the occurrences of that word in context, whereas “Europe” + “identity” will only display concordance lines where the second word is to be found within the span fixed in the search parameters, usually three or four words to the left and right of the keyword. The tool can also calculate the number of times another word will be found co-occurring with the keyword. These words are broadly termed “collocates”, meaning here significant co-occurrence, although collocation itself has a number of different interpretations in linguistics. Statistically significant collocation can also be calculated by WordSmith using formulae such as Mutual Information (Church et al. 1994). Significant collocation does not simply show multiple word lexical units, but may also be used for thematic analysis and the classification of texts in prototypical categories (Williams 1998, 2002).
In this corpus of 24 million words, the word identité (identity) occurs 1622 times. Appendix 1 shows the 100 most frequent collocates of the keyword sorted by part of speech; we have retained only the nouns, verbs and adjectives.
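The mechanics of a Key Word in Context list, span-based collocate counts and a Mutual Information score can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reconstruction of how WordSmith Tools works internally: the tokenizer is deliberately naive, the function names are hypothetical, and the Mutual Information formula is the simple pointwise version, which ignores the width of the span.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive lower-case tokenizer; apostrophes split words (d'identité -> d, identité)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, span=4):
    """Return (left context, keyword, right context) for every hit of the keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, span=4):
    """Count every word found within `span` tokens either side of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """Pointwise MI in bits: log2 of observed over expected co-occurrence.

    Simplifying assumption: the expected value ignores the span width.
    """
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)
```

Sorting the tuples returned by kwic on their left or right context reproduces the sorted concordance described above, and restricting the lines to those whose context contains a second word corresponds to the “Europe” + “identity” type of search.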
2.3.1. Noun identity
From this list a number of multiword units can be built up as with the case of Carte nationale d’identité (identity card).
This has variations such as carte d’identité française and carte d’identité électronique, but it remains the physical proof of civil identity and nothing more. Other variants are the journalist’s professional card, but this simply shows a particular professional status through the notion of badging to limit access to something. Equivalent expressions are papiers d’identité (identity papers) and pièce d’identité (proof of identity). In all cases these are papers that can be lost, found, stolen and held, but tell you nothing about the owner beyond their name and age. Apart from professional papers, the key word is nationale (national) as they relate to the State of which the holder claims nationality.
The third noun from this list is quête (search). The search or quest for identity opens up a much more hopeful field. In this combination of a quest for identity, the concept is seen as something difficult to define and yet essential for existence, despite its often metaphysical status. Young people (jeunes) and regions seek an identity; that is, something to distinguish them from others whilst giving a sense of belonging to a community. This is however very much individual identity, as is shown by adjectives such as juive (Jewish) which refer to an individual rather than a religious group or people. This is illustrated in the following concordance where the noun phrase is preceded by son (his).
partir en quête de son | identité juive.
épouse du roi Assuérus, révéla son | identité juive.
Bernard-Henri Lévy explique pourquoi son | identité juive.
la revendication répétée par Edgar Morin de | identité juive.
Strangely enough, crise d’identité (identity crisis) is not used to refer to individuals but to communities, whether socio-professional or national. In this context one mention of a European identity crisis is to be found, and this relates to the candidacy of Turkey to EU membership.
The noun phrases shown here do not tell us a lot about what constitutes identity; the adjectives may be more telling.
2.3.2. Adjective identity
conceptual | identity
cultural | identity
group | identity
historic | identity
national | identity
personal | identity
political | identity
religious | identity
social | identity
Table 1. Adjectival collocates for ‘identité’
The adjective conceptuelle (conceptual) covers areas such as imaginary and visual identity, whilst identité personnelle (personal identity) includes factors such as sexual identity; the texts under the latter actually correspond only to literary discussions of books or films. In the quest for a European identity, the more interesting areas will concern what I have grouped under nationale (national), that is adjectives for countries, but also the supranational (European) and the sub-national (regional).
Most significant in this context will be the characteristics of French, regional and European identities as these are characteristics that will be either shared or divisive. In the case of France two interesting factors appear: language and immigrant identity. The former is an essential part of French identity and one that is fiercely defended. Languages are divisive, but also seen as part of the shared cultural identity of Europe. Attitudes to language are thus an important factor in defining identity.
The question of immigrant identity will be another interesting issue. In the corpus studied, the question was of black identity, the art of integrating a French identity without colonial overtones whilst retaining specificities of their cultural origins. This area will also be of interest as migrant communities are partially opportunist, but largely linked to historic relationships with the country of origin. The first perceived identity of such a community will be French; to what extent do they envisage as yet a wider European identity where the factors of history and language are not shared? This brings us back to the notion of social identity, which is national or sub-national. Regional identity reinforces this importance of the sub-national sense of belonging through verbs such as ‘defend’ and ‘reinforce’. These clearly imply that there is a danger to these essential factors in collective and personal identity from a call to what might be perceived as a homogenising effect. The national and sub-national identities are not necessarily divisive factors; they may appear as ones that require fostering to the benefit of larger shared characteristics. The latter may be seen through the collocations of ‘European’ and ‘identity’.
European identity is stressed through the political aspects of defence and common foreign policy as well as a call to shared values. The shared values come over as something to be ‘defended’ and ‘celebrated’; they are those of democracy, culture and Christian heritage. European identity is seen to be based on a “triptyque Rome-Athènes-Jerusalem”. These are what underpin the western world in general, including the United States.
2.3.3 Verbal identity
In the list of collocates for ‘identity’, we have isolated a short list of verbs. As can be seen below these can be classified into quite revealing groups.
Identity Verbs

Something to be created
Forger | create
Construire | build

Something that pre-exists

  Something hidden
  Découvrir | discover
  Dévoiler | reveal
  Révéler | reveal

  Something unclear
  Interroger | question
  Expliquer | explain
  Définir | define

  Something that can be gained or lost
  Rester | remain
  Devenir | become
  Prendre | take
  Perdre | lose

  Something in danger
  Affirmer | assert
  Défendre | defend
  Renforcer | strengthen
Table 2. Verbal collocates for identity classified by function
Two main groups can be seen: identity is something that must be created or something that pre-exists. In the latter case, four subgroups can be found. Identity may exist, but it has to be revealed, or at worst redefined. If it does exist, it can be won or lost, but can also be something endangered. This is interesting in itself, but still raises the question as to what characteristics can be seen to define identity in the psyche of those who are expected to hold it.
2.3.4. Inconclusive identity
What has not been discussed here is the attitude of the French press towards other countries, as this presumably frames the opinions of the population by creating or overthrowing caricatures. This is an issue which must be addressed, and one that a group of students has begun to work on, using the press sources isolated for building the pilot corpus but studying the entire texts of October 2005. A synthesis of the results of this study has yet to be written, and to render the outcomes more significant it will be necessary to repeat the process over a long period. However, the approach could be fruitful in revealing what is considered divisive and what is held in common. Quantifiable factors could thus be revealed, all the more so if the same process is carried out in the press of other member countries.
This rapid look at lexical factors derived from the collocates of ‘identité’ has shown some of the different factors that must be considered. These include attitudes to national identity, attitudes to political issues and to the values that are seen to underpin the notion of European cultural identity. This is, though, only part of the question; the main problem with the concordances is that although the concept is seen as central, it is not defined. All we can say is that identity exists and must be preserved, and that there are different forms of identity which can be isolated. Attitudes to the concept of identity can be seen through the analysis of verbs and modifiers, but we still lack the essential characteristics that form these different types of identity. If we are to see what factors are involved in European identity it might be worthwhile taking ‘Europe’ as a keyword.
3. How European identity is portrayed
3.1. Europe as a cognitive entity
Wordnet gives three main entries for Europe: a geological/geographical concept, a politico-economic unit and a vague general socio-political unit.
1. (28) Europe -- (the 2nd smallest continent (actually a vast peninsula of Eurasia); the British use ‘Europe’ to refer to all of the continent except the British Isles)
2. European Union, EU, European Community, EC, European Economic Community, EEC, Common Market, Europe -- (an international organization of European countries formed after World War II to reduce trade barriers and increase cooperation among its members; "he took Britain into Europe")
3. Europe -- (the nations of the European continent collectively; "the Marshall Plan helped Europe recover from World War II")
What is immediately obvious is the vagueness of all three concepts. The first entry undermines itself: the British usage it cites takes ‘Europe’ to mean the mainland alone rather than the continental landmass as a whole. Geologically and geographically this is nonsense as the British Isles are clearly part of Eurasia. The second definition throws in some terms that are neither synonymous nor defined in terms of membership. The third is just a vague historical notion. In other words, WordNet is not going to help in the quest for a European identity.
3.2. Europe as a lexicographical entity
Europe noun [count]
1 the large area of land that is between Asia and the Atlantic Ocean. It is one of the five continents of the world.
1a. BRITISH the whole of Europe apart from the UK
2 the European Union:
There have been deep divisions in the party over Europe.
(c) Macmillan Publishers Ltd. 2003
Eur·ope noun [U]
1 the continent next to Asia in the east, the Atlantic Ocean in the west, and the Mediterranean Sea in the south: western / eastern / central Europe
2 the European Union: countries wanting to join Europe; He’s very pro-Europe.
3 (BrE) all of Europe except for Britain: British holidaymakers in Europe
(c) Oxford Advanced Learner’s Dictionary 2005
The dictionaries have a similar breakdown to that given in Wordnet, but both sources give a very anglocentric view of Europe, which is understandable given that Wordnet gets its glosses from a British dictionary, and that the two learner’s dictionaries used here are based on a reference corpus of British English. A prototypical analysis will give the following characteristics:
Europe is:
- a continent
- a large land mass
- situated between Asia and the Atlantic Ocean
- delimited to the south by the Mediterranean sea
- composed of a number of countries
- partly composed of a political unit called the European Union
- seen as consisting only of the mainland European countries by the British
These entries imply two forms of identity: geographical and political. No more detail is given in the definitions as to how these identities may be expressed, but the examples imply that the political aspect can be divisive and that the mainland attracts holidaymakers from Britain.
The use of a prototype analysis (Hanks 1994) does clarify the issue slightly in showing how the meanings involved may be semantically linked, but it still gives no indication of what the identifying factors might be. Once again it is necessary to turn to a corpus.
3.3. Europe as a corpus unit
This analysis is based on the noun ‘Europe’; in a full survey it would be necessary to take into account all the forms of this word.
It is significant that Europe is a very high frequency word in this collection of texts. With 13222 occurrences, the word comes in at 123rd position, out of 200739. In all corpora the most frequent words are the grammatical units that carry no semantic meaning, which makes the position of ‘Europe’ as the 22nd most frequent lexical unit all the more important. Even more significant is that the lexical words that precede it are words turning around France, its government and ministers. ‘Euro’ is also one of the more frequent lexical items. This, then, rather than the word ‘identité’, is most likely to give some clues as to how Europe is seen in the French press. Appendix 2 gives the 132 most frequent adjective, noun and verb collocates for ‘Europe’, the most frequent of which was ‘pays’ (country or countries).
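The difference between a word's position in the full frequency list and its position among lexical words only can be sketched as follows. This is a hedged illustration: the stop list FUNCTION_WORDS is a tiny invented sample rather than the inventory used in the study, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative stop list only; a real study would need a much fuller
# inventory of French grammatical words.
FUNCTION_WORDS = frozenset(
    {"le", "la", "les", "de", "des", "du", "et", "en", "un", "une", "à", "l", "d"}
)


def ranked(tokens, exclude=frozenset()):
    """Word types in descending order of frequency; ties are broken alphabetically."""
    counts = Counter(t for t in tokens if t not in exclude)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ordered]


def rank_of(word, tokens, exclude=frozenset()):
    """1-based position of `word` in the frequency list."""
    return ranked(tokens, exclude).index(word) + 1
```

Calling rank_of with and without exclude=FUNCTION_WORDS gives the two kinds of position contrasted in the text: the overall rank, dominated by grammatical words, and the rank among lexical units.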
3.3.1. Divided Europe
From the concordance of Europe and pays (countries) it is impossible to say whether we are looking at a geographical or political Europe. Although the dividing up is geographical, we are looking at countries, which are political units. Hence:
PAYS D'EUROPE | Countries of Europe
EUROPE CENTRALE | Central Europe
EUROPE CONTINENTALE | Continental Europe
EUROPE DE L'EST | Eastern Europe
EUROPE DU NORD | Northern Europe
EUROPE DU SUD | Southern Europe
EUROPE OCCIDENTALE | Western Europe

CERTAINS PAYS D' | Some countries of …
NOMBREUX PAYS D' | Numerous countries of …
DIFFÉRENTS PAYS D' | Different countries of …
AUTRES PAYS D' | Other countries of …
PLUSIEURS PAYS D' | Several countries of …
This allows for a comparative view of Europe: north v. south, central v. western, continental v. an unnamed entity which is probably Britain or the British Isles. We also have a political unit in PECO (Pays d’Europe Centrale et Orientale). A closer look at the concordance allows these units to be partially identified. Poland, Hungary and Romania are named as belonging to central Europe. PECO includes Poland, and also, in these concordance lines, Ukraine, Moldavia and Kazakhstan. Although PECO may have a restricted definition, the context of use in this corpus extends it to central Europe plus the old Russian satellite states. However, some websites[3] name only the 12 new members, those who entered in 2004 plus Romania and Bulgaria. This makes PECO a very vague concept.
Another vague concept is that of continental Europe. This is not defined in these texts, although France is named as belonging to this unit. Eastern Europe is named as being Bulgaria, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia, Slovenia and the Czech Republic. Northern Europe is named as consisting of Denmark, Finland, Germany, Holland, Iceland, Norway, and Sweden. Great Britain is not in this group, but is classed as one of the Anglo-Saxon group, which also includes Canada and the United States. Southern Europe is said to include Albania, Bulgaria and Romania and ex-Yugoslavia. This does not give a full picture as the concordance line classes these as Southern Europe of the central block. In other contexts Southern Europe is defined as the ‘Club Med’, that is the European Mediterranean countries. Western Europe is the 15 of the third major enlargement of the EU, plus Norway. The result is a very complex system of groupings as can be seen in the comparative table in appendix 3.
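The overlapping nature of these groupings can be made explicit by treating each one as a set of the countries named in the concordance lines. The sketch below uses only the memberships reported in this section (including those given for central Europe and PECO); it is not the comparative table of appendix 3, and the names GROUPS, memberships and overlaps are hypothetical.

```python
from itertools import combinations

# Memberships as reported from the concordance lines in this section only.
GROUPS = {
    "Central Europe": {"Poland", "Hungary", "Romania"},
    "PECO": {"Poland", "Ukraine", "Moldavia", "Kazakhstan"},
    "Eastern Europe": {
        "Bulgaria", "Estonia", "Hungary", "Latvia", "Lithuania",
        "Poland", "Romania", "Slovakia", "Slovenia", "Czech Republic",
    },
    "Northern Europe": {
        "Denmark", "Finland", "Germany", "Holland", "Iceland", "Norway", "Sweden",
    },
    "Southern Europe": {"Albania", "Bulgaria", "Romania", "ex-Yugoslavia"},
}


def memberships(country):
    """Every grouping in which a country is named, in alphabetical order."""
    return sorted(name for name, members in GROUPS.items() if country in members)


def overlaps():
    """Map each pair of groupings that share members to the countries they share."""
    shared = {}
    for a, b in combinations(sorted(GROUPS), 2):
        common = GROUPS[a] & GROUPS[b]
        if common:
            shared[(a, b)] = sorted(common)
    return shared
```

On these data Romania is named in three groupings at once (central, eastern and southern Europe), which illustrates why the units are better seen as prototypical groupings with fuzzy edges than as a partition of the continent.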
The fact that we are dealing with different units that form groups of varying dimensions is reinforced by the adjectives. This can only mean that members of these prototypical groupings have some shared features of identity, maybe identity of interest, which need to be identified. To these we add:
les nouveaux pays adhérents d'Europe | New member states of Europe
la nouvelle Europe | New Europe
la vieille Europe | Old Europe
These imply a before and after situation with an old Europe, that has to be defined, the new countries, which represent a finite list, and the new Europe that must be a result of this unification. This of course remains to be verified.
Another way to look at Europe is to classify the nouns, adjectives and verbs as was done for ‘identity’. Here we shall only look at 48 of the nouns, those which fall into relatively clear categories.
3.3.2. Nominal Europe
Eight broad categories are identified here: area, capital, continent, country, culture, economy, politics and sport. Those of “area” and “continent” are primarily geographical, although as we have already seen ‘European Union’ is part of a political space of varying size. “Country” is quite clear: it covers some major European and world economic and political powers; Turkey is present because of the negotiations that took place in 2004. It is obvious that this is a French view of the world, hence the only capital city present in the list being ‘Paris’. In reality ‘Paris’ may not always refer to the city itself, but by metonymy to the government of the day. The same will be true of the names of the countries; to really understand the concordances it is necessary to sort the contexts. This can be done with WordSmith by creating sets, but the addition of an attribute in the mark-up of names of people and places would render the analysis easier. As discussed above under ‘identity’, to understand the role of the country names it is necessary to see how each country is portrayed as this will show those seen as partners and those as adversaries in different contexts.
The other four categories, “culture”, “economy”, “politics” and “sport”, are more directly analysable in terms of European identity. “Culture” obviously covers a variety of aspects including common musical, literary and artistic heritage. In this list, which consists, it must be borne in mind, of only the first 100 full words co-occurring with the keyword ‘Europe’, only history is mentioned. History is an essential characteristic of identity; personal and national. It is part of the construction of identity and is a complex issue as it is the diachronic counterpart to the analysis of relationships between different countries, but seen from the viewpoint of an individual country and also through the prism of historical methodology.
Collocate | Translation | Category
PAYS | Countries | area
RÉGIONS | Regions | area
UNION | (European) Union | area
PARIS | Paris | capital
ASIE | Asia | continent
AFRIQUE | Africa | continent
AMÉRIQUE | America | continent
ALLEMAGNE | Germany | country
ESPAGNE | Spain | country
ETATS | (United) States | country
FRANCE | France | country
JAPON | Japan | country
NATIONS | (United) Nations | country
RUSSIE | Russia | country
TURQUIE | Turkey | country
HISTOIRE | History | culture
CROISSANCE | Growth | economy
ECONOMIE | Economy | economy
MARCHÉ | Market | economy
MARCHÉS | Markets | economy
PRIX | Prices | economy
TRAVAIL | Work | economy
EMPLOI | Employment | economy
CITOYENS | Citizens | politics
CONSEIL | Council | politics
CONSTITUTION | Constitution | politics
DÉBAT | Debate | politics
DÉFENSE | Defence | politics
DÉMOCRATIE | Democracy | politics
ÉLARGISSEMENT | Widening | politics
GOUVERNEMENT | Government | politics
GUERRE | War | politics
MEMBRES | Members | politics
MINISTRE | Minister | politics
OSCE | OSCE | politics
PARLEMENT | Parliament | politics
PRÉSIDENT | President | politics
PS | Socialist Party | politics
PUISSANCE | Power | politics
RÉFÉRENDUM | Referendum | politics
SÉCURITÉ | Security | politics
CHAMPION | Champion | sport
CHAMPIONNAT | Championship | sport
CHAMPIONNATS | Championships | sport
CHAMPIONNE | Champion | sport
CHAMPIONS | Champions | sport
COUPE | Cup | sport
FOOTBALL | Football | sport
Table 3. Categorisation of the noun collocates for ‘Europe’
“Economy” and “politics” cover a wide spectrum, including the obvious areas of socio-politico-economic identity such as defence, employment, markets and the institutions of government. These are all factors that must be analysed to see how world views may be shared or may differ. The last factor, “sport”, is possibly not one that comes up in political science, but it is part of the life of many citizens. Whilst a political Europe may not contain Turkey, the footballing one does. This means that many people who will not necessarily read the political pages will see different countries as belonging to a single sporting network, even though the same countries are assigned to very different political groupings. How countries are represented through sport may well have an effect on how the ordinary citizen sees these countries in political terms, and therefore on how they may enter into a sense of identity that goes beyond that of regional or national sports teams.
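The weight of each category in Table 3 can be made explicit with a simple tally. The following is a minimal sketch in Python, assuming the hand-assigned categories of Table 3 are stored as a mapping from category to collocates; the variable names are illustrative and the script is not part of the procedure used in this study.

```python
# Collocates of 'Europe' grouped by the categories assigned in Table 3.
# The grouping is copied from the table; the variable names are hypothetical.
table3 = {
    "area": ["PAYS", "RÉGIONS", "UNION"],
    "capital": ["PARIS"],
    "continent": ["ASIE", "AFRIQUE", "AMÉRIQUE"],
    "country": ["ALLEMAGNE", "ESPAGNE", "ETATS", "FRANCE", "JAPON",
                "NATIONS", "RUSSIE", "TURQUIE"],
    "culture": ["HISTOIRE"],
    "economy": ["CROISSANCE", "ECONOMIE", "MARCHÉ", "MARCHÉS", "PRIX",
                "TRAVAIL", "EMPLOI"],
    "politics": ["CITOYENS", "CONSEIL", "CONSTITUTION", "DÉBAT", "DÉFENSE",
                 "DÉMOCRATIE", "ÉLARGISSEMENT", "GOUVERNEMENT", "GUERRE",
                 "MEMBRES", "MINISTRE", "OSCE", "PARLEMENT", "PRÉSIDENT",
                 "PS", "PUISSANCE", "RÉFÉRENDUM", "SÉCURITÉ"],
    "sport": ["CHAMPION", "CHAMPIONNAT", "CHAMPIONNATS", "CHAMPIONNE",
              "CHAMPIONS", "COUPE", "FOOTBALL"],
}

# Number of collocates per category and overall.
counts = {category: len(words) for category, words in table3.items()}
total = sum(counts.values())

# Reverse lookup: which category was a given collocate assigned to?
category_of = {word: category
               for category, words in table3.items()
               for word in words}

# Print the categories from most to least populated, with their share.
for category, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{category:<10} {n:>2}  {n / total:.0%}")
```

Such a count says nothing about the frequency of each collocate in the newspaper; it only shows how the noun collocates retained in the table are distributed across the categories.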
4. Conclusion
The first point that must be made after this rapid analysis of two words is that what is represented here is only the world view of one newspaper, Le Monde. It is thus heavily biased. The other factor is that it is not possible to subdivide the corpus by column. A newspaper is not a single genre, but a complex mixture of genre and thematic categories. To carry out a real analysis of attitudes to Europe it is vital to subdivide by category to see who is speaking and to whom. Any valid analysis must take into account the sociolinguistic factors of corpus design. This will be done in the full study which will be multi-source and categorised using methods of external and internal classification calling on prototype theory to create groupings of variable geometry.
The second point is one of method. This is a very rapid lexical analysis using surface parameters of co-occurrence. Meanings are not made with words alone but through an interaction of lexis and syntax in context. Context means text and co-text; meaning goes beyond the sentence, and beyond the text, to the context of situation and of culture. A full concordance analysis would take these broader factors into account through a mix of quantitative and qualitative analyses.
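To make clear what a surface co-occurrence count involves, here is a minimal sketch in Python. It assumes a naive regular-expression tokeniser and a fixed window of words on either side of the node word; the function name, the window size and the sample sentence are all illustrative assumptions and do not reproduce the software or settings used for this study.

```python
import re
from collections import Counter

def collocates(text, node, span=4):
    """Count the word forms found within `span` tokens of each occurrence of `node`."""
    # Naive tokenisation: lower-case the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]     # up to `span` tokens before the node
            right = tokens[i + 1:i + 1 + span]    # up to `span` tokens after the node
            counts.update(left + right)
    return counts

# Invented sample sentence, for illustration only.
sample = "L'Europe sociale reste à construire, mais l'Europe politique avance."
window_counts = collocates(sample, "europe", span=2)
print(window_counts.most_common())
```

The sketch counts every word form in the window, grammatical words included. Restricting the list to full words, as in the tables above, would require a further filtering step, and nothing in such a count captures syntax or the wider context discussed here.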
A third and vital point concerns the nature of corpus linguistics. This is not the pilot study, but a preliminary study prior to the pilot study. Although some research procedures can be automated, early automation is a dangerous exercise, as only an in-depth study will allow a full appreciation of the parameters at play. Even afterwards, automation will remain dangerous: once a list of factors has been drawn up, only these will be followed through, thereby ignoring new factors that will come up over the four years of the INTUNE project.
So what is the next step?
Obviously we must now start the task of analysing our pilot corpus, in both its written and spoken forms. To do this the texts will be converted into machine-readable formats using XML, a metalanguage that allows the annotation of texts. This task is currently underway, but it requires close cooperation between the four national teams in the media group, as we must develop a common methodology for the encoding of the texts. In the case of the French group the initial conversion of the texts has been carried out, but using our own interpretation of the Text Encoding Initiative, the international standard for corpus annotation. Our interpretation is designed to cover some of the computing problems we have encountered; those of our colleagues from the UK, Italy and Poland will be different, thereby requiring adjustments before a common analysis protocol can be developed. This will require time, but it is time well spent, as the encoding and analysis of this corpus will open the road to our main analysis later. What has been presented here are but a few clues drawn from one source in one language. When the four pilot corpora are brought together the product will be infinitely richer.
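As an indication of what such an encoding involves, the following minimal sketch in Python wraps one article in a TEI-style structure. The element names (text, body, div, head, p) and the type and subtype attributes follow common TEI usage, but the function, the attribute values and the sample content are hypothetical; this is not the encoding scheme adopted by the French group or by the project.

```python
import xml.etree.ElementTree as ET

def encode_article(title, paragraphs, section):
    """Wrap one article in a minimal TEI-style <text> element (illustrative only)."""
    text = ET.Element("text")
    body = ET.SubElement(text, "body")
    # Recording the newspaper section makes it possible to subdivide the corpus later.
    div = ET.SubElement(body, "div", {"type": "article", "subtype": section})
    ET.SubElement(div, "head").text = title
    for paragraph in paragraphs:
        ET.SubElement(div, "p").text = paragraph
    return text

# Invented placeholder content, not taken from the corpus.
article = encode_article(
    "Un titre",
    ["Premier paragraphe.", "Second paragraphe."],
    section="international",
)
xml_string = ET.tostring(article, encoding="unicode")
print(xml_string)
```

Recording the section of the newspaper as an attribute is one possible way of allowing the subdivision by category that a fuller analysis requires. A common protocol would have to fix such choices across the four national teams.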
References
Church K. and Mercer R. L. 1993. “Introduction to the special issue on Computational Linguistics”. Computational Linguistics 19. 1-24.
Church K., Gale W., Hanks P., Hindle D., Moon R. 1994. “Lexical Substitutability”. In Atkins and Zampolli (eds.) Computational Approaches to the Lexicon. Oxford: Clarendon Press. 153-177.
Hanks P. 2000. “Do word meanings exist?”. In A. Kilgarriff and M. Palmer (eds.) Senseval: Evaluating Word Sense Disambiguation Programmes. Special issue of Computers and the Humanities 34.1-2, 205-215.
Partington A. 2005. Aims, tools and practices of Corpus Linguistics. INTUNE Working paper.
Sinclair J. McH. (ed.) 1987. Looking Up: an account of the COBUILD Project in Lexical Computing. London: Collins.
Williams G. 1998. “Collocational Networks: Interlocking Patterns of Lexis in a corpus of plant biology”. International Journal of Corpus Linguistics 3.1, 151-171.
Williams G. 2002. “In search of representativity in specialised corpora: categorisation through collocation”. International Journal of Corpus Linguistics 7.1, 43-64.
Appendices
Appendix 1. 100 most frequent lexical units co-occurring with “identité”.
Word | Part of speech | Word | Part of speech
AMÉRICAINE | adjective | LIEU | noun
AUTRE | adjective | MÉMOIRE | noun
BIEN | adjective | MONDE | noun
CHRÉTIENNE | adjective | NOM | noun
COLLECTIVE | adjective | NOMBRE | noun
CULTURELLE | adjective | ORIGINES | noun
EUROPÉENNE | adjective | PALESTINIEN | noun
FAUSSE | adjective | PAPIERS | noun
FORTE | adjective | PARTIE | noun
FRANÇAIS | adjective | PAYS | noun
FRANÇAISE | adjective | PERSONNAGES | noun
HISTORIQUE | adjective | PERSONNES | noun
IMAGINAIRE | adjective | PERTE | noun
JEUNE | adjective | PEUPLE | noun
JEUNES | adjective | PHOTO | noun
JUIVE | adjective | PHOTOS | noun
MASCULINE | adjective | PIÈCE | noun
MUSULMANE | adjective | PIÈCES | noun
NATIONALE | adjective | PLACE | noun
NOUVELLE | adjective | POLITIQUE | noun
PALESTINIENNE | adjective | PROBLÈME | noun
PERSONNELLE | adjective | QUESTION | noun
PROPRE | adjective | QUÊTE | noun
RÉGIONALE | adjective | RAVISSEURS | noun
SEXUELLE | adjective | RECHERCHE | noun
SOCIALE | adjective | SOCIALISTE | noun
VÉRITABLE | adjective | TEMPS | noun
VISUELLE | adjective | UNIS | noun
PS | noun | VALEURS | noun
AFFIRMATION | noun | AFFIRME | verb
AMÉRICAINS | noun | AFFIRMER | verb
CARTE | noun | CHERCHE | verb
CARTES | noun | COMMENT | verb
COEUR | noun | CONSTRUIRE | verb
COMMUNE | noun | CONSTRUIT | verb
CONSTRUCTION | noun | DÉCOUVRIR | verb
CONTRÔLE | noun | DÉFENDRE | verb
CONTRÔLES | noun | DÉFINIR | verb
CRISE | noun | DEVENIR | verb
CULTURE | noun | DÉVOILER | verb
DÉFENSE | noun | DONNER | verb
DOCUMENTS | noun | EXPLIQUE | verb
EMPRUNT | noun | FACE | verb
ETATS | noun | FORGER | verb
EUROPE | noun | INTERROGE | verb
FRANCE | noun | PERDRE | verb
HISTOIRE | noun | PREND | verb
HOMME | noun | RENFORCER | verb
IDÉE | noun | RESTE | verb
IDENTITÉ | noun | RÉVÉLER | verb
Appendix 2. 133 most frequent lexical units co-occurring with “Europe”.
Word | Part of speech | Word | Part of speech
UNIS | adjective | JAPON | noun
FRANCE | adjective | UNION | noun
CENTRALE | adjective | BESOIN | noun
SOCIALE | adjective | PLACE | noun
ÉLARGIE | adjective | PARTIE | noun
POLITIQUE | adjective | PRIX | noun
PREMIER | adjective | HISTOIRE | noun
NORD | adjective | GROUPE | noun
ORIENTALE | adjective | JUIN | noun
NOUVELLE | adjective | TEMPS | noun
OCCIDENTALE | adjective | CONSTRUCTION | noun
VIEILLE | adjective | IDÉE | noun
GRANDE | adjective | DIMANCHE | noun
MOINS | adjective | MINISTRE | noun
SUD | adjective | EMPLOI | noun
EUROPÉENNE | adjective | MILLIONS | noun
AUTRES | adjective | DÉFENSE | noun
FORTE | adjective | PS | noun
GAUCHE | adjective | ENSEMBLE | noun
FRANÇAIS | adjective | EUROPÉENS | noun
GRAND | adjective | MARDI | noun
PLUSIEURS | adjective | OSCE | noun
ÉCONOMIQUE | adjective | GUERRE | noun
PREMIÈRE | adjective | EXPRESS | noun
LIBÉRALE | adjective | MEMBRES | noun
NOUVEAUX | adjective | AFRIQUE | noun
LATINE | adjective | PARIS | noun
MIEUX | adjective | QUESTION | noun
GRANDS | adjective | RÉGIONS | noun
NOUVEAU | adjective | ALLEMAGNE | noun
TROP | adjective | AVRIL | noun
EUROPÉEN | adjective | ESPAGNE | noun
AMÉRICAIN | adjective | RUSSIE | noun
AMÉRICAINE | adjective | CITOYENS | noun
PARTICULIER | adjective | MODÈLE | noun
UNIE | adjective | RAPPORT | noun
AUJOURD'HUI | adverb | RECHERCHE | noun
SEULEMENT | adverb | RÉFÉRENDUM | noun
MAINTENANT | adverb | RÔLE | noun
AUTANT | adverb | SEPTEMBRE | noun
PAYS | noun | DOSSIER | noun
ETATS | noun | MARCHÉS | noun
MONDE | noun | DÉMOCRATIE | noun
AMÉRIQUE | noun | TITRE | noun
TURQUIE | noun | CHAMPIONNE | noun
CONSEIL | noun | FIN | noun
ASIE | noun | JACQUES | noun
COUPE | noun | JEUDI | noun
AVENIR | noun | LUNDI | noun
CHAMPIONNATS | noun | SITUATION | noun
CHAMPIONNAT | noun | COMPTE | noun
CHAMPION | noun | FOOTBALL | noun
CONSTITUTION | noun | TRAVAIL | noun
NATIONS | noun | CHAMPIONS | noun
ANS | noun | GOUVERNEMENT | noun
ECONOMIE | noun | LIEU | noun
PRÉSIDENT | noun | ÉTÉ | verb
PARLEMENT | noun | AVAIT | verb
SÉCURITÉ | noun | CONSTRUIRE | verb
COOPÉRATION | noun | VEUT | verb
PUISSANCE | noun | DIRE | verb
ÉLARGISSEMENT | noun | DEVRAIT | verb
PROJET | noun | POURRAIT | verb
CROISSANCE | noun | VEULENT | verb
MARCHÉ | noun | DÉCLARÉ | verb
DÉBAT | noun | EXPLIQUE | verb
Appendix 3. Comparative table showing memberships of three ‘Europes’