Dipartimento SITLeC

Gedit Edizioni

Geoffrey Williams
Europe in Le Monde 2004
This research was funded by a grant from the INTUNE project (Integrated and United: A quest for Citizenship in an ever closer Europe) financed by the Sixth Framework Programme of the European Union, Priority 7, Citizens and Governance in a Knowledge Based Society (CIT3-CT-2005-513421)

1. Introduction

Corpus linguistics is an exploratory discipline, as Alan Partington has shown so clearly in ‘Aims, tools and practices of Corpus Linguistics’. The very nature of the discipline means that before any analysis can start, a corpus must be built, and a corpus is far from being simply a collection of texts. So although we can monitor the press on a daily basis, a real in-depth analysis takes time, as the keywords that reveal linguistic choices must be discovered, not imposed.

There are two complementary approaches to data in corpus linguistics: corpus-based and corpus-driven. In the former, the corpus is used to test hypotheses, whilst the latter is entirely inductive. The danger with deductive approaches is that in looking for something you may miss other vital clues. This is why an inductive approach is generally preferred: the data is taken as a whole, and the regularities discovered in it open up paths of research. We seek to understand and describe what we see without eliminating elements that do not fit a pre-established hypothesis.

The pilot study on identity presented in this paper is corpus-driven: it seeks to reveal factors of identity contained within a body of texts. It is based on everything Le Monde, a left-leaning French daily newspaper, published in 2004. Strictly speaking, this is not a corpus. A corpus is built following strict selection criteria so that it can be deemed representative of a language or language variety; a year of Le Monde is simply a very large collection of texts representing the style and content of one published source. However, as Le Monde is one of the two leading national papers, a study of identity in its pages can be very revealing of one view of Europe held in France. The hypothesis here is that the choice of subjects covered will show not only the events happening in the world at a given point in time, but also the subject areas deemed of interest to the paper's readers. The subject areas that interest a readership are part of its world view and, by extension, part of its identity.

The analysis carried out here is on a French corpus, but the main examples have all been translated. As work progresses, we are also building a bilingual lexicographical database, which should help at least in understanding the French-English interpretation of the data. To illustrate the corpus-driven approach, this paper looks at two keywords, “identity” and “Europe”, in three ways. The first is the cognitive approach used in the WordNet database, a formalised hierarchical view of concepts largely based on linguistic introspection. The second is to see how these units are treated in dictionaries. Lastly, we look at how the keywords appear in a rapid analysis of corpus data from an inductive standpoint.

2. Identifying identity

2.1 Identity as a cognitive unit

Identity is central to human existence: it is what permits us to show that we exist, that we belong and that we differ. It is, however, a highly diffuse and abstract area which, in linguistic terms, can be studied in a variety of ways, ranging from the introspective approaches of cognitive linguistics, which aim to get at the essence of meaning, to formal lexical analyses. The best known of the cognitive projects is WordNet,[1] led by Princeton University, which seeks to capture conceptual relationships by building an ontology.

Without going into detail about how this database is constructed, it is sufficient to say that its structure relies heavily on hierarchical relations such as hyponymy and meronymy. The former places concepts in a hierarchy of higher- and lower-level classes, whereas the latter is concerned with part-to-whole relationships. Words are grouped by lexical semantic relations into synonym sets, that is, groups of words from the same semantic field. These are not synonyms in the strict sense of one word being able to replace another literally in a given context, since it is generally accepted in linguistics that true synonymy is extremely rare, if it exists at all. Each entry in the ontology is accompanied by a brief definition, but this is only there to assist natural language understanding and is not a full dictionary definition.
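To make the two relations concrete, the sketch below models hyponymy and meronymy as simple lookup tables in Python. It is a minimal illustration under stated assumptions. The entries are invented toy data, not taken from WordNet, and the function names `hypernym_chain` and `parts_of` are hypothetical. Following the hypernym links upwards gives a chain of ever more general classes, and the part-of table answers part-to-whole questions.

```python
# Toy data, invented for illustration: NOT taken from WordNet.
# Hyponymy stored as word -> its immediate hypernym ("is a kind of").
HYPERNYM = {
    "passport": "document",
    "identity card": "document",
    "document": "artifact",
    "artifact": "entity",
}

# Meronymy stored as part -> whole ("is part of").
PART_OF = {
    "photograph": "identity card",
    "signature": "identity card",
    "cover": "passport",
}


def hypernym_chain(word):
    """Follow 'is a kind of' links upwards until a word with no hypernym is reached."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def parts_of(whole):
    """Return, in alphabetical order, the parts recorded for `whole`."""
    return sorted(part for part, w in PART_OF.items() if w == whole)


if __name__ == "__main__":
    print(hypernym_chain("passport"))  # ['passport', 'document', 'artifact', 'entity']
    print(parts_of("identity card"))   # ['photograph', 'signature']
```

A real ontology stores many relation types over tens of thousands of entries, but the principle of navigating typed links between entries is the same.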

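WordNet gives four senses for the noun identity, which may be perceived as follows (the figure in brackets is WordNet's frequency count for each sense in its sense-tagged corpus):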

1. (14) identity, personal identity, individuality - (the distinct personality of an individual regarded as a persisting entity; "you can lose your identity when you join the army")

2. (11) identity - (the individual characteristics by which a thing or person is recognized or known; "geneticists only recently discovered the identity of the gene that causes it"; "it was too dark to determine his identity"; "she guessed the identity of his lover")

3. (2) identity, identity element, identity operator - (an operator that leaves unchanged the element on which it operates; "the identity under numerical multiplication is 1")

4. (2) identity, identicalness, indistinguishability - (exact sameness; "they shared an identity of interests")

These senses do not give the characteristics that define identities but show general senses of the concept. They turn around one essential notion: what makes us what we are, and therefore how we can share characteristics with other individuals and how we differentiate ourselves from others. The example in sense one shows that we have an individual identity (in fact, probably several) and that this identity is affected by collective identities. Thus what we are seeking in the INTUNE media group are the lexical characteristics that display collective identities. These will be seen in the quantitative analyses carried out on corpora. However, behind this notion of citizenship the individual cannot be forgotten, and will be seen through qualitative analyses of individual texts.

If identity is the interplay between what is shared and what differentiates, it should be possible to isolate characteristics of identity that will demonstrate the different types of belonging that arise from multiple collective identities.

As Wordnet gives only formal relationships, we must next turn to a dictionary for a fuller picture of what identity can mean.

2.2. Identity as a lexicographical unit

An ontology is a list of concepts linked through formal semantic methods; it is not a dictionary. A dictionary gives a more pragmatic view of a word, as it seeks to clarify and display meanings. Traditionally, dictionary writing has been largely based on the intuition of lexicographers working from collections of citations. Such a methodology is dangerous, as intuition and reality often fail to coincide. In addition, citation collections tend to privilege the unusual, whilst we are seeking the usual. The dangers of using dictionaries that have not been built from corpora are well known (Sinclair 1987, Church and Mercer 1993), which is why the examples used in this article come from two advanced learner’s dictionaries, the Macmillan English Dictionary for Advanced Learners (MEDAL) and the Oxford Advanced Learner’s Dictionary (OALD).

These two dictionaries were chosen because they are recent, comprehensive and corpus-based. Both have been built using the British National Corpus, a large balanced corpus of written and spoken British English. A learner’s dictionary attempts to define a word to help with comprehension, but also assists with production through compound words, collocations and examples. Below are the entries for ‘identity’ in the electronic versions of these two dictionaries.

identity noun ***
1 [count or uncount] who you are or what your name is:
Do you have any proof of identity?
conceal/hide/protect your identity: He had managed to conceal his real identity.
reveal/disclose your identity: They refused to reveal the identity of the person who won the lottery.
mistaken identity (=when people wrongly think that someone is someone else): It was just a case of mistaken identity.

1a. the qualities that make someone or something what they are and different from other people:
You have to let the children develop a sense of their own identity.
The countries have kept their own distinct political and cultural identities.
corporate identity: A merger with the banking giant will lead to a loss of their corporate identity.
identity crisis (=not being certain about your identity): Lorna went through a bit of an identity crisis after her divorce.
2 [uncount] VERY FORMAL the fact of being exactly the same

 (c) Macmillan Publishers Ltd. 2003


iden·tity noun (pl. -ies)
1 [C, U] (abbr. ID) who or what sb/sth is: The police are trying to discover the identity of the killer. • Their identities were kept secret. • She is innocent; it was a case of mistaken identity. • Do you have any proof of identity? • The thief used a false identity. • She went through an identity crisis in her teens (= was not sure of who she was or of her place in society).
2 [C, U] the characteristics, feelings or beliefs that distinguish people from others: a sense of national / cultural / personal / group identity • a plan to strengthen the corporate identity of the company
3 [U] identity (with sb/sth) | identity (between A and B) the state or feeling of being very similar to and able to understand sb/sth: an identity of interests • There’s a close identity between fans and their team.

(c) Oxford Advanced Learner’s Dictionary 2005

Once again we remain in a world of generalities. Both dictionaries treat personal identity as the primary sense. This largely takes the form of civil identity, as given on an identity card or passport: something attained by right of birth or naturalisation, which says nothing about how a person views that identity. However, definition 1a in MEDAL and the OALD example of an identity crisis display a very different type of identity from the one carried on a card, namely how a person sees him or herself. This, like the example of corporate identity, is about image building. It is this aspect, developed in the second OALD definition, that leads to the multiple layers of identity that are part of citizenship. The difference between OALD definitions 2 and 3 is simply one of position: number 2 says that characteristics distinguish, whilst number 3 recognises that what distinguishes may also be shared.

These are general meanings from a large reference corpus; they bring us closer to a notion of identity as belonging, but still do not show what those characteristics of membership may be.

2.3. Identity as a corpus unit

In corpus linguistics, a word is studied through a concordance, a Key Word in Context (KWIC) list generated by computer. The tool used here is WordSmith Tools,[2] a suite of lexical analysis programs for producing word lists and concordances, calculating co-occurrences and computing other statistics.
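As an illustration of what a KWIC list is, the following Python sketch builds one from a whitespace-tokenised text and sorts it on the right-hand context. It is not how WordSmith Tools works internally. The tokenisation is deliberately naive, the sample sentence is invented rather than taken from Le Monde, and the function names are hypothetical.

```python
def kwic(tokens, keyword, span=4):
    """Return (left, node, right) for each occurrence of keyword, with up to `span` words either side."""
    lines = []
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines


def sort_by_right(lines):
    """Sort concordance lines alphabetically on the right-hand context, word by word."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2]])


def show(lines, width=25):
    """Print lines with the keyword aligned in a central column."""
    for left, node, right in lines:
        print(f"{' '.join(left):>{width}}  {node}  {' '.join(right)}")


if __name__ == "__main__":
    # Invented French example ("he lost his identity card yesterday ...").
    text = ("il a perdu sa carte d' identité hier . la quête d' identité "
            "des jeunes continue . une crise d' identité européenne")
    show(sort_by_right(kwic(text.split(), "identité", span=3)))
```

Sorting on the left-hand context instead only requires changing the sort key to the reversed left-hand words.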

A concordance displays each occurrence of the keyword within a fixed span of words or characters to its left and right. The results can be sorted by the words occurring to the left or right to reveal regularities. It is also possible to search for occurrences of the keyword together with an accompanying word so as to limit the contexts displayed. A search for “Europe”, for example, gives all the occurrences of that word in context, whereas “Europe” + “identity” displays only the concordance lines in which the second word falls within the span set in the search parameters, usually three or four words to the left and right of the keyword. The tool can also calculate how many times another word co-occurs with the keyword. Such words are broadly termed “collocates”, meaning here significant co-occurrence, although collocation itself has a number of different interpretations in linguistics. WordSmith can also calculate statistically significant collocation using measures such as Mutual Information (Church et al. 1994). Significant collocation does not simply reveal multiword lexical units; it may also be used for thematic analysis and for classifying texts into prototypical categories (Williams 1998, 2002).
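Span-based co-occurrence counting and a Mutual Information score can be sketched in a few lines of Python. This is a minimal version of the classic pointwise formula MI = log2(f(n,c) × N / (f(n) × f(c))). Here f(n,c) is the number of times collocate c occurs within the span of node n, f(n) and f(c) are their overall frequencies, and N is the corpus size in words. WordSmith's own implementation may differ in detail, for instance in whether it adjusts for span size, so these figures should not be read as reproducing its output. The sample tokens and counts are invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4):
    """Count every word found within `span` tokens to the left or right of each node occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def mutual_information(f_nc, f_n, f_c, corpus_size):
    """Pointwise MI in bits: log2(f(n,c) * N / (f(n) * f(c))), with no span correction."""
    return math.log2(f_nc * corpus_size / (f_n * f_c))


if __name__ == "__main__":
    tokens = "l' identité de l' europe et l' identité nationale".split()
    print(collocates(tokens, "identité", span=2))
    # Invented counts: 8 co-occurrences, node seen 1,000 times, collocate 500 times, 1M words.
    print(mutual_information(8, 1_000, 500, 1_000_000))  # 4.0
```

A search such as “Europe” + “identity” amounts to keeping only those node occurrences whose window contains the second word. With the toy tokens above, ‘europe’ counts as a collocate of ‘identité’ at a span of four words but not at a span of two.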

In this corpus of 24 million words, the word identité (identity) occurs 1,622 times. Appendix 1 shows the 100 most frequent collocates of the keyword sorted by part of speech; we have retained only the nouns, verbs and adjectives.
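Raw counts depend on corpus size, so frequencies are commonly normalised before they are compared across corpora, for instance to occurrences per million words. A minimal Python sketch of that arithmetic uses the figures given above. The function name is hypothetical, and because 24 million is itself a round figure, the result is approximate.

```python
def per_million(freq, corpus_size):
    """Normalised frequency: occurrences per million running words."""
    return freq * 1_000_000 / corpus_size


if __name__ == "__main__":
    # identité: 1,622 occurrences in a corpus of roughly 24 million words.
    print(round(per_million(1622, 24_000_000), 1))  # 67.6
```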

2.3.1. Noun identity

From this list, a number of multiword units can be built up, as in the case of carte nationale d’identité (national identity card).

This has variants such as carte d’identité française (French identity card) and carte d’identité électronique (electronic identity card), but it remains the physical proof of civil identity and nothing more. Another variant is the journalist’s professional card, but this simply shows a particular professional status, acting as a badge that limits access to something. Equivalent expressions are papiers d’identité (identity papers) and pièce d’identité (proof of identity). In all cases these are papers that can be lost, found, stolen and held, but that tell you nothing about the owner beyond name and age. Apart from professional papers, the key word is nationale (national), as they relate to the State of which the holder claims nationality.

The third noun in this list is quête (search). The search or quest for identity opens up a much more hopeful field. In the combination ‘quest for identity’, the concept is seen as something difficult to define and yet essential to existence, despite its often metaphysical status. Young people (jeunes) and regions seek an identity, that is, something that distinguishes them from others whilst giving them a sense of belonging to a community. This is, however, very much individual identity, as is shown by adjectives such as juive (Jewish), which here refer to an individual rather than to a religious group or people. This is illustrated in the following concordance, where the noun phrase is preceded by the possessive son (his or her).

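partir en quête de son identité juive. (to set off in search of his or her Jewish identity)
épouse du roi Assuérus, révéla son identité juive. (wife of King Ahasuerus, revealed her Jewish identity)
Bernard-Henri Lévy explique pourquoi son identité juive. (Bernard-Henri Lévy explains why his Jewish identity)
la revendication répétée par Edgar Morin de identité juive. (the repeated claim by Edgar Morin of Jewish identity)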

Strangely enough, crise d’identité (identity crisis) is used to refer not to individuals but to communities, whether socio-professional or national. In this context, a single mention of a European identity crisis is to be found, relating to Turkey’s candidacy for EU membership.

The noun phrases shown here do not tell us much about what constitutes identity; the adjectives may be more telling.

2.3.2. Adjective identity

 

conceptual identity
cultural identity
group identity
historic identity
national identity
personal identity
political identity
religious identity
social identity

Table 1. Adjectival collocates for ‘identité’

The adjective conceptuelle (conceptual) covers areas such as imaginary and visual identity, whilst identité personnelle (personal identity) includes factors such as sexual identity, although the texts under the latter actually correspond only to literary discussions of books or films. In the quest for a European identity, the more interesting areas will be those I have grouped under nationale (national): adjectives for countries, but also for the supranational (European) and sub-national (regional) levels.

Most significant in this context will be the characteristics of French, regional and European identities, as these are characteristics that will be either shared or divisive. In the case of France, two interesting factors appear: language and immigrant identity. The former is an essential part of French identity and one that is fiercely defended. Languages are divisive, but they are also seen as part of the shared cultural identity of Europe. Attitudes to language are thus an important factor in defining identity.

The question of immigrant identity will be another interesting issue. In the corpus studied, the question was one of black identity: the art of taking on a French identity without colonial overtones whilst retaining the specificities of one's cultural origins. This area will also be of interest because the make-up of migrant communities is partly a matter of opportunity but largely linked to historical relationships with the country of origin. The first perceived identity of such a community will be French; to what extent do its members yet envisage a wider European identity in which the factors of history and language are not shared? This brings us back to the notion of social identity, which is national or sub-national. Regional identity reinforces the importance of the sub-national sense of belonging through verbs such as ‘defend’ and ‘reinforce’. These clearly imply that these essential factors in collective and personal identity are threatened by what might be perceived as a homogenising effect. The national and sub-national identities are not necessarily divisive factors; they may appear as ones that require fostering for the benefit of larger shared characteristics. The latter may be seen through the collocations of ‘European’ and ‘identity’.

European identity is stressed through the political aspects of defence and common foreign policy, as well as through appeals to shared values. The shared values come across as something to be ‘defended’ and ‘celebrated’; they are those of democracy, culture and Christian heritage. European identity is seen as being based on a “triptyque Rome-Athènes-Jerusalem” (Rome-Athens-Jerusalem triptych). These are the foundations of the western world in general, including the United States.

2.3.3. Verbal identity

In the list of collocates for ‘identity’, we have isolated a short list of verbs. As can be seen below, these can be classified into quite revealing groups.

Identity verbs

Something to be created
    forger (forge), construire (build)

Something that pre-exists
    Something hidden: découvrir (discover), dévoiler (reveal), révéler (reveal)
    Something unclear: interroger (question), expliquer (explain), définir (define)
    Something that can be gained or lost: rester (remain), devenir (become), prendre (take), perdre (lose)
    Something in danger: affirmer (assert), défendre (defend), renforcer (strengthen)

Table 2. Verbal collocates for identity classified by function

Two main groups can be seen: identity is either something that must be created or something that pre-exists. In the latter case, four subgroups can be found. Identity may exist, but it has to be revealed or, at worst, redefined. If it does exist, it can be won or lost, but it can also be endangered. This is interesting in itself, but it still raises the question of what characteristics can be seen to define identity in the psyche of those who are expected to hold it.

2.3.4. Inconclusive identity

What has not been discussed here is the attitude of the French press towards other countries, which presumably frames the opinions of the population by creating or overthrowing caricatures. This issue must be addressed. A group of students has begun work on it, using the press sources selected for building the pilot corpus but studying the complete texts for October 2005. A synthesis of the results of this study has yet to be written, and to make the outcomes more significant it will be necessary to repeat the process over a long period. However, the approach could be fruitful in revealing what is considered divisive and what is held in common. Quantifiable factors could thus be revealed, all the more so if the same process is carried out on the press of other member countries.

This rapid look at lexical factors derived from the collocates of ‘identité’ has shown some of the different factors that must be considered. These include attitudes to national identity, to political issues and to the values that are seen to underpin the notion of European cultural identity. This is, however, only part of the question; the main problem with the concordances is that although the concept is seen as central, it is not defined. All we can say is that identity exists and must be preserved, and that there are different forms of identity which can be isolated. Attitudes to the concept of identity can be seen through the analysis of verbs and modifiers, but we still lack the essential characteristics that form these different types of identity. If we are to see what factors are involved in European identity, it may be worthwhile taking ‘Europe’ itself as a keyword.

3. How European identity is portrayed

3.1. Europe as a cognitive entity

Wordnet gives three main entries for Europe: a geological/geographical concept, a politico-economic unit and a vague general socio-political unit.

1. (28) Europe -- (the 2nd smallest continent (actually a vast peninsula of Eurasia); the British use ‘Europe’ to refer to all of the continent except the British Isles)

2. European Union, EU, European Community, EC, European Economic Community, EEC, Common Market, Europe -- (an international organization of European countries formed after World War II to reduce trade barriers and increase cooperation among its members; "he took Britain into Europe")

3. Europe -- (the nations of the European continent collectively; "the Marshall Plan helped Europe recover from World War II")

What is immediately obvious is the vagueness of all three concepts. The first entry is contradicted by its own gloss: the British usage it cites sets the continental mainland against the British Isles. Geologically and geographically this is nonsense, as the British Isles are clearly part of Eurasia. The second definition throws in a number of terms that are neither synonymous nor defined in terms of membership. The third is just a vague historical notion. In other words, WordNet is not going to help in the quest for a European identity.

3.2. Europe as a lexicographical entity

Europe noun [count]
1 the large area of land that is between Asia and the Atlantic Ocean. It is one of the five continents of the world.
1a. BRITISH the whole of Europe apart from the UK
2 the European Union:
There have been deep divisions in the party over Europe.

 (c) Macmillan Publishers Ltd. 2003


Eur·ope noun [U]
1 the continent next to Asia in the east, the Atlantic Ocean in the west, and the Mediterranean Sea in the south: western / eastern / central Europe
2 the European Union: countries wanting to join Europe • He’s very pro-Europe.
3 (BrE) all of Europe except for Britain: British holidaymakers in Europe

(c) Oxford Advanced Learner’s Dictionary 2005

The dictionaries have a similar breakdown to that given in Wordnet, but both sources give a very anglocentric view of Europe, which is understandable given that Wordnet gets its glosses from a British dictionary, and that the two learner’s dictionaries used here are based on a reference corpus of British English. A prototypical analysis will give the following characteristics:

Europe is:

  • a continent
  • a large land mass
  • situated between Asia and the Atlantic Ocean
  • delimited to the south by the Mediterranean sea
  • composed of a number of countries
  • partly composed of a political unit called the European Union
  • seen as consisting only of the mainland European countries by the British

These entries imply two forms of identity: geographical and political. The definitions give no further detail as to how these identities may be expressed, but the examples imply that the political aspect can be divisive and that the mainland attracts holidaymakers from Britain.

The use of a prototype analysis (Hanks 1994) does clarify the issue slightly in showing how the meanings involved may be semantically linked, but it still gives no indication of what the identifying factors might be. Once again it is necessary to turn to a corpus.

3.3. Europe as a corpus unit

This analysis is based on the noun ‘Europe’; a full survey would need to take into account all the forms of this word.

It is significant that Europe is a very high-frequency word in this collection of texts. With 13,222 occurrences, it comes in at 123rd position out of the 200,739 entries in the word list. In all corpora the most frequent words are grammatical words that carry little lexical meaning, which makes the position of ‘Europe’ as the 22nd most frequent lexical unit all the more notable. Even more significant is that the lexical words that precede it relate to France, its government and its ministers. ‘Euro’ is also one of the more frequent lexical items. This word, then, rather than ‘identité’, is the one most likely to give some clues as to how Europe is seen in the French press. Appendix 2 gives the 132 most frequent adjective, noun and verb collocates for ‘Europe’, the most frequent of which is ‘pays’ (country or countries).
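The distinction between a word's overall rank and its rank among lexical words can be sketched as follows: rank all word forms by frequency, then skip those on a stoplist of grammatical words. The Python below is a toy illustration only. The sample tokens and the three-word stoplist are invented, a real stoplist for French would be far longer, and the function names are hypothetical.

```python
from collections import Counter


def frequency_ranking(tokens):
    """Word forms ordered by descending frequency, ties broken alphabetically."""
    counts = Counter(tokens)
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def rank(word, ranking):
    """1-based position of `word` in a ranking."""
    return ranking.index(word) + 1


def lexical_ranking(tokens, stoplist):
    """The same ranking with grammatical words (those in `stoplist`) removed."""
    return [w for w in frequency_ranking(tokens) if w not in stoplist]


if __name__ == "__main__":
    tokens = "le la de le europe de la france le gouvernement europe de".split()
    stoplist = {"le", "la", "de"}  # hypothetical, minimal stoplist
    print(rank("europe", frequency_ranking(tokens)))          # 3
    print(rank("europe", lexical_ranking(tokens, stoplist)))  # 1
```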

3.3.1. Divided Europe

From the concordance of Europe and pays (countries), it is impossible to say whether we are looking at a geographical or a political Europe. Although the divisions are geographical, we are looking at countries, which are political units. Hence:

PAYS D'EUROPE            Countries of Europe
EUROPE CENTRALE          Central Europe
EUROPE CONTINENTALE      Continental Europe
EUROPE DE L'EST          Eastern Europe
EUROPE DU NORD           Northern Europe
EUROPE DU SUD            Southern Europe
EUROPE OCCIDENTALE       Western Europe

CERTAINS PAYS D'…        Some countries of …
NOMBREUX PAYS D'…        Numerous countries of …
DIFFÉRENTS PAYS D'…      Different countries of …
AUTRES PAYS D'…          Other countries of …
PLUSIEURS PAYS D'…       Several countries of …

This allows for a comparative view of Europe: north v. south, central v. western, and continental v. an unnamed entity which is probably Britain or the British Isles. We also have a political unit in PECO, Pays d’Europe Centrale et Orientale (Countries of Central and Eastern Europe). A closer look at the concordance allows these units to be partially identified. Poland, Hungary and Romania are named as belonging to central Europe. PECO includes Poland and also, in these concordance lines, Ukraine, Moldavia and Kazakhstan. Although PECO may have a restricted definition, its use in this corpus extends it to central Europe plus the former Russian satellite states. However, some websites[3] name only the 12 new members, those who entered in 2004 plus Romania and Bulgaria. This makes PECO a very vague concept.

Another vague concept is that of continental Europe. This is not defined in these texts, although France is named as belonging to it. Eastern Europe is named as being Bulgaria, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia, Slovenia and the Czech Republic. Northern Europe is named as consisting of Denmark, Finland, Germany, Holland, Iceland, Norway and Sweden. Great Britain is not in this group but is classed as part of the Anglo-Saxon group, which also includes Canada and the United States. Southern Europe is said to include Albania, Bulgaria, Romania and ex-Yugoslavia. This does not give a full picture, as the concordance line classes these countries as the southern part of the central bloc. In other contexts, Southern Europe is defined as the ‘Club Med’, that is, the European Mediterranean countries. Western Europe is the fifteen member states resulting from the third major enlargement of the EU, plus Norway. The result is a very complex system of groupings, as can be seen in the comparative table in appendix 3.
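One way of making the overlaps between these groupings explicit is to treat each grouping as a set of the countries named for it in the concordance lines and to intersect the sets. The Python sketch below uses only the countries listed in this section. It illustrates the set logic and is not a reconstruction of the table in appendix 3. The function names are hypothetical.

```python
from itertools import combinations

# Countries named for each grouping in the concordance lines discussed in this section.
GROUPS = {
    "central": {"Poland", "Hungary", "Romania"},
    "eastern": {"Bulgaria", "Estonia", "Hungary", "Latvia", "Lithuania", "Poland",
                "Romania", "Slovakia", "Slovenia", "Czech Republic"},
    "northern": {"Denmark", "Finland", "Germany", "Holland", "Iceland",
                 "Norway", "Sweden"},
    "southern": {"Albania", "Bulgaria", "Romania", "ex-Yugoslavia"},
}


def overlaps(groups):
    """Map each pair of groupings that share countries to the shared countries, sorted."""
    result = {}
    for a, b in combinations(sorted(groups), 2):
        shared = groups[a] & groups[b]
        if shared:
            result[(a, b)] = sorted(shared)
    return result


def memberships(country, groups):
    """List, alphabetically, every grouping in which `country` is named."""
    return sorted(name for name, members in groups.items() if country in members)


if __name__ == "__main__":
    for pair, shared in overlaps(GROUPS).items():
        print(pair, shared)
    print(memberships("Romania", GROUPS))  # ['central', 'eastern', 'southern']
```

On these lists, Romania is named as central, eastern and southern at once, which is one concrete sign of the vagueness described above.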

The fact that we are dealing with different units that form groups of varying dimensions is reinforced by the adjectives. This suggests that members of these prototypical groupings have some shared features of identity, perhaps an identity of interests, which need to be identified. To these we add:

les nouveaux pays adhérents d'Europe    New member states of Europe
la nouvelle Europe                      New Europe
la vieille Europe                       Old Europe

These imply a before-and-after situation, with an old Europe that has yet to be defined, the new countries, which form a finite list, and the new Europe that must be the result of this unification. This, of course, remains to be verified.

Another way to look at Europe is to classify the nouns, adjectives and verbs as was done for ‘identity’. Here we shall only look at 48 of the nouns, those which fall into relatively clear categories.

3.3.2. Nominal Europe

Eight broad categories are identified here: area, capital, continent, country, culture, economy, politics and sport. The categories “area” and “continent” are primarily geographical, although, as we have already seen, the ‘European Union’ is part of a political space of varying size. “Country” is quite clear: it covers some major European and world economic and political powers, and Turkey is present because of the negotiations that took place in 2004. This is obviously a French view of the world, hence ‘Paris’ being the only capital city in the list. In reality, ‘Paris’ may not always refer to the city itself but, by metonymy, to the government of the day. The same will be true of the names of countries; to really understand the concordances it is necessary to sort the contexts. This can be done with WordSmith by creating sets, but adding an attribute to the mark-up of names of people and places would make the analysis easier. As discussed above under ‘identity’, to understand the role of the country names it is necessary to see how each country is portrayed, as this will show which are seen as partners and which as adversaries in different contexts.
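The benefit of tagging names with an attribute can be sketched with Python's standard `xml.etree.ElementTree` module. The mark-up is hypothetical: the `<name>` element, its `type` and `use` attributes and the sample sentences are all invented for this illustration. Once each occurrence of ‘Paris’ carries an attribute distinguishing the city from the government, the metonymic uses can be counted directly instead of being sorted by hand.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical mark-up: element and attribute names are invented for illustration.
SAMPLE = """<text>
<s><name type="place" use="city">Paris</name> is hosting the summit.</s>
<s><name type="place" use="government">Paris</name> rejected the compromise.</s>
<s><name type="place" use="government">Berlin</name> backed the proposal.</s>
</text>"""


def uses_of(xml_text, name):
    """Count the values of the `use` attribute for every <name> element whose text is `name`."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("use") for el in root.iter("name") if el.text == name)


if __name__ == "__main__":
    print(uses_of(SAMPLE, "Paris"))
```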

The other four categories, “culture”, “economy”, “politics” and “sport”, are more directly analysable in terms of European identity. “Culture” obviously covers a variety of aspects, including a common musical, literary and artistic heritage. In this list, which, it must be borne in mind, consists of only the first 100 full words co-occurring with the keyword ‘Europe’, only history is mentioned. History is an essential characteristic of identity, both personal and national. It is part of the construction of identity and a complex issue, as it is the diachronic counterpart to the analysis of relationships between countries, but seen from the viewpoint of an individual country and through the prism of historical methodology.

French           English              Category
PAYS             Countries            area
RÉGIONS          Regions              area
UNION            (European) Union     area
PARIS            Paris                capital
ASIE             Asia                 continent
AFRIQUE          Africa               continent
AMÉRIQUE         America              continent
ALLEMAGNE        Germany              country
ESPAGNE          Spain                country
ETATS            (United) States      country
FRANCE           France               country
JAPON            Japan                country
NATIONS          (United) Nations     country
RUSSIE           Russia               country
TURQUIE          Turkey               country
HISTOIRE         History              culture
CROISSANCE       Growth               economy
ECONOMIE         Economy              economy
MARCHÉ           Market               economy
MARCHÉS          Markets              economy
PRIX             Prices               economy
TRAVAIL          Work                 economy
EMPLOI           Employment           economy
CONSEIL          Council              politics
CONSTITUTION     Constitution         politics
DÉBAT            Debate               politics
DÉFENSE          Defence              politics
DÉMOCRATIE       Democracy            politics
ÉLARGISSEMENT    Widening             politics
GOUVERNEMENT     Government           politics
GUERRE           War                  politics
MEMBRES          Members              politics
MINISTRE         Minister             politics
OSCE             OSCE                 politics
PARLEMENT        Parliament           politics
PRÉSIDENT        President            politics
PS               Socialist Party      politics
PUISSANCE        Power                politics
RÉFÉRENDUM       Referendum           politics
SÉCURITÉ         Security             politics
CITOYENS         Citizens             politics
CHAMPION         Champion             sport
CHAMPIONNAT      Championship         sport
CHAMPIONNATS     Championships        sport
CHAMPIONNE       Champion             sport
CHAMPIONS        Champions            sport
COUPE            Cup                  sport
FOOTBALL         Football             sport

Table 3. Categorisation of the noun collocates for ‘Europe’

“Economy” and “politics” cover a wide spectrum, including the obvious areas of socio-politico-economic identity such as defence, employment, markets and the institutions of government. All of these factors must be analysed to see where world views are shared and where they differ. The last factor, “sport”, is perhaps not one that comes up in political science, but it is part of the lives of many citizens. Whilst a political Europe may not contain Turkey, the footballing one does. This means that many people who do not necessarily read the political pages will see countries as belonging to a single sporting network, even when those countries are placed in very different political groupings. How countries are represented through sport may well affect how ordinary citizens see them in political terms, and therefore how they enter into a sense of identity that goes beyond regional or national sports teams.

4. Conclusion

The first point that must be made after this rapid analysis of two words is that what is represented here is only the world view of one newspaper, Le Monde, and it is thus heavily biased. A further limitation is that the corpus cannot be subdivided by column. A newspaper is not a single genre, but a complex mixture of genres and thematic categories. To carry out a real analysis of attitudes to Europe it is vital to subdivide by category, to see who is speaking and to whom. Any valid analysis must take into account the sociolinguistic factors of corpus design. This will be done in the full study, which will be multi-source and categorised using methods of external and internal classification, calling on prototype theory to create groupings of variable geometry.

The second point is one of method. This is a very rapid lexical analysis based on surface parameters of co-occurrence. Meanings are not made with words alone, but through the interaction of lexis and syntax in context. Context means text and co-text; meaning goes beyond the sentence and beyond the text to the context of situation and the context of culture. A full concordance analysis would take these broader factors into account, combining quantitative and qualitative analysis.
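The kind of surface co-occurrence count referred to here can be sketched as follows. This is a simplified illustration, not the procedure or software used in the study: the window of words either side of the node, the letter-only tokeniser and the stop list are assumptions, `collocates` is a hypothetical helper, and the sample sentence is invented. Raw counts of this kind are only the starting point that a full concordance analysis would then interpret in context.

```python
import re
from collections import Counter


def collocates(text, node, span=5, stopwords=frozenset()):
    """Count the words found within `span` tokens either side of each occurrence of `node`.

    Hypothetical helper: the span, tokeniser and stop list are illustrative
    assumptions, not the settings used in the study.
    """
    # Crude tokeniser: lower-cased runs of letters (accented letters included);
    # apostrophes and punctuation act as separators, so "l'Europe" -> "l", "europe".
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        # Ignore stop words and further occurrences of the node itself
        counts.update(w for w in window if w not in stopwords and w != node)
    return counts


text = ("L'Europe sociale et l'Europe politique : la construction de "
        "l'Europe reste un projet politique.")
counts = collocates(text, "europe", span=3, stopwords={"l", "et", "la", "de", "un"})
print(counts.most_common(2))
```

Note that the second ‘politique’ falls outside the three-word window and is not counted, which illustrates how strongly such surface figures depend on the parameters chosen.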

A third, and vital, point concerns the nature of corpus linguistics. This is not the pilot study itself, but a preliminary study prior to it. Although some research procedures can be automated, early automation is dangerous, as only an in-depth study will allow a full appreciation of the parameters at play. Even later, automation will remain risky: once a list of factors has been drawn up, only these will be followed through, and any new factors that emerge over the four years of the INTUNE project will be ignored.

So what is the next step?

Obviously we must now start analysing our pilot corpus, in both its written and spoken forms. To do this, the texts will be converted into machine-readable form using XML, a metalanguage that allows texts to be annotated. This task is currently under way, but it requires close cooperation between the four national teams in the media group, as we must develop a common methodology for encoding the texts. The French group has already carried out the initial conversion of its texts, but using our own interpretation of the Text Encoding Initiative (TEI) Guidelines, the international guidelines for text encoding and annotation. Our interpretation is designed to deal with some of the computing problems we have encountered; those met by our colleagues in the UK, Italy and Poland will be different, so adjustments will be needed before a common analysis protocol can be developed. This will take time, but it is time well spent, as the encoding and analysis of this corpus will open the road to our main analysis later. What has been presented here are but a few clues drawn from one source in one language. When the four pilot corpora are brought together, the result will be far richer.
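To give a concrete idea of what the conversion step involves, the sketch below wraps a single article in a minimal TEI document using Python's standard `xml.etree.ElementTree` module. It is not the French team's encoding scheme, which is described above as its own interpretation of the TEI: it produces only the minimal header (`fileDesc` with `titleStmt`, `publicationStmt` and `sourceDesc`) and a body, and the function name `article_to_tei`, the `type="article"` value and the metadata strings are hypothetical. Recording the newspaper column, which the conclusion identifies as missing, would need an extra attribute or header element of the project's choosing.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def article_to_tei(title, source, paragraphs):
    """Wrap one article in a minimal TEI document (hypothetical helper): header plus body."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = (
        "Pilot corpus file (hypothetical)")
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = source
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    # 'type="article"' is an illustrative value, not a project convention
    div = ET.SubElement(body, tei("div"), type="article")
    ET.SubElement(div, tei("head")).text = title
    for paragraph in paragraphs:
        ET.SubElement(div, tei("p")).text = paragraph
    return ET.tostring(root, encoding="unicode")


xml_text = article_to_tei(
    "Titre de l'article",
    "Le Monde, 2004 (illustrative metadata)",
    ["Premier paragraphe.", "Second paragraphe."],
)
print(xml_text)
```

Building the tree element by element rather than concatenating strings means that characters such as `&` or `<` in the source texts are escaped automatically.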

References

Church K. and Mercer R. L. 1993. “Introduction to the special issue on Computational Linguistics using large corpora”. Computational Linguistics 19.1, 1-24.

Church K., Gale W., Hanks P., Hindle D. and Moon R. 1994. “Lexical Substitutability”. In B. T. S. Atkins and A. Zampolli (eds.) Computational Approaches to the Lexicon. Oxford: Clarendon Press, 153-177.

Hanks P. 2000. “Do word meanings exist?”. In A. Kilgarriff and M. Palmer (eds.) Senseval: Evaluating Word Sense Disambiguation Programmes. Special issue of Computers and the Humanities 34.1-2, 205-215.

Partington A. 2005. Aims, tools and practices of Corpus Linguistics. INTUNE Working paper.

Sinclair J. McH. (ed.) 1987. Looking Up: an account of the COBUILD Project in Lexical Computing. London: Collins.

Williams G. 1998. “Collocational Networks: Interlocking Patterns of Lexis in a corpus of plant biology”. International Journal of Corpus Linguistics 3.1, 151-171.

Williams G. 2002. “In search of representativity in specialised corpora: categorisation through collocation”. International Journal of Corpus Linguistics 7.1, 43-64.

Appendices

Appendix 1. 100 most frequent lexical units co-occurring with “identité”.

Word | Part of speech | Word | Part of speech
AMÉRICAINE | adjective | LIEU | noun
AUTRE | adjective | MÉMOIRE | noun
BIEN | adjective | MONDE | noun
CHRÉTIENNE | adjective | NOM | noun
COLLECTIVE | adjective | NOMBRE | noun
CULTURELLE | adjective | ORIGINES | noun
EUROPÉENNE | adjective | PALESTINIEN | noun
FAUSSE | adjective | PAPIERS | noun
FORTE | adjective | PARTIE | noun
FRANÇAIS | adjective | PAYS | noun
FRANÇAISE | adjective | PERSONNAGES | noun
HISTORIQUE | adjective | PERSONNES | noun
IMAGINAIRE | adjective | PERTE | noun
JEUNE | adjective | PEUPLE | noun
JEUNES | adjective | PHOTO | noun
JUIVE | adjective | PHOTOS | noun
MASCULINE | adjective | PIÈCE | noun
MUSULMANE | adjective | PIÈCES | noun
NATIONALE | adjective | PLACE | noun
NOUVELLE | adjective | POLITIQUE | noun
PALESTINIENNE | adjective | PROBLÈME | noun
PERSONNELLE | adjective | QUESTION | noun
PROPRE | adjective | QUÊTE | noun
RÉGIONALE | adjective | RAVISSEURS | noun
SEXUELLE | adjective | RECHERCHE | noun
SOCIALE | adjective | SOCIALISTE | noun
VÉRITABLE | adjective | TEMPS | noun
VISUELLE | adjective | UNIS | noun
PS | noun | VALEURS | noun
AFFIRMATION | noun | AFFIRME | verb
AMÉRICAINS | noun | AFFIRMER | verb
CARTE | noun | CHERCHE | verb
CARTES | noun | COMMENT | verb
COEUR | noun | CONSTRUIRE | verb
COMMUNE | noun | CONSTRUIT | verb
CONSTRUCTION | noun | DÉCOUVRIR | verb
CONTRÔLE | noun | DÉFENDRE | verb
CONTRÔLES | noun | DÉFINIR | verb
CRISE | noun | DEVENIR | verb
CULTURE | noun | DÉVOILER | verb
DÉFENSE | noun | DONNER | verb
DOCUMENTS | noun | EXPLIQUE | verb
EMPRUNT | noun | FACE | verb
ETATS | noun | FORGER | verb
EUROPE | noun | INTERROGE | verb
FRANCE | noun | PERDRE | verb
HISTOIRE | noun | PREND | verb
HOMME | noun | RENFORCER | verb
IDÉE | noun | RESTE | verb
IDENTITÉ | noun | RÉVÉLER | verb

Appendix 2. 133 most frequent lexical units co-occurring with “Europe”.

Word | Part of speech | Word | Part of speech
UNIS | adjective | JAPON | noun
FRANCE | adjective | UNION | noun
CENTRALE | adjective | BESOIN | noun
SOCIALE | adjective | PLACE | noun
ÉLARGIE | adjective | PARTIE | noun
POLITIQUE | adjective | PRIX | noun
PREMIER | adjective | HISTOIRE | noun
NORD | adjective | GROUPE | noun
ORIENTALE | adjective | JUIN | noun
NOUVELLE | adjective | TEMPS | noun
OCCIDENTALE | adjective | CONSTRUCTION | noun
VIEILLE | adjective | IDÉE | noun
GRANDE | adjective | DIMANCHE | noun
MOINS | adjective | MINISTRE | noun
SUD | adjective | EMPLOI | noun
EUROPÉENNE | adjective | MILLIONS | noun
AUTRES | adjective | DÉFENSE | noun
FORTE | adjective | PS | noun
GAUCHE | adjective | ENSEMBLE | noun
FRANÇAIS | adjective | EUROPÉENS | noun
GRAND | adjective | MARDI | noun
PLUSIEURS | adjective | OSCE | noun
ÉCONOMIQUE | adjective | GUERRE | noun
PREMIÈRE | adjective | EXPRESS | noun
LIBÉRALE | adjective | MEMBRES | noun
NOUVEAUX | adjective | AFRIQUE | noun
LATINE | adjective | PARIS | noun
MIEUX | adjective | QUESTION | noun
GRANDS | adjective | RÉGIONS | noun
NOUVEAU | adjective | ALLEMAGNE | noun
TROP | adjective | AVRIL | noun
EUROPÉEN | adjective | ESPAGNE | noun
AMÉRICAIN | adjective | RUSSIE | noun
AMÉRICAINE | adjective | CITOYENS | noun
PARTICULIER | adjective | MODÈLE | noun
UNIE | adjective | RAPPORT | noun
AUJOURD'HUI | adverb | RECHERCHE | noun
SEULEMENT | adverb | RÉFÉRENDUM | noun
MAINTENANT | adverb | RÔLE | noun
AUTANT | adverb | SEPTEMBRE | noun
PAYS | noun | DOSSIER | noun
ETATS | noun | MARCHÉS | noun
MONDE | noun | DÉMOCRATIE | noun
AMÉRIQUE | noun | TITRE | noun
TURQUIE | noun | CHAMPIONNE | noun
CONSEIL | noun | FIN | noun
ASIE | noun | JACQUES | noun
COUPE | noun | JEUDI | noun
AVENIR | noun | LUNDI | noun
CHAMPIONNATS | noun | SITUATION | noun
CHAMPIONNAT | noun | COMPTE | noun
CHAMPION | noun | FOOTBALL | noun
CONSTITUTION | noun | TRAVAIL | noun
NATIONS | noun | CHAMPIONS | noun
ANS | noun | GOUVERNEMENT | noun
ECONOMIE | noun | LIEU | noun
PRÉSIDENT | noun | ÉTÉ | verb
PARLEMENT | noun | AVAIT | verb
SÉCURITÉ | noun | CONSTRUIRE | verb
COOPÉRATION | noun | VEUT | verb
PUISSANCE | noun | DIRE | verb
ÉLARGISSEMENT | noun | DEVRAIT | verb
PROJET | noun | POURRAIT | verb
CROISSANCE | noun | VEULENT | verb
MARCHÉ | noun | DÉCLARÉ | verb
DÉBAT | noun | EXPLIQUE | verb

Appendix 3. Comparative table showing memberships of three ‘Europes’

(DOI 10.1473/media71)

[1] WordNet can be consulted online, or its database downloaded, from http://wordnet.princeton.edu/perl/webwn.

[2] www.lexically.net.

[3] For example http://www.cra-normandie.fr/peco/repere3.htm, consulted 19/01/05.

Università degli Studi di Bologna and Gedit Edizioni