Corpora and Their Basic Operations
Yang Linwei
Research Center for Foreign Language Educational Technology, Yantai University

Outline
The concept of a corpus and a brief history of its development
Building a small corpus of your own
Corpus tools and software
Teaching practice and applications

1 The concept of a corpus and a brief history of its development

Definitions of a corpus
"A corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research." (Sinclair, 1991)
"a collection of sampled texts, written or spoken, in machine-readable form which may be annotated with various forms of linguistic information." (McEnery et al. 2006)
"a large collection of well-sampled and processed electronic texts, on which language studies, theoretical or applied, can be conducted with the aid of computer tools." (BFSU CRG members)

History
1959: SEU (Survey of English Usage), the first attempt to provide an ongoing collection of present-day English … was a precursor of later corpora such as the British National Corpus and the American National Corpus.
1961: The Brown Corpus, compiled at Brown University, was the first computer-readable general corpus of texts prepared for linguistic research on modern English.
1970s: The Lancaster-Oslo/Bergen Corpus (LOB Corpus) was compiled to provide a British counterpart to the Brown Corpus.
1975: The London-Lund Corpus (LLC) was the computerised spoken part of SEU, used as the basis for the famous Comprehensive Grammar (Quirk et al. 1985).
(The corpora above are at the million-word scale.)

1980s: COBUILD (Collins Birmingham University International Language Database). In 1991, the success of COBUILD led to the development of a large monitor corpus, the Bank of English.
1980s: Longman/Lancaster Corpus. As part of the Longman Corpus Network, the Longman/Lancaster Corpus is not available for public access.
(The COBUILD and Longman/Lancaster corpora above are at the ten-million-word scale.)

1980s to early 1990s: BNC (British National Corpus), 100 million words.
1990s: COCA (Corpus of Contemporary American English), 450 million words.
(These two corpora are at the hundred-million-word scale.)

Hot topic: learner corpora
Late 1990s to 2002: ICLE (The International Corpus of Learner English)
Late 1990s: CLEC (Chinese Learner English Corpus)
HKUST Learner Corpus
See more corpora: http://www.lancaster.ac.uk/fass/projects/corpus/cbls/corpora.asp

Hot topic: bilingual corpora
The BFSU (Beijing Foreign Studies University) Chinese-English Parallel Corpus contains 30 million words. It is presently the largest parallel corpus of English and Chinese. The corpus is composed of four subcorpora: the Balanced Corpus, the Translation Corpus, the Bilingual Sentences Corpus and the Corpus for Specific Purpose.

Hot topic: web corpora
WaC, WfC, Wa/fC

2 Corpus tools and software

Concordancing tools and software
WordSmith Tools
MonoConc/ParaConc
AntConc: freeware, copyleft
Xaira: BNC
CQPWeb: Sketch Engine, BFSU CQPWeb
WebCorp

KWIC
Wordlist and Collocation
N-gram
Practice 1

Corpus annotation tools
Stanford POS tagger
TreeTagger
CLAWS5

Sample tagger output (Stanford POS tagger, TreeTagger):
Can_MD you_PP can_MD a_DT can_NN as_IN a_DT canner_NN can_MD can_MD a_DT can_NN ?_SENT
Can/MD you/PRP can/MD a/DT can/MD as/IN a/DT canner/NN can/MD can/MD a/DT can/MD ?/.
Tagging accuracy, in the same order: 11/13 = 84.6% and 9/13 = 69.2%
Practice 2

Corpus text-processing tools
EditPad Pro
PowerGREP
RegexBuddy

Regex (regular expression)
wordless
\ba\w*\b
\d+
\b\w{6}\b

Practice 3
Remove the tags
Remove the words
Collect all the sentences of the structure: