Tuesday, 3 April 2012

Why full-text search is not a cure-all

I hadn't heard or read it for a while, but this week it came up again: "Why do we need metadata and classification when we have smart search engines?"
Well, here is why. A 1985 study by Blair & Maron found that lawyers using a search engine on a collection of 40,000 documents retrieved only about 20% of the relevant documents. For convenience I quote that study second-hand here (pdf, p.24). The case concerned an accident at a metro station.
In the legal case in question, one concern of the lawyers was an accident that had occurred and was an object of litigation. The lawyers wanted all the reports, correspondence, memoranda, and minutes of meetings that discussed this accident. Formal queries were constructed that contained the word ‘accident’ along with the names of the [city] where it occurred. In the search for unretrieved relevant documents, the experimenters later found that the accident was not always referred to as an ‘accident,’ but as an ‘event,’ ‘incident,’ ‘situation,’ ‘problem,’ or ‘difficulty,’ often without mentioning the relevant proper name – the name of the city in which it occurred. The manner in which an individual referred to the accident was frequently dependent on his or her point of view. Those who discussed the event in a critical or accusatory way referred to it quite directly – as an ‘accident.’ Those who were personally involved in the event, and perhaps culpable, tended to refer to it euphemistically as, inter alia, an ‘unfortunate situation,’ or a ‘difficulty.’ Sometimes the accident was referred to obliquely as ‘the subject of your last letter,’ ‘what happened last week was...,’ or, as in the opening lines of the minutes of a meeting discussing the issue, ‘Mr. A: We all know why we’re here....’ [the words ‘accident’ and the name of the city were not used at any time in the meeting either]. Sometimes relevant documents dealt with the problem by mentioning only the technical aspects of why the accident occurred, but neither the accident itself no[r] the people or place involved. Finally, much relevant information discussed [contributing factors in] the situation prior to the accident and, naturally, contained no reference to the accident itself.
And that is before we even consider typos or poor OCR!
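
To make the vocabulary problem from the quotation concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the five document texts are invented, "Springfield" is a placeholder for the unnamed city, and `boolean_and_search` is a naive stand-in for a full-text engine, not the system used in the study. It runs a query for 'accident' plus the city name against documents that are all relevant, then computes recall (the share of relevant documents actually retrieved).

```python
# Toy collection: every document is relevant to the same accident,
# but only some use the literal query terms. All texts are invented.
documents = {
    1: "Report on the accident at the Springfield station.",
    2: "Memo regarding the unfortunate situation last week.",
    3: "Minutes: We all know why we're here.",
    4: "Technical note on the brake failure and signal fault.",
    5: "Letter about the accident in Springfield and its causes.",
}
relevant_ids = {1, 2, 3, 4, 5}


def boolean_and_search(docs, terms):
    """Return ids of documents that contain every query term (case-insensitive)."""
    hits = set()
    for doc_id, text in docs.items():
        lowered = text.lower()
        if all(term.lower() in lowered for term in terms):
            hits.add(doc_id)
    return hits


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant)


retrieved = boolean_and_search(documents, ["accident", "Springfield"])
print(sorted(retrieved))                 # [1, 5]
print(recall(retrieved, relevant_ids))   # 0.4
```

The euphemistic memo, the meeting minutes and the technical note never match, however clever the matching of the literal words is. Only descriptive metadata, such as a subject heading assigned to all five documents, would bring them together.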

Related
How much is a lot


Image: The Droids we're googling for by Stéfan
