LINES OF THE DAY

". . . But the past does not exist independently from the present. Indeed, the past is only past because there is a present, just as I can point to something over there only because I am here. But nothing is inherently over there or here. In that sense, the past has no content. The past -- or more accurately, pastness -- is a position. Thus, in no way can we identify the past as past." p. 15

". . . But we may want to keep in mind that deeds and words are not as distinguishable as often we presume. History does not belong only to its narrators, professional or amateur. While some of us debate what history is or was, others take it into their own hands." p. 153

Silencing the Past: Power and the Production of History (1995) by Michel-Rolph Trouillot

Saturday, March 5, 2011

I Still Want To Know ....

Who scans all these docs so the computer can data-mine them? For that's what this is: data mining, i.e., pattern recognition -- not analysis.

Armies of Expensive Lawyers, Replaced by Cheaper Software

“There is no reason to think that technology creates unemployment,” Professor Autor said. “Over the long run we find things for people to do. The harder question is, does changing technology always lead to better jobs? The answer is no.”
Automation of higher-level jobs is accelerating because of progress in computer science and linguistics. Only recently have researchers been able to test and refine algorithms on vast data samples, including a huge trove of e-mail from the Enron Corporation.
“The economic impact will be huge,” said Tom Mitchell, chairman of the machine learning department at Carnegie Mellon University in Pittsburgh. “We’re at the beginning of a 10-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”
Nowhere are these advances clearer than in the legal world.

E-discovery technologies generally fall into two broad categories that can be described as “linguistic” and “sociological.”

The most basic linguistic approach uses specific search words to find and sort relevant documents. More advanced programs filter documents through a large web of word and phrase definitions. A user who types “dog” will also find documents that mention “man’s best friend” and even the notion of a “walk.”
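To make the "web of word and phrase definitions" idea concrete, here is a minimal Python sketch. It is not how any real e-discovery product works; the concept map, the sample memos, and the function names are all made up for illustration. A search term is expanded into its related phrases, and a document matches if it contains any of them.

```python
import re

# Hypothetical concept map (an assumption for illustration): each search
# term expands to the phrases the article says a "dog" query would catch.
CONCEPTS = {
    "dog": ["dog", "man's best friend", "walk"],
}

# Made-up sample documents, keyed by a made-up document name.
DOCS = {
    "memo1": "Took man's best friend out before the meeting.",
    "memo2": "Quarterly earnings were flat.",
    "memo3": "Going for a walk at noon.",
    "memo4": "The dog ate the report.",
}


def expand(term):
    """Return the related phrases for a term, or just the term itself."""
    return CONCEPTS.get(term.lower(), [term.lower()])


def search(term, documents):
    """Return names of documents containing the term or a related phrase."""
    patterns = [
        # \b keeps "dog" from matching inside a longer word such as "dogma".
        re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in expand(term)
    ]
    return [
        name
        for name, text in documents.items()
        if any(p.search(text) for p in patterns)
    ]


if __name__ == "__main__":
    print(search("dog", DOCS))
```

Note that this is still pattern matching: the program finds "walk" only because someone put "walk" in the map under "dog".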

The sociological approach adds an inferential layer of analysis, mimicking the deductive powers of a human Sherlock Holmes. Engineers and linguists at Cataphora, an information-sifting company based in Silicon Valley, have their software mine documents for the activities and interactions of people — who did what when, and who talks to whom. The software seeks to visualize chains of events. It identifies discussions that might have taken place across e-mail, instant messages and telephone calls.
Then the computer pounces, so to speak, capturing “digital anomalies” that white-collar criminals often create in trying to hide their activities.
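The "who talks to whom" part can be sketched in a few lines of Python as well. This is only a toy under stated assumptions, not Cataphora's method: the message records, names, and functions below are invented. It counts how often each pair of people communicates across channels and lines up one person's messages in time order.

```python
from collections import Counter

# Made-up message metadata: (ISO timestamp, sender, recipient, channel).
MESSAGES = [
    ("2001-03-01T09:00", "alice", "bob", "email"),
    ("2001-03-01T09:05", "bob", "alice", "im"),
    ("2001-03-01T09:30", "alice", "carol", "email"),
    ("2001-03-01T10:00", "alice", "bob", "phone"),
]


def who_talks_to_whom(messages):
    """Count messages per pair of people, ignoring direction and channel."""
    pairs = Counter()
    for _, sender, recipient, _ in messages:
        pairs[frozenset((sender, recipient))] += 1
    return pairs


def timeline(messages, person):
    """Return every message involving a person, ordered by timestamp."""
    # ISO timestamps sort correctly as plain strings.
    return sorted(m for m in messages if person in (m[1], m[2]))


if __name__ == "__main__":
    for pair, count in who_talks_to_whom(MESSAGES).most_common():
        print(sorted(pair), count)
    for message in timeline(MESSAGES, "bob"):
        print(message)
```

Counting and sorting like this surfaces patterns; deciding what a pattern means is a separate step.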
And so on. It's very interesting. But it doesn't answer the question I asked at the top.

1 comment:

Foxessa said...

Ah, I found out the answer to my question via a friend who turns out to be one of those who has created the programs that do this work.

There are whole agencies that pay their workers less than data-entry clerks got paid to do this. The law firms contract with them, and CDs are then created.

It was a lot more fun and paid better to be part of creating the program(s) that do this.

Love, C.