Tuesday, June 20, 2006

patents assigned to the national security agency

the following patents are all issued to the national security agency, and describe one proprietary method the nsa may be using to analyze the network traffic it intercepts by tapping into data streams at telecommunications companies such as at&t:


United States Patent 6704449
Method of extracting text from graphical images
US Patent Issued on March 9, 2004

The present invention relates to a method for extracting text from an image of the kind typically displayed on web pages of the World Wide Web. Character recognition is judged successful by comparing the extracted text to a lexicon of legitimate words, by computing the likelihood of the sequence of characters (e.g. "Qxv" is highly unlikely in English, whereas "com" is relatively common), or by using character recognition software that provides a confidence measure for each character based upon how well its pixels matched the nominal template or features.
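The three checks named in this excerpt can be sketched roughly as follows. This is only an illustration, not the patented method: the toy lexicon, the bigram model with add-one smoothing, the thresholds, and every function name are assumptions made for the example.

```python
import math

# Hypothetical toy lexicon; a real system would load a far larger word list.
LEXICON = {"com", "www", "news", "home", "search"}


def in_lexicon(text, lexicon=LEXICON):
    """Check 1: is the extracted string a known word (case-insensitive)?"""
    return text.lower() in lexicon


def train_bigrams(corpus_words):
    """Count adjacent character pairs in a list of sample words."""
    counts = {}
    for word in corpus_words:
        w = word.lower()
        for a, b in zip(w, w[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def avg_log_likelihood(text, counts, alphabet_size=26):
    """Check 2: mean log P(next char | char), with add-one smoothing."""
    t = text.lower()
    pairs = list(zip(t, t[1:]))
    if not pairs:
        return 0.0
    # Total number of observed pairs that start with each character.
    totals = {}
    for (a, _), n in counts.items():
        totals[a] = totals.get(a, 0) + n
    score = 0.0
    for a, b in pairs:
        p = (counts.get((a, b), 0) + 1) / (totals.get(a, 0) + alphabet_size)
        score += math.log(p)
    return score / len(pairs)


def mean_confidence(char_confidences):
    """Check 3: average of per-character confidences reported by the OCR step."""
    if not char_confidences:
        return 0.0
    return sum(char_confidences) / len(char_confidences)


def accept(text, counts, char_confidences, min_ll=-3.0, min_conf=0.8):
    """Accept the extracted text if any one of the three checks passes."""
    return (
        in_lexicon(text)
        or avg_log_likelihood(text, counts) >= min_ll
        or mean_confidence(char_confidences) >= min_conf
    )
```

For instance, with bigram counts trained on a handful of English words, "com" scores a higher average log-likelihood than "Qxv", matching the example in the excerpt.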


United States Patent 7020338
Method of identifying script of line of text
US Patent Issued on March 28, 2006

Script identification is a useful preprocessing step in automatic document recognition. Most optical character recognition (OCR) devices are trained to recognize a limited set of scripts. If an OCR device were presented with a document that includes text printed in a script the device was not trained to recognize, it would not be able to process the document correctly. So there is a need for a method of identifying each script in which a document is printed, so that an OCR device trained on all of those scripts can be identified and used to process the document.
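Once the scripts in a document are known, choosing a suitable OCR device reduces to a set-coverage check. The sketch below assumes the script identification has already been done by some other step; the engine names, the script labels, and the tie-breaking rule (prefer the engine trained on the fewest scripts) are hypothetical and do not come from the patent.

```python
# Hypothetical registry mapping OCR engine names to the scripts each was trained on.
ENGINES = {
    "latin_only": {"Latin"},
    "latin_cyrillic": {"Latin", "Cyrillic"},
    "multi": {"Latin", "Cyrillic", "Arabic", "Han"},
}


def select_ocr_engine(document_scripts, engines=ENGINES):
    """Return an engine trained on every script found in the document, or None.

    Among the engines that cover all scripts, the one trained on the fewest
    scripts is chosen, on the assumption that a narrower engine is preferable.
    """
    needed = set(document_scripts)
    candidates = [name for name, trained in engines.items() if needed <= trained]
    if not candidates:
        return None
    return min(candidates, key=lambda name: len(engines[name]))
```

A document containing Latin and Cyrillic text would be routed to "latin_cyrillic" here, while one containing a script that no engine covers yields None and would need separate handling.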


United States Patent 6904564
Method of summarizing text using just the text
US Patent Issued on June 7, 2005

Prior art methods of processing text typically incorporate linguistic knowledge that is not resident in the text (e.g., document) being processed. Prior art text summarization methods often rely on a collection of exemplary text that is external to the text being processed to assess the role a word plays in the text being processed. For those methods that rely on a collection of exemplary text, it is difficult, if not impossible, to generate a single collection of exemplary text that can be used to successfully summarize textual documents on widely different topics, because a word in one context may have a different meaning in another context. This problem is often overcome in the prior art by generating multiple collections of exemplary text, where each collection is tailored to a specific topic (e.g., scientific, financial). Generating a collection of exemplary text is difficult, time-consuming, and prone to error (e.g., biases of those generating the collection).


United States Patent 6990634
Method of summarizing text by sentence extraction
US Patent Issued on January 24, 2006

The field of automatically summarizing text consisting of a collection of sentences has been studied for over forty years. However, automatic text summarization has received greater attention recently because of its pervasive use in present information retrieval systems. One type of text summarization method consists of extracting a number of sentences from the text that convey the essential points of the text. The number of sentences extracted from a text may be few, to present only enough information to allow a user to determine whether or not to read the entire text, or many, to act as a substitute for the text.
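Sentence extraction in general can be illustrated with a simple word-frequency heuristic. This is a generic textbook-style sketch, not the method claimed in the patent: the scoring rule, the naive sentence splitter, and the function names are all assumptions. The `num_sentences` parameter corresponds to the choice between a few sentences and many.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences=1):
    """Return the top-scoring sentences, kept in their original order.

    Each sentence is scored by the average frequency, within this same text,
    of the words it contains.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize(text, 1)` gives a one-sentence teaser, while a larger `num_sentences` approaches a stand-in for the full text. Asking for more sentences than exist simply returns them all.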