
Who is Karen Spärck Jones?

A scientist and researcher in computer science, she focused her work on natural language processing and information retrieval. She laid the foundations of the TF-IDF methodology, a weighted relevance measure that is still used, in one form or another, in most search engines today. Please meet Karen Spärck Jones.

Born on 26 August 1935 in Huddersfield in the UK, Karen Spärck Jones joined Girton College, one of the colleges of Cambridge University, in 1953. There, she studied history and then philosophy in the moral sciences department. She also met Margaret Masterman, head of the Cambridge Language Research Unit, who inspired her to work in this field.

She began working for Margaret Masterman with the aim of programming a computer to handle words with multiple meanings, which led her to build a machine-readable thesaurus. Her 1964 thesis, "Synonymy and Semantic Classification", is considered a foundational document in the field of natural language processing.

In 1972, Karen Spärck Jones published an article in the Journal of Documentation, "A statistical interpretation of term specificity and its application in retrieval", in which she laid the groundwork for search engines by combining statistics and linguistics. The article proposes a weighted relevance measure: each word in a text is given a weight, so that a computer can better determine what the text is about. This model is still used in most search engines today, with many refinements, under the name TF-IDF.

In information retrieval, once the documents that could answer a query have been identified, they must be sorted by relevance. TF-IDF makes it possible to describe each document as a vector of term weights (the vector model), which can then be compared with the query.

TF stands for "Term Frequency". It measures how often a word occurs in a document.

IDF stands for "Inverse Document Frequency". It measures how rare a word is across the whole collection of documents, based on the idea that a word found in only a few documents tends to be more informative than one found in nearly all of them.

By multiplying the TF factor by the IDF factor, we link the "physical" presence of a word in a text with the weight of its "general" importance. As such, TF-IDF makes it possible to assess how relevant a specific keyword is to a text.
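To make the idea concrete, here is a minimal sketch in Python of one common way to compute TF-IDF weights. It is only an illustration under stated assumptions: TF is taken as the relative frequency of a word in a document, IDF as the natural logarithm of the number of documents divided by the number of documents containing the word, and the three-sentence corpus and the function name `tf_idf` are invented for the example. Many other variants exist, and real search engines use more elaborate formulas.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document of a tokenised corpus.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in corpus for term in set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(doc)              # share of the document taken by the term
            idf = math.log(n_docs / df[term])  # 0 when the term occurs in every document
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]
weights = tf_idf(corpus)

# Print the terms of the first document, heaviest first.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

In this toy corpus, "the" appears in every document, so its IDF (and therefore its weight) is zero, however often it is repeated. "mat" appears in a single document and ends up with a higher weight than "cat", which appears in two.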

For SEO purposes, TF-IDF is useful for optimising content. It helps you aim for a better ranking in the SERPs for certain words, although more effective approaches that take co-occurrence or context vectors into account are now generally preferred.

From 1974 onwards, Karen Spärck Jones worked in the Computer Laboratory at Cambridge University. In the 1980s, she started working on speech recognition systems. In 1982, the British government asked her to work on the Alvey Programme, an initiative aiming to promote computer science research in the country. In 1994, she became President of the Association for Computational Linguistics, an international group of computing professionals. In 1999, she became Professor of Computers and Information at Cambridge University, before retiring in 2002.

As a natural language processing specialist and an advocate for women in the industry, Karen Spärck Jones was ahead of the curve on many topics. She received numerous distinctions: the ACM SIGIR Gerard Salton Award in 1988, the ACL Lifetime Achievement Award in 2004, the ACM/AAAI Allen Newell Award in 2006 and the BCS Lovelace Medal. Microsoft and the British Computer Society present the BCS IRSG Karen Spärck Jones Award for advances in information retrieval and natural language processing.

Her ideas, which were not valued at first, are now being implemented and continue to inspire. She also mentored a generation of researchers, both men and women, and coined the slogan "Computing is too important to be left to men".

Translated by Nicolas Piquero.
Nicolas Piquero is a seasoned SEO with six years' experience. He has worked in-house, in agencies and as a freelance SEO, with various major European companies. As a power user, he was an obvious choice as an ambassador for Babbar's tools. Based in London, he is the first Babbar ambassador for the UK.
