
Who is Gerard Salton?

Born Gerhard Anton Sahlmann on 8th March 1927 in Nuremberg, Gerard Salton, who died on 29th August 1995, was a scientist, professor and researcher in computer science. He is considered a pioneer of search technology and the « father » of modern information retrieval. He is credited with the vector space model for information retrieval and with the development of the SMART Information Retrieval System… Let us introduce Gerard Salton.

Forced to flee during World War II, he reached the United States in 1947. He earned a bachelor's degree in mathematics in 1950 and a master's degree in 1952, both at Brooklyn College. In 1958, he received a PhD in applied mathematics from Harvard University, where he taught until 1965. There, he led the group that created SMART (System for the Mechanical Analysis and Retrieval of Text), an information retrieval system.

He then joined Cornell University, where he co-founded the computer science department and taught for the rest of his life. A member of the Association for Computing Machinery (ACM), he served as an editor of both Communications of the ACM and the Journal of the ACM, and sat on the ACM board for seven years.

A founder of the discipline known as « information retrieval », Gerard Salton invented and structured much of what search engines would later use. He devised many of its core algorithms, including relevance measures such as Salton's Cosine.

Salton was one of the first to develop the vector space model for information retrieval, which is still widely used today. In this model, documents and queries are represented as vectors, with one dimension per term, and the similarity between a document and a query is given by the cosine of the angle between the query vector and the document vector.
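Written out in the usual notation (the component names q_i and d_i below are our own labels for the weight of term i in the query and in the document, over a vocabulary of n terms), the cosine of the angle θ between a query vector Q and a document vector D is:

```latex
\cos\theta
  = \frac{\mathbf{Q}\cdot\mathbf{D}}{\lVert\mathbf{Q}\rVert\,\lVert\mathbf{D}\rVert}
  = \frac{\sum_{i=1}^{n} q_i\, d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}}\;\sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

With non-negative term weights, this value lies between 0 (no terms in common) and 1 (vectors pointing in exactly the same direction).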

The narrower the angle, the more closely the vectors are aligned and the more similar the texts are. In particular, the smaller the angle between the query vector (Q) and the document vector (D), the more relevant the document is to the query.

To compare Web pages and identify those that best match a user's query, search engines use various mechanisms, including Salton's Cosine.
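As a minimal sketch of this kind of ranking, the Python below scores a few pages against a query with the cosine measure and sorts them from most to least relevant. It assumes raw term counts as vector weights and simple lowercase, whitespace tokenization; real search engines use far richer weighting and preprocessing. The function names `term_vector`, `cosine` and `rank`, as well as the sample pages, are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words vector: raw term counts after lowercasing and splitting on whitespace."""
    return Counter(text.lower().split())


def cosine(q: Counter, d: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(weight * d[term] for term, weight in q.items())
    norm_q_sq = sum(w * w for w in q.values())
    norm_d_sq = sum(w * w for w in d.values())
    if norm_q_sq == 0 or norm_d_sq == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / math.sqrt(norm_q_sq * norm_d_sq)


def rank(query: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Score every page against the query and sort by decreasing cosine."""
    q = term_vector(query)
    scores = {name: cosine(q, term_vector(text)) for name, text in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pages = {
        "page_a": "vector space model for information retrieval",
        "page_b": "history of the computer science department",
        "page_c": "information retrieval with the cosine measure",
    }
    for name, score in rank("information retrieval cosine", pages):
        print(f"{name}: {score:.3f}")
```

Here page_c contains all three query terms and ranks first, page_a contains two of them, and page_b shares none, so its cosine is 0.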

Salton's Cosine also plays a key role in SEO. By measuring how similar two pieces of content are, it makes it possible to detect « duplicate content »: two Web pages with identical content have vectors pointing in the same direction, so the angle between them is 0 and its cosine equals 1, while near-identical pages score close to 1.
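As a hedged illustration of duplicate detection, the sketch below compares two texts through their raw term-count vectors and flags them as duplicates when the cosine reaches a threshold. The 0.9 cut-off, the helper names and the sample texts are all hypothetical, not the settings of any particular search engine or SEO tool.

```python
import math
from collections import Counter

# Hypothetical cut-off: pairs scoring at or above this are flagged as duplicates.
DUPLICATE_THRESHOLD = 0.9


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine between the raw term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm_a_sq = sum(c * c for c in a.values())
    norm_b_sq = sum(c * c for c in b.values())
    if norm_a_sq == 0 or norm_b_sq == 0:
        return 0.0
    return dot / math.sqrt(norm_a_sq * norm_b_sq)


def is_duplicate(text_a: str, text_b: str, threshold: float = DUPLICATE_THRESHOLD) -> bool:
    """Flag two texts as duplicates when their cosine reaches the threshold."""
    return cosine_similarity(text_a, text_b) >= threshold


original = "salton invented the vector space model"
copy = "Salton invented the vector space model"   # same words, different case
repeated = original + " " + original              # every term count doubled
rewritten = "the vector space model was proposed by salton"

if __name__ == "__main__":
    for label, other in [("copy", copy), ("repeated", repeated), ("rewritten", rewritten)]:
        score = cosine_similarity(original, other)
        print(f"{label}: cosine={score:.3f} duplicate={is_duplicate(original, other)}")
```

Repeating a page doubles every count, yet its vector still points in the same direction as the original, so the cosine stays at 1. The reworded sentence shares five terms with the original and scores about 0.72, below the assumed threshold.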

He wrote at least 150 research papers, many of them on information retrieval. He received numerous honours, the most prestigious being a Guggenheim Fellowship in 1962, the ASIS Award for Best Information Science Paper in 1970, the ASIS Best Information Science Book Award in 1975, and the ASIS Award of Merit in 1989. He was also the first recipient of the SIGIR Award for outstanding contributions to the study of information retrieval. This award is now named… the Gerard Salton Award.

Beyond these scientific contributions, Gerard Salton is remembered as an inspiring figure whose ideas shaped the researchers who came after him. Among them: Karen Spärck Jones, who introduced inverse document frequency, the IDF in TF-IDF, a relevance measure that assigns a weight to each word of a text to capture what the text is about; Stephen Robertson, who built on Spärck Jones's work to introduce a more advanced term-weighting model; and finally Amit Singhal, Salton's own PhD student, who, soon after joining Google, rewrote half of the search engine as an extension of Salton's ideas.
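For readers curious about how TF-IDF weighting works, here is a minimal sketch of one classic variant: the weight of a term in a document is its raw count multiplied by log(N / df), where N is the number of documents and df the number of documents containing the term. Many other variants exist; this formulation, the function name `tf_idf` and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Weight each term by its count in the document times log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency as a raw count
        weights.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return weights


docs = [
    "salton built the smart system",
    "the vector space model ranks the documents",
    "the cosine compares the query and the documents",
]

if __name__ == "__main__":
    for i, doc_weights in enumerate(tf_idf(docs)):
        top = max(doc_weights, key=doc_weights.get)
        print(i, top, round(doc_weights[top], 3))
```

Because « the » appears in every document, its IDF is log(1) = 0 and it gets no weight at all, whereas a term found in a single document, such as « salton », receives the highest weight, log 3 ≈ 1.099.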

Selected works by Gerard Salton:

Salton, Gerard and Michael J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983

Translated by N. PIQUERO.

Nicolas Piquero is a seasoned SEO with six years' experience. He has worked as an in-house, agency and freelance SEO for various major European companies. As a power user of babbar's tools, he was a natural fit as their ambassador: based in London, he is babbar's first ambassador for the UK.