Bibliographic coupling, like co-citation, is a similarity measure that uses citation analysis to establish a similarity relationship between documents. Bibliographic coupling occurs when two works reference a common third work in their bibliographies. It indicates that the two works may treat related subject matter.
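A minimal sketch, assuming each work's bibliography is available as a set of reference identifiers (the citation keys below are made up). The coupling strength of two works is the number of references they share.

```python
def coupling_strength(refs_a: set[str], refs_b: set[str]) -> int:
    """Number of references shared by two works (bibliographic coupling strength)."""
    return len(refs_a & refs_b)


# Hypothetical reference lists, identified by made-up citation keys.
paper_x = {"smith2001", "lee2005", "kim2010"}
paper_y = {"lee2005", "kim2010", "chen2012"}

print(coupling_strength(paper_x, paper_y))  # 2 (lee2005 and kim2010)
```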
A similarity measure can take many different forms depending on the type of data being clustered and the specific problem being solved. One of the most commonly used similarity measures is the Euclidean distance, which is used in many clustering techniques including K-means clustering and Hierarchical clustering. The Euclidean distance is a ...
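As a sketch, the Euclidean distance between two feature vectors is the square root of the sum of squared coordinate differences. Strictly it is a distance (0 for identical vectors), so clustering code often uses it directly; when a similarity score is needed, one common conversion, assumed here rather than taken from the text, is 1 / (1 + d).

```python
import math


def euclidean_distance(p: list[float], q: list[float]) -> float:
    """Square root of the summed squared differences between coordinates."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_to_similarity(d: float) -> float:
    """Assumed conversion: distance 0 maps to similarity 1, larger distances approach 0."""
    return 1.0 / (1.0 + d)


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)                          # 5.0
print(distance_to_similarity(d))  # 0.16666666666666666
```

Python's standard library also provides `math.dist` (3.8+), which computes the same distance for two points.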
Like bibliographic coupling, co-citation is a semantic similarity measure for documents that makes use of citation analyses. The figure to the right illustrates the concept of co-citation and a more recent variation of co-citation which accounts for the placement of citations in the full text of documents.
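Two documents are co-cited when a third document cites both. A minimal sketch, using a made-up citing corpus, counts how many citing documents list each pair (the co-citation frequency); it ignores where in the full text the citations appear, so it does not model the placement-aware variant.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing: dict[str, set[str]]) -> Counter:
    """Count, for each pair of cited documents, how many citing documents cite both."""
    counts: Counter = Counter()
    for references in citing.values():
        # Sort so that each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(references), 2):
            counts[(a, b)] += 1
    return counts


# Made-up corpus: citing paper -> set of documents it cites.
corpus = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
}
counts = cocitation_counts(corpus)
print(counts[("A", "B")])  # 2 (cited together by p1 and p2)
print(counts[("A", "C")])  # 1 (cited together only by p1)
```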
Normalized compression distance (NCD) is a way of measuring the similarity between two objects, be it two documents, two letters, two emails, two music scores, two languages, two programs, two pictures, two systems, two genomes, to name a few. Such a measurement should not be application-dependent or arbitrary.
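NCD replaces the uncomputable information distance with a real compressor C: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length and xy is the concatenation. The sketch below uses zlib as the compressor, an arbitrary choice for illustration. Values near 0 indicate similar inputs; values near 1 (occasionally slightly above, since real compressors are imperfect) indicate unrelated ones.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib's compressed output (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance using zlib as the compressor."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


text = b"the quick brown fox jumps over the lazy dog " * 20
edited = text.replace(b"lazy", b"sleepy")
# Effectively incompressible bytes of the same length, from a seeded generator.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(ncd(text, edited))  # small: the inputs share almost all their structure
print(ncd(text, noise))   # close to 1: the inputs share almost nothing
```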
Other variations include the "similarity coefficient" or "index", such as the Dice similarity coefficient (DSC). Common alternate spellings for Sørensen are Sorenson, Soerenson and Sörenson, and all three can also be seen with the –sen ending (the Danish letter ø is phonetically equivalent to the German/Swedish ö, which can be written as oe ...
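The Sørensen-Dice coefficient of two sets X and Y is 2|X ∩ Y| / (|X| + |Y|). A minimal sketch; comparing strings through their character bigrams is one illustrative choice of sets, and returning 1.0 for two empty sets is an assumed convention.

```python
def dice_coefficient(x: set, y: set) -> float:
    """Sørensen-Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    if not x and not y:
        return 1.0  # assumed convention: two empty sets count as identical
    return 2 * len(x & y) / (len(x) + len(y))


def char_bigrams(s: str) -> set[str]:
    """Set of adjacent character pairs, e.g. "abc" -> {"ab", "bc"}."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


# "night" -> {ni, ig, gh, ht}; "nacht" -> {na, ac, ch, ht}; one shared bigram.
print(dice_coefficient(char_bigrams("night"), char_bigrams("nacht")))  # 0.25
```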
Similarity of other domain-specific aspects is important as well; these can, and should, be combined with relational structural-context similarity for an overall similarity measure. For example, for Web pages, SimRank can be combined with traditional textual similarity; the same idea applies to scientific papers or other document ...
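SimRank treats two nodes as similar when their in-neighbors are similar: s(a, a) = 1, and for a ≠ b, s(a, b) = C / (|I(a)| |I(b)|) times the sum of s(i, j) over in-neighbors i of a and j of b (0 if either node has no in-neighbors). The sketch below is the naive fixed-iteration form. The decay constant C = 0.8, the iteration count, and the linear blend with a textual score (weight `alpha`) are assumptions for illustration, not a prescribed way to combine the two signals.

```python
def simrank(in_links: dict[str, list[str]], c: float = 0.8,
            iterations: int = 10) -> dict[tuple[str, str], float]:
    """Naive iterative SimRank. in_links maps every node to the nodes linking to it."""
    nodes = list(in_links)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_links[a], in_links[b]
                if a == b:
                    new_sim[(a, b)] = 1.0
                elif not ia or not ib:
                    new_sim[(a, b)] = 0.0  # no in-neighbors, no structural evidence
                else:
                    total = sum(sim[(i, j)] for i in ia for j in ib)
                    new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def combined_score(structural: float, textual: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of a SimRank score with a textual similarity score."""
    return alpha * structural + (1 - alpha) * textual


# Two pages, p and q, that are both linked from u1 and u2.
graph = {"u1": [], "u2": [], "p": ["u1", "u2"], "q": ["u1", "u2"]}
scores = simrank(graph)
print(scores[("p", "q")])  # 0.4
```

The naive form costs roughly O(iterations × n² × d²) for n nodes with average in-degree d, so it only suits small graphs.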
In statistics, Gower's distance between two mixed-type objects is a similarity measure that can handle different types of data within the same dataset and is particularly useful in cluster analysis and other multivariate statistical techniques. Variables can be binary, ordinal, or continuous.
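A simplified sketch of Gower's distance as the mean of per-variable dissimilarities, each in [0, 1]: numeric variables contribute |x - y| divided by the variable's range, categorical ones contribute 0 on a match and 1 otherwise; the Gower similarity is 1 minus this distance. Simplifications, all assumptions: binary variables are handled as plain categorical matches (Gower's original coefficient can treat asymmetric binary variables differently), ordinal variables would need rank encoding first and are omitted, and weights and missing values are not handled. The records are made up.

```python
def gower_distance(a: tuple, b: tuple, kinds: list[str], ranges: list[float]) -> float:
    """Mean per-variable dissimilarity; each term lies in [0, 1]."""
    total = 0.0
    for k, kind in enumerate(kinds):
        if kind == "numeric":
            # Absolute difference scaled by the variable's range over the dataset.
            total += abs(a[k] - b[k]) / ranges[k] if ranges[k] > 0 else 0.0
        else:
            # Categorical (and, in this sketch, binary): 0 if equal, 1 otherwise.
            total += 0.0 if a[k] == b[k] else 1.0
    return total / len(kinds)


def numeric_ranges(rows: list[tuple], kinds: list[str]) -> list[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for other columns."""
    return [max(r[k] for r in rows) - min(r[k] for r in rows) if kind == "numeric" else 0.0
            for k, kind in enumerate(kinds)]


# Made-up mixed-type records: (age, smoker, eye colour).
rows = [(25, "yes", "brown"), (35, "no", "brown"), (45, "yes", "blue")]
kinds = ["numeric", "categorical", "categorical"]
ranges = numeric_ranges(rows, kinds)
print(gower_distance(rows[0], rows[1], kinds, ranges))  # 0.5
```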
Computationally, semantic similarity can be estimated by defining a topological similarity, by using ontologies to define the distance between terms/concepts. For example, a naive metric for the comparison of concepts ordered in a partially ordered set and represented as nodes of a directed acyclic graph (e.g., a taxonomy), would be the ...
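The snippet is cut off before it names the naive metric. As an assumption for illustration only, the sketch below measures the number of edges on the shortest path between two concept nodes, treating the taxonomy's is-a links as undirected; a shorter path means more similar concepts. The taxonomy is made up.

```python
from collections import deque
from typing import Optional


def path_length(edges: list[tuple[str, str]], a: str, b: str) -> Optional[int]:
    """Shortest number of is-a edges between two concepts, ignoring edge direction."""
    neighbours: dict[str, set[str]] = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search: the first time b is reached is the shortest path
        node, dist = queue.popleft()
        for nxt in neighbours.get(node, set()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connecting path


# Made-up taxonomy as (parent, child) edges.
taxonomy = [("animal", "mammal"), ("animal", "bird"),
            ("mammal", "dog"), ("mammal", "cat"), ("bird", "sparrow")]
print(path_length(taxonomy, "dog", "cat"))      # 2
print(path_length(taxonomy, "dog", "sparrow"))  # 4
```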