LLM and Cosine Distance for .NET Developers

Introduction: When working with text data, or any data represented as vectors (such as the embeddings produced by large language models), measuring the similarity between vectors is crucial for applications such as recommendation systems, information retrieval, and natural language processing. One popular similarity metric is cosine similarity, which provides a way to determine how closely related two vectors are. In this blog post, I will describe the concept of cosine similarity and provide an implementation in C#.

What is Cosine Similarity? Cosine similarity is a measure used to determine the similarity between two vectors. It calculates the cosine of the angle between the two vectors, hence the name. The resulting value ranges from -1 to 1, where 1 indicates vectors pointing in the same direction, 0 indicates orthogonal (unrelated) vectors, and -1 indicates completely opposite vectors.

How does Cosine Similarity work? To understand how cosine similarity works, consider two vectors, A and B, in a multi-dimensional space, where each dimension represents a feature or attribute. Cosine similarity calculates the cosine of the angle between the two vectors, which can be interpreted as a measure of their alignment. The formula for cosine similarity is as follows:

cosine_similarity = (A dot B) / (||A|| * ||B||)

Here, "dot" denotes the dot product of vectors A and B, and ||A|| and ||B|| denote the magnitudes (Euclidean lengths) of vectors A and B, respectively.
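As a quick worked example, take two made-up three-dimensional vectors:

    A = (1, 2, 2)    B = (2, 1, 2)
    A dot B = 1*2 + 2*1 + 2*2 = 8
    ||A|| = sqrt(1^2 + 2^2 + 2^2) = sqrt(9) = 3
    ||B|| = sqrt(2^2 + 1^2 + 2^2) = sqrt(9) = 3
    cosine_similarity = 8 / (3 * 3) = 8/9, approximately 0.889

A value close to 1 tells us these two vectors point in nearly the same direction.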
Following is the code in C# that calculates the cosine similarity (and optionally the cosine distance). Note that it throws on mismatched lengths rather than returning 0, since 0 would be indistinguishable from "orthogonal vectors":

public class CosineDistanceCalculator : ISimilarityCalculator
{
    public double CalculateSimilarity(double[] embedding1, double[] embedding2)
    {
        if (embedding1.Length != embedding2.Length)
        {
            throw new ArgumentException("Embeddings must have the same length.");
        }

        double dotProduct = 0.0;
        double magnitude1 = 0.0;
        double magnitude2 = 0.0;

        for (int i = 0; i < embedding1.Length; i++)
        {
            dotProduct += embedding1[i] * embedding2[i];
            magnitude1 += embedding1[i] * embedding1[i];
            magnitude2 += embedding2[i] * embedding2[i];
        }

        magnitude1 = Math.Sqrt(magnitude1);
        magnitude2 = Math.Sqrt(magnitude2);

        if (magnitude1 == 0.0 || magnitude2 == 0.0)
        {
            throw new ArgumentException("Embeddings must not have zero magnitude.");
        }

        double cosineSimilarity = dotProduct / (magnitude1 * magnitude2);
        return cosineSimilarity;

        // Uncomment this if you need the cosine distance instead of the similarity:
        //double cosineDistance = 1 - cosineSimilarity;
        //return cosineDistance;
    }
}

Applications of Cosine Similarity:

Text Document Comparison: Cosine similarity is widely used in text mining and natural language processing to compare and rank documents based on their similarity. It can be used to build search engines, plagiarism detectors, and document clustering algorithms.

Recommendation Systems: Cosine similarity is leveraged in collaborative filtering algorithms to recommend items to users based on their similarity to other users or items. It helps identify similar user preferences or item characteristics to make personalized recommendations.

Image and Audio Processing: Cosine similarity can be applied to image and audio feature vectors to measure similarity between images, music tracks, or audio clips. It has applications in content-based image retrieval and audio fingerprinting.

Conclusion: Cosine similarity is a powerful tool for measuring the similarity between vectors.
It has diverse applications in various domains, including text analysis, recommendation systems, and multimedia processing. By understanding cosine similarity, you can leverage it to solve problems involving vector comparison and similarity assessment, enabling you to build more efficient and accurate data-driven solutions. Remember, cosine similarity is just one of many similarity metrics, and its suitability depends on the specific use case. As you dive deeper into the world of vector comparison, explore other similarity measures to find the most appropriate one for your needs.
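For readers who want to try the computation end to end, here is a minimal, self-contained console sketch. It is a compact LINQ variant of the CosineDistanceCalculator shown above (the class name CosineDemo and the sample three-dimensional vectors are made up for illustration):

```csharp
using System;
using System.Linq;

public static class CosineDemo
{
    // Same computation as the calculator class above, collapsed into one method.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        // Pairwise products summed = dot product.
        double dot = a.Zip(b, (x, y) => x * y).Sum();

        // Euclidean lengths of each vector.
        double magA = Math.Sqrt(a.Sum(x => x * x));
        double magB = Math.Sqrt(b.Sum(x => x * x));

        if (magA == 0.0 || magB == 0.0)
            throw new ArgumentException("Vectors must not have zero magnitude.");

        return dot / (magA * magB);
    }

    public static void Main()
    {
        double[] a = { 1.0, 2.0, 2.0 };
        double[] b = { 2.0, 1.0, 2.0 };

        // 8 / (3 * 3) = 8/9, approximately 0.8889
        Console.WriteLine(CosineSimilarity(a, b).ToString("F4"));
    }
}
```

The LINQ form trades a small amount of performance for readability; for large embedding arrays in a hot path, the explicit loop in the class above avoids the iterator overhead.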
