Evaluation using metrics

Posted: Wed Jan 22, 2025 4:45 am
by Maksudasm
When the model outputs numbers, in particular predicted ratings or match probabilities, the effectiveness of a recommender system can be assessed in the standard way: by measuring prediction error. For example, the mean squared error (MSE) works well. Some of the interactions are used to train the model, and the rest are held out for testing.
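A minimal sketch of this setup, assuming interactions are (user, item, rating) triples and that a random 80/20 split is acceptable (the split ratio, seed, and the function names below are hypothetical, not from the post):

```python
import random

def train_test_split(interactions, test_fraction=0.2, seed=42):
    """Shuffle the interactions and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted ratings."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical held-out ratings and the model's predictions for them.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
print(mean_squared_error(actual, predicted))  # 0.375
```

Lower MSE means the predicted ratings are closer to what users actually gave on interactions the model never saw during training.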

For a model that outputs numeric values, the predictions can be converted to binary with a standard thresholding method: results above the threshold are treated as positive, and results below it as negative. Since the data on the user's past interactions with items is binary, precision and recall can then be measured on the interactions that were not used for training.
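A sketch of the thresholding step followed by precision and recall, assuming a hypothetical threshold of 3.5 on a rating scale; scores exactly equal to the threshold are treated as negative here, since the post only says "greater than" is positive:

```python
def binarize(scores, threshold):
    """Scores strictly above the threshold become positive (1), all others negative (0)."""
    return [1 if s > threshold else 0 for s in scores]

def precision_recall(y_true, y_pred):
    """Precision and recall of binary predictions against binary ground truth."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predicted scores and held-out interactions (1 = user interacted).
predicted_scores = [4.5, 2.0, 3.8, 1.0, 4.2]
actual_interactions = [1, 0, 0, 0, 1]
y_pred = binarize(predicted_scores, threshold=3.5)
print(precision_recall(actual_interactions, y_pred))
```

The choice of threshold trades precision against recall: raising it produces fewer positives, which usually increases precision and lowers recall.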

If the recommender system does not output numeric values but only returns a list of recommendations (user-user or item-item, based on the kNN method), precision can be estimated as the proportion of recommended items that the user actually interacted with. In this case, only items in the test set for which there is customer feedback should be taken into account.
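A sketch of both halves, assuming items are represented by the sets of users who interacted with them and that item-item similarity is cosine similarity on those binary vectors (the data, names, and similarity choice are illustrative assumptions). Items without feedback in the test set are skipped rather than counted as misses:

```python
import math

def cosine_sim(users_a, users_b):
    """Cosine similarity of two items represented as sets of users (binary vectors)."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))

def knn_recommend(seed_item, item_users, k):
    """Item-item kNN: return the k items most similar to seed_item."""
    scored = [
        (cosine_sim(item_users[seed_item], users), item)
        for item, users in item_users.items()
        if item != seed_item
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most similar first, ties by name
    return [item for _, item in scored[:k]]

def precision_on_known(recommended, test_feedback):
    """Share of recommended items the user interacted with, counting only items
    that have feedback in the test set (True = interacted, False = did not)."""
    judged = [item for item in recommended if item in test_feedback]
    if not judged:
        return 0.0
    hits = sum(1 for item in judged if test_feedback[item])
    return hits / len(judged)

# Hypothetical training data: item -> set of user ids who interacted with it.
item_users = {"A": {1, 2, 3}, "B": {1, 2}, "C": {3, 4}, "D": {4, 5}, "E": {1, 3}}
recs = knn_recommend("A", item_users, k=3)
# Hypothetical test feedback; "E" has none, so it is ignored.
print(recs, precision_on_known(recs, {"B": True, "C": False}))
```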

Human Based Assessment
When building a recommender system, the goal is a model that not only provides relevant recommendations but also has other useful properties, in particular diversity and serendipity (unexpectedness) of recommendations.

Of course, no one wants the user to end up in a filter bubble, where the information they are shown is limited to what they already know. The term "serendipity" is often used to characterize a model's tendency to create, or avoid, such a bubble. Serendipity, which can be estimated by calculating the distance between recommended items, should not be too low, since that creates filter bubbles. At the same time, it should not be too high, since that would mean users' interests are not taken into account sufficiently.
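One way to compute "distance between recommended items" is the average pairwise cosine distance between item feature vectors (often called intra-list distance). This is a sketch assuming each item has a numeric feature vector; the target band of 0.3 to 0.7 is a hypothetical example of "not too low, not too high", not a recommended value:

```python
import itertools
import math

def cosine_distance(u, v):
    """1 minus cosine similarity; 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def intra_list_distance(vectors):
    """Average cosine distance over all pairs of recommended items."""
    pairs = list(itertools.combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

def within_band(score, low=0.3, high=0.7):
    """Hypothetical check that the list is neither too similar nor too scattered."""
    return low <= score <= high

# Hypothetical genre vectors for three recommended items.
score = intra_list_distance([[1, 0], [0, 1], [1, 0]])
print(score, within_band(score))
```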

For the choice offered to a consumer to be sufficiently diverse, it should include products that match their preferences but are not too similar to each other. For example, fans of the Fast and the Furious franchise will enjoy every installment, from the first to the eighth. But rather than recommending the other seven films to someone who watched the first one, it is better to recommend, alongside Fast and the Furious, films such as Gone in 60 Seconds and Racer.
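The post does not name a specific technique, but one common way to balance relevance against similarity is greedy Maximal Marginal Relevance (MMR) re-ranking: each step picks the candidate with the best mix of its own relevance and its dissimilarity to items already chosen. A sketch with hypothetical relevance scores, a hypothetical similarity rule (0.9 within the franchise, 0.2 otherwise), and a hypothetical trade-off weight `lam`:

```python
def mmr_rerank(candidates, relevance, similarity, k, lam=0.7):
    """Greedy MMR: lam weights relevance, (1 - lam) penalizes similarity to picked items."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda item: lam * relevance[item]
            - (1 - lam) * max((similarity(item, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

FRANCHISE = {"Fast and the Furious 2", "Fast and the Furious 3"}

def similarity(a, b):
    """Hypothetical similarity: sequels are near-duplicates of each other."""
    return 0.9 if a in FRANCHISE and b in FRANCHISE else 0.2

# Hypothetical relevance for a user who watched the first film.
relevance = {
    "Fast and the Furious 2": 0.95,
    "Fast and the Furious 3": 0.93,
    "Gone in 60 Seconds": 0.80,
    "Racer": 0.75,
}
print(mmr_rerank(list(relevance), relevance, similarity, k=3))
```

With these numbers the second sequel is pushed out in favour of the two non-franchise films, which is the kind of list the example above describes. Setting `lam=1.0` would rank purely by relevance.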

Explainability plays an important role in building the user's trust in a recommender system. It has been observed in practice that when customers do not understand why certain products are recommended to them, their trust in the system declines. It is therefore useful to add a short explanation to suggested products, such as "Also often purchased together with this product..." or "Perhaps you will also be interested in this product...".
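A minimal sketch of attaching such explanations, assuming each recommendation records which (hypothetical) component produced it, for example a co-purchase model or a similar-items model; the source names and the fallback text are illustrative:

```python
from dataclasses import dataclass

# Hypothetical mapping from recommendation source to user-facing explanation.
EXPLANATIONS = {
    "co_purchase": "Also often purchased together with this product",
    "similar_item": "Perhaps you will also be interested in this product",
}

@dataclass
class Recommendation:
    item_id: str
    source: str  # which recommender component produced this item

def explain(rec: Recommendation) -> str:
    """Return the explanation for a recommendation, with a generic fallback."""
    return EXPLANATIONS.get(rec.source, "Recommended for you")

print(explain(Recommendation("sku-123", "co_purchase")))
```

Keeping the source on each recommendation means the explanation reflects why the item was actually chosen, rather than a generic label.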

Besides the difficulty of assessing diversity and explainability, it is just as hard to determine the effectiveness of a recommendation that is not in the test data set. For example, how can you tell whether an offer is relevant before showing it to a customer? This is a reason to test the recommendation model in practice. Since the main task of a recommender system is to encourage a specific action (following a link, listening to a track, etc.), conclusions about its effectiveness are drawn from user behavior. For example, A/B testing can be used, or the model can be shown to only a sample of users; either way, such experiments require sufficient confidence in the model.
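For an A/B test, one standard way to compare groups is a two-proportion z-test on the rate of the target action. This sketch assumes the metric is binary per user (did or did not follow a recommended link) and that both groups are large enough for the normal approximation; the counts are hypothetical:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare action rates of control (A) and variant (B) with a pooled z-test.
    Returns (z statistic, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical results: old model (A) vs new model (B), 5000 users each.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in click or conversion rate is unlikely to be chance alone; the significance level and minimum sample size should be fixed before the test starts.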