What does this algorithm do?

This algorithm calculates the so-called HV worth vector, introduced in Herrero & Villar (2012) to evaluate group performance with categorical data.

The HV worth vector provides an evaluation of the relative performance of a given number of groups whose outcomes are expressed in terms of a finite set of ordered characteristics. So the ingredients of the problem are: a population divided into a set of \(g\) groups whose outcomes are classified into \(s\) categories, ordered from best to worst.

The input you need to feed the application is the matrix \(A = \{ a_{ ij } \} \) of relative frequencies, where the generic entry \( a_{ij}, \) \( i = 1, 2, \dots, g, \) \(j = 1, 2, \dots, s,\) denotes the share of individuals of group \( i \) in category \(j\). That is, a matrix with \(g\) rows and \(s\) columns in which every row adds up to one.

The output you get is a vector of \(g\) components, one for each group, which gives you the relative evaluation of the performance of the \(g\) groups (the HV worth vector).

What does the HV worth vector tell us?

To understand what the HV worth vector means, consider the following ideas. We say that group \(i\) dominates group \(j\) when a randomly chosen individual from group \(i\) is more likely to be in a higher category than a randomly chosen individual from group \(j\), rather than in a lower one. The probability that an individual from group \(i\) dominates another from group \(j\), \(p_{ij}\), is calculated as follows:

\(p_{ij} = a_{i1}(a_{j2} + \dots + a_{js}) + a_{i2}(a_{j3} + \dots + a_{js}) + \dots + a_{i,s-1}a_{js}\)
[1]
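As a minimal sketch of equation [1], the function below computes every pairwise probability \(p_{ij}\) from the frequency matrix \(A\). The function name `dominance_matrix` and the two-group table are made up for illustration; they are not part of the application.

```python
def dominance_matrix(A):
    """Return P with P[i][j] = p_ij from equation [1].

    A is a list of rows (one per group); columns run from best to worst.
    The diagonal of P is left at 0 because p_ii is never used.
    """
    g = len(A)
    s = len(A[0])
    P = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i == j:
                continue
            tail = sum(A[j])          # share of group j in all categories
            total = 0.0
            for k in range(s - 1):
                tail -= A[j][k]       # share of group j in categories worse than k
                total += A[i][k] * tail
            P[i][j] = total
    return P


# Hypothetical table: two groups, three categories (best first).
A = [
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
P = dominance_matrix(A)
print(round(P[0][1], 4), round(P[1][0], 4))  # 0.55 0.16
```

Note that \(p_{ij} + p_{ji}\) is less than one whenever ties (both individuals in the same category) have positive probability, as in this hypothetical table.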

From here we can define the relative advantage of group \(i\) with respect to group \(j\), \(RA_{ij}\), as follows:

\(RA_{ij} = \frac{p_{ij}}{\sum_{k \neq i}{p_{ki}}}\)

The relative advantage of group \(i\) with respect to group \(j\) is simply the probability that group \(i\) dominates group \(j\), divided by the sum of the probabilities that group \(i\) is dominated by each of the other groups.
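The definition of \(RA_{ij}\) can be sketched in a few lines. The function name `relative_advantages` and the three-group matrix of dominance probabilities are illustrative assumptions, not values taken from the application.

```python
def relative_advantages(P):
    """Return RA with RA[i][j] = p_ij / sum_{k != i} p_ki for i != j."""
    g = len(P)
    RA = [[0.0] * g for _ in range(g)]
    for i in range(g):
        # Total probability mass with which the other groups dominate group i.
        dominated = sum(P[k][i] for k in range(g) if k != i)
        if dominated == 0:
            raise ValueError(f"group {i + 1} is never dominated; RA is undefined")
        for j in range(g):
            if j != i:
                RA[i][j] = P[i][j] / dominated
    return RA


# Made-up dominance probabilities for three groups (diagonal unused).
P = [
    [0.00, 0.40, 0.50],
    [0.30, 0.00, 0.45],
    [0.20, 0.25, 0.00],
]
RA = relative_advantages(P)
print(round(RA[0][1], 4))  # p_12 / (p_21 + p_31) = 0.40 / 0.50
```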

To obtain an overall evaluation of group \(i\) in society, we take a weighted sum of its relative advantages with respect to all other groups. That is, the overall relative advantage of group \(i\) is given by:

\(RA_{i} = \sum_{j \neq i}{\lambda_{j}RA_{ij}}\)

Since the weights \(\lambda_{j}\) reflect the relevance of the different groups, it is only natural to choose them consistently with their own evaluation, i.e., taking \(\lambda_{j} = RA_{j}\). In this way, each group enters the evaluation of the relative advantage of the others with the weight corresponding to its own relative advantage. This implies that we have to find a vector \(v = (v_{1}, v_{2}, \dots, v_{g}) > 0\) such that:

\(v_{i} = \sum_{j \neq i}{v_{j}RA_{ij}} = \frac{\sum_{j \neq i}{v_{j}p_{ij}}}{\sum_{k \neq i}{p_{ki}}}\)
[2]

Herrero & Villar (2012) prove that this vector always exists, is strictly positive and unique (once normalized and provided a technical condition, known as irreducibility of the domination probability matrix, is satisfied).

How is this vector obtained?

This vector can be easily calculated since it corresponds to the dominant eigenvector of the following matrix:

\(Q = \begin{bmatrix} g-1-\sum_{i \neq 1}{p_{i1}} & p_{12} & \dots & p_{1g} \\ p_{21} & g-1-\sum_{i \neq 2}{p_{i2}} & \dots & p_{2g} \\ \dots & \dots & \dots & \dots \\ p_{g1} & p_{g2} & \dots & g-1-\sum_{i \neq g}{p_{ig}} \end{bmatrix} \)
[3]

The off-diagonal elements of the \(Q\) matrix are the pair-wise dominance probabilities \(p_{ij}\). The \(i\)-th diagonal element equals \(\sum_{k \neq i}{(1 - p_{ki})}\), that is, the sum over all other groups of the probability that a randomly chosen individual from group \(i\) belongs to a category that is not worse than that of a randomly chosen individual from the other group. It is easy to see that the matrix \(Q\) is a Perron matrix whose columns add up to \(g - 1\). From this follow the existence, positivity and uniqueness (when \(Q\) is irreducible) of the vector \(v\) whose components satisfy equation [2].
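The link between equation [2] and matrix [3] is short: multiplying [2] by \(\sum_{k \neq i}{p_{ki}}\) and adding \((g-1)v_{i}\) to both sides gives \(Qv = (g-1)v\). The sketch below builds \(Q\) from a matrix of dominance probabilities and checks the column sums. The helper names and the three-group numbers are made up for illustration.

```python
def build_q(P):
    """Build the matrix Q of [3] from the dominance probabilities P.

    Off-diagonal entries are copied from P; the i-th diagonal entry is
    (g - 1) minus the sum of column i of P (the diagonal of P is ignored).
    """
    g = len(P)
    Q = [[P[i][j] for j in range(g)] for i in range(g)]
    for i in range(g):
        dominated = sum(P[k][i] for k in range(g) if k != i)
        Q[i][i] = (g - 1) - dominated
    return Q


def column_sums(M):
    """Return the list of column sums of a rectangular matrix."""
    return [sum(row[j] for row in M) for j in range(len(M[0]))]


# Made-up dominance probabilities for three groups (diagonal unused).
P = [
    [0.00, 0.40, 0.50],
    [0.30, 0.00, 0.45],
    [0.20, 0.25, 0.00],
]
Q = build_q(P)
# Every column sum should equal g - 1 = 2, up to floating-point rounding.
print([round(c, 10) for c in column_sums(Q)])
```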

How to proceed?

Step 1. Construct the matrix \(A\) of relative frequencies as an Excel table, with groups in rows and categories in columns, checking that all rows add up to one and ordering the categories from best (first column) to worst (last column). If you fail in any of those steps you will not get what you are looking for.

Step 2. Once you have accepted the “conditions of use”, copy the body of that table (i.e. omitting the column with the names of the groups or the row with the labels of the categories) and plug it into the space provided by the application, indicating the number of rows and columns, as requested.

Step 3. Choose “Solve” and you will instantly get the HV worth vector on the left-hand side of the screen. Click on the “Copy outcomes” message and paste the result back into the Excel spreadsheet. That’s all, you are done.
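The checks required in Step 1 can be automated before pasting the table. This is only a sketch: the function name `validate_frequencies` and the tolerance are assumptions, and no code can verify that the categories are ordered from best to worst, which remains a manual check.

```python
def validate_frequencies(A, tol=1e-6):
    """Check that A is a rectangular table of non-negative shares whose
    rows each add up to one. Returns (rows, columns), the two numbers
    the application asks for.
    """
    if not A or not A[0]:
        raise ValueError("the table is empty")
    s = len(A[0])
    for number, row in enumerate(A, start=1):
        if len(row) != s:
            raise ValueError(f"row {number} has {len(row)} entries, expected {s}")
        if any(x < 0 for x in row):
            raise ValueError(f"row {number} contains a negative share")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {number} adds up to {sum(row)}, not 1")
    return len(A), s


# Hypothetical two-group table with three categories.
print(validate_frequencies([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]))  # (2, 3)
```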

Remark.- Don’t forget that the worth vector is the eigenvector of a matrix and, as such, is determined only up to a positive scale factor. So you can normalize the vector as best suits you. The normalization provided by default is such that the sum of the elements of the HV worth vector is equal to \(g\) (the number of groups), so that the mean worth value is equal to 1.
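Two alternative normalizations are sketched below. The function names and the three-component vector are made up for illustration; any positive rescaling leaves the ranking and the ratios between groups unchanged.

```python
def rescale_to_sum(v, target):
    """Rescale v so that its components add up to `target`."""
    total = sum(v)
    if total <= 0:
        raise ValueError("the worth vector must have a positive sum")
    return [x * target / total for x in v]


def rescale_to_max(v, top=1.0):
    """Rescale v so that its largest component equals `top`."""
    largest = max(v)
    if largest <= 0:
        raise ValueError("the worth vector must have a positive maximum")
    return [x * top / largest for x in v]


# Made-up worth vector in the default normalization (sums to g = 3).
v = [0.8, 1.3, 0.9]
print(rescale_to_sum(v, 1.0))    # components become shares of the total
print(rescale_to_max(v, 100.0))  # best group scores 100
```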

An example

If you want to practice, here is an example that may help you get used to handling the application. Consider the following matrix, which describes the distribution of five groups across three categories:

           Category A   Category B   Category C
Group 1       0.25         0.40         0.35
Group 2       0.15         0.50         0.35
Group 3       0.10         0.80         0.10
Group 4       0.30         0.30         0.40
Group 5       0.35         0.20         0.45

Write this table as an Excel file. Then copy the body of the resulting Excel table (without the first row and the first column, where the names of the categories and groups appear) and paste it into the application, filling in the entries for rows (5) and columns (3). Hit “Solve” and you should get the following:

1   0.97822783
2   0.781556944
3   1.308394467
4   0.969857408
5   0.961963352

The numbers in the left column refer to the groups, in the order you gave them in the original table. Then press “Copy outcomes”, go back to your Excel sheet and paste the result. That’s it. You can always re-scale the values of the outcome vector according to your preferences.
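For readers who want to cross-check the example outside the application, the sketch below goes from the table to the worth vector. It uses power iteration on \(Q\), which is an assumption on our part: the numerical method used by the application is not documented here. All function names are hypothetical. The result should be close to the vector listed above, although the last digits may differ.

```python
A = [
    [0.25, 0.40, 0.35],  # Group 1
    [0.15, 0.50, 0.35],  # Group 2
    [0.10, 0.80, 0.10],  # Group 3
    [0.30, 0.30, 0.40],  # Group 4
    [0.35, 0.20, 0.45],  # Group 5
]


def dominance_matrix(A):
    """p_ij from equation [1]; categories are ordered best to worst."""
    g, s = len(A), len(A[0])
    return [
        [
            0.0 if i == j
            else sum(A[i][k] * sum(A[j][k + 1:]) for k in range(s - 1))
            for j in range(g)
        ]
        for i in range(g)
    ]


def build_q(P):
    """Matrix [3]: P off the diagonal, (g - 1) minus column sums on it."""
    g = len(P)
    Q = [row[:] for row in P]
    for i in range(g):
        Q[i][i] = (g - 1) - sum(P[k][i] for k in range(g) if k != i)
    return Q


def worth_vector(A, max_iter=1000, tol=1e-13):
    """Dominant eigenvector of Q by power iteration, normalized to sum g."""
    Q = build_q(dominance_matrix(A))
    g = len(Q)
    v = [1.0] * g
    for _ in range(max_iter):
        w = [sum(Q[i][j] * v[j] for j in range(g)) for i in range(g)]
        total = sum(w)
        w = [g * x / total for x in w]  # keep the components summing to g
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


for number, worth in enumerate(worth_vector(A), start=1):
    print(number, round(worth, 4))
```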

Conditions of use

This is a free application that can be used for personal or institutional research. The only requirement is that of citing the sources:

Herrero C, Villar A (2013) On the Comparison of Group Performance with Categorical Data. PLoS ONE: e84784. doi:10.1371/journal.pone.0084784.

Comments to: villar@upo.es and cherreroblanco@gmail.com