Sampling represents an efficient method for estimating the total value of damages for all group members. Given the importance of sampling, a question that often arises is: What should the sample size be? This article addresses the common misconception that determining an appropriate sample size is a necessary and sufficient condition for estimating damages across the class. This view oversimplifies the damages estimation process and overlooks more fundamental sample design and analysis decisions.

What is Sampling?

Sampling is a well-established and flexible tool for producing information about large populations from the analysis of a subset of the individuals in the population. The process of sampling effectively splits the target population into two groups:

i. The sample group for which information is gathered; and
ii. An extrapolation group for which information is not known but for which information from the sample group is applied.

The specific piece of information that is produced through the sampling process is called a sample statistic. In the case of a class action, the sample statistic might be the average damages per group member (or the average damages for a specific subset of the class).
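As a minimal illustration of how a sample statistic is applied to the extrapolation group, the Python sketch below estimates total damages as the sample mean multiplied by the number of group members. The population figures are randomly generated and hypothetical, and the function name `estimate_total_damages` is illustrative only; the sketch assumes a simple random sample and ignores non-sampling errors.

```python
import random
import statistics


def estimate_total_damages(sample_damages, population_size):
    """Extrapolate the sample mean (the sample statistic) to every group member."""
    return statistics.mean(sample_damages) * population_size


# Hypothetical class of 10,000 members with randomly generated damages
random.seed(1)
population = [random.uniform(500, 1500) for _ in range(10_000)]

# Simple random sample of 400 members; the rest form the extrapolation group
sample = random.sample(population, 400)

estimate = estimate_total_damages(sample, len(population))
true_total = sum(population)  # only knowable here because the data are simulated
print(f"Estimated total: {estimate:,.0f}   True total: {true_total:,.0f}")
```

In a real matter the true total is unknown; the simulation simply makes it possible to see how close the extrapolated figure lands.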

Sampling can be an efficient data analytics tool, especially when the data points that need to be analysed are prohibitively expensive or time consuming to collect for every group member. When samples are properly designed, executed, and analysed, they can provide reliable inferences about a target population. However, poor sampling design and execution can lead to unreliable evidence, contestable conclusions, and costly mistakes.

The following table lists several causes of errors that can impact the reliability of sample statistics.

As the table shows, sample size is just one consideration when designing a sampling methodology, and a larger sample cannot compensate for poor sample design. Sampling error refers to the difference between the damages estimate obtained from the sample and the true damages value that would be calculated if information were available for the entire target population.

Non-sampling errors arise from how information is collected and processed, rather than from the fact that only part of the population is analysed. These are errors that stem from missing data, flawed survey design, human biases, and data processing errors. Sometimes these errors occur randomly, and their impact may cancel out as the sample size increases. For example, survey response rates might be low, but non-response may be spread randomly across all subsets of group members. In contrast, some errors skew results in only one direction and therefore accumulate over the entire sample. For example, response rates may be especially low among certain types of group members. In this case, those group members will be underrepresented, and increasing the sample size will not remove the resulting bias.
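The difference between random and one-directional non-response can be illustrated with a small simulation. The sketch below assumes a hypothetical class with two types of members and made-up response probabilities; it compares the error in average damages when everyone responds at the same rate against the error when high-damages members respond less often.

```python
import random
import statistics

random.seed(2)

# Hypothetical class: 7,000 members with damages around 1,000
# and 3,000 members with damages around 3,000
population = ([random.gauss(1000, 100) for _ in range(7000)]
              + [random.gauss(3000, 300) for _ in range(3000)])
true_mean = statistics.mean(population)


def random_nonresponse(damages):
    """Every member responds with the same (assumed) probability."""
    return 0.3


def systematic_nonresponse(damages):
    """High-damages members respond less often (assumed rates)."""
    return 0.3 if damages < 2000 else 0.1


def survey_error(sample_size, response_prob):
    """Error in mean damages among respondents to a simple random sample."""
    sample = random.sample(population, sample_size)
    respondents = [d for d in sample if random.random() < response_prob(d)]
    return statistics.mean(respondents) - true_mean


for n in (500, 5000):
    print(n,
          round(survey_error(n, random_nonresponse)),
          round(survey_error(n, systematic_nonresponse)))
```

Under these assumptions, the error from random non-response tends to shrink as the sample grows, while the error from systematic non-response stays at roughly the same negative value regardless of sample size.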

Population

Population definition is an important step in the sampling process. Class action damages estimation represents a type of research problem in which the objective is to estimate the loss suffered across a group of affected individuals called the target population.

There are two key dimensions that describe the target population:

i. The population boundary that is set by the criteria that determine who incurred damages; and
ii. The population composition that refers to how similar the individuals are in the population.

A population that is diverse may exhibit variability in the magnitude of individual damages. Not all population characteristics are necessarily relevant to understanding the distribution of damages. For example, the target population may be diverse with respect to age cohorts, but this may not be associated with differences in damages amounts across the class. Moreover, not all population characteristics may be observable. In fact, information about a target population and its characteristics may not be readily available at the outset.

Understanding the nature of the harm is a useful starting point for developing initial hypotheses about the target population. Independent experts and third-party published research can provide insights into the mechanisms and relationships between the alleged (or similar) actions and harm. Information from stakeholders in the form of company disclosures, regulatory documents, and customer complaints can also provide insights into the group size and relevant characteristics. Soliciting information through surveys is another approach.

Data Gathering and Analysis

Sampling requires a source of data (called a sampling frame) from which information about the target population can be drawn. A classic example is a telephone directory, which might serve as a sampling frame for the entire population of a city. Sampling frames may not align perfectly with the target population, and any mismatch can reduce the reliability of the resulting sample. Therefore, an important design step is to determine whether there are differences between the individuals included in, and excluded from, the sampling frame (e.g., in the telephone directory example, members of the population without a listed phone number would not be represented).
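The effect of a sampling frame that misses part of the target population can be expressed with simple arithmetic. Writing the population mean as a weighted average of members inside and outside the frame gives the bias of a frame-only estimate as the uncovered share multiplied by the gap between the two groups' average damages. The function `coverage_bias` is a hypothetical helper, and the figures in the example are illustrative assumptions.

```python
def coverage_bias(covered_share, mean_covered, mean_not_covered):
    """Bias in mean damages when only members inside the frame can be sampled.

    Population mean = c * mean_covered + (1 - c) * mean_not_covered, so
    mean_covered - population mean = (1 - c) * (mean_covered - mean_not_covered).
    """
    return (1 - covered_share) * (mean_covered - mean_not_covered)


# Hypothetical frame covering 85% of the class: members in the frame average
# 1,200 in damages, members outside it average 2,000
bias = coverage_bias(0.85, 1200, 2000)
print(round(bias))  # -120: the frame-based average understates damages
```

Note that the bias depends only on coverage and on how different the excluded members are, not on how many individuals are sampled from the frame.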

Where the target population contains sub-groups of members who need to be considered separately, different sampling frames may be required across the sub-groups. Some sampling frames may result in efficient data collection (e.g., individuals who are able to fill out online questionnaires) that can support large sample sizes at low cost. Other sampling frames may require a more time-intensive and costly data gathering process that can place practical limits on the sample size.

Sampling also requires that sampled individuals provide information that is relevant to the research question. Collected information can be unreliable for several reasons. For example, questionnaire designers may unintentionally omit questions that are important to understanding differences in damages, or may fail to anticipate that some questions are leading, misleading, or unclear. In addition, using multiple interviewers with different levels of training may lead to inconsistent results. Respondents may also misinterpret a question, refuse to answer a question, misremember facts, or provide inaccurate information. Moreover, errors can occur in data processing and interpretation. All of these sources of error can undermine the usefulness of samples even when sample sizes are very large.

Sample Size Determinants

Sample size is about managing trade-offs. The larger the sample, the more confident one can generally be that the statistic produced from the sample data reflects the true value for the entire population. However, large samples impose economic and time costs, and the gains in statistical confidence diminish as the sample grows.

Sample size in class actions is often ‘determined’ based on rules of thumb or practical considerations. For example, the sample size may be constrained by the availability of funds, time, and access to individuals in a class. From a conceptual standpoint, the appropriate sample size can be calculated based on three main factors:

i. The variance of the target population;
ii. The margin of error;
iii. The confidence level.

In essence, practical constraints set an upper bound on the sample size, while accuracy requirements set the lower bound. The lower bound will be larger if the target population is diverse with respect to damages, if the damages estimate needs to be precise (a small margin of error), and if a high confidence level is required.
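The three factors can be combined in the standard normal-approximation formula for estimating a mean, n0 = (z × σ / E)^2, where σ is the population standard deviation, E the margin of error and z the critical value for the chosen confidence level, optionally adjusted with a finite population correction. The sketch below is one common textbook approach rather than a prescribed method; the function name, the budget and the per-member cost are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin_of_error, confidence=0.95,
                         population_size=None):
    """Minimum sample size for estimating mean damages within +/- margin_of_error.

    n0 = (z * sigma / E)^2, optionally reduced by the finite population
    correction n = n0 / (1 + (n0 - 1) / N).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n = (z * std_dev / margin_of_error) ** 2
    if population_size is not None:
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)


# Lower bound from accuracy requirements (hypothetical standard deviation
# of 500, margin of error of 50, class of 2,000 members)
lower_bound = required_sample_size(500, 50, population_size=2000)

# Upper bound from practical constraints (hypothetical budget / cost per member)
upper_bound = 60_000 // 150

print(lower_bound, upper_bound, lower_bound <= upper_bound)
```

Doubling the standard deviation roughly quadruples the required sample, which is why a diverse population pushes the lower bound up quickly.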

Types of Sampling

Sampling can be conducted using different methods including simple random sampling and stratified sampling.

Simple random sampling, in which every individual in the target population has the same probability of selection, is one of the most widely used types of sampling. It is often performed when little is known about the population’s characteristics. The approach is also intuitive and relatively easy to implement. The main limitation of simple random sampling is that it does not guarantee that any particular sub-group or type of person in the population will appear in the sample. This can be problematic when the target population consists of many different sub-groups that have incurred different magnitudes or types of damage. While this problem can be mitigated by increasing the sample size, other sampling methods can produce more reliable results and may require a smaller sample.
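One way to quantify this limitation is the probability that a simple random sample drawn without replacement contains no members of a given sub-group. For a population of N members, a sub-group of K members and a sample of n, this follows the hypergeometric formula C(N - K, n) / C(N, n). The sketch below assumes a hypothetical class of 10,000 members with a sub-group of 50.

```python
from math import comb


def prob_subgroup_missed(population_size, subgroup_size, sample_size):
    """Probability that a simple random sample drawn without replacement
    contains no member of a given sub-group: C(N - K, n) / C(N, n)."""
    return (comb(population_size - subgroup_size, sample_size)
            / comb(population_size, sample_size))


# Hypothetical class of 10,000 members containing a sub-group of 50
for n in (50, 200, 500):
    print(n, round(prob_subgroup_missed(10_000, 50, n), 3))
```

Even with a sample of several hundred, there remains a meaningful chance under these assumptions that the small sub-group is entirely absent.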

Stratified sampling involves dividing the target population into multiple sub-groups (strata) based on ‘relevant’ and observable individual characteristics. Relevant characteristics in the context of class actions are those associated with similar types or similar magnitudes of damage. In stratified sampling, each sub-group is sampled separately to create group-specific sample statistics. In general, grouping similar individuals through stratification lowers the variance within each sub-group, which in turn supports a smaller sample size for that sub-group. However, the relationship between sample size and population size is more complex under stratified sampling: the optimal sample size may depend on the size and variability of each sub-group within the population.
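A minimal sketch of how stratum-level results might be combined: each stratum's sample mean is scaled by that stratum's population size and the results are summed. The stratum names, sizes and damages figures below are hypothetical, and the sketch assumes each stratum was sampled at random.

```python
import statistics


def stratified_total(strata):
    """Estimate total damages from separately sampled strata.

    `strata` maps a stratum name to (number of members in the stratum,
    list of sampled damages). Each stratum mean is scaled by its own size.
    """
    return sum(size * statistics.mean(sample) for size, sample in strata.values())


def stratified_mean(strata):
    """Population-weighted average of the stratum sample means."""
    total_members = sum(size for size, _ in strata.values())
    return stratified_total(strata) / total_members


# Hypothetical strata and sampled damages
strata = {
    "workers": (8000, [950, 1020, 980, 1010, 1040]),
    "supervisors": (2000, [3900, 4300, 4100]),
}
print(stratified_total(strata), stratified_mean(strata))
```

Because each stratum is weighted by its population size, the overall estimate is not distorted when a stratum is sampled more or less heavily than its share of the class.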

There are different types of stratified sampling. The first is called proportionate stratification, which requires the sample size for each sub-group to be based on the relative proportion of that sub-group in the target population.2 The second is called disproportionate stratification, which does not require the sample size for each sub-group to follow that sub-group’s share of the population. Disproportionate stratification has several advantages. One advantage is cost savings due to a smaller overall sample size.3 Another is that it allows for the sampling of small but important (from a damages perspective) sub-groups that might otherwise be underrepresented in the sample.4 The greater the differences among the sub-groups, the greater the potential for an incorrect assessment of overall damages if each sub-group is not sampled. Therefore, investing time in stratified sampling can provide increased precision and a more accurate assessment of damages.
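The two allocation approaches can be sketched as follows, with proportionate allocation reproducing the 80/20 split in footnote 2. For disproportionate allocation, the sketch uses Neyman allocation (sample sizes proportional to each stratum's population size multiplied by its standard deviation), which is one well-known textbook rule and is not drawn from this article; the stratum standard deviations are hypothetical.

```python
def proportionate_allocation(total_n, stratum_sizes):
    """Sample size per stratum in proportion to its share of the population."""
    population = sum(stratum_sizes.values())
    return {h: round(total_n * size / population)
            for h, size in stratum_sizes.items()}


def neyman_allocation(total_n, stratum_sizes, stratum_std_devs):
    """Disproportionate allocation with n_h proportional to N_h * S_h.

    Rounding may make the parts differ slightly from total_n.
    """
    weights = {h: stratum_sizes[h] * stratum_std_devs[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    return {h: round(total_n * w / total_weight) for h, w in weights.items()}


sizes = {"workers": 800, "supervisors": 200}
print(proportionate_allocation(100, sizes))  # {'workers': 80, 'supervisors': 20}

# Hypothetical standard deviations of damages within each stratum
std_devs = {"workers": 100, "supervisors": 600}
print(neyman_allocation(100, sizes, std_devs))
```

Under these assumptions, the more variable supervisor stratum receives a larger share of the sample than its 20% population share.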

Variance

Variance is a measure of the variability of a target population and an important consideration when modelling damages. To achieve a given level of precision for a sample statistic, the sample size needs to increase as the variance of the population increases. Only a small sample is required if the population is generally similar with respect to the variable of interest. For example, estimating the average age of students in grade 10 requires a smaller sample than estimating the average age of workers at a firm.

In practice, the variance of a population with respect to damages will be unknown. However, the variance can be estimated using a variety of approaches. One approach is to rely on variances based on similar situations or past studies. Another method is to analyse the variability of observable and relevant characteristics that may be correlated with damages. Finally, a sample variance of damages can be estimated based on a sample of individuals.
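The last approach can be sketched with a small pilot sample. The damages figures below are made up for illustration, and the 1.96 critical value (95% confidence) and margin of error of 100 are assumptions; in practice a variance estimated from a small pilot is itself uncertain and may warrant a conservative adjustment.

```python
import math
import statistics

# Hypothetical pilot sample of individual damages (made-up figures)
pilot = [820, 1150, 960, 1430, 700, 1280, 1010, 890, 1600, 1100]

pilot_sd = statistics.stdev(pilot)  # sample standard deviation (n - 1 divisor)

# Plug the estimate into n = (z * s / E)^2, assuming 95% confidence (z = 1.96)
# and a hypothetical margin of error of 100
z, margin = 1.96, 100
n_required = math.ceil((z * pilot_sd / margin) ** 2)
print(round(pilot_sd, 1), n_required)
```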

Margin of Error

The margin of error is a measure of the sample statistic’s precision. A larger margin of error indicates that the estimate is less precise, while a smaller margin of error indicates that it is more precise. Because a different randomly chosen sample would generally produce a different value of the sample statistic, the margin of error expresses the range of uncertainty surrounding the estimate. The margin of error decreases as the sample size increases.
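Under a normal approximation, the margin of error for a mean is z × s / √n, where s is the standard deviation and n the sample size. The sketch below, which assumes a hypothetical standard deviation of 500, shows that the margin of error shrinks with the square root of the sample size, so halving it requires roughly four times as many observations.

```python
import math
from statistics import NormalDist


def margin_of_error(std_dev, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * std_dev / math.sqrt(n)


# Hypothetical standard deviation of damages: 500
for n in (100, 400, 1600):
    print(n, round(margin_of_error(500, n), 1))
```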

The importance placed on precision will influence the choice of margin of error. A precise sample statistic requires a small margin of error, which in turn requires a larger sample size. Small margins of error may be desirable in class actions because they convey precision, making the associated damages estimates more likely to be perceived as reliable.

Confidence Level

The confidence level is a measure of the sample statistic’s reliability. It refers to the proportion of repeated samples from a population whose resulting intervals (the sample statistic plus or minus the margin of error) would be expected to include the true population parameter. For example, a confidence level of 95% indicates that if 100 samples were drawn, one would expect about 95 of them to produce a sample mean within the margin of error of the true population mean.
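This repeated-sampling interpretation can be checked with a short simulation. The sketch assumes a hypothetical normally distributed population with a mean of 1,000 and a standard deviation of 300, draws many samples of 50, and counts how often the interval sample mean ± 1.96 × s / √n contains the true mean. The share should come out close to 95%, typically slightly below because s is estimated from each sample.

```python
import math
import random
import statistics


def interval_coverage(trials=2000, n=50, z=1.96, seed=3):
    """Share of repeated samples whose interval (mean +/- z * s / sqrt(n))
    contains the true population mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 1000, 300  # hypothetical normal population
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.mean(sample)
        moe = z * statistics.stdev(sample) / math.sqrt(n)
        if m - moe <= true_mean <= m + moe:
            hits += 1
    return hits / trials


print(interval_coverage())
```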

Conclusion

While sampling can serve as an efficient analytical tool for estimating damages, it is only reliable if it is carefully designed. Sampling design decisions and sample size decisions are intertwined. Thus, understanding these relationships represents an important step in designing sampling approaches that can produce reliable damages estimates. From a conceptual standpoint, sample size will be influenced by the purpose of the sample statistic, the characteristics of the population, and the sample design. Stratified sampling, which requires information about the population’s characteristics, can lead to a smaller aggregate sample size than simple random sampling when applied to a diverse population.

2. For example, suppose the target population is stratified by occupation type, with 80% of the population comprising workers and 20% comprising supervisors. An overall sample size of 100 would be split so that 80 individuals would be drawn from the worker sub-group and 20 individuals would be drawn from the supervisor sub-group.
3. For example, only a small percentage of the worker sub-group may be sampled on the basis that (i) a higher margin of error may be more tolerable for a group with lower damages potential and/or (ii) the worker sub-group contains individuals who are very similar and therefore has a lower variance.
4. For example, for a small sub-group with high damages, the sample size may include the entire sub-group.

Disclaimer: The content of this article is general in nature and is presented for informative purposes. It is not intended to constitute tax, financial or legal advice, whether general or personal nor is it intended to imply any recommendation or opinion about a financial or legal product. It does not take into consideration your personal situation and may not be relevant to your circumstances. Before taking any action, consider your own particular circumstances and seek professional advice. This content is protected by copyright laws and various other intellectual property laws. It is not to be modified, reproduced or republished without prior written consent.
