Date: 01 Feb 93 22:51:51 CST
From: Jim Thomas
Subject: File 3--How does the SPA Calculate Piracy?

The Software Publishers Association (SPA) estimates that software piracy declined between 1989 and 1991. But, says the SPA, piracy still cost the industry over $1.2 billion in lost revenues in 1991. Critics argue that the piracy rate and its costs are grossly over-estimated. The SPA believes that its estimates, while perhaps imperfect, are nonetheless quite conservative and, if anything, significantly underestimate the extent of software piracy. Who's right? How does the SPA arrive at its estimates? The information below comes from SPA documents and from David Tremblay, the SPA's Research Director.

Identifying and counting behaviors that are normally hidden presents several methodological problems, and calculating the extent of piracy is no exception. First, there is no victim in the traditional sense. There are no snatched purses, dead bodies, empty bank accounts, trashed computers, or other directly obvious signs of predation. Therefore, we rarely have direct knowledge of an alleged "offense." Second, the concepts used to define or measure an "offense" can pose particular problems, because definitions are subject to imprecision. Third, "victims" of piracy are often unaware that they are victims until informed by someone who measures victimization, such as the SPA.

The "DARK FIGURE OF CRIME" is the knowledge gap between crimes KNOWN to have occurred and crimes that ACTUALLY occurred. No existing methodology can precisely measure this dark figure, and even the most sophisticated provide only approximations. It is therefore not surprising that the SPA's attempts to measure the "dark figure of piracy" face methodological problems.

The Methodology

Four sets of facts and an assumption underlie the SPA's methodology.

The first set of facts is hardware sales from Dataquest, a market research company in San Jose, Calif. The calculations begin by determining the number of Intel- and Macintosh-based PCs sold during a given year.

The second set of data derives from an SPA reporting program in which about 150 of the generally larger companies report their unit sales and revenue to the SPA. The business applications sales are taken from these reports and used to estimate the total unit sales of software in the U.S. in a given year. Operating systems are excluded. The data do not constitute a random sample, but are based on voluntary self-reporting by the participating companies. This method is common in survey research, and if used with caution, the lack of randomness or representativeness of the population surveyed need not be a problem.

The third set of facts is the average number of applications that users are estimated to have on their personal computers. This body of data comes from member research that is sent back to the SPA. The members obtain this information from several sources, including surveys of their own customer base and returned registration cards. The SPA estimates that the typical DOS (or Intel-based) PC user has three applications, and the typical Macintosh user has five. One reason that Mac users may have more applications than Intel-based users is ease of use and the cross-learning between different Mac programs, which reduces the learning curve and better integrates the programs with each other.

The fourth datum is the average price for a software program in a given year.
However, in calculating the total dollar volume of revenues lost to piracy, David Tremblay indicates that "street value" prices are factored in, rather than assuming that each program would sell for market list price.

Finally, the methodology rests on the ASSUMPTION that all of the units of software purchased in a calendar year are purchased by or for use on PCs that are new that year. It assumes no application sales to computers purchased in previous years.

These data are then plugged into a formula (figures are illustrative):

1. The PC hardware sales (in number of units) are multiplied by the average number of applications used. If 1 million Intel-based units are sold, and each has 3 commercial software applications (excluding the operating system itself), we get a figure of 3 million.

2. The number of applications purchased during that year is subtracted from that figure. If 2.4 million applications are sold, the difference is 600,000. This is assumed to be the number of applications pirated.

3. The number of applications pirated is then multiplied by the average cost of a software package, which has declined from $189 in 1989 to $152 in 1991.

David Tremblay candidly recognizes the methodological problems, although he feels that, on balance, they understate rather than overstate the level of piracy. He acknowledges several market factors that could affect the estimates (the skewing directions are my own):

1) Since 1989, the average price per software application has decreased. This skews the dollar-loss figures DOWNWARD from year to year.

2) Hardware sales have been revised downward by Dataquest, which reduces the base number of PCs on which piracy estimates are based. This skews the piracy estimate UPWARD.

3) Contrary to the assumption of "no application sales to installed base," there is evidence that an increasing percentage of software is being sold for use on existing PCs. This skews the piracy estimate UPWARD.

There are additional problems. Among them:

1) The total software sales include sales of upgrades. This would seem to under-estimate the extent of illicit software, because it over-estimates the base figure of software sold. For example, if 100 PCs are sold in a given year, and each PC has an average of three applications, we would expect 300 applications to be sold. If we find that only 270 applications are sold, the "piracy score" would be 300 - 270 = 30; 30/300 = .1, or ten percent. If 20 percent of those 270 sales are upgrades, however, only 216 are first purchases, and the score becomes 300 - 216 = 84; 84/300 = .28, or a 28 percent piracy rate. Including upgrades in the sales base thus skews the piracy estimate DOWNWARD but the costs of piracy UPWARD.

This apparent downward skew, however, is misleading, because the base number of applications is taken from *all* PCs, not just the PCs purchased in the first year. There is no evidence to suggest that the number of applications on a PC declines over time; the evidence, as the SPA acknowledges, is the opposite. Hence the base figure of three applications per PC does not give an accurate expectation of software sales in a given year; it dramatically inflates the expected sales base, and with it the piracy estimate.
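Before turning to a concrete example of how this inflation occurs, it may help to restate the basic calculation as a short sketch. The following is only an illustration, written in Python with the illustrative figures from the formula above; the function name and structure are mine, not the SPA's:

    # Sketch of the SPA-style estimate, using the article's illustrative figures.
    def piracy_estimate(pcs_sold, apps_per_pc, apps_sold, avg_price):
        expected = pcs_sold * apps_per_pc       # step 1: expected applications
        pirated = expected - apps_sold          # step 2: assumed pirated copies
        lost_revenue = pirated * avg_price      # step 3: estimated dollar loss
        return pirated, pirated / expected, lost_revenue

    # 1 million Intel-based PCs, 3 applications each, 2.4 million applications
    # sold, and a $152 average price (the 1991 figure).
    pirated, rate, lost = piracy_estimate(1000000, 3, 2400000, 152)
    print(pirated, rate, lost)   # 600000 copies, a 0.2 (20 percent) rate, $91.2 million

With these illustrative inputs the formula yields roughly the 20 percent rate the SPA reports. The criticisms that follow concern the inputs, the apps-per-PC average and the sales base, not the arithmetic.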
Consider this example: Person A purchases a computer and three software programs in 1989, two more programs in 1990, and one more in 1991. Person B purchases a computer and three applications in 1991. Assuming that they are the only ones who purchased software or hardware in 1991, the average number of installed applications per PC is 4.5 (nine applications on two machines), while the number of software sales in 1991 is four. Since only one new PC was sold in 1991, the formula expects 4.5 application sales; awkward fractions aside, the piracy score is .5 (half a program), or roughly an 11 percent piracy rate by the formula above. In reality, every installed application can be matched to a sale, but the method's assumptions inflate the score.

It is currently difficult to assess how severely the inclusion of applications installed on previously purchased computers exaggerates the piracy figure. But if the SPA's current piracy estimate of 20 percent is correct, even a small influence would produce a dramatic inflation of the estimate. The SPA's method of including all installed applications in its base data, while restricting the comparison to applications purchased in the most recent year alone, is to my mind a fatal flaw. In short, the applications on a PC include not only those purchased in the first year, but all of those collected in subsequent years. Further, even if upgrades are included (which would push the piracy score DOWNWARD), the street price of an upgrade is generally a fraction of the cost of a first purchase, and failing to take this into account skews the lost-revenue estimate UPWARD.

2) A second problem involves the reliability (consistency) and validity (accuracy) of the reporting methods behind company-generated data, especially registration-card data. It cannot be assumed that the methodological procedures of different reporting companies are consistent among themselves (which means they may not be reporting the same things) or that their procedures are uniformly accurate. Differing definitions of concepts, variations in the means of tracking and recording data, and differences in representativeness are but a few of the problems affecting reliability and validity. This could skew estimates EITHER upward or downward.

3) The value of lost revenue also is dramatically inflated by other questionable assumptions. For two reasons, it cannot be assumed that every unpurchased program represents a lost sale. First, there is no evidence to support, and much evidence to challenge, the assumption that if I did not possess a copy of dBase or Aldus PageMaker "borrowed" from my employer, I would purchase it. The ethics of such borrowing aside, such an act simply does not represent nearly $1,000 of lost revenue. Second, as an actual example, I (and many others at my university) have dBase and WordPerfect (and many other programs) licitly installed on a home or office PC. These two programs alone have a street value of about $700, and I would report them as "installed" programs in a survey. However, I did not purchase either program, so they would not show up in sales statistics and would therefore be attributed to "piracy." Yet I did not obtain them illicitly: they were obtained under a site license and are installed legally.

Consider another example. When I purchased a PC in 1988, it came (legitimately) loaded with two programs. I bought two more. Now I have four legitimate programs loaded, but only two would show up in normal sales figures. It would seem, from the statistics, that I had two "pirated" programs--two purchased, two unpurchased--even though there were none. BOTH the piracy score and the lost-revenue estimate are skewed UPWARD.
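Worked through numerically with the same formula, the Person A/Person B scenario and the bundled-software example look like this (again, a rough illustration using the numbers above, not an SPA calculation):

    # Person A/Person B scenario (1991): one new PC sold, an average installed
    # base of 4.5 applications per PC, four applications actually purchased.
    expected = 1 * 4.5                   # new PCs sold * average applications per PC
    pirated = expected - 4               # 0.5 "pirated" applications
    print(pirated, pirated / expected)   # 0.5, about 0.11 -- yet every copy was bought

    # Bundled-software example: the PC ships legitimately loaded with two
    # programs, the owner buys two more, but only the two purchases appear
    # in sales statistics.
    expected = 1 * 4
    pirated = expected - 2               # 2 "pirated" applications
    print(pirated, pirated / expected)   # 2, 0.5 -- yet nothing was pirated

In both cases the method reports piracy where none occurred.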
Although the subject of a separate article, the SPA's method also fails to consider the possibility that casual copying and sharing may enhance rather than reduce sales by creating a "software culture" and increasing the visibility of, and end-user facility with, the products. If sharing increases sales, the lost-revenue estimate is skewed UPWARD. Whatever the result, this is an assumption that cannot be discarded without strong empirical evidence.

These are just a few of the problems that inflate the overall picture of piracy, and they are why I cannot accept the SPA's figure as accurate. And if the piracy rate for 1991 is only about 20 percent (and in decline), it would appear that--even if the problem is only mildly inflated--the losses are far, far less (and the problem therefore not as severe) than anti-piracy advocates claim. Yet, despite dramatic evidence of decline on a variety of key indicators, SPA rhetoric, its advocacy for broader and more punitive legislation, and its lucrative and aggressive litigation campaigns continue to escalate.

A caveat: David Tremblay, the SPA's Research Director, makes no claims of total accuracy. He is aware of, and quick to point out, some of the methodological problems. He would not agree with my view of at least some of the problems, and perhaps has remedies for others. In my own discussions with him, he was careful not to speak beyond the data and--like any good methodologist--approached the task of calculating piracy as a puzzle. His own attitude, if I understood him correctly, was that he is more than willing to modify the method if a better procedure can be pointed out. Perhaps I misunderstood him, but I was continually left with the impression that his goal was not to "prove" a preferred outcome, but to refine the data and method to provide as accurate an estimate as possible, whatever answer it might yield. In short, he has no preconceived ideological ax to grind in coming up with his figures.

It should be noted that if a different methodology were used, it is quite possible that both the extent of piracy and the lost-revenue costs *could* be much higher than the SPA's estimates. At stake, however, is *this* methodology, and contrary to SPA claims, *this* methodology appears to INFLATE both the frequency and the costs. None of this alters the fact that SPA press releases and other material appear to manipulate the data to promote a distorted image of piracy.

We can agree that there are those who unethically (and illegally) profit from piracy, and we can agree that if one uses a commercial software program regularly, payment should be made. This does not mean that we must also accept the dramatic image of rampant piracy and multi-billion dollar revenue losses caused by casual "chippers." Software piracy is, according to the SPA's own data, in dramatic decline, and the evidence suggests that this decline is the result of education and awareness rather than coercive litigation. At stake is not whether we accept ripoff, but what we do about it. The statistical method and its results do not seem sufficient to warrant increased demands for tougher piracy laws or for expanded law enforcement attention to what seems to be a declining problem. If I am correct in judging that the SPA's estimate of piracy is significantly inflated, then the SPA is engaging in hyperbole to justify its highly publicized litigation campaign. Some might find this a good thing.
My own concern, however, is that the litigation campaign is a revenue-generating enterprise that--to use the SPA's own promotional literature--resembles a law unto itself, more akin to a bounty hunter than a public-interest group. The SPA appears to have an image problem, and at the root of it lies what some critics see as speaking beyond the data in describing piracy and using the law to fill its coffers. It is unfortunate that the many valuable things the SPA does are overshadowed by its self-laudatory, high-profile image as a private law enforcement agency.

The methodology underlies an ideology that opposes not just violations of intellectual property, but ordinary human interaction and social norms. In promoting a zero-tolerance attitude toward a strict definition of "piracy" and rigid adherence to the limitations of shrinkwrap licenses, the SPA would isolate the casual swapper and criminalize non-predators along with the major predators. As Richard Stallman, a promoter of free software, argues in the first issue of _Wired_ magazine (p. 34), violation of shrinkwrap licenses is called piracy, but he views sharing as being a "good neighbor": "I don't think that people should ever make promises not to share with their neighbor."

It is that gray area between being a good neighbor and crossing over into unacceptable behavior that, to my mind, poses the dilemma over which there is room for considerable honest intellectual disagreement.