Date: 01 Feb 93 22:51:51 CST
From: Jim Thomas
Subject: File 3--How does the SPA Calculate Piracy?

The Software Publishers Association (SPA) estimates that software piracy declined between 1989 and 1991. But, says the SPA, piracy still cost the industry over $1.2 billion in lost revenues in 1991. Critics argue that the piracy rate and its costs are grossly over-estimated. The SPA believes that its estimates, while perhaps imperfect, are nonetheless quite conservative and, if anything, significantly underestimate the extent of software piracy. Who's right? How does the SPA arrive at its estimates? The information below comes from SPA documents and from David Tremblay, the SPA's Research Director.

Identifying and counting behaviors that are normally hidden presents several methodological problems, and calculating the extent of piracy is no exception. First, there is no victim in the traditional sense. There are no snatched purses, dead bodies, empty bank accounts, trashed computers, or other directly obvious signs of predation. Therefore, we rarely have direct knowledge of an alleged "offense." Second, the concepts used to define or measure an "offense" can pose particular problems, because definitions are subject to imprecision. Third, "victims" of piracy are often unaware that they are victims until informed by someone who measures victimization, such as the SPA.

The "DARK FIGURE OF CRIME" is the knowledge gap between crimes KNOWN to have occurred and crimes that ACTUALLY occurred. No existing methodology can precisely measure this dark figure, and even the most sophisticated provide only approximations. It is therefore not surprising that the SPA's attempts to measure the "dark figure of piracy" face methodological problems.

The Methodology

Four sets of facts and an assumption underlie the SPA's methodology.

The first set of facts is hardware sales from Dataquest, a market research company in San Jose, Calif. The calculations begin by determining the number of Intel- and Macintosh-based PCs sold during a given year.

The second set of data derives from an SPA reporting program in which about 150 of the generally larger companies report their unit sales and revenue to the SPA. The business applications sales are taken from these reports and used to estimate the total unit sales of software in the U.S. in a given year. Operating systems are excluded. The data do not constitute a random sample, but are based on voluntary self-reporting by the participating companies. This method is common in survey research, and if used with caution, the lack of randomness or representativeness of the population surveyed need not be a problem.

The third set of facts is the average number of applications that users are estimated to have on their personal computers. This body of data comes from member research that is sent back to the SPA. The members obtain this information from several sources, including surveys of their own customer base and returned registration cards. The SPA estimates that the typical DOS (or Intel-based) PC user has three applications, and the typical Macintosh user has five. One reason that Mac users may have more applications than Intel-based users is ease of use and the cross-learning between different Mac programs, which reduces the learning curve and better integrates the programs with each other.

The fourth datum is the average price for a software program in a given year.
However, in calculating the total dollar volume of revenues lost to piracy, David Tremblay indicates that "street value" prices are factored in, rather than assuming that each program would sell for market list price.

Finally, the methodology rests on the ASSUMPTION that all of the units of software purchased in a calendar year are purchased by or for use on PCs that are new that year. It assumes no application sales to computers purchased in previous years.

These data are then plugged into a formula (figures are illustrative):

1. The PC hardware sales (in number of units) are multiplied by the average number of applications used. If 1 million Intel-based units are sold, and each has 3 commercial software applications (excluding the operating system itself), we get a figure of 3 million.

2. The number of applications purchased during that year is subtracted from that figure. If 2.4 million applications are sold, the difference is 600,000. This is assumed to be the number of applications pirated.

3. The number of applications pirated is then multiplied by the average cost of a software package, which has declined from $189 in 1989 to $152 in 1991.

David Tremblay candidly recognizes the methodological problems, although he feels that, on balance, they understate rather than overstate the level of piracy. He acknowledges several market factors that could affect the estimates (the skewing directions are my own):

1) Since 1989, the average price per software application has decreased. This skews the dollar-loss figures DOWNWARD from year to year.

2) Hardware sales have been revised downward by Dataquest, which reduces the base number of PCs on which piracy estimates are based. This skews the piracy estimate UPWARD.

3) Contrary to the assumption of "no application sales to installed base," there is evidence that an increasing percentage of software is being sold for use on existing PCs. This skews the piracy estimate UPWARD.

There are additional problems. Among them:

1) The total software sales include sales of upgrades. This would seem to under-estimate the extent of illicit software, because it over-estimates the base figure of software sold. For example, if 100 PCs are sold in a given year, and each PC has an average of three applications, we would expect 300 applications to be sold. If we find that only 270 applications are sold, the "piracy score" would be 300 - 270 = 30; 30/300 = .1, or ten percent. If 20 percent of those 270 sales are upgrades, however, only 216 are first purchases, and the score becomes 300 - 216 = 84; 84/300 = .28, or a 28 percent piracy rate. Including upgrades in the sales base thus skews the piracy estimate DOWNWARD but the costs of piracy UPWARD.

This apparent downward skew, however, is misleading, because the base number of applications is taken from *all* PCs, not just the PCs purchased in the first year. There is no evidence to suggest that the number of applications on a PC declines over time; the evidence, as the SPA acknowledges, is the opposite. Hence the base figure of three applications per PC does not give an accurate expectation of software sales in a given year; it dramatically inflates the expected sales base, and with it the piracy estimate.
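Before turning to a concrete example of how this inflation occurs, it may help to restate the basic calculation as a short sketch. The following is only an illustration, written in Python with the illustrative figures from the formula above; the function name and structure are mine, not the SPA's:

    # Sketch of the SPA-style estimate, using the article's illustrative figures.
    def piracy_estimate(pcs_sold, apps_per_pc, apps_sold, avg_price):
        expected = pcs_sold * apps_per_pc       # step 1: expected applications
        pirated = expected - apps_sold          # step 2: assumed pirated copies
        lost_revenue = pirated * avg_price      # step 3: estimated dollar loss
        return pirated, pirated / expected, lost_revenue

    # 1 million Intel-based PCs, 3 applications each, 2.4 million applications
    # sold, and a $152 average price (the 1991 figure).
    pirated, rate, lost = piracy_estimate(1000000, 3, 2400000, 152)
    print(pirated, rate, lost)   # 600000 copies, a 0.2 (20 percent) rate, $91.2 million

With these illustrative inputs the formula yields roughly the 20 percent rate the SPA reports. The criticisms that follow concern the inputs, the apps-per-PC average and the sales base, not the arithmetic.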
Consider this example: Person A purchases a computer and three software programs in 1989, two more programs in 1990, and one more in 1991. Person B purchases a computer and three applications in 1991. Assuming that they are the only ones who purchased software or hardware in 1991, the average number of installed applications per PC is 4.5 (nine applications on two machines), while the number of software sales in 1991 is four. Since only one new PC was sold in 1991, the formula expects 4.5 application sales; awkward fractions aside, the piracy score is .5 (half a program), or roughly an 11 percent piracy rate by the formula above. In reality, every installed application can be matched to a sale, but the method's assumptions inflate the score.

It is currently difficult to assess how severely the inclusion of applications installed on previously purchased computers exaggerates the piracy figure. But if the SPA's current piracy estimate of 20 percent is correct, even a small influence would produce a dramatic inflation of the estimate. The SPA's method of including all installed applications in its base data, while restricting the comparison to applications purchased in the most recent year alone, is to my mind a fatal flaw. In short, the applications on a PC include not only those purchased in the first year, but all of those collected in subsequent years. Further, even if upgrades are included (which would push the piracy score DOWNWARD), the street price of an upgrade is generally a fraction of the cost of a first purchase, and failing to take this into account skews the lost-revenue estimate UPWARD.

2) A second problem involves the reliability (consistency) and validity (accuracy) of the reporting methods behind company-generated data, especially registration-card data. It cannot be assumed that the methodological procedures of different reporting companies are consistent among themselves (which means they may not be reporting the same things) or that their procedures are uniformly accurate. Differing definitions of concepts, variations in the means of tracking and recording data, and differences in representativeness are but a few of the problems affecting reliability and validity. This could skew estimates EITHER upward or downward.

3) The value of lost revenue also is dramatically inflated by other questionable assumptions. For two reasons, it cannot be assumed that every unpurchased program represents a lost sale. First, there is no evidence to support, and much evidence to challenge, the assumption that if I did not possess a copy of dBase or Aldus PageMaker "borrowed" from my employer, I would purchase it. The ethics of such borrowing aside, such an act simply does not represent nearly $1,000 of lost revenue. Second, as an actual example, I (and many others at my university) have dBase and WordPerfect (and many other programs) licitly installed on a home or office PC. These two programs alone have a street value of about $700, and I would report them as "installed" programs in a survey. However, I did not purchase either program, so they would not show up in sales statistics and would therefore be attributed to "piracy." Yet I did not obtain them illicitly: they were obtained under a site license and are installed legally.

Consider another example. When I purchased a PC in 1988, it came (legitimately) loaded with two programs. I bought two more. Now I have four legitimate programs loaded, but only two would show up in normal sales figures. It would seem, from the statistics, that I had two "pirated" programs--two purchased, two unpurchased--even though there were none. BOTH the piracy score and the lost-revenue estimate are skewed UPWARD.
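Worked through numerically with the same formula, the Person A/Person B scenario and the bundled-software example look like this (again, a rough illustration using the numbers above, not an SPA calculation):

    # Person A/Person B scenario (1991): one new PC sold, an average installed
    # base of 4.5 applications per PC, four applications actually purchased.
    expected = 1 * 4.5                   # new PCs sold * average applications per PC
    pirated = expected - 4               # 0.5 "pirated" applications
    print(pirated, pirated / expected)   # 0.5, about 0.11 -- yet every copy was bought

    # Bundled-software example: the PC ships legitimately loaded with two
    # programs, the owner buys two more, but only the two purchases appear
    # in sales statistics.
    expected = 1 * 4
    pirated = expected - 2               # 2 "pirated" applications
    print(pirated, pirated / expected)   # 2, 0.5 -- yet nothing was pirated

In both cases the method reports piracy where none occurred.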
Although the subject of a separate article, the SPA's method also fails to consider the possibility that casual copying and sharing may enhance rather than reduce sales by creating a "software culture" and increasing the visibility of, and end-user facility with, the products. If sharing increases sales, the lost-revenue estimate is skewed UPWARD. Whatever the result, this is an assumption that cannot be discarded without strong empirical evidence.

These are just a few of the problems that inflate the overall picture of piracy, and they are why I cannot accept the SPA's figure as accurate. And if the piracy rate for 1991 is only about 20 percent (and in decline), it would appear that--even if the problem is only mildly inflated--the losses are far, far less (and the problem therefore not as severe) than anti-piracy advocates claim. Yet, despite dramatic evidence of decline on a variety of key indicators, SPA rhetoric, its advocacy for broader and more punitive legislation, and its lucrative and aggressive litigation campaigns continue to escalate.

A caveat: David Tremblay, the SPA's Research Director, makes no claims of total accuracy. He is aware of, and quick to point out, some of the methodological problems. He would not agree with my view of at least some of the problems, and perhaps has remedies for others. In my own discussions with him, he was careful not to speak beyond the data and--like any good methodologist--approached the task of calculating piracy as a puzzle. His own attitude, if I understood him correctly, was that he is more than willing to modify the method if a better procedure can be pointed out. Perhaps I misunderstood him, but I was continually left with the impression that his goal was not to "prove" a preferred outcome, but to refine the data and method to provide as accurate an estimate as possible, whatever answer it might yield. In short, he has no preconceived ideological ax to grind in coming up with his figures.

It should be noted that if a different methodology were used, it is quite possible that both the extent of piracy and the lost-revenue costs *could* be much higher than the SPA's estimates. At stake, however, is *this* methodology, and contrary to SPA claims, *this* methodology appears to INFLATE both the frequency and the costs. None of this alters the fact that SPA press releases and other material appear to manipulate the data to promote a distorted image of piracy.

We can agree that there are those who unethically (and illegally) profit from piracy, and we can agree that if one uses a commercial software program regularly, payment should be made. This does not mean that we must also accept the dramatic image of rampant piracy and multi-billion dollar revenue losses caused by casual "chippers." Software piracy is, according to the SPA's own data, in dramatic decline, and the evidence suggests that this decline is the result of education and awareness rather than coercive litigation. At stake is not whether we accept ripoff, but what we do about it. The statistical method and its results do not seem sufficient to warrant increased demands for tougher piracy laws or for expanded law enforcement attention to what seems to be a declining problem. If I am correct in judging that the SPA's estimate of piracy is significantly inflated, then the SPA is engaging in hyperbole to justify its highly publicized litigation campaign. Some might find this a good thing.
My own concern, however, is that the litigation campaign is a revenue-generating enterprise that--to use the SPA's own promotional literature--resembles a law unto itself, more akin to a bounty hunter than a public-interest group. The SPA appears to have an image problem, and at the root of it lies what some critics see as speaking beyond the data in describing piracy and using the law to fill its coffers. It is unfortunate that the many valuable things the SPA does are overshadowed by its self-laudatory, high-profile image as a private law enforcement agency.

The methodology underlies an ideology that opposes not just violations of intellectual property, but ordinary human interaction and social norms. In promoting a zero-tolerance attitude toward a strict definition of "piracy" and rigid adherence to the limitations of shrinkwrap licenses, the SPA would isolate the casual swapper and criminalize non-predators along with the major predators. As Richard Stallman, a promoter of free software, argues in the first issue of _Wired_ magazine (p. 34), violation of shrinkwrap licenses is called piracy, but he views sharing as being a "good neighbor": "I don't think that people should ever make promises not to share with their neighbor."

It is that gray area between being a good neighbor and crossing over into unacceptable behavior that, to my mind, poses the dilemma over which there is room for considerable honest intellectual disagreement.