Criminal Justice Section
Criminal Justice Magazine
Volume 16, Issue 2
Sex Offender Civil Commitments: Scientists or Psychics?
By Donna Cropp Bechman
In June 1997, the U.S. Supreme Court gave its imprimatur to state laws that seek to involuntarily commit sex offenders to mental institutions in lieu of releasing them at the conclusion of their prison sentences. In a five-to-four decision, the Court in Kansas v. Hendricks, 521 U.S. 346, 358 (1997), upheld the state's Sexually Violent Predator Act against constitutional challenges, characterizing the Act as civil in nature and oriented toward providing treatment to those offenders "who suffer from a volitional impairment rendering them dangerous beyond their control." The decision makes arguments under substantive due process, double jeopardy, and ex post facto theories unavailable, despite the fact that many of the state laws would confine offenders indefinitely. As a result, Sexually Violent Predator (SVP) Acts (known in some jurisdictions as Sexually Violent Persons Acts) have been implemented in about 25 percent of the states, giving rise to the question: Can we determine with an appropriate degree of certainty what an individual will or will not do in the future?
How likely is "likely"?
With the exception of characterizing the SVP Acts as civil in nature, there is little consistency among the states with respect to defining terms, establishing evidentiary burdens, and providing treatment options. Similarly, although these Acts require a finding of likelihood of reoffense, different jurisdictions define the term as "substantially likely," "more likely than not," and "much more likely than not." Regardless of the definition used, each SVP Act seeks in some form and to some degree a quantification of the probability of recidivism.
Predictions of future behavior, including risk of recidivism (also known as risk assessment), have long been a part of psychological science. And expert testimony regarding an offender's risk to reoffend has historically been admissible in the courtroom. (See, e.g., Barefoot v. Estelle, 463 U.S. 880, 894-906 (1983).) In the past, mental health professionals used clinical assessment (also referred to as unaided clinical judgment), a largely subjective approach upon which risk predictions were derived from personal interviews, data obtained from psychological tests, and historical information applied against the clinician's background of education, training, and experience. A more recent approach to risk assessment, referred to as guided (or structured) clinical assessment, involves a consideration of the clinically derived information viewed in light of the presence or absence of elements known as risk factors that have been identified in scientific studies as contributing to the risk of recidivism.
Both of these approaches result in an assessment of relative risk, which is the risk in relation to the average risk posed by other offenders. Experts testifying to relative risk will ordinarily explain the risk using phrases that range from "very low risk" to "very high risk." Testimony, for example, that, "compared to others who are at risk to reoffend, this person poses a low to moderate risk of recidivism," is an assessment of relative risk.
Neither unaided clinical judgment nor guided clinical judgment was designed to answer the question presented by most SVP Acts: Just how likely is this person to reoffend? As a result, the psychological, psychiatric, and legal communities have begun to explore the possibility of quantifying risk, an exploration that has deeply divided scientists and professionals in this area. Some call it junk science while others claim to have discovered a scientifically reliable method of predicting recidivism. Both sides are referring to the use of risk assessment instruments (RAIs).
RAIs: quantifying the risk
The term "risk assessment instrument" refers to the use of actuarial, or statistically based, profiles to provide a quantification of risk, or to predict the probability of absolute risk, as opposed to providing a relative risk assessment. In its simplest form, an RAI is the compilation of a series of historical or "static" risk factors that have been numerically weighted according to the significance each factor is believed to have for the likelihood of reoffending. For example, in the scientific literature, follow-up studies of sex offenders who have committed new sex offenses after release from confinement show they have certain factors in common, such as number of prior convictions and sex of the victim, that are considered to be factors contributing to risk. Scientists have studied and reported on a variety of identifiable risk factors that include the relationship of the victim to the offender, marital history, and age of the offender at the time of first offense. These historical factors, said to be static because they happened in the past and generally are not subject to change, are weighted numerically according to perceived impact on recidivism, and, through mathematical formulae, result in a statistical profile that is then proffered as a calculation of the likelihood of reoffending over a given period of time. For example, a score of 4 on one RAI corresponds to a 48.6 percent likelihood of reoffending over a 10-year period. This figure comes from the studies upon which this RAI was constructed: of the offenders in a past study who possessed risk factors that would have resulted in a score of 4 on this instrument, 48.6 percent reoffended within 10 years.
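The mechanics described above can be sketched in a few lines of Python. This is only an illustration of the general weighted-sum-and-lookup idea, not the scoring scheme of any actual instrument: the factor names and weights are hypothetical, and the only number drawn from the discussion above is the pairing of a score of 4 with a 48.6 percent 10-year rate.

```python
# Hypothetical factor weights, invented for illustration only.
# They are NOT taken from the RRASOR, Static-99, or any other instrument.
HYPOTHETICAL_WEIGHTS = {
    "prior_sex_offense_convictions": 2,
    "male_victim": 1,
    "unrelated_victim": 1,
    "first_offense_before_age_25": 1,
}

# The one score-to-rate pairing stated in the article: a score of 4 on one
# (unnamed) instrument corresponds to 48.6 percent over 10 years.
# Other scores are deliberately left out rather than invented.
GROUP_RATE_BY_SCORE = {4: 0.486}


def score_offender(factors_present):
    """Add up the weights of the static factors found in the offender's history."""
    return sum(
        weight
        for name, weight in HYPOTHETICAL_WEIGHTS.items()
        if name in factors_present
    )


def reported_group_rate(score):
    """Return the reoffense rate observed in the construction sample for this
    score, or None if no rate is recorded for it."""
    return GROUP_RATE_BY_SCORE.get(score)


if __name__ == "__main__":
    history = {"prior_sex_offense_convictions", "male_victim", "unrelated_victim"}
    total = score_offender(history)
    print(total, reported_group_rate(total))
```

Note what the lookup returns: the proportion of a past group with the same score who reoffended. Whether that group figure can be read as one individual's probability is the question taken up in the sections that follow.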
The great debate: Is it science?
Undeniably, sex offenders are viewed by the general public as morally reprehensible and dangerous human beings, and the availability of scientific instruments that can tell us with precision and in advance who among them are going to reoffend is an enticing concept. Although far from perfect, RAIs are characterized by some professionals as a decided improvement over predictions based upon the unaided clinical judgment, and some SVP states even mandate their use in commitment evaluations. RAIs have been described as "the most exciting and potentially most useful new developments" in the sex offender risk assessment arena. (Richard Hamill, Recidivism of Sex Offenders: What You Need to Know, 15 (No. 4) CRIM. JUST. 24 (Winter 2001).) Expert testimony based in whole or in part upon scores obtained on RAIs has been routinely (and often without legal challenge) admitted in SVP commitment proceedings across the country.
But there is another viewpoint expressed in the scientific journals as well as in the courtrooms. It is the viewpoint of scientists, clinicians, and other professionals who are stating rather persuasively what has long been obvious to many who have closely studied this issue: that not even the best science can predict with an acceptable degree of accuracy what an individual will or will not do in the future. Perhaps even more importantly, there is increasing support for the position that, not only are RAIs not the "best science," they are not science at all.
The five fatal flaws of RAIs
The RAIs most commonly relied upon in SVP commitments certainly appear to carry the aura of legitimate scientific theory, and even the names of these instruments seem to imply that someone somewhere has definitively established their usefulness in this area. Reference is often made to the Violence Risk Appraisal Guide (VRAG), one of the earliest RAIs, as well as to the Rapid Risk Assessment for Sexual Offense Recidivism (RRASOR), the Sex Offender Risk Appraisal Guide (SORAG), and the Minnesota Sex Offender Screening Tool-Revised (MnSOST-R), all among the most commonly used instruments in SVP evaluations.
No two RAIs are exactly alike, and the number of risk factors to be evaluated ranges from four on the RRASOR to 21 on the MnSOST-R. As to the various risk factors that comprise the instruments, there is general agreement in the scientific and professional communities that a consideration of appropriate risk factors is important (and even necessary) in making risk assessments. The suggestion, however, that the quantifications provided by RAIs are accurate enough to allow decisions to be made about individual risk in SVP proceedings is garnering substantial opposition. Opponents commonly point to five major problem areas with respect to the use of RAIs in SVP commitment evaluations.
- The dynamic dilemma. One commonly referred to RAI is known as the Static-99, so called because it was developed in 1999 and as an acknowledgment that RAIs traditionally apply a mathematical weighting of various "static" risk factors (those historical factors that will never change) and exclude "dynamic" factors (those that are subject to change, or have yet to occur). Perhaps the situation that best illustrates how dramatically dynamic factors can affect recidivism is that in which an offender exhibits an unequivocal intention and ability to commit a sex offense at the earliest opportunity. Surely, that individual can be considered with sufficient accuracy to be at relatively high risk to reoffend, yet none of the RAIs include an offender's expressed intentions as a risk factor.
Similarly, all but one of the most commonly used RAIs completely ignore the relationship of active participation in sex offender treatment to an individual's risk of recidivism, and none of these RAIs give numeric weight to factors that are considered to be "protective" or to reduce risk. For instance, the scientific literature suggests that sex offenders who develop healthy relationships with sexually appropriate partners may be at a reduced risk to reoffend. The same can be said of those offenders who successfully avail themselves of pharmacological treatment options. Being placed in a supervised setting, such as lifetime probation, abstaining from drugs or alcohol, developing a debilitating illness, and having strong support from families and friends are all potential dynamic factors, or protective characteristics, that may significantly lower risk. Because they cannot be determined from a review of the historical data, however, and, in general, may be unknown or even unknowable as of the date of the evaluation, they are not taken into consideration when quantifying risk using an RAI. There is little dispute that a consideration of dynamic factors can be important in accurately assessing risk, and even proponents of RAIs concede that the failure of the instruments to address dynamic factors negatively impacts the comprehensiveness of the instruments.
- Peer review studies or fugitive literature? Much discussion and great dissension surround the availability of published, peer-reviewed articles describing in detail the construction, or development, of the RAIs. Referred to as "construction studies," such articles provide detailed information about the background, technical data, and procedures from which a scientific theory or technique has been promulgated. Proponents of the use of RAIs in SVP proceedings often rely upon materials submitted at conferences or data elicited during conference presentations, as well as information circulated via the Internet and various websites, as support for the developmental integrity of many of the instruments. The criticism of this type of data, which is referred to as fugitive literature, is that articles derived from such sources have been neither submitted for nor subjected to the peer review process and, therefore, lack the authority of data that have been held up to an appropriate level of scientific scrutiny. With very limited exceptions, detailed information establishing the protocol of the construction studies and adequately documenting the technical development of the most commonly used RAIs has not been sufficiently presented for peer review by the relevant scientific community.
- An unknown margin of error. No matter who you ask, there is no one, not even the developers of the most widely used RAIs, who claims 100 percent accuracy of the risk quantifications derived from the application of the instruments. Even Dr. R. Karl Hanson, affiliated with the development of the RRASOR and the Static-99, asserts that the instruments are believed to be of only "modest" accuracy. There is, therefore, no dispute as to the existence of the possibility of error each time an RAI is used to predict percentage of risk. The query then becomes: What is the margin of error, or "error rate," for each RAI, and is it an acceptably low one? The short answer to that question is that it probably can never be answered at all.
Although published studies have attempted to ascertain approximate error rates for these instruments, many scientists contend that true error rates can never be determined with appropriate accuracy, in part due to the concept referred to as "base rates." Simply put, a base rate refers to the frequency with which an event occurs within a particular population, and the accuracy of any predictive or screening test is dependent upon the base rate of the event being screened for or predicted. All of the published studies upon which the RAIs are constructed contain established base rates for the populations studied. But the base rate of recidivism among the population that includes an individual offender being evaluated for an SVP commitment is unknown and unknowable, and, therefore, so is the error rate. In this respect, these instruments are said to be "postdictive" rather than "predictive," in that they look back at a group of offenders and attempt to make estimates of future risk based upon observations made with regard to historical samples. While base rates can be, and have been, determined as to the historical sample, they cannot be absolutely ascertained as to the population that includes the offender being evaluated. In response to this very legitimate concern, attempts have been made to establish the error rates of some RAIs by calculating different estimated base rates, and these estimates may well have their place in assessments that evaluate relative risk. But opponents of the use of RAIs to quantify future risk in SVP commitment proceedings often refer to these attempts as merely educated guesses, and point to the sophistry inherent in attempting to ascertain the accuracy of tests being utilized to make absolute-risk predictions about the future.
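The dependence of a screening test's accuracy on the base rate can be shown with a short calculation using Bayes' rule. The sketch below is illustrative only: the sensitivity and specificity figures are assumed values chosen for the example, not measured properties of any RAI, and the base rates are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Share of people flagged as likely reoffenders who actually reoffend.

    sensitivity: fraction of true reoffenders the test flags
    specificity: fraction of non-reoffenders the test correctly clears
    base_rate:   fraction of the evaluated population who reoffend
    """
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no one is flagged under these inputs")
    return true_positives / flagged


# Assumed test characteristics, for illustration only.
ASSUMED_SENSITIVITY = 0.70
ASSUMED_SPECIFICITY = 0.70

if __name__ == "__main__":
    for base_rate in (0.10, 0.30, 0.50):  # hypothetical base rates
        ppv = positive_predictive_value(
            ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, base_rate
        )
        print(f"base rate {base_rate:.0%}: {ppv:.1%} of flagged offenders reoffend")
```

Under these assumed figures, the same test is right about roughly 21 percent of the people it flags when the base rate is 10 percent, and about 70 percent of them when the base rate is 50 percent. If the base rate for the population being evaluated is unknown, the error rate cannot be pinned down either.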
- Cross-validation studies. The term cross-validation refers to the process whereby the result of one study or test can be replicated on a subsequent, separate population, thereby establishing the integrity of the study or test. Scientific (that is, peer-reviewed) literature does not demonstrate cross-validation of the RAIs. Although proponents of the instruments often cite conference abstracts or other fugitive literature, scientists maintain that the accuracy of a predictive test cannot be established without scientifically scrutinized studies using appropriate cross-validation samples.
Associated with the concept of "generalizability," cross-validation can confirm accuracy of a given test if, used in different contexts on different populations, it "generalizes" well, or replicates its results closely. But failure to scientifically cross-validate RAIs can be fatal because characteristics of offender populations can vary dramatically, and an RAI constructed on one population may not generalize, or cross over, to a different population. The instrument known as the VRAG, for example, which contains a weighted consideration of 14 designated risk factors, was developed in 1994 to assess violent recidivism in a population, or "cohort," of mentally disordered offenders released from a maximum security psychiatric hospital in northern Ontario in the 1970s. The generalizability of the VRAG to other populations, such as pedophiles in the western United States in the year 2001, has not been empirically established, and, in fact, the limited data available suggest that the VRAG has an unacceptable accuracy rate in terms of assessment of sex offense recidivism. The problem presented by population demographics, or cohort characteristics, is vast and potentially insurmountable: Does an RAI derived from a study of a cohort that included convicted rapists generalize to those offenders diagnosed with pedophilia? Is an instrument that excluded noncoercive incest offenders from its construction sample useful in evaluating risk in offenders with a history of noncoercive incest? Elusive as the answers to these questions may be, many scientists and professionals assert that without appropriate, scientifically scrutinized cross-validation, absolute risk predictions based upon RAI scores are meaningless.
In a related argument, many scientists and mental health professionals seriously question whether statistics derived from populations can be said to have predictive value with respect to individual future behavior. SVP commitment proceedings do not ask whether this individual possesses risk factors significantly similar to a group of offenders who reoffended at a rate of, for instance, 60 percent over a 10-year period. The stated goal of most SVP Acts is to determine the likelihood that a particular offender will reoffend, an objective that is not reached with statistical profile evidence. To the extent that postdictive group-risk data are being translated into a prediction of an individual's risk of recidivism, many would argue that RAI quantifications are irrelevant and should not be admitted at all in SVP commitment proceedings.
- Inter-rater reliability. When the same testing procedure is applied to the same person by different clinicians (or raters), any significant disparity among the results gives rise to concerns over the accuracy of the test instrument itself. This concept, known as "inter-rater reliability," is an important part of the development of any testing procedure, including RAIs, yet little is published in peer review literature that establishes whether or not these instruments have been appropriately evaluated for reliability when used by different raters in the field.
Although the concept of assigning a weighted score to a risk factor, adding up the numbers, and achieving a quantification provided by mathematical calculation may sound fairly objective in nature, and, therefore, not easily the subject of inconsistencies among raters, nothing could be further from the truth. Even the very definition of "offense" differs from instrument to instrument. Where one instrument intends by the use of the word "offense" to have the rater take into account offenses charged regardless of conviction, another instrument defines the word "offense" as only that which resulted in conviction. And is a conviction for an attempted sex offense sufficient? What if the conviction is for burglary, but the underlying intent was to commit a rape? If the conviction was later overturned on appeal due to error in the trial court, is it still a conviction? Are all raters in the field answering these questions consistently for each instrument? Are all raters aware that most instruments do not intend to include in their calculations self-reported offenses that never result in a charge or conviction?
The rules for applying the appropriate definition to a particular risk factor, called "coding rules," are set forth for some instruments in informally circulated materials, as well as in documents that are available over the Internet. But even a set of coding rules, assuming the rater has them and uses them, can be so vague and confusing as to lead to a largely subjective interpretation. For instance, an oft-cited risk factor is the use of force or the threatened use of force. At least one instrument intends that this factor be counted if the victim was vulnerable due to intoxication. But whether or not all raters using that instrument are aware that victim intoxication is included in the definition of force is certainly questionable. And the decision as to whether to conclude that a victim was rendered vulnerable due to intoxication is often a subjective one. Similarly, some instruments inquire into the relationship between the offender and the victim, asking whether the victim was a stranger. But different instruments purport to define the word differently. Is the little girl who lives in the same apartment building as the offender a stranger? What about the child who is a nephew of a friend of the offender? One instrument defines a victim as a stranger if the offender had known the victim for less than 24 hours, an arbitrary definition, and, like many of the risk factors listed on the various RAIs, one subject to differing interpretations among raters.
Some clinicians and professionals using RAIs have complicated the dangers of the inter-rater reliability problem even more by adjusting the score obtained on the RAIs. Referred to as "adjusted actuarials," this highly criticized hybrid procedure involves using subjective criteria (i.e., clinical judgment) to alter the scores or percentages achieved on an instrument. A valid argument is made that if, as some professionals argue, RAIs are scientifically based procedures derived from documented studies and approved for use by the scientific community, changing the numbers according to some subjective criteria renders the instruments worthless. To date, there are no scientific studies that establish the accuracy of the adjusted actuarial approach, and many scientists regard the concept with great caution and skepticism.
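One common way to put a number on the inter-rater reliability concern discussed above is Cohen's kappa, a statistic that measures agreement between two raters after discounting the agreement expected by chance. The sketch below is a generic illustration of that statistic, not a procedure drawn from any RAI's documentation, and the two raters' codings of a "stranger victim" item are hypothetical data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same cases.

    Returns 1.0 for perfect agreement and about 0.0 when agreement is no
    better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical codings of "was the victim a stranger?" (1 = yes, 0 = no)
    # by two clinicians reviewing the same eight case files.
    clinician_1 = [1, 1, 0, 0, 1, 0, 0, 1]
    clinician_2 = [1, 0, 0, 0, 1, 0, 1, 1]
    print(round(cohens_kappa(clinician_1, clinician_2), 2))
```

A published reliability study would report figures of this kind item by item and for total scores. The concern raised above is that little such data has appeared in peer-reviewed literature for raters working in the field.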
Admissibility of RAIs
The many flaws inherent in the construction and application of the RAIs, and the intense debate over the appropriateness of using scores derived from the instruments at sex offender commitment proceedings, have created substantial controversy in both the psychological and legal communities. It is true that many courts are admitting RAI evidence without challenge, and the majority of the courts that have considered this issue have found that RAIs are accurate enough to meet evidentiary hurdles. Indeed, courts in at least one SVP state have ruled that questions as to accuracy of the instruments are more properly directed to the weight of the evidence rather than its admissibility. Nevertheless, a growing number of courts are evaluating the admissibility of such evidence under a Frye or Daubert analysis, or some combination thereof, and precluding expert opinions based in whole or in part on quantification provided by RAIs.
The legal challenge to the admissibility of RAI evidence focuses on whether these instruments are accurate enough to have achieved acceptance in the relevant scientific community for the purpose of predicting future behavior in SVP commitment proceedings. Acceptance of a scientific technique, procedure, or theory is the threshold question for admissibility of this evidence in states that follow Frye v. United States, 293 F. 1013 (D.C. Cir. 1923), and is one of the factors to be considered in jurisdictions that adhere to the ruling in Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).
Of course, identification of the "relevant" scientific community is crucial in either a Frye or Daubert analysis; we don't ask only polygraphers, for example, if polygraph tests are accepted as sufficiently accurate in the scientific community. Similarly, the "relevant scientific community" for purposes of analyzing RAIs under Frye or Daubert is not only comprised of the clinicians who administer the instruments in SVP evaluations, but is more broadly defined to include the scientists and forensic psychologists who study them.
And, increasingly, these scientists and mental health professionals who study psychological testing and risk assessment are speaking out about their serious concerns over the use of RAIs in SVP commitment proceedings, and the legal community nationwide is listening. In a December 15, 2000, memorandum decision, a Missouri court found that neither the RRASOR nor the MnSOST-R has gained acceptance in the relevant scientific community. In opining that the instruments did not demonstrate sufficient reliability and relevance under a Daubert analysis, the court concluded that "the State is prohibited from offering or submitting testimony from any psychiatric, psychological or other witness regarding the use of actuarial tests, i.e., the RRASOR and MnSOST-R, in predicting sexual offense recidivism." (In the Matter of James Francis (CV-299-108MH), Circuit Court of Butler County, Missouri, Division III.)
In a recent Florida opinion, the court cited the absence of instruction manuals, concerns over translating group data into predictions about individual behavior, the failure to address the concept of base rates, reliance on assumptions that have not been proven, problems with inter-rater reliability and cross-validation, and the lack of peer review data, in ruling that "the test instruments do not possess the scientific reliability under Frye, nor has the general acceptance of the instruments been established in the scientific community." (State v. Klein, No. 05-1999-CF-08148, Cir. Ct. of the 18th Judicial Cir., Brevard County Florida, June 2, 2000.)
Similarly, in July 2000, an Iowa court concluded that "even using the liberal admissibility criteria utilized in Iowa, the assessment procedures . . . does [sic] not meet the test of admissibility as expert testimony in these proceedings," and listed six areas of concern, including the absence of peer review data, inappropriate application of actuarials to individual behavior, and the potential for misleading the jury about the degree of scientific acceptance of the RAIs. (In re the Detention of Harold Johnson, LACV038974, Iowa Dist. Ct., Story County, Iowa.) In March 2001, an Arizona court agreed, stating "the challenged actuarial instruments have not been accepted by the relevant scientific community to predict future recidivism . . . and will not be admitted." (In re the detention of John Woods, OP 2000-0005, Cochise Cty., Arizona.)
Fundamentally, some courts are beginning to vigorously reject the proposal that science can predict what an individual will or will not do in the future, a position that many scientists, psychologists, and even the American Psychiatric Association are embracing wholeheartedly. Although the concept of identifying the most serious recidivists prior to their reoffending and incapacitating them is appealing, the reality may well be that there is not now, nor may there ever exist, an acceptable means to make such an identification with a sufficient degree of accuracy. Recognizing that human nature is neither a likely nor an appropriate subject for scientific predictions, a three-judge Florida panel, in a consolidated opinion rendered in 14 SVP cases, rejected the admissibility of RAIs using a four-step analysis, which included a consideration of the Frye standard, stating:
We note, in passing, that the underlying assumption of these instruments is that human beings do not change but are programmed to act in the future in accordance with the manner in which they have acted in the past-an assumption which runs contra to the basic principles of our legal and penal system and contra to the premises of clinical psychology and psychiatry. We would further note that the fact that these instruments come clothed in a robe of statistics and presented in terms of mathematical significance makes it likely that any harm done by the improper admission of them would be compounded by the impression such scientific trappings would have on a jury.
(In re: The Commitment of Roberto Valdez, Case No. 99-000045CI, et al., Cir. Ct. of the Sixth Judicial Cir., Pasco and Pinellas Counties, Florida, August 21, 2000.)
Clearly this issue will be reviewed by higher courts, possibly even the U.S. Supreme Court. But the Florida, Missouri, Iowa, and Arizona opinions appear to be based upon solid scientific ground, and those courts that have, upon close scrutiny, identified the flaws inherent in Risk Assessment Instruments, have, even under liberal admissibility standards, declared the instruments to be fatally flawed.
Risk assessment without RAIs
The failure of RAIs to gain acceptance in the scientific community precludes their admissibility at SVP trials, thereby preventing juries from improperly considering this evidence in a scientifically significant light to which it is not entitled. This preclusion, however, affects only the admissibility of absolute-risk assessments, or quantification of future risk. As noted previously, testimony concerning the existence of certain scientifically validated risk factors and the role they play with respect to an individual's relative risk is generally admissible in the courtroom, assuming the appropriate foundation, and scientists continue to study and refine the realm of relative risk predictions. The Manual for Sexual Violence Risk-20 (SVR-20), for instance, which is not a risk assessment instrument but has been mistakenly utilized as such, is a set of scientifically-based guidelines designed to assist in relative risk assessments of sex offenders, and includes a consideration of dynamic aspects as well as historical factors.
Ultimately, however, whether the question posed by most SVP Acts can be answered sufficiently with relative risk assessments is yet to be determined. Science cannot always answer legal queries to a sufficient degree of accuracy, no matter how important those queries may be to society. The obvious concern over the use of these instruments is in incorrectly identifying an individual who would not have offended as one who would have offended (a false positive). But those who proffer that the RAIs are "accurate enough" to be used in SVP commitment evaluations would do well to keep in mind the famous "false negative" lesson taught by the case of Jeffrey Dahmer, the Wisconsin sex offender and serial killer of at least 17 victims. Scored on the RRASOR, Dahmer rates a 36.9 percent risk to reoffend over a 10-year period, a quantification insufficient for commitment under most SVP Acts.
Donna Cropp Bechman is the chief deputy public defender at the Cochise County Public Defender's Office in Bisbee, Arizona. In addition to her regular felony caseload, she represents respondents in sexually violent persons commitment cases.