Blanchard, Clemmensen, & Steiner (1985), "Social Desirability Response Set and Systematic Distortion in the Self-Report of Adult Male Gender Patients," is presented by J. Michael Bailey as evidence for his world-view. A closer examination of the study indicates that transsexuals are remarkably honest under the circumstances.
This is page 7 of 9.         
Ray Blanchard, Leonard H. Clemmensen, and Betty W. Steiner (1985) examined how the self-presentation concerns of 'adult male gender patients' relate to how they describe themselves when asked how closely they fit the stereotypical view of a transsexual. I interpret these results as showing something about the circumstances transsexuals live in: fulfilling a certain societal expectation is important to transsexuals, especially in an environment where they are being evaluated for much-desired hormones and sex reassignment surgery. As he reports on his website, J. Michael Bailey sees this study as important for explaining why you should believe his accounts of transsexuality rather than the differing personal accounts of transsexuals: "I am not rejecting the claims [of transsexuals] for no reason." "There is good scientific evidence that says you should believe me and not them." As I previously discussed, there is really no question of whom to believe, since the data supporting Blanchard's theory are what transsexuals actually say. Nevertheless, as noted earlier, the suggestion that transsexuals are deceptive is held by numerous supporters of Blanchard's model. Is their cynicism justified?
As previously discussed, J. Michael Bailey suggests that something about autogynephilia makes 'heterosexual' transsexuals obsessed with projecting their self-image so that others believe in the women they see themselves as. One way to test Bailey's claim is to measure how concerned transsexuals are with their self-presentation in an environment where projecting their woman-selves is at the forefront of their minds. If Bailey is right, 'heterosexual' transsexuals should be hyper-vigilant about their self-presentation in such a setting. Blanchard, Clemmensen, & Steiner (1985) provide excellent data for this purpose. They measured self-presentation among clients at the Clarke gender identity clinic and divided their participants into 'heterosexual' and 'homosexual' gender patients. From Bailey's perspective, the 'heterosexual' group is assumed to have autogynephilia, and the 'homosexual' group provides an excellent comparison group (i.e., control condition) because its members are in the exact same circumstances but presumably lack autogynephilia.
When Blanchard compared the groups, far from finding a hyper-vigilant 'heterosexual' group (i.e., a ceiling effect), he found that both groups had about the same level of self-presentation concerns. On a scale from 0% (no self-presentation concerns) to 100% (total self-presentation concerns), the average 'heterosexual' (i.e., autogynephilic) patient scored 54% (s.d., 16%) and the average 'homosexual' patient scored 61% (s.d., 17%). Ray Blanchard's evidence is inconsistent with Mike Bailey's conjecture that 'heterosexual' transsexuals are obsessed with projecting their self-image. In fact, Blanchard, Clemmensen, & Steiner (1985, p. 514) say this explicitly: "The mean social desirability scores ... suggest that the homosexual subjects had, if anything, a greater tendency to present themselves in a favorable light than the heterosexual subjects. Thus, one should not conclude that the homosexual subjects were generally more honest or more accurate from the present finding ..." (emphasis in original).
Blanchard, Clemmensen, & Steiner (1985) correlated self-presentation in adult male gender patients with 8 measures of qualities that were either stereotypical of, or contrary to the stereotypes of, classic transsexualism. They found that gender patients systematically distorted their responses to look more like stereotypical transsexuals. This held for the sample as a whole and for the 'heterosexual' sub-group alone, but for only 1 measure in the 'homosexual' group. The pattern of evidence clearly shows that gender patients' self-descriptions are systematically distorted toward stereotypes of transsexuals. Is this distortion due to autogynephilia or to the circumstances?
To tease apart autogynephilia from circumstances in explaining systematic distortions by gender patients, we need a complicated but common statistical model: hierarchical linear regression. This type of model has an outcome we are trying to explain. Let's take an example (chosen because it is typical) so that I can explain this concretely: what explains individual differences in how sexually aroused gender patients say they become from cross-dressing? We have various possible explanations: circumstances for self-presentation, sexual orientation, and self-presentation biases linked to a sexual orientation. We can add these explanations to the statistical model one at a time and see what percentage of the individual differences in self-reported arousal to cross-dressing (i.e., the change in R^2) each new explanation accounts for on top of the earlier ones. I would suggest entering the possible explanations in the order just listed: self-presentation, sexual orientation, and the self-presentation by sexual-orientation interaction. Unfortunately, Blanchard reported a series of correlations instead of the full hierarchical linear regression, so teasing apart circumstances from autogynephilia will take a bit of inference.
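The three-step model described above can be sketched as follows. This is an illustration only: the data are simulated (they are NOT Blanchard et al.'s data), the effect sizes in the generating model are hypothetical, and only the group sizes (64 and 51) come from the paper. Each step adds predictors, and the change in R^2 at each step is the share of individual differences the new explanation adds on top of the earlier ones.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Hypothetical simulated data, NOT the study's data.
rng = np.random.default_rng(0)
n_het, n_hom = 64, 51  # group sizes reported in the paper
orientation = np.concatenate([np.ones(n_het), np.zeros(n_hom)])  # 1 = 'heterosexual'
n = orientation.size
self_pres = rng.normal(0.0, 1.0, n)  # simulated, standardized social-desirability score
# Assumed generating model: two main effects and NO interaction.
arousal = 1.2 * orientation - 0.3 * self_pres + rng.normal(0.0, 1.0, n)

sp_c = self_pres - self_pres.mean()  # center before forming the product term
ones = np.ones(n)
step1 = np.column_stack([ones, sp_c])                                   # self-presentation
step2 = np.column_stack([ones, sp_c, orientation])                      # + sexual orientation
step3 = np.column_stack([ones, sp_c, orientation, sp_c * orientation])  # + interaction

r2 = [r_squared(X, arousal) for X in (step1, step2, step3)]
deltas = [r2[0], r2[1] - r2[0], r2[2] - r2[1]]
for label, d in zip(["self-presentation", "+ orientation", "+ interaction"], deltas):
    print(f"{label:18s} change in R^2 = {d:.4f}")
```

Because the simulated data contain no interaction, the last change in R^2 should come out near zero; a real interaction of the kind Bailey's conjecture requires would show up as a clearly positive value at that step.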
First we put self-presentation into our statistical model and find that it explains 8.41% of the individual differences in arousal to cross-dressing. The contribution of self-presentation to the other outcome measures ranged from 7.29% to 12.25%. Next we put sexual orientation into the model. Since both sexual orientation groups had similar self-presentation scores, we might guess that there is little overlap between the variance explained by sexual orientation and the variance already explained by self-presentation. Sexual orientation explains up to an additional 22.97% of the individual differences in sexual arousal to cross-dressing. Finally we put the "interaction" between sexual orientation and self-presentation into the model. This is the most important step for testing Bailey's conjecture, because it tells us how much of the individual differences in arousal to cross-dressing is explained by something specific to the self-presentation of 'heterosexuals'. Unfortunately, to the best of my knowledge, Blanchard does not provide enough data for us to estimate a percentage. We can still get a qualitative picture, however, by comparing the sample as a whole with the 'heterosexual' sub-group. If there is an interaction, then the correlations should become more statistically significant when we remove the 'homosexual' group from the entire sample, because, if Bailey is correct, the 'homosexual' gender patients were cluttering the true correlations with statistical 'noise'. If there is no interaction, then the correlations should become less statistically significant when we remove the 'homosexual' group, simply because fewer participants remain in the study. Of the eight correlations, 5 became weaker, 3 stayed the same, and none became stronger.
Removing the 'homosexual' sub-group largely hurt statistical significance, because those participants added statistical power rather than cluttering the results. In other words, the evidence is inconsistent with Bailey's conjecture that something specific to autogynephilia systematically distorts self-descriptions.
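The "fewer participants" half of the argument above can be made concrete with the standard t statistic for testing a Pearson correlation against zero, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below assumes, hypothetically, that the 'heterosexual' sub-group showed exactly the same correlation as the whole sample (r = -0.29); it is not a reanalysis of the paper.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing a Pearson correlation r against zero with n cases (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

r = -0.29  # whole-sample correlation: self-presentation vs. arousal to cross-dressing
t_full = t_for_correlation(r, 64 + 51)  # whole sample, n = 115
t_sub = t_for_correlation(r, 64)        # 'heterosexual' sub-group only, IF r were unchanged
print(f"n = 115: t = {t_full:.2f}")
print(f"n = 64:  t = {t_sub:.2f}")
```

With an unchanged correlation, t drops from roughly -3.2 to roughly -2.4 purely because the sample shrinks. Weaker significance after removing the 'homosexual' group is therefore what we would expect if that group was adding participants rather than noise.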
Though Bailey's conjectures about obsessive lies are clearly not supported by the data, these statistical results leave us with something odd to explain. How could the correlations be presented in a way that appeared to support his conjecture? Stated more precisely, how could there be so little evidence for a sexual-orientation by self-presentation interaction, yet correlations so consistent in the 'heterosexual' group and not in the 'homosexual' group? How could the 'homosexual' sub-group contribute to the overall significance yet not show significant correlations among themselves? The answer is a statistical phenomenon. The standard deviations for the 'homosexual' group were, on average, under half the size of those for the 'heterosexual' group, and the 'homosexual' group was, on average, twice as close to the floor or ceiling as the 'heterosexual' group. This suggests any effects in the 'homosexual' group may have been dampened by floor or ceiling effects. Blanchard et al. (1985, p. 513) acknowledge this possibility as well: "... one possible cause for the lack of social desirability correlations in the 'homosexual' group is their uniform tendency to produce extreme scores on all the ... measures."
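A small simulation shows how a ceiling can dampen a correlation that genuinely exists. Everything here is hypothetical: the scores are simulated, and the ceiling is placed low enough that most simulated cases pile up at the maximum score, mimicking a group that tends to give extreme answers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)          # simulated predictor (e.g., a self-presentation score)
y = x + rng.normal(size=n)      # simulated outcome with a genuine linear relationship
ceiling = -1.0                  # hypothetical ceiling: most cases end up at the maximum
y_capped = np.minimum(y, ceiling)  # scores above the ceiling are recorded at the ceiling

r_true = np.corrcoef(x, y)[0, 1]
r_capped = np.corrcoef(x, y_capped)[0, 1]
print(f"correlation without ceiling: {r_true:.2f}")
print(f"correlation with ceiling:    {r_capped:.2f}")
```

The underlying relationship is identical in both cases; only the measurement is truncated. The capped scores have a smaller spread, and the observed correlation shrinks accordingly.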
As previously mentioned, the 'homosexual' gender patients were less diverse in their responses to the 8 measures than the 'heterosexual' gender patients. What does this say about transsexuals? To find out, we need to look at how Blanchard et al. (1985) selected their participants and how they classified them as 'heterosexual' or 'homosexual'. When Blanchard says "gender patients", he includes both transsexuals and cross-dressers. The 'homosexual' gender patients were almost entirely transsexuals according to Blanchard's criteria (i.e., 96% had felt like a woman consistently for at least one year). In contrast, only 69% of the 'heterosexual' gender patients were transsexuals. This way of selecting participants made the 'heterosexual' group more diverse than the 'homosexual' group. Worse, because of this diversity, the results conflate sexual orientation with transsexuality.
To classify participants as 'heterosexual' or 'homosexual', Blanchard used a scale running from +14.13 (completely homosexual) to -31.40 (completely heterosexual). A participant was classified as 'homosexual' if their score was greater than 10. That is, only about 9% of the scale's range made a participant 'homosexual' according to Blanchard. 'Homosexuals' may appear less diverse than 'heterosexuals' only because they were chosen more selectively. If so, the results say little about transsexuals; instead, they are likely an artifact of the way Blanchard et al. (1985) chose participants and classified their sexual orientations.
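The "about 9%" figure is simply the width of the 'homosexual' band divided by the width of the whole scale. A minimal check of that arithmetic, using the endpoints and cutoff given in the text:

```python
top, bottom = 14.13, -31.40  # scale endpoints: completely homosexual / completely heterosexual
cutoff = 10.0                # scores above this were classified 'homosexual'

fraction = (top - cutoff) / (top - bottom)  # 4.13 / 45.53
print(f"{fraction:.1%} of the scale's range")
```

This measures the share of the scale's range, not the share of participants; it shows how narrow the band is that a participant had to fall into to be classed 'homosexual'.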
Contrary to the way this study has been characterized, it does not provide any evidence that 'heterosexual' transsexuals are obsessive or lying. The results are consistent with the idea that fulfilling a certain societal expectation is important to transsexuals, especially in an environment where they are being evaluated for much-desired hormones and sex reassignment surgery. Self-presentation accounted for about 7% to 12% of the variation in how gender patients responded. Some psychologists have looked at these results and become cynical about their gender patients. If you work with such clients, I hope you will instead recognize that they are in an uncomfortable position because of your gate-keeper role. Despite that evaluative role, the vast majority of what explains your clients' responses is probably genuine individual differences in how closely they resemble the stereotypical transsexual. About 60% of the individual differences were left unaccounted for. Could most of that be genuine individual differences among transgendered persons? The evidence some researchers use to justify their cynicism toward transgendered persons actually gives therapists reason to believe their transgendered clients are predominantly honest with them.
Footnote 1: The measure of self-presentation was the Crowne-Marlowe Social Desirability Scale (Crowne and Marlowe, 1964). Self-presentation was correlated with measures like "felt like a woman" and "aroused by cross-dressing."
Footnote 2: This quotation of Bailey was printed in "The Daily Northwestern", the student newspaper on his campus, on April 21, 2003 in an article by Sarah Dreier and Mitchell Anderson.
Footnote 3: I use the term "heterosexual transsexual" because this is the phrase used in Blanchard et al. (1985). Though the manner of assigning participants differs, "heterosexual transsexual" and "non-homosexual transsexual" appear to denote the same persons.
Footnote 4: I wrote with percents because they are intuitive for lay-persons. The actual social desirability scale is a number from 0 to 33. The reported value for non-homosexuals is 18.72 (s.d., 5.50) and for homosexuals is 20.02 (s.d., 5.67).
Footnote 5: To avoid making this discussion even more complicated, I assume a linear relationship, as Blanchard does. It would also be understandable if you felt that the first two levels of the regression should be flipped. I chose this order because the first level is the only one for which Blanchard provides enough data to give a definitive result for R^2. A different order of the main effects would not influence the result for the interaction term, which is the focus of this discussion; interactions need to be entered after main effects.
Footnote 6: The correlation between self-presentation and sexual arousal to cross-dressing was -0.29 for the entire sample. Since this is the first level, the percent of variance is determined by squaring the correlation: -0.29 * -0.29 = 0.0841. Recall that this example was chosen because it is typical. The percent of variance explained by self-presentation across all 8 measures used by Blanchard ranged from 7.29% to 12.25%.
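The squaring step in this footnote, and the range it quotes, can be checked directly. The sketch also works backwards: the quoted range of 7.29% to 12.25% corresponds to correlations of magnitude 0.27 to 0.35.

```python
def variance_explained(r):
    """Proportion of variance shared by two variables with Pearson correlation r."""
    return r ** 2

print(f"{variance_explained(-0.29):.2%}")  # arousal to cross-dressing
# The 7.29%-12.25% range corresponds to |r| between 0.27 and 0.35:
print([round(variance_explained(r), 4) for r in (0.27, 0.35)])
```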
Footnote 7: Blanchard et al. provide the means, standard deviations, and n for each group. From these I was able to calculate an effect size to approximate the change in R^2. Here are the data from the paper: non-homosexual group (mean = 2.41, sd = 1.42, n = 64) and homosexual group (mean = 1.16, sd = 0.61, n = 51). Please note that 22.97% is the upper limit of possible effect sizes because it assumes no overlap with self-presentation; the actual effect size may be somewhat lower. The effect sizes for sexual orientation on 6 of the measures ranged from 4.58% to 36.64%. This excludes the 2 measures that were part of Blanchard's criteria for assigning participants to the sexual orientation groups.
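This footnote does not say which conversion from summary statistics to variance explained was used. One standard route, assumed here, is eta-squared from a one-way ANOVA computed on the group means, standard deviations, and sizes. Applied to the data above it gives roughly 23%, close to but not identical with the 22.97% quoted, so the exact method behind that figure presumably differs slightly.

```python
def eta_squared(groups):
    """Share of variance explained by group membership (one-way ANOVA eta^2),
    computed from (mean, sd, n) summaries, where sd is the usual n-1 sample SD."""
    n_total = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    return ss_between / (ss_between + ss_within)

non_homosexual = (2.41, 1.42, 64)  # (mean, sd, n) from the paper
homosexual = (1.16, 0.61, 51)
print(f"{eta_squared([non_homosexual, homosexual]):.2%}")
```

Like the figure in the text, this is an upper limit for the second step of the regression, because it ignores any overlap with self-presentation.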
Footnote 8: Here is a brief description of floor and ceiling effects in non-statistical language. Let's say we believe that studying leads to good grades on tests, and we have even found evidence for this in a bunch of correlations. But now let's say we correlate how much time preschoolers spent studying for a calculus exam with their scores on it. We find no correlation! That's because calculus is far beyond a preschooler, so no matter how much they study, it won't make much difference. Their scores on the test were all very low (near 0%) and in a limited range (like 0% to 5% instead of the 60% to 100% we might typically expect). This is called a floor effect. Conversely, if we gave mathematicians a test of adding whole numbers, we would expect a ceiling effect. My examples are purposely very extreme so that they are easier to understand. The floor and ceiling effects in Blanchard's paper are not so pronounced.