Jim’s done a pretty good writeup of the aggregate statistics (part one, part two, and part three – he’ll have a large aggregate post soon) – good enough that I asked him if I could take a crack at the data myself and see if I uncovered something with correlations, regressions, and other statistical goodness.
There is a lot of data here, and I’ll be breaking it up over the next several days (going into next week). Each day I’ll have some statistical goodness, but I’ll try to have an “executive summary” at the top of each section as well. Stats geeks will rejoice (and probably call me out on some mistakes, too). I’m not sure if this makes me Jamie or Adam of this survey’s mythbusting team… Jim’s got the leather coat, but I have the top hat. Tough call.
Jim gave me the raw (anonymized) data, and it turns out I sorted things out a little bit differently than he did. Which brings us to section 1:
Executive summary: Jim and I grouped the responses slightly differently. This did cause some slight differences in aggregate statistics.
There were 246 distinct respondents to the survey. This was a convenience sample, though with the many categories of this population, it would be difficult to create a truly representative random sample. Some issues that need to be dealt with up front are as follows:
One survey question was “How many times, if any, was your novel rejected before it sold to a professional publisher?”. Judging by responses both in the survey itself and in communications outside the survey instrument, an unknown number of respondents were confused by this question. Since we know the responses are not consistent between respondents, I scrapped the information and didn’t use it. Which is a pity.
For the question “In what genre was your first professional novel sale?”, several respondents placed multiple genres. To best reflect current bookstore practices, the following guidelines were used in coding these responses:
- Any response that included “Young Adult” was treated as “Young Adult” alone (e.g. “YA/Mystery” became “Young Adult”).
- Any response that indicated marketing for children (e.g. “Middle-school”) was treated as “Young Adult”.
- Any response that included “Tie-in” was treated as “Tie-in” alone (e.g. “Tie-in/Sci-Fi” became “Tie-in”).
- Any response that included “Romance” was treated as “Romance” alone (e.g. “Paranormal Romance” became “Romance”).
- “Paranormal”, “Occult”, and other descriptive terms that were not explicitly “Horror” were treated as “Urban Fantasy”.
- Any genre with fewer than three responses was folded into “Mainstream/Literary”.
Please note that this classification is different from the one that Jim used.
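For the stats geeks, here is a rough sketch of how the genre coding rules above could be written down in Python. It is not the actual procedure used on the data. The function names, the keyword matching, and the order in which the rules are checked (I've assumed the order of the list) are all assumptions, as is applying the "fewer than three responses" rule after the other rules.

```python
import re
from collections import Counter


def code_genre(response: str) -> str:
    """Collapse one free-text genre response into a single category.

    Hypothetical helper: rule order follows the list in the post,
    and the keyword matching is an assumption.
    """
    text = response.strip().lower()
    tokens = re.split(r"[\s/,]+", text)

    # Anything mentioning Young Adult (or "YA") becomes Young Adult.
    if "young adult" in text or "ya" in tokens:
        return "Young Adult"
    # Anything marketed to children (e.g. "Middle-school") becomes Young Adult.
    if "middle" in text or "children" in text:
        return "Young Adult"
    # Anything mentioning Tie-in becomes Tie-in.
    if "tie-in" in text:
        return "Tie-in"
    # Anything mentioning Romance becomes Romance.
    if "romance" in text:
        return "Romance"
    # Paranormal/occult descriptors that are not explicitly Horror.
    if ("paranormal" in text or "occult" in text) and "horror" not in text:
        return "Urban Fantasy"
    # Otherwise keep the response as given.
    return response.strip()


def fold_rare(genres: list[str], min_count: int = 3) -> list[str]:
    """Fold any genre with fewer than min_count responses into Mainstream/Literary."""
    counts = Counter(genres)
    return [g if counts[g] >= min_count else "Mainstream/Literary" for g in genres]


if __name__ == "__main__":
    # Made-up responses for illustration only, not survey data.
    raw = ["YA/Mystery", "Tie-in/Sci-Fi", "Paranormal Romance", "Fantasy"]
    print([code_genre(r) for r in raw])
```

Checking the rules in a fixed order matters for multi-genre answers: "Paranormal Romance" hits the Romance rule before the paranormal rule ever gets a look at it, which matches the example in the list.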
For the question “How did you sell your first novel to a professional publisher?”, the following guidelines were used to code the data:
- Any response that included “Self-published, then sold to a larger publisher” (n=1) was treated as a small press publication.
- There were many (n=30) responses in the dataset that were listed as “Other”. These values were treated as missing, and not calculated.
One further note: There were a few outliers in the number of short stories sold; for respondents who reported selling more than 10 short stories before the first novel, the value was capped at 10.
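As a minimal sketch, the sale-method coding and the short-story cap could look something like this in Python. The function names and the "Small press" label are hypothetical (the survey's exact answer wording isn't reproduced here), and using `None` to stand for a missing value is my assumption.

```python
from typing import Optional

SHORT_STORY_CAP = 10  # values above this are capped, per the note above


def cap_short_stories(count: int, cap: int = SHORT_STORY_CAP) -> int:
    """Cap the reported number of short story sales at the given ceiling."""
    return min(count, cap)


def code_sale_method(response: str) -> Optional[str]:
    """Code one 'how did you sell your first novel' response.

    Returns None for responses treated as missing.
    """
    text = response.strip()
    lowered = text.lower()
    # "Other" responses are treated as missing and left out of calculations.
    if lowered == "other":
        return None
    # Self-published-then-sold is treated as a small press publication.
    if "self-published" in lowered:
        return "Small press"  # hypothetical category label
    return text


if __name__ == "__main__":
    # Made-up values for illustration only, not survey data.
    print(cap_short_stories(25))
    print(code_sale_method("Self-published, then sold to a larger publisher"))
```

Capping keeps those respondents in the dataset while stopping a handful of very large counts from dominating any statistic that depends on the mean.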
Executive summary: Those who sold more short fiction were only slightly more likely to have sold their first novel to a publisher or small press directly (and there’s a big, BIG caveat attached to that) and were more likely to have attended week-long writing workshops. Older writers were slightly more likely to have been writing longer and to have attended cons, writer’s groups, and weekend- and week-long writing workshops. And the later the year of first publication, the slightly less likely it was that the author had been referred by a mutual friend to a publisher or agent.
Using the “Correlations” feature in stats programs is often a way to find out where to look more closely. That really was the case here; the correlations found were pretty weak and few were actually significant.
Correlations were sought between “How many short fiction sales, if any, did you have before making your first professional novel sale?” and the other survey questions, using Pearson’s Product Moment Correlation. Though it is not the appropriate statistic, the resulting table suggested the direction for further analysis. There were only significant correlations with “How did you sell your first novel to a publisher” (rho(246)=0.21, p=0.001) and “Have you attended a week-long workshop” (rho(246)=0.13, p=0.05). BIG CAVEAT: I must caution that both of these are weak correlations, that this statistic is not applicable to nominal-level data, and that they are largely presented here for completeness alone.
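If you want to see what the Pearson statistic is actually doing, here is a small from-scratch sketch in Python. It is not the stats program used for the numbers above, it does not compute p-values, and the example data is invented purely for illustration. Coding a yes/no question as 1/0 is also my assumption.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Invented example: short story sales (capped at 10) and a yes/no
    # workshop answer coded 1/0. Not survey data.
    short_stories = [0, 2, 5, 10, 1, 0]
    week_long_workshop = [0, 0, 1, 1, 0, 1]
    print(round(pearson_r(short_stories, week_long_workshop), 2))
```

The result always falls between -1 and 1. Values around 0.1 to 0.2, like the ones reported here, mean the two variables move together only a little.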
The table of correlations also demonstrated some rather obvious (though not strong) associations: Older writers were more likely to have been writing longer (rho(246)=0.17, p=0.006), attended cons (rho(246)=0.18, p=0.004), writer’s groups (rho(246)=0.18, p=0.004), and both weekend-long (rho(246)=0.27, p=0.001) and week-long workshops (rho(246)=0.2, p=0.001).
The year of publication also had some interesting associations, with significant weak negative correlations to being referred by a mutual friend to a publisher (rho(246)= -0.17, p=0.007) or agent (rho(246)= -0.13, p=0.04). These vaguely suggest that these factors had less of a role in more recent years, but that is not conclusive.
Tomorrow, we’ll look at how first novels were sold over time, how short story sales related to how first novels were sold (both as a whole, and by decade.) And yes, there will be tables. And graphs. Oh yes, there will be graphs.