EDIT 30 Sept: I flipped the percentages for an analogy I made, where I said "and less than 1 in 5 say yes, even though nearly all of them use SuperRandomBrandName at work" when it should be "1 in 5 of them will not use SuperRandomBrandName at home". Thanks to @de****@ha*******.io for pointing out my error.
GitHub’s survey about the benefits of AI coding assistants doesn’t say much, is embarrassingly bad, and makes many claims that its data simply does not support.
Hi. I’m Steven Saus. I taught research methods at the undergrad level, specifically with a focus on quantitative methods such as surveys. I use AI tools for my day job, and I have used them (and written about my experience) outside of work. I’m neither particularly rabidly pro- nor anti- the use of AI (which has more than annoyed some of my artistic-aligned friends).
GitHub’s survey — conducted by Wakefield Research — is a mess.
For starters: Their "methodology" section makes several basic errors, from just being ultra-widescreen formatted images in a PDF container — way to humblebrag about the size of your monitor — to NOT ACTUALLY BEING THE METHODOLOGY.
Let me emphasize that. The "methodology" section is NOT the methodology of the survey; it is the summary demographic data of the respondents. No original hypothesis or materials section is included. {1} This is such a fundamental error that I would have stopped reading a student’s paper at this point and told them to redo it if they wanted to pass the class.
The Findings Don’t Show What They Claim To Show
Their "findings" do not support the conclusions (which are broadly positive and cheerleading for the use of AI tools). For example, it’s repeatedly mentioned that "almost everyone (upwards of 97%) reported that they have at some point used these tools both in and outside of work."
Okay, and? AI tools have been shoved into everything, whether we want it or not. Technically, everyone who has made a Google search has "used an AI tool".
Does it make things better? Well, it doesn’t actually ask that. There are a lot of questions about perception and anticipation. For example, "How do you anticipate that AI coding tools will impact code security?" That doesn’t mean it will. Those are two completely different questions.
But that doesn’t stop them from making this conclusion (emphasis theirs):
Software development teams are recognizing more benefits with AI coding tools than previously reported. Some of these include building more secure software, improved code quality, better test case generation, and faster programming language adoption. This ultimately translated to time savings that they could use for more strategic tasks.
That is NOT what the survey asked, and it is NOT what the respondents answered. This is the equivalent of asking people on the street how they believe the economy is doing and then treating that as if it were economic data. Except we know that those things are not the same:
"…the gulf between Americans’ downbeat view of the U.S. economy and its general healthy well-being could not be wider….the economy is in remarkably good shape – unemployment has not been this low for this long since the 1960s, real wages are rising, and GDP growth is above trend. … So why is Wall Street’s view not shared by Main Street? Surveys consistently show Americans are pessimistic on the economy at large."
There are a lot of questions that also make presumptions. For example, could expertise in AI tools "increase the desirability of a job candidate"? Notice that the survey did not ask hiring managers or HR staff this question — they asked the people who use the coding tools {2}. So this is only measuring what the developers and coders think HR is looking for. AI/ML is a huge buzzword at the moment, so in much the same way that other management buzzwords and fads could boost your application prospects, so could this.
But does it actually improve your job prospects? The survey CANNOT answer this question… but it sure pretends that it can. Rinse and repeat with nearly all of the survey’s "findings."
The One Nugget of Truth
However, it does have one important clue hidden in the midst of all this marketing nonsense pretending to be research.
All that clue gets is one short paragraph and a single image that tries to de-emphasize what I would argue is the single actual real bit of data in the whole thing:
Our survey data showed that nearly all of the survey participants reported using AI coding tools both outside of work or at work at some point. However, 17-27% of respondents indicated that they’ve only used AI tools at work, challenging the assumption that all developers are using AI outside of work.
This is very telling.
Imagine, for a second, that you’re at the mechanic and their shop uses SuperRandomBrandName tools. You ask the people working at the shop if they use SuperRandomBrandName tools at home, and a full fifth of them say they won’t, even though they use SuperRandomBrandName at work.
Hell, ask Boeing workers if they’d fly on Boeing’s planes.
If the professionals are not using the same tools at home that they use professionally, that is what you want to research.
This bit of data stands out as an unexpected but very important finding of the survey.
It is also almost immediately skipped past, and never mentioned again.
And it’s not hard to figure out why.
The Conflict Of Interest
As some of you have probably already been saying to your screens, there’s a massive conflict of interest. GitHub has their GitHub Copilot tool that they desperately want you to use. GitHub was purchased by Microsoft back in 2018, and Microsoft has dumped billions into AI/ML infrastructure and research — including announcing an additional USD $2.7 billion in Brazil and USD $1.7 billion in Mexico just in the last few days.
Conclusions
This is another hype-filled marketing survey with poorly-worded questions asking about things the respondents could not possibly know. As a result, it only measures the perception of those individuals, which, as we know from the economic example above (or countless others), may not have any relationship to the actual state of things.
The material surrounding the survey is even worse, with Kyle Daigle (GitHub’s COO) writing a summary making claims unsupported by the data they have, and drawing implications that are simply wishful thinking from the data at hand.
Particularly this one: "AI doesn’t replace human jobs — it frees up time for human creativity."
That conclusion makes assumptions about the quality of AI output that, in my experience, are not accurate. More importantly, the answer to that question — as we have learned from other technological advancements — is influenced not only by the technology itself, but by how it is deployed by those who possess the capital to own it.
And that was most definitely not included in the survey.
In The Pudding
Just recently, CIO published the findings of a different study conducted by UpLevel that examined actual metrics using code analysis.
It’s worth noting that UpLevel also has a marketing stake in the results: "Uplevel is the only holistic system of decision for enterprise engineering organizations." Further, don’t bother giving your email address to "see the full report" from UpLevel unless you’re interested in seeing differently arranged infographics.
That said, UpLevel did do the right thing: They had a control group and a test group, and compared their findings. While not a "true" experiment {3}, this is the kind of pseudo-experimental study that you want to see in any kind of research. They also looked at the right data for the questions that the GitHub survey claimed it was answering.
Their key findings? GitHub Copilot didn’t appreciably increase speed, but it did introduce a lot more bugs. Those in the group using GitHub Copilot saw an increase in the bug rate of 41%, with no appreciable reduction in burnout or change in efficiency metrics.
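To make the arithmetic concrete, here is a minimal sketch of how a relative bug-rate increase like that 41% figure can be computed and sanity-checked when you actually have a control group and a test group. The group sizes, bug counts, and the "bugs per merged pull request" metric below are all made-up assumptions for illustration, not UpLevel's actual data or methodology.

```python
import math

# Hypothetical counts -- NOT UpLevel's data, just illustrative numbers.
control_prs, control_bugs = 1000, 120   # control group: merged PRs, bugs traced to them
copilot_prs, copilot_bugs = 1000, 169   # test group using the assistant

# Bug rate = bugs per merged PR for each group.
rate_control = control_bugs / control_prs
rate_copilot = copilot_bugs / copilot_prs

# Relative increase: (new - old) / old.
relative_increase = (rate_copilot - rate_control) / rate_control
print(f"Relative increase in bug rate: {relative_increase:.0%}")  # ~41% with these numbers

# Two-proportion z-test: is the difference more than noise?
pooled = (control_bugs + copilot_bugs) / (control_prs + copilot_prs)
se = math.sqrt(pooled * (1 - pooled) * (1 / control_prs + 1 / copilot_prs))
z = (rate_copilot - rate_control) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The specific test doesn't matter; the point is that this kind of calculation is only possible when you measure outcomes in both a control and a test group — which is exactly what the GitHub survey never did.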
This is also my experience with using AI assistants for coding. Give it something simple that searching StackOverflow could answer, and you’re fine. More than that? It might be right… or it might be wrong in a way that you don’t know how to fix (if it’s fixable at all).
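As a purely hypothetical illustration (not something from the survey or the UpLevel study), this is the flavor of "looks right, is subtly wrong" code an assistant can hand you — the kind of bug that's easy to miss if you don't already know the pitfall:

```python
# Hypothetical assistant suggestion: collect tags into a list.
# It runs, it looks idiomatic, and it is wrong in a non-obvious way.
def add_tag(tag, tags=[]):      # bug: the default list is created once and shared
    tags.append(tag)
    return tags

print(add_tag("security"))      # ['security']
print(add_tag("performance"))   # ['security', 'performance'] -- state leaks between calls
```

If you already know about mutable default arguments, you catch that in review. If you don't, you ship it.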
As with the adoption of any new technology, it’s vital that we get accurate information about its costs, benefits, and utility — not marketing hype used as a post-hoc justification for spending billions of dollars.
We need more research like what UpLevel did, and far fewer bad meaningless surveys used to justify marketing hype.
Hat tip to Baldur Bjarnason for both articles.
{1} See page 13 onward of my Master’s thesis if you want to see an okay example of a methodology section.
{2} "software engineers, developers, and programmers made up the majority of…respondents" as well as "a small number of data scientists and software designers"
{3} This is a precise technical term; a true experiment is almost impossible in real life sociological or psychological studies.
Featured Photo by julien Tromeur on Unsplash