I'm on a fact-checking and quality-assurance appreciation kick recently.
In their paper titled Sequence Polymorphisms Cause Many False cis eQTLs, Alberts et al. take a look at the reliability of using gene expression data to find quantitative trait loci (QTLs), regions of the genome statistically associated with a given trait. QTLs are the kind of thing you hunt for when you're trying to figure out where a predisposition to schizophrenia comes from, for example.
In the eQTL method, gene expression data (rather than sequencing of actual genomes) is used to try to identify QTLs. However, as the authors tell us, small differences in the sequence of the same gene across your pool of sample sources can be a big problem for this method. A single nucleotide change in a gene may mean that its RNA no longer hybridizes nearly as well to the cognate probe sequence on the expression chip, giving an artificially low expression reading. As a consequence, you may inadvertently end up detecting minor genetic variations between your research subjects, rather than the genuine differences in gene expression that would help explain whatever it is you're researching.
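The hybridization artifact is easy to see in a toy simulation. The sketch below (all numbers are made up for illustration, not taken from the paper) gives two genotype groups identical true expression, but assumes the probe hybridizes less efficiently when a SNP sits under it. The measured signal then differs by genotype, which is exactly the spurious cis-eQTL pattern the authors warn about.

```python
import random

random.seed(0)

# Toy illustration: two genotype groups with IDENTICAL true expression.
# A hypothetical SNP under the probe lowers hybridization efficiency in
# one group, so the chip reading differs even though the biology does not.
TRUE_EXPRESSION = 100.0           # same underlying expression in both groups
HYB_EFFICIENCY = {"ref": 1.0,     # probe perfectly matches the reference allele
                  "alt": 0.6}     # assumed penalty for a mismatch under the probe

def measured_signal(genotype, n=50):
    """Simulated chip readings for n individuals carrying one allele."""
    return [TRUE_EXPRESSION * HYB_EFFICIENCY[genotype] + random.gauss(0, 5)
            for _ in range(n)]

ref = measured_signal("ref")
alt = measured_signal("alt")
mean_ref = sum(ref) / len(ref)
mean_alt = sum(alt) / len(alt)

print(f"mean signal, ref allele: {mean_ref:.1f}")
print(f"mean signal, alt allele: {mean_alt:.1f}")
# The gap tracks genotype, not expression, so a naive eQTL scan
# would flag this gene as a cis eQTL purely from the probe artifact.
```

Because the measured difference is perfectly correlated with the local genotype, it looks like a strong cis effect to any method that only sees chip intensities.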
The authors present a statistical method that aims to filter out these spurious results, leaving a data set of genuine expression differences that can be used to find QTLs.
This is another case of a group having an eye for detail and catching a potentially very problematic hiccup in an otherwise handy analytical system.