DaveE (15:41:04): E.M. Smith is better qualified to answer this, and I believe GIStemp does the same thing with station temps. You note that months with 9999 are ignored, but what isn’t obvious from that is that there may be only one day of readings missing. The obvious thing to do would be to take the average of the adjacent days to salvage the month, but they don’t; they just write the month off.
GIStemp gets the data from NOAA (in what turns out to be the NCDC product of GHCN) already rolled up into a single “monthly” datum. One has to swim upstream to NCDC / NOAA / GHCN to find out how they decided to put a “missing data flag” in a month (-9999, or sometimes 9999, in some data sets and some steps of GIStemp).
So to the assertion that a single missing day might cause a missing data flag, I cannot speak (yet…). Heck, NCDC could choose to simply fill in any month with a ‘missing data flag’ if it fails their “QA Tests”. (They do something along those lines already.) But by the time it gets to GHCN, and thus to GIStemp, there is no daily detail left.
To the issue of “data creation”: it is RAMPANT throughout the entire GISS and HadCRUT process. There are so many holes in the data, by time and by space, that they have no choice but to pick one:
1) Admit they have no hope of creating a “global temperature” for any significant length of time.
2) Make up ‘temperature values’ for the 80% or so of time and space that are missing. (The southern hemisphere is substantially empty for the first half of the temperature record and is still remarkably blank. Everything within about 20 degrees of the North Pole is fabricated. Etc.)
My postings cover it pretty well in detail, while being readable at the top level by anyone.
Especially see the graph here:
and the coverage charts here:
Talking of E.M. Smith, I believe he’s fixed the -ve sum-of-squares problem, though I’ve not been over there to find out.
Yes, I have. A couple of different ways :-)
It’s a ‘square of integers’ (which can have overflow) problem. There is a commenter, “Steve”, who asserts it was just a single bad data item and that “Harry Readme” removing it is all it takes to “fix it”. Totally insufficient. There was one bad data item big enough to cause an integer overflow, but there could just as easily be others that did not cause a crash like the one that led “Harry Readme” to pluck that bad datum from the set. (I.e. there could still be bogus values not yet found.)
There are 3 levels of fix:
1) Range check in the program (i.e. catch broken large data before it causes an overflow).
2) The “square of INTEGERs” gets stuck into a floating point number (so an implicit “cast to float” is done). Just change the INTEGERs into FLOATs before the squaring (“cast to float” first) and you eliminate the wraparound (IEEE-compliant floating point math overflows to infinity rather than wrapping to garbage), though you might still have wrong, too-large data in your input. Not strictly needed if range checking is perfect, but a nice bit of robustness anyway. Belt and Suspenders, don’t you know…
3) Write a “preening” program to check for insane data in the input file prior to running. This can be more detailed than the basic range checks in the program itself. (I.e. the program might just check temps between -90 and +60 C, while the ‘preening’ step might assume even 0 C was too warm and wrong at the South Pole while -89 C was possible, yet at the equator might accept nothing below 10 C unless at altitude, for hot countries…)
This lets you run the ‘preening’ as a distinct step for debug, data quality report and assessment, efficiency, etc.
Source : https://wattsupwiththat.com/2009/11/25/climategate-hide-the-decline-codified/