Have you ever read a science paper that’s so statistically invalid, logically flawed and offensively awful that it accidentally becomes So Bad It’s Good? Here’s a review of a paper from the depths of 1980s evolutionary psychology. There will be a lot of insensitive discussion of dead children, plus most forms of bigotry. It will not feature necrophilia, in case the title is misleading for anyone. If you think I’m exaggerating its faults, you can read it yourself here: https://sci-hub.tw/10.1016/0162-3095(89)90006-x
I was directed towards this paper by a blog post titled “An Especially Elegant Evpsych Experiment”. The writer of the blog concedes he did not read the paper, which is the only possible explanation for its title. The paper itself is titled “Human grief: Is its intensity related to the reproductive value of the deceased?” The basic premise of the paper is that parents are more sad about losing children if those children, had everyone been hunter-gatherers, would have been more likely to go on to have children of their own. They establish this by questioning people who have never been either hunter-gatherers or bereft of children (as far as we know) and then over-analysing the results. I could criticise this experimental design for not asking real bereaved parents, but given how insensitive the authors are, that’s probably for the best. I don’t merely mean their language is clinical: they go out of their way to reference touchy topics while glossing over important technical details. The introduction summarises results from other work with phrases like “healthy male [child] > healthy female = unhealthy female = unhealthy male for intensity of grief [by parents]”. The use of these mathematical symbols when assigning value to deaths is so callous and uncaring: obviously they should have used the ‘approximately equals’ signs. Incidentally, this result is not consistent with the paper’s thesis. You’re supposed to think that, since males have a wider range in their number of children, their health matters more to parents. However, given the whole one-sperm-one-egg thing, the average number of children born to males and females should be very similar*. If your metric values healthy males more than healthy females, unhealthy males have to be valued less than unhealthy females to make up for it. Otherwise you’re just spouting Darwin-flavoured sexism.
The introduction also says that a correlation between grief of loss and child age proposed in a previous paper was wrong, so these authors are trying reproductive value instead. Which, at least in this paper, is just a function of age and sex (while they flirt with ableism, they don’t really want to commit to it, even though infertility would have been a stronger and faster test of the main thesis). So they could have avoided all this work if they had just taken the numbers from the previous paper, applied one function to them, and run the same stats.
They then introduce the concept of ‘reproductive value’ (RV): “Fisher’s (1930) concept of reproductive value provides a measure of those factors related to the age of both the offspring and the parent. It is the relative number of offspring still to be born to an individual of age X.” There are technical respects in which this definition is wrong (RV is normally corrected for population growth), but let’s focus on why it’s a terrible metric to use in our case. It explicitly ignores the importance of nurturing existing children and only focuses on unborn children. Once you’ve had a child, your RV actually reduces because you’re going to invest in bringing up that child, whereas your expected number of grandchildren/later descendants increases. While there are other, better ways to define RVs that value parental nurture, this study isn’t using them. Which, for a study about the grief of parents losing children, is a pretty big problem.
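To make the definition concrete, here is a minimal sketch of Fisher’s RV in a stationary population (no growth correction), computed from a survivorship schedule l(x) and a fertility schedule m(x). All the numbers are invented for illustration and are not from the paper:

```python
# Toy sketch of Fisher's reproductive value in a stationary population.
# l[x] = probability of surviving from birth to age x;
# m[x] = expected offspring produced at age x.
# Numbers are invented for illustration, not taken from the paper.

def reproductive_value(l, m):
    """RV(x) = (sum over y >= x of l(y) * m(y)) / l(x):
    expected future offspring, given survival to age x."""
    return [sum(l[y] * m[y] for y in range(x, len(l))) / l[x]
            for x in range(len(l))]

l = [1.0, 0.6, 0.55, 0.5, 0.45]   # heavy infant mortality, then a plateau
m = [0.0, 0.0, 1.0, 1.0, 0.5]     # reproduction starts at 'age' 2

rv = reproductive_value(l, m)
# RV rises from birth to the onset of reproduction (surviving infancy
# is 'priced in'), then falls as remaining fertility runs out: the
# inverted-U shape the paper's grief curves get compared against.
```

Note that RV peaks at the onset of reproduction, not at birth, precisely because it only counts unborn offspring and discounts for the risk of dying first.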
They end the introduction by including a graph showing RVs (as defined here) for various human populations: how RV changes over time in Japan, and how it differs between black and white South Africans. It’s reproduced from another study, and wouldn’t be needed here if they’d properly explained what RV was. They specifically instruct us to “Note the differences between the curves for black and white South Africans and for Japan for 1964 and 1966. These figures suggest that reproductive value is a measure that may capture at least some of the cues that may regulate the intensity of parental grief over the death of offspring.” However this figure (there is only one) doesn’t capture anything about parental grief, and I have no idea why they think it does. It literally just shows a measure of population dynamics. Race doesn’t otherwise appear in this study at all – it seems to be included here to get another tick on the edgelord checklist (I notice that the lead author has literally won a ‘Prize in Support of Controversy’). The whole hypothesis of this paper specifically relies on humans not having evolved since whenever we were all hunter-gatherers, so we wouldn’t expect to see any differences in grief between societies or races, irrespective of these curves. Quite why Japan changed so much in two years goes unexplored (plausibly because 1966 was a hinoeuma ‘Fire Horse’ year, in which Japanese births dropped sharply for superstitious reasons), and quite possibly the change just shows the huge error in the measurements. As well as being useless, the graph is physically painful to read, as it distinguishes very similar trend lines only by slightly different dashing patterns that don’t scan well.
They round it off by claiming that because females have a higher minimal investment (i.e. giving birth is hard), “female parents can be expected to be more sensitive to the reproductive value of their offspring than male parents.” This isn’t true if both parents have to care for the child equally afterwards; the mother has already made that minimal investment by the time the child is born, so counting it is a sunk cost fallacy. Also, given that you’re reading a study about parental grief, and they break everything down by gender, you might assume that they’re going to statistically test this hypothesis later. You’d be wrong. They don’t really do statistical tests here; they just calculate correlations and then comment on whether they think the number is high or not. It seems they missed the ‘correlation does not imply causation’ part of statistics. I do genuinely congratulate them for including this hypothesis at all though, since they later admit that their weird analysis of the data doesn’t support it. Many studies would have just invented some post-hoc reason to expect the pattern they actually found.
If you want to see loads of spurious correlations that haven’t been explained using EvPsych, go to https://tylervigen.com/spurious-correlations. This one gets to 96%!
So finally, the actual data collection. They ask lots of test subjects: if two Canadian children of different ages (1 day – 50 years) but the same gender are killed in a car crash, which one are the parents more concerned about? “Both the same” is not an option, and there’s no relative weighting, just a rank. They have 436 test subjects, which sounds like a lot. However, the paper splits up the test subjects by their gender (2 options) and their age band (3 options), and each subject is only asked about children of one gender (2 options). Some of these categories contain 15 subjects, some 50. They do not care whether the subjects actually have children, which you’d think would matter a lot. They need pairwise comparisons of car crashes across many combinations of 10 ages of children. They also mention in passing that, bizarrely, many test subjects found the questions disturbing, so didn’t complete the survey! I also think it’s uncommon to have families where one child is 50 and one 1 day old, and wonder if the subjects are accounting for the implied age of the parents (the introduction suggests that this matters, but it never comes up again). They repeatedly state that they can’t do statistical tests because too many samples are needed to construct a single curve. What they mean is that, in spite of having so many test subjects, their way of generating the normalised grief curves is so inefficient that the statistical tests have nothing to run on. This is the sort of thing you should maybe check when designing your experiment. If they’d asked all these people the same handful of questions, they could have done normal tests.
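The arithmetic of why 436 subjects is less than it sounds is easy to sketch (the subject count and the 2 × 3 × 2 split are from the paper; everything else here is back-of-envelope):

```python
from math import comb

ages = 10                 # ages of hypothetical children being compared
pairs = comb(ages, 2)     # distinct age pairs to rank: C(10, 2) = 45

cells = 2 * 3 * 2         # subject gender x subject age band x child gender
subjects = 436
per_cell = subjects / cells   # roughly 36 subjects per cell, BEFORE the
                              # dropouts who found the questions disturbing
```

Forty-five pairwise comparisons per grief curve, spread across twelve separately analysed cells of 15–50 people each, is how you turn a decent sample into one too thin to test.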
They then calculate an implied ‘relative grief’ statistic in each case, and calculate the correlation between this and age (low), between this and RV (medium), and between this and the RV of female !Kung bushpeople (very high). If you’re wondering where that last one came from, so am I. It’s implicitly because the !Kung are supposed to be culturally like ancestral hunter-gatherers, although no evidence is presented for this. You can tell that’s not true from reading their name: they’ve clearly made linguistic inventions most humans haven’t (the ‘!’ stands for one of many click sounds). Even if !Kung technological culture were identical to early human culture, and cultures and death rates didn’t depend on local geography, they would still experience spill-over effects from diseases, mosquito populations and land-use changes in the areas around them. RVs (properly defined) are supposed to depend on general population growth and are very sensitive to changes in infant mortality. If you’re on the lookout for more cognitive biases, you’ll be pleased to note that the study explicitly would have preferred to use both female and male RVs, but couldn’t find data for males, so assumes they’re the same. Availability bias FTW! I wouldn’t normally care, but they spent half the paper banging on about how different the sexes are. They later find that the correlations between female !Kung RVs and the grief response are slightly higher for male Canadian children than for female ones, which should be a bit awkward for their theory, but goes unremarked. They do not consider other simple correlations, like mortality rate, which should look fairly similar to the !Kung RVs.
I indicated that the correlations are calculated for loads of groups, but there seems to be little systematic impact of the gender or the age of the assessor. Not to worry, they also calculate mean values. I have no idea how. Most of the values I can reproduce by averaging the reported numbers. Linearly averaging them. As in, using the 3 reported numbers without weighting for how many test subjects are in each category. This is basically asking for Simpson’s paradoxes. However, the !Kung correlations that are so impressively high don’t follow this pattern, and those mean values can and do exceed the constituent values. This implies some sort of pooling before calculating the correlations, which is the correct thing to do but is not described. The numbers they get look OK from the graph they show, but I’m not sure how they combine their results to get that graph.
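To see why naively averaging subgroup correlations is dangerous, here is a deterministic toy example (all numbers invented): two subgroups each with a perfect positive correlation whose pooled correlation is strongly negative.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Two subgroups, each with a perfect positive correlation...
g1_x, g1_y = [0, 1, 2], [10, 11, 12]
g2_x, g2_y = [10, 11, 12], [0, 1, 2]

r1 = pearson(g1_x, g1_y)   # exactly +1.0
r2 = pearson(g2_x, g2_y)   # exactly +1.0

# ...but pool the data and the sign flips (Simpson's paradox):
r_pooled = pearson(g1_x + g2_x, g1_y + g2_y)   # about -0.95
```

Averaging the subgroup values would report +1.0; the pooled data say roughly -0.95. Which is why "we averaged the three reported correlations, maybe, somehow" is not an acceptable methods section.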
They start the discussion by congratulating themselves for correctly predicting that there’s some correlation between some metric of RV and grief. Quote: “Thus, the validity of this study depends on the assumption that the subjects empathized with the situations described in the questionnaire. The variation in the correlations between grief and reproductive value for the different conditions suggests that it holds for this study.” I.e., we see different numbers and we got an answer we liked, so the premises are valid. They then acknowledge they were wrong about the male/female subject divide, and fumble around with the idea that there might be some weird pattern whereby estimates correlate better for male children than female children, but this is clearly post-hoc reasoning. They then consider random alternative hypotheses to grief = RV, but not the most obvious one: that grief tracks how unexpected the death is, i.e. varies inversely with the probability of dying, which ancestrally is high near birth and after 40 – giving a similar inverted-U curve – and would be the null hypothesis if I ran the experiment.
OK, so we know that correlation doesn’t imply causation, but should we at least be impressed by the high correlation they somehow calculate? No. Here’s another major stats fail. It’s a time series, and time series usually correlate with themselves. You can’t do normal correlation analysis on time series unless the values change so rapidly that the value at one time is unrelated to the values at neighbouring times. If the values change slowly, you can guess that a later data point is about what it was before, plus how fast it was changing. So if you compare two time series and the first two points line up, they’ll probably be similar at the next point too. This is why all the spurious correlations you can find at https://tylervigen.com/spurious-correlations are time series – the graphs nominally have 10 points, but the points aren’t truly independent. I’m not saying the high correlation values in the study are pure chance – the RV and grief time series both have a rising, a level and a falling bit – but this is nowhere near as impressive as a high correlation from a genuine scatter plot of independent points. There are ways around this problem, such as detrending the data, i.e. correlating the point-to-point changes, which don’t inherit the trend. This paper does not use them. Hence, even ignoring alternative explanations, its headline result is worthless.
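A minimal demonstration of the problem (series invented for illustration): two smooth trending series correlate at r ≈ 0.99, even though their step-to-step changes disagree at literally every step. First-differencing, the simplest detrend, exposes this.

```python
from itertools import accumulate

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Step-to-step changes that are perfectly ANTI-correlated...
da = [1, 2, 1, 2, 1, 2, 1, 2, 1]
db = [2, 1, 2, 1, 2, 1, 2, 1, 2]

# ...but both series trend steadily upward once accumulated.
a = list(accumulate([0] + da))   # [0, 1, 3, 4, 6, 7, 9, 10, 12, 13]
b = list(accumulate([0] + db))   # [0, 2, 3, 5, 6, 8, 9, 11, 12, 14]

r_raw = pearson(a, b)      # about 0.99: looks very impressive
r_diff = pearson(da, db)   # exactly -1.0: the changes always disagree
```

The shared trend manufactures nearly all of the raw correlation. Ten points on a smooth curve are nowhere near ten independent observations, which is exactly the trap the paper’s !Kung RV correlation falls into.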
In conclusion, evolutionary psychologists have a bad name for simplistic handling of complex and emotionally charged topics, ignoring the role of society in determining behaviour and drawing grandiose conclusions from an imaginary human history. This paper justifies all of these stereotypes and adds to the mix “can’t do stats”.
* Similar but not identical, since there can be (and usually are) slightly different numbers of males and females in the population at any given age. In general more male children are conceived, but more females make it to puberty. This paper is in no way ready for the subtlety of this consideration, or of the biological ambiguity inherent in the idea of binary sex. Let’s get them up to GCSE science first.