
Standard, Population & Customised fetal size charts 16 – summary

September 21, 2019

That’s it folks. Let’s summarise.

Standard growth charts tell you how big a healthy fetus from a healthy pregnancy and healthy mother should be. Population charts plot the average for your local population, including unhealthy pregnancies (click here). Customised charts adjust the size on the basis of individual parental features (click here).

Customisation improves detection of pathology if we can be sure that the feature customised on is not only objective and reproducible, but also not associated with pathology (click here). Customisation on factors associated with pathology, such as smoking, condemns the small sick fetus damaged by smoking to being classified as “normal for smoking”. Fortunately no-one customises on maternal smoking.

But some people do customise on maternal ethnicity, weight, height and parity. Ethnicity is not objective (click here) and is associated with pathology (click here). Maternal weight is objective, but at both extremes is associated with pathology (click here). Maternal height is objective but also associated with pathology, albeit only at the lower end (click here). Parity is objective but also associated with pathology (click here).

Fetal sex is objective and negatively associated with pathology, with larger male fetuses having higher mortality (click here). This should make it an ideal factor on which to customise, and is the reason why, after birth, paediatricians routinely use separate charts for boys and girls. But it’s not suitable for routine prenatal use at present.

The main customised charts available in the UK are produced by The Perinatal Institute (click here). They provide customised charts for fundal height and for estimated fetal weight. Their training programmes for measuring fundal height are excellent, but the fetal weight estimates on which those charts are based use Hadlock’s outdated formulae. Their customisation formulae for both fundal height and fetal weight are secret. Their charts customise on maternal ethnicity, weight, height and parity, and are therefore likely to condemn the fetuses of first pregnancies and of mothers whose ethnicity, weight and height are markers of past and present deprivation, to having pathology missed.

Fetal growth charts should be created on large populations, using careful techniques to avoid bias (click here). Only the two standard charts created by Intergrowth-21 (click here) and by WHO (click here) have done this. A well-publicised academic dispute (click here) between the Intergrowth-21 and WHO authors involved accusations of plagiarism, but had nothing to do with the science. Both charts are scientifically rigorous. However, the Intergrowth-21 sample was more than twice the size of WHO’s, and that group used slightly more advanced techniques for avoiding bias, so their chart is to be preferred.

Claims that the introduction of customised charts caused the recent fall in UK stillbirth rates do not bear close scrutiny (click here).

Suggestions that standard charts are flawed because they detect different rates of growth restriction or macrosomia in different populations make no sense; that is a feature of growth standards.

Empirical comparisons (click here) between customised and population charts are unhelpful because they compare two types of customised chart. All empirical studies suggesting that customised charts detect more pathology than standard charts have also reported higher false positive rates with customisation. No empirical study has ever shown greater detection rates of pathology for a fixed false positive rate. The best quality empirical study (the POP study from Cambridge), where results were also concealed from clinicians, showed no difference with customisation.

Population and customised charts, as currently available, cannot be recommended.

The Intergrowth-21 or WHO fetal growth standard charts should be used.

Jim Thornton

Standard, Population & Customised fetal size charts 15 – empirical studies

September 20, 2019

How do customised charts perform in practice?

Previous posts have described why, in theory, fetal growth standard charts such as Intergrowth-21 or WHO are preferable to local population or customised charts. Here we look at empirical comparisons. Unfortunately there are few good ones, many are misleading*, and reliable ones are difficult to do.

Comparisons between customised and population charts (e.g. here, or here) don’t help; a population chart is simply a type of customised one. Claims that percentages of fetuses falling above and below various centiles vary more with standard charts than with customised ones (Francis et al 2018 click here) miss the point. This is a feature not a bug of standard charts!

Studies comparing the rate of detection of fetuses who are small for gestational age (SGA) by charts A and B, but which either don’t define SGA (Gardosi and Francis 2005 click here) or define it by a third chart C (e.g. Odibo et al. 2018 click here) confuse the issue as well.

We need to compare different charts’ detection rates for the sort of pathology that matters, stillbirth or brain damage, or for surrogates for those things, such as low cord pH, low Apgar scores or admission to neonatal intensive care. Such studies need to fix the false positive rate. Detecting more babies with adverse outcomes is not necessarily better if the false positive rate is also higher. Francis et al 2018 (click here) reported that customised charts would have detected 411 stillbirths compared with Intergrowth’s 229 in a cohort of 1.2 million term singleton pregnancies. However, to do so, the customised charts would have classified 10.5% of pregnancies as growth restricted, while Intergrowth-21 would have classified only 4.4%. At least two other studies, Anderson et al. 2016 (click here) and Pritchard et al. 2018 (click here), which suggested that customised charts detect more pathology than Intergrowth-21, also had double the false positive rates with customisation.
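The arithmetic behind the Francis comparison can be checked in a few lines. This is only a sketch using the figures quoted above; the function name and the idea of expressing the trade-off as "pregnancies flagged per stillbirth detected" are my own illustration, not a measure reported in the paper.

```python
# Figures quoted in the text from Francis et al 2018.
COHORT = 1_200_000  # term singleton pregnancies

charts = {
    # name: (stillbirths detected, share of pregnancies classed as growth restricted)
    "customised": (411, 0.105),
    "Intergrowth-21": (229, 0.044),
}


def flagged_per_detection(detected, flagged_share, cohort=COHORT):
    """Pregnancies labelled growth restricted for each stillbirth detected."""
    return cohort * flagged_share / detected


for name, (detected, share) in charts.items():
    flagged = COHORT * share
    ratio = flagged_per_detection(detected, share)
    print(f"{name}: {flagged:,.0f} flagged, {ratio:.0f} flagged per stillbirth detected")

# How much more is detected, and how much more is flagged, with customisation.
print(f"detection ratio: {411 / 229:.2f}")
print(f"flagging ratio:  {0.105 / 0.044:.2f}")
```

On these numbers customisation detects about 1.8 times as many stillbirths but flags about 2.4 times as many pregnancies, roughly 307 flagged per stillbirth detected against 231 for Intergrowth-21.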

A recent large retrospective US study (Kabiri 2019 click here) of five different charts, including Intergrowth-21, WHO and GROW, compared detection rates of a range of different adverse outcomes for a fixed 10% false positive rate (figure 2 in the paper). Intergrowth had the highest sensitivity overall, and GROW the lowest, although the differences were small and the confidence intervals overlapped. Another retrospective study from Scotland (click here), which also reported detection rates for fixed false positive rates, showed that partial customisation on two objective factors (maternal height and parity) did not improve detection of pathology.

With one exception, these studies are the best we have. However, such retrospective comparisons where one chart type was used in practice are tricky because clinicians will have acted on the result from the chart in use. If, for example, they induced labour early they may have prevented, or caused, an adverse outcome, the so-called treatment paradox. Caesarean for a suspected small baby may prevent the death which the chart was correctly predicting. Conversely induction for a false prediction of growth restriction may cause hypoxia in a previously healthy baby as a result of long labour.

This final problem is difficult to avoid. It requires scan findings to be concealed from clinicians. This has only been done once, in the POP study from Cambridge (click here), which showed that customising using the GROW software did not improve detection rates of any adverse outcomes.

And that’s pretty much it. Many other papers purport to show that customisation is good or bad, but they all either reported percentages above or below different centiles, or compared customised with population charts, or detection rates for pathology without fixing false positive rates, or all three, or came to no clear conclusion.

Next and finally, a summary of what we have learned (click here).

Jim Thornton

* Readers beware. Authors often use the terms population, standard and customised charts rather loosely.

Standard, Population & Customised fetal size charts 13 – The Intergrowth/WHO plagiarism controversy

September 18, 2019

Soon after the Intergrowth-21 charts were published, allegations of skullduggery appeared from WHO (click here). Rumours flew around, big money was involved, and the organisations employing each group of researchers, WHO and Oxford University, conducted enquiries. Eventually WHO referred Jose Villar and Stephen Kennedy, two of the leaders of the Intergrowth-21 group, to the UK General Medical Council (click here), who declined to investigate.

Supporters of WHO claimed that the two Intergrowth-21 authors had plagiarised a previous WHO protocol. Villar had been employed by WHO before he moved to Oxford, and Kennedy had been a member of a WHO expert group.

Defenders of Intergrowth-21 say that the idea of creating fetal growth standards was not only “obvious”, but had already been widely discussed in public long before either team set to work.

The full story has never been published, but the editor of the Lancet, who had seen both the WHO and Oxford reports, sided with the Intergrowth-21 group and concluded that at worst it was a matter of academic rivalry (click here). His comment that the WHO enquiry was “disappointingly insubstantial” clearly stung WHO, who responded by publishing it here, the “report from reviewers”. Judge for yourself. The WHO also took the opportunity to confirm that:

“WHO has never questioned the scientific validity of the research conducted and the papers produced by Oxford University (as published in the Lancet and other peer-reviewed journals)”.

Three supporters of the WHO allegations wrote a letter which added little (click here). The Oxford report, exonerating the Intergrowth-21 authors, remains unpublished. And there the matter rests.

The issue is long past. But it’s worth recalling, because advocates of alternatives occasionally allow the idea to get out that “There’s something fishy about the Intergrowth or WHO charts”. There is not. The allegations never included any scientific criticism of either. Rather the opposite. The idea of international fetal growth standards was so good, that academic rivals sought credit for it. This mattered once to the people involved, but not to the rest of us.

Next customised charts (click here).

Jim Thornton

Standard, Population & Customised fetal growth charts 14 – GROW

September 17, 2019

Customised charts

The Perinatal Institute (click here) is a leading UK advocate of customisation. It markets Gestation Related Optimal Weight (GROW) charts, customised on the mother’s height, weight, ethnicity and parity (1). Two charts are typically combined into one physical chart which can be printed out and filed in the woman’s record. The left hand scale shows fundal height in cm and the right hand scale shows estimated fetal weight (EFW) in grams, calculated from a combination of biparietal diameter, head circumference, abdominal circumference and femur length, using Hadlock’s formula. Staff are encouraged to plot fundal height using an X symbol and EFW using an O. Examples below.


The left hand chart is for a normal weight and height British European woman (2), and the right for an underweight Indian woman.  Both customised charts include an estimate called the Term Optimal Weight (TOW). For the British European baby this is 3,429g and for the Indian one 3,042g.  The difference matters.  Imagine that the EFW was 2,000g at 39 weeks. The British European mother would be told that her baby was growth restricted and likely advised to have labour induced, but the Indian woman would be told that the weight was “normal for her ethnicity, height and weight” and probably allowed to let the pregnancy continue.

Their customisation principles are detailed on the website (click here) but the exact formulae in use at any one time are a commercial secret. They are regularly updated on the basis of data sent back to the Institute by their customers.  This is a potential weakness since, although GROW customers have been trained in both fundal height and scan measurements, it is unlikely that the biases which inevitably affect revealed human measures in practice will have all been removed (click here).

The underlying justification for the choice of features on which GROW charts are customised is also weak. Self-reported ethnicity has both theoretical (click here) and practical (click here) problems. Maternal height (click here), weight (click here) or parity (click here) are problematic because underweight, shorter, and first time mothers all have both smaller babies and higher perinatal mortality. Customisation on those features thus risks normalising pathology. It would only be justified if we could be sure that the strength of relation between each feature and adverse outcomes was weaker than that between birth weight and adverse outcomes (click here). There are no such data. Even those WHO standard chart authors who advocate customisation in theory, have not even attempted to provide such data (click here). Customisation by fetal gender makes theoretical sense (click here) but GROW charts do not at present offer this.

Finally GROW only provides customised charts for fetal weight, not for head or abdominal size, or fetal length separately. This condemns obstetricians using GROW charts to either ignore these component measures or to interpret them on a different, non-customised chart (click here).

The Institute’s director, Professor Jason Gardosi, claims that GROW chart introduction was associated with a reduction in stillbirth and that the reduction was greater in English regions with higher uptake of GROW charts. Both claims are doubtful. The following are screen grabs taken from a lecture he regularly gives, available on the Institute’s website (click here). The data are those in his BMJ Open paper (click here). The graph scales appear to have been selected to emphasise a point.


Note how the vertical scale differs between the left and middle slide. The right hand slide (3) is fairer. Stillbirths were falling long before GROW charts were introduced and if anything the trend has levelled off.

The choice of time periods and regions to report may also be selective. See left and middle figures below.


The right hand slide shows the same data for the whole of the UK (3). The rate of fall slowed slightly in England and Wales where GROW software was most widely used, and was steepest in Scotland where GROW software was not in use.

The reasons for this general trend are well understood: falls in smoking, increased diagnosis and termination for lethal fetal abnormalities, and increased inductions near term, all three of which reduce stillbirths. Given the undoubted benefits of the rest of The Perinatal Institute’s training in encouraging staff to measure fundal height correctly and to act on the results, this hardly suggests that customisation is beneficial.

Next, other empirical comparisons (click here)

Jim Thornton


  1. The Perinatal Institute also provides training in various aspects of maternity care. This latter work is generally agreed to be excellent.
  2. British European is not currently one of the official UK ethnicity groups. Presumably the Perinatal Institute authors mean White British.
  3. I thank Professor Gordon Smith for the national stillbirth data.


Standard, Population & Customised fetal size charts 12 – WHO

September 17, 2019

World Health Organisation (WHO) fetal growth charts

Like Intergrowth-21 these are standard charts. They were published in 2017 (click here).

Participants came from ten urban centres: Rosario Argentina, Campinas Brazil, Kinshasa Congo, Copenhagen Denmark, Assiut Egypt, Paris France, Hamburg-Eppendorf Germany, New Delhi India, Bergen Norway and Khon Kaen Thailand. They were all living below 1,500m, aged 18–40, with BMI 18–30 kg/m², a singleton pregnancy, gestational age at entry between 8 and 13 weeks, no chronic health problems or long-term medication, no environmental or economic constraints likely to impede fetal growth, non-smokers, with no history of recurrent miscarriages, preterm delivery or birth of a baby <2,500g. The ultrasonographer training and scan techniques were carefully standardised and quality controlled.

There were three major differences with Intergrowth-21.  The WHO sample was smaller, only 1,387, compared with Intergrowth’s 4,321, which will have reduced the precision, especially of the outer centiles. WHO revealed the scan measures on the screen as the ultrasonographer placed the calipers, which could have biased the results (click here). Finally WHO used Hadlock’s formulae to estimate fetal weight, which may have affected those results, albeit in uncertain ways (click here).
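A rough feel for how much sample size matters in the tails comes from the textbook large-sample approximation for the standard error of a sample quantile, sqrt(p(1-p)/n) divided by the density at that quantile. The sketch below assumes normally distributed measurements and n independent observations at a single gestational age. Neither study was that simple (both scanned each woman repeatedly and smoothed the centiles across gestation), so the output only illustrates direction and rough size. The function name is mine.

```python
import math
from statistics import NormalDist


def centile_se(p, n):
    """Approximate standard error of the p-th sample quantile, in SD units,
    for n independent draws from a normal distribution."""
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(p)
    return math.sqrt(p * (1 - p) / n) / standard_normal.pdf(z)


# Sample sizes quoted in the text.
for label, n in (("WHO", 1387), ("Intergrowth-21", 4321)):
    for p in (0.50, 0.10, 0.03):
        print(f"{label:15s} n={n}  centile={p:.0%}  SE={centile_se(p, n):.3f} SD")
```

Under these assumptions the 3rd centile is about twice as imprecise as the median for any given sample size, and every centile from the smaller sample is about 1.8 times less precise than from the larger one (the square root of 4,321/1,387).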

The authors noted fetal sex differences in size, which others have also observed (click here). Otherwise, with the exception of estimated fetal weight (see below), the final charts were close to those of Intergrowth-21. Here are the two charts for head circumference. Note they are not exactly comparable since Intergrowth-21 (left) gives the 3rd, 10th, 50th, 90th and 97th centiles, while WHO (right) gives the 1st, 5th, 10th, 50th, 90th, 95th and 99th.


And here they are for abdominal circumference


The WHO researchers also published separate estimated fetal weight charts by country, and noted, in alleged contrast to Intergrowth-21, that some country charts differed significantly from the pooled chart. The explanation may partly be that the two groups of researchers used different statistical techniques to calculate the smoothed centile curves*. But there is no dispute that some geographical size differences remain.

The most likely explanation is that neither group managed to completely exclude malnourished women, or those with other environmental constraints on growth, from the populations they studied. Both made a valiant attempt to produce healthy standard charts, but are unlikely to be the last word on the topic (click here).  Observing small differences, even in apparently healthy pregnancies, between some rich and poor countries does not prove the existence of an innate racial, ethnic or national difference on which we should customise.

Nevertheless seven of the WHO study’s twenty two authors see things differently and have gone on to strenuously argue (click here) that their data support the use of different charts for different populations. They provide no detailed prescription for how this should be done.

They wisely don’t support customisation by ethnicity since “Ethnicity, and particularly self-reported ethnicity, is not a straight-forward characteristic of a person or population”.  Nor customisation by country since this is both impossible for the 185 countries for which WHO produced no chart, and makes even less sense than customisation by ethnicity!

The WHO data showed that older women had bigger babies (2–3% per 10 years), that multiparity increased fetal weight by 1–3%, and that maternal height increased it by 1–2% per 10 cm. All three effects were more marked in smaller fetuses. Maternal weight also increased fetal weight by 1–1.5% per 10 kg, but that effect was greatest among larger fetuses. Recognising that none of these effects was large and that they were exerted unequally among different weight centiles, the WHO authors accepted that any customisation for individual use would be complicated, although “statistical development, growing computer power, and more data accrual should handle it”.

Perhaps so. But customisation requires more than just showing that size differs according to a particular feature. To make customisation useful we need to also show that the relationship between size and pathology is stronger than that between the feature and pathology. There is a strong relation between smoking and stillbirth (click here) but no-one wants to customise on that! The WHO authors did not even attempt to measure the strength of the relation between age, parity, height or weight and pathology, and compare each with that between size and pathology.
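A toy simulation makes the point concrete. Everything in it is an assumption made for illustration: a single yes/no maternal feature carried by half the mothers, a one standard deviation reduction in fetal size for those who have it, and a made-up risk curve. It compares how many affected babies each kind of chart picks up at a fixed 10% false positive rate, first when the feature makes babies small harmlessly, and then when babies made small by the feature are at just as much risk as any other small baby.

```python
import math
import random


def sensitivity_at_fpr(scores, affected, fpr=0.10):
    """Flag the smallest scores so that `fpr` of unaffected babies are flagged,
    then return the share of affected babies that are flagged."""
    unaffected = sorted(s for s, a in zip(scores, affected) if not a)
    cutoff = unaffected[int(fpr * len(unaffected))]
    cases = [s for s, a in zip(scores, affected) if a]
    return sum(s < cutoff for s in cases) / len(cases)


def simulate(feature_is_harmful, n=200_000, seed=1):
    """Return (standard, customised) sensitivity at a 10% false positive rate."""
    rng = random.Random(seed)
    standard, customised, affected = [], [], []
    for _ in range(n):
        has_feature = rng.random() < 0.5            # hypothetical maternal feature
        residual = rng.gauss(0.0, 1.0)              # size relative to same-feature peers
        size = residual - (1.0 if has_feature else 0.0)  # feature costs 1 SD of size
        # Harmful feature: risk follows absolute size.
        # Harmless feature: risk follows only the size not explained by the feature.
        driver = size if feature_is_harmful else residual
        risk = 1.0 / (1.0 + math.exp(5.0 + 1.5 * driver))  # made-up risk curve
        standard.append(size)         # one chart for everyone
        customised.append(residual)   # chart adjusted for the feature
        affected.append(rng.random() < risk)
    return (sensitivity_at_fpr(standard, affected),
            sensitivity_at_fpr(customised, affected))


if __name__ == "__main__":
    for harmful in (False, True):
        std, cus = simulate(harmful)
        print(f"feature harmful={harmful}: standard {std:.2f}, customised {cus:.2f}")
```

In this sketch customisation helps only in the first scenario. When the smallness that goes with the feature is itself dangerous, adjusting for it throws information away and the single standard chart detects more. Real features will sit somewhere between these two extremes, which is why the strength of each relationship needs measuring.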

It is difficult to know exactly what in practice the seven WHO authors recommend, and tempting to conclude that they had some other dispute with Intergrowth-21 (see next post).

WHO and Intergrowth-21 are the best two fetal growth standard charts. Since Intergrowth-21 is based on a larger sample, and used better methods for avoiding bias, their charts are marginally to be preferred. It’s a bonus that they are also user-friendly and have been integrated into most of the leading scan software packages.

Next the WHO-Intergrowth plagiarism dispute (click here).

Jim Thornton

* For those who are interested, the Intergrowth-21 authors tested whether the distribution of values for each gestational age was normally distributed. It was, so they created their standards using a statistical technique, fractional polynomials, that required such a distribution. The WHO researchers in contrast used a technique, quantile regression, which required no assumptions about the data distribution.
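The difference between the two approaches can be seen at a single gestational age with a deliberately simplified sketch. It does not implement fractional polynomials or quantile regression, both of which also smooth across gestation. It only contrasts a centile derived from a fitted normal distribution with one read directly from the ranked data. The measurements are randomly generated stand-ins, not real head circumferences.

```python
import random
from statistics import NormalDist, quantiles

rng = random.Random(0)
# Hypothetical head circumferences (mm) at one gestational age; not real data.
sample = [rng.gauss(300.0, 12.0) for _ in range(2000)]


def centile_assuming_normal(data, percent):
    """Centile from a normal distribution fitted to the data (parametric)."""
    return NormalDist.from_samples(data).inv_cdf(percent / 100)


def centile_empirical(data, percent):
    """Centile read off the ranked data, with no distributional assumption."""
    return quantiles(data, n=100)[percent - 1]


for percent in (3, 10, 50, 90, 97):
    print(percent,
          round(centile_assuming_normal(sample, percent), 1),
          round(centile_empirical(sample, percent), 1))
```

When the data really are normal, as the Intergrowth-21 authors reported, the two columns should agree closely. They would drift apart if the distribution were skewed.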

Standard, Population & Customised fetal size charts 11 – Intergrowth 21

September 16, 2019

Standard charts

The Intergrowth-21 group, funded by Bill Gates, began work in 2008 and went on to produce a series of growth standard charts for fetuses. Click here for their website, here for the main report and here for their estimated fetal weight standards, which were published separately. According to some historians Intergrowth-21 was originally a spin off from the World Health Organisation (WHO) fetal growth standard group. However the main reports from Intergrowth-21 preceded publication of those from WHO.

Intergrowth-21 collected scan measures from 4,600 healthy fetuses and healthy mothers in eight geographically distinct urban areas (Pelotas Brazil, Turin Italy, Muscat Oman, Oxford UK, Seattle USA, Shunyi County Beijing China, Central Nagpur India, and Parklands Nairobi Kenya) where environmental, nutritional and social constraints on fetal growth were judged to be minimal. The study was called the Fetal Growth Longitudinal Study (FGLS). They chose cities located below 1600 metres, with low levels of pollution. The women had no clinically relevant medical problems, started antenatal care before 14 weeks, had a height ≥153 cm, a body-mass index (BMI) between 18 and 30 kg/m², a haemoglobin concentration ≥110 g/L, and were not on any special diet. This resulted in a group of educated, affluent, clinically-healthy women, with adequate nutritional status, who by definition were at low risk of fetal growth restriction and preterm birth. The Intergrowth-21 group used all the latest scan methods as well as modern techniques to avoid bias (click here).

The authors found little variation by ethnicity. Specifically there were no statistically significant differences between each geographical area and the pooled data from the other seven. The charts also aligned closely with newborn charts from similar healthy populations.

This lack of important size difference between healthy fetuses from different ethnic groups implies that the differences we see every day between such groups are largely a result of environmental and nutritional factors. Once these are removed the differences disappear.  It is strong evidence against customisation by ethnicity.

But it is also the reason why some enthusiasts for customisation push back so strongly against it. They dispute the statistical methods, or point to other standard charts, notably those from WHO which we will discuss tomorrow, showing small differences between geographic groups, as evidence for customisation. But the converse argument does not apply. Finding ethnic differences, even in standard charts, is not strong evidence for customisation. There are small differences, e.g. between Seattle and Shunyi County in Intergrowth-21, and between countries in WHO, but the most likely explanation is that neither group of researchers succeeded completely in removing all study participants who had environmental constraints acting on their pregnancy.

The Intergrowth-21 authors concluded that their charts are the single best growth standard chart for use worldwide, and I agree. We use them in Nottingham.

Next (click here) the WHO standard charts.

Jim Thornton

Standard, Population & Customised fetal size charts 10 – estimated fetal weight

September 15, 2019

Another technical digression

There are no customised charts for direct fetal scan measures. They only exist for fetal weight and fundal height. This suits parents and non-experts who, unfamiliar with fetal biometry, may prefer weight to say, the abdominal circumference centile.  But it’s not straightforward for the obstetrician. Fetal weight is tricky to estimate, tricky to chart and tricky to interpret.

Once the baby is born we can weigh it. But until then we have to use formulae based on a combination of head, abdomen and femur measurements. The most popular were developed by Frank Hadlock in the 1980s, in Texas. He studied 276 fetuses each scanned within a week of birth (click here), which is rather few, especially as nearly half were less than 24 weeks. Nor would the details of his scan methods pass muster today (click here), although that is hardly his fault. Scan technology, and ways to reduce bias, have developed considerably since.
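For the curious, this is what such a formula looks like. The coefficients below are those commonly quoted for Hadlock's 1985 four-measurement formula (measurements in centimetres, weight in grams). Treat them as an assumption to be checked against the original paper before any use. The example measurements are made up, and nothing here says which variant any particular chart or scan machine actually uses.

```python
def hadlock_efw_grams(bpd_cm, hc_cm, ac_cm, fl_cm):
    """Estimated fetal weight from biparietal diameter (BPD), head
    circumference (HC), abdominal circumference (AC) and femur length (FL),
    all in cm. Coefficients as commonly quoted for Hadlock 1985; verify
    against the original publication before relying on them."""
    log10_weight = (
        1.3596
        - 0.00386 * ac_cm * fl_cm
        + 0.0064 * hc_cm
        + 0.00061 * bpd_cm * ac_cm
        + 0.0424 * ac_cm
        + 0.174 * fl_cm
    )
    return 10 ** log10_weight


# Made-up measurements for a fetus near term, purely for illustration.
print(round(hadlock_efw_grams(bpd_cm=9.5, hc_cm=34.0, ac_cm=35.0, fl_cm=7.5)))
```

Note how the abdominal circumference enters three times. Small errors in that one measurement feed through strongly into the weight estimate.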

Turning weights, whether real or estimated, into a chart is also more complicated than just the choice of “population” or “standard” we discussed a few posts ago (click here). Charts based on babies actually born preterm systematically underestimate centiles because babies born preterm tend to be growth restricted. Normal weight babies at, say, 30 weeks mostly don’t deliver at 30 weeks, so they can only be measured, not weighed.

The figure below (taken from Stirnemann click here) shows the problem. The dotted lines are 3rd, 50th and 97th weight centiles based on babies actually born. The solid lines the same for all babies, including those who don’t deliver preterm. Not much difference at term (black square), but at 28 weeks a baby on the 50th centile of babies born at that gestation (black circle), lies below the 3rd for the whole population.
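The same selection effect is easy to reproduce with invented numbers. In the sketch below the mean weight, the spread and the way the chance of delivery rises as fetuses get smaller are all assumptions chosen for illustration. None of them comes from Stirnemann's data.

```python
import math
import random
from statistics import median, quantiles


def simulate_28_weeks(n=100_000, seed=0):
    """Return (weights of all fetuses, weights of those delivered) at 28 weeks."""
    rng = random.Random(seed)
    all_weights, born_weights = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        weight = 1100.0 + 150.0 * z                 # grams; hypothetical mean and SD
        # Assumed: smaller fetuses are more likely to be delivered at this gestation.
        p_delivery = min(1.0, 0.02 * math.exp(-z))
        all_weights.append(weight)
        if rng.random() < p_delivery:
            born_weights.append(weight)
    return all_weights, born_weights


all_weights, born_weights = simulate_28_weeks()
print("median, all fetuses:     ", round(median(all_weights)))
print("median, babies delivered:", round(median(born_weights)))
print("3rd centile, all fetuses:", round(quantiles(all_weights, n=100)[2]))
```

A chart built only from the babies delivered would put its 50th centile well below the true median for all fetuses at that gestation. How far below depends entirely on how strongly smallness predicts early delivery.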

So how did Stirnemann and his Intergrowth-21 colleagues develop their chart? Their trick was this.

First they scanned a lot of fetuses, including those from unhealthy pregnancies, but using the up-to-date methods for avoiding bias. Some came from the Fetal Growth Longitudinal Study (FGLS) which we will describe later. The rest came from an unselected cohort of women including smokers, and those with problem pregnancies, the INTERBIO‐21st Fetal Study. They then measured the birth weights of those babies who happened to be born within a short interval of the scan. There were 2,404 of these. Again they took special care with the birth weight measurements, drying the baby carefully, cutting the cord to a standard length, using a standard cord clamp and a trained person to work specially calibrated random-zero electronic scales to avoid digit preference. The babies born preterm included many that were not healthy, but that didn’t matter at this stage. They simply used the measurements, with a small correction for the interval between the measurement and the actual birth, to create new formulae for estimating weight from head, abdomen and femur measures, i.e. the same as Hadlock, but with a larger sample, and modern scanning, weighing and bias reducing techniques.

The second stage of the procedure was to apply those formulae to the scan measures from healthy babies in the Fetal Growth Longitudinal Study, i.e. the main Intergrowth-21 population. Those are the weights and centiles that comprise the Intergrowth-21 fetal weight standard charts. Here’s the final result. Left the chart in Stirnemann’s paper and right the nicely printed version for regular use.


Frank Hadlock knew, and the creators of customised charts like GROW also know, that fetuses born preterm are systematically lighter than those destined to be born at term. Both made similar adjustments (click here for Hadlock’s, and here for GROW’s methods). I’m not suggesting that either confuse the weights of babies actually born preterm with those that go on to deliver at term. However the GROW customised charts from the Perinatal Institute are based on weights estimated using Hadlock’s formulae, which themselves were created from a relatively small number of women in Texas in the 1980s, using the methods and scan machines available then.

Another issue with using ultrasound-based weight estimates to manage pregnancy is that they may not be the best predictors of babies who are likely to die, be brain damaged or unable to withstand the stress of labour. Expert interpretation of the individual measures from which the weight was estimated is often better. For example, if a fetus with a small abdominal circumference has relatively long legs or a big head, it will be heavier, but more malnourished.

Finally, since customised direct fetal measure charts don’t exist, obstetricians who use customised weight charts have to either ignore the individual components on which the weight was estimated, or use a non-customised chart to interpret those. At the very least a source of avoidable confusion.

Antenatal assessment of fetal compromise is complicated. That’s why we have fetal medicine specialists. Combining multiple size measures into a single estimate of weight may please parents, but is a potentially misleading simplification.

This is the last technical digression. Now we are ready to look in detail at the main modern charts, Intergrowth-21, WHO and the main customised chart, GROW. Tomorrow Intergrowth-21 (click here)

Jim Thornton
