Evaluation

“Hammer in search of nails”

So-called “rigorous impact assessments” are based on experimental designs much like those used in pharmaceutical testing: they rely on randomised controlled trials (RCTs). In an interview with Hans Dembowski, Jim Rugh, a senior evaluator, discussed the need for more holistic approaches.

Interview with Jim Rugh

Rigorous impact evaluations that rely on RCTs have recently been promoted by several influential parties. The most prominent proponents are probably Abhijit Banerjee and Esther Duflo from the Poverty Action Lab at the Massachusetts Institute of Technology. What are the merits of their approach?
Banerjee, Duflo and others claim that the purpose of RCTs is to evaluate the impact of programmes more rigorously. One way to answer the question of what would have happened without an intervention is to compare data from individuals or communities where a development programme is being implemented with data from other individuals or communities where there was no intervention. That is the definition of a counterfactual assessment. However, in most international development programmes, it is difficult to split target groups at random into “treatment” and “control” groups. And, indeed, it is often unethical to do so. That is the case, for instance, if one knows beforehand that an intervention is likely to massively reduce suffering.
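
To make the counterfactual logic concrete, here is a minimal sketch – not from the interview, using invented data. Because assignment is random, the control group’s average outcome approximates what would have happened to the treated group without the intervention, so the impact estimate is simply the difference in group means:

import random
import statistics

# Simulated outcome data (hypothetical school attendance rates, in percent).
# Because assignment to the two groups is random, the control mean stands in
# for the treated group's counterfactual outcome.
random.seed(42)
control = [random.gauss(70, 10) for _ in range(200)]
treatment = [random.gauss(70, 10) + 5 for _ in range(200)]  # +5 models a hypothetical programme effect

effect = statistics.mean(treatment) - statistics.mean(control)

# A rough standard error for the difference in means.
se = (statistics.variance(treatment) / len(treatment)
      + statistics.variance(control) / len(control)) ** 0.5

print(f"Estimated impact: {effect:.2f} percentage points (SE {se:.2f})")

In real programmes, as Rugh notes, such clean randomisation is often impractical or unethical.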

Obviously, you are uncomfortable with the approach. What is wrong with it?
There is nothing wrong with the approach itself, but there is something wrong with insisting that this is the only way to assess the results of aid and development cooperation. There are several reasons. Consider the major assumptions underlying the use of RCTs. They rest on a simplistic cause-and-effect paradigm that is assumed to hold no matter what the context. Though it is not often acknowledged, they are based on a search for a “silver bullet” – one intervention that, by itself, is sufficient to produce measurable outcomes.

Please give an example.
One example of an RCT I’ve seen “proved” that deworming of children was the most cost-effective way to enhance school enrolment. While it may well be true that helping children to be healthier increases the chances of their attending school, consider all the other interventions and preconditions that must be in place to enable children not only to attend classes but also to obtain a quality education. Most programmes need to address multiple causes of such problems. Seldom, if ever, are situations that simplistic, nor are cookie-cutter or blueprint solutions applicable everywhere. Yet millions of euros are being spent on supposedly rigorous research designs that focus narrowly on such simplistic interventions. It’s like using an expensive microscope when a wide-angle lens would be more appropriate. For some important reform efforts, the microscope approach does not make sense.

What are you thinking of?
Well, consider judicial reform. The rule of law matters for several reasons: contract enforcement is good for business, law enforcement is good for human rights, and a strong judiciary can help keep a check on corruption and on government action in general. However, you cannot measure judicial performance in statistical terms, because the quality of judgements is at least as important as their number. More generally, many development programmes are meant to bring about good governance, for instance, or to promote human rights, or to achieve other laudable goals that are hard to measure, hard to compare quantitatively across countries or contexts, and hard to attribute directly to any particular agency’s specific interventions.

So other evaluation approaches have not become obsolete?
No, absolutely not. Given the complexity of real-world contexts and programmes, we cannot afford to limit ourselves to a single evaluation design or approach. That would be like learning to use a hammer and then going around in search of nails. We have to work the other way around. Professional evaluators need larger toolkits that suit various scenarios. And they need the skills and wisdom to know what combination of tools is appropriate for any specific purpose.

Some argue that, far too often, agencies evaluate their own work, so self-interest colours the reports they produce. Moreover, their data does not always look very reliable.
Well, there are two extremes to be avoided. At one end of the continuum is the self-serving self-evaluation, in which project implementers only want to make themselves look successful so they can get more funding. But the solution is not to go to the other extreme: an externally imposed evaluation design that focuses narrowly on a simplistic cause-effect relationship but does not adequately assess how a complex programme was implemented. There are ideal mixes of evaluation methodologies that combine objective accountability for results with formative learning and improvements in practice. Another way to respond to your question is this: evaluators do face many serious constraints, but as we try to point out in the RealWorld Evaluation book and in workshops based on that approach, there are methods for conducting adequately reliable, credible and useful evaluations in spite of those constraints.

How could development agencies facilitate more meaningful evaluations?
Well, it would certainly help if they developed life-of-project evaluation plans right from the start, including some form of relevant baseline to document initial conditions. When an agency designs a programme, it should gather and document relevant data early on, so that the evaluation team has a baseline against which to measure change at a later stage. It would similarly help if interventions were based on logic models that show what higher-level outcomes and impacts are expected, and what kind of evidence will indicate whether those outcomes or impacts were achieved – and why.
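
As a concrete illustration of that advice – again a hypothetical sketch, with invented indicator names and values – a baseline documented at the design stage is what lets an evaluation team quantify change later:

# Hypothetical indicators recorded when the programme was designed (baseline)
# and again at evaluation time (endline); names and values are invented.
baseline = {"school_enrolment_pct": 62.0, "adult_literacy_pct": 48.5}
endline = {"school_enrolment_pct": 71.5, "adult_literacy_pct": 55.0}

for indicator, before in baseline.items():
    after = endline[indicator]
    print(f"{indicator}: {before} -> {after} (change {after - before:+.1f})")

Without the baseline, the “before” column simply does not exist, and any claim about change rests on recollection rather than evidence.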

What about controlling the agencies’ self-interest?
This issue is being tackled. The need for external evaluators’ independence is generally accepted, at least in theory. One way to improve matters is to have evaluation departments report directly to top management or even to the board, bypassing the desks that plan and run projects and programmes. Another is to rely on external evaluators, provided they are adequately qualified and have enough business not to depend on a single agency for consultancy contracts. Otherwise, they are not free to be objective and even critical, but will be tempted to somehow prove success in order to get the next assignment. I understand that your government is taking an interesting approach by setting up an independent evaluation agency instead of relying only on the reports the implementing agencies produce themselves. Finally, it would make sense to rely more on experts from developing countries who understand their countries’ history and know what other agencies are doing. A database of voluntary organisations of professional evaluators can be accessed through the www.IOCE.net website.

To some extent, evaluation seems to be a donor obsession.
The interest in evaluation by governments of developing countries is certainly growing. In the past, many African leaders, for instance, believed their personal understanding of their country and programmes was all they needed. My impression is that more and more national governments are realising that they would benefit from a more profound knowledge of what impacts policies and programmes are having. This growing interest and capacity deserves to be encouraged. That’s an important goal of the EvalPartners Initiative – check out our website:
http://www.mymande.org/evalpartners
