Workshop on The "Gold Standard" Debate

U.S. Department of State Third Annual Conference on Program Evaluation - Methods Track
Washington, DC
June 9, 2010

DR. SILVER: I would like to welcome everybody to this session. I think the description was intentionally a little bit vague about what this session would be. But before I start, I'd like to introduce our team of people that I'm here with.

We are from ECA Office of Policy and Evaluation, Evaluation Division. It's a mouthful. I'm here with my colleagues James Alexander, Jamie Shambaugh, and Indhu Sekar. And we are all evaluation officers in the evaluation division.

Why did we propose to do this session on the gold standard? I think for people who are within the field of evaluation, in particular, this is an issue that has been bubbling over a long period of time. And then also, we now have a much broader discussion of evaluation within the context of the U.S. Government as a whole and within the State Department as well.

So one of the reasons that I thought it would be important to have this kind of discussion, basically a methodological discussion, is to really begin to unpack some of the issues that are driven -- that have driven this kind of an argument.

Before I start, I want to ask -- I also -- I want to let you guys know that this is not going to be kind of the standard talking head session. What I was hoping to do -- and we'll see if it will work -- was to really have -- to engage in a real discussion of the issues.

I'm going -- what's -- the way I have this planned is that I'm going to talk a little bit about what the history of this gold standard debate has been, outline a couple of the key issues that I think are important, and then we're going to do something really radical and we're going to break into groups. And we've come up with a number of kind of just guiding questions for the groups to stimulate your discussion about the larger issue.

And we're not going to -- unfortunately, we don't really have enough time to come back together to kind of debrief on what the individual groups discussed in each group. But I'm hoping to be able to move around somewhat and kind of seed from one group to another so that you have a sense of what each group is really discussing and what kinds of issues they're exchanging information on.

So it's partially just to kind of set the scene and then to allow you guys to really talk about it amongst yourselves. There's a huge amount of expertise, I know, in this room. And this is a chance to really tap into it to exchange experiences, to exchange ideas about ways to address issues that you may come into -- come against.

Let me just get a hand raise of who feels like they understand exactly what the gold standard debate is about. Okay, so there's a lot of people who are a little bit unclear about maybe really what the whole thing was supposed to be geared about -- talk about.

So let me talk a little bit about some -- kind of a little historical background of how we got to where we are today where there is this term, the gold standard debate.

And I also want to really emphasize that the goal of this is not -- the goal of this discussion is not to come down on one side or another, but the goal is to kind of unpack the issues that this kind of raises within the evaluation community and I think it's an important area to think about even if you're not directly involved in evaluation.

Because you need to know something about -- everybody's either a consumer of evaluation information and you should understand more about where it's coming from. But it often influences, we hope, policy in one direction or another. So this is a chance to kind of understand one of the issues that's driving the field.

Basically, the question of what is a gold standard for a methodology in evaluation kind of traces it back -- traces the history back to the wars in the 70s and 80s over what type of knowledge do we really want to have. Do we want to have quantitative knowledge or qualitative knowledge?

And these debates went back and forth between academics mainly and those within the field of practice of evaluation. And the question really was, what kind of knowledge do we really need, do we want to get out of evaluation.

And that's kind of the epistemology that underlies the whole discussion. Do you think that you can assign a number to every kind of human endeavor and measure that and is that the goal of evaluation? Or are there other kinds of knowledge, other methodologies for developing knowledge that can be applied in the evaluation -- in the field of evaluation?

And over the years, I think the field has matured and a wide range of methodologies have become acceptable. And in fact, there's a strong movement towards a mix of methodologies as the kind of gold standard to evaluation.

But I think, as individuals within the field of evaluation, everybody comes to it with their own particular point of view and their own particular skill sets. What has happened in the recent history is there has been a number of organizations that have entered into the debate with a particular methodological perspective.

So instead of having, you know, kind of individuals who had a particular point of view discussing which is the, you know, which is the proper way to do, what's the proper methodology for evaluation, we now have organizations that are pushing a particular perspective.

And one of the instigation -- probably one of the instigating events to this -- the modern debate about the gold standards was the Department of Education's basic call for scientifically-based evaluation methods. But when they called for scientifically-based evaluation methods, that for the Department of Education meant one methodology, randomized controlled trials. So from now on, when I say RCTs, I'm referring to randomized controlled trials.

And that was kind of really the starting point for the more modern, the more up-to-date debate that's been going on. When that -- when the Department of Education came out with a statement and a policy about what they were going to use in evaluation, it came at a time when there were other organizations of evaluators who felt that it was time to kind of come together and make a statement in terms of what their beliefs were.

And so the American Evaluation Association, which up to that point in time had been fairly non-political and not engaged with the government at all, then put forth -- gathered groups within the organization and came to a conclusion that they needed to push back against this idea.

So the American Evaluation Association sent a letter of basically dissent saying that as the professional field of evaluators, that the idea that there was one methodology that would be valued above all others was not the standard for doing evaluation that AEA was supporting.

But of course, like a lot of organizations, you know, if there is a thousand people in the organization, you get fifteen hundred opinions about it. So there was even a group within AEA that dissented from AEA and said that they were supporting that as the only methodology that should be used.

At the same time, there was a growth of, in some of the think tanks and non-governmental organizations, in particular, the Center for Global Development, that was looking at the effectiveness of aid and foreign policy. And their perspective was also that, in fact, we should be putting our evaluation money through one perspective, one type of evaluation, one methodology, RCTs.

Out of the Center for Global Development's work, they actually created another organization to explicitly fund only evaluations that were done using RCTs. So we now have a core of people that's all that they -- that's the only thing that they're trying to look at. And the idea being that RCTs will provide a stronger basis of information for policymakers is the argument behind that.

At the same time, there was a growth of NGO organizations bringing NGOs together, such as NONIE, which I forget what that actually means.

A PARTICIPANT: (Inaudible.)

DR. SILVER: Okay. Networks of Networks for Impact Evaluation, which is made up of NGOs that also was engaging with the Center for Global Development to see whether or not -- to question whether or not that RCTs should be given primary focus. So the point here is that there is a lot of discussion being engaged with a lot of stakeholders at different levels.

There was a conference in Cairo on impact assessment where a lot of the parties came together, and this was primary in their debate as well.

And then lastly from the academic side, the Poverty Action Lab at MIT has been very active over the last five years in developing and using RCTs as the primary model in developing countries. So we have a fairly dynamic and sophisticated academic use of RCTs that is going on in developing countries, which we didn't have very much before.

So these things are all kind of swarming in the atmosphere and have taken -- we've adopted the new media as well. So these discussions are taking place on blogs, on tweets. I get regular tweets about this and even in social networking sites like LinkedIn where people put out just a question and, boy, you see the responses come in from all different perspectives. And that's, you know, it's a professional association.

So there is a lot of expertise out there with perspectives about, you know, how do you choose the most appropriate methodology for your evaluation and what criteria should you be using for choosing those methodologies.

So that's kind of the -- that's where -- that's how we've gotten to this point here of a debate within the field of evaluation as it links into people who implement developed foreign policy projects about how we do -- how we will move forward to do our work, and at the same time a real push from the government, from the very highest levels, of we know that we want evaluation out there. We know that we need information to make decisions, but without specifying exactly how that should happen.

So it's up to us within the field of evaluation to really be aware of what the possibilities are and to -- and this is my own personal perspective -- to bring the widest possible toolbox to the question.

Some of -- I wanted to talk just a little bit -- I've already gone over the amount of time, but I just want to lay out just a few of the key issues that kind of drive the debate.

And I talked a little bit about the first one already. It's the -- how you value types of knowledge. Which types of knowledge do -- are most valued by policy makers, in particular, because they are one of the main things that we are looking to influence. And the question always -- we know the attractiveness of one number, you know, being able to provide a policymaker with one number that states what's going on.

So, you know, so the question comes, is that the kind of knowledge we want to produce within evaluation. There is a question of does an experiment, which is basically what RCTs are, is that the only way to produce credible knowledge and how do we make credible knowledge that will be accepted by policymakers that perhaps doesn't -- isn't necessarily so easy to understand.

There is a question of program fidelity when you get down to methodologies. Is it feasible to use a method that assumes that all programs are implemented in the same way across sites and what implications does that have for choosing a methodology?

I mentioned earlier the idea of mixed methods, that there is a certain value given just to the idea that you, in fact, use mixed methods and integrate different types of knowledge.

There is a question of the diversity within the field of evaluators. As I mentioned earlier, every individual has their own particular point of view and their own set of skills. And how do we work within the field to ensure that we have the right skills and are able to -- if we don't have the skills to do one particular type of evaluation we know where to find those skills.

So there is a question about the diversity within the field, which is -- and this is happening within a context where there is this very strong push for certification of evaluators. So what -- when we start talking about certification of evaluators, certified in what and what do we expect them to be able to do?

One thing that -- one criticism of RCTs has always been that they are not sensitive to culture, that they treat the context as you control for the context. And within development and foreign policy, we all know that culture looms quite large and has a large influence on what happens.

So, you know, are we adequately meeting that need in the field that we're working -- that that evaluation is happening in? Are the methodologies we choose going to adequately reflect that?

I'm going to stop there and ask for questions before I try and break you up into groups. By -- my goal in this first part was just kind of to set the perspective of what is happening at this moment. Yeah?

QUESTION: Hi. Does your department have a position on this and if so, what is it?

DR. SILVER: Yeah, we do, and our position is basically the gold standard is whatever is the most appropriate method for the questions. And that's, you know, that's as far as we would go for a gold standard.

What we -- okay. So other questions? Comments? Things that you would -- anything that you would like for the larger group to consider when they're having their discussion? Please.

QUESTION: You did a wonderful job. I was just going to add, and maybe it was implicit in what you were saying, but I think there is also pressure from the World Bank. There is a lot of economists there who believe that impact evaluations are the only way to go.

And just my perspective, I'm at GW so I'm a couple of blocks from the World Bank and work with them and so on. And go -- I was just there last week. Anyway, it's in the air, it's in the water. It's like -- and I'm just not sure if you care or we care, but I just thought -- I would just add into all of the good things you were talking about.

DR. SILVER: And there -- actually, there is one other point that I wanted to make and it's a little bit related to that too, it's the issue that we don't like to talk about, which is the money, and, you know, where the money goes and what gets priority over, you know, something else. So -- and I think that it's important to discuss that because you do have to be aware of who the players are that fund evaluation and how they get funded because that is part of the discussion that's out there.

Other comments? Yes.

QUESTION: You mentioned the issue of culture and it not addressing the diversity of different cultures. In the UK, there is a sort of ethical issue around RCTs that plays a huge part in why we can't use them for a lot of different work. And I was just interested yesterday and about child labor and the use of RCTs. And we would never get away with that in a million years in the UK.


QUESTION: So there is also an issue if you're using this methodology for foreign policy and development issues in other countries, it may also have cross-cultural issues as well.

DR. SILVER: Yes. The ethical issue of -- one of the things that randomized control trials depend on is randomization. And there is an issue out there of the ethical use of randomization to either apply or deny an effort to a particular group of people or to a particular person. So it is definitely an issue out there.

There are ways, there are different ways to get around that issue. And one of the items -- if people were here this morning and heard Dr. Levine talk, which is this idea of, you know, rolling implementation where you can compare groups of early implementers against late implementers and get the kind of comparisons, but it's hard to do. It's very hard to do.

Okay. Oh, another question.

QUESTION: Could you tell me to what extent how the information or how you find this information dictate what or how you think this issue about what RCTs (inaudible).

DR. SILVER: Well, I mentioned that earlier. We -- that attraction of the number for a policymaker, that there sometimes is a tradeoff between giving a policymaker an easy, simple answer or a more complex nuanced answer. And sometimes when you know who your stakeholders are going to be and who you're trying to gear the evaluation towards, it may, in fact, influence the methodologies.

And that's like -- that's a very good question to, perhaps, discuss in the groups in terms of what your experiences have been in that kind of -- in that situation, that particular situation. Okay.

QUESTION: I'm curious also your position on certification of evaluators or (inaudible). USAID had -- in the past it's had a certificate course on evaluation. So it's like a certification, internal certification, and (inaudible) had no -- there was nothing in it about RCTs and -- very little. It was designed --

DR. SILVER: That wasn't even -- yeah. Yeah. That wasn't even in our aspiration.

QUESTION: Yeah. I guess -- I mean, I, as an evaluator, I don't do RCTs myself, but I feel as (inaudible) have a knowledge of RCTs.

DR. SILVER: Absolutely.

QUESTION: And a vehicle to explain what they are (inaudible) interact with (inaudible). And -- but I know that's not shared by some evaluators who use RCTs. That's somebody else. I don't do RCTs. (Inaudible).

DR. SILVER: Well, I think your question has as much to do about USAID and their history with evaluation as it does the question of RCTs or not RCTs.

QUESTION: Yeah. I just wanted to say that I took the course as well and I think there was a lot of talk about randomization and quasi-experimentalism because I think a lot of the development work -- you know, I think RCTs are very, very effective.

I used to work with clinical trials in Family Health International. They're really good when you have a very precise direction. It lends itself more to that, quasi-experimental designs when you're randomizing villages or (inaudible) and things like that. We do discuss that have certainly done it in a lot of our programs to evaluate.

So I would say that, you know, it is addressed, but it may not be presented as experimental, pure experimental design. It lends itself a little bit more readily to the medical field, but it can be used in social settings. But to truly randomize, it's difficult, given geographical locations where public health programs are being implemented.

A PARTICIPANT: Really quickly. If you don't know about it, GAO came out with a report that looked at this. It came out in December. It's excellent. You can get it free online if you go to the Government Accountability Office. And in the title is something like evidence.

But the point was, Congress said what is this stuff about this Coalition for Evidence-Based Policy and this RCTs and what's the deal here. And so they -- and, you know, is this the only way to get evidence. And so GAO did the study. And it's good they also looked not only at the coalition, but at a variety of our federal agencies that have websites that talk about all of this.

Anyway, it's just superb if you want to get up to, you know, sort of to supply -- to just supplement what you're talking about. Anyway, it was in December and it has, like, strengthening evidence in it, but it really was because Congress -- some people on the Hill said what.

DR. SILVER: In relation to what you're saying, too, I think there is an important point to make in terms of that toolbox that, you know, as an evaluator, how broad is your toolbox and what do you bring to it. And I think that's the issue that -- actually, the Canadian Evaluation Association right now is working on a certification program for evaluators.

And AEA -- the American Association has stepped back from that. They kind of don't want to engage in that at this point in time. I suspect they will in the future, but the Canadians are actually taking that on in terms of what an actual certification would look like and what would have to be part of it.

A PARTICIPANT: (Inaudible.)

DR. SILVER: Are they already doing it?

A PARTICIPANT: I thought that their conference, which they just had in Vancouver, that they ruled it out.


A PARTICIPANT: I was not there.

A PARTICIPANT: It's on their website. They are working on that.

DR. SILVER: That's still under -- okay. I have a suggestion from my colleagues that we not break into a small group and continue the discussion in the larger group, which -- okay. I'm seeing -- and we only have 15 minutes left. So it wouldn't be very useful.

A PARTICIPANT: Well, to some extent, the debate and evaluation mirrors the debate that's been going on (inaudible) sciences for a long time when I started out in graduate school where they're doing lots of statistics and when I graduated, those modern things had come upon us and they were all determined to answer the question why and a more nuanced approach.

And I think Virginia is actually right, that it really has to depend upon the question. I think the gold standard makes sense if you have a question is this beneficial or not. But if your question is more nuanced, how does this work and why, then I really think the qualitative evidence is going to be the most important that you're going to gather.

DR. SILVER: James.

DR. ALEXANDER: I just want to sort of ask the question for the general audience here because what I've heard so far has been kind of okay, well, we should be open, we should be using the most appropriate methodology at the appropriate time.

Our -- I would like to hear some sort of defense of the RCT here. I mean, are there people here that are using them and they're finding them in their work, in all kinds of different environments to be successful and finding information that work well for their -- the policy stakeholders as well as perhaps meeting some of the ethical concerns that were raised down the road here.

A PARTICIPANT: Yeah. I work in Colombia, South America, and I know in Latin America, the RCTs have been used for evaluating conditional cash transfer programs.

And in the case of Colombia, in particular, it has been very, very useful at identifying to which groups the program has been effective and which type of interventions, in terms of the amount of the subsidy, the type of conditionalities and how it works different in urban rather than rural areas.

So definitely in the Colombia case. And as -- I don't know her name, but as she said, since it's a very, like, I mean, specific intervention and you know what the treatment is, I think that's one of the main reasons why it has been very, very useful.

DR. SILVER: So you were able to deal with the question of messiness of a lot of the development interventions by -- because yours was a focused intervention.


DR. SILVER: You had specific doses that you were doing that allowed --

A PARTICIPANT: Right. And actually like the first -- like it was like six years ago, it started in small municipalities and one of the main results is that it was not having a big effect on -- like in urban -- quasi-urban areas.

So right now the program is expanding in large cities. And they actually -- based on those results, they redesigned the intervention in urban areas and they're actually experimenting in different types of subsidies for secondary school children who, I mean, need like a different type of intervention.

So yeah, I mean, I think it's -- I know it's not like a silver bullet for every type of intervention, but for conditional cash transfers, it has been very, very useful.

A PARTICIPANT: I used to work at Family Health International and they do a lot of experimental and quasi-experimental designs, but experimental randomized clinical trials would compare standard IUDs versus a new IUD that they were going to be trying to introduce, like a copper T IUD.

So the RTCs (sic) would be able to produce good statistics so we could find which group fared better as far as failure rates, and then we could compare them and say okay, the new IUD is as good as the old as far as its failure rates, and therefore we can introduce it, or it might even be better. But we needed, you know, thousands of women in the Phase III trials.

DR. SILVER: So basically you were comparing two different kinds of treatments.

A PARTICIPANT: Right. And they were -- you know, we had precise start dates and end dates and we had the months and we had life tables to be able to, you know, do all that wonderful stats and we needed the statistics, we needed the precision, and we needed the numbers, we needed the sample sizes to be able to prove, statistically, which is -- are they as good or better than the standard treatment.

A PARTICIPANT: Well, I think the one thing we have to remember about the numbers also is that policymakers like to see numbers. So the thing is -- an interesting thing is going on with qualitative evidence now. There is a lot of effort to quantify it with key word analysis, software and all of that. And I think that that's good actually. I mean, now we have statistics that measure surprisingness of findings and all sorts of other things.

And I think that's sort of -- that sort of thing is actually good because our problem is we have to convince policymakers that the results that we're coming up with are rigorous and that they have some value. And if the only way to do it is to come up with a quantification, then, yeah, I think it's good to let them -- to see how far you can quantify your qualitative evidence.

DR. SILVER: Let me -- can I throw a question out to the group here since we're not -- we passed around the questions, but one of the -- to move away a little bit from the yes RCT, no RCT question, is in your evaluation practice, how have you chosen the appropriate methodology for your questions?

Is -- can anybody -- is there anybody who would like to kind of speak to -- what were the thought processes, what were the criteria for actually choosing what was the appropriate methodology?

A PARTICIPANT: I'll make a statement. So I had worked at USAID for 14 years when I left and became an independent consultant. And I got hired to work on an evaluation and showed up kind of the first day and they said, "Oh, good, Cindy, you can help us with the methods." And I said, "No, you don't understand. That whole discussion is over."

By the time you had funded this time, you chose who was going to be on it, they were limited by resources by the amount of time they could spend in the few countries that they could spend it in. And you kind of got down and it was something that was already going and it was down to, there really aren't any very interesting methodological questions that are left for you now.

That needed to have been addressed perhaps in the design of the program and maybe even in the scope of work of this before you got the team to show up on the first day. So I just shared the outside challenges.

DR. SILVER: I think it's more common than we expected.

A PARTICIPANT: First my background is evaluator and (inaudible) slash analyst. Years ago we were doing a review of world economists that (inaudible) on refugee conditions, (inaudible) conditions. And we had to decide, you know, which countries (inaudible). We got the specific conditions (inaudible) had been so broad that we decided to take a look at the issue on economy, on food delivery, and several other areas.

We bit into the big job and we had to first decide what was the most important question. And then in terms of collecting data, we were trying to (inaudible) qualitative data in terms of our observations, what we recorded, and also quantitative in terms of delivery of services (inaudible).

It was a daunting challenge because we spent a lot of time planning more so than we normally would because it was a challenging issue. I'm glad that we did (inaudible). The arguments about what we were going to do about (inaudible).

DR. SILVER: But so you ended up though with basically a mixed method approach --


DR. SILVER: -- to be able to answer the questions that you wanted to answer.

A PARTICIPANT: You know, I agree that the hardest part -- I think everybody in the room would agree -- is really identifying the key questions and indicators because methodology flows from that. But I'm thinking of an example that so often, in the case of health, it's a little bit easier, it's a little more cut and dry, things like democracy building or conflict prevention.

You know, sometimes -- the analogy we use sometimes in-house is that, you know, this specific intervention might be like a drop in the ocean and then all of a sudden you're asked to measure the ocean. You know, it's kind of crazy.

I'm thinking years ago we did something. It was this Community Action Investment Program of Central Asia. It was an experimental design. And to make a long story short, it's like, okay, yes, these interventions were sort of helping people raise -- sort of reduce potential somewhat for conflict, interethnic conflict. But the causes for that interethnic tension were so much beyond the actual program.

So the experimental design, you know, it didn't really -- you were kind of left very unsatisfied with -- you know, was the program a good thing? Yes. You couldn't say it was good. Was it really -- is it going to solve the problem? No. You know, so you -- there's no real sort of aha moment in these things.

DR. SILVER: Well, it was kind of taking a very sophisticated telescope to look at the problem and it's missing the bigger picture around it.

A PARTICIPANT: And it was rigorous. I mean, this was like longitudinal, it was quantitative and qualitative, you know, over three years, five data collection periods, four countries.

You know, it was really rigorous, but I have to say at the end of the day, you know, conflict prevention in these areas, you know, it's very deep. So can this, you know, and we were completely free to shape it because the funder wanted, you know, to get at certain things.


A PARTICIPANT: Just a sort of a meta observation. It seems that in the academic community and among many of the organizations that have been most active in the field of impact evaluation and other kinds of evaluation throughout the years, there has been this sort of transition from we're not doing a whole lot to randomized control as the gold standard to now the kind of -- I guess I could sort of identify it as a pseudo consensus emerging among the folks who have spoken in the room.

And I wonder aloud whether that is a sort of necessary, almost teleology among organizations that they have to go through this stage of very, very explicit distinction between this is rigorous, this is not rigorous before they can then take a step back from that and, you know, then staff now that they've been trained and sort of identifying what necessarily is an RCT and will be accepted by the community, what other kinds of methods might also be appropriate.

DR. SILVER: So we should thank those institutions that have been pushing that agenda because it's -- no. I'm actually only half joking because it has forced a lot of us, I think, to reexamine our own perspectives on what is knowledge and how does one go about it and what are the appropriate -- what is the real range of methodologies that are available to us out there depending on the question.

So I actually agree with you. I think it has -- and to some -- I don't know whether it's a necessary process, but it is, in fact, the process that I think has been happening.

A PARTICIPANT: I should point out when I say I wonder aloud, I actually do wonder. That's not a pointed comment.


DR. SILVER: No, it's very good. It's very good.

A PARTICIPANT: I agree with what you just said and I was going to say, I don't know how many of you were aware, but there was a bill that was introduced -- it didn't go anywhere so (making a noise) -- but that every single program had to have evaluations that had to have six elements. And what worries me about that is that this is -- it's -- everybody in this room knows how complex evaluation is. We all know. It's not just in development. Okay?

But for people who don't really understand research methodology to think, oh, if there are just six elements and then we can be certain it's good evidence, oh, that's great. Why don't we just make sure that all evaluations do that and then we don't have to worry about evidence anymore. It simplifies it so much that it's scary.

DR. SILVER: Well, one of the things that came out of the Department of Ed efforts was the What Works Clearinghouse. And basically by their standard, what works was only RCTs so that it gave priority to that methodology above and beyond any other.

So you never even got to see what the other knowledge was that was being developed, which I find a pretty egregious use of the gold standard because it meant that the knowledge built from those evaluations did not get viewed at all. It wasn't even that it was devalued. It just wasn't seen at all because the only thing that went into the What Works Clearinghouse was that -- was on that basis.

A PARTICIPANT: I just, on the point that you had made about -- because I think that this is one of the big challenges that we have. Sometimes these very lofty goals and then we have these little programs, and then people want to see how that achieved that goal or it didn't.

But the one -- and this, you know, gets back to I think that if people do incorporate evaluation more and more as a matter of course in the design of a program and so on, it may actually have a profound impact on the kinds of programs one does, because when one says -- especially in budget constrained environments, but we should -- you know, even if budgets aren't constrained, we should be using our money wisely.

When you talk about a program like that where you say yes, it was good, there is not really evidence it changed anything significantly. If it hadn't been done, we can't really say that it would have really had a bad -- you know, it didn't have any negative impacts, but we can't really say that the world would have changed very much at all if we hadn't done it.

And let's imagine, I don't know how much that project's total cost was, but I mean, maybe you can tell us. But let's say it was at least a million dollars, right, because most projects sooner or later, you know, when all this had --

A PARTICIPANT: The evaluations or --

A PARTICIPANT: No-no, the entire project.

A PARTICIPANT: It was huge. So (inaudible).

A PARTICIPANT: So twenty million dollars or a hundred million?

A PARTICIPANT: There were many different parts of it and (inaudible).

A PARTICIPANT: Right. But let's say just a rough range. A rough -- I mean, like 50 million? I mean, can you --

A PARTICIPANT: 50 million.

A PARTICIPANT: Okay. Let's say it was --

A PARTICIPANT: But it was a big program.

A PARTICIPANT: Okay. So let's say it was $50 million and suddenly now let's say that there are 10 programs out there like that that are each $50 million and are having negligible impact. Now you're saying you're spending, you know, half a billion dollars and you don't really -- you can't really say that you're having much of an impact.

And that's where I think this becomes important. It's like if you're looking at one little program, you say, well, you know, we spent $5 million. Well, you know, it was a nice program, but we can't really -- you know, it didn't really get us much of anything that we really care about.

Once you start to multiply that out across all the government spending, you start to, I think, want to know a little bit more. And whether that means that we need to think more carefully about the kinds of interventions or what actually will have an impact or if we're not sure that it will have an impact, maybe we shouldn't do it and maybe we should go someplace else where we feel it should have some more of an impact. Or, you know, do this in a way that would have an impact.

DR. SILVER: Well, I think that speaks directly to the use of evaluation as well. If it was an ideal world, we would hope that all of our evaluations feed into those decisions. And I think the more that they do, the more incumbent -- the more it pushes the evaluation field to be rigorous in whatever form that takes, to really provide that information.

I'm afraid our time is up. I would like to really thank the group. I'm sorry we didn't end up breaking into smaller groups because I hope that everybody who had something to say got a chance to say it.

Our e-mails are on the bottom of the questions. So if you have other questions or you want to talk more about it with the evaluation geeks, please feel free to e-mail us. And thank you very much.