Workshop on Trade Capacity Building Since the Launch of Doha Negotiations: A Systematic Model for Monitoring and Evaluation
Before we start, I want to thank Stephanie for pulling this conference together. I guess this is the second? Third?
MS. PATHAK: Third, yes. And inviting us, and giving us the opportunity to showcase this. And this is, I think, the first time this evaluation is even being presented to any kind of audience, other than a very select group of people, and it’s still ongoing.
The reason we picked this one to showcase and talk about is the relevance that it has for what is kind of the theme of the day, the whole of government and all donor-coordinated efforts to look at a big group of activities funded by different people. Although the evaluation started out that way and then focused on the AID piece, AID was the major player in trade capacity building. So, I hope you will have a lot of lessons learned from this, and we do too, as we go into phase three of this evaluation.
Before I introduce the panel, let me -- I don’t know how many of you are familiar with AID, and the evaluation, et cetera, so I thought I will just run a very, very brief introduction.
We, as a development agency, always were focused on evaluations since our inception in the 1960s, with a very high focus, in the initial days, on using state-of-the-art and trying to evaluate the effectiveness. In fact, we did a lot of ex post work in those days, the impact evaluations, and we were one of the first ones to do that, and to use it effectively for programming decisions.
But in the 1980s, in the 1990s, things started to change, re-engineering, and so on and so forth, and things needed to be decision-driven, so evaluations sort of started to take the back burner. Monitoring was more the word of the day, and that’s kind of how it stayed over the -- since then, with some blips in the middle with evaluation raising its head.
At one point, evaluations were mandatory for projects. And that, again, created a problem because then they became very routine, and not meaningful. And now, we -- through the last three administrators, we have gone to elevating it, dropping it, elevating it again, and now we do have a central office.
Ruth Levine is the head of the central evaluation office, and I think she is here at the conference. She is going to talk tomorrow. And she is from the Center for Global Development, and is charged with revitalizing evaluations to look at effectiveness of social programs as sort of an area that those of you in development know is up and coming, in terms of these impact evaluations.
Now, the -- our bureau covers a wide range of sectors, the economic growth, agriculture, and trade. And so, as the interest in evaluations started to come up, one of the things we started doing was getting a sense for whether our staff that often helped design a lot of mission programs even understand these things in a development focus.
And we found -- in fact, we had MSI design a couple of mini-courses, and found that even things like causal logic, program logic, doing the causal chain, doing the monitoring at that level, thinking it through was a weak link because AID (phonetic) had moved away from these things when we went into the re-engineering.
And so, what EGAT Bureau has done has put a heavy focus on training its staff, especially the new foreign service officers, as well as the existing staff, on this kind of causal logic monitoring evaluation, how it all fits together, why it’s important to look at these things during the design phase.
And this evaluation here is renewed -- is an example of one of the renewed focuses of our bureau. We have several big evaluations we have done over the past two years. And the trade capacity one is -- looks, as Brinton and Molly will explain further -- looks at the results and impact of our agency trade capacity programs, and it has a lot of relevance in the international community. Trade capacity building, as a whole, is being taken up by WTO, and as the WTO OECD (phonetic) initiative on how to do these things. So this turned out to be very timely, in a way, informing that process.
Now, let me just hand it over to Brinton. Stephanie introduced Brinton and Molly, and Brinton has been with AID, and he oversaw the design and execution of this evaluation. And Molly Hageboeck led the study over the past couple of years. And with that, I will stop and hand it over to you.
[ View slide presentation - Bohling ]
[ View slide presentation - Hageboeck ]
MR. BOHLING: Thanks, Bhavani. As Bhavani mentioned and Stephanie mentioned, I work for USAID’s Washington technical bureau, which is the economic growth, agriculture, and trade bureau. And, as sort of a centralized pillar, we try to collect kind of best practices, and try to advise the field on best -- the best way to implement programs, or try to be a resource for them.
And we are -- the trade team sort of initiated this evaluation. And we have been referring to it sort of as a systematic evaluation. Another adjective we have used has been a cross-country, or a cluster evaluation. And in the presentation I will kind of describe kind of what we mean by that. And it’s just sort of a simple division of labor.
I will go through sort of the goals and the scope of the evaluation, and some of the approaches and some early findings, and I will let my colleague, Molly, from MSI discuss more in depth the six evaluation questions that we posed when we were -- to start -- to launch the evaluation, and some of the methodologies they used to answer those questions, and to develop findings, conclusions, and recommendations.
I should mention we’re not done quite yet. We’re sort of in phase two, and there is a -- well, actually, it’s just -- technically in phase three of three phases. And that should be completed in about September. But we have done enough work so I think I can present some early findings, and talk a little bit about sort of the challenge, and how we address the different sort of challenges.
I should begin by mentioning that the -- this evaluation has a number of different audiences and interests. Congress -- well, actually, I should start with the WTO. Really, the World Trade Organization and its members, and particularly the developing countries within the organization, have been interested and somewhat concerned that they would be able to take advantage of the opportunities in the world trading system. And there has been an interest in whether or not donor assistance has been effective in helping these countries participate more in trade. And so, there is sort of -- there is that interest within the context of the round.
I should also mention that our U.S. Congress has been interested in the effectiveness of U.S. Government assistance, sort of generally interested in assistance effectiveness, but also specifically regarding trade capacity building, partly because of the round, of the -- of this coming up as an issue within trade negotiations.
So, they commissioned a report by the GAO, the General Accounting Office, in -- that was completed in early 2005. And basically, it looked at a bunch of trade capacity-building programs. And they kind of concluded that trade capacity building was extensive, but its effectiveness had not yet been evaluated.
And they came up with two recommendations. They said that AID should work on standard indicators and share them with other agencies, and that what was sort of needed was a strategy that looked systematically in sort of a cost-effective way at monitoring and evaluation of trade capacity building.
So, that was sort of our backdrop for sort of the issues that we were interested in when we launched this evaluation. But we had six evaluation questions, and I’m going to group them into three general themes which is, you know, what sort of impact are USAID programs having on trade capacity, you know, how do AID programs contribute, and have these interventions succeeded.
And then, a second issue is, so how can we be more successful? What factors or approaches were more successful in yielding results? Under what circumstances? What synergies are there between the different ways of delivering assistance and tactics? And how can we better monitor trade -- as a third issue, sort of how can we do a better job of monitoring and evaluating trade capacity building?
So, at the time of the study -- it was 2005, USAID trade capacity building was about $2.3 billion. I actually have some data here that describes trade capacity building assistance. So, in 2005 is actually when the report was written, and it -- there is -- USAID by itself, the blue -- actually, we’re the blue bar chart. And our assistance is probably about $2.3 billion at that point. So there had been a lot of it, and other agencies were contributing.
And in 2005, the MCC (phonetic) began launching with its compacts, and they -- and actually, as a part of the compact process, MCC has a number of trade capacity-building programs, too. So you can see there has been sort of a growth in this area of providing assistance. And there was sort of a question, generally. You know, how well is this assistance being done?
I should add that this data comes from a survey that’s done annually. They basically asked 20-plus agencies in the federal government, “So, what do you do, in terms of trade capacity building?” And this next chart kind of shows what we mean when the GAO says, “Trade capacity building was extensive.”
We had -- this is probably only the top 15 categories that AID works in, in terms of trade capacity building. You could have put more, but the lettering would have gotten smaller and smaller. But the point is that there is lots of different types of things that we have been calling trade capacity building. It’s often referred to as sort of an umbrella grouping of types of assistance.
So, we’ve sort of asked the question -- basically, we’re being asked a question of, “You’ve been giving a lot of money, you’re doing a lot of different things. So how do you evaluate the whole thing?” And so, we regard this as a bit of a challenge, and the trade capacity building survey was actually both a blessing and a curse, in terms of helping us out in this process.
It was kind of a -- it was a blessing, because it gave us sort of a sampling frame. We wanted sort of a systematic evaluation of these programs, and we didn’t want to just pick a couple of programs that everybody knew very well and evaluate them and say, “Well, we did a good job.” We wanted to sort of approach this in a more systematic way. And having this survey that had been done for, you know, eight or nine years -- and now it’s been done a decade -- was a good sampling, it was a good place to start in terms of, you know, what are we talking about.
And so, it allows us to look across countries. And this is sort of unusual for evaluations, generally. Usually, AID evaluations you will see -- or we could find mostly when we did sort of a review of what was out there in the field, is you find very many just kind of country-level, project-level evaluations of how that particular project had been done. So, this was sort of an approach to, “Can we be more systematic and look very broadly across programs?”
And so, we had -- and to do this we sort of needed to develop sort of a grouping, or a clustering of programs that were like enough that we thought could be evaluated and compared against one another. And so we developed these results-based clusters. And we kind of threw out in some ways the 25 or odd groupings of trade capacity-building. They were just -- there was too many, they overlapped, they weren’t helpful to helping us with these evaluations.
So, we said, “We’re going to try to do this -- we’re going to try to simplify it, and find like projects, and put them together, and look at them in a consistent way, and look at the kind of results they’re having.”
And we also narrowed it. We basically -- we had been looking at the whole U.S. Government, and then we decided there was just no way that could be done. Because, basically, at the time, we were kind of doing the sampling. There were over 4,000 activities that were in that database, and so the task was daunting. So we said, “We’ve got to focus this if we want to learn anything.”
So, the focus was we basically said, “Look, what are the programs that were most directly designed to impact the performance of a country’s trade? And we’ll focus on them. And we will focus on AID-funded and implemented, since we are funding this. And, on top of that” -- but that actually was broader than it sounds, because it’s funded or implemented, and that meant we did look at USDA programs, we looked at Commerce programs. We looked at a number of different agencies that -- besides the USAID. So it was not the whole of government, but it was certainly more than just AID.
So, this is sort of our attempt to sort of group and focus on to sort of a program logic. This happens quite a bit in the field, or -- there is best practices to design a results framework for the project you’re working on, on a country level. And we kind of did this on sort of a -- on a whole concept level, or the whole grouping. And we kind of said, “If you can fit inside this chart, or you” -- or -- which is -- the colored boxes are sort of what we’re looking at. Anything that’s sort of the white area we’re going to say, “That’s interesting and helpful, but we’re going to set this aside for the moment.”
So, at the very top, all economic growth programs, basically. Our goal is, you know, broad-based economic growth in the target countries. And then we kind of divided programs into -- or results we were aiming at, programs that aimed at, you know, performance more on a domestic basis, and those who were really focused on trade performance of their counterpart countries.
And there are really three groupings of types of programs that we sort of developed and wanted to focus on. And I might add that there were quite a number of programs within the trade capacity-building database that were a little bit broader, that were kind of economy-wide, that clearly -- and that’s part of the reason why the line kind of goes up from this white area on the far white on economic business policy and climate, in terms of a goal, to improving this economic business climate in the country that was sort of broad-based, or that covered both traded and non-traded goods, because it does -- a lot of those programs do have an impact on trade performance.
But we just -- it was just too many to look at, and we wanted to focus on those that were very specific to trade, governance, and trade capacity building.
So, the three we came up with were basically interventions that are aimed at results that improved private sector practices, then another group that focused on public sector practices, and a third group that was actually kind of a mix of the two, but we thought was sort of helpful, because it focused on something that was sort of tangible, as a result, and it wasn’t as direct as many of the other programs that worked directly with firms.
Oftentimes the firm-level stuff was -- assistance was aimed at, you know, specifically getting a firm that’s a trading firm to get into a new market, or to improve its production, or to improve its ability to meet international standards. That was sort of what happened on the far left of private sector practice.
On the far right it was focusing on the system itself, to try to improve the efficiency of the goods kind of moving through the system. So that could include government practices, such as customs, but also could include logistics practices by, you know, freight forwarders or, you know, private sector practices. It was a combination of looking at sort of how you can make the whole system more effective, in terms of the time, cost, and reliability. So, those are our three areas, and this is -- oh, yes.
QUESTION: May I just ask a clarifying question?
MR. BOHLING: Sure.
QUESTION: In your -- in the packet here, you’re talking about a meta-evaluation. And I’m a little bit off track. Are you talking about selection of evaluations of each of these that you’re putting in a meta-evaluation, or are you talking about selection of programs for secondary data analysis, or selection of programs for some other reason?
MR. BOHLING: Well, it’s a good question. We did do sort of a meta-analysis. And we actually did sort of a literature search sort of in the phase one. So, there was a meta-analysis conducted. But this result framework has to do with actual projects. It’s not a meta-analysis.
QUESTION: So, when you use meta-evaluation in here, you really mean meta-analysis?
PARTICIPANT: In the write-up.
QUESTION: Oh, in the write-up.
MR. BOHLING: I didn’t see the write-up.
QUESTION: So you didn’t do a meta -- okay, okay.
MS. HAGEBOECK: We did –
MR. BOHLING: Did, but that’s not what we’re talking about in this slide.
MS. HAGEBOECK: We will come back to that. Can we come back to that in a little bit?
MR. BOHLING: Right.
QUESTION: So this is the selection of programs for a primary analysis.
MR. BOHLING: Yes.
MS. HAGEBOECK: Yes.
QUESTION: An original evaluation.
MR. BOHLING: Yes.
MS. HAGEBOECK: Yes.
MS. HAGEBOECK: For this.
QUESTION: That –
MR. BOHLING: Right, from this part. Yes. So, yes. So, the programs we’re looking at here are actual delivery of assistance, it’s not an evaluation of an evaluation, or analysis of something else. We did do that, but that’s another thing.
MR. BOHLING: Not to confuse it too much.
MR. BOHLING: All right. So, what did this approach come up with?
All right, let me talk about factors of success, first of all, and a little bit of sort of the approach we used. Actually, the way I kind of constructed this is the really tough questions go to Molly, and then I can sort of describe, sort of in broad terms, what we did.
We used sort of a top-down and a bottom-up approach. And on sort of the bottom-up approach, we had narrowed down our focus from the 4,000 programs that were in there. We had narrowed it down to AID-funded or implemented programs that were focused on trade performance during this kind of sampling -- those sample years.
And that came up with about 800 projects, so that still seemed like a lot. But we went out to the field to try to collect as much as we could. And we came back with about 230 projects over the span of time since the Doha Round, and took those projects and sort of coded them.
And when I say “we,” I really mean MSI went through and coded all of these projects into different categories, like what factors were involved with this particular project, what different -- what modalities did they use? Did they just use equipment? Did they just use technical assistance? Did they use equipment and technical assistance? Did they use -- did they build something? I mean, what are the different ways you could deliver this sort of assistance and get a result?
And actually, project documentation was included as well, where there was some coding of what things -- the implementers say was a problem, or what things did they say was a challenge, or what things did they say helped? So they kind of coded a lot of information to a single database for these 230 projects.
And it’s allowed them to come up with -- and I should add on to that they also coded results. They also found different things that had happened that, you know -- they had built -- they kind of used the same more detailed results framework to kind of code these different possible results. And, you know, whether or not they’re meeting these results, or whether they exceeded the results, whether or not they hadn’t met the results yet.
And so, there is sort of a coding for each of these projects. And from that we were able to kind of determine a few different synergies, because -- from this very bottom-up approach. And I will mention a couple of them.
We did find that working with women was actually more effective when we had projects that would focus on that. We found there was a synergy between providing technical assistance and providing other forms of assistance. Like, technical assistance plus training was better than just training. We found that delivering equipment would be better if you delivered equipment with training. Some of this seemed sort of natural.
But one was -- and another interesting finding was that technical assistance by itself did really well. I mean, it had a very high result. If we just brought TA, on average the results were pretty good.
One thing we also found there was not a synergy between choosing different economic sectors. If you chose to work in services, or you chose to work in manufacturing and services and some agriculture, that would not produce better results by choosing a bunch of different sectors to work in. So, that was sort of an interesting finding, as well, sort of what did and what didn’t work.
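[Editorial illustration, not part of the spoken remarks.] The bottom-up comparison described above (code each project by the modalities it used and by a result rating, then compare average results across modality mixes) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout, the 0-to-1 `result_score` scale, the `synergy` definition, and the toy data are all hypothetical, and none of it reproduces MSI’s actual coding scheme or findings.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CodedProject:
    """One coded project record (hypothetical layout)."""
    name: str
    modalities: frozenset  # e.g. frozenset({"technical_assistance", "training"})
    result_score: float    # hypothetical rating: 0 = results not met, 1 = exceeded


def mean_score_by_modality_mix(projects):
    """Average result score for each exact combination of modalities."""
    groups = defaultdict(list)
    for project in projects:
        groups[project.modalities].append(project.result_score)
    return {mix: mean(scores) for mix, scores in groups.items()}


def synergy(projects, a, b):
    """Mean score with both modalities minus the better single-modality mean.

    A positive value suggests the combination outperforms either modality
    alone. Returns None when any of the three groups has no projects.
    """
    means = mean_score_by_modality_mix(projects)
    both = means.get(frozenset({a, b}))
    only_a = means.get(frozenset({a}))
    only_b = means.get(frozenset({b}))
    if None in (both, only_a, only_b):
        return None
    return both - max(only_a, only_b)


# Toy data, invented purely for illustration.
toy_projects = [
    CodedProject("P1", frozenset({"training"}), 0.4),
    CodedProject("P2", frozenset({"training"}), 0.6),
    CodedProject("P3", frozenset({"technical_assistance"}), 0.7),
    CodedProject("P4", frozenset({"technical_assistance", "training"}), 0.9),
    CodedProject("P5", frozenset({"technical_assistance", "training"}), 0.8),
]

if __name__ == "__main__":
    print(synergy(toy_projects, "technical_assistance", "training"))
```

A raw difference in means like this says nothing about statistical significance; with roughly 230 coded projects, a real analysis would also need a significance test and controls for country and sector.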
In terms of factors of success, a lot of projects, particularly on the private sector side, chose sort of to focus around a particular cluster of sectors, or a value chain. We found that those projects that identified using this particular tactic were more effective than those that didn’t. Sort of an example of factors of success.
Some things that did not contribute to success -- and this probably would not shock no one, but -- would shock no one in the room, but we found it statistically significant, was that if your project started up late, you ended up with results that got delayed. And so that was not surprising, but you -- it came out very robustly from the data.
Also, you know, problems with AID management, or problems amongst implementers, or problems with partners also came out quite strongly as having a negative impact on the results you had. That probably wouldn’t shock anybody.
I would add on trade facilitation we found that there were not a lot of programs that actually focused on trade facilitation, but they had a relatively high rating compared to -- in terms of achieving results and making -- and recording improvements in time, cost, or reliability.
And let me mention one other thing, in terms of findings. That’s sort of the bottom-up approach. There is also sort of a top-down approach that was done, too, where we commissioned some research from the University of Pittsburgh that did sort of an academic look, looking at actual trade statistics between countries, and their ability to export and import, and the U.S. provided assistance. And they used something called -- in trade called gravity modeling, where you have sort of an expected flow of trade between economies of a certain size and a certain distance and population.
So, you control for all these different things, including fixed country effects, and you get sort of a result at the end, sort of what were the big determinants. Obviously, the size of an economy is a huge determinant. But amongst the statistically significant factors was AID programming for trade capacity building. So, that was an interesting and very helpful finding.
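[Editorial illustration, not part of the spoken remarks.] A gravity model of the kind described here is commonly estimated in log-linear form. The equation below is a generic sketch built only from the ingredients the speaker names (economy size, distance, population, country fixed effects, and trade capacity building assistance). The exact specification used in the University of Pittsburgh research is not given in this transcript, so every symbol and the functional form are assumptions.

```latex
\ln X_{ij,t} = \beta_0
  + \beta_1 \ln Y_{i,t} + \beta_2 \ln Y_{j,t}
  + \beta_3 \ln D_{ij}
  + \beta_4 \ln N_{i,t}
  + \beta_5 \ln\!\left(1 + \mathrm{TCB}_{i,t}\right)
  + \alpha_i + \varepsilon_{ij,t}
```

Here $X_{ij,t}$ is trade from country $i$ to partner $j$ in year $t$, $Y$ is economy size (GDP), $D_{ij}$ is the distance between the two, $N_{i,t}$ is population, $\mathrm{TCB}_{i,t}$ is trade capacity building assistance received by country $i$, $\alpha_i$ is a country fixed effect, and $\varepsilon_{ij,t}$ is the error term. In this form, the finding reported above corresponds to an estimate of $\beta_5$ that is positive and statistically different from zero.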
They also did this same sort of analysis on the Heritage Foundation’s index for trade policy and trade freedom. That’s sort of a compilation of barriers to trade, or lack of barriers to trade, and a lot of non-tariff measures. And there was regression analysis to see if there is any connection between USAID programs and improvements on that score, and they found that there was a connection there.
So, in those two areas, using sort of a top-down approach, they found some statistically significant conclusions. And those were two of the kind of highlights.
QUESTION: On the last slide you said you had public sector, private sector, and then you had trade facilitation. How do you define that, or is there an example of what you mean by trade facilitation, versus the other two?
MR. BOHLING: Yes. We defined it as a -- by the result, which is trying to improve the time, cost, or reliability of trade. So it was very -- it could have -- actually, they -- in other structures in OECD I think they have subsumed many of the same programs within public sector programs. So we kind of -- we split that out and focused on it a bit. But that was sort of our approach. And it was very results-driven, because you could tell a program that says, “We’re going to try to improve time, cost, and reliability.”
So, my last question, in conclusion, this is the question I get a lot when I work in government. My -- the kind of (inaudible) what I think is that we did actually find some -- so I think this evaluation will be helpful to both program managers and to policy makers. It kind of showed that AID and U.S. Government, more broadly of the programs we’re funding, is delivering on trade capacity building in the developing world. So I think it’s helpful, in that context, and really sort of systematically documenting a lot of the efforts that have been made over the last, like, eight years. So it’s helpful, sort of on that front.
From a program manager, from a technical bureau, it was also very, very effective, or very, very helpful to us, because we were able to gather together -- you know, 234 projects are all on the topic or area that we’re interested in. And we coded them, we looked at their results. And so, as kind of a centralized technical source, it really has provided quite a library and access to sort of bottom-up examples of what we’re doing in the field that we could find right or wrong. And that’s very helpful, working for an agency that’s very decentralized. So that’s the (inaudible).
I will take questions. But Molly might be able to answer them after hers. But –
MR. BOHLING: I think the answer is no, actually. The -- remember, we had 4,000 activities within the time frame. Actually, we had more than that. I think it was like 5,000 activities. And we said, “We’ve got to narrow this down. There is just no way we can look comprehensively at this.”
And so, the way we did it was we actually said AID-funded or implemented. And so, there was -- “implemented” meant other government agencies. So we did look at Ag, we did look at Department of Commerce.
One of the challenges, though, is that they’re not graded. We found out other folks, our other agencies, weren’t great at documenting their projects. And so, we were always -- had sort of the -- we had to rely on the DEC, the Development –
MS. HAGEBOECK: Experience.
MR. BOHLING: -- Experience Clearinghouse, which was unbelievably helpful. I mean, this evaluation relied upon a lot of tools that already existed within AID. I mean, a certain categorization of results was relied upon, also a lot of documentation that was available on this single website was also very, very helpful in really bringing together.
But the difficult part of that is most AID contractors are required to put this information in. And so, getting this from the other agencies was a little more challenging. So it was -- we did get a little bit of information, but not nearly as much as we would have preferred.
MR. BOHLING: We have even more complicated ones, if you like that one.
MR. BOHLING: Well, I should say we did get statistically valid results looking -- actually, to do this sort of analysis, the regression analysis, you needed a lot of data. In fact, we found that when we had more data, it was -- you could find much better, more significant results. When you had a really short data series, it was really difficult. I mean, it doesn’t matter anything about the program, itself. It’s just there wasn’t enough data to prove your case.
So, you need a long time series. So we were able to show results at that level. Showing it at the level above kind of relies a lot on sort of academic thinking. We didn’t actually prove that our program statistically changed the GDP of countries. That was -- although I would really like -- that would be a really great talking point to have.
But I think your question is the right one, which is, you know, how big a program do you need to have to say that an AID program or a U.S. Government program really changed the GDP? I think the only ones who really have a shot at that, I think, are really MCC projects. And they have nice -- they have five-year windows. I’m actually quite jealous of MCC’s ability to pull together monitoring and evaluation and have these long time frames.
So, we found, at that second level down, we found statistically valid results. But at -- we didn’t find it -- well, actually, I don’t think we really claimed any attribution at a higher level.
QUESTION: Did -- was there room in the evaluation, or will there be, to look at unintended consequences? Because it seems like the premise is that the -- you know, that improving the GDP or improving the trade movement is good. But was there any look at, you know, social consequences, or other repercussions on movement of people, or U.S. foreign policy, other U.S. foreign policy objectives in that?
MR. BOHLING: That sounds like a good one to defer to Molly on.
MR. BOHLING: I don’t think we explicitly -- I mean, we know that’s one part of evaluation, is to look at those causes. And we’re kind of going into third -- into the third phase, which is looking more at sustainability.
Because a lot of this was done at sort of a project level, and using, you know, statistical methods, this -- we’re going to have to kind of ground-truth this. And so they’re -- we did have interviews with people in the field already, but there is going to be a significant amount of engagement, sort of the after -- the follow-up on these programs. So I think it would be more the third phase that would get into that, although I’m not really sure how much we’ve focused on -- I will defer to Molly on that.
With that, actually, why don’t I just turn it over to Molly? Unless there is another burning question –
PARTICIPANT: Oh, there is one more.
QUESTION: Yes, just wondering. You mentioned the process of disseminating those findings back to the field. I wonder if you can go into just a little bit more detail about the methods by which (inaudible) communicates certain findings like this that could be useful (inaudible).
MR. BOHLING: I can tell you how we were planning on doing it.
We do have an economic growth council that we pull together every couple of years. And so that’s always available, disseminating information that way. It’s more informal. We did plan for some training to be developed, and we wanted to put into our regular training materials that we do on an annual basis. We have in-depth training on program management of trade capacity-building programs, and we’re trying to put a lot of that more online, because our challenge is that AID is growing very fast right now. We have kind of a younger generation called DLIs (phonetic) that are coming in.
And so, we want to kind of disseminate that information, so there is an intense training they’re going through. Our concern is that they get so much training up front, at first, before they actually go in the field. I don’t know how much they’re going to be able to absorb. It’s sort of a big question. But those materials are being developed, and we’re trying to put them online. And we do have regular exchanges with economic growth officers in the field.
MS. PATHAK: Really, also incorporating it, like Brinton mentioned, into our training courses, EG (phonetic) does annual training where people come in from the field, a week-long (inaudible) does some modules in that, sometimes (inaudible) sometimes part of that. So those are the places to incorporate it.
But we also really want to use the Web to do more of these things, to -- based on a (inaudible) small model -- I don’t know if you’ve seen this, knowledge management -- and we want to expand that further. Because, at the end of the day, you really want this information out to a wide range of people, wider than what you can reach with one Brinton going out to as many countries as he can.
MR. BOHLING: Because he gets tired.
MS. PATHAK: Yes, he gets tired. So, yes, we are in the process of doing that. And so it’s helpful that AID, as such, is recognizing it, and also recognizing it at higher levels, like with -- Ruth, we’ve talked to Ruth about setting up these information exchanges at sectoral levels, that -- whether ECAT does it or the agency at a higher level does it.
MR. BOHLING: Molly?
MS. PATHAK: One more.
MR. BOHLING: Oh.
QUESTION: I wonder to what extent did you (inaudible).
MR. BOHLING: Yes, that’s a good question. Almost going forward, this is something we’re sort of in the position now to think about or talk about. I think I would like to -- this is actually an input to a broader U.S. Government process that we’re working with the USTR to develop sort of a government-wide monitoring evaluation strategy.
And so, yes. My goal is -- and we had discussions about, you know, how to do a U.S. Government strategy a few years ago. But it was -- every agency was very, very different. They all generally did some sort of M&E. And so we had kind of a discussion about, theoretically, how one does it.
But this -- but doing something so tangible allows people -- and MCC actually has lessons learned, too. They do their own M&E in their own way. So I think both their contribution and their tools, and sort of the way we’ve done it in our -- the results frameworks we developed will be helpful to doing a better job as a government, as a whole. Because I think there is that necessity. I think that sort of uncovered -- the evaluation sort of uncovered that.
So, I guess the answer is in the future we will be doing that.
(Interruption to recording.)
QUESTION: Is your response to the GAO recommendation that a systematic method for monitoring and evaluation was needed -- is your response that you developed that monitoring and evaluation system, or you developed a way of using the existing data in a better way?
MR. BOHLING: That’s a good question. The -- what they actually asked for was sort of a government-wide strategy. And we did have a discussion with other agencies about a government-wide strategy, and we sort of compared notes with the (inaudible) requirements that every agency has, and we decided sort of, you know, we all had our different approaches.
And so, there wasn’t much we could do beyond that. I think this will help, in the sense that we could use this framework, because it’s a little broader, I think, than the way the MCC approaches it, because they’re very compact-driven. I think this will enable, actually, a better U.S. Government strategy.
And we have spoken with the USTR, and we hope this will be sort of the input -- sort of the way -- kind of the way we approached it would be helpful to developing sort of a more consistent strategy across agencies.
So, it -- I think you had to do it, actually, for it to become clear enough for people to say, “Oh, that makes a lot of sense. I see how you did it. Why don’t we -- we fit in this thing, too.” And so, that’s sort of my hope.
MS. HAGEBOECK: All right. Now, my role is nuts and bolts of how we did this evaluation. And we will overlap a little bit, but not very much.
Now, there were six questions. I hope you can read them. I was trying to get them all on the page. Six questions in the request for proposal. And they were in -- we have rearranged them -- they are not in the order we were given -- because we wanted to group them to see what we were being asked to look at.
And the first two dealt with impact. The first question, which was the original question one, so it was the top priority question, because they were arranged in priority order, was to what extent these programs had contributed in some measurable way to improvements in trade capacities in the target countries. That brought us up to the country level. The question was country level. Have you made any difference at the country level?
The second question that was impact really broke it down much more specifically, and said, “Well, is there any impact on firms? Well, how about associations? What about sectors? What about economies? What about government agencies?” Asking the question at a lot more micro level. So those were both impact questions.
Then there was a whole set of questions that really were saying because the way USAID and some other government agencies run their programs is they have the program managers set objectives, sometimes targets, and then they monitor and ask whether they met their objectives. That may not necessarily equate to national impact, but it is what they’re trying to do in that project.
So, the second set of questions were really about whether the projects that had been funded met their stated objectives or not. And if not, why not?
And that also included, in the last of that grouping, to what extent have interventions funded succeeded in accomplishing the program’s objectives? And, as you see, that’s not defined. As we go along, we will say how we defined it.
And then, the last question. The last question was actually number two in the original order. It’s number two priority, which is -- and if you will notice the way it’s worded, it’s not really a question about what happened, it’s about what should we do next about monitoring and evaluation so that it’s better than it’s been in the past, because the GAO had hit pretty hard in saying, “You haven’t really evaluated this great big program.” So, those were the questions, or the types of questions we found we were trying to answer.
There were also research criteria in the RFP, in terms of how they wanted this done. And Brinton already said, “Don’t go out and just look at a few projects and -- you know, two or three -- and say, ‘Oh, gee, it was all good,’ or it was all bad.”
It was to be retrospective, from 2002, when the Doha Round started. That’s a long period. Any evaluation that has to be retrospective in that sense, and go back and try to recreate history, is -- has got problems right off the bat that are a little bit like the traffic accident problem, you know, that everybody reports the traffic accident differently. Well, if it’s a traffic accident eight years ago, the difference in memory is really going to complicate the situation.
So, the retrospective is a real thing to pay attention to. A lot of other evaluations are shorter in term, or they’re one-project evaluations, sometimes much shorter in term.
It was to be clusters. Those weren’t defined at the beginning. It was suggested we might use all 38 of those topics in the database. But we ended up, as Brinton said, with this much more narrow, results-focused set of clusters.
We were to capture and represent different geographic variations to represent projects in terms -- that varied by size and what their scope was, narrow scope, wide scope. What was there? We were supposed to find and include successful and unsuccessful implementations, so we could learn from both.
And too, because this was going to be big and look at -- across countries to rely on quantitative approaches to the extent possible, that didn’t mean we left out the qualitative. It just pushed us to get to things which would really allow comparisons across countries and across types of programs. And we kept a lot of qualitative in, as well, because it enhances the -- when you’re writing quantitative answers, you enhance it with the qualitative and what you’ve got. But this is much more workable, to move to the quantitative when you’re doing large numbers of projects, or looking across a large range.
And it was to be explanatory, as well as just descriptive. In other words, why are things happening? We weren’t supposed to just describe, but actually get into why. And to -- and the factors affecting success, what works better than others, why does that happen, as -- to the best we could over this range.
Brinton has talked about the design at the macro level. And it really had a two-level design. And a design is like the structure, you know, the plan for the house, the big picture of how this was going to be done. Clusters -- and, as we said, we went to the results framework -- and then this scale of research. And this was really, really quite interesting.
As Brinton said, the -- when we pulled all the data up through 2006 -- which, at the beginning, that’s all that -- we pulled it as far as they had it in the database when we started this project, and there were these 4,500 different activities. Well, the first thing we had to do was seam them together into projects. It was like stringing popcorn across the years. And that brought it down to about 2,200. In other words, all those activities in the database are single-year reporting on what may be actually a five-year project.
And so, when we strung it together, it was about 2,200. When we looked at what portion was AID, it was 1,249. In other words, about half of the projects were USAID, which is why the narrowing to USAID to get one big chunk of this to look at made sense.
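The stringing step described here, joining single-year database entries into multi-year projects and then narrowing to one agency, can be sketched in Python. This is only an illustration: the field names (`agency`, `country`, `title`, `year`, `usd`), the matching rule, and the sample records are assumptions made up for the example, not the structure of the actual database or the method the team used.

```python
from collections import defaultdict


def string_into_projects(activities):
    """Group single-year activity records into multi-year projects.

    Assumed matching rule (hypothetical): records belong to the same
    project when agency, country, and normalized title all agree.
    """
    grouped = defaultdict(list)
    for record in activities:
        key = (
            record["agency"],
            record["country"],
            record["title"].strip().lower(),
        )
        grouped[key].append(record)

    # Summarize each project: the years it spans and its total funding.
    return {
        key: {
            "years": sorted(r["year"] for r in records),
            "total_usd": sum(r["usd"] for r in records),
        }
        for key, records in grouped.items()
    }


# Invented sample records, one row per reporting year.
sample = [
    {"agency": "USAID", "country": "Country A", "title": "Export Growth", "year": 2003, "usd": 100_000},
    {"agency": "USAID", "country": "Country A", "title": "export growth ", "year": 2004, "usd": 150_000},
    {"agency": "USDA", "country": "Country B", "title": "Standards Training", "year": 2005, "usd": 40_000},
]

projects = string_into_projects(sample)

# Narrow to a single agency, as the study did when it focused on USAID.
usaid_only = {key: info for key, info in projects.items() if key[0] == "USAID"}
```

In practice the hard part is the matching rule, since titles and reporting units drift from year to year, so a real pass would likely need manual review on top of any automatic key.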
And then, we narrowed again, as Brinton said, and we narrowed to directly trade-related. We lost about 400 projects in that, and they were things like small enterprise credit to all small enterprises across the country, or improve the banking system countrywide, or work on rural roads for the farmers countrywide. That has both domestic as well as trade implications, and we set those aside.
So, then we got down to the ones we went after. And those 876, what we captured, and we actually were able to get project documents on, this is not easy. Getting project documents is a lot harder than you think. We started with the USAID database, and we creamed through that, and we didn’t have very much.
We ended up going to every major implementing partner, firms, and voluntary agencies that work with USAID, and we begged, and we cajoled, and we pleaded, and we finally got a huge dump of documents. And we have those 256.
Now, the important thing there to note is while we were looking for 800 and we got 256, we got 70 percent of the dollar value. In other words, what we didn’t capture was the small, tiny projects, one training session, they did a study, small things that people didn’t keep as good records on, or couldn’t find.
We also went to USDA and they too had a lot of the same problems that we had with USAID. They had trouble finding the documents. Again, when you’re doing retrospective, unless an agency has a really strong document system, it’s not to be had.
So -- but having 70 percent of the dollar value, we know most of what, essentially, was going on from that USAID program. I think there is very little that really escaped us, substantively.
All right. Now, at the micro level, which is where we’re going to go now, is that we had to get answers to every single question. Each one of those six questions was quite different. And you take a macro approach, and then you need to go micro to get at it. And we’re going to show a table that we use called “Getting to Answers,” which is in the USAID evaluation training course, and we find it very helpful.
But the two challenges here, as I said, one is recovering the past. And that was a constant throughout this. The other is the counter-factual. What would have happened if these programs hadn’t been there? And we tried to look at that, too, as best we can, because that’s a key challenge.
Ruth Levine, who was over at CDG and now with USAID, was the head of a big study, where the -- called, “Why Don’t We Ever Learn Anything,” and it’s because we’re not looking at the counter-factual, in part. We’re not saying, “Did we cause this? Or would it have happened anyway?”
So, we really tried, as best we could. Though it’s difficult in a retrospective study, we did find some ways -- and I will point them out as we go -- to, in fact, try to look at the counter-factual.
Getting to answers is a basic tool that we’re finding more and more important in evaluations, because it allows us to do something systematic across all evaluations, take every question and tear it apart.
Just take any evaluation question one. The first thing we try to say is, “What kind of evidence is needed to answer this question? Are they asking for description? Explanation? Are they asking for comparisons?” And if we don’t know that, we don’t even begin to understand what methods to use.
The methods and the data sources come next. And if we need sampling, and then what kind of analysis we’re going to do. This imposes a discipline on evaluators, which some are not very good at doing, because it really says you -- each question gets a significant amount of attention. And if it needs different methodologies, it will get them.
And let me just go to that, and we will just go on how this all worked out.
The first question -- and, remember, this is the top priority question so it got the most attention, in some ways -- “What difference did we make, in terms of trade capacity at the national level,” was really the way we read this question.
The first thing we did was we went to get the evidence from these projects. We had, as we said, 256 projects. And we creamed those projects. We pulled everything out. We didn’t, at the beginning, say, “What’s more important than other,” but we -- any evidence they had, if they said they produced exports and they had some data and they could show that, we pulled that out. That was evidence. How that would go into the big picture, we had to see as we went along.
But in phase two, which is the one we just finished, what we did, in terms of this reconstructing history problem, was we chose to depend, for phase two, on the documentary records. The phase three we’re going to be following up with humans who were involved, and asking some more questions for clarification, or see if they are consistent with what we found from the documents.
But the contemporaneous reporting represented by project documents and final reports from contractors and PVOs (phonetic) is we judged that to be a better source of data.
Now, when you say that, you say, “Well, gee, those PVOs and contractors, they can write anything they want.” And maybe it’s true, and maybe it’s not.
Frankly, that doesn’t work in USAID, because they have these quarterly and annual reporting against their project objectives. And those, then, are summarized in the final reports, and the USAID people review it. So it would be very hard for those final reports, and those quarterly and annual reports, to be very far off from what the USAID project manager concurs is pretty accurate information.
Plus, USAID has a whole lot of things called data quality assessments. They’re operating under a rule that they can’t report out of the agency unless they have made sure that the data is good.
So that gave more credibility to looking, first, at these documentary sources, which were extensive, once you got the documents. I gained a respect for project documents that I didn’t have before, because I really found so much of this monitoring information, and the monitoring information was largely where we got evidence of results at different levels.
And, as I said, now –
PARTICIPANT: Molly, I know we got started a few minutes late, but there is another workshop in here at 11:00.
MS. HAGEBOECK: Okay.
MS. PATHAK: Okay, we still have –
MS. HAGEBOECK: All right. We did the regression analysis also. One of the things -- we only took that up to the level of trade performance. This was a type of regression analysis that we had seen done for the DG (phonetic) office, and they had done something similar, and we went and asked the same team, “Could that methodology be migrated into our study?”
So now it’s very interesting that this type of regression analysis that looks at the dollars put in, and the results out, we have -- it’s been done now for two segments of USAID’s work, and it’s turned out very informative in both cases. So it was methodology we had seen before.
We also looked at export performance with countries that got high results -- high exports and low exports, versus how much money that they had gotten from USAID, and we found some patterns in those economies that were very interesting, where trade agreements were present. You were seeing people doing better, whether or not they had a lot of money for programs.
Where there were micro level improvements in doing business, in just the basic mechanics of being in business in countries, those countries were doing better in exports, and particularly better if they had help from the foreign aid. So that was very, very good, too.
And then we did some trace back, to see whether these national results connected back to projects. And so we were coming at this -- they talk about triangulation when you’re trying to get an answer. And we came at the answer to this question from four different directions.
We used -- one of the things that we found was in -- what came out of project reports and what the regression analysis used by way of measures was -- were very similar and overlapping, and that was good, to have these two different things coming at it the same way. Exports stood out at trade performance, far exceeding what we saw in the way of imports. We -- and even -- and -- or FDI (phonetic) improvements.
Down at the private sector level we saw new products coming online, and that was important, as a result. And we saw quality of exports improving.
We saw -- at the public sector level, we saw some visible changes like lowering tariffs, and a lot of behind-the-scenes activities of changing procedures, changing the laws of the country, once you had signed an international agreement. Time to export decreased in the case -- in a lot of the projects we saw. Cost to export, also.
So, these were the kinds of measures that were captured both in the regression and from the projects, and we saw agreement that some of these things were happening.
The impact question. We also took it up there to the higher level that you asked about, but only from the projects, not from the regression. And the answer was we’re producing jobs and we’re producing changes in income, and they were providing evidence of that. And we didn’t quite expect to get that high, because we were -- because you’re climbing this ladder. We really only expected to see trade performance, but we actually saw jobs added and income changes being claimed by the projects, and the projects presenting evidence in that direction.
And we looked -- there was a lot of where we were looking at existing data series to see what they had said at the sector level, and stuff. We found some problematic things at sectors and economies, but they weren’t coming so much from our projects, but they were in the unintended effects area.
There is something that this whole business of trade liberalization -- the FAO is particularly concerned about import surges in countries, of a lot of imports going into countries in areas like used clothing over in Zambia, and reducing the viability of the local clothing industry.
That wasn’t coming from our projects; we noticed it on this scan. But it falls into those unintended effects, and we don’t quite know how to put that into the report, because it isn’t USAID-related, but it’s definitely in that bigger area, when you’re thinking about trade.
Each one of these segments had an individual results framework underneath it. So you see practices, as Brinton was talking about, at the top. Private sector practices improved. Now you go down, and underneath it you see some basic business practices. And there was quite a bit of activity there, just improving the financial management of firms.
Below that you see that knowledge is changing. And below that you see local service organizations that are providing the training or technical assistance in-country to do that. If any of you have ever worked in education, you know there is an old model that says, “Knowledge, attitude, practice,” that that’s the sequence of change. Well, that KAP model is right in here. As knowledge changes, then practices begin to change. And then we get the trade results.
And each one of these had that kind of model. And we looked at results at every level, pulling out what the project said. In some cases, you had a lot more activity in one area than in another. And those things tell us something about the way people are designing their projects.
And that is informative, I think, for the future, too, as we know a lot about how people design their projects, because we could �'�' we sort of racked up the scores at each level. How many projects were active in producing results at each level like that?
In terms of all of those questions that said, “Were they successful in achieving their results,” now, the last thing an evaluator wants to do is look at an old project and make a judgment all by themselves about results. So, what we did was we adopted a tool that USAID uses in its own annual performance reports, which is a three-point scale: did not meet the objectives; met – improved, but met the objectives; and met, exceeded. And that is scored by comparing the project intent to the results they present, particularly targets.
And so, we adopted this, and we just changed it to numeric, so that we could go ahead and say, “Now we’re able to use the numbers more easily, going forward.”
And when we did that, we were able to come up with an average score across all projects. How did the projects do? Well, across those -- I think we actually only scored like a 230 on this. But the average success score was 2.37.
Well, on a one to three scale, what’s happening with the USAID TCB (phonetic) projects? They’re doing quite well. In fact, nearly 50 percent of them met all their objectives. That’s high. This is -- these programs, relative to what the GAO said, you know, “We don’t know what you’re doing,” the effectiveness looks very good on this kind of a measure.
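The numeric version of the three-point scale can be illustrated with a short Python sketch. The rating labels below paraphrase the scale as described, the 1-to-3 mapping follows the "one to three scale" mentioned here, and the sample ratings are invented; none of this is the study's actual scoring sheet.

```python
from statistics import mean

# Assumed numeric coding of the three-point scale (labels are paraphrased).
SCALE = {"did not meet": 1, "met": 2, "exceeded": 3}


def average_score(ratings):
    """Average success score across projects on the 1-to-3 scale."""
    return mean(SCALE[rating] for rating in ratings)


def share_meeting_objectives(ratings):
    """Fraction of projects rated as having met or exceeded objectives."""
    met_or_better = sum(1 for rating in ratings if SCALE[rating] >= 2)
    return met_or_better / len(ratings)


# Invented ratings for four hypothetical projects.
ratings = ["met", "exceeded", "did not meet", "met"]

avg = average_score(ratings)               # (2 + 3 + 1 + 2) / 4
share = share_meeting_objectives(ratings)  # 3 of 4 projects
```

Once the ratings are numeric, the same two functions can be applied to any subset of projects, which is what makes the group comparisons later in the talk possible.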
But we were able to take this kind of a measure and kick it into the next question. We were able to look at different types of projects along this scale, and kick it into the next question, which said, “What about combinations?” Here, what we used was Venn diagrams to say, all right, all the agriculture, and all the manufacturing, and all the ones that shared manufacturing and agriculture in the middle.
And we did things like this. We were able to take -- Brinton mentioned the cluster projects, the ones that involved different type clusters of types of exports. And the ones that used a value chain approach, which was a vertical integration from supplier to the export.
And look at this. This is where -- this is kind of -- the kind of thing that would tease out synergy was there were 35 cluster projects and 20 over here, on the other side. We had the average score for each one, 2.7, 2.3. Now, these were tiny. It’s like scoring ice skating, you know? The differences are sort of small between them.
But what you see is in the middle. When they used both of those approaches, it jumped to 2.9. So these were some of the ways that we teased out synergy. We used that average project score to compare groups and to compare things. And there is a lot of that in the study.
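The Venn-style comparison described here, averaging the project score separately for projects that used one approach, the other, or both, can be sketched as follows. The tag names (`cluster`, `value_chain`), the record layout, and the sample scores are assumptions invented for the illustration; the figures quoted in the talk are not reproduced by this data.

```python
from statistics import mean


def venn_averages(projects, tag_a, tag_b):
    """Return (count, average score) for A-only, B-only, and A-and-B projects."""
    groups = {"only_a": [], "only_b": [], "both": []}
    for project in projects:
        has_a = tag_a in project["approaches"]
        has_b = tag_b in project["approaches"]
        if has_a and has_b:
            groups["both"].append(project["score"])
        elif has_a:
            groups["only_a"].append(project["score"])
        elif has_b:
            groups["only_b"].append(project["score"])
        # Projects using neither approach fall outside the diagram.

    return {
        name: (len(scores), mean(scores) if scores else None)
        for name, scores in groups.items()
    }


# Invented projects scored on the 1-to-3 scale.
sample_projects = [
    {"approaches": {"cluster"}, "score": 3},
    {"approaches": {"cluster"}, "score": 2},
    {"approaches": {"value_chain"}, "score": 2},
    {"approaches": {"cluster", "value_chain"}, "score": 3},
    {"approaches": {"cluster", "value_chain"}, "score": 3},
    {"approaches": set(), "score": 1},
]

result = venn_averages(sample_projects, "cluster", "value_chain")
```

With group sizes this small a difference in averages is only suggestive, which matches the caution in the talk that the gaps between groups were small.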
This -- what we did was, primarily, we compared what USAID did to projects. The primary comparison was to a 2003 strategy paper, which all of the people doing all of these projects had read. And in many ways, they were consistent with what that policy paper said, so that that’s largely -- the policy paper did have a role, and appears to --
MR. BOHLING: In Washington we were shocked and surprised by that.
MS. HAGEBOECK: What? There was more of a consistency than one would have expected from a policy paper. I think that’s fair to say.
This last piece about the -- and this is where I’m going to have to wrap it up -- is the last piece was about how do we go forward on monitoring and evaluation. At this point, in phase two -- and that should say “phase two,” not “phase three” -- we focused on current practices in two ways. We documented every indicator that every project used. That has given us a rank ordering of indicators by every level of the results framework about what people say is practical and usable.
Now, that’s going to give us some horsepower, going forward, if we try to say something about -- could we use more standard indicators? In fact, the standard indicators USAID already uses may, for this field, get influenced by this field voting about what’s practical. Because the ones who used it -- and we saw data and evidence against it -- that’s a real indicator that can be used.
For the project evaluations, there were only about -- we found 30 project evaluations matching our 256, and another 10 for other evaluations. Overall, we found maybe 14 other multi-project evaluations from the World Bank, and 1 from USAID, and maybe 14, 15 more UK and Japanese evaluations, which -- we looked at all of them, and tried to do some pulling out of the content, along with a lot of academic literature, to look at whether they were consistent with those results frameworks, which they largely were.
In terms of real meta-evaluation, which is judging the quality of the evaluations, we did that, as well, with some of these 30 evaluations that we were able to look at. We use a tool in the evaluation course for USAID. We use two tools. One is a checklist for scoring the -- a statement of work for an evaluation against USAID’s Automated Directives System, or their policy, ADS 203, and we have a checklist against 203. We scored the scopes of work against that, and we scored the evaluation reports against that.
And we are able to see, as -- with a group, with kind of lines showing us where there are weaknesses in these evaluation reports, and where there are weaknesses in the statements of work. That’s going to help us, in the next step, go forward in terms of how to improve that. And I think the voting that we saw, the indicators, is going to help us go forward on monitoring.
We also saw very interesting little things like, in the field, when they talk about new products, they talk about anything. They talk about two different colors of ballpoint pens as new products. When USTR talks about a new product, we’re talking about the system for categorizing products, so that jewelry is different than vegetables, if we do it numerically. That’s not being used in these projects, and it might help to get some of the standard international ways of coding products into these programs so that we had a better view of what’s really creating new products and not. So that may also help with some of the monitoring and evaluation.
And I’m sorry, we’re out of time.
MS. PATHAK: If you want -- have a couple of questions –
MS. HAGEBOECK: If you’ve got a couple of questions before people coming in, this was just a talk as fast as we could.
MS. PATHAK: I know. It’s very unfortunate.
QUESTION: A quick question. Is this report in your data on the DEC?
MS. HAGEBOECK: It’s not yet. We’re in -- going into a draft for circulation. It will be -- phase three, the draft, will be going out to people in the field, to the implementing partners, along with some communication with them to help (inaudible), in their view, what we got. We should be on the DEC by the first of October with a final.
MS. PATHAK: No, we’re just starting. This is actually the first time we have talked about this outside of the tiny group. But, yes.
MR. BOHLING: But we -- actually, it came up in a meeting we had with the DG LME (phonetic), and we -- so we owe him a copy.
MS. PATHAK: I think we should actually do a presentation. It’s much easier than reading a 100-page document.
MS. HAGEBOECK: We think a lot of these individual methods that we used for answering questions are going to have transferability that other projects can use. We’re going to have to -- and as we go forward, we’re going to kind of do probably a side handbook on M&E for these kinds of projects, and pull those methods and say, “You could do this, you could do this, you could do this.” Hopefully. Yes?
PARTICIPANT: I went to -- and we will, by the way, have all the proceedings, including the PowerPoint presentations, on our website. (Inaudible.)
PARTICIPANT: Good. I’ve got to make a typo correction this young lady is asking for.
PARTICIPANT: Right. So I want to thank our three presenters, thank you so much, Bhavani, Molly, and Brinton.