Workshop on Value of Program Evaluation: Case Studies

U.S. Department of State Third Annual Conference on Program Evaluation - Science, Technology, Trade Track
Washington, DC
June 8, 2010


MR. WILLIAMS: (In progress) -- sort of the theoretical background of closing the loop between evaluators and the implementers of these programs.

So, why is an energy guy here at a State Department conference? The energy programs are a necessary component of energy policy from a strategic perspective, from a -- national government strategic priorities. These programs fill a number of goals that are important, including mitigating climate change, energy security, reducing our dependence on foreign oil, also economic competitiveness.

With energy efficiency programs, lots of times these programs actually save consumers and businesses money, so they actually enhance the economic competitiveness of the U.S.

So, two of the types of programs that we look at in these case studies, the first one is more of a research and development program. It’s the Department of Energy hydrogen program. And this fulfills goals like climate change -- you know, you’re replacing oil with hydrogen-powered vehicles -- and also energy security, you’re reducing your dependence on oil, especially foreign oil.

And then, energy efficiency programs, these are programs often that will give rebates or financial incentives for people to install energy-efficient light bulbs or appliances or businesses that make improvements to their energy processes.

And so, these also fulfill a number of the same goals, climate change -- this is considered a low-hanging fruit solution to climate change, because these are things that are very easy to implement, and often they save money. So you enhance your economic competitiveness, and you save energy, which reduces your dependence on oil and other forms of energy.

So, you know, with these programs, and because of the way markets work, you need a lot of government intervention in order to achieve these goals. And so, you need evaluation there to help these programs achieve their intended outcomes.

So, we’ve seen an increased emphasis from the White House, especially on using program evaluation to achieve the results efficiently and effectively, and also to strengthen the design and operation of programs -- you know, a continuous process of designing, evaluating, implementing recommendations from evaluations, and then following up to see how those programs improved.

So, what evaluation represents is a way to maximize these programs’ opportunities to achieve their goals. If you have an unproductive or inefficient program, it’s not aligned with the original goals, you have sort of lost opportunities in that these programs aren’t achieving their intended outcomes from what they originally intended.

So, evaluations inform decisions that managers can use in order to improve the likelihood of successful goal achievement, things like enhancing fiscal responsibility, which helps improve the chances of achieving the intended results.

So, what types of decisions are informed by program evaluation in an energy and environmental context? These decisions happen in very important parts of the budget cycle, in program design and redesign.

And from our case studies, there are a number of different types of decisions that were informed: whether to continue a project as is, or to make adjustments to the program to increase the likelihood of success; which processes need reform or streamlining and how to do so; whether to terminate an entire program area or individual projects; how much money to allocate to different research areas or different projects, depending on how well they’re performing; whether to redirect research of new priority areas, or just research (inaudible). So, if there is a new strategic priority in a new research area, and this is a high performing area, whether you want to redirect funds and resources to that.

And then, finally, which sectors or segments of the population to target, so that you’re getting the most bang for your buck, in terms of where those dollars are going.

So, to give you a little background on the project that I worked on last summer, the goal with these five case studies is to raise the profile of program evaluation. We wanted to provide some concrete, easy examples to understand that show how evaluation can improve efficiency in a program, or help a program achieve other goals. And so, we wanted to get some concrete examples in an easy-to-read format. So we got these two-page case studies that we hope will be disseminated all around.

And in order to get -- to collect these case studies, we sent invitations to hundreds of evaluators all across federal and state levels, and we wanted to focus on energy and environmental programs. Obviously, this is a DOE-funded project, but we did look at other ones that were maybe outside this, but settled mostly on energy and environmental programs.

And so, the ones that we chose, we conducted in-depth interviews with the evaluators and program managers. We also reviewed evaluation reports and performance data. And we selected projects based on three criteria.

So, the three main criteria that we wanted to achieve with each of these case studies that they had to fulfill, the first is that an evaluation made specific observations or recommendations that a program manager could use. So this is pretty straightforward. Most -- a lot of evaluations have this.

The second is that the program actually took some action to implement the recommendations that were in the evaluation. This is slightly more difficult to find, but you do find evidence of this. But often it’s not documented in a proper way.

And then, third, we wanted to have a follow-up to see what kind of benefits became of those implemented recommendations. And so we wanted to find some evidence in some form of documentation, whether in a follow-up evaluation or some other performance measure, to see what were the benefits of these changes that the program made.

So, this was -- this is a very difficult thing to find. In many cases, you see the evaluation takes place, and then the report gets sent, and nothing happens, or the program managers do make changes but they don’t document it.

So, in the end, we selected just four cases, plus we included one that the DOE had already developed regarding its hydrogen program.

So, I will go into a little more detail about each of these case studies now. First I will give you a preview of -- a brief introduction of each of the case studies.

The first one, the hydrogen program, this program actually saved about $30 million by using a peer review process where they had independent experts look at each of the projects and individual programs, and they identified poor performing projects that were then discontinued. Some of them were discontinued, some of them were continued, but with adjustments made to them. And then, the ones that were discontinued, they redirected the funds into higher-performing projects.

The second one, a Wisconsin energy efficiency program increased CFL (phonetic) sales among women. Evaluation found that there is a significant gender gap in the sales that were occurring through this program, that more men were buying this than women, and so they thought there was an opportunity to reach more women through this program. So the program made some adjustments in order to increase CFL sales.

Third is a pesticide program with the U.S. EPA. This program was required to re-register pesticide products. And there was a significant backlog in this project, and so they underwent an evaluation and found ways to streamline the process and actually cut their pace -- their estimated pace by about four years.

Fourth, we have a U.S. EPA program, ENERGY STAR website, that saved about 90 percent of its budget just by simplifying, because an evaluation found that people weren’t using all its sophisticated features.

And then, finally, there is a program in Quebec, energy efficiency program, which streamlined the application process for adding new customers to this program, and actually added more customers more quickly after evaluators suggested ways to streamline the process.

So, first, the hydrogen program. You’re going to see in bold kind of the three criteria in each of these slides, and how they fit into those three criteria. So the first is that the hydrogen program conducts hundreds of peer reviews using independent experts each year. And so, over five years, this case study looked at 695 reviews of projects that were conducted.

And after these peer reviews were conducted, most projects were continued, and they are rated on a scale from one to five. And the ones that were continued rated higher, 3.0, compared to the ones that were discontinued, only 5 percent, they got a 2.7 rating. There were also qualitative comments that determined whether -- that informed whether to continue with this project or not continue.

But still, a large majority -- 67 percent of the low-performing projects -- were continued. But they had to make adjustments that were recommended by the peer reviews in order to increase the likelihood of success.

So, what became of these adjustments? Well, the ones that were rated a second time, they actually received a higher rating, most of them received a higher rating the second time. About 81 percent had a higher rating at their next review, the ones that were reviewed a second time. And, on average -- because the projects received a 2.6 rating in their first review, and then a 3.0 at their next review, and this was a statistically significant finding.

In addition, this program saved nearly $30 million by discontinuing projects that were considered poor performing, and weren’t aligned with the goals of the program. And this money was then redirected toward higher-performing projects.

And so, the investment in these peer reviews amounted to about $2 million over 5 years. So it was about a 15-fold return on investment, in terms of the money that was saved and redirected towards other programs.
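The return-on-investment arithmetic quoted above can be checked with a minimal sketch. It assumes the rounded figures from the talk (about $30 million saved, about $2 million spent on peer reviews over five years) are exact; the function and variable names are illustrative and do not come from the presentation or the case study.

```python
def return_multiple(savings: float, cost: float) -> float:
    """Express savings as a multiple of what the evaluation cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return savings / cost


# Rounded figures quoted in the talk, treated as exact for illustration.
savings_redirected = 30_000_000  # dollars freed by discontinuing projects
peer_review_cost = 2_000_000     # dollars spent on peer reviews over 5 years

multiple = return_multiple(savings_redirected, peer_review_cost)
print(f"Return on peer-review investment: about {multiple:.0f}-fold")
```

Because the talk says "nearly $30 million" and "about $2 million," the true multiple may differ somewhat from the rounded result.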

Second, the CFL program. So there was a 2003 evaluation that found there was a significant gender gap in the participants in this program. You know, 62 to 65 percent of participants were men, where the Wisconsin population was actually 49 percent men.

And what they found was that women were more likely to buy light bulbs in grocery stores, but the program didn’t actually have many partners that were in these grocery stores or general retail outlets. So -- and actually, most of the program’s partners were home improvement or hardware stores like Home Depot, Ace Hardware, that sort of thing, where men tend to shop.

So, after looking at these findings, they adjusted their tactics, they added new retail partners in grocery and drugstores. They also had more targeted messages to try to increase sales among women. They determined that this is a real growth market for them.

So, the result is that later evaluations found an increase in sales among women. We saw the slide before. It was about 65 to 35. That was close to about 60 to 40, or even lower. And even though this gender gap shrank, the overall sales increased. So it was a sales increase among women, and this was further evidenced in later evaluations, as well.

And even among their new retail partners, they set a goal for themselves at a five percent increase in sales among these retail partners, like mass merchants, which are like Target or ShopCo (phonetic) or Wal-Mart type stores. Lighting showroom stores, grocery stores, drug stores, they had an original goal of a five percent increase in sales from year to year. And actually, in 1 year, sales increased 68 percent. So this is an example of where regular evaluation can really determine what opportunities there are for growth, and what types of tactics people can use to achieve those opportunities.

The third case study is the U.S. EPA. So the EPA is required to look at all pesticides that were registered before 1984. This is to update information on health and safety. So, in addition to that, any product that has the active ingredients of a pesticide needs to be re-registered. And by 2005 they had a significant backlog of products that needed to be re-registered.

So, they underwent a process of evaluation to determine where they can streamline the process. And actually, the evaluation found that they were on pace to complete all of these projects in 2018. This was actually five years longer than what the EPA had budgeted for. So they needed to make significant cutbacks, or they would run over budget.

So, evaluators made 21 recommendations in order to streamline the process. And you can see those a little more in detail in your case studies. But managers implemented about 17 of them. And, in less than a year, they found that the number of decisions that were taken, the number of actions that were taken on these products almost doubled. And so the efficiency really jumped. And now they’re on pace to actually finish these -- this project in 2014. So it’s a four-year cut in what their pace was.

So, this allows staff to move to other projects, and so it saves valuable time and resources, from a staff perspective, but also gets these labels on these new products out there sooner. These labels have important health and safety information that are useful to the people who are applying these pesticides out in the field. So the eventual outcome is that the people who are using these products will have a better idea of how to apply these safely.

Our fourth case study is a home energy advisor. This is also a U.S. EPA program. And the program had a website where you basically input your information of your home square footage, what kind of appliances you have, that sort of thing, and it would come back with recommendations for products to buy. And it’s very detailed information that the program originally thought consumers can just go to a hardware store or a home improvement store, hand them this sheet of paper, and say, “I want to buy this.”

And they actually underwent an evaluation to see how people are using this website, and they found that people weren’t actually using it that way. They found that people saw this more as options to explore, rather than something they would go right out and buy because of these recommendations. So, the evaluation really opened the eyes to the program.

And they were spending $100,000 a year on this auditing tool, and there were similar websites -- one from the Department of Energy and a couple other non-profits -- so, looking at this evaluation and looking at the money they were spending on it, the program managers decided to eliminate the web tool, because it was a little bit redundant. And so they ended up redirecting people to this Department of Energy website.

But later they developed their own simplified tool, and this one had a lot less information that you had to input, and it gave out less information. But it was still the same -- had the same usefulness to people who were actually using this website. So -- and this web tool only cost $10,000 a year to maintain. So you see a 90 percent reduction in cost for getting the same results.

And finally, the empower programs. Hydro-Quebec is a utility that provides electricity both in Canada and the U.S. And they had these programs where they provide financial incentives to businesses and industrial customers to make energy-efficient improvements in their businesses.

And so, they decided to renew these programs, and they wanted a thorough evaluation in order to make sure that these projects would stay on track. And one of the things that evaluators found was that the application process was too difficult, too cumbersome for most customers. And so they recommended ways to streamline this process.

And actually, the program managers took this information and they created a task force to implement the recommendations. And one of the outcomes of that was that they cut the amount of application documents in half, among other ways that they improved the work flow.

And so, the results from this saw a significant drop in the amount of time the -- for application processing. You will see on the right-hand graph here the average time before the adjustments between receipt of an application and a payment of incentive was 152 days. Well, first they developed a fast track feature for standard projects, and so that was cut to only 67 days. And then, for other projects, the application time was 97 days. So almost a halving of the application time.
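As a rough check on the "almost a halving" description, the sketch below computes the percentage reduction in processing time from the three figures quoted above (152, 67, and 97 days). The names are illustrative assumptions, not terms used by the program or its evaluators.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Average days from application receipt to incentive payment, as quoted.
baseline_days = 152    # before the process changes
fast_track_days = 67   # standard projects on the fast track
other_days = 97        # all other projects

fast_track_cut = percent_reduction(baseline_days, fast_track_days)
other_cut = percent_reduction(baseline_days, other_days)

print(f"Fast-track projects: {fast_track_cut:.1f}% less time")
print(f"Other projects: {other_cut:.1f}% less time")
```

On these figures the fast-track path cuts processing time by a bit more than half and the other path by a bit more than a third, which is consistent with the speaker's summary.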

And, because of this, the program is on track to meet their energy goals for 2010, much sooner than they would have been, otherwise. And, actually, because of these changes that they made, they saw an increase in customer satisfaction among the businesses that they worked with jump from about six-and-a-half to a seven on a scale of one to ten, and that was a statistically significant improvement.

So you see, in this case evaluation helped them achieve -- they had a goal set of number of -- amount of energy that they wanted to save, and this really helped them achieve that goal much more quickly.

So, some common themes among all these evaluation case studies. First is that these evaluations provided information that was useful to the program managers and program implementers.

First is detecting and quantifying problems. The manager with the pesticide program said, “You can’t attack the problem until you’ve quantified it.” They knew they had a backlog, but they didn’t know how bad it was. And so, really, this program evaluation helped them with this -- determine, you know, what types of resources do we need to devote to this, and how long is it going to take.

Second, the information provided results and actionable decisions. So, these rating -- for the example -- for the hydrogen program, the ratings on the different projects, it helped prioritize budget decisions. Either you -- either discontinuing a project or continuing by making adjustments, or redirecting money to more productive elements.

And also, it provided information and recommendations on how to eliminate some of these unproductive or inefficient elements, and streamline the process.

And finally, they enabled goal achievement. There were several goals besides energy savings. There was more -- better customer satisfaction, these -- they reach more participants, or they reached a different sector or a different segment of the population. So the result is that they’re -- the services are better targeted to who they’re trying to reach.

And then finally, an additional benefit of evaluation is that they motivated the staff to improve. Again, an example with the pesticides, while they were doing the evaluation, the staff of this program actually decided to develop pilot projects on their own, and really motivated them to start looking at ways to improve the efficiency. And what the evaluator said is that just by having an evaluation, you demonstrate that the management is serious about getting this -- about correcting this problem.

Some of the keys to success that kind of go across these evaluation case studies is that communication occurs between evaluators and implementers throughout the entire process. It’s not just a one-way street, it’s not just a single report. There is conversations throughout the process, and even after the evaluation is completed, through things like following up, through subsequent evaluations, or developing performance measures, so that these programs can measure their success on their own.

And a big problem is that in a lot of evaluation reports that we got there wasn’t really a follow-up to it. But you know, if you want to demonstrate the success of this program, you really need to follow up, and there needs to be some sort of mechanism in place in order to continuously follow up, see how -- see what those actions accomplished.

And then, finally, these -- need to make evaluations useful to the people that are going to be using them. It sounds pretty straightforward, but the information that you provide is really important. It’s really important that it’s at a level that the implementers can understand and can use, and so it’s something that they can act upon.

And this sort of fits into what a colleague of mine, Ed Vine (phonetic), who is with Lawrence Berkeley (phonetic) National Laboratory calls “closing the loop.” It’s a continuous process of evaluating, implementing recommendations, and then following up, so that you continuously improve the program design, the planning process, program implementation process, because it’s -- you know, these programs are not just going to need a one-time fix. They’re going to need to be continuously improved. And so, by following up and doing -- you know, following this sort of three-step process, you get this continuous improvement.

Some concluding thoughts that -- just overall, evaluations are important for energy program success. In many cases, these programs need to prove that they are saving energy, or they’re meeting their goals. And so evaluations are an important part of that, helping them achieve their intended results efficiently and effectively.

Both evaluators and implementers have an incentive to show the success of decisions informed by evaluations. This is -- and it’s mostly just a matter of following up. Lots of times we saw that people were taking action based on recommendations, but they didn’t have any way of following up. If you can follow up, it looks better for your program, that you can demonstrate success.

And then, third, communication and program redesign are a continuous process, like I said with the last slide with that loop. It’s not a one-time lecture, it’s an ongoing conversation between evaluators and implementers.

And then, finally, by following a sort of three-step process of evaluating, implementing recommendations, and following up, you minimize the lost opportunities to achieve the program goals -- you know, in this case it’s energy savings and climate change goals. And so this process can really -- can help close the loop and help these programs continuously improve.

So, with that, I will take any questions that you have.

QUESTION: I have a question. Marcella (phonetic) (inaudible), Department of Veterans Affairs. In my setting, we consider program evaluation -- one of the values of program evaluation is our independence. And so, I’m thinking about your comments about following up, and that continuous conversation with -- between the evaluators and the program managers.

And how do you see our maintaining the unbiased or less biased stance of independent evaluators, versus a continuing conversation with program managers about how they’re managing a program?

MR. WILLIAMS: I think you can maybe create a structure where there are specific instances where, you know, before the final evaluation report is given, you have managers able to review the evaluation and make comments on it. And then also, once the evaluation is given, have some sort of structure for managers to respond to the evaluation.

QUESTION: Yes, I agree with that. But do they -- do you want them to be responding to the evaluators, or do you want them to be responding to, as we do in the VA, to the very top leadership in the department?

MR. WILLIAMS: I think, you know, it’s not the case that these managers are only going to listen to evaluators. Obviously, evaluation is an important component, but it’s not the only thing that’s going to inform their decisions.

PARTICIPANT: My name is Jeff Dowell (phonetic), Department of Energy. My office actually sponsored this work.

One of the ways we manage independence and evaluation is that we commission studies using professional evaluators -- professional, independent evaluators -- to do our evaluations. We don’t do impact evaluations, for example, in-house. We --

QUESTION: Contract it out?

PARTICIPANT: Contract it out, get professional evaluators and maintain some distance between the evaluator and the implementer. The evaluator comes in to work with their client, and they try to make sure they understand the programs and issues, and they set the questions and they design the research strategy, and they collect the data, they do the analysis. They -- all that is done independently of the program.

But we also have a standard operating procedure which requires all of our evaluation studies to be peer reviewed. So when we hire a contractor, that contractor’s work is subject to peer review by other evaluators who are not part of this team.

So, we have several layers of independence that we build in to the process. And it’s worked well for the past five years.

MR. WILLIAMS: I think in many cases with these case studies, you find that it was an independent contractor who conducted these evaluations.

QUESTION: Can I ask a follow-up question to your comment, sir? When you say “peer reviews,” are you talking about peers of the program manager, or peers of the independent evaluators?

PARTICIPANT: Peers of the evaluator.


PARTICIPANT: You’re actually evaluating the evaluators.


PARTICIPANT: You’re evaluating their work.

QUESTION: With sort of competitors of that evaluating --



PARTICIPANT: It’s a way of actually --


PARTICIPANT: Instead of using one brain to make sure you get a good product, you’re actually using five or six, because you’re using one to do the work, and the other four or five to check their work. So at the end of the day you come up with a really good product that is not likely to be criticized.

QUESTION: And I think “peer review,” that term was used earlier in your presentation.

MR. WILLIAMS: Mm-hmm.

QUESTION: Were you talking about the same type of --


QUESTION: Okay, thanks.


MR. WILLIAMS: In the hydrogen program.

PARTICIPANT: That peer review was a peer review of hydrogen projects. We have a $100 million to $200 million annual hydrogen program -- annual budget for hydrogen program that has about 600 projects a year from hydrogen production to storage, (inaudible). So they have about 80 percent of those projects reviewed every year.

In fact, as I speak, they have a major peer review that’s being conducted now, involving several hundred reviewers reviewing about 400 projects over a period of 4 days. I think it starts today and it ends on Thursday. And those were reviews of specific projects, very technical, scientific projects being reviewed by other scientists.

The review that I was referring to was having the work of an evaluator reviewed by other evaluators.

QUESTION: I got it, yes. So it varies, depending upon the nature of the program. Is that right? And the nature of the review?


QUESTION: Yes? Okay.

QUESTION: (Inaudible.)

MODERATOR: Can you use the microphone? I’m sorry. Press that button.

QUESTION: I am Bebe Asama (phonetic), and I work for WestStart (phonetic), which is a research company based in Rockville. And we do a lot of program evaluations. And, in fact, we are contracted by various federal agencies to conduct evaluations of different types of programs.

And I wanted to comment a little bit on the independence versus maybe getting some input from the program managers. I think that, you know, the way we -- the way at least I have been able to address that is to make sure that you get the input from the program managers when it comes to interpretation. Because sometimes you may have findings, you have facts, you have results. But you lack the perspective to really interpret, you know, the findings, the results that are in front of you.

And sometimes that might require additional information that you might not have, because you were focusing on specific aspects of the program evaluation. So, I just know that I found that very useful. And to make sure that we focus -- we get their input on objective information, how do you interpret this?

We are seeing that, over the years, we are having maybe a decline in particular outcome of particular measure. And yet, these are the inputs that were provided, and this is what some of the maybe previous evaluation reports were saying. But this is what the data are showing. This is what the graph is showing. How do you explain that?

And you will be very surprised that sometimes you don’t have the information, and they will provide that additional information which would help you better understand the results of the program evaluation.

So, I just wanted to share that, that it’s very, very important, because we -- you can spend three months in one setting evaluating a program, you still don’t have the picture of why certain things are happening a certain way. And that type of input is very useful for interpretation.

PARTICIPANT: Actually, if I may, what I meant -- and we do incorporate many of the things that both of you have suggested -- we start out a program evaluation, but by going to our OMB colleagues, by going to staffers on the Hill, and by going to the program and asking them what questions they would like answered.

So, in developing the research questions, which we then do -- although we are, ourselves, professional evaluators in our service, we do contract out the actual conduct of the study. So we have that step of independence.

But what I was talking about in terms of balancing -- oh, and then we have a study team that is composed of both the professional evaluators on our staff, as well as technical experts from the program area. So they are there throughout the conduct of the study, which is usually a year-and-a-half to two years, as advisors to the contractors to interpret what they’re learning when they’re in the field, or doing their -- developing their survey, or whatever.

But what I was talking about is after the final report is delivered to the program manager, the Members of Congress, OMB, and our leadership, how do we achieve -- how do we maintain our independence, if we were then to go to the program that has been evaluated to find out are they doing what the recommendations suggested. And it’s at that juncture that I was asking the question about balance.

So, after the study is complete, how involved should we remain with the program officials if, indeed, we want to remain the unbiased, independent group within the Department to conduct studies?

MR. WILLIAMS: And that can be, partially, I think maybe just how you structure the response. I think there is a group that runs energy efficiency programs in New York, has in their -- each of their programs that are evaluated, there is a specific rule that managers must respond within, I think, 30 days or 2 months or something, just saying how they intend to respond to those evaluations, to those recommendations.

But -- and there is also an element of just maybe having the program managers follow up themselves, creating performance measures that they can implement themselves, and you know, they can themselves -- can report those results.

PARTICIPANT: In a way, the evaluator probably drops out of the picture at that point, and --

PARTICIPANT: The contract evaluator --

PARTICIPANT: -- (inaudible) the contract evaluator. What’s important is to have a -- some type of response protocols for the officials, something that they require to respond to particular recommendations that document how they responded. And that documentation is not reported to the evaluator. It should be reported up the management chain.

PARTICIPANT: Well, indeed, that is how it happens in our department. They report back to the deputy secretary of the department, you know, and he is the one who tasks them with coming up with an implementation plan, or whatever.

But I was talking about the -- you know, the professional program evaluators in the Department. Are -- we see currently our role as ending when the report is delivered. And “delivered” means that the presentation has been made to the upper management and so on. And I understood, from what you were saying, that we should have some continuing dialogue with the program.

MR. WILLIAMS: I think it depends on the length of the program, too. You know, with the CFL program, this is going to last, you know, maybe a decade or so. So they have regularly scheduled impact evaluations and so forth where they can make these reports every two years or so, and so they can look back independently at what happened.

PARTICIPANT: Another thing you do is you might want (inaudible) you rotate the evaluators out, so you don’t use the same contractors, even though (inaudible) period of time, you rotate the -- you put out a new competition for a different -- a new evaluator, so you’re not working with the same evaluator for some extended period of time.

We find -- I mean there is a little bit of experience that if you work with an evaluator for too many years, their judgment gets tainted a little bit, because they get too close to the implementer. So you rotate the evaluators out.

PARTICIPANT: Each of our contracts is actually separately competed. So we don’t have an ongoing relationship with one.

MODERATOR: We’re actually out of time, unless someone has a burning question.

I would like to thank Scott Williams for his presentation and --


MODERATOR: Thank you all, as well.