Evaluation Policy


The Bill & Melinda Gates Foundation collaborates with partners to promote long-term solutions that help people live healthy, productive lives. Achieving our ambitious goals requires rigorous evaluation so we and our partners can continually improve how we carry out our work.

Evaluation is the systematic, objective assessment of an ongoing or completed intervention, project, policy, program, or partnership. Evaluation is best used to answer questions about what actions work best to achieve outcomes, how and why they are or are not achieved, what the unintended consequences have been, and what needs to be adjusted to improve execution. When done well, evaluation is a powerful tool to inform foundation and partner decision making about how to optimize scarce resources for maximum impact. It is distinct from other forms of measurement that focus only on observing whether change has occurred, not why or how that change occurred.

Our current practice in evaluation is characterized by variation, and in the absence of a policy, decision making is left to individual program teams and program officers. Because the foundation supports a diverse range of partners and projects, it is necessary to have a clear organizational understanding of how evaluation should vary to best inform decision making across each of these areas.

Purpose of the Policy

Our evaluation policy is intended to help foundation staff and our partners align their expectations in determining why, when, and how to use evaluation. More specifically, the policy encourages foundation teams to be more transparent, strategic, and systematic in deciding what and how to evaluate. Our aim is to integrate evaluation into the fabric of our work, achieve early alignment with our partners about what we are evaluating and why, and generate evidence that is useful to us and our partners as we move forward.

Organizational Context

Our evaluation policy is rooted in our business model, which involves working with partners to achieve the greatest impact. Early in the grant proposal process, we work with prospective partners to define and agree on measurable outcomes and indicators of progress and success. This enables our partners to learn as they carry out their work rather than be distracted by requirements to measure and report at every step along the way.

This approach reinforces the role of evaluation in testing innovation, making improvements, and understanding what works and why, so that we can learn quickly from failure and replicate success.

The policy is also rooted in the foundation’s core values: collaboration, rigor, innovation, and optimism. More specifically:

  • We consider evaluation to be a collaborative learning tool that provides us and our partners with feedback so we can learn, adjust, and decide how best to achieve outcomes.
  • We apply rigor to our decision making on when and how to use evaluation and the types of evaluation to implement.
  • We believe that evaluation is crucial to learning from innovative projects and pilot programs before they are implemented on a larger scale.
  • We are optimistic that evaluation can spread benefits across many disciplines as we share useful evidence with partners and others.

Our Strategies and Evaluation

The foundation organizes its resources by strategies, each in a specific area or sector. Each strategy has its own goals and priorities, partners and grantees, and allocation of foundation resources. Strategy teams execute their strategies by making investments (grants, contracts, and program-related investments), as well as through advocacy work.

Foundation teams measure the progress of their strategies and investigate what works best to achieve priority outcomes using many different types of evidence. A combination of evaluation findings, partner monitoring data, grantee reports, modeling, population-level statistics, and other secondary data offers a more cost-effective and accurate alternative to large summative evaluations. We use all of these sources, including evaluation where relevant, together with expert opinion and judgment, to decide how to refine foundation strategies on a regular basis.

Evaluation is particularly warranted in the following instances:

  • When evidence is needed to fill a knowledge gap or evaluate a significant policy decision. Evaluation can help to resolve uncertainty and determine the relative cost-effectiveness of different interventions, models, or approaches.
  • When we and our partners need a better understanding of how a cluster of important investments or a specific program or project is performing.
  • When an organization, intermediary, or consortium that we work with is at a critical stage of development and can benefit from an independent performance assessment.
  • When a program team needs to assess the progress of a new operational model or approach. Evaluation provides reliable, independent feedback about what needs to be improved to strengthen our approach and partner relationships.

Evaluation is a high priority when program outcomes are difficult to observe and knowledge is lacking about how best to achieve results, such as when we collaborate with partners who are working to improve service delivery or effect behavioral change; identify, replicate, or scale innovative models; or catalyze change in systems, policies, or institutions.

Evaluation is a low priority when the results of our efforts are easily observable. It is also a low priority when our partners are conducting basic scientific research, developing but not distributing products or tools, or creating new data sets or analyses. In such cases, our partners’ self-reported progress data and existing protocols (such as for clinical trials) provide sufficient feedback for decision making and improvement.

Program teams are not expected to use evaluation to sum up the results of foundation strategies. This would not be the best use of scarce measurement and evaluation resources for two reasons: 1) the impact of our investments cannot easily be differentiated from that of our partners’ investments and efforts; and 2) foundation leaders are more interested in learning how our teams can make the best use of resources and partnerships and how to strengthen program execution.

Evaluation Design and Methods

Evaluation is a contested discipline. We are aware of the ongoing and healthy debate about what types of evidence are appropriate to inform policy and practice in U.S. education and in international public health and development. However, the diversity of our partners and areas of focus precludes us from promoting only certain types of evaluation evidence as acceptable for decision making.

We avoid a one-size-fits-all approach to evaluation because we want our evaluation efforts to be designed for a specific purpose and for specific intended users. This approach to evaluation design, which we call fit to purpose, has three elements:

  • It allows for a range of methods, including qualitative and quantitative data collection and analysis, retrospective and prospective designs, experimentation, theory-based evaluation, and systems-based approaches.
  • It requires our teams, outside evaluators, and partners to be rigorous about the inferences they make and explicit about the assumptions they use to draw conclusions.
  • It requires our teams and our partners to consider evaluation evidence in the context of action so the evaluation efforts produce findings that can be acted on rather than information that is merely nice to know.

The following three designs represent the vast majority of the evaluations we support.

Evaluations to understand and strengthen program effectiveness

Evaluations that help our partners strengthen the execution of projects are among the most relevant for the foundation because they provide feedback about what is and isn’t working within a specific location or across locations.

We use this type of evaluation in the following scenarios:

  • When one or more partners are delivering a combination of interventions to achieve aggregate outcomes (e.g., increased and consistent use of latrines, better student achievement, exclusive breastfeeding in the first 6 months, use of a particular crop variety, or use of mobile phone–based financial services by women) in a specific location.
  • When one or more partners are delivering the same approach, product, or solution in different locations.
  • When we collaborate with a partner to promote effective resource allocation, planning, and delivery of services in a specific location or sector.

Such evaluations should be designed with the following considerations in mind:

  • They are not expected to assess causal relationships between the interventions and the desired outcomes.
  • They should have a very specific purpose and use. Because evaluations can quickly become comprehensive as well as expensive, the findings must closely match the partner’s decision-making needs.
  • We support the use of technological innovations in data collection and analysis to increase the timeliness and accessibility of data.
  • Both quantitative and qualitative data are relevant in evaluating processes, operations, cost-effectiveness, key stakeholders’ perceptions, and enabling contextual factors.

Evaluations may include impact estimates if those are needed to inform important decisions—about scaling up an initiative, for example, or about the level of penetration needed to ensure a certain level of impact. Impact estimates should not be used as proof of macro-level impact, however.

Because the assumptions used to construct impact estimates can lead to large error margins, a robust baseline of key coverage indicators is essential, along with data on how these indicators have changed over time. Population-level impact can then usually be determined through modeling or use of secondary data.

In select cases, it may be necessary to determine a causal relationship between the change in coverage and the desired population-level impact. If so, the design should include a plausible counterfactual, usually obtained through modeling or comparison with national or sub-national trends.

Evaluations to test the causal effects of pilot projects, innovations, or delivery models

Evaluations that produce causal evidence can be used to decide whether to scale up or replicate pilots, innovations, or delivery models. They can also provide essential knowledge to the foundation, our partners, policymakers, and practitioners.

We use this type of evaluation in the following scenarios:

  • When foundation teams and partners need evidence to determine which solutions within large programs are the most effective and cost-effective.
  • When foundation teams and partners invest in pilot projects and innovations and need evidence to persuade others to scale up to larger geographies or replicate in other contexts.
  • When we and our partners need evidence to make trade-offs between different implementation tactics, delivery approaches, and program components.
  • When we and our partners need to assess the effectiveness of advocacy, social marketing, and awareness-raising tactics before deciding on an overall strategy to influence perceptions and behaviors.

Evaluations of causal relationships should be designed with the following considerations in mind:

  • They should be able to clearly demonstrate that the positive or negative effects observed were caused by the intervention or tactic. They should also measure the size of the effect caused by that intervention or tactic.
  • They must be able to rule out the effects of factors other than the specific intervention, by including a plausible counterfactual. We suggest using experimental or certain quasi-experimental designs in this context. If it is impractical or implausible to create a counterfactual (e.g., when a national institution provides technical support to a government partner), we suggest using the evaluation design described next.
  • They are more useful when we test variations rather than a single line of inquiry (e.g., does x work or not?).
  • They should examine the equally important questions of how and why the changes were caused by the intervention, by looking at processes, performance, and costs.

Evaluations of causal relationships should not be used when existing proxies of effectiveness and outcomes are sufficient. They are also not appropriate for evaluating whole packages of interventions with multiple cause-and-effect pathways.

Evaluations to improve the performance of institutions or operating models

Evaluations that provide a neutral assessment of the effectiveness of an organization or operating model can inform foundation and partner decision making about how best to use financial or technical resources, resolve challenges, and support ongoing progress.

We use this type of evaluation selectively, in the following scenarios:

  • When we work with a partner organization that is essential to the success of a foundation strategy.
  • When our relationship with the partner is at a critical juncture where additional, detailed information on a specific area of operation can inform next steps, strengthen collaboration, and depersonalize decision making.
  • When we develop new ways of working (e.g., by establishing a deeper presence in a specific country) and an objective, systematic assessment can inform decisions about implementation and strengthen relationships with key stakeholders.

Evaluations of institutional effectiveness and operating models should be designed with the following considerations:

  • Such evaluations can easily become too comprehensive and burdensome to foundation staff or partners, so rigorous selection of evaluation questions and a clear purpose are essential.
  • Whenever possible, these evaluations should be done in close collaboration with other donors so we can gain efficiencies, achieve a common understanding of the support that key partners need to succeed, and continue learning from our joint experience.

Such evaluations are largely qualitative and should not seek to assess the causal relationship between a partner organization or operating model and program outcomes.

Evaluation Roles and Responsibilities

Our evaluation policy is a starting point for strengthening how we use evaluation within the foundation and with our partners. We complement it with resources and designated roles within the foundation that enable clear decision making about when and how to use evaluation and facilitate consistent management of evaluations and use of findings. These resources and roles are detailed in the following sections.

Evaluation plans

Program teams in our U.S. and global divisions that work with partners each have an evaluation plan, which they share openly with partners to promote collaboration, joint evaluation, and learning within and outside the foundation. The plan identifies existing evidence and the critical gaps that we and our partners need to fill to inform decision making and build knowledge.

Program officers consult the team plan before making decisions about specific evaluations, to ensure that evaluation investments fit into an overall strategic framework. They also consult with the foundation’s central Strategy, Measurement & Evaluation team, which works with all program teams at the foundation to find opportunities to invest in and share evaluations that have cross-program relevance and to advance innovation in evaluation methods.

During the grant development process, our program officers and partners discuss and decide whether an evaluation will be needed, to ensure alignment on expectations and sufficient resources to produce useful evaluations. Key factors include the following:

  • Our starting point is ensuring that our partners have the capacity to achieve outcomes that are measurable and can use data to make adjustments and improvements.
  • When a partner sees the need for evaluation, we provide financial and technical resources to ensure that the evaluation is fit to purpose and can produce findings that the partner can act upon.
  • The program officer and partner jointly determine the evaluation’s intended users, the general evaluation design, and resources needed to ensure that the purpose, timing, and use of the evaluation findings are well matched.
  • If the partner lacks evaluation capacity and the evaluation has strategic importance to the foundation, the program team may decide to use an independent evaluator.

All foundation-funded evaluations—whether conducted by independent parties or integrated into our partners’ work—are recorded in a foundation evaluation registry. This helps us track evaluation spending and findings, and ensure continuity and consistency regardless of any foundation or partner staff turnover.

Roles and responsibilities

Responsibility for evaluation is shared across many levels of the foundation:

  • The foundation’s leaders are committed to the principles that underlie this evaluation policy, including the focus on purpose-driven evaluation and the use of evaluation for learning and decision making rather than for proof that foundation resources are responsible for the outcomes of our joint efforts with partners.
  • Our program directors and foundation presidents are dedicated to rigorous decision making about when foundation teams should invest in evaluation, and they use evaluation evidence in their own decision making as well as in dialogue and advocacy with partners and grantees.
  • Our program officers promote the use of evaluation findings for learning within the foundation and externally. They use their team evaluation plans to guide evaluation decisions, and they consult with their measurement, learning, and evaluation colleagues for technical support. They also use evaluation evidence in their decision making and their communication with partners. They are responsible for recording foundation-funded evaluations in the central registry and for sharing findings transparently within and outside the foundation to maintain institutional knowledge and promote ongoing learning and improvement.
  • Our program teams have embedded measurement, learning, and evaluation staff to support the design and use of high-quality evaluations. The role of the embedded staff includes helping to design team evaluation plans, leading the design and management of high-priority evaluations, providing technical support and research to program officers and partners who are designing programs or impact evaluations, and ensuring that evaluations are recorded in the foundation’s registry.
  • The foundation’s central Strategy, Measurement & Evaluation team is responsible for setting and promoting evaluation standards, creating tools and resources for foundation and partner use, and advancing cross-program evaluation and learning. Its responsibilities include ensuring that evaluation is integrated into foundation business processes, maintaining a roster of independent evaluators, funding evaluations that fill critical evidence gaps or answer questions that are relevant to more than one program, providing foundation staff and partners with training and skill-building support, and assisting program teams and partners who need support or advice on evaluation design or management.

Whenever possible, foundation teams look for opportunities to build on grantee monitoring and evaluation rather than create parallel systems, and to invest in national evaluation capacity to support our global programs. We recognize that this may entail concurrent investments in building our partners’ evaluation capabilities. This support is provided directly by program teams and their embedded measurement, learning, and evaluation staff.



The Bill & Melinda Gates Foundation made a clear commitment to actionable measurement as a guiding philosophy in 2008. The philosophy spells out a clear need for purpose-driven evaluation rather than adherence to any one particular method or design. Where relevant and matched to the type of work we do, evaluation can help depersonalize decision making and provide objective data that can inform action.

This policy document outlines the foundation’s position on why, when, and how we use evaluation to create useful evidence for decision makers, improve program execution, inform our evidence-based advocacy, and develop stronger relationships with our grantees and partners. We view it as the starting point of a larger effort to make high-quality evaluation an integral part of how we operate and carry out our work.