Open-Source Field Data Provides a Window into Ghana’s Complex Cocoa Landscape

Ghana is the world’s second-largest producer of cocoa, and cocoa expansion there often comes at the expense of forest. Monitoring cocoa-driven deforestation is a particular challenge given the complexity of the cocoa landscape mosaic: cocoa is often grown in small fields and under partial shade, and the diversity of tree crops and shade trees often leads agroforestry systems to be misclassified as natural forest, despite a very different reality beneath the canopy.

Despite advances in machine learning using satellite data, the current extent and expansion of cocoa remain challenging to classify. Misclassification of cocoa farms not only has negative implications for climate and biodiversity but could also unduly exclude smallholder farmers from the supply chain in the face of future regulations.

Prior cocoa maps have relied on reference datasets that are generally not open or accessible, limiting the ability to independently validate, replicate and build upon these results. The lack of openly available reference data on cocoa limits the ability to monitor deforestation drivers, such as cocoa expansion, and hinders informed decision-making and land use planning. Collecting verified, on-the-ground cocoa farm reference data is necessary to unlock the possibilities of machine learning for accurately detecting cocoa, as cocoa farms are difficult to visually identify in even the highest-resolution satellite imagery available. To respond to this challenge, an extensive field data collection effort was carried out in cocoa-growing regions across southern Ghana.

Improving maps of Ghana’s cocoa landscape through field data collection

This effort, financially supported by the Lacuna Fund and World Resources Institute (WRI), aimed to gather data on land cover and farmer well-being simultaneously. As the European Union Deforestation Regulation (EUDR) is showing, accurate geolocation information and supply chain transparency are central to that aim. To more accurately map cocoa areas, the project collected over 21,000 polygons of cocoa land cover in the field between September 2024 and March 2025. Data collectors mapped only portions of cocoa farms rather than actual farm boundaries, since the purpose of this project is to improve remotely sensed maps of cocoa through publicly available reference data, not to collect geolocation data for compliance with any particular regulation. Using high-resolution imagery, more than 14,000 homogeneous cocoa polygons were digitized from the field dataset, providing information on full-sun cocoa. To complement this cocoa field campaign, more than 20,000 points of non-cocoa land cover were manually classified in Collect Earth Online using high-resolution data from the Planet NICFI Satellite Data Program and other sources. These points included other key land covers that are more easily identifiable in imagery, such as rubber and oil palm plantations.

This extensive biophysical data collection was complemented by a robust socio-economic survey of cocoa farming households. More than 4,400 cocoa-farming households were surveyed to provide insights into key questions around land use and land change, including national incentive structures for promoting forest preservation and cocoa farming in the face of illegal gold mining encroachment. This survey will also provide a much-needed basis for understanding the effects of incoming deforestation regulations, like the EUDR, on household incomes and farmer well-being, setting the stage for future follow-up surveys.

The importance of locally led data collection

Data collection efforts were led locally by the Centre for Remote Sensing and Geographic Information Services (CERSGIS), associated with the University of Ghana.

“As a wholly Ghanaian self-sustaining center, CERSGIS possesses a deep understanding of the local context surrounding cocoa production, which allows for culturally relevant approaches to data collection,” said CERSGIS’s Executive Director Foster Mensah. “We actively collaborate with both national and local stakeholders to establish effective data collection pipelines.” Support from international partners included the SERVIR Science Coordination Office (SCO) at the NASA Marshall Space Flight Center, the Laboratory for Applied Science at the University of Alabama in Huntsville (UAH), and experts at WRI, who contributed to the research design process and provided technical assistance.

The Forest Data Partnership (FDaP) brings together leading organizations, governments and private sector partners to collectively strengthen collaboration around global monitoring of commodity-driven deforestation. The partnership helped convene these partners around cocoa research and supported the use of Open Foris Ground to map the polygons for the project. Ground is an open-source application optimized for non-technical users to collect geospatial data in the field, developed by Google in collaboration with the United Nations Food and Agriculture Organization (FAO) through FDaP and launched in March 2024. This project was its first large-scale, operational use, involving over 30 data collectors deployed for many weeks across cocoa-growing communities in Ghana.

To empower local organizations and communities to collect data for the project, SERVIR, UAH, WRI, and CERSGIS conducted a data collector training and instrument validation pilot in August 2024. CERSGIS staff, students from the University of Ghana’s Department of Geography and Resource Development and the Centre for Climate Change and Sustainability Studies, and YouthMappers from the University of Ghana and University of Cape Coast were trained using a “train-the-trainers” approach, which built local capacity and scaled to provide opportunities for youth from local target communities. Analysts were trained in visual imagery interpretation, field mapping using Open Foris Ground, and the household survey. The team of local Ghanaian analysts was crucial in adjusting household survey questions to be locally relevant and sensitive, as well as translating the survey into local languages. The instruments were then piloted in cocoa communities in two regions.

Maximizing the value of data

This project has already yielded many benefits. It provided an opportunity to employ Open Foris Ground in the field, uncover bugs, and improve the application in real time through an efficient feedback process between field teams in rural Ghanaian cocoa villages, project managers in the U.S., and Ground developer team members across the U.S., India, and Rome, all while collecting accurate and reliable cocoa farm reference data. “The real conditions feedback provided by a national institution like CERSGIS in deploying data collection at scale is invaluable for the agile development of the application. Thanks to that effort, we managed to identify and fix operational inefficiencies and gaps, in a very short period,” says Rémi d’Annunzio, Forestry Officer at FAO.

The field data has already been incorporated as an input into the community-pooled reference data powering the latest iteration of the open-source cocoa probability model developed by Google and FDaP, with outputs for 2020 and 2023 available on Google Earth Engine. “For remote sensing classifications to accurately capture complex landscapes, particularly nuanced agroforestry systems like Ghana’s, high-quality, in-situ field data is indispensable. This type of dataset is incredibly rare and takes immense effort to collect, which is why we’re so grateful to the CERSGIS team for developing and curating it,” says Katelyn Tarrio, a researcher on Google’s Sustainable Sourcing team.

Broadly, the project team hopes this data will spur further research and innovation relating to cocoa-driven deforestation in West Africa, bridging a key gap that cannot be overcome by satellite imagery alone. It will offer an opportunity for anyone to assess the accuracy of recent public cocoa mapping products that share their outputs but not the input data needed for independent validation.
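As a rough illustration of what such an independent check could look like, the sketch below compares map labels against field reference labels and reports overall, user's and producer's accuracy for the cocoa class. The function name, class names and sample labels are hypothetical and are not taken from the published dataset; a rigorous assessment would also need to account for the sampling design and for area weighting.

```python
from collections import Counter


def accuracy_report(reference, predicted, positive="cocoa"):
    """Summarize agreement between reference labels and map labels for one class."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count each (reference label, map label) pair, i.e. a sparse confusion matrix.
    pairs = Counter(zip(reference, predicted))
    total = len(reference)
    correct = sum(n for (r, p), n in pairs.items() if r == p)
    tp = pairs[(positive, positive)]
    fp = sum(n for (r, p), n in pairs.items() if r != positive and p == positive)
    fn = sum(n for (r, p), n in pairs.items() if r == positive and p != positive)

    nan = float("nan")
    return {
        "overall_accuracy": correct / total if total else nan,
        # User's accuracy: share of mapped cocoa that is cocoa on the ground.
        "users_accuracy": tp / (tp + fp) if (tp + fp) else nan,
        # Producer's accuracy: share of reference cocoa that the map detected.
        "producers_accuracy": tp / (tp + fn) if (tp + fn) else nan,
    }


# Hypothetical labels, for illustration only.
reference = ["cocoa", "cocoa", "forest", "cocoa", "oil_palm", "forest"]
predicted = ["cocoa", "forest", "forest", "cocoa", "cocoa", "forest"]
report = accuracy_report(reference, predicted)
```

Here the shaded cocoa site mapped as forest lowers producer's accuracy, while the oil palm site mapped as cocoa lowers user's accuracy, which mirrors the two kinds of confusion the article describes.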

Unique to this project is the integration of socio-economic household data alongside farm reference data. This combination will enable investigations into the incentive structures that influence land use decisions, including cocoa expansion. How will future European Union policies impact household incomes? Does greater opportunity in cocoa dissuade land use transitions, such as those seen in illegal gold mining operations? This open data on cocoa farming extent and behaviors will be critical to answering these questions and others. To protect the privacy of survey respondents, a clustering approach was used to group the more than 4,400 household survey responses into 485 anonymized clusters that mask individual identities while preserving the dataset’s ability to support detailed analysis.
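The article does not describe how the clustering was done, so the following is only a minimal sketch of one common way to anonymize point-level survey data: group households by a coarse spatial grid, publish only cluster-level summaries, and suppress groups that are too small. The field names (`lat`, `lon`, `farm_ha`), the grid size and the minimum cluster size are all assumptions made for illustration, not the project's actual method or schema.

```python
import math
from collections import defaultdict
from statistics import mean


def cluster_households(households, cell_deg=0.1, min_size=3):
    """Aggregate household records into grid-cell clusters with summary values only."""
    cells = defaultdict(list)
    for h in households:
        # Assign each household to a grid cell of cell_deg x cell_deg degrees.
        key = (math.floor(h["lat"] / cell_deg), math.floor(h["lon"] / cell_deg))
        cells[key].append(h)

    clusters = []
    for members in cells.values():
        if len(members) < min_size:
            continue  # Suppress small groups that could reveal individuals.
        clusters.append({
            "n_households": len(members),
            # Publish a coarse centroid instead of exact household locations.
            "centroid_lat": round(mean(m["lat"] for m in members), 2),
            "centroid_lon": round(mean(m["lon"] for m in members), 2),
            "mean_farm_ha": mean(m["farm_ha"] for m in members),
        })
    return clusters


# Hypothetical records, for illustration only.
households = [
    {"id": "A", "lat": 6.01, "lon": -1.52, "farm_ha": 2.0},
    {"id": "B", "lat": 6.03, "lon": -1.55, "farm_ha": 3.0},
    {"id": "C", "lat": 6.08, "lon": -1.58, "farm_ha": 4.0},
    {"id": "D", "lat": 6.45, "lon": -2.13, "farm_ha": 1.5},
]
clusters = cluster_households(households)
```

In this sketch the isolated household "D" is dropped rather than published alone; a production workflow might instead merge small groups into a neighboring cluster so that no responses are lost.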

What is most important, however, is that this data can be impactful for cocoa communities across Ghana. Cocoa is important for many livelihoods in Ghana and across West Africa, and as governments and companies around the world strive to achieve more sustainable supply chains, it is imperative to ensure that smallholder farmers are not unjustly excluded from markets. By making this data publicly available, the project team hopes to spur the research community to rally around this challenge.

The datasets can be accessed on Zenodo under a CC-BY 4.0 license. The project team included Foster Mensah (CERSGIS), Bashara Abubakari (CERSGIS), Jacob Abramowitz (SERVIR/UAH), James Warburton (WRI), Ashleigh Zosel-Harper (WRI), and Emma Hokoda (WRI). The project team thanks all the data collectors involved from CERSGIS, the University of Ghana, and the University of Cape Coast.