Application of Remote Sensing Data in Crop Yield and Quality: Systematic Literature Review

Purpose: To cover the current state of the art in the application of remotely sensed data to crop quality improvement. Methodology/Approach: Systematic literature review using novel text mining techniques. Findings: The relevance of the topic, measured by the number of relevant studies, is rising; the best-performing input data types and modelling techniques are identified. Research Limitation/Implication: The review covers the literature up to a certain point in time in a rapidly evolving field of research. Originality/Value of paper: No similar review article on the topic existed at the time this research was conducted.


INTRODUCTION
Precision Agriculture (PA) is changing aspects of agriculture around the world through several potential benefits, such as profitability, productivity, sustainability, crop quality, environmental protection, and rural development (Liaghat and Balasundram, 2010). According to Cisternas et al. (2020), one of the most used technologies in PA is Remote Sensing (RS). Jensen (1996) defined RS as "a scientific discipline discussing the acquisition and interpretation of information obtained by sensors that are not in physical contact with an observed object". This field of science includes aerial, satellite and cosmic observations of the surfaces and atmospheres of the planets of the solar system, with the most frequently studied object being the planet Earth. RS technologies are usually limited to methods that detect electromagnetic energy, including visible and invisible radiation that interacts with surface materials and the atmosphere (Liaghat and Balasundram, 2010).
Data obtained by RS techniques can be used in a variety of sectors besides agriculture, from urban and natural resources planning and natural disaster prevention (Solemane et al., 2019) to tools that help optimize global supply chains, such as the Global Copper Smelting Index.
RS technologies have the biggest impact on crop quality. According to Munnaf et al. (2020), the key indicator of crop growth and productivity is the crop canopy and its geometric characteristics. Many researchers have shown that the crop canopy is a potential crop yield indicator (Villalobos et al., 2006). From the perspective of remotely sensed agricultural data, satellite-derived vegetation indices are often used to monitor crop quality and predict crop yields.

METHODOLOGY
The goal of this study is to summarize the models, input data and crop types researched in the relevant studies in the field of crop yield estimation (CYE). We conducted this study as a systematic literature review by adapting the framework of Kitchenham and Charters (2007).
First, we created a plan for the review, which consisted of composing research questions (Q), defining sources of articles, and specifying search and review protocols. The research questions were: Q1: What are the most researched crop types? Q2: What models are used in CYE? Q3: What input data are used in CYE?

Search Protocol
The first step was to extract keywords. A simple text mining tool was developed using the Python programming language and Natural Language Processing packages, with the full text of 15 previously found and highly relevant case studies as the base data collection. In the next step, we performed text processing, which consisted of removing undesirable information (stop words), stemming and lemmatization. Finally, we evaluated the most frequent one-, two- and three-word terms.
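The keyword extraction step can be illustrated with a minimal sketch. It uses only the Python standard library, whereas the tool described above relied on NLP packages that are not named in the text. The stop-word list, the naive plural-stripping rule standing in for stemming and lemmatization, and the function names `normalize`, `tokenize` and `top_ngrams` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real tool would use a full list).
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "to", "for",
              "is", "are", "on", "with", "by", "from", "using"}


def normalize(token):
    """Crude stand-in for stemming/lemmatization: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words, normalize the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [normalize(w) for w in words if w not in STOP_WORDS]


def top_ngrams(texts, n, k=5):
    """Return the k most frequent n-word terms across all texts.

    Stop words are removed before n-grams are formed, so words separated
    only by a stop word end up adjacent.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(" ".join(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


# Hypothetical two-document corpus, for illustration only.
corpus = ["Remote sensing of crop yields",
          "Crop yield estimation using remote sensing"]
for n in (1, 2, 3):
    print(n, top_ngrams(corpus, n, k=3))
```

Calling `top_ngrams` with n = 1, 2 and 3 corresponds to the one-, two- and three-word terms evaluated in the review; in practice the corpus would be the full text of the 15 seed studies.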
Based on the text analysis, we identified "remote sensing", "satellite imagery", "crop yield estimation" and "crop growth model" as the most relevant keywords. The selected terms were then combined into search queries (Table 1). The dataset of collected studies consists of 378 papers (after removing 43 duplicates).
By conducting this research, we found 108 highly relevant publications, which were subjected to further in-depth review. We reviewed the full text of each publication to address all research questions.
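As a rough illustration of this step, the sketch below combines the four keywords into Boolean queries and removes duplicate records. The actual query scheme is given in Table 1 and is not reproduced here, so the pairwise AND combination, the title-based duplicate check, and the names `build_queries` and `deduplicate` are assumptions.

```python
from itertools import combinations

# The four keywords identified by the text analysis.
KEYWORDS = ["remote sensing", "satellite imagery",
            "crop yield estimation", "crop growth model"]


def build_queries(keywords, size=2):
    """Join every `size`-element combination of keywords with AND (assumed scheme)."""
    return [" AND ".join('"{}"'.format(kw) for kw in combo)
            for combo in combinations(keywords, size)]


def deduplicate(records):
    """Drop records whose normalized title was already seen.

    Returns the list of unique records and the number of removed duplicates.
    Matching on the title alone is an assumption; a DOI would be more robust.
    """
    seen = set()
    unique = []
    for rec in records:
        key = " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique, len(records) - len(unique)


for query in build_queries(KEYWORDS):
    print(query)
```

With four keywords and pairs, this produces six queries; search results from all queries would then be pooled and passed through `deduplicate` before screening.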

RESULTS AND DISCUSSION
This section summarizes the results of the research. Firstly, we present general findings. After that, every research question is addressed.
The results showed that the importance of RS and PA techniques in crop yield prediction is increasing, based on the rising number of conducted case studies (relevance R3), as shown in Figure 1.

Figure 1 - Distribution of Publications per Year
Among the selected publications, 52 researched CYE models in Asia, with China (40) being the most researched Asian country, followed by India (5) and Pakistan (2). The second most researched continent is America: the USA appeared in case studies 23 times and Canada four times. Fewer studies were conducted in Europe (13), Africa (13) and Australia (3).

Crop Types
Seven publications did not explicitly indicate a specific crop type. On the other hand, 14 reviewed publications researched more than one crop.
We identified a total of 16 different crop types. The most frequently researched crops were wheat, corn, rice, soybean, and cotton. This could be related to the researched countries, since, according to UN FAO statistics, China, India, and the USA are among the biggest wheat and corn producers in the world. Results are shown in Figure 2.

Models
We identified 19 different CYE models. Two main groups of estimation models were defined: existing models and custom models. The former group comprises models such as World Food Studies (WOFOST) or Crop Environment Resource Synthesis (CERES), which simulate crop growth response to climate data, soil data, crop genotypes and field management across locations throughout the world (Basso, Liu and Ritchie, 2016).
Custom models are developed directly by researchers, using mainly regression analysis (REG) and machine learning techniques (ML). We found that the majority of researchers decided to develop their own model, with regression analysis being the most frequent technique. However, as technology advances, more researchers are implementing machine learning techniques to estimate crop yield more precisely, as shown in Figure 3. Additionally, the correlation between MODIS-WDRVI and grain yield (R² = 0.83) was higher than the one based on ground-observed green Leaf Area Index (LAI) (R² = 0.66). The best correlation was observed 7 to 10 days before the silking stage of maize.
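To make the regression-based approach and the reported R² values more concrete, the following sketch fits a one-variable least-squares line of yield against a vegetation index and computes the coefficient of determination. It is not the model of any reviewed study; the function names and the sample values are invented purely for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up values (not from any reviewed study): a vegetation index per field
# and the corresponding observed yield in t/ha.
index_values = [0.52, 0.61, 0.68, 0.74, 0.80]
observed_yield = [3.1, 3.9, 4.2, 5.0, 5.4]

slope, intercept = fit_line(index_values, observed_yield)
print("slope:", slope, "intercept:", intercept)
print("R^2:", r_squared(index_values, observed_yield, slope, intercept))
```

The same R² computation applies regardless of how the predictions were produced, which is why it serves as a common yardstick for comparing regression, machine learning and crop growth models.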

Figure 3 - Models Used in Reviewed Publications per Year

Crop Growth Models
Our research revealed that the most frequently used existing model was WOFOST. According to Yuping et al. (2008), the WOFOST model provided more accurate estimates of winter wheat yield when remote sensing data were included during the growing season. Similarly, Ma et al. (2013) achieved more accurate results using MODIS-LAI instead of simulated LAI inputs, although one disadvantage of the MODIS-LAI approach is the residual error resulting from the mixed pixel effect.
CERES is an eco-physiological model that simulates crop phenology, total above-ground biomass and yield using carbon, nitrogen, and water balance principles. The base CERES model uses inputs similar to those of WOFOST: weather, soil, and cultivar data. Case studies have shown that the best results are achieved when using both MODIS-, MERIS- or ASAR-derived LAI and vegetation indices such as EVI or NDVI (Dente et al., 2008; Fang, Liang and Hoogenboom, 2011; Jin et al., 2016a; Ban, Ahn and Lee, 2019).
AquaCrop is a water-driven crop growth model aimed at improving crop water management strategies in irrigation regions. Its inputs are meteorological data, soil data, crop parameters and field management data. Jin et al. (2016b) researched winter wheat yield prediction using the AquaCrop model and found that the best-performing spectral index was the normalized difference matter index (NDMI). Luciani, Laneve and JahJah (2019) used phenological data derived from NDVI time series in the AquaCrop model, obtaining R = 0.699 for corn and R = 0.723 for wheat. However, the authors stated that model performance could be unsatisfactory in severely water-stressed environments.
CROWRAYEM (Crop water requirement analysis yield estimation model) is based on the CROPWAT modelling software, which uses climatic data and the crops' yield response to estimate yield (Eze et al., 2020). The research was conducted in Ethiopia for sorghum and barley. The model performed relatively well for both crop types, with R² of 0.85 (sorghum) and 0.86 (barley).

Input Data
We identified 11 types of vegetation indices that were used as inputs in various CYE models. The most frequent vegetation index is NDVI, which appeared in 75 of 108 publications, followed by LAI (53 publications) and EVI (18 publications). The results showed that the importance of NDVI and LAI is increasing, with both indices being used in more studies published in recent years (Figure 4).
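For readers unfamiliar with these indices, the sketch below shows how NDVI and EVI are commonly computed from surface reflectance bands. The formulas are the standard definitions (EVI with the coefficients used for MODIS products), not something specific to the reviewed studies; the function names and sample reflectance values are assumptions for illustration.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denom


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard MODIS coefficients:
    2.5 * (NIR - Red) / (NIR + 6 * Red - 7.5 * Blue + 1).
    """
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    if denom == 0:
        return float("nan")
    return 2.5 * (nir - red) / denom


# Illustrative reflectance values (fractions between 0 and 1) for one pixel.
nir, red, blue = 0.50, 0.10, 0.05
print("NDVI:", ndvi(nir, red))
print("EVI:", evi(nir, red, blue))
```

Dense green vegetation reflects strongly in the near-infrared and absorbs red light, so both indices increase with canopy vigour, which is what makes them useful inputs for yield estimation.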

Figure 4 - Distribution of Input Data per Year

CONCLUSION
PA is one of the most important trends in food sustainability and quality improvement. As the population of the Earth grows, so do the food requirements. PA offers many frameworks and tools that help meet this demand, with remote sensing being one of the most important. The usability of remote sensing technology is extensive: from urban and natural resources planning and natural disaster prevention to agriculture industry optimization.
This study provides knowledge on the state of the art in crop yield estimation using remote sensing technologies and identifies current trends in research. First, we identified three main research questions:
• What are the most researched crop types?
• What models are used for CYE?
• What input data are used in CYE?
We implemented a novel approach using Python's natural language processing packages to extract keywords from previously found case studies. By entering the extracted keywords into search engines, we found 378 articles in total. As our research revealed, this scientific field has grown in importance in recent years: while three studies were published 20 years ago, in 2020 alone we found 52 publications on the topic.
As our research showed, there has been an exponential rise in research on CYE using RS data over the years. This can be observed in the number of publications that we found using relevant keywords.
Researchers study many different crop types, with wheat and corn being the most frequent. They develop different regression models and employ machine learning and artificial intelligence techniques to predict crop yields more and more accurately. On the other hand, there are many existing crop yield/growth models that report reasonably accurate estimates, such as WOFOST or CERES, and that can be modified to fit the specifics of a crop type to further enhance forecast accuracy.
We found that remotely sensed data are used mainly in the form of spectral vegetation indices, which are more unified and more usable in crop yield forecasting. Different vegetation indices report different accuracy for different crop types, and even in different parts of the world. We identified the most important spectral vegetation indices that can be used to predict crop yield, as well as their potential sources.
The results of the systematic literature review allow us to identify several directions for future research in the context of CYE using RS data: namely, focusing on crop types that have not been researched as often, such as barley, sugarcane, potato, or sunflower, or creating a model that is able to identify the crop type and automatically suggest the needed input data. This can be done by implementing convolutional neural networks, which are able to extract spatial and temporal features from multispectral images. Another important aspect that future research should focus on is the impact of predicted yield and estimated crop health on global supply chains. This can lead to optimization of pesticide use and hence to better food quality.