Update

b209a520 · Feenstra, Ninke · 9a7c5d35 · b209a520
Commit b209a520 authored 2 years ago by Feenstra, Ninke
--- a/survey_paper.Rmd
+++ b/survey_paper.Rmd
@@ -32,9 +32,48 @@ The aim of this paper is to provide insights about the potential uses of spatial
 ## Spatial Microsimulation
-Throughout the years different methods have been developed to create spatial microsimulation models. These methods can first of all be divided into synthetic reconstruction and reweighting methods. Synthetic reconstruction methods aim to estimate and impute characteristics of individual units with a known geographic location, while reweighting methods create new weights for units without a known geographic location so that they are representative for a given region. Nowadays reweighting methods are seen as superior methods over synthetic reconstruction and as a result most spatial microsimulation models are developed using reweighting approaches [@Tanton2013]. As most models are nowadays being developed with reweighting techniques, I will only focus on these methods in this section. 
+Spatial microsimulation combines constraint data, which contains information at the regional level but not on the variable of interest, with seed data, which contains no spatial information but does contain the variable of interest. Census data is often used as the constraint data, while survey data is used as the seed. Both data sets are used to create new weights for each micro unit in the seed data set. Although there are different methods for combining the census and survey data, they all require a set of constraint variables. These constraint variables should meet the following four criteria:
+- They should be present in both the census and survey data sets
+- They should be available at the level of the micro unit that is being simulated
+- They should be a reasonable predictor of the variable of interest at the regional level
+- They should be a reasonable predictor of the variable of interest at the micro unit level.
+@Tanton2013 explains that a review of the data can establish whether criteria one and two are met. Specifically, it should be investigated whether the constraint variable is defined in the same way in both the census and the survey. After checking the definition of each variable and, where necessary, correcting for any differences, each constraint variable should be categorical or ordinal, even if the original variable is continuous, with classes that are similar across the two data sets. Whether criteria three and four are met can be investigated using either previous literature or econometric analysis. When using econometric analysis, the R-squared value can be used as an indicator of how well the constraint variables predict the outcome variable.
+After choosing the constraint variables, a method for creating the spatial microsimulation model must also be chosen. The different methods can be divided into synthetic reconstruction and reweighing methods. Synthetic reconstruction methods aim to estimate and impute characteristics of individual units with a known geographic location, while reweighing methods create new weights for units without a known geographic location so that they are representative of a given region. Reweighing methods are nowadays seen as superior to synthetic reconstruction, and as a result most spatial microsimulation models are developed using reweighing approaches [@Tanton2013]. For this reason, I focus only on reweighing methods in this section.
+<!-- add here that I do still focus somewhat on SR and that IPF can be used for both SR and reweighting -->
 ### Iterative Proportional Fitting
-Deterministic iterative proportional fitting (IPF) is one of the techniques that can be used to reweight micro units. This technique should not be confused with the probabilistic iterative proportional fitting that is used for synthetic reconstruction. The main equation behind IPF is:
+Deterministic iterative proportional fitting (IPF) is one of the techniques that can be used to reweigh micro units. This technique should not be confused with the probabilistic iterative proportional fitting that is used for synthetic reconstruction. The main equation used to create new weights in IPF is:
-$$ Nw_f=w_{if}*c_{hj}/s_{hj}$$
+$$ Nw_{rf}=w_{if}*c_{rhj}/s_{hj}$$
-Here $Nw_f$ is the new weight for farm $f$ while $w_{if}$ is the original weight of farm f. 
+Here $Nw_{rf}$ is the new weight for farm $f$ in region $r$, while $w_{if}$ is the original weight of farm $f$. $h$ is the categorical outcome of constraint variable $j$ for farm $f$. $c_{rhj}$ is the number of farms in region $r$ that, according to the census data, have outcome $h$ for constraint $j$, and $s_{hj}$ is the number of farms in the survey that have outcome $h$ for constraint $j$. After new weights have been created for the first constraint, the process moves through all the other constraints, each time taking the previously created weight as the initial weight. When the equation has been applied to all constraint variables, the process returns to the first constraint variable and the equation is applied to each of the constraint variables again. Iterating this process ten times is found to be sufficient to achieve a stable new weight [@Tanton2013]. After the process has been iterated, the simulation moves to the next region, until each farm has a new weight for each region. These newly created weights can be used on their own to create a synthetic population by interpreting each weight as the number of farms in that region that a specific farm represents. Alternatively, the weights can be interpreted as the probability that a specific farm belongs to that region. These probabilities can be combined with Monte Carlo sampling to estimate probabilistically whether or not a farm belongs to that region.
\ No newline at end of file
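+To illustrate the IPF update with purely hypothetical numbers (they are not taken from any data set), suppose farm $f$ has an initial weight $w_{if}=2$ and falls in category $h$ of the first constraint $j$. If the census counts $c_{rhj}=30$ farms of that category in region $r$ and the survey contains $s_{hj}=10$ such farms, the new weight is:
+$$ Nw_{rf}=2*30/10=6$$
+For the next constraint this value of 6 is taken as the initial weight. With assumed counts of 20 in the census and 40 in the survey for the farm's category on that constraint, the weight becomes:
+$$ Nw_{rf}=6*20/40=3$$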
+### Combinatorial Optimisation
+One of the disadvantages of IPF is that the created weights are not always whole numbers, so that when they are used as the number of farms that a farm represents in a specific region, fractions of farms are assigned to that region. @Tanton2013 explains that one methodology that is able to assign whole farms to a region is Combinatorial Optimisation (CO). In the first step of CO, the total number of farms located in each region is determined from census data. Second, for each region a number of farms equal to the total number of farms in that region is randomly drawn from the survey data. Third, the fit of the randomly sampled farms is determined based on the predetermined constraint variables. For each category of each constraint, the absolute difference between the estimated and observed frequency is calculated. These absolute differences are summed over all constraints, which gives the total absolute difference. In the fourth step, one of the randomly selected farms is swapped for another randomly chosen farm and the fit of the new synthetic farm population is determined. If the swap leads to an improved fit, the new farm is kept; otherwise the swap is reversed. Finally, this swapping and evaluation of the fit goes on until there is no further reduction in the absolute difference or until an acceptable level of absolute difference is reached.
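+The fit measure used in the third step can be written compactly. The notation $e_{rhj}$ and $TAD_r$ is introduced here for illustration only and is not taken from @Tanton2013. Let $e_{rhj}$ be the number of sampled farms in region $r$ that have outcome $h$ for constraint $j$, and let $c_{rhj}$ be the corresponding census count, as in the IPF equation. The total absolute difference for region $r$ is then:
+$$ TAD_r=\sum_{j}\sum_{h}|e_{rhj}-c_{rhj}|$$
+A swap is kept only if it lowers $TAD_r$. With hypothetical values, a swap that changes $TAD_r$ from 14 to 11 is kept, while a swap that raises it from 14 to 16 is reversed.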
+### Simulated Annealing and Quota Sampling
+In practice, CO is hardly used anymore because the algorithm cannot avoid getting trapped in a suboptimal selection of farms. As a result, new methodologies that build on CO, such as Simulated Annealing and Quota Sampling, have been developed.
+<!-- comment --> 
+### GREGWT
+### Simulated Annealing
+### Quota Sampling
+## Farm Level Spatial Microsimulation Models
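+```{r}
+# Column headings for the overview table of farm-level models
+headings <- c("Citation", "Model Name", "Context", "Data", "Static or Dynamic", "Method")
+# "Data" and "Static or Dynamic" are not yet filled in for this model, so they are left as NA
+ballas <- c("Ballas et al., 2006", "SMILE", "Ireland", NA, NA,
+            "Synthetic reconstruction IPF combined with Monte Carlo Sampling")
+# Render the single-row table; kable() needs the data as its first argument
+knitr::kable(matrix(ballas, nrow = 1), col.names = headings)
+```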
+## Creating a Spatial Microsimulation Model for European Agriculture
+So far, spatial microsimulation has only been applied to several individual countries or regions in the EU, but not to the EU as a whole. It is possible to develop a spatial microsimulation model for the EU as a whole by using the farm structure survey (FSS) as the constraint data and the farm accountancy data network (FADN) data as the seed.