\chapter{Estimating DSGE models - advanced topics} \label{ch:estadv}
This chapter covers two general areas. The first focuses on the theory of Bayesian estimation. It begins by motivating Bayesian estimation with some arguments in its favor as opposed to other forms of model estimation. It then attempts to shed some light on what goes on in Dynare's machinery when it estimates DSGE models. To do so, this section surveys the methodologies adopted for Bayesian estimation: defining prior and posterior distributions, using the Kalman filter to find the likelihood function, simulating the posterior distribution with the Metropolis-Hastings algorithm, and comparing models based on posterior distributions. The second part of this chapter focuses on advanced topics and features of Dynare in the area of model estimation. These include an example of model comparison, instructions to compare Bayesian estimation results to those of a BVAR, and a table summarizing where output series are stored. \\
\section{Behind the scenes: an introduction to Bayesian estimation}
\subsection{Advantages of Bayesian estimation}
Bayesian estimation is becoming increasingly popular in the field of macro-economics. Recent papers have attracted significant attention; some of these include: \citet{Schorfheide2000}, which uses Bayesian methods to compare the fit of two competing DSGE models of consumption; \citet{LubikSchorfheide2003}, which investigates whether central banks in small open economies respond to exchange rate movements; \citet{SmetsWouters2003}, which applies Bayesian estimation techniques to a model of the Eurozone; \citet{Ireland2004}, which emphasizes instead maximum likelihood estimation; \citet{VillaverdeRubioRamirez2004}, which reviews the econometric properties of Bayesian estimators and compares estimation results with maximum likelihood and BVAR methodologies; \citet{LubikSchorfheide2005}, which applies Bayesian estimation methods to an open macro model focusing on issues of misspecification and identification; and finally \citet{RabanalRubioRamirez2005}, which compares the fit, based on posterior distributions, of four competing specifications of New Keynesian monetary models with nominal rigidities.\\
There are a multitude of advantages of using Bayesian methods to estimate a model, but six of these stand out as particularly important and general enough to mention here.\\
First, Bayesian estimation fits the complete, solved DSGE model, as opposed to GMM estimation which is based on particular equilibrium relationships such as the Euler equation in consumption.\\
Second, estimation in the Bayesian case is based on the likelihood generated by the DSGE system, rather than the more indirect discrepancy between the implied DSGE and VAR impulse response functions.\\
Third, Bayesian techniques allow the consideration of priors which work as weights in the estimation process so that the posterior distribution avoids peaking at strange points where the likelihood peaks. Indeed, due to the stylized and often misspecified nature of DSGE models, the likelihood often peaks in regions of the parameter space that are contradictory with common observations, leading to the ``dilemma of absurd parameter estimates''.\\
Fourth, the inclusion of priors also helps identifying parameters. The problem of identification often arises in estimation and can be summarized as different values of structural parameters leading to the same joint distribution for observables. More technically, the problem arises when the posterior distribution is flat over a subspace of parameter values. But the weighting of the likelihood with prior densities often leads to adding just enough curvature in the posterior distribution to facilitate numerical maximization.\\
Fifth, Bayesian estimation explicitly addresses model misspecification by including shocks, which can be interpreted as observational errors, in the structural equations.\\
Sixth, Bayesian estimation naturally leads to the comparison of models based on fit. Indeed, the posterior distribution corresponding to competing models can easily be used to determine which model best fits the data. This, as other topics mentioned above, is discussed more technically in the subsection below.
\subsection{The basic mechanics of Bayesian estimation}
This and the following subsections are based mainly on a very comprehensive and instructive presentation on Bayesian estimation available on the Dynare website (** add link to presentation) by Stéphane Adjemian, a member of the Dynare development team. After reading the summary below, you are encouraged to view the presentation for additional details. Also, \citet{AnSchorfheide2006} includes a clear and quite complete introduction to Bayesian estimation, illustrated by the application of a simple DSGE model, while the appendix of \citet{Schorfheide2000} contains details on the exact methodology and possible difficulties encountered in the process. You may also want to take a glance at \citet{Hamilton1994}, chapter 12, which provides a very clear, although somewhat outdated, introduction to the basic mechanics of Bayesian estimation. The websites of \href{http://www.econ.upenn.edu/~schorf/}{Frank Schorfheide} and \href{http://www.econ.upenn.edu/~jesusfv/index.html}{Jesus Fernandez-Villaverde} contain a wide variety of very helpful material, from example files to lecture notes to related papers. Finally, remember to also check the \href{http://www.cepremap.cnrs.fr/juillard/mambo/index.php?option=com_forum&Itemid=95&page=viewforum&f=2&sid=164275ffd060698c8150318e8d6b453e}{open online examples} of the Dynare website for examples of .mod files touching on Bayesian estimation. \\
At its most basic level, Bayesian estimation is a bridge between calibration and maximum likelihood. The tradition of calibrating models is inherited through the specification of priors. And the maximum likelihood approach enters through the estimation process based on confronting the model with data. Together, priors can be seen as weights on the likelihood function in order to give more importance to certain areas of the parameter subspace. More technically, these two building blocks - priors and likelihood functions - are tied together by Bayes' rule. Let's see how. \\
First, priors are described by a density function of the form
\[ p(\boldsymbol\theta_{\mathcal A}|\mathcal A) \] where $\mathcal A$ stands for a specific model, $\boldsymbol\theta_{\mathcal A}$ represents the parameters of model $\mathcal A$, and $p(\bullet)$ stands for a probability density function (pdf), such as a normal, gamma, shifted gamma, inverse gamma, beta, generalized beta, or uniform distribution. \\
Second, the likelihood function describes the density of the observed
data, given the model and its parameters:
\[
{\mathcal L}(\boldsymbol\theta_{\mathcal A}|\mathbf Y_T,\mathcal A) \equiv p(\mathbf Y_T | \boldsymbol\theta_{\mathcal A}, \mathcal A)
\]
where $\mathbf Y_T$ are the observations until period
$T$, and where in our case the likelihood is recursive and can be written as:
\[
p(\mathbf Y_T | \boldsymbol\theta_{\mathcal A}, \mathcal A) =
p(y_0|\boldsymbol\theta_{\mathcal A},\mathcal A)\prod^T_{t=1}p(y_t | \mathbf Y_{t-1},\boldsymbol\theta_{\mathcal A}, \mathcal A)
\]
We now take a step back. On the one hand, we have a prior density $p(\boldsymbol\theta)$ and, on the other, a likelihood $p(\mathbf Y_T | \boldsymbol\theta)$. In the end, we are interested in $p(\boldsymbol\theta | \mathbf Y_T)$, the \textbf{posterior
density}. Applying Bayes' theorem twice gives us this density of the parameters
given the data. Generally, we have
\[p(\boldsymbol\theta | \mathbf Y_T) = \frac{p(\boldsymbol\theta ; \mathbf Y_T) }{p(\mathbf Y_T)}\]
We also know that
\[
p(\mathbf Y_T |\boldsymbol\theta ) = \frac{p(\boldsymbol\theta ; \mathbf Y_T)}{p(\boldsymbol\theta)}
\Leftrightarrow p(\boldsymbol\theta ; \mathbf Y_T) = p(\mathbf Y_T |\boldsymbol\theta)\times
p(\boldsymbol\theta)
\]
By using these identities, we can combine the prior density and the likelihood function discussed above to get the posterior density:
\[
p(\boldsymbol\theta_{\mathcal A} | \mathbf Y_T, \mathcal A) = \frac{p(\mathbf Y_T |\boldsymbol\theta_{\mathcal A}, \mathcal A)p(\boldsymbol\theta_{\mathcal A}|\mathcal A)}{p(\mathbf Y_T|\mathcal A)}
\]
where $p(\mathbf Y_T|\mathcal A)$ is the marginal density of the data
conditional on the model:
\[
p(\mathbf Y_T|\mathcal A) = \int_{\Theta_{\mathcal A}} p(\boldsymbol\theta_{\mathcal A} ; \mathbf Y_T
|\mathcal A) d\boldsymbol\theta_{\mathcal A}
\]
Finally, the \textbf{posterior kernel} (or un-normalized posterior density, given that the marginal density above is a constant), corresponds to the numerator of the posterior density:
\[
p(\boldsymbol\theta_{\mathcal A} | \mathbf Y_T, \mathcal A) \propto p(\mathbf Y_T |\boldsymbol\theta_{\mathcal A},
\mathcal A)p(\boldsymbol\theta_{\mathcal A}|\mathcal A) \equiv \mathcal K(\boldsymbol\theta_{\mathcal A} | \mathbf Y_T, \mathcal A)
\]
This is the fundamental equation that will allow us to rebuild all posterior moments of interest. The trick will be to evaluate the likelihood function with the help of the Kalman filter and then simulate the posterior kernel using a Monte Carlo method such as the Metropolis-Hastings algorithm. These topics are covered in more detail below. This first subsection simply laid out the building blocks of Bayesian estimation. Before moving on, though, the box below gives a simple example, based on the above reasoning, of what we mean when we say that Bayesian estimation is ``somewhere in between calibration and maximum likelihood estimation''. The example is drawn from the presentation of Stéphane Adjemian; other similar examples can be found in \citet{Hamilton1994}, chapter 12.\\
\begin{tabular}{|p{11cm}|}
\hline
\textbf{Bayesian estimation: somewhere between calibration and maximum likelihood estimation}\\
\\
Suppose a data generating process $y_t = \mu + \varepsilon_t$ for $t=1,...,T$, where $\varepsilon_t \sim \mathcal{N}(0,1)$ is Gaussian white noise. Then, the likelihood is given by
\[
p(\mathbf{Y}_T|\mu) =
(2\pi)^{-\frac{T}{2}}e^{-\frac{1}{2}\sum_{t=1}^T(y_t-\mu)^2}
\]
Maximizing this likelihood yields $\widehat{\mu}_{ML,T} = \frac{1}{T}\sum_{t=1}^T y_t \equiv
\overline{y}$, with variance $\mathbb{V}[\widehat{\mu}_{ML,T}] = \frac{1}{T}$. \\
In addition, let our prior be a gaussian distribution with expectation
$\mu_0$ and variance $\sigma_{\mu}^2$. Then, the posterior density is defined, up to a constant, by:
\[
p\left(\mu|\mathbf{Y}_T\right) \propto
(2\pi\sigma_{\mu}^2)^{-\frac{1}{2}}e^{-\frac{1}{2}\frac{(\mu-\mu_0)^2}{\sigma_{\mu}^2}}\times(2\pi)^{-\frac{T}{2}}e^{-\frac{1}{2}\sum_{t=1}^T(y_t-\mu)^2}
\]
Or equivalently, $p\left(\mu|\mathbf{Y}_T\right) \propto
e^{-\frac{\left(\mu-\mathbb{E}[\mu]\right)^2}{2\mathbb{V}[\mu]}}$, with
\[
\mathbb{V}[\mu] = \frac{1}{\left(\frac{1}{T}\right)^{-1} +
\sigma_{\mu}^{-2}}
\]
and
\[
\mathbb{E}[\mu] =
\frac{\left(\frac{1}{T}\right)^{-1}\widehat{\mu}_{ML,T} +
\sigma_{\mu}^{-2}\mu_0}{\left(\frac{1}{T}\right)^{-1} +
\sigma_{\mu}^{-2}}
\]
From which we can tell that the posterior mean is a convex combination of the
prior mean and the ML estimate. In particular, if $\sigma_{\mu}^2 \rightarrow \infty$ (no prior information, only estimation) then $\mathbb{E}[\mu] \rightarrow \widehat{\mu}_{ML,T}$, the maximum likelihood estimator. And if $\sigma_{\mu}^2 \rightarrow 0$ (calibration, no room for estimation) then $\mathbb{E}[\mu] \rightarrow \mu_0$, the prior mean.
\\
\hline
\end{tabular}\\
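The closed-form result in the box can be checked numerically. Below is a minimal Python sketch (illustrative code only; the function name \texttt{posterior\_mean\_var} is ours) that computes the posterior mean and variance as the precision-weighted combination of the prior mean and the ML estimate:

```python
import numpy as np

def posterior_mean_var(y, mu0, sigma2_mu):
    """Posterior mean and variance of mu for data y_t = mu + eps_t,
    eps_t ~ N(0, 1), under the prior mu ~ N(mu0, sigma2_mu)."""
    T = len(y)
    mu_ml = np.mean(y)            # ML estimate, with variance 1/T
    prec_data = float(T)          # (1/T)^{-1}
    prec_prior = 1.0 / sigma2_mu  # sigma_mu^{-2}
    post_var = 1.0 / (prec_data + prec_prior)
    post_mean = (prec_data * mu_ml + prec_prior * mu0) * post_var
    return post_mean, post_var
```

As the prior variance grows, the posterior mean converges to the sample mean (pure estimation); as it shrinks to zero, the posterior mean collapses to the prior mean (pure calibration).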
\subsection{DSGE models and Bayesian estimation}
Recall from chapter \ref{ch:soladv} that any DSGE model, which is really a collection of first order and equilibrium conditions, can be written in the form $\mathbb{E}_t\left\{f(y_{t+1},y_t,y_{t-1},u_t)\right\}=0$, whose solution takes the form $y_t = g(y_{t-1},u_t)$. In more appropriate terms for what follows, we can rewrite this solution system as
\begin{eqnarray*}
y^*_t &=& M\bar y(\theta)+M\hat y_t+N(\theta)x_t+\eta_t\\
\hat y_t &=& g_y(\theta)\hat y_{t-1}+g_u(\theta)u_t\\
E(\eta_t \eta_t') &=& V(\theta)\\
E(u_t u_t') &=& Q(\theta)
\end{eqnarray*}
where $\hat y_t$ are variables in deviations from steady state, $\bar y$ is the vector of steady state values and $\theta$ the vector of deep (or structural) parameters to be estimated. \\
Note that the second equation is the familiar decision rule or solution of the DSGE model. But the equation expresses a relationship among true endogenous variables that are not directly observed. Only $y^*_t$ is observable, and it is related to the true variables with an error $\eta_t$. Furthermore, it may have a deterministic (**?) trend, which is captured with $N(\theta)x_t$ to allow for the most general case in which it depends on the deep parameters. The first and second equations above therefore very naturally make up a system of measurement and transition or state equations, respectively.\\
The next logical step is to evaluate the likelihood corresponding to the above system. The first apparent problem, though, is that the equations are nonlinear in the deep parameters. Yet, they are linear in the endogenous and exogenous variables, so that the likelihood may be evaluated with a linear prediction error algorithm such as the Kalman filter. This is exactly what Dynare does. The box below is intended to remind you of what exactly the Kalman filter recursion does.\\
\begin{tabular}{|p{11cm}|}
\hline
\textbf{The Kalman filter recursion}\\
\\
For $t=1,\ldots,T$ and with initial values $y_1$ and $P_1$ given, the recursion follows
\begin{eqnarray*}
v_t &=& y^*_t - \bar y^* - M\hat y_t - Nx_t\\
F_t &=& M P_t M'+V\\
K_t &=& g_yP_tM'F_t^{-1}\\
\hat y_{t+1} &=& g_y \hat y_t+K_tv_t\\
P_{t+1} &=& g_y P_t (g_y-K_tM)'+g_uQg_u'
\end{eqnarray*}
For more details on the Kalman filter, see \citet{Hamilton1994}, chapter 13.
\\
\hline
\end{tabular}\\
\\
From the Kalman filter recursion, it is possible to derive the log-likelihood given by
\[
\ln \mathcal{L}\left(\boldsymbol\theta|\mathbf Y^*_T\right) = -\frac{Tk}{2}\ln(2\pi)-\frac{1}{2}\sum_{t=1}^T\ln|F_t|-\frac{1}{2}\sum_{t=1}^T v_t'F_t^{-1}v_t
\]
where the vector $\boldsymbol\theta$ contains the parameters we have to estimate: $\theta$, $V(\theta)$ and $Q(\theta)$ and where $Y^*_T$ expresses the set of observable endogenous variables $y^*_t$ found in the measurement equation. \\
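The recursion in the box and the log-likelihood above can be sketched in a few lines of Python. This is purely illustrative code, not Dynare's implementation: the steady state $\bar y^*$ and the trend term $Nx_t$ are set to zero for brevity, and the gain takes the standard conformable form $K_t = g_yP_tM'F_t^{-1}$.

```python
import numpy as np

def kalman_loglik(ystar, M, V, gy, guQgu, yhat0, P0):
    """Log-likelihood of the observables via the Kalman filter:
    measurement  y*_t = M yhat_t + eta_t,          E[eta eta'] = V
    transition   yhat_{t+1} = gy yhat_t + gu u_t,  guQgu = gu Q gu'
    (steady state and deterministic trend omitted for brevity)."""
    T, k = ystar.shape
    yhat, P = yhat0.copy(), P0.copy()
    loglik = -0.5 * T * k * np.log(2.0 * np.pi)
    for t in range(T):
        v = ystar[t] - M @ yhat                 # prediction error v_t
        F = M @ P @ M.T + V                     # its covariance F_t
        Finv = np.linalg.inv(F)
        K = gy @ P @ M.T @ Finv                 # Kalman gain K_t
        loglik -= 0.5 * (np.log(np.linalg.det(F)) + v @ Finv @ v)
        yhat = gy @ yhat + K @ v                # state update
        P = gy @ P @ (gy - K @ M).T + guQgu     # covariance update
    return loglik
```

For i.i.d. Gaussian observations (no state dynamics) this collapses to the usual sum of normal log densities, which provides a simple sanity check.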
The log-likelihood above gets us one step closer to our goal of finding the posterior distribution of our parameters. Indeed, the log posterior kernel can be expressed as
\[
\ln \mathcal{K}(\boldsymbol\theta|\mathbf Y^*_T) = \ln \mathcal{L}\left(\boldsymbol\theta|\mathbf Y^*_T\right) + \ln p(\boldsymbol\theta)
\]
where the first term on the right hand side is now known after carrying out the Kalman filter recursion. The second term, recall, is the prior density, which is also known. \\
Next, to find the mode of the posterior distribution - an important output of Dynare - we simply maximize the above log posterior kernel with respect to $\theta$. This is done in Dynare using numerical methods. Note that the likelihood function is Gaussian not in $\theta$ itself, but in functions of $\theta$ as they appear in the state equation. Thus, this maximization problem is not completely straightforward, but fortunately doable with modern computers. \\
Finally, we are now in a position to find the posterior distribution of our parameters. The distribution is given by the kernel equation above but, again, it is a nonlinear and complicated function of the deep parameters $\theta$, so that no closed-form expression for it is available. We resort, instead, to sampling methods, among which the Metropolis-Hastings algorithm has been retained in the literature as particularly efficient. This is indeed the method adopted by Dynare. The box below gives a very high level and intuitive explanation of what the Metropolis-Hastings algorithm does. \\
\begin{tabular}{|p{11cm}|}
\hline
\textbf{The intuition behind the Metropolis-Hastings algorithm}\\
\\
The Metropolis-Hastings algorithm is often introduced with the allegory of the traveling salesman, of which there is a demo in the Help of Matlab, version 7. The problem of the traveling salesman is that he needs to visit a multitude of cities in a given day, but spend the least time possible on the road. To pick the shortest route, he could simply plan on starting at a random city and then proceed to the city which entails the shortest distance from where he stands, and so on. The problem is that of all possible routes, he will quickly evolve to one representing a local minimum of travel time, but most likely not to the global minimum. To pick the global minimum, he would have to consider traveling to cities that are not necessarily closest together, while remaining open to the possibility that in the end his global travel time would be shorter. The possibility of accepting itineraries that in the immediate may not seem perfectly optimal, in the hope of finding a global optimum, is exactly what the Metropolis-Hastings algorithm is structured to do. \\
In terms more applied to estimating DSGE models with Dynare, we pick a candidate vector of deep parameters $\theta^*$ and compute the value of the posterior kernel with those parameters. We then pick another candidate vector and again compute the value of the posterior kernel. To compare the two, we should not be too quick to simply throw out the one giving the lower value of the posterior kernel, just in case the kernel were to peak at a value nearby. But we should also not ignore the impetus of continuing to search in the direction of the parameters that yield higher kernel values. How readily we discard the candidate giving the lower kernel value is governed by the acceptance rule - a key ingredient of the Metropolis-Hastings algorithm.\\
\\
\hline
\end{tabular}\\
\\
In more technical terms, the general idea of the Metropolis-Hastings algorithm is to simulate the posterior distribution. Remember that all we have is the posterior mode; we are more often interested in the mean and variance of the estimators of $\theta$. To this end, the algorithm builds on the fact that under general conditions the distribution of the deep parameters will be asymptotically normal. The algorithm, in the words of An and Schorfheide, ``constructs a Gaussian approximation around the posterior mode and uses a scaled version of the asymptotic covariance matrix as the covariance matrix for the proposal distribution. This allows for an efficient exploration of the posterior distribution at least in the neighborhood of the mode'' (\citet{AnSchorfheide2006}, p. 18). More precisely, the algorithm implements the following steps:
\begin{enumerate}
\item Choose a starting point $\boldsymbol\theta^\circ$, typically the posterior mode, and run a loop over steps 2-4.
\item Draw a \emph{proposal} $\boldsymbol\theta^*$ from a \emph{jumping} distribution
\[
J(\boldsymbol\theta^*|\boldsymbol\theta^{t-1}) =
\mathcal N(\boldsymbol\theta^{t-1},c\Sigma_{m})
\]
where $\Sigma_{m}$ is the inverse of the Hessian computed at the posterior mode.
\item Compute the acceptance ratio
\[
r = \frac{p(\boldsymbol\theta^*|\mathbf Y_T)}{p(\boldsymbol\theta^{t-1}|\mathbf
Y_T)} = \frac{\mathcal K(\boldsymbol\theta^*|\mathbf Y_T)}{\mathcal K(\boldsymbol\theta^{t-1}|\mathbf
Y_T)}
\]
\item Finally accept or discard the proposal $\boldsymbol\theta^*$ according to the following rule, and update, if necessary, the jumping distribution:
\[
\boldsymbol\theta^t = \left\{
\begin{array}{ll}
\boldsymbol\theta^* & \mbox{ with probability $\min(r,1)$}\\
\boldsymbol\theta^{t-1} & \mbox{ otherwise.}
\end{array}\right.
\]
\end{enumerate}
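The four steps above can be sketched as a short Python routine. This is illustrative code only (the name \texttt{metropolis\_hastings} and its interface are ours, not Dynare's); it works with the log posterior kernel, so the acceptance ratio $r$ is compared in logs.

```python
import numpy as np

def metropolis_hastings(log_kernel, theta0, c_sigma, n_draws, seed=0):
    """Random-walk Metropolis-Hastings on a log posterior kernel.
    theta0:  starting point (ideally the posterior mode)      - step 1
    c_sigma: proposal covariance c * Sigma_m, the scaled
             inverse Hessian at the mode                      - step 2
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(c_sigma)
    theta = np.asarray(theta0, dtype=float)
    lk = log_kernel(theta)
    draws, accepted = [], 0
    for _ in range(n_draws):
        prop = theta + chol @ rng.standard_normal(theta.size)   # step 2
        lk_prop = log_kernel(prop)
        # steps 3-4: accept with probability min(r, 1), r computed in logs
        if np.log(rng.uniform()) < lk_prop - lk:
            theta, lk = prop, lk_prop
            accepted += 1
        draws.append(theta.copy())
    return np.array(draws), accepted / n_draws
```

Running this on a simple kernel, say a standard normal, produces draws whose sample mean and variance approximate the posterior moments; the returned acceptance rate is the statistic one tunes via the scale factor $c$.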
While these steps are mathematically clear and give unequivocal instructions to a machine, several practical questions arise when carrying out Bayesian estimation. These include: How should we choose the scale factor $c$ (variance of the jumping
distribution)? What is a satisfactory acceptance rate? How many draws are ideal? How is convergence of the Metropolis-Hastings iterations assessed? These are all important questions that will come up in your usage of Dynare. They are addressed as clearly as possible in section \ref{sec:estimate} of Chapter \ref{ch:estbase}. \\
\subsection{Comparing models based on posterior distributions}
As mentioned earlier when discussing the advantages of Bayesian estimation, the posterior distribution offers a particularly natural method of comparing models. Let's look at an illustration. \\
Suppose we have a prior distribution over two competing models: $p(\mathcal{A})$ and $p(\mathcal{B})$. Using Bayes' rule, we can compute the posterior
distribution over models, where $\mathcal{I}=\mathcal{A},\mathcal{B}$
\[
p(\mathcal{I}|\mathbf Y_T) = \frac{p(\mathcal{I})p(\mathbf Y_T|\mathcal{I})}
{\sum_{\mathcal{I}=\mathcal{A},\mathcal{B}}p(\mathcal{I})p(\mathbf Y_T|\mathcal{I})}
\]
where this formula may easily be generalized to a collection of $N$ models.
Then, the comparison of the two models is done very naturally through the ratio of the posterior model distributions. We call this the \textbf{posterior odds ratio}:
\[
\frac{p(\mathcal{A}|\mathbf Y_T)}{p(\mathcal{B}|\mathbf
Y_T)} = \frac{p(\mathcal{A})}{p(\mathcal{B})}
\frac{p(\mathbf Y_T|\mathcal{A})}{p(\mathbf Y_T|\mathcal{B})}
\]
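In practice the marginal densities are carried around in logs, and posterior model probabilities are computed by subtracting the largest log marginal density before exponentiating, to avoid numerical under- or overflow. Below is a minimal illustrative Python helper (our own naming, not a Dynare function), written for the general case of several models:

```python
import math

def posterior_model_probs(priors, log_marginals):
    """Posterior probabilities p(I | Y_T) over competing models, from
    model priors p(I) and log marginal densities ln p(Y_T | I)."""
    m = max(log_marginals)  # stabilizer: largest log marginal density
    weights = [p * math.exp(lm - m) for p, lm in zip(priors, log_marginals)]
    total = sum(weights)
    return [w / total for w in weights]
```

With equal model priors, the posterior odds ratio reduces to the Bayes factor, i.e. the ratio of marginal densities.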
The only complication is finding the marginal density of the data conditional on the model, $p(\mathbf Y_T|\mathcal{I})$, which is also the denominator of the posterior density $p(\boldsymbol\theta | \mathbf Y_T)$ discussed earlier. This requires some detailed explanations of its own. \\
For each model $\mathcal{I}=\mathcal{A},\mathcal{B}$ we can evaluate, at least theoretically, the marginal
density of the data conditional on the model:
\[
p(\mathbf Y_T|\mathcal{I}) = \int_{\Theta_{\mathcal{I}}} p(\boldsymbol\theta_{\mathcal{I}}|\mathcal{I})\times p(\mathbf Y_T
|\boldsymbol\theta_{\mathcal{I}},\mathcal{I}) d\boldsymbol\theta_{\mathcal{I}}
\]
by integrating out the deep parameters $\boldsymbol\theta_{\mathcal{I}}$ from the posterior kernel. Indeed, note that the expression inside the integral sign is exactly the posterior kernel. To remind you of this, you may want to glance back at the first subsection above, specifying the basic mechanics of Bayesian estimation.\\
To obtain the marginal density of the data conditional on the model, there are two options. The first is to assume a functional form for the posterior density that we can integrate. The most straightforward choice, and the best approximation for large samples, is the Gaussian, yielding the Laplace approximation:
\[
\widehat{p}(\mathbf Y_T|\mathcal I) = (2\pi)^{\frac{k}{2}}|\Sigma_{\boldsymbol\theta^m_{\mathcal I}}|^{\frac{1}{2}}p(\mathbf Y_T |\boldsymbol\theta_{\mathcal I}^m,\mathcal I)p(\boldsymbol\theta_{\mathcal I}^m|\mathcal I)
\]
where $\boldsymbol\theta_{\mathcal I}^m$ is the posterior mode and $\Sigma_{\boldsymbol\theta^m_{\mathcal I}}$ is the inverse of the Hessian of minus the log posterior kernel evaluated at the mode. The advantage of this technique is its computational efficiency: time consuming Metropolis-Hastings iterations are not necessary; only the numerically computed posterior mode is required. \\
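This option can be sketched as follows in Python (illustrative code with our own naming; \texttt{log\_kernel} stands for the log posterior kernel, i.e. log likelihood plus log prior, and \texttt{sigma\_mode} for the inverse Hessian of minus the log kernel at the mode). In the Gaussian example of the earlier box the posterior is exactly Gaussian, so the Laplace approximation is exact there, which gives a convenient check:

```python
import numpy as np

def laplace_log_marginal(log_kernel, theta_mode, sigma_mode):
    """Laplace approximation to ln p(Y_T | I):
    (k/2) ln(2 pi) + (1/2) ln|Sigma| + ln K(theta_m | Y_T),
    with Sigma the inverse Hessian of -ln K at the mode theta_m."""
    k = len(theta_mode)
    _, logdet = np.linalg.slogdet(sigma_mode)
    return 0.5 * k * np.log(2.0 * np.pi) + 0.5 * logdet + log_kernel(theta_mode)
```

For a quadratic log kernel the integral underlying the marginal density is Gaussian, so the formula above reproduces it without error; for a general DSGE posterior it is an approximation that improves with sample size.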
The second option is instead to use information from the Metropolis-Hastings runs; the resulting estimator is typically referred to as the harmonic mean. The idea is to simulate the marginal density of interest and to simply take an average of these simulated values. To start, note that
\[
p(\mathbf{Y}_T|\mathcal I)=\mathbb{E}\left[\frac{f(\boldsymbol\theta_{\mathcal I})}{p(\boldsymbol\theta_{\mathcal I}|\mathcal I)
p(\mathbf{Y}_T|\boldsymbol\theta_{\mathcal I},\mathcal I)}\biggl|\mathbf{Y}_T,\mathcal I\biggr.\right]^{-1}
\]
where $f$ is a probability density function and the expectation is taken with respect to the posterior distribution of $\boldsymbol\theta_{\mathcal I}$, since
\[
\mathbb{E}\left[\frac{f(\boldsymbol\theta_{\mathcal I})}{p(\boldsymbol\theta_{\mathcal I}|\mathcal I)
p(\mathbf{Y}_T|\boldsymbol\theta_{\mathcal I},\mathcal I)}\biggl|\mathbf{Y}_T,\mathcal I\biggr.\right]
=
\frac{\int_{\Theta_{\mathcal I}}f(\boldsymbol\theta_{\mathcal I})d\boldsymbol\theta_{\mathcal I}}{\int_{\Theta_{\mathcal I}}p(\boldsymbol\theta_{\mathcal I}|\mathcal I)
p(\mathbf{Y}_T|\boldsymbol\theta_{\mathcal I},\mathcal I)d\boldsymbol\theta_{\mathcal I}}
\]
and the numerator integrates out to one (see the presentation of Stéphane Adjemian (**add link) for details on the intermediate steps). \\
This suggests the following estimator of the marginal
density
\[
\widehat{p}(\mathbf{Y}_T|\mathcal I)= \left[\frac{1}{B}\sum_{b=1}^B
\frac{f(\boldsymbol\theta_{\mathcal I}^{(b)})}{p(\boldsymbol\theta_{\mathcal I}^{(b)}|\mathcal I)
p(\mathbf{Y}_T|\boldsymbol\theta_{\mathcal I}^{(b)},\mathcal I)}\right]^{-1}
\]
where each drawn vector $\boldsymbol\theta_{\mathcal I}^{(b)}$ comes from the
Metropolis-Hastings iterations and where the probability density function $f$ can be viewed as a weight on the posterior kernel that downplays the importance of extreme values of $\boldsymbol\theta$. \citet{Geweke1999} suggests using a truncated Gaussian function, leading to what is typically referred to as the modified harmonic mean estimator. \\
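A Python sketch of this estimator, working in logs for numerical stability (illustrative code with our own naming; it takes the log kernel values $\ln\mathcal K(\boldsymbol\theta^{(b)}|\mathbf Y_T)$ and the log weighting-density values $\ln f(\boldsymbol\theta^{(b)})$ evaluated at the Metropolis-Hastings draws):

```python
import numpy as np

def harmonic_mean_log_marginal(log_kernel_vals, log_f_vals):
    """Harmonic mean estimate of ln p(Y_T | I): the log of the inverse
    of the average of f(theta)/K(theta | Y_T) over the MH draws."""
    log_ratios = np.asarray(log_f_vals) - np.asarray(log_kernel_vals)
    m = log_ratios.max()  # stabilizer for the log-sum-exp
    log_mean = m + np.log(np.mean(np.exp(log_ratios - m)))
    return -log_mean
```

A useful special case for checking the logic: if $f$ is chosen as the exact posterior density, each ratio $f/\mathcal K$ equals $1/p(\mathbf Y_T|\mathcal I)$ exactly, so the estimator recovers the marginal density without simulation noise.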
\section{Advanced topics and functions in Dynare}
\subsection{Comparing output to BVARs}
** TBD based on completed documentation from Michel and Stéphane.
\subsection{Comparing models based on their posterior distributions}
** TBD using Stéphane's examples and model\_compare command which still needs to be documented.
\subsection{Where is your output stored?}
** TBD with a table and info from the new version of Dynare.