### Abstract

In this paper, we consider pilot-aided channel estimation for orthogonal frequency division multiplexing (OFDM) systems with a multiple-input multiple-output setup. The channel is time varying due to Doppler effects and can be approximated by an oversampled complex exponential basis expansion model. We use a best linear unbiased estimator (BLUE) to estimate the channel with the aid of frequency-multiplexed pilots. The applicability of the BLUE, which is referred to as the channel identifiability in this paper, relies upon a proper pilot structure. Depending on whether the channel is estimated within a single OFDM symbol or multiple OFDM symbols, we propose simple pilot structures that guarantee channel identifiability. Further, it is shown that by employing more receive antennas, the BLUE can combat more effectively the Doppler-induced interference and therefore improve the channel estimation performance.

##### Keywords:

MIMO; OFDM; BLUE; time-varying channel; pilot-aided channel estimation; BEM

### 1 Introduction

Orthogonal frequency division multiplexing (OFDM) systems have attracted enormous attention recently and have been adopted in numerous existing communication systems. OFDM gains most of its popularity thanks to its ability to transmit signals on separate subcarriers without mutual interference. To further enhance the capacity of the transmission link, OFDM systems can be combined with multiple-input multiple-output (MIMO) features.

The fact that OFDM can transmit signals on separate subcarriers can be mathematically represented in the frequency domain by a diagonal channel matrix. This property holds only in a situation where the channel stays (almost) constant for at least one OFDM symbol interval. In practice, a time-invariant channel assumption can become invalid due to, e.g., Doppler effects resulting from the motion between the transmitter and receiver. In such a case, the frequency-domain channel matrix is not diagonal but generally full with the non-zero off-diagonal elements leading to inter-carrier interference (ICI).
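The loss of the diagonal structure can be checked numerically. The sketch below builds the time-domain channel matrix of one OFDM symbol (after CP removal) and transforms it with a normalized DFT matrix. The sizes, the random taps and the linear drift of the taps are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def freq_domain_channel(h):
    """h[p, l] is the l-th tap at time instant p (shape K x (L+1)). Returns W H W^H."""
    K, num_taps = h.shape
    H_time = np.zeros((K, K), dtype=complex)
    for p in range(K):
        for l in range(num_taps):
            # With a cyclic prefix of length >= L, the convolution becomes circular.
            H_time[p, (p - l) % K] = h[p, l]
    W = np.fft.fft(np.eye(K)) / np.sqrt(K)  # normalized (unitary) DFT matrix
    return W @ H_time @ W.conj().T

def off_diagonal_power_ratio(H):
    """Fraction of the channel-matrix power located off the main diagonal (ICI)."""
    total = np.sum(np.abs(H) ** 2)
    diag = np.sum(np.abs(np.diag(H)) ** 2)
    return (total - diag) / total

rng = np.random.default_rng(0)
K, L = 16, 3
taps = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)

h_static = np.tile(taps, (K, 1))                 # time-invariant channel
drift = 1 + 0.3 * np.arange(K)[:, None] / K      # hypothetical linear time variation
h_varying = h_static * drift                     # time-varying channel

ici_static = off_diagonal_power_ratio(freq_domain_channel(h_static))
ici_varying = off_diagonal_power_ratio(freq_domain_channel(h_varying))
print(ici_static, ici_varying)
```

For the time-invariant taps the off-diagonal power is zero up to rounding, whereas the drifting taps leave a visible fraction of the power off the main diagonal.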

To equalize such channels, knowledge of all the elements in the channel matrix is required. To reduce the number of unknown channel parameters, a widely adopted approach is to approximate the variation of the channel in the time domain with a parsimonious model, e.g., a basis expansion model (BEM). Consequently, channel estimation boils down to estimating the corresponding BEM coefficients. Among the various BEMs that have been proposed, this paper will concentrate on the so-called oversampled complex exponential BEM [(O)CE-BEM] [1]. By tuning the oversampling factor, the (O)CE-BEM is reported in [2] to fit time-varying channels much more tightly than its variant, the critically sampled complex exponential BEM [(C)CE-BEM] [3,4], and it has a steady modeling performance for a wide range of Doppler spreads [5].

Based on a general BEM assumption, the OFDM channel is estimated in [6] utilizing pilots that are multiplexed with data in the frequency domain. The same paper shows that the channel estimators that view the frequency-domain channel matrix as full, such as the (O)CE-BEM, render a better performance than those that view the channel matrix as diagonal [5], or strictly banded [4], such as the (C)CE-BEM. In this paper, the results of [6] will be extended from a single-input single-output (SISO) scenario to MIMO, with a focus on channel identifiability issues.

Estimating time-varying channels in a MIMO-OFDM system gives rise to a number of additional challenges. In the first place, due to multiple transmit-receive links, more channel unknowns need to be estimated, which requires more pilots and thus imposes a higher pressure on the bandwidth efficiency. To alleviate this problem, we will employ more pilot-carrying OFDM symbols to leverage the channel correlation along the time axis as in [7,8]. Although this comes at a penalty of a larger BEM modeling error, the overall channel estimation performance can still be improved.

Another challenge in a MIMO-OFDM system is how to distribute pilots in the time, frequency and spatial domains. Barhumi et al. [9] and Minn and Al-Dhahir [10] propose optimal pilot schemes but only for time-invariant channels or systems for which the time variation of the channel within one OFDM symbol can be neglected. Except for [7,11], much less attention has been paid to systems dealing with faster-varying channels. In this paper, we will use the channel identifiability criterion as a guideline to design pilot schemes. It is noteworthy that the proposed pilot structures can be independent of the oversampling factor of the (O)CE-BEM, which endows the receiver with the freedom to choose the most suitable oversampling factor.

Pilot structures can have a great impact on both channel identifiability and estimation performance. The latter is, however, difficult to tackle analytically for time-varying channels. In this paper, we will try to establish, by means of simulations, a guideline for designing pilots that render a satisfactory channel estimation performance for different channel situations.

The MIMO feature brings not only design challenges but also performance benefits. Due to the ICI, the contribution of the pilots is always mixed with the contribution of the unknown data in the received samples. By taking this interference explicitly into account in the channel estimator design, [6] shows that the resulting best linear unbiased estimator (BLUE) can cope with the interference reasonably well, producing a performance close to the Cramér-Rao bound (CRB). When multiple receive antennas are deployed, we observe that the channel estimation performance can be improved even further. This is attributed to the fact that each receive antenna gets a different copy of the same transmitted data. The interference is therefore correlated across the receive antennas, which can be exploited by the BLUE to suppress the interference more effectively than in the single receive antenna case. To the best of our knowledge, this effect has not been reported before.

The remainder of the paper is organized as follows. In Section 2, we present a general MIMO-OFDM system model. In Section 3, we describe how the BLUE can be used to estimate the BEM coefficients. Channel identifiability is discussed in Section 4, based on which we propose a variety of pilot structures. The simulation results are given in Section 5, where we discuss the impact of the various pilot structures on the performance. Conclusions are given in Section 6.

*Notation:* We use upper (lower) bold face letters to denote matrices (column vectors). (·)\*, (·)^{T} and (·)^{H} represent conjugate, transpose and complex conjugate transpose (Hermitian), respectively. [**x**]_{p} indicates the *p*th element of the vector **x**, and [**X**]_{p,q} indicates the (*p*, *q*)th entry of the matrix **X**.

**x** on the diagonal, and

**A**_{0}, ..., **A**_{N-1} on the diagonal. ⊗ and † represent the Kronecker product and the pseudo-inverse, respectively. **I**_{N} stands for the *N* × *N* identity matrix; **1**_{M×N} for the *M* × *N* all-one matrix, and **W**_{K} for a *K*-point normalized discrete Fourier transform (DFT) matrix. We use

**X**, whose row and column indices are collected in the sets

**X**, whose indices are collected in

### 2 System model

Let us consider a MIMO-OFDM system with *N*_{T} transmit antennas and *N*_{R} receive antennas, where the channel in the time domain is assumed to be a time-varying
causal finite impulse response (FIR) filter with a maximum order *L*. Using
*l*th lag at the *p*th time instant for the channel between the *m*th transmit antenna and *n*th receive antenna, we can assume that
*l < *0 or *l > L*. Note that this channel model can take the transmit/receiver filter, the propagation
environment and the possible synchronization errors among different transmission links
into account.

For the *j*th OFDM symbol that is transmitted via the *m*th transmit antenna, the data symbols **s**^{(m)}[*j*] are first modulated on *K *subcarriers by means of the inverse DFT (IDFT) matrix
*L*_{cp} ≥ *L *and finally sent over the channel. At the receiver, the received samples corresponding
to the CP are discarded, and the remaining samples are demodulated by means of the DFT matrix **W**_{K}. Mathematically, we can express the received samples during the *j*th OFDM symbol as

where **z**^{(n)}[*j*] represents the additive noise related to the *n*th receive antenna;
*m*th transmit antenna and *n*th receive antenna in the time domain, and
*L*_{cp} = *L *without loss of generality, we can express the entries of
*a*, *b*) standing for the remainder of *a *divided by *b*.

Obviously, if the channel stays constant within an OFDM symbol,

### 3 Channel estimation

For ease of analysis, we will differentiate between two cases throughout the paper. The first case is based on a single OFDM symbol, which means that the channel will be estimated for each OFDM symbol individually. The other case employs multiple OFDM symbols. Because these two cases are characterized by some unique properties, we treat them separately.

#### 3.1 Single OFDM symbol

#### 3.1.1 Data model and BEM based on a single OFDM symbol

Let us use a BEM to model the time variation of the channel within one OFDM symbol:
for the channel between the *m*th transmit antenna and the *n*th receive antenna, the *l*th lag during the *j*th OFDM symbol can be approximated as

where **u**_{q} denotes the *q*th basis function of a BEM and

where *κ *stands for the oversampling factor with

Assuming that the BEM inflicts a negligible modeling error, the *K*(*L*+1) channel taps within the *j*th OFDM symbol will be uniquely represented by the (*L *+ 1)(*Q *+ 1) BEM coefficients

where

where **V**_{L} denotes the matrix that consists of the first *L* + 1 columns of

Because we will only concentrate on a single OFDM symbol in this section, we drop
the index *j *for the sake of simplicity.
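The effect of the oversampling factor can be illustrated with a small least-squares fit. The sketch below assumes the commonly used complex exponential basis, in which the *p*th entry of the *q*th basis function is exp(j2π*p*(*q* - *Q*/2)/(*κK*)), so that *κ* = 1 gives the (C)CE-BEM and *κ* > 1 the (O)CE-BEM. This basis definition, the function names and the single-path test tap are assumptions made for illustration only.

```python
import numpy as np

def ce_bem_basis(K, Q, kappa):
    """K x (Q+1) matrix whose q-th column is exp(j*2*pi*p*(q - Q/2)/(kappa*K))."""
    p = np.arange(K)[:, None]
    q = np.arange(Q + 1)[None, :]
    return np.exp(2j * np.pi * p * (q - Q / 2) / (kappa * K))

def bem_fit_error(tap, Q, kappa):
    """Relative least-squares modeling error for one tap observed over K instants."""
    U = ce_bem_basis(len(tap), Q, kappa)
    coeffs, *_ = np.linalg.lstsq(U, tap, rcond=None)
    residual = tap - U @ coeffs
    return np.sum(np.abs(residual) ** 2) / np.sum(np.abs(tap) ** 2)

K, Q = 64, 2
p = np.arange(K)
# Hypothetical tap: one Doppler-shifted path at 4e-3 cycles per sample,
# which does not lie on the critically sampled frequency grid.
tap = np.exp(2j * np.pi * 4e-3 * p)

err_critical = bem_fit_error(tap, Q, kappa=1)      # (C)CE-BEM
err_oversampled = bem_fit_error(tap, Q, kappa=4)   # (O)CE-BEM
print(err_critical, err_oversampled)
```

With these numbers the oversampled basis places a frequency close to the Doppler shift of the path, so its residual is much smaller than that of the critically sampled basis.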

Let us now use **p**^{(m) }to denote the pilots sent by the *m*th transmit antenna, whose subcarrier positions are contained in the set
**d**^{(m) }to denote the data sent by the *m*th transmit antenna, whose subcarrier positions are contained in the set
*G *clusters, each of length
*g*th pilot cluster

It can be seen from the above that the number of observation samples in
*P *- *D *+ 2ℓ + 1, is controlled by the two parameters *D *and ℓ. To understand the physical meaning of *D*, we know that for a small Doppler spread, the ICI is mostly limited to the neighboring
subcarriers, which is equivalent to the assumption that the frequency-domain channel
matrix has most of its power located on the main diagonal, the *D*/2 sub- and *D*/2 super-diagonals for an appropriate value of *D*. In an ideal case where the channel matrix is strictly banded, we should choose

such that the resulting observation samples will depend exclusively on the pilots

**Figure 1.** **The partitioning of the frequency-domain channel matrix **

The above analysis is based on a single transmit antenna. For a MIMO scenario, every
receiver 'sees' a superposition of OFDM symbols from all the transmit antennas. This
implies that the *g*th observation cluster

As a result, we can use the input-output relationship given in (1) to express

where
**d**^{(m) }as well as the other pilot clusters.

We repeat the relationship in (9) for each cluster *g *= 0, ..., *G *- 1, and for each receive antenna *n *= 0, ..., *N*_{R} - 1, and stack the results in one vector

where **z **is similarly defined as

From (5), it can be shown that each diagonal block of **A **can be expressed as

with

The interference due to data is represented in (10) by **i**, which can be expressed as **i **= **Bd **with

A detailed derivation of (12)-(14) for the SISO case can be found in [6]. The extension to the MIMO case is rather straightforward.

#### 3.1.2 Best linear unbiased estimator based on a single OFDM symbol

From (10), **c **can be estimated by diverse channel estimators. Due to space restrictions, this paper
will not list all the possible channel estimators, but will only focus on the BLUE.

The BLUE is a compromise between the linear minimum mean-square error (LMMSE) estimator and the least-squares (LS) estimator: it treats **c** as a deterministic variable, thus avoiding a possible error in calculating the channel statistics, which are necessary for the LMMSE estimator; at the same time, it leverages the statistics of the data symbols and noise, which are easier to obtain, such that the interference and the noise can still be suppressed better than with the LS estimator. Simulation results in [6] show that the BLUE is able to yield a performance close to that of the LMMSE estimator, even if the latter is equipped with perfect knowledge of the channel statistics.

In a nutshell, the BLUE uses a linear filter **F **to produce an unbiased estimate
**c **is minimized:

Let us assume that the data sent from all the transmit antennas are zero-mean white
with variance
**i **and noise **z **in a single disturbance term, we can follow the steps given in [[12], Appendix 6B] to derive the BLUE as:

where **R**(**c**) denotes the covariance matrix of the disturbance with **c** taken as a deterministic variable. In accordance with the assumptions on the data and noise statistics and taking (14) into account, we can show that:

Clearly, (15) cannot be resolved in closed-form since the computation of **R**(**c**) entails the knowledge of **c **itself (contained in **B**). As a remedy, we apply a recursive approach. Suppose at the *k*th iteration, an estimate of **c **has been attained, which is denoted as
**R**(**c**), which in turn is used to produce the BLUE for the subsequent iteration and so on:

Note that a similar idea is adopted in [13] though in a different context. To initialize the iteration, we can set

The above expression is actually the maximum likelihood estimator [12] that is obtained by ignoring the interference **i**.

Using the symbol Γ^{[k] }to denote the normalized difference in energy between the estimates from the present
and previous iterations:

we can halt the iterative BLUE if Γ^{[k] }is smaller than a predefined value or the number of iterations *K *is higher than a predefined value.
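The recursion described above can be sketched on a toy model **y** = **Ac** + **B**(**c**)**d** + **z**. The sketch assumes that the BLUE filter takes the standard form **F** = (**A**^{H}**R**^{-1}**A**)^{-1}**A**^{H}**R**^{-1} with **R** = σ_d² **BB**^{H} + σ_z² **I**, that the iteration starts from the pseudo-inverse estimate, and that Γ^{[k]} is the squared norm of the change in the estimate divided by the squared norm of the previous estimate. The linear map `build_B`, the random matrices and all sizes are hypothetical and only serve to make the example runnable.

```python
import numpy as np

def blue_filter(A, R):
    """Return F = (A^H R^-1 A)^-1 A^H R^-1 for a Hermitian covariance matrix R."""
    Rinv_A = np.linalg.solve(R, A)                       # R^-1 A
    return np.linalg.solve(A.conj().T @ Rinv_A, Rinv_A.conj().T)

def iterative_blue(y, A, build_B, var_d, var_z, tol=1e-6, max_iter=30):
    c = np.linalg.pinv(A) @ y            # initial estimate that ignores the interference
    gammas = []
    for _ in range(max_iter):
        B = build_B(c)                   # interference matrix from the current estimate
        R = var_d * (B @ B.conj().T) + var_z * np.eye(len(y))
        F = blue_filter(A, R)
        c_new = F @ y
        # Assumed form of the normalized difference in energy between iterations.
        gammas.append(np.sum(np.abs(c_new - c) ** 2) / np.sum(np.abs(c) ** 2))
        c = c_new
        if gammas[-1] < tol:
            break
    return c, F, gammas

rng = np.random.default_rng(1)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

n_obs, n_coef, n_data = 24, 4, 8
A = crandn(n_obs, n_coef)
B_terms = 0.3 * crandn(n_coef, n_obs, n_data)   # hypothetical: B(c) = sum_i c_i * B_i

def build_B(c):
    return np.tensordot(c, B_terms, axes=1)

c_true = crandn(n_coef)
d = (rng.choice([-1, 1], n_data) + 1j * rng.choice([-1, 1], n_data)) / np.sqrt(2)  # QPSK
var_d, var_z = 1.0, 1e-3
y = A @ c_true + build_B(c_true) @ d + np.sqrt(var_z) * crandn(n_obs)

c_hat, F, gammas = iterative_blue(y, A, build_B, var_d, var_z)
print(len(gammas), np.linalg.norm(c_hat - c_true))
```

Whatever covariance matrix is plugged in, the filter satisfies **FA** = **I**, which is the unbiasedness constraint of the BLUE; the iteration only changes how the disturbance is weighted.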

In the previous section, we have mentioned that a different choice of ℓ in (9) will have an impact on the channel estimator. For the BLUE in the SISO scenario, it is shown in [6] that the best performance is attained when the whole OFDM symbol is employed for channel estimation.

#### 3.2 Multiple OFDM symbols

In the previous section, the channel is estimated for each block separately. To improve the performance, we will exploit more observation samples in this section. It is nonetheless noteworthy that in the context of time-varying channels, the channel coherence time is rather short, which means that we cannot utilize an infinite number of OFDM symbols to enhance the estimation precision.

Considering *J *consecutive OFDM symbols, out of which there are *V *OFDM symbols carrying pilots, we use the symbol

where *j*_{v} stands for the position of the *v*th pilot OFDM symbol. Further, the symbol

*v*th pilot OFDM symbol that is used by the *m*th transmit antenna. Similar extensions hold for

^{a}

**Comb-type **This scheme is adopted in [15-17], in which pilots occupy only a fraction of the subcarriers, but such pilots are carried
by each OFDM symbol. In other words, we have

**Figure 2.** **Overview of the pilot schemes studied**. The left subplot depicts the Comb-type I pilot structure; the middle subplot the
Comb-type II pilot structure, and the right subplot the Block-type pilot structure.
Each rectangle corresponds to one OFDM symbol interval and contains OFDM symbols from
each transmit antenna. Inside the rectangle, the zero pilots are represented by circles;
the non-zero pilots by crosses, and the data symbols by squares.

**Block-type** This scheme is considered in [18-20], in which the pilots occupy the entire OFDM symbol, and such pilot OFDM symbols are interleaved along the time axis with pure data OFDM symbols. Mathematically,

#### 3.2.1 Data model and BEM based on multiple OFDM symbols

The biggest difference between the multiple and single OFDM symbol case is that we
need here to use a larger BEM to approximate the time-varying channel that spans several
OFDM symbol intervals. More specifically, we need to model *J*(*K *+*L*) consecutive samples of the *l*th channel tap between the *m*th transmit antenna and the *n*th receive antenna, i.e.,

Here, **u**_{q} stands for the *q*th BEM function that spans *J*(*K* + *L*) time instants, and

Hence, for the *j*th OFDM symbol in particular, we obtain

where **u**_{q}[*j*] is a selection of rows *j*(*K* + *L*) + *L* through (*j* + 1)(*K* + *L*) - 1 from **u**_{q}. By defining the BEM in this way, the resulting channel matrix of the *j*th OFDM symbol in the frequency domain will admit a slightly different expression than in (5) defined for the single OFDM symbol case:

where
**u**_{q}[*j*], but with common BEM coefficients

For each pilot OFDM symbol, we will follow the same strategy for choosing the observation samples as in the single OFDM symbol case. By iterating the I/O relationship in (10) for each pilot OFDM symbol *j*_{v} = *j*_{0}, ..., *j*_{V-1}, and stacking the results in one vector, we obtain

which can also be concisely expressed as

where **A**[*j*_{v}] is defined as in (12) with the OFDM symbol index added, and

where **B**[*j*_{v}] and **d**[*j*_{v}] are defined as in (14) with the OFDM symbol index added.

#### 3.2.2 Best linear unbiased estimator based on multiple OFDM symbols

We notice that (26) admits an expression analogous to (10). Hence, it is not difficult
to understand that a similar iterative BLUE can be applied for channel estimation
based on multiple pilot OFDM symbols. The BLUE at the (*k *+ 1)st iteration can thus be expressed as

where

where **R**[*j*_{v}] is defined as in (16) with the OFDM symbol index added.

The above derivations can be directly applied for the comb-type pilots. For the Block-type
pilots which occupy the entire OFDM symbol, the corresponding channel estimators are
not subject to data interference, i.e.,

which can be attained in just one shot.

### 4 Channel identifiability

In this paper, we define channel identifiability in terms of the uniqueness of the
BLUE. From (17) and (28), we understand that the BLUE is unique when **A **or
**R **or

Normally speaking, the non-singularity of **R **or
**A **or

The basic pilot structure adopted in this paper can be summarized as follows:

**Pilot Design Criterion 1**. *We group the pilots from one transmit antenna into G (cyclically) equi*-*distant clusters, where each cluster contains only one non-zero pilot. The entire
set of pilots sent by the mth transmit antenna during the vth pilot OFDM symbol can
therefore be expressed in a Kronecker form as*

*where *
*contains all the non-zero pilots sent by the mth transmit antenna during the vth pilot
OFDM symbol, and *Δ^{(m)}[*j*_{v}] *gives the position of the non-zero pilot within the cluster*.
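The Kronecker form of Pilot Design Criterion 1 can be sketched as follows, using the values of Study Case 1 (*K* = 64, *G* = 4, *P* + 1 = 8, non-zero pilot at position 3(*m* + 1) - 1). The function names, the 0-based position index, the unit pilot values and the choice of starting the first cluster at subcarrier 0 are assumptions made for illustration.

```python
import numpy as np

def cluster_pilots(nonzero_pilots, cluster_len, position):
    """Kronecker construction: one non-zero pilot at `position` inside each cluster."""
    unit = np.zeros(cluster_len)
    unit[position] = 1.0
    return np.kron(nonzero_pilots, unit)

def cluster_subcarriers(K, G, cluster_len):
    """Subcarrier indices of G cyclically equi-distant clusters (first cluster at 0)."""
    starts = np.arange(G) * (K // G)
    return (starts[:, None] + np.arange(cluster_len)[None, :]) % K

K, G, cluster_len = 64, 4, 8
subcarriers = cluster_subcarriers(K, G, cluster_len)   # shape (G, cluster_len)

pilots_by_antenna = {}
for m in range(2):                                     # two transmit antennas
    position = 3 * (m + 1) - 1                         # assumed 0-based index in the cluster
    pilots_by_antenna[m] = cluster_pilots(np.ones(G), cluster_len, position)

for m, pilots in pilots_by_antenna.items():
    print(m, np.flatnonzero(pilots))
```

With a 0-based index, the two antennas place their non-zero pilots at positions 2 and 5 of each eight-element cluster, so the non-zero pilots of different antennas never share a subcarrier.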

Further, the following assumption is adopted throughout the remainder of the paper.

**Assumption 1**. *All the subcarriers of the pilot OFDM symbol will be used for channel estimation,
i.e.*,

This assumption is shown in [6] to maximize the performance of the BLUE. In addition, it will greatly simplify the derivation of the channel identifiability conditions.

As in the previous sections, in order to derive the channel identifiability conditions,
we find it instrumental to first explore the rank condition on **A **for the single OFDM symbol case and then extend the results to multiple pilot OFDM
symbols.

#### 4.1 Single OFDM symbol

The full column-rank condition of **A **is related to the full column-rank condition of **A**^{(n) }defined in (10) for an arbitrary receive antenna *n*. Hence, we need to examine whether

Following Pilot Design Criterion 1, [7] shows conditions to ensure that the columns of **A**^{(n) }are orthonormal under a (C)CE-BEM assumption. However, these conditions are not suitable
for an (O)CE-BEM assumption as adopted in this paper, and we need to impose more restrictions,
especially on the pilot design across the transmit antennas. They are summarized in
the following theorem (see Appendix A for a proof).

**Theorem 1**. *With the pilots following Pilot Design Criterion 1, the channel will be identifiable
under an (O)CE-BEM assumption and Assumption 1 if*

*and*

*where μ*^{(m) }*denotes the position of the first non-zero pilot sent by the mth transmit antenna*.

The following remarks are in order at this stage.

**Remark 1**. *For the 'optimal' pilot structure proposed in *[7]*, each OFDM symbol contains G *= *L *+ 1 *pilot clusters, with each pilot cluster satisfying (up to a scale)*

*Such a pilot structure complies with *(34) *and *(35) *with a (C)CE-BEM assumption, i.e.*,

*We observe in *(36) *that the FDKD pilot structure contains a certain number of zeros, which are not specified
in Theorem 1. These zeros are beneficial to combat the ICI, but not necessary for
the rank condition. Later on, we will show that the total number of zeros within the
pilot cluster plays a more significant role at high SNR where the ICI becomes more
pronounced*.

**Remark 2**. *Viewing a time-invariant channel as a special case of a time-varying channel with
a trivial Q *= 0*, we can establish the relationship between the conditions given in *(34) *and *(35)*, and the conditions given for time-invariant channels. For instance, the pilot structure
given in *[9]* requires the number of non-zero pilots per transmit antenna to be no fewer than L
*+ 1*. Further, the non-zero pilots from different transmit antennas must occupy different
subcarriers, i.e., μ*^{(m')} - *μ*^{(m) }> 0 *for m'* ≠ *m*.

#### 4.2 Multiple OFDM symbols

In many practical situations, the conditions of Theorem 1 can be hard to satisfy. For instance, if the Doppler spread and/or the delay spread of the channel are large, the lower and upper bounds in (34) will approach each other, making it harder to find a suitable *G*. Fortunately, these constraints can be loosened by employing multiple pilot OFDM symbols.

One important issue of channel estimation based on multiple pilot OFDM symbols is how to distribute the pilots along the time axis. Prior to proceeding, let us introduce two possible schemes.

**Pilot Design Criterion 2**. *The positions of the equi*-*distant pilots sent by the same transmit antenna are disparate for each OFDM symbol,
i.e.*,

Adopting the above design criterion leads to the following theorem.

**Theorem 2**. *With the pilots following Pilot Design Criterion 1 and Pilot Design Criterion 2, then
for the nth receive antenna, the corresponding *
*will have a full column-rank under an (O)CE-BEM assumption and Assumption 1 if*

*and*

The proof is given in Appendix B.

**Remark 3**. *We observe here again that the right inequality in *(38) *is identical to the channel identifiability condition in *[9]* for the time-invariant MIMO channel based on multiple OFDM symbols*.

**Remark 4**. *For realistic system parameters, *
* holds in most cases. From *(39)*, it is hence sufficient if μ*^{(m')} ≠ *μ*^{(m)}*for m'*≠ *m: this implies that the transmitter can be transparent to the oversampling factor
used by the receiver*.

An alternative way of designing the pilots is given by the following construction.

**Pilot Design Criterion 3**. *The values and positions of the equi*-*distant pilots sent by the same transmit antenna are identical for each OFDM symbol,
which implies that*

Adopting the above design criterion leads to the following theorem.

**Theorem 3**. *With the pilots following Pilot Design Criterion 1 and Pilot Design Criterion 3, then
for the nth receive antenna, the corresponding *
*will have a full column-rank under an (O)CE*-*BEM assumption and Assumption 1 if*

*and*

The proof is given in Appendix C.

**Remark 5**. *Theorem 3 enables the transmitter to be completely transparent to the choice of the
oversampling factor at the receiver*.

If there is only one transmit antenna, the conditions given in Theorem 3 can be relaxed as stated in the following corollary.

**Corollary 1**. *With the pilots following Pilot Design Criterion 1 and Pilot Design Criterion 3, if
there is only one transmit antenna, the matrix *
*will have full column-rank under an (O)CE-BEM assumption and Assumption 1 if*

The proof is given in the last part of Appendix C. This property has been explored in [21] where a SISO scenario is considered.

### 5 Simulations and discussions

For the simulations, we generate time-varying channels according to Jakes' Doppler profile [22] using the channel generator given in [23]. The channel taps are assumed to be mutually uncorrelated with a variance of
*υ*_{D} = *f*_{c}*v*/*c*, where *f*_{c} is the carrier frequency; *v *is the speed of the vehicle parallel to the direction between the transmitter and
the receiver, and *c *is the speed of light.

We consider an OFDM system with 64 subcarriers. The pilots and data symbols are multiplexed in the frequency domain by occupying different subcarriers. The data symbols are modulated by quadrature phase-shift keying (QPSK). Further, we set the average power of the pilots to be equal to the average power of the data symbols.

To quantify the channel estimation performance, we use the normalized mean-square error (NMSE), which is defined as

Note that in the above criterion, the true channel

For all the numerical examples below, we adopt the stop criterion that halts the iterative
BLUE if either Γ^{[k]}, which is defined in (19) as the normalized difference in energy between the previous
and current estimates, is smaller than 10^{-6 }or the number of iterations *K *is higher than 30.

#### Study Case 1: Single OFDM Symbol

The pilots used in this study case are grouped in *G* = 4 clusters, each containing seven zero pilots and one non-zero pilot, i.e., *P* + 1 = 8. The non-zero pilot is located within the pilot cluster at the [3(*m* + 1) - 1]th position, where *m* corresponds to the transmit antenna index. Because we will use an (O)CE-BEM with *Q* = 2 and *κ* = 4 to fit a slower time-varying channel (*υ*_{D} = 8 × 10^{-4}) and a faster time-varying channel (*υ*_{D} = 4 × 10^{-3}), this pilot structure conforms to the 'optimal' pilot structure in (36) as well as to Theorem 1 for a channel of length *L* = 3, which is assumed for this study case. The performance of the BLUE is given in Figure 3. We observe that the performance degrades when the number of transmit antennas is increased from one to two. More interestingly, this performance degradation can be alleviated by using more receive antennas, especially for the faster channels (the right plot). We will discuss this effect in more detail later on.

**Figure 3.** **Channel estimation performance based on a single OFDM symbol for a short channel *L* = 3**. Left plot: *υ*_{D} = 8 × 10^{-4}; right plot: *υ*_{D} = 4 × 10^{-3}.

In the subsequent study cases, we will focus on pilots carried by multiple OFDM symbols. We compare three different pilot structures as summarized in Table 1, where we use *V*_{a} to denote the number of pilot OFDM symbols that satisfy Pilot Design Criterion 2, and *V*_{b} to denote the number of pilot OFDM symbols that satisfy Pilot Design Criterion 3. The positions of the zero and non-zero pilots and data symbols of the three pilot structures are schematically given in Figure 2. Note also that the 'optimal' pilot structure in (36) is carried by all the OFDM symbols in Comb-type I.

**Table 1.** Pilot structure

#### Study Case 2: Short Channels

In this study case, we again examine channels with *υ*_{D} = 8 × 10^{-4} and *υ*_{D} = 4 × 10^{-3}. To fit the time variation of the channel for *J* = 6 consecutive OFDM symbols, we use at the receiver an (O)CE-BEM with *Q* = 2 and *κ* = 3 if *υ*_{D} = 8 × 10^{-4}, and with *Q* = 4 and *κ* = 1.5 if *υ*_{D} = 4 × 10^{-3}. Further, we focus on a channel with length *L* = 3 and compare the performance of the pilot structures listed in Table 1. The results are given in Figure 4, where we observe that Comb-type I renders a much better performance than the other two, especially when the channel varies faster (the right plot). This can be attributed to the zeros in the pilot cluster, which protect the non-zero pilots from the interference much more effectively.

**Figure 4.** **Channel estimation performance based on multiple OFDM symbols for a short channel *L* = 3**. Left plot: *υ*_{D} = 8 × 10^{-4}; right plot: *υ*_{D} = 4 × 10^{-3}.

Again, we observe that the channel estimation performance degrades with more transmit antennas, but improves with more receive antennas, especially at high SNR. In contrast, this does not happen with the Block-type scheme. We understand that the interference induced by the Doppler spread to the channel estimator becomes the dominant nuisance at high SNR. At the same time, this interference is a function of the transmitted data and hence strongly correlated among different receive antennas. The BLUE is able to exploit this correlation to combat the interference better. The following heuristic analysis provides better insight into this effect.

It can be shown that the variance of the BLUE equals the trace of
**R**[*j*_{v}] as its *v*th diagonal block, we focus further on **R**[*j*_{v}]. From its definition in (16), and by applying the matrix inversion lemma in [24], its inverse can be written as

where the last is attained at high SNR when
**B**[*j*_{v}] in (45) is associated with the interference. We observe that the *N*_{R}*K* × *N*_{R}*K* matrix **R**^{-1}[*j*_{v}] lies in the noise subspace of **B**[*j*_{v}], i.e., **R**^{-1}[*j*_{v}]**B**[*j*_{v}] = **0**. Suppose the *N*_{R}*K* × *N*_{T}(*K* - *G*(*P* + 1)) matrix **B**[*j*_{v}] has full column-rank *N*_{T}(*K* - *G*(*P* + 1)). We then have

The above suggests that the rank of

Note that the rank of
*Q *can enhance the BEM modeling performance at the penalty that more channel unknowns
need to be estimated. An alternative is not to estimate the channel of all the OFDM
symbols, but only the middle part, e.g., the 3rd and 4th symbols. This means that
the channel estimator will work like an overlapping sliding window, an approach that
is adopted in [25].
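The high-SNR behavior of **R**^{-1}[*j*_{v}] used in the heuristic analysis above can be checked numerically on a toy example. The sketch assumes a covariance of the form **R** = σ_d² **BB**^{H} + σ_z² **I** and a random tall matrix **B** with full column rank; the sizes and variances are illustrative and unrelated to the simulation setup of this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rows, n_cols = 12, 4            # stand-ins for N_R*K and N_T*(K - G*(P + 1))
B = rng.standard_normal((n_rows, n_cols)) + 1j * rng.standard_normal((n_rows, n_cols))
var_d, var_z = 1.0, 1e-6          # small noise variance, i.e., high SNR

R = var_d * (B @ B.conj().T) + var_z * np.eye(n_rows)
R_inv = np.linalg.inv(R)

# Matrix inversion lemma: R^-1 = (I - B (B^H B + (var_z/var_d) I)^-1 B^H) / var_z
inner = np.linalg.inv(B.conj().T @ B + (var_z / var_d) * np.eye(n_cols))
R_inv_lemma = (np.eye(n_rows) - B @ inner @ B.conj().T) / var_z

# Projector onto the orthogonal complement of the column space of B
P_perp = np.eye(n_rows) - B @ np.linalg.pinv(B)

lemma_gap = np.linalg.norm(R_inv - R_inv_lemma) / np.linalg.norm(R_inv)
projector_gap = np.linalg.norm(var_z * R_inv - P_perp)
leak = np.linalg.norm(var_z * R_inv @ B)          # how much of B survives the weighting
rank_limit = np.linalg.matrix_rank(P_perp, tol=1e-8)
print(lemma_gap, projector_gap, leak, rank_limit)
```

In this toy example the scaled inverse σ_z² **R**^{-1} approaches a projector of rank `n_rows - n_cols` that annihilates **B**. Adding receive antennas increases the number of rows while the number of interfering data symbols stays fixed, which leaves more dimensions free of interference.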

#### Study Case 3: Long Channels

We now examine a much longer channel with length *L* = 15, for which the results are given in Figure 5. Note that in this figure, we do not list the performance of Comb-type I because it failed in the simulation. We will explore the reason later on. Figure 5 shows that Comb-type II performs in general better than the Block-type, especially when the channel varies faster. Note that the channels where the data are located are not estimated directly in the Block-type scheme, but actually result from an implicit interpolation of the channels estimated at the pilot OFDM symbols. The resulting interpolation error gives rise to a performance penalty.

**Figure 5.** **Channel estimation performance based on multiple OFDM symbols for a long channel *L* = 15**. Left plot: *υ*_{D} = 8 × 10^{-4}; right plot: *υ*_{D} = 4 × 10^{-3}.

The channel equalization performance based on the estimated channels is given in Figure
6, where the bit error rate (BER) is used as the performance measure. The results in
Figure 6 follow similar trends as shown in Figure 5 except for the MISO case with *N*_{T} = 2 and *N*_{R} = 1. In this case, the equalizer fails because there are more unknowns than observation
samples.

**Figure 6.** **Channel equalization performance based on multiple OFDM symbols for a long channel *L* = 15**. Left plot: *υ*_{D} = 8 × 10^{-4}; right plot: *υ*_{D} = 4 × 10^{-3}.

#### Study Case 4: Why Comb-type I Fails for Long Channels

For channels with a long delay spread, it is not possible for Comb-type I to satisfy
Theorem 1. Although by using multiple symbols, Theorem 2 can still be met, the condition
number of
*L *+ 1 supersedes the number of pilot clusters *G*. Here, we define the condition number of a non-square matrix

where
*n*th singular value of
*N*_{T} = 1 and *N*_{R} = 1, where one can observe that the condition number of

**Figure 7.** **Condition number versus channel length for different pilot structures**.

The condition number of

**Figure 8.** **Channel estimation performance versus channel length for different pilot structures at SNR = 40 dB**.
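The link between the channel length and the number of pilot tones can be illustrated with a deliberately simplified sketch. It assumes a single OFDM symbol, a time-invariant channel, one transmit antenna, and a generic comb of *G* equispaced pilot tones, and it takes the condition number to be the ratio of the largest to the smallest singular value (treated as infinite when the matrix cannot have full column rank). The helper names `pilot_dft_matrix` and `column_condition_number` are hypothetical, and the sketch does not reproduce the multi-symbol matrices behind Figures 7 and 8.

```python
import numpy as np


def pilot_dft_matrix(K, G, L):
    """Rows of the K-point DFT matrix at G equispaced pilot tones,
    restricted to the first L + 1 columns (one column per channel tap).
    Pilot positions 0, K/G, 2K/G, ... are an illustrative assumption."""
    pilots = np.arange(G) * (K // G)
    taps = np.arange(L + 1)
    return np.exp(-2j * np.pi * np.outer(pilots, taps) / K)


def column_condition_number(A):
    """Largest singular value divided by the smallest of the n_cols
    singular values; inf if A cannot have full column rank."""
    n_rows, n_cols = A.shape
    if n_rows < n_cols:
        return np.inf  # fewer observations than unknowns
    s = np.linalg.svd(A, compute_uv=False)
    return np.inf if s[-1] == 0 else s[0] / s[-1]


K, G = 64, 8
conds = {L: column_condition_number(pilot_dft_matrix(K, G, L)) for L in (3, 7, 15)}
for L, c in conds.items():
    print(f"L = {L:2d}: condition number = {c}")
```

In this toy setting the columns are mutually orthogonal as long as *L* + 1 does not exceed *G*, so the matrix is perfectly conditioned; once *L* + 1 exceeds *G*, the pilot tones can no longer separate all the taps.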

#### Study Case 5: Convergence Performance

As mentioned at the beginning of this section, we have adopted a stopping criterion
that halts the BLUE if either Γ^{[k]} < 10^{-6} or *k* ≥ 31. The actual number of iterations depends on several factors, such as the
channel, the SNR, and the number of transmit/receive antennas. As an example, we show
in this case the convergence performance for the Comb-type II pilots over the channel
with *L* = 15 and *ν*_{D} = 4e^{-3}. Figure 9 shows the average number of iterations versus SNR required for different MIMO setups,
where the MISO case *N*_{T} = 2 and *N*_{R} = 1 requires the most iterations especially at high SNR. In this case,
**Study Case 4 **that when the SNR increases, the condition number of
^{[k] }during each iteration. With the adopted stopping criterion, we can conclude from this
figure that the BLUE halts after around six iterations in most cases.

**Figure 9.** **Average number of iterations versus SNR**.

**Figure 10.** **Average normalized difference in energy over subsequent estimates**. Left plot *N*_{T} = 1 and *N*_{R} = 1; middle plot *N*_{T} = 2 and *N*_{R} = 1; right plot *N*_{T} = 2 and *N*_{R} = 4.
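The stopping rule can be sketched as a generic iteration loop. The sketch below is illustrative only: it assumes that Γ^{[k]} is the energy of the difference between two successive estimates normalized by the energy of the newer one, and it replaces the actual BLUE update with a toy contraction (`toy_update`), since the real update depends on the pilot structure and the interference covariance. The function name `run_until_converged` is hypothetical.

```python
import numpy as np


def run_until_converged(update, h0, tol=1e-6, max_iter=31):
    """Iterate h <- update(h) until the normalized energy difference
    between successive estimates drops below `tol`, or until
    `max_iter` iterations have been run.

    Returns (estimate, iterations used, last normalized difference).
    The definition of the normalized difference is an assumption.
    """
    h = np.asarray(h0, dtype=complex)
    gamma = np.inf
    k = 0
    while k < max_iter:
        h_new = update(h)
        k += 1
        energy = np.linalg.norm(h_new) ** 2
        gamma = np.linalg.norm(h_new - h) ** 2 / energy if energy > 0 else 0.0
        h = h_new
        if gamma < tol:
            break
    return h, k, gamma


# Toy stand-in for one estimator iteration: move halfway toward a fixed target.
target = np.array([1.0 + 1.0j, -0.5j, 0.25])


def toy_update(h):
    return h + 0.5 * (target - h)


h_hat, n_iter, gamma = run_until_converged(toy_update, np.zeros(3))
print(n_iter, gamma)
```

Capping the iteration count guards against cases in which the normalized difference decays slowly, for instance when the underlying system is poorly conditioned.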

### 6 Conclusions

In this paper, we have discussed how to design pilots to estimate time-varying channels in a MIMO-OFDM system. We underline that the proposed pilot design criteria can be made (almost) independent of the oversampling factor of the (O)CE-BEM such that each receiver can independently choose the best (O)CE-BEM.

We have compared the performance of three different pilot structures, all of which conform to the proposed design criteria. By means of simulations, we have shown that

• Each pilot OFDM symbol should contain as few pilot clusters as possible, provided that their number exceeds the channel order.

• Comb-type pilots can estimate the time-varying channel better than the Block-type pilots because they suffer from a smaller interpolation error.

• For comb-type pilots, it is possible to improve the channel estimation performance by employing more receive antennas, which combats the interference more effectively.

### Appendices

#### A Proof of Theorem 1

Because each pilot cluster now contains only one non-zero pilot, we can express the
positions of the equi-spaced non-zero pilots sent by transmit antenna *m *as

with *X *= *K*/*G*. Since the zero pilots have no contribution, we can rewrite **A**^{(n)}, defined in (12), in the following form

Compared to (13), we keep here only the rows/columns that correspond to the positions
of the non-zero pilots, which are represented by

The following two lemmas determine the rank of

**Lemma 1**. *If K*/[*N*_{T}(*Q* + 1)] ≥ *G, and*

*, the matrix*

*has full column-rank N*_{T}*G*(*Q* + 1).

*Proof*. Let us first examine the *m*th submatrix of

Given the property that

with

The above matrix is obviously a stack of *X *× (*Q*+1) submatrices, each being diagonal of size *G*. To be more specific, the (*x*, *q*)th submatrix

In the above, we have down-sampled the BEM sequence **u**_{q} into length-*G* subsequences, with the *x*th subsequence being

*x* = 0, ..., *X* - 1.

To gain better insight into its rank, we apply a row permutation and a
column permutation on

where **Π*** _{G}*and

*G*interleave matrices with appropriate dimensions;

^{b }and

With **u**_{q} defined as the *q*th basis of the (O)CE-BEM given in (3), we can rewrite

With
*m *= 0, ..., *N*_{T} - 1. It is not difficult to realize the rank of
*G*. It is tall if *X *= *K*/*G *≥ *N*_{T}(*Q *+ 1). Besides, it contains distinctive columns of a larger *κX*(*K *+ *L*)-point DFT matrix if *μ*^{(m+1)}*κ*(*K *+ *L*) > *μ*^{(m)}*κ*(*K *+ *L*) + *KQ*, which is hence of full column-rank. □
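The last step of the proof relies on a standard fact: a tall matrix whose columns are distinct columns of a larger DFT matrix is a Vandermonde matrix with distinct nodes on the unit circle, and hence has full column rank. The following sketch checks this numerically on a small example; the sizes and column indices are arbitrary illustrative choices and are not derived from the pilot parameters of the paper, and `partial_dft` is a hypothetical helper.

```python
import numpy as np


def partial_dft(n_rows, N, columns):
    """First n_rows rows of the N-point DFT matrix, restricted to `columns`."""
    rows = np.arange(n_rows)
    return np.exp(-2j * np.pi * np.outer(rows, np.asarray(columns)) / N)


N, n_rows = 128, 8                      # tall: 8 rows, 6 columns
distinct = [0, 3, 10, 11, 40, 77]       # six distinct DFT columns
repeated = [0, 3, 10, 10, 40, 77]       # one column selected twice

rank_distinct = np.linalg.matrix_rank(partial_dft(n_rows, N, distinct))
rank_repeated = np.linalg.matrix_rank(partial_dft(n_rows, N, repeated))
print(rank_distinct, rank_repeated)
```

The condition on *μ*^{(m)} in the lemma plays the role of keeping the selected columns distinct across transmit antennas; when two selections coincide, as in `repeated`, the rank drops.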

**Lemma 2**. *If G* ≥ (*L* + 1), *the matrix*
*has full column-rank N*_{T}(*L* + 1)(*Q* + 1).

*Proof*. Expressing
*m*th submatrix

is determined by the rank of
**W**_{K}, and is thus of full column-rank *L* + 1 if *G* ≥ *L* + 1.

In this case, the matrix
*N*_{T}(*L *+ 1). □

For the matrix product

Combining Lemma 1 and Lemma 2 concludes the proof.

#### B Proof of Theorem 2

Similar to (48), we can express

where
*j*_{v} added.

We first prove the full column-rank condition of

with *K'* := *κV*(*K *+ *L*). Like in Lemma 1, the rank of
*G*. It is tall if *X *= *K */ *G *≥ *N*_{T}(*Q *+ 1). Besides, if *μ*^{(m+1)}*K'* > *μ*^{(m)}*K'* + *KG*, this matrix contains distinctive columns of a larger *XK'*-point DFT matrix, and is in that case of full column-rank.

To check the rank of

where

Because