Journal of Statistical Planning and Inference 136 (2006) 1281–1301
www.elsevier.com/locate/jspi
The contribution of the maximum to the sum of excesses for testing max-domains of attraction
Cláudia Neves$^{a,1}$, Jan Picek$^{b,2}$, M.I. Fraga Alves$^{c,*,1}$
$^{a}$ UIMA, Department of Mathematics, University of Aveiro, Portugal
$^{b}$ Department of Applied Mathematics, Technical University of Liberec, Czech Republic
$^{c}$ CEAUL, DEIO, Faculty of Sciences, University of Lisbon, Portugal
Received 1 July 2003; accepted 16 September 2004
Available online 26 November 2004
Abstract
We consider an i.i.d. sample, from an underlying distribution function with unknown shape, location and scale parameters, belonging to some max-domain of attraction. We study the performance of a test statistic which is merely a ratio between the maximum and the mean of the sample of the excesses above some random threshold. This scale/location invariant ratio turns out to be very useful in the construction of an asymptotically size $\alpha$ test for the null hypothesis that the distribution comes from the Gumbel domain of attraction. The test is based on the $k_n$ largest observations, where $k_n$ is any intermediate sequence of positive integers. Both power of the test and type I error probability are studied for finite sample sizes by simulation.
© 2004 Elsevier B.V. All rights reserved.
MSC: 62G10; 62G20; 62G32
Keywords: Generalized extreme value and generalized Pareto distributions; Consistency of a test; Semi-parametric approach; Regular variation; Simulation
∗ Corresponding author. Tel.: +351217500414; fax: +351217500081.
E-mail addresses: claudia@mat.ua.pt (C. Neves), jan.picek@vslib.cz (J. Picek), isabel.alves@fc.ul.pt (M.I. Fraga Alves).
1 Research partially supported by FCT/POCTI/FEDER.
2 Czech Republic Grant KJB3042303.
0378-3758/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2004.09.008
C. Neves et al. / Journal of Statistical Planning and Inference 136 (2006) 1281–1301
1. Introduction
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables (r.v.'s), with the same unknown distribution function (d.f.) $F$, and let $X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n}$ be the associated order statistics (o.s.) after arranging the random sample in nondecreasing order. Due to their nature, semi-parametric models are never specified in detail by hand. Instead, the only assumption made is that $F$ is in the domain of attraction of an extreme value distribution (notation: $F \in D(G_\gamma)$), i.e., there exist normalizing constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that
$$\lim_{n\to\infty} P\{a_n^{-1}(X_{n,n} - b_n) \le x\} = G_\gamma(x) := \exp\left(-(1+\gamma x)^{-1/\gamma}\right)$$
for all $x$ such that $1 + \gamma x > 0$ and with some extreme value index $\gamma \in \mathbb{R}$. Read $G_0(x)$ as $\exp(-\exp(-x))$ for all $x \in \mathbb{R}$.

The fundamental paper of Gnedenko (1943) establishes that the Generalized Extreme Value (GEV) distribution in the von Mises parametrization ($G_\gamma$) is a unified version of all possible nondegenerate weak limits of the maximum $X_{n,n}$, up to location/scale parameters. For $\gamma < 0$, $\gamma = 0$ and $\gamma > 0$, the $G_\gamma$ d.f. reduces to Weibull, Gumbel and Fréchet distributions, respectively.

The following necessary and sufficient condition for $F \in D(G_\gamma)$ was established in de Haan (1984) (first order extended regular variation property):
$$\lim_{t\to\infty} \frac{U(tx) - U(t)}{a(t)} = D_\gamma(x) := \begin{cases} \dfrac{x^{\gamma} - 1}{\gamma}, & \gamma \neq 0, \\ \log x, & \gamma = 0 \end{cases} \tag{1}$$
for every $x > 0$ and some positive measurable function $a$, with $U$ standing for a quantile type function (q.f.) pertaining to $F$ defined by the generalized inverse
$$U(t) := \left(\frac{1}{1-F}\right)^{\leftarrow}(t) = \inf\left\{x : F(x) \ge 1 - \frac{1}{t}\right\}.$$
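As an informal numerical check of condition (1), the sketch below evaluates its left-hand side at a large but finite $t$ for two textbook cases that are assumptions of this illustration rather than examples from the paper: a pure Pareto tail, for which $U(t) = t^{\gamma}$ and $a(t) = \gamma t^{\gamma}$, and the unit exponential, for which $U(t) = \log t$ and $a(t) = 1$. The function names `D` and `lhs` are hypothetical helpers.

```python
import math

def D(x, gamma):
    """Limit function D_gamma(x) appearing in condition (1)."""
    return math.log(x) if gamma == 0 else (x**gamma - 1.0) / gamma

def lhs(U, a, t, x):
    """Left-hand side of (1) at a finite t: (U(tx) - U(t)) / a(t)."""
    return (U(t * x) - U(t)) / a(t)

# Assumed example 1: F(x) = 1 - x**(-1/gamma), x >= 1, hence U(t) = t**gamma.
gamma = 0.5
pareto = lhs(lambda t: t**gamma, lambda t: gamma * t**gamma, t=1e6, x=3.0)

# Assumed example 2: F(x) = 1 - exp(-x), x >= 0, hence U(t) = log(t).
expo = lhs(math.log, lambda t: 1.0, t=1e6, x=3.0)

print(pareto, D(3.0, gamma))  # the two values agree up to rounding
print(expo, D(3.0, 0))        # the two values agree up to rounding
```

In both cases the ratio equals $D_\gamma(x)$ for every $t$, not only in the limit; for most distributions the agreement is only asymptotic and a suitable $a(t)$ has to be identified first.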
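Observe that the limit function $(x^{\gamma} - 1)/\gamma$ is the tail q.f. of the generalized Pareto (GP) distribution
$$F_\gamma(x) := 1 + \log G_\gamma(x) = 1 - (1 + \gamma x)^{-1/\gamma} \quad \text{for } \begin{cases} x \ge 0 & \text{if } \gamma \ge 0, \\ 0 \le x \le -1/\gamma & \text{if } \gamma < 0. \end{cases}$$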
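The link between the limit function and the GP d.f. can be made concrete with a small sketch: applying $D_\gamma$ to a standard Pareto quantile $y = 1/(1-u)$ inverts $F_\gamma$, which also gives an inverse-transform sampler. The helper names `gp_cdf`, `gp_quantile` and `gp_sample` are hypothetical and chosen only for this illustration.

```python
import math
import random

def gp_cdf(x, gamma):
    """GP d.f. F_gamma(x) = 1 - (1 + gamma*x)**(-1/gamma); exponential if gamma == 0."""
    if x < 0:
        return 0.0
    if gamma == 0:
        return 1.0 - math.exp(-x)
    if gamma < 0 and x >= -1.0 / gamma:
        return 1.0  # at or beyond the finite right endpoint -1/gamma
    return 1.0 - (1.0 + gamma * x) ** (-1.0 / gamma)

def gp_quantile(u, gamma):
    """Inverse of gp_cdf for 0 <= u < 1: D_gamma evaluated at y = 1/(1-u)."""
    y = 1.0 / (1.0 - u)  # quantile of the standard Pareto d.f. 1 - 1/y
    return math.log(y) if gamma == 0 else (y**gamma - 1.0) / gamma

def gp_sample(k, gamma, rng):
    """k i.i.d. GP(gamma) variates by inverse transform."""
    return [gp_quantile(rng.random(), gamma) for _ in range(k)]

rng = random.Random(1)
light = gp_sample(1000, -0.5, rng)  # all values lie below -1/gamma = 2
```

For negative $\gamma$ the sampled values never exceed $-1/\gamma$, which matches the finite right endpoint in the definition of $F_\gamma$.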
This fact reflects its exceptional role in Extreme Value Theory (cf. Pickands, 1975; Balkema and de Haan, 1974) and appeals to the appropriateness of classifying the tails of all possible distributions in $D(G_\gamma)$ into three classes, discriminated by the tail index sign. For positive $\gamma$, the power-law behavior in the tail of the underlying distribution $F$ has important implications since it may suggest, for instance, the presence of infinite moments. Because the first-order condition (1) can be reformulated as $\lim_{t\to\infty} U(tx)/U(t) = x^{\gamma}$, for all $x > 0$, i.e. $U$ is regularly varying at infinity with index $\gamma$ (notation: $U \in RV_\gamma$), Karamata's Theorem for integration of regularly varying functions asserts that $E(X_1^{+})^{p}$ is infinite for $p > 1/\gamma$, where $X_1^{+} = \max(0, X_1)$. So, these heavy tailed distributions have infinite right endpoint and the existence of moments is related to the value of $\gamma$. The Fréchet domain of attraction contains distributions with polynomially decaying tails such as the Pareto, Cauchy, Student's and Fréchet
itself. All d.f.'s belonging to $D(G_\gamma)$ with $\gamma < 0$ (the Weibull domain of attraction) are light tailed distributions with finite right endpoint. Such domain of attraction encloses Uniform and Beta distributions. The intermediate case $\gamma = 0$ is of particular interest in many applied sciences where extremes are relevant, not only because of the simplicity of inference within the Gumbel domain $G_0$ but also for the great variety of distributions possessing an exponential tail whether having finite right endpoint or not. Normal, Gamma and Lognormal distributions can be found in the Gumbel domain. Taking all into consideration, the advantage of looking for the most propitious type of tail when fitting empirical distributions at high quantiles has become clear. Effectively, separating statistical inference procedures according to the most suitable domain of attraction for the underlying d.f. $F$ has become a usual practice.

A test for Gumbel domain against Fréchet or Weibull max-domain has received the general designation of statistical choice of extreme domains of attraction (see e.g. Castillo et al., 1989; Hasofer and Wang, 1992; Fraga Alves and Gomes, 1996; Wang et al., 1996;
Marohn, 1998a, b). Among these, Hasofer and Wang's may be pointed out as one of the most commonly used testing procedures. In particular, Reiss and Thomas (2001, p. 154) have incorporated it in the "XTREMES" software. This test is based on a location/scale invariant statistic, function of the excesses over a random threshold $X_{n-k,n}$. The asymptotic statements of the referred authors settle on a fixed $k$, whereas $n$ goes to infinity, bearing on results presented in Weissman (1978). Nevertheless, in the last part of the referred paper there is an attempt to extend the setup of the test, allowing $k$ to increase with the sample size $n$, albeit under heuristic arguments. Pursuing the same objective, Segers and Teugels (2000) have recently suggested a large sample test for the Gumbel domain hypothesis; after deriving the asymptotic distribution of Galton's ratio (enjoying the location/scale invariance property too) provided condition (1), the authors used Rao's test statistic (see e.g. Serfling, 1980) for simple null hypothesis in order to establish a decision rule. In the process, they were confronted with the need of blocking the original sample of size $n$ into $m$ subsamples, each of size $n_i$, $i = 1, \ldots, m$, also under pledge of largeness.

The present paper deals with the two-sided problem of testing Gumbel domain against Fréchet or Weibull domains, i.e.
$$F \in D(G_0) \quad \text{versus} \quad F \in D(G_\gamma)_{\gamma \neq 0}. \tag{2}$$
Considering
$k$ upper order statistics in a way that these might present a satisfactory picture of the tail of $F$, we introduce a new test statistic which is simply the ratio between the maximum and the mean of the excesses above a random threshold $X_{n-k,n}$:
$$T_n(k) := \frac{X_{n,n} - X_{n-k,n}}{\dfrac{1}{k}\sum_{i=1}^{k} (X_{n-i+1,n} - X_{n-k,n})}, \tag{3}$$
where $k = k_n$ is a sequence of positive integers such that $k \to \infty$ and $k/n \to 0$ as the sample size $n$ tends to infinity, i.e. taking into account the increasing information about the right tail provided by the top data by enlarging the sample size, in a quite natural way. The exact distribution of $T_n(k)$ does not depend on location or scale parameters and its discriminant behavior towards heavy or light tailed distributions proves to be basically governed by the
sample maximum. In addition, one-sided testing problems
$$F \in D(G_0) \quad \text{versus} \quad F \in D(G_\gamma)_{\gamma < 0} \quad (\text{or } F \in D(G_\gamma)_{\gamma > 0}) \tag{4}$$
can also be treated by our results.

The outline of this paper is as follows. In Section 2, we present a new test criterion together with companion results on the kind of ratios underlying our study. In Section 3, proofs about the asymptotic properties of the test statistic are given, liable to $F \in D(G_0)$, $F \in \{D(G_\gamma) : \gamma < 0\}$ or to $F \in \{D(G_\gamma) : \gamma > 0\}$, and subsequent rejection regions at an asymptotic level $\alpha$ for testing (2) or (4) are established. The test is shown to be consistent. In Section 4, the exact performance of this test is evaluated, via simulation for a variety of models, in accordance with two main factors: the type I error probability and power of the test; comparisons with the Segers and Teugels', the Hasofer and Wang's and the likelihood ratio testing procedures will be carried out. Finally, Section 5 summarizes some concluding remarks and Section 6 is fully dedicated to a practical example.
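The statistic $T_n(k)$ in (3) is simple to compute from data. The following is a minimal sketch, not the authors' implementation; the function name `t_statistic` and the toy data are assumptions made for illustration. It also checks the location/scale invariance on a small example.

```python
def t_statistic(sample, k):
    """T_n(k) from (3): largest excess over the mean excess above X_{n-k,n}."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    xs = sorted(sample)
    threshold = xs[n - k - 1]                       # X_{n-k,n} (0-based index)
    excesses = [x - threshold for x in xs[n - k:]]  # the k top order statistics
    mean_excess = sum(excesses) / k                 # zero only if all k excesses tie
    return excesses[-1] / mean_excess

data = [1.0, 2.0, 3.0, 4.0, 10.0]
t = t_statistic(data, k=2)                                  # 1.75
t_affine = t_statistic([5.0 + 2.0 * x for x in data], k=2)  # 1.75 as well
```

Since the largest excess is at least the mean excess and at most the sum of the $k$ excesses, the computed value always lies between 1 and $k$ whenever the mean excess is positive.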
2. A new test for Gumbel domain
As a starting point, let $X_1, X_2, \ldots, X_n$ be i.i.d. nonnegative r.v.'s and define $S_n := X_1 + X_2 + \cdots + X_n$ and $R_n := X_{n,n}/S_n$. The preliminary intention here is to characterize the ratio $R_n$, indicating only roughly its familiar asymptotic properties. This will be done by means of results which exploit the intimate connection of the asymptotic behavior of $R_n$ with regular variation concepts. Such an approach will, inevitably, lead us to the order of finite moments of $F$. Specifically, Theorem 2 refers to the case of no finite first moment but finite moments of some fractional order, $E(X_1^{p}) < \infty$, $0 < p < 1$, while for the case of no finite moments of any order $p > 0$, $E(X_1^{p}) = \infty$ for all $p > 0$, Theorem 3 states an equivalence relation involving slowly varying right tails (notation: $\bar F \in RV_0$, with $\bar F := 1 - F$).

Throughout this paper, $\xrightarrow{as}$, $\xrightarrow{P}$ and $\xrightarrow{d}$ denote almost sure convergence, convergence in probability and convergence in distribution, respectively.
Theorem 1 (O'Brien, 1980). Let $X_1, X_2, \ldots$ be independent nonnegative random variables with common d.f. $F$, then
$$R_n \xrightarrow{as} 0 \iff E(X_1) < \infty;$$
$$R_n \xrightarrow{P} 0 \iff E(X_1 I_{\{X_1 \le x\}}) \in RV_0.$$
Theorem 2 (Bingham and Teugels, 1981). Assume $X_1, X_2, \ldots$ are independent nonnegative random variables with common d.f. $F$. The following are equivalent:
(i) $R_n \xrightarrow{d} R$, where $R$ is a non-degenerate r.v.;
(ii) $\bar F = 1 - F \in RV_{-\alpha}$, for some $\alpha \in (0, 1)$;
(iii) $E(1/R_n) \to 1/(1 - \alpha)$ as $n \to \infty$, $\alpha \in (0, 1)$.
Theorem 3 (Arov and Bobrov, 1960; Maller and Resnick, 1984). Let $X_1, X_2, \ldots$ be independent nonnegative random variables with common d.f. $F$, then
$$R_n \xrightarrow{P} 1 \iff \bar F = 1 - F \in RV_0.$$
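A small Monte Carlo experiment can make the moment statement in Theorem 2 tangible. The sketch below is an illustration under assumptions not taken from the paper: exact Pareto data with $1 - F(x) = x^{-\alpha}$, $x \ge 1$, the arbitrary choices $\alpha = 0.5$, $n = 500$ and 2000 replications, a fixed seed, and the hypothetical helper name `mean_inverse_ratio`.

```python
import random

def mean_inverse_ratio(alpha, n, reps, rng):
    """Monte Carlo estimate of E(1/R_n) = E(S_n / X_{n,n}) for Pareto(alpha) data."""
    total = 0.0
    for _ in range(reps):
        # Inverse transform for 1 - F(x) = x**(-alpha), x >= 1;
        # 1 - rng.random() lies in (0, 1], so the power is well defined.
        xs = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        total += sum(xs) / max(xs)
    return total / reps

rng = random.Random(12345)
estimate = mean_inverse_ratio(alpha=0.5, n=500, reps=2000, rng=rng)
print(estimate)  # to be compared with 1 / (1 - alpha) = 2
```

With a finite mean (for instance $\alpha > 1$ in this Pareto family) the same average would instead keep growing with $n$, in line with $R_n \to 0$ in Theorem 1.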
Remark 4. A Borel–Cantelli argument yields $E(X_1) < \infty \iff n^{-1} X_{n,n} \xrightarrow{as} 0$ (see Embrechts et al., 1997, p. 432). So we have that $n^{-1} X_{n,n} \xrightarrow{as} 0 \iff R_n \xrightarrow{as} 0$. Moreover, from Theorem 6 of Downey (1990), $E(X_1) < \infty$ implies $n^{-1} E(X_{n,n}) \to 0$ as $n \to \infty$.

Giving heed to the statistical choice of domain of attraction problem, we restrict the focus of our study to the tail of
$F$. In this framework, the most important features under study comprise the excesses $\{X_{n-i+1,n} - X_{n-k,n}\}_{i=1}^{k}$ over the random threshold $X_{n-k,n}$. Namely, the relative contribution of the maximum of excesses to their sum can be written as
$$R_n(k) := \frac{X_{n,n} - X_{n-k,n}}{\sum_{i=1}^{k} (X_{n-i+1,n} - X_{n-k,n})}. \tag{5}$$
Under condition (1), the ratio (5) is approximately equal in distribution (notation: $\stackrel{d}{\sim}$) to the ratio of the maximum to the sum of $k$ independent r.v.'s $W_1^{(\gamma)}, \ldots, W_k^{(\gamma)}$ identically distributed as a r.v. $W^{(\gamma)}$ with GP($\gamma$) distribution ($\gamma \in \mathbb{R}$), i.e.
$$R_n(k) \stackrel{d}{\sim} \frac{(Y_{k,k}^{\gamma} - 1)/\gamma}{\sum_{i=1}^{k} (Y_i^{\gamma} - 1)/\gamma} \stackrel{d}{=} \frac{W_{k,k}^{(\gamma)}}{\sum_{i=1}^{k} W_i^{(\gamma)}}$$
or equivalently
$$T_n(k) \stackrel{d}{\sim} \frac{k\, W_{k,k}^{(\gamma)}}{\sum_{i=1}^{k} W_i^{(\gamma)}}, \tag{6}$$
where
$\{Y_{i,n}\}_{i=1}^{n}$ are the o.s. of $Y_1, \ldots, Y_n$ i.i.d. r.v.'s with common Pareto d.f. $F_Y(y) = 1 - y^{-1}$, $y > 1$. For $\gamma = 0$, take $(Y^{\gamma} - 1)/\gamma$ as $\log(Y)$. The statement in (6) may be checked by taking into consideration the first-order condition (1) with the equality $X_{i,n} \stackrel{d}{=} U(Y_{i,n})$ (notation $\stackrel{d}{=}$ for 'is distributed as') and using the fact that, for an intermediate sequence $k = k_n$, $Y_{n-k,n} \xrightarrow{P} \infty$ (see, e.g. Smirnov, 1952) to expand the scaled excesses for $i = 1, \ldots, k$. Then
$$\frac{X_{n-i+1,n} - X_{n-k,n}}{a(Y_{n-k,n})} \stackrel{d}{=} \frac{U(Y_{n-i+1,n}) - U(Y_{n-k,n})}{a(Y_{n-k,n})} = \frac{(Y_{n-i+1,n}/Y_{n-k,n})^{\gamma} - 1}{\gamma} + o_p(1) \stackrel{d}{=} \frac{Y_{k-i+1,k}^{\gamma} - 1}{\gamma} + o_p(1).$$
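The representation in (6) suggests a quick way to see how the statistic separates the three domains: simulate $k$ GP($\gamma$) variables directly and form $k$ times their maximum over their sum. The sketch below does this with hypothetical helper names (`gp_draw`, `limit_ratio`) and arbitrary choices of $k$, the number of replications and the seed; it is an illustration only, not the simulation design of the paper.

```python
import math
import random
import statistics

def gp_draw(gamma, rng):
    """One GP(gamma) variate written as D_gamma(Y), with Y standard Pareto."""
    y = 1.0 / (1.0 - rng.random())
    return math.log(y) if gamma == 0 else (y**gamma - 1.0) / gamma

def limit_ratio(k, gamma, rng):
    """k * W_{k,k} / sum(W_i), i.e. the right-hand side of (6)."""
    w = [gp_draw(gamma, rng) for _ in range(k)]
    return k * max(w) / sum(w)

rng = random.Random(2006)
k, reps = 200, 500
medians = {
    g: statistics.median(limit_ratio(k, g, rng) for _ in range(reps))
    for g in (-0.5, 0.0, 0.5)
}
print(medians)  # typical values increase with gamma
```

For $\gamma = 0$ the variables are unit exponential, so the ratio is close to the maximum of $k$ exponentials, which is of order $\log k$; a bounded tail pulls the ratio below that level and a heavy tail pushes it above.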
Remark 5. The truncated moments of a GP($\gamma$) distributed r.v. $W^{(\gamma)}$ take the form $E(W^{(\gamma)} I_{\{W^{(\gamma)} \le x\}}) = (E(Y^{\gamma} I_{\{Y^{\gamma} \le x\}}) + x^{-1/\gamma} - 1)/\gamma$, where $Y$ is a standard Pareto random variable. Thus, considering $R_n(k)$ liable to an intermediate sequence of positive integers,