Reviews & Opinions
Independent and trusted. Read before you buy the Sharp HT-X1!

Sharp HT-X1



About Sharp HT-X1
Here you can find information about the Sharp HT-X1, such as its manual and user reviews.

Sharp HT-X1 manual (user guide) is ready to download for free.

At the bottom of the page, users can write a review. If you own a Sharp HT-X1, please write about it to help other people.

Manual


Download (English)
Sharp HT-X1, size: 582 KB
Related manuals
Sharp HT-X1 Operation Manual
Sharp HT-X15h Annexe 1
Sharp HT-X1H
Sharp HT-X1 Quick Start Guide
Sharp HT-X1 Quick Start


User reviews and opinions


No opinions have been provided. Be the first and add a new opinion/review.


Documents

doc0

HEAT KERNEL AND GREEN FUNCTION ESTIMATES ON NONCOMPACT SYMMETRIC SPACES II
Jean-Philippe Anker & Lizhen Ji
published in Topics in probability and Lie groups, J. C. Taylor (ed.), CRM Proc. Lect. Notes 28, Amer. Math. Soc. (2001), 1-9
1. Introduction

For a complete Riemannian manifold, the heat kernel and the Green function are two natural and important spectral functions. They play an important role in studying the spectrum of the manifold and in understanding the relations between the spectrum and the geometry (see [C], [D], [G], [BGV] and the references there). Numerous results have been obtained for the heat kernel and the Green function, in particular various Gaussian-type bounds on the heat kernel of noncompact manifolds (see [G]). For general Riemannian manifolds, the bounds on the heat kernel are better when the manifolds are nonnegatively curved, though the lower and upper bounds are not of the same order (see [LY] for example). Riemannian symmetric spaces of noncompact type form an important class of nonpositively curved, simply connected Riemannian manifolds. For such spaces, the heat kernel has been studied intensively using techniques from harmonic analysis on Lie groups; better bounds on the heat kernel are known, and sharp bounds have been obtained for some special spaces, for example hyperbolic spaces (see [A2], [A3], [AJ1], [D], and the references there). In [AJ1], for all noncompact symmetric spaces, we obtain optimal upper and lower bounds on, and exact asymptotics of, the heat kernel in the restricted region where the space variable is bounded by an arbitrarily large multiple of the time variable, which
1991 Mathematics Subject Classification. 22E30, 22E46, 31C12, 43A80, 43A85, 43A90, 58G11. Key words and phrases. Green function, Harnack inequality, heat kernel, semisimple Lie groups, spherical functions, symmetric spaces (Riemannian, noncompact). First author partially supported by the European Commission (HCM Network 1994-1997 Fourier Analysis and TMR Network 1998-2001 Harmonic Analysis). Second author partially supported by the U.S.A. National Science Foundation (postdoctoral fellowship DMS 9407427 and grant DMS 9704434).
is critical for many applications. For example, using these bounds on the heat kernel, we obtain sharp bounds on the Green function, which are used in an essential way in [GJT, Chap. VIII] to determine the Martin compactification of symmetric spaces. Other applications include sharp bounds on the Poisson kernel, the weak type (1,1) inequality for the heat maximal operator on Iwasawa $AN$ groups, and the $L^p$ heat propagation (see [AJ1, §4] for details). We will also comment in §2 below on implications of our results for the large-time behavior of the heat kernel, as opposed to the well-known small-time asymptotics of the heat kernel. When the isometry group of the symmetric space is a complex Lie group, the heat kernel can be computed in terms of elementary functions, and the sharp bounds and the exact asymptotics can be derived easily. Otherwise, the Fourier transform on the symmetric space expresses the heat kernel as an integral involving non-elementary spherical functions and the Harish-Chandra $c$-function, and the problem is to estimate this integral. For higher rank symmetric spaces, one of the difficulties in estimating the heat kernel lies along the walls of the Weyl chamber. For this purpose, we use the complicated asymptotic expansion of spherical functions along the walls developed by Trombi and Varadarajan [TV]. The main purpose of this note is to give an alternative approach to the sharp bounds that does not use this asymptotic expansion. Specifically, in [AJ1] we obtained the sharp bounds on the heat kernel away from the walls using only the Harish-Chandra convergent expansion of spherical functions inside the Weyl chamber. The result of this note is that the sharp bounds away from the walls, combined with the parabolic Harnack inequality of Li and Yau [LY], imply the sharp bounds on the heat kernel in the above restricted region.
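As a concrete instance of the complex case, the following sketch (not part of the original note) compares the classical closed-form heat kernel of three-dimensional real hyperbolic space $\mathbb H^3=SL(2,\mathbb C)/SU(2)$ with the profile of Theorem 2.1 in §2. It assumes curvature $-1$, so that $n=3$, $\ell=1$, $m=2$, $|\Sigma^{++}|=1$, $|\rho|^2=1$ and $\langle\alpha,H\rangle=\langle\rho,H\rangle=r=|H|$, and it uses $h_t(\exp H)=(4\pi t)^{-3/2}\frac{r}{\sinh r}e^{-t-r^2/(4t)}$. By hand, the ratio of the two equals $(4\pi)^{-3/2}\cdot\frac{2r}{(1+r)(1-e^{-2r})}$, which lies in $[(4\pi)^{-3/2},\,2(4\pi)^{-3/2})$; the code checks this on a grid with $0<r\le\kappa(1+t)$. The function names are hypothetical, and everything is computed in log form to avoid underflow.

```python
import numpy as np

# Assumed normalization (not fixed by the text): X = H^3 with curvature -1, so
# n = dim X = 3, rank l = 1, m = n - l = 2, |Sigma^{++}| = 1, and for H = r*e
# (|e| = 1, r > 0): <alpha, H> = <rho, H> = r and |rho|^2 = 1.
N_DIM, M_ROOT, N_INDIV = 3, 2, 1


def log_sinh(r):
    """Numerically stable log(sinh r) for r > 0."""
    return r + np.log(-np.expm1(-2.0 * r)) - np.log(2.0)


def log_heat_kernel_h3(t, r):
    """log of the classical closed form (4 pi t)^{-3/2} (r / sinh r) e^{-t - r^2/(4t)}."""
    return (-1.5 * np.log(4 * np.pi * t) + np.log(r) - log_sinh(r)
            - t - r**2 / (4 * t))


def log_theorem_21_profile(t, r):
    """log of t^{-n/2} (1+t)^{m/2-|Sigma++|} (1+<alpha,H>) e^{-|rho|^2 t - <rho,H> - |H|^2/(4t)}."""
    return (-N_DIM / 2 * np.log(t) + (M_ROOT / 2 - N_INDIV) * np.log1p(t)
            + np.log1p(r) - t - r - r**2 / (4 * t))


def ratio_range(kappa=5.0):
    """Smallest and largest value of h_t / profile over a grid with 0 < r <= kappa (1 + t)."""
    lo, hi = np.inf, -np.inf
    for t in np.logspace(-3, 3, 200):
        r = np.linspace(1e-6, kappa * (1 + t), 400)
        q = np.exp(log_heat_kernel_h3(t, r) - log_theorem_21_profile(t, r))
        lo, hi = min(lo, q.min()), max(hi, q.max())
    return lo, hi


lo, hi = ratio_range()
print(f"h_t / profile ranges over [{lo:.5f}, {hi:.5f}]")
```

Because $m/2-|\Sigma^{++}|=0$ for this space, the factor $(1+t)$ drops out and the two-sided bound is in fact uniform in $t$ here.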
We also outline another approach to the asymptotics of, and hence the sharp bounds on, the heat kernel away from the walls, thus making this note basically self-contained. The basic point of this note is that the analysis of spherical functions near the walls is very subtle, but the heat kernel behaves nicely because it satisfies the parabolic Harnack inequality. On the other hand, Trombi and Varadarajan's asymptotic expansion of spherical functions along the walls is essential for the asymptotics of the heat kernel and the Green function along the walls (see [AJ1, §5]). We would like to point out that the result of this note was not announced at this conference in 1992 and hence does not correspond to a talk by either of us at the meeting. In fact, this simplification of going over the walls via the Harnack inequality was observed in 1997, when [AJ1] was almost written up. We would like to thank J. C. Taylor for suggesting that we include this note in this volume.

2. Sharp bounds on the heat kernel and the Green function

We recall several notations from [AJ1, §2]. Let $G$ be a connected semisimple Lie group of noncompact type with finite center, $K\subset G$ a maximal compact subgroup, and $X=G/K$ the associated symmetric space. Let $\mathfrak a\subset\mathfrak g$ be a Cartan subspace, and $\Sigma=\Sigma(\mathfrak g,\mathfrak a)$ the set of restricted roots. Fix a positive Weyl chamber $\mathfrak a^+$, and denote by $\Sigma^+$ (resp. $\Sigma^{++}$) the set of positive (resp. positive indivisible) roots.

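Let $h_t(y^{-1}x)=h_t(x,y)$ be the heat kernel of $X$. Then one of the main results of [AJ1, Theorem 3.7.ii] is the following sharp bound.

Theorem 2.1. For any $\kappa>0$, there exist positive constants $C_1$, $C_2$ such that, for any $H\in\overline{\mathfrak a^+}$ with $|H|\le\kappa(1+t)$,
$$C_1\le\frac{h_t(\exp H)}{t^{-\frac n2}\,(1+t)^{\frac m2-|\Sigma^{++}|}\,\Big\{\prod_{\alpha\in\Sigma^{++}}\big(1+\langle\alpha,H\rangle\big)\Big\}\,e^{-|\rho|^2t-\langle\rho,H\rangle-\frac{|H|^2}{4t}}}\le C_2,$$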
where $n=\dim X$, $m$ is the sum of the dimensions of the positive root spaces $\mathfrak g_\alpha$, and $\rho$ is half the sum of the positive roots counted with multiplicity. Let $\Delta$ be the Laplace-Beltrami operator of $X$, which is taken to be nonpositive, and $k_\zeta$ the kernel of $(-\Delta-|\rho|^2+\zeta^2)^{-1}$, i.e. the Green function of $X$. By the relation
$$(-\Delta-|\rho|^2+\zeta^2)^{-1}=\int_0^{+\infty}dt\,e^{-(\zeta^2-|\rho|^2)t}\,e^{t\Delta},$$
the above sharp bounds on the heat kernel imply the following sharp bounds on the Green function $k_\zeta$ (see [AJ1, §4] for details).

Corollary 2.2. For any $\zeta>0$, there exist positive constants $C_1$, $C_2$ such that, for any $H\in\overline{\mathfrak a^+}$ with $|H|\ge1$,
$$C_1\le\frac{k_\zeta(\exp H)}{|H|^{-\frac{\ell-1}2-|\Sigma^{++}|}\,\varphi_0(\exp H)\,e^{-\zeta|H|}}\le C_2,$$
where $\ell=\dim\mathfrak a$ is the rank of $X$.
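The relation above and Corollary 2.2 can be checked numerically on a hypothetical example, real hyperbolic space $\mathbb H^3$ with curvature $-1$, where we assume $|\rho|^2=1$, $\ell=1$, $|\Sigma^{++}|=1$ and $\varphi_0(\exp H)=r/\sinh r$ with $r=|H|$. The sketch below integrates $e^{-(\zeta^2-|\rho|^2)t}h_t$ over $t$ with the trapezoid rule after the substitution $t=e^u$ (exponents combined in log form so that $e^{+|\rho|^2t}$ never overflows), compares the result with the classical closed form $k_\zeta=e^{-\zeta r}/(4\pi\sinh r)$, and evaluates the ratio in Corollary 2.2, which for this space is exactly $1/(4\pi)$. The grid bounds and function names are our own choices.

```python
import numpy as np

RHO_SQ = 1.0  # |rho|^2 for H^3 with curvature -1 (assumed normalization)


def log_heat_kernel_h3(t, r):
    """log of (4 pi t)^{-3/2} (r / sinh r) e^{-t - r^2/(4t)} (classical closed form, r > 0)."""
    return -1.5 * np.log(4 * np.pi * t) + np.log(r / np.sinh(r)) - t - r**2 / (4 * t)


def green_from_heat(r, zeta, n_grid=20001):
    """k_zeta(r) = int_0^inf e^{-(zeta^2 - |rho|^2) t} h_t(r) dt, via t = e^u and the trapezoid rule."""
    u = np.linspace(-15.0, 9.0, n_grid)      # t = e^u, so dt = e^u du
    t = np.exp(u)
    log_y = -(zeta**2 - RHO_SQ) * t + log_heat_kernel_h3(t, r) + u
    y = np.exp(log_y)
    return np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0


def green_closed_form(r, zeta):
    """Classical closed form e^{-zeta r} / (4 pi sinh r) of (-Delta - 1 + zeta^2)^{-1} on H^3."""
    return np.exp(-zeta * r) / (4 * np.pi * np.sinh(r))


def corollary_22_ratio(r, zeta, ell=1, n_indiv=1):
    """k_zeta / (|H|^{-(l-1)/2 - |Sigma++|} phi_0(exp H) e^{-zeta |H|}) with phi_0 = r / sinh r."""
    profile = r ** (-(ell - 1) / 2 - n_indiv) * (r / np.sinh(r)) * np.exp(-zeta * r)
    return green_closed_form(r, zeta) / profile


for zeta in (0.5, 1.0, 2.0):
    for r in (0.5, 1.0, 2.0, 5.0):
        num, exact = green_from_heat(r, zeta), green_closed_form(r, zeta)
        print(f"zeta={zeta} r={r}: quadrature {num:.6e}, closed form {exact:.6e}")
```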

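Using the sharp bounds on $\varphi_0(\exp H)$ in [A1], the sharp bounds on the Green function are equivalent to the following:
$$C_1\le\frac{k_\zeta(\exp H)}{|H|^{-\frac{\ell-1}2-|\Sigma^{++}|}\,\Big\{\prod_{\alpha\in\Sigma^{++}}\big(1+\langle\alpha,H\rangle\big)\Big\}\,e^{-\langle\rho,H\rangle-\zeta|H|}}\le C_2.$$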
The exact asymptotics of the heat kernel $h_t(\exp H)$, and hence of the Green function $k_\zeta(\exp H)$, are also obtained in [AJ1, §5]. Sharp bounds and asymptotics for $k_\zeta$ when $\zeta=0$ are obtained in [AJ1] as well. Since only the sharp bounds on the Green function $k_\zeta$ for $\zeta>0$ are used in [GJT] to determine the Martin compactification of $X$, we will not state them here. We would also like to point out that the Martin compactification is defined in terms of the asymptotics at infinity of the normalized Green function (see [GJT, Chap. VI] or the survey article by Taylor [T] in this volume), and the asymptotic results in [AJ1, §5] could simplify the arguments in [GJT, Chap. VIII] for determining the Martin compactification. On the other hand, this note shows that it is much easier, at least technically, to obtain the sharp bounds on the Green function than the exact asymptotics, and hence the arguments in [GJT, Chap. VIII] are fully justified. Another consequence of Theorem 2.1 is the following sharp bound on the large-time behavior of the heat kernel $h_t(\exp H)$.

Corollary 2.3. For any fixed $H\in\mathfrak a$, there exist positive constants $C_1$, $C_2$ such that, for all $t\ge1$,
$$C_1\le\frac{h_t(\exp H)}{t^{-\frac\ell2-|\Sigma^{++}|}\,e^{-|\rho|^2t-\frac{|H|^2}{4t}}}\le C_2.$$
As mentioned above, the exact asymptotics as $t\to+\infty$ are also available. This result sheds some light on the large-time asymptotics of the heat kernel of general Riemannian manifolds. It is well known that, for any Riemannian manifold, the heat kernel has a small-time asymptotic expansion; the small-time behavior determines the Weyl law on the distribution of eigenvalues when the manifold is compact, and the relation between the small- and large-time behaviors of the heat kernel plays an important role in the heat kernel proof of the Atiyah-Singer index theorem (see [BGV] for example). Inspired by the importance of the small-time asymptotics, a natural question is whether the heat kernel admits large-time asymptotics. In fact, the large-time behavior of the heat kernel of the universal covering space of a compact Riemannian manifold is closely related to the so-called Novikov-Shubin invariants (see [L]). For certain Lie groups, large-time asymptotics of the heat kernel of more general operators have been studied intensively by Varopoulos and his school using probabilistic methods (see [Va] and the references there). The small-time asymptotics resemble the heat kernel of the Euclidean space $\mathbb R^n$, which is given by the Gauss kernel
$$h_t(x,y)=(4\pi t)^{-\frac n2}\,e^{-\frac{|x-y|^2}{4t}}.$$
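For orientation, Corollary 2.3 can be made fully explicit on a hypothetical example. Assume $X=\mathbb H^3$ with curvature $-1$, so that $\ell=1$, $|\Sigma^{++}|=1$, $|\rho|^2=1$, and take the classical closed form of its heat kernel (with $r/\sinh r:=1$ at $r=0$). The power of $t$ then splits as in the spectral explanation that follows: it is one half of the spectral dimension $\ell$ plus the vanishing order $2|\Sigma^{++}|$ of the Plancherel density.

```latex
% Assumed: X = H^3, curvature -1, so \ell = 1, |\Sigma^{++}| = 1, |\rho|^2 = 1, r = |H|
h_t(\exp H) = (4\pi t)^{-3/2}\,\frac{r}{\sinh r}\,e^{-t-\frac{r^2}{4t}}
\quad\Longrightarrow\quad
\lim_{t\to+\infty} t^{3/2}\,e^{t}\,h_t(\exp H) = (4\pi)^{-3/2}\,\frac{r}{\sinh r},
\qquad
\frac32 = \frac{\ell}{2}+|\Sigma^{++}| = \frac12\big(\ell+2|\Sigma^{++}|\big).
```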

On the other hand, the large-time behavior could be quite different. In fact, the results in Corollary 2.3 suggest the following spectral explanation of some possible terms appearing in the large-time asymptotics of the heat kernel of a Riemannian manifold: (1) when the bottom $\lambda_0$ of the spectrum of $-\Delta$ is strictly positive, there is an additional exponentially decaying term $e^{-\lambda_0t}$; (2) the exponent of the additional polynomial decay is determined by the dimension of the continuous spectrum of $\Delta$ near the bottom $\lambda_0$ and by the vanishing order of the spectral measure at the bottom. For some solvable Lie groups, there could be additional subexponentially and superpolynomially decaying terms (see [Va]), whose spectral meaning is not clear yet. For noncompact symmetric spaces, $\lambda_0=|\rho|^2$; the spectrum of $\Delta$ is purely continuous and has dimension equal to $\ell=\dim\mathfrak a$; and the spectral measure (or Plancherel measure) is given by $|c(\lambda)|^{-2}$, the inverse square of the modulus of Harish-Chandra's $c$-function, whose vanishing order at the origin is equal to $2|\Sigma^{++}|$.

Now we summarize very briefly the approach in [AJ1]. Using the spherical Fourier transform on $G$, the heat kernel $h_t(\exp H)$, $H\in\mathfrak a$, can be written as
$$h_t(\exp H)=c_2\int_{\mathfrak a}d\lambda\,e^{-t(|\lambda|^2+|\rho|^2)}\,\frac{\varphi_\lambda(\exp H)}{|c(\lambda)|^2},$$
where $\varphi_\lambda(\exp H)$ is the spherical function, $c(\lambda)$ is Harish-Chandra's $c$-function, and $c_2$ is the positive constant entering the inversion formula for the spherical Fourier transform in [AJ1, Theorem 2.2.2.ii]. The basic idea in the proof of [AJ1, Theorem 3.7] is to expand $\varphi_\lambda(\exp H)$ in terms of exponential functions and to estimate $h_t(\exp H)$ differently depending on the position of $H$ relative to the walls of the Weyl chamber. When $H$ is close to the walls, an important technique is induction on the semisimple rank (the descent argument). It is here that Trombi and Varadarajan's asymptotic expansion plays a crucial role. Because no good bounds on the terms in this asymptotic expansion are available and the expansion may not converge, the condition $|H|\le\kappa(1+t)$ is imposed in the above theorem. As mentioned earlier, the main purpose of this note is to give an alternative proof of this theorem without using Trombi and Varadarajan's asymptotic expansion of spherical functions.

3. Sharp bounds and asymptotics away from the walls

In [AJ1, Theorem 3.7.i], we obtained the following sharp bounds on the heat kernel away from the walls of the Weyl chamber, using only Harish-Chandra's convergent expansion of spherical functions inside the Weyl chamber.

Proposition 3.1. (Bounds away from the walls) There exist positive constants $\eta$, $C_1$, $C_2$ such that, for every $H\in\mathfrak a^+$ with $\inf_{\alpha\in\Sigma^+}\langle\alpha,H\rangle\ge\eta$ and all $t\ge\eta$,
$$C_1\le\frac{h_t(\exp H)}{t^{-\frac n2}\,\Big\{\prod_{\alpha\in\Sigma^{++}}\langle\alpha,H\rangle\,\big(t+\langle\alpha,H\rangle\big)^{\frac{m_\alpha+m_{2\alpha}}2-1}\Big\}\,e^{-|\rho|^2t-\langle\rho,H\rangle-\frac{|H|^2}{4t}}}\le C_2.$$

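The upper bound is proved in Step 2 of the proof of Theorem 3.7 in [AJ1], and the lower bound in Step 6. Though the proof is complicated, the basic idea is simple and can be outlined as follows. When $H\in\mathfrak a^+$ is regular, we have Harish-Chandra's convergent expansion of the spherical function $\varphi_\lambda(\exp H)$ in terms of exponential functions (cf. [AJ1, Theorem 2.2.8]):
$$\varphi_\lambda(\exp H)=e^{-\langle\rho,H\rangle}\sum_{q\in2Q}e^{-\langle q,H\rangle}\sum_{w\in W}c(w\lambda)\,\Gamma_q(w\lambda)\,e^{i\langle w\lambda,H\rangle}.$$
Then $h_t(\exp H)$ is the sum of the terms
$$c_2\,e^{-|\rho|^2t-\langle\rho,H\rangle-\langle q,H\rangle}\int_{\mathfrak a}d\lambda\,c(-\lambda)^{-1}\,\Gamma_q(\lambda)\,e^{-t|\lambda|^2+i\langle\lambda,H\rangle}.$$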
These terms can be estimated in two steps: (1) shift the contour of integration from $\mathfrak a$ to $\mathfrak a+i\frac{H}{2t}$ to produce the expected Gaussian factor $e^{-\frac{|H|^2}{4t}}$; (2) set the (new) integration variable equal to $0$ in the non-elementary factors of the integrand to get the leading term.
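Step (1) is the usual completion of the square. The short verification below writes $\lambda=\mu+\frac{iH}{2t}$ with $\mu\in\mathfrak a$, extends $|\lambda|^2=\langle\lambda,\lambda\rangle$ complex-bilinearly, and assumes, as in the expansion above, that the oscillatory factor is $e^{i\langle\lambda,H\rangle}$:

```latex
-t\,\Big|\mu+\frac{iH}{2t}\Big|^2+i\Big\langle\mu+\frac{iH}{2t},H\Big\rangle
 = -t|\mu|^2-i\langle\mu,H\rangle+\frac{|H|^2}{4t}+i\langle\mu,H\rangle-\frac{|H|^2}{2t}
 = -t|\mu|^2-\frac{|H|^2}{4t}.
```

So the Gaussian factor $e^{-|H|^2/(4t)}$ comes out of the integral, the remaining $\int_{\mathfrak a}e^{-t|\mu|^2}d\mu$ contributes a factor proportional to $t^{-\ell/2}$, and in Step (2) the non-elementary factors are evaluated at $\mu=0$, i.e. at $\lambda=\frac{iH}{2t}$.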
The crucial point is to justify Step 2 and to show that, for $q\in2Q$, $q\ne0$, the corresponding term contributing to $h_t(\exp H)$ is of smaller order. For details, see Steps 2 and 6 of the proof of [AJ1, Theorem 3.7]. In the following, we outline an alternative proof of Proposition 3.1, and hence of Theorem 2.1. We first obtain the asymptotics of the heat kernel $h_t(\exp H)$ away from the walls [AJ1, Theorem 5.1.1.i] by a different method, and then derive the sharp bounds in Proposition 3.1. (The proof of [AJ1, Theorem 5.1.1] depends on the proof of the sharp bounds in [AJ1, Theorem 3.7].)

Proposition 3.2. (Asymptotics away from the walls) Let $H_j\in\mathfrak a^+$ be an unbounded sequence. If $\langle\alpha,H_j\rangle\to+\infty$ for all $\alpha\in\Sigma^+$ and $t_j\to+\infty$, then¹
$$h_{t_j}(\exp H_j)\sim c_2\,(4\pi t_j)^{-\frac\ell2}\,c\Big(-\frac{iH_j}{2t_j}\Big)^{-1}e^{-|\rho|^2t_j-\langle\rho,H_j\rangle-\frac{|H_j|^2}{4t_j}},$$
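where $\ell=\dim\mathfrak a$ is the rank of $X$, $c(\lambda)$ is Harish-Chandra's $c$-function, and $c_2$ is the same positive constant as above.

Proof. As mentioned earlier, we have
$$h_{t_j}(\exp H_j)=c_2\,e^{-|\rho|^2t_j-\langle\rho,H_j\rangle}\sum_{q\in2Q}e^{-\langle q,H_j\rangle}\int_{\mathfrak a}d\lambda\,c(-\lambda)^{-1}\,\Gamma_q(\lambda)\,e^{-|\lambda|^2t_j+i\langle\lambda,H_j\rangle}.$$
Because of the decaying factor $e^{-\langle q,H_j\rangle}$ for $q\ne0$, the leading term of $h_{t_j}(\exp H_j)$ can be shown to be equal to
$$c_2\,e^{-|\rho|^2t_j-\langle\rho,H_j\rangle}\int_{\mathfrak a}d\lambda\,c(-\lambda)^{-1}\,e^{-|\lambda|^2t_j+i\langle\lambda,H_j\rangle}.$$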
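If $c(-\lambda)^{-1}$ is a polynomial in $\lambda$, then we can compute this term explicitly in terms of elementary functions. But, in general, $c(-\lambda)^{-1}$ only has polynomial growth.² The basic idea is to replace $c(-\lambda)^{-1}$ by a polynomial and to estimate the error thus arising. More precisely, define the polynomial
$$P(\lambda)=\prod_{\alpha\in\Sigma^{++}}i\langle\alpha,\lambda\rangle\,\big(1-i\langle\alpha,\lambda\rangle\big)^{k_\alpha},$$
where $k_\alpha>\frac{m_\alpha+m_{2\alpha}}2$ for all $\alpha$. Let $f(\lambda)=c(-\lambda)^{-1}/P(\lambda)$. Then $c(-\lambda)^{-1}=P(\lambda)f(\lambda)$. Define the Fourier transform
$$\hat f(H)=\int_{\mathfrak a}d\lambda\,f(\lambda)\,e^{i\langle\lambda,H\rangle}.$$

¹ For two functions $f(j)$ and $g(j)$ of $j$, the expression $f(j)\sim g(j)$ means that $\lim_{j\to+\infty}f(j)/g(j)=1$.

² If $G$ is a complex Lie group, then $c(-\lambda)^{-1}$ is a polynomial and $|c(\lambda)|^{-2}\varphi_\lambda(\exp H)$ is an elementary function of $\lambda$, and hence $h_t(\exp H)$ can be computed explicitly.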
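Then $\hat f(H)$ is a bounded continuous function. Since both $c(-\lambda)^{-1}$ and $1/P(\lambda)$ are holomorphic on the tubular cone $\mathfrak a+i\mathfrak a^+$, $f(\lambda)$ is holomorphic on the tubular cone $\mathfrak a+i\mathfrak a^+$. Therefore $f(\lambda)$ belongs to the class $H_0(0;\mathfrak a^+)$, according to the definition of $H_p(a;C)$ in [Vl, p. 238]. By Theorem 2 and Corollary 1 in [Vl, p. 239], the support of $\hat f(H)$ is contained in $-{}^+\mathfrak a$, where ${}^+\mathfrak a=\{\xi\in\mathfrak a\mid\langle\xi,H\rangle>0\text{ for all }H\in\mathfrak a^+\}$ is the so-called big chamber dual to the positive Weyl chamber $\mathfrak a^+$. The Fourier transform of $P(\lambda)\,e^{-|\lambda|^2t}$, denoted by $F(H,t)$, is given by
$$F(H,t)=(4\pi t)^{-\frac\ell2}\prod_{\alpha\in\Sigma^{++}}\partial_\alpha\,(1-\partial_\alpha)^{k_\alpha}\,e^{-\frac{|H|^2}{4t}},$$
where $\partial_\alpha$ denotes differentiation in the direction of $\alpha$. Then
$$\int_{\mathfrak a}d\lambda\,c(-\lambda)^{-1}\,e^{-|\lambda|^2t_j+i\langle\lambda,H_j\rangle}=\int_{\mathfrak a}dv\,F(H_j-v,t_j)\,\hat f(v).$$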
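The condition $k_\alpha>\frac{m_\alpha+m_{2\alpha}}2$ in the definition of $P$ is what makes $f$ integrable on $\mathfrak a$, hence $\hat f$ bounded and continuous. A short check, assuming for real $\lambda$ the two-sided estimate $|c(\lambda)|^{-1}\asymp\prod_{\alpha\in\Sigma^{++}}|\langle\alpha,\lambda\rangle|\,(1+|\langle\alpha,\lambda\rangle|)^{\frac{m_\alpha+m_{2\alpha}}2-1}$ (the real-variable counterpart of the estimate used in the proof of Proposition 3.1), together with $|c(-\lambda)|=|c(\lambda)|$ for real $\lambda$:

```latex
|P(\lambda)| = \prod_{\alpha\in\Sigma^{++}} |\langle\alpha,\lambda\rangle|\,\big(1+\langle\alpha,\lambda\rangle^2\big)^{k_\alpha/2},
\qquad
|f(\lambda)| \asymp \prod_{\alpha\in\Sigma^{++}} \big(1+|\langle\alpha,\lambda\rangle|\big)^{\frac{m_\alpha+m_{2\alpha}}{2}-1-k_\alpha}.
```

Every exponent is $<-1$ exactly when $k_\alpha>\frac{m_\alpha+m_{2\alpha}}2$. The simple roots belong to $\Sigma^{++}$ and form a basis of $\mathfrak a^*$, and all other factors are bounded by $1$, so the change of variables $\lambda\mapsto(\langle\alpha_i,\lambda\rangle)_i$ over the simple roots reduces integrability to $\ell$ one-dimensional integrals of $(1+|x|)^{-s}$ with $s>1$.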

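Using the fact that $\prod_{\alpha\in\Sigma^{++}}\langle\alpha,H\rangle$ is the skew-symmetric polynomial of least degree on $\mathfrak a$ under the Weyl group $W$, we can show that, as $j\to+\infty$,
$$F(H_j-v,t_j)\sim(4\pi t_j)^{-\frac\ell2}\,P\Big(\frac{i(H_j-v)}{2t_j}\Big)\,e^{-\frac{|H_j-v|^2}{4t_j}},$$
and hence
$$F(H_j-v,t_j)\sim(4\pi t_j)^{-\frac\ell2}\,\prod_{\alpha\in\Sigma^{++}}\Big(-\frac{\langle\alpha,H_j\rangle}{2t_j}\Big)\Big(1+\frac{\langle\alpha,H_j\rangle}{2t_j}\Big)^{k_\alpha}\,e^{-\frac{|H_j|^2}{4t_j}}\,e^{\frac{\langle H_j,v\rangle}{2t_j}}.$$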

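By interchanging this limit and the integration, which can be justified, we get the desired asymptotics of $h_{t_j}(\exp H_j)$.

Proof of Proposition 3.1. From the asymptotic result in Proposition 3.2, we get the following alternative proof of Proposition 3.1. By contradiction, we can show that there exist positive constants $C_1$, $C_2$ and a large positive constant $\eta$ such that, for every $H\in\mathfrak a^+$ with $\inf_{\alpha\in\Sigma^+}\langle\alpha,H\rangle\ge\eta$ and all $t\ge\eta$,
$$C_1\le\frac{h_t(\exp H)}{t^{-\frac\ell2}\,c\big(-\frac{iH}{2t}\big)^{-1}\exp\big(-|\rho|^2t-\langle\rho,H\rangle-\frac{|H|^2}{4t}\big)}\le C_2.$$
Then the proposition follows from the following two results: for $H\in\overline{\mathfrak a^+}$,
$$\varphi_0(\exp H)\asymp\Big\{\prod_{\alpha\in\Sigma^{++}}\big(1+\langle\alpha,H\rangle\big)\Big\}\,e^{-\langle\rho,H\rangle}$$
(see [A1]), and
$$c\Big(-\frac{iH}{2t}\Big)^{-1}\asymp\prod_{\alpha\in\Sigma^{++}}\frac{\langle\alpha,H\rangle}{t}\Big(1+\frac{\langle\alpha,H\rangle}{t}\Big)^{\frac{m_\alpha+m_{2\alpha}}2-1},$$
where $m_\alpha=\dim\mathfrak g_\alpha$.³

³ For two functions $f$ and $g$, the expression $f\asymp g$ means that there exist two positive constants $C_1$, $C_2$ such that $C_1f\le g\le C_2f$.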
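To see how the $c$-function estimate produces the profile of Proposition 3.1, one can track the powers of $t$. The only inputs are $\sum_{\alpha\in\Sigma^{++}}(m_\alpha+m_{2\alpha})=\sum_{\alpha\in\Sigma^+}m_\alpha=m$ and $\ell+m=n$:

```latex
t^{-\frac{\ell}{2}}\prod_{\alpha\in\Sigma^{++}}\frac{\langle\alpha,H\rangle}{t}
  \Big(1+\frac{\langle\alpha,H\rangle}{t}\Big)^{\frac{m_\alpha+m_{2\alpha}}{2}-1}
= t^{-\frac{\ell}{2}-\sum_{\alpha\in\Sigma^{++}}\frac{m_\alpha+m_{2\alpha}}{2}}
  \prod_{\alpha\in\Sigma^{++}}\langle\alpha,H\rangle\,\big(t+\langle\alpha,H\rangle\big)^{\frac{m_\alpha+m_{2\alpha}}{2}-1}
= t^{-\frac{n}{2}}\prod_{\alpha\in\Sigma^{++}}\langle\alpha,H\rangle\,\big(t+\langle\alpha,H\rangle\big)^{\frac{m_\alpha+m_{2\alpha}}{2}-1}.
```

Multiplying by $e^{-|\rho|^2t-\langle\rho,H\rangle-|H|^2/(4t)}$ gives exactly the denominator in Proposition 3.1.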
4. The Harnack inequality and sharp bounds along the walls

To extend the sharp bounds on the heat kernel in Proposition 3.1 to the walls of the Weyl chamber, we need the following Harnack inequality of Li and Yau [LY, Theorem 2.2], [SY, Theorem 2.2 on p. 163].

Proposition 4.1. Let $M$ be an $n$-dimensional complete noncompact Riemannian manifold with Ricci curvature $\mathrm{Ric}(M)\ge-k$ for some $k\ge0$. If $u(x,t)$ is a positive solution of the heat equation on $M$, then, for any $a>1$, $x_1,x_2\in M$ and $0<t_1<t_2<+\infty$, the following inequality holds:
$$u(x_1,t_1)\le u(x_2,t_2)\Big(\frac{t_2}{t_1}\Big)^{\frac{na}2}\exp\Big(\frac{a\,d^2(x_1,x_2)}{4(t_2-t_1)}+\frac{nak}{2(a-1)}(t_2-t_1)\Big).$$
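Proposition 4.1 can be sanity-checked on the simplest example, the Euclidean Gauss kernel $u(x,t)=(4\pi t)^{-n/2}e^{-|x|^2/(4t)}$ on $\mathbb R^n$. It is a positive solution of the heat equation with $\mathrm{Ric}=0$, so $k=0$ is allowed. The sketch below (hypothetical helper names, random test points of our choosing) works with logarithms to avoid underflow and records the largest value of $\log u(x_1,t_1)-\log u(x_2,t_2)$ minus the logarithm of the right-hand side. In flat space the inequality follows from $|x_2|^2/t_2\le|x_1|^2/t_1+|x_1-x_2|^2/(t_2-t_1)$ (Cauchy-Schwarz), so the recorded margin should never be positive. This checks the formula only in flat space; it is not a test of the symmetric-space application.

```python
import numpy as np


def log_li_yau_factor(n, k, a, d, t1, t2):
    """log of (t2/t1)^{n a/2} exp(a d^2/(4(t2-t1)) + n a k (t2-t1)/(2(a-1))).

    Requires a > 1, k >= 0 (Ric >= -k) and 0 < t1 < t2; d is the distance d(x1, x2).
    """
    if not (a > 1 and k >= 0 and 0 < t1 < t2):
        raise ValueError("need a > 1, k >= 0 and 0 < t1 < t2")
    return ((n * a / 2) * np.log(t2 / t1) + a * d**2 / (4 * (t2 - t1))
            + n * a * k * (t2 - t1) / (2 * (a - 1)))


def log_gauss_kernel(x, t):
    """log of the Euclidean heat kernel (4 pi t)^{-n/2} exp(-|x|^2 / (4t)) centred at 0."""
    x = np.asarray(x, dtype=float)
    return -(x.size / 2) * np.log(4 * np.pi * t) - x @ x / (4 * t)


rng = np.random.default_rng(1)
margin = -np.inf                     # max of log(lhs) - log(rhs) over all trials
for _ in range(10_000):
    n = int(rng.integers(1, 5))
    x1, x2 = 3 * rng.normal(size=n), 3 * rng.normal(size=n)
    t1 = rng.uniform(0.05, 5.0)
    t2 = t1 + rng.uniform(0.01, 5.0)
    a = 1.0 + rng.uniform(0.01, 2.0)
    log_lhs = log_gauss_kernel(x1, t1) - log_gauss_kernel(x2, t2)
    log_rhs = log_li_yau_factor(n, 0.0, a, np.linalg.norm(x1 - x2), t1, t2)
    margin = max(margin, log_lhs - log_rhs)
print("largest log-margin:", margin)
```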
Proposition 4.2. (Global bounds) The sharp bounds on the heat kernel away from the walls of the Weyl chamber in Proposition 3.1 imply the same sharp bounds on the heat kernel in the region $|H|\le\kappa(1+t)$; i.e. Theorem 2.1 follows from Proposition 3.1.

Proof. For every positive constant $T$, when $t\le T$ and $|H|\le\kappa(1+t)\le\kappa(1+T)$, the sharp bounds on $h_t(\exp H)$ are well known and follow from the general theory of the heat kernel of Riemannian manifolds (see [K]). On the other hand, there exists a constant $C$ such that, for any $H\in\overline{\mathfrak a^+}$, we can find $H'\in\mathfrak a^+$ satisfying $\inf_{\alpha\in\Sigma^+}\langle\alpha,H'\rangle\ge\eta$ and $|H-H'|\le C$ (for instance $H'=H+\eta H_0$, where $H_0\in\mathfrak a^+$ is fixed with $\langle\alpha,H_0\rangle\ge1$ for all $\alpha\in\Sigma^+$, and $C=\eta|H_0|$). Let $t'=t+1$. Notice that, when $|H|\le\kappa(1+t)$,
$$\Big|\frac{|H'|^2}{t'}-\frac{|H|^2}{t}\Big|=O(1).$$
By the first case (with $T=\eta+1$), we can assume that $t\ge\eta+1$. Then
$$(1+t)^{\frac m2-|\Sigma^{++}|}\Big\{\prod_{\alpha\in\Sigma^{++}}\big(1+\langle\alpha,H\rangle\big)\Big\}\asymp\prod_{\alpha\in\Sigma^{++}}\langle\alpha,H'\rangle\,\big(t'+\langle\alpha,H'\rangle\big)^{\frac{m_\alpha+m_{2\alpha}}2-1},$$
and the sharp upper bound on $h_t(\exp H)$ follows from the bound on $h_{t'}(\exp H')$ given in Proposition 3.1 by applying Proposition 4.1 to $u(x,t)=h_t(x)$, $x_1=\exp H$, $x_2=\exp H'$, $t_1=t$, $t_2=t'$. The lower sharp bound on $h_t(\exp H)$ follows similarly by taking $t'=t-1$.

Remark 4.3. As explained before Corollary 2.2, sharp bounds on the Green function [AJ1, Theorem 4.2.2] are derived from the sharp bounds on the heat kernel. Therefore, Proposition 4.2 gives an approach to obtaining sharp bounds on the Green function without using Trombi and Varadarajan's asymptotic expansion of spherical functions. Using the Harnack inequality for positive solutions of $\Delta u+\lambda u=0$, $\lambda\le|\rho|^2$, we can also obtain sharp bounds on the Green function along the walls from the bounds away from the walls, as above.

References

[A1] J.-Ph. Anker, La forme exacte de l'estimation fondamentale de Harish-Chandra, C. R. Acad. Sci. Paris Série I 305 (1987), 371-374.
[A2] J.-Ph. Anker, Le noyau de la chaleur sur les espaces symétriques U(p,q)/U(p)×U(q), in Harmonic analysis, Luxembourg 1987, P. Eymard & J.-P. Pier (eds.), Lect. Notes Math. 1359, Springer-Verlag (1988), 60-82.
[A3] J.-Ph. Anker, Sharp estimates for some functions of the Laplacian on noncompact symmetric spaces, Duke Math. J. 65 (1992), 257-297.
[AJ1] J.-Ph. Anker & L. Ji, Heat kernel and Green function estimates on noncompact symmetric spaces, Geom. Funct. Anal. (GAFA) 9 (1999), 1035-1091.
[AJ2] J.-Ph. Anker & L. Ji, Comportement exact du noyau de la chaleur et de la fonction de Green sur les espaces symétriques non compacts, C. R. Acad. Sci. Paris Série I 326 (1998), 153-156.
[BGV] N. Berline, E. Getzler & M. Vergne, Heat kernels and Dirac operators, Springer-Verlag (1992).
[C] I. Chavel, Eigenvalues in Riemannian geometry, Academic Press (1984).
[D] E. B. Davies, Heat kernels and spectral theory, Cambridge Univ. Press (1989).
[G] A. Grigor'yan, Heat kernel of a noncompact Riemannian manifold, in Stochastic analysis (Summer Research Institute, Cornell University, July 1993), M. C. Cranston & M. A. Pinsky (eds.), Proc. Symp. Pure Math. 57, Amer. Math. Soc. (1995), 239-263.
[GJT] Y. Guivarc'h, L. Ji & J. C. Taylor, Compactifications of symmetric spaces, Progr. Math. 156, Birkhäuser (1998).
[K] Y. Kannai, Off diagonal short time asymptotics for fundamental solutions of diffusion equations, Comm. P. D. E. 2 (1977), 781-830.
[LY] P. Li & S. T. Yau, On the parabolic kernel of the Schrödinger operator, Acta Math. 156 (1986), 153-201.
[L] J. Lott, Heat kernels on covering spaces and topological invariants, J. Diff. Geom. 35 (1992), 471-510.
[SY] R. Schoen & S. T. Yau, Lectures on differential geometry, International Press (1994).
[T] J. C. Taylor, Martin compactifications, to appear in Topics in probability and Lie groups, J. C. Taylor (ed.), CRM Proc. Lect. Notes, Amer. Math. Soc.
[TV] P. C. Trombi & V. S. Varadarajan, Spherical transforms on semisimple Lie groups, Ann. of Math. 94 (1971), 246-303.
[Va] N. Th. Varopoulos, The local theorem for symmetric diffusion on Lie groups: an overview, in Harmonic analysis and number theory, S. W. Drury & M. Ram Murty (eds.), Canad. Math. Soc. Conf. Proc. 21, Amer. Math. Soc. (1997), 143-152.
[Vl] V. S. Vladimirov, Methods of the theory of functions of many variables, M.I.T. Press (1966).

Jean-Philippe Anker, Université Henri Poincaré (Nancy I), Institut de Mathématiques Élie Cartan (Laboratoire Commun UHP-CNRS-INRIA), B.P. 239, F-54506 Vandoeuvre-lès-Nancy Cedex, France
E-mail address: anker@iecn.u--nancy.fr

Lizhen Ji, University of Michigan, Department of Mathematics, East Hall, 525 East University Avenue, Ann Arbor, MI 48109-1109, USA
E-mail address: lji@math.lsa.umich.edu

doc1

Aggregation by exponential weighting and sharp oracle inequalities
A. Dalalyan and A.B. Tsybakov
University of Paris 6, 4, Place Jussieu, 75252 Paris cedex 05, France
Abstract. In the present paper, we study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp oracle inequalities for convex aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We show how these results can be applied to derive a sparsity oracle inequality.

1 Introduction

Consider the regression model
$$Y_i=f(x_i)+\xi_i,\qquad i=1,\dots,n,\tag{1}$$
where $x_1,\dots,x_n$ are given elements of a set $\mathcal X$, $f:\mathcal X\to\mathbb R$ is an unknown function, and $\xi_i$ are i.i.d. zero-mean random variables on a probability space $(\Omega,\mathcal F,P)$, where $\Omega\subseteq\mathbb R$. The problem is to estimate the function $f$ from the data $D_n=((x_1,Y_1),\dots,(x_n,Y_n))$. Let $(\Lambda,\mathcal A)$ be a probability space and denote by $\mathcal P_\Lambda$ the set of all probability measures defined on $(\Lambda,\mathcal A)$. Assume that we are given a family $\{f_\lambda,\lambda\in\Lambda\}$ of functions $f_\lambda:\mathcal X\to\mathbb R$ such that the mapping $\lambda\mapsto f_\lambda$ is measurable, $\mathbb R$ being equipped with the Borel $\sigma$-field. Functions $f_\lambda$ can be viewed either as weak learners or as some preliminary estimators of $f$ based on a training sample independent of $\mathbf Y\triangleq(Y_1,\dots,Y_n)$ and considered as frozen. We study the problem of aggregation of functions in $\{f_\lambda,\lambda\in\Lambda\}$ under the squared loss. Specifically, we construct an estimator $\hat f_n$, based on the data $D_n$ and called the aggregate, such that the expected value of its squared error
$$\|\hat f_n-f\|_n^2\triangleq\frac1n\sum_{i=1}^n\big(\hat f_n(x_i)-f(x_i)\big)^2$$
is approximately as small as the oracle value $\inf_{\lambda\in\Lambda}\|f-f_\lambda\|_n^2$.

In this paper we consider aggregates that are mixtures of functions $f_\lambda$ with exponential weights. For a measure $\pi$ from $\mathcal P_\Lambda$ and for $\beta>0$ we set
$$\hat f_n(x)\triangleq\int_\Lambda\theta_\lambda(\beta,\pi,\mathbf Y)\,f_\lambda(x)\,\pi(d\lambda),\qquad x\in\mathcal X,\tag{2}$$


with
$$\theta_\lambda(\beta,\pi,\mathbf Y)=\frac{\exp\{-n\|\mathbf Y-f_\lambda\|_n^2/\beta\}}{\int_\Lambda\exp\{-n\|\mathbf Y-f_w\|_n^2/\beta\}\,\pi(dw)},\tag{3}$$
where $\|\mathbf Y-f_\lambda\|_n^2\triangleq\frac1n\sum_{i=1}^n\big(Y_i-f_\lambda(x_i)\big)^2$, and we assume that $\pi$ is such that the integral in (2) is finite.

Note that $\hat f_n$ depends on two tuning parameters: the prior measure $\pi$ and the temperature parameter $\beta$. They have to be selected in a suitable way. Using Bayesian terminology, $\pi(\cdot)$ is a prior distribution and $\hat f_n$ is the posterior mean of $f_\lambda$ in a phantom model $Y_i=f_\lambda(x_i)+\xi_i'$, where the $\xi_i'$ are i.i.d. normally distributed with mean $0$ and variance $\beta/2$. The idea of mixing with exponential weights has been discussed by many authors, apparently since the 1970s (see [27] for a nice overview of the subject). Most of the work focused on the important particular case where the set of estimators is finite, i.e., w.l.o.g. $\Lambda=\{1,\dots,M\}$, and the distribution $\pi$ is uniform on $\Lambda$. Procedures of the type (2)-(3) with general sets $\Lambda$ and priors $\pi$ came into consideration quite recently [9, 8, 3, 29, 30, 1, 2, 25], partly in connection to the PAC-Bayesian approach. For finite $\Lambda$, procedures (2)-(3) were independently introduced for prediction of deterministic individual sequences with expert advice. Representative work and references can be found in [24, 17, 11]; in this framework the results are proved for cumulative loss and no assumption is made on the statistical nature of the data, whereas the observations $Y_i$ are supposed to be uniformly bounded by a known constant. This is not the case for the regression model that we consider here. We also mention related work on cumulative exponential weighting methods: there the aggregate is defined as the average $n^{-1}\sum_{k=1}^n\hat f_k$. For regression models with random design, such procedures are introduced and analyzed in [8], [9] and [26]. In particular, [8] and [9] establish a sharp oracle inequality, i.e., an inequality with leading constant 1. This result is further refined in [3] and [13]. In addition, [13] derives sharp oracle inequalities not only for the squared loss but also for general loss functions.
However, these techniques are not helpful in the framework that we consider here, because the averaging device cannot be meaningfully adapted to models with non-identically distributed observations. The aggregate $\hat f_n$ can be computed on-line. This, in particular, motivated its use for on-line prediction with finite $\Lambda$. Papers [13], [14] point out that $\hat f_n$ and its averaged version can be obtained as special cases of mirror descent algorithms that were considered earlier in deterministic minimization. Finally, [10] establishes an interesting link between the results for cumulative risks proved in the theory of prediction of deterministic sequences and generalization error bounds for the aggregates in the stochastic i.i.d. case. In this paper we establish sharp oracle inequalities for the aggregate $\hat f_n$ under the squared loss, i.e., oracle inequalities with leading constant 1 and optimal rate of the remainder term. For a particular case, such an inequality was pioneered in [16]. The result of [16] is proved for a finite set $\Lambda$ and Gaussian errors. It makes use of Stein's unbiased risk formula, and gives a very precise constant in the remainder term of the inequality. The inequalities that we prove below are

valid for general $\Lambda$ and arbitrary functions $f_\lambda$ satisfying some mild conditions. Furthermore, we treat non-Gaussian errors. We introduce new techniques of proof based on dummy randomization, which allows us to obtain the result for $n$-divisible distributions of errors $\xi_i$. We then apply the Skorokhod embedding to cover the class of all symmetric error distributions with finite exponential moments. Finally, we consider the case where $f_\lambda$ is a linear combination of $M$ known functions with the vector of weights $\lambda\in\mathbb R^M$. For this case, as a consequence of our main result we obtain a sparsity oracle inequality (SOI). We refer to [22], where the notion of SOI is introduced in a general context. Examples of SOI are proved in [15, 5, 4, 6, 23]. In particular, [5] deals with the regression model with fixed design that we consider here and proves approximate SOI for BIC-type and Lasso-type aggregates. We show that the aggregate with exponential weights satisfies a sharp SOI, i.e., a SOI with leading constant 1.
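For a finite family $\Lambda=\{1,\dots,M\}$ with prior weights $\pi_j$, the weights (3) reduce to a softmax of $\log\pi_j-\sum_i(Y_i-f_j(x_i))^2/\beta$, since $n\|\mathbf Y-f_j\|_n^2=\sum_i(Y_i-f_j(x_i))^2$. Below is a minimal NumPy sketch of the aggregate (2)-(3), assuming the $f_j$ are given by their values at the design points; the function name is hypothetical. It uses the usual max-shift (log-sum-exp) trick, which does not change the normalized weights but avoids overflow.

```python
import numpy as np


def exp_weight_aggregate(Y, F, beta, prior=None):
    """Exponentially weighted aggregate (2)-(3) for a finite family f_1, ..., f_M.

    Y     : array of shape (n,), observations Y_1, ..., Y_n
    F     : array of shape (M, n), row j holds f_j(x_1), ..., f_j(x_n)
    beta  : temperature beta > 0
    prior : array of shape (M,), positive prior weights pi_j (uniform if None)
    Returns the normalised weights theta_j * pi_j and the values of the
    aggregate f_hat_n at x_1, ..., x_n.
    """
    Y = np.asarray(Y, dtype=float)
    F = np.asarray(F, dtype=float)
    M = F.shape[0]
    prior = np.full(M, 1.0 / M) if prior is None else np.asarray(prior, dtype=float)
    # n * ||Y - f_j||_n^2 = sum_i (Y_i - f_j(x_i))^2
    sq_resid = np.sum((F - Y) ** 2, axis=1)
    log_w = np.log(prior) - sq_resid / beta
    log_w -= log_w.max()          # shift before exponentiating (log-sum-exp trick)
    w = np.exp(log_w)
    w /= w.sum()
    return w, w @ F


Y = np.array([1.0, 2.0])
F = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 3.0]])
w, fhat = exp_weight_aggregate(Y, F, beta=1.0)
print(w, fhat)
```

A small $\beta$ concentrates the weights on the best-fitting $f_j$, while a large $\beta$ approaches the plain average of the $f_j$ under the prior.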
2 Risk bounds for $n$-divisible distributions of errors
The assumptions that we need to derive our main result concern essentially the probability distribution of the i.i.d. errors $\xi_i$.

(A) There exist i.i.d. random variables $\zeta_1,\dots,\zeta_n$ defined on an enlargement of the probability space $(\Omega,\mathcal F,P)$ such that:
(A1) the random variable $\xi_1+\zeta_1$ has the same distribution as $(1+1/n)\,\xi_1$;
(A2) the vectors $\zeta=(\zeta_1,\dots,\zeta_n)$ and $\xi=(\xi_1,\dots,\xi_n)$ are independent.

Note that (A) is an assumption on the distribution of $\xi_1$. If $\xi_1$ satisfies (A1), then we will say that its distribution is $n$-divisible. We defer to Section 4 the discussion of how rich the class of $n$-divisible distributions is. Hereafter, we will write for brevity $\theta_\lambda$ instead of $\theta_\lambda(\beta,\pi,\mathbf Y)$. Denote by $\mathcal P'_\Lambda$ the set of all measures $\mu\in\mathcal P_\Lambda$ such that $\lambda\mapsto f_\lambda(x)$ is integrable w.r.t. $\mu$ for $x\in\{x_1,\dots,x_n\}$. Clearly $\mathcal P'_\Lambda$ is a convex subset of $\mathcal P_\Lambda$. For any measure $\mu\in\mathcal P'_\Lambda$ we define
$$\bar f_\mu(x_i)=\int_\Lambda f_\lambda(x_i)\,\mu(d\lambda),\qquad i=1,\dots,n.$$
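We denote by $\theta\cdot\pi$ the probability measure $A\mapsto\int_A\theta_\lambda\,\pi(d\lambda)$ defined on $\mathcal A$. With the above notation, we have $\hat f_n=\bar f_{\theta\cdot\pi}$. We will need one more assumption. Let $L_\zeta:\mathbb R\to\mathbb R\cup\{\infty\}$ be the moment generating function of the random variable $\zeta_1$, i.e., $L_\zeta(t)=E(e^{t\zeta_1})$, $t\in\mathbb R$.

(B) There exist a functional $\Psi_\beta:\mathcal P'_\Lambda\times\mathcal P'_\Lambda\to\mathbb R$ and a real number $\beta_0>0$ such that
$$\begin{cases}
e^{(\|f-\bar f_{\mu'}\|_n^2-\|f-\bar f_\mu\|_n^2)/\beta}\prod_{i=1}^nL_\zeta\big(2(\bar f_\mu(x_i)-\bar f_{\mu'}(x_i))/\beta\big)\le\Psi_\beta(\mu,\mu'),\\
\mu\mapsto\Psi_\beta(\mu,\mu')\ \text{is concave and continuous in the total variation norm for any }\mu'\in\mathcal P'_\Lambda,\\
\Psi_\beta(\mu',\mu')=1,
\end{cases}\tag{4}$$
for any $\beta\ge\beta_0$.

Simple sufficient conditions for this assumption to hold in particular cases are given in Section 4. The next theorem presents a PAC-Bayesian type bound.

Theorem 1. Let $\pi$ be an element of $\mathcal P_\Lambda$ such that $\theta\cdot\pi\in\mathcal P'_\Lambda$ for all $\mathbf Y\in\mathbb R^n$ and $\beta>0$. If assumptions (A) and (B) are fulfilled, then the aggregate $\hat f_n$ defined by (2) with $\beta\ge\beta_0$ satisfies the oracle inequality
$$E\|\hat f_n-f\|_n^2\le\int\|f_\lambda-f\|_n^2\,p(d\lambda)+\frac{\beta\,K(p,\pi)}{n+1},\qquad\forall\,p\in\mathcal P_\Lambda,\tag{5}$$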

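where $K(p,\pi)$ stands for the Kullback-Leibler divergence between $p$ and $\pi$.

Proof. Define the mapping $H:\mathcal P'_\Lambda\to\mathbb R^n$ by
$$H_\mu=\big(\bar f_\mu(x_1)-f(x_1),\dots,\bar f_\mu(x_n)-f(x_n)\big)^\top,\qquad\mu\in\mathcal P'_\Lambda.$$
For brevity, we will write
$$h_\lambda=H_{\delta_\lambda}=\big(f_\lambda(x_1)-f(x_1),\dots,f_\lambda(x_n)-f(x_n)\big)^\top,\qquad\lambda\in\Lambda,$$
where $\delta_\lambda$ is the Dirac measure at $\lambda$ (that is, $\delta_\lambda(A)=\mathbf 1(\lambda\in A)$ for any $A\in\mathcal A$, where $\mathbf 1(\cdot)$ denotes the indicator function). Since $E(\xi_i)=0$, assumption (A1) implies that $E(\zeta_i)=0$ for $i=1,\dots,n$. On the other hand, (A2) implies that $\zeta$ is independent of $\theta_\lambda$. Therefore, we have
$$E\|\hat f_n-f\|_n^2=\beta\,E\log\exp\Big(\frac{\|H_{\theta\cdot\pi}\|_n^2-2\zeta^\top H_{\theta\cdot\pi}}{\beta}\Big)=S+S_1,$$
where
$$S=-\beta\,E\log\int_\Lambda\exp\Big(-\frac{\|h_\lambda\|_n^2-2\zeta^\top h_\lambda}{\beta}\Big)\theta_\lambda\,\pi(d\lambda),$$
$$S_1=\beta\,E\log\int_\Lambda\exp\Big(\frac{\|H_{\theta\cdot\pi}\|_n^2-\|h_\lambda\|_n^2+2\zeta^\top(h_\lambda-H_{\theta\cdot\pi})}{\beta}\Big)\theta_\lambda\,\pi(d\lambda).$$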
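The definition of $\theta_\lambda$ yields
$$S=-\beta\,E\log\int_\Lambda\exp\Big(-\frac{n\|\mathbf Y-f_\lambda\|_n^2+\|f-f_\lambda\|_n^2-2\zeta^\top h_\lambda}{\beta}\Big)\pi(d\lambda)+\beta\,E\log\int_\Lambda\exp\Big(-\frac{n\|\mathbf Y-f_\lambda\|_n^2}{\beta}\Big)\pi(d\lambda).\tag{6}$$
Since
$$\|\mathbf Y-f_\lambda\|_n^2=\|\xi\|_n^2-2n^{-1}\xi^\top h_\lambda+\|f-f_\lambda\|_n^2,\tag{7}$$
the terms involving $\|\xi\|_n^2$ cancel and we get
$$S=-\beta\,E\log\int_\Lambda\exp\Big(-\frac{(n+1)\|f-f_\lambda\|_n^2-2(\xi+\zeta)^\top h_\lambda}{\beta}\Big)\pi(d\lambda)+\beta\,E\log\int_\Lambda\exp\Big(-\frac{n\|f-f_\lambda\|_n^2-2\xi^\top h_\lambda}{\beta}\Big)\pi(d\lambda)$$
$$=\beta\,E\log\int_\Lambda e^{-n\phi(\lambda)}\pi(d\lambda)-\beta\,E\log\int_\Lambda e^{-(n+1)\phi(\lambda)}\pi(d\lambda),$$
where we used the notation $\phi(\lambda)=\big(\|f-f_\lambda\|_n^2-2n^{-1}\xi^\top h_\lambda\big)/\beta$ and the fact that $\xi+\zeta$ can be replaced by $(1+1/n)\xi$ inside the expectation. The Hölder inequality implies that
$$\int_\Lambda e^{-n\phi(\lambda)}\pi(d\lambda)\le\Big(\int_\Lambda e^{-(n+1)\phi(\lambda)}\pi(d\lambda)\Big)^{\frac n{n+1}}.$$
Therefore,
$$S\le-\frac{\beta}{n+1}\,E\log\int_\Lambda e^{-(n+1)\phi(\lambda)}\pi(d\lambda).\tag{8}$$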
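The Hölder step is Jensen's inequality for the concave map $y\mapsto y^{n/(n+1)}$ applied to $e^{-(n+1)\phi}$ under $\pi$. A quick numerical check for a discrete prior, using random test data of our choosing and a hypothetical helper name:

```python
import numpy as np


def holder_sides(phi, prior, n):
    """Both sides of  int e^{-n phi} dpi <= (int e^{-(n+1) phi} dpi)^{n/(n+1)}
    for a prior `prior` on finitely many points where phi takes the values `phi`."""
    phi = np.asarray(phi, dtype=float)
    prior = np.asarray(prior, dtype=float)
    lhs = np.sum(prior * np.exp(-n * phi))
    rhs = np.sum(prior * np.exp(-(n + 1) * phi)) ** (n / (n + 1))
    return lhs, rhs


rng = np.random.default_rng(0)
worst = 0.0                          # largest relative violation lhs / rhs - 1
for _ in range(1000):
    phi = rng.normal(size=10)
    prior = rng.dirichlet(np.ones(10))
    n = int(rng.integers(1, 20))
    lhs, rhs = holder_sides(phi, prior, n)
    worst = max(worst, lhs / rhs - 1.0)
print("largest relative violation:", worst)
```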
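Assume now that $p\in\mathcal P_\Lambda$ is absolutely continuous with respect to $\pi$. Denote by $\psi$ the corresponding Radon-Nikodym derivative and by $\Lambda_+$ the support of $p$. Using the concavity of the logarithm and Jensen's inequality we get
$$-E\log\int_\Lambda e^{-(n+1)\phi(\lambda)}\pi(d\lambda)\le-E\log\int_{\Lambda_+}e^{-(n+1)\phi(\lambda)}\pi(d\lambda)=-E\log\int_{\Lambda_+}e^{-(n+1)\phi(\lambda)}\psi^{-1}(\lambda)\,p(d\lambda)$$
$$\le(n+1)\,E\int_{\Lambda_+}\phi(\lambda)\,p(d\lambda)+\int_{\Lambda_+}\log\psi(\lambda)\,p(d\lambda).$$
Noticing that the last integral here equals $K(p,\pi)$ and combining the resulting inequality with (8), we obtain
$$S\le\beta\,E\int\phi(\lambda)\,p(d\lambda)+\frac{\beta\,K(p,\pi)}{n+1}.\tag{9}$$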
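The Jensen step above is an instance of the Gibbs variational inequality $-\log\int e^{-g}\,d\pi\le\int g\,dp+K(p,\pi)$, with equality when $p\propto e^{-g}\pi$; here $g=(n+1)\phi$ for a fixed realization of $\xi$. A small discrete check, with random data of our choosing and a hypothetical helper name:

```python
import numpy as np


def gibbs_sides(g, prior, p):
    """Return (-log sum_j prior_j e^{-g_j}, sum_j p_j g_j + KL(p || prior)) for discrete p, prior."""
    g, prior, p = (np.asarray(v, dtype=float) for v in (g, prior, p))
    lhs = -np.log(np.sum(prior * np.exp(-g)))
    mask = p > 0                                  # convention 0 * log 0 = 0
    kl = np.sum(p[mask] * np.log(p[mask] / prior[mask]))
    return lhs, np.sum(p * g) + kl


rng = np.random.default_rng(2)
g = 5.0 * rng.normal(size=8)                      # plays the role of (n + 1) * phi
prior = rng.dirichlet(np.ones(8))
p = rng.dirichlet(np.ones(8))
lhs, rhs = gibbs_sides(g, prior, p)

gibbs = prior * np.exp(-g)
gibbs /= gibbs.sum()                              # minimiser p* proportional to e^{-g} pi
lhs_star, rhs_star = gibbs_sides(g, prior, gibbs)
print(lhs, rhs, lhs_star, rhs_star)
```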

Since $E(\xi_i)=0$ for every $i=1,\dots,n$, we have $\beta\,E(\phi(\lambda))=\|f_\lambda-f\|_n^2$, and using (9) and the Fubini theorem we find
$$S\le\int\|f_\lambda-f\|_n^2\,p(d\lambda)+\frac{\beta\,K(p,\pi)}{n+1}.$$
Note that this inequality also holds in the case where $p$ is not absolutely continuous with respect to $\pi$, since in this case $K(p,\pi)=\infty$. To complete the proof, it remains to show that $S_1\le0$. Let $E_\xi(\cdot)$ denote the conditional expectation $E(\cdot\mid\xi)$. By the concavity of the logarithm,
$$S_1\le\beta\,E\log\int_\Lambda E_\xi\exp\Big(\frac{\|H_{\theta\cdot\pi}\|_n^2-\|h_\lambda\|_n^2+2\zeta^\top(h_\lambda-H_{\theta\cdot\pi})}{\beta}\Big)\theta_\lambda\,\pi(d\lambda).$$
Since $\bar f_{\theta\cdot\pi}=\hat f_n$ and $\theta$ is independent of $\zeta$, the last expectation on the right-hand side of this inequality is bounded from above by $\Psi_\beta(\delta_\lambda,\theta\cdot\pi)$. Now, the fact that $S_1\le0$ follows from the concavity and continuity of the functional $\Psi_\beta(\cdot,\theta\cdot\pi)$, Jensen's inequality and the equality $\Psi_\beta(\theta\cdot\pi,\theta\cdot\pi)=1$.

Remark. Another way to read the result of Theorem 1 is that, if the probabilistic phantom Gaussian error model is used to construct $\hat f_n$, with variance taken larger than a certain threshold value, then the Bayesian posterior mean under the true model is close in expectation to the best prediction, even when the true data-generating distribution does not have Gaussian errors, but errors of a more general type.

Model selection with finite or countable Λ
Consider now the particular case where $\Lambda$ is countable. Without loss of generality we suppose that $\Lambda=\{1,2,\dots\}$ and $\{f_\lambda,\lambda\in\Lambda\}=\{f_j\}_{j=1}^\infty$, and we set $\pi_j\triangleq\pi(\lambda=j)$. As a corollary of Theorem 1 we get the following sharp oracle inequalities for model selection type aggregation.
Theorem 2. Assume that $\pi$ is an element of $P_\Lambda$ such that $\theta\cdot\pi\in P_\Lambda$ for all $Y\in\mathbb R^n$ and $\beta>0$. Let assumptions (A) and (B) be fulfilled and let $\Lambda$ be countable. Then for any $\beta\ge\beta_0$ the aggregate $\hat f_n$ satisfies the inequality
$$E\|\hat f_n-f\|_n^2\le\inf_{j\ge1}\Big(\|f_j-f\|_n^2+\frac{\beta\log\pi_j^{-1}}{n+1}\Big).$$
In particular, if $\pi_j=1/M$, $j=1,\dots,M$, we have
$$E\|\hat f_n-f\|_n^2\le\min_{j=1,\dots,M}\|f_j-f\|_n^2+\frac{\beta\log M}{n+1}. \qquad (10)$$
Proof. For a fixed integer $j_0\ge1$ we apply Theorem 1 with $p$ being the Dirac measure: $p(\lambda=j)=\mathbb 1(j=j_0)$, $j\ge1$. Since $K(p,\pi)=\log\pi_{j_0}^{-1}$, this gives
$$E\|\hat f_n-f\|_n^2\le\|f_{j_0}-f\|_n^2+\frac{\beta\log\pi_{j_0}^{-1}}{n+1}.$$
Since this inequality holds for every $j_0$, we obtain the first inequality of the theorem. The second inequality is an obvious consequence of the first one.
Remark. The rate of convergence $(\log M)/n$ obtained in (10) is the optimal rate of model selection type aggregation when the errors $\xi_i$ are Gaussian [21, 5].
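Theorem 2 is easy to illustrate for a finite dictionary. Definition (2) is not reproduced in this part of the text. The sketch below therefore assumes the usual exponential weights, with posterior weight of $f_j$ proportional to $\pi_j\exp(-n\|Y-f_j\|_n^2/\beta)$. It also evaluates the right-hand side of the first inequality of Theorem 2. All function names and data are hypothetical.

```python
import math


def sq_dist_n(u, v):
    """Empirical squared distance ||u - v||_n^2 = (1/n) * sum_i (u_i - v_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def exp_weight_aggregate(Y, F, prior, beta):
    """Exponentially weighted aggregate over a finite dictionary.

    Y     -- observations Y_1, ..., Y_n
    F     -- candidate vectors (f_j(x_1), ..., f_j(x_n)), j = 1..M
    prior -- positive prior weights pi_j summing to one
    beta  -- temperature parameter
    Assumes posterior weights proportional to pi_j * exp(-n ||Y - f_j||_n^2 / beta).
    Returns (posterior weights, aggregate values at x_1..x_n).
    """
    n = len(Y)
    logits = [math.log(p) - n * sq_dist_n(Y, fj) / beta for p, fj in zip(prior, F)]
    top = max(logits)
    w = [math.exp(v - top) for v in logits]
    z = sum(w)
    post = [x / z for x in w]
    fhat = [sum(pj * fj[i] for pj, fj in zip(post, F)) for i in range(n)]
    return post, fhat


def theorem2_bound(f_true, F, prior, beta):
    """min_j ||f_j - f||_n^2 + beta * log(1/pi_j) / (n + 1).

    log(1/pi_j) is K(p, pi) for the Dirac measure p at j."""
    n = len(f_true)
    return min(sq_dist_n(fj, f_true) + beta * math.log(1.0 / p) / (n + 1)
               for fj, p in zip(F, prior))


Y = [1.0, 2.0, 3.0]
F = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [3.0, 2.0, 1.0]]
uniform = [1 / 3] * 3
post, fhat = exp_weight_aggregate(Y, F, uniform, beta=4.0)
print([round(w, 4) for w in post], [round(v, 4) for v in fhat])
```

A small $\beta$ concentrates the weights on the candidate with the smallest empirical error. A larger $\beta$ spreads them out, at the price of the $\beta\log M/(n+1)$ term in (10).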
Checking assumptions (A) and (B)
In this section we give some sufficient conditions for assumptions (A) and (B). Denote by $D_n$ the set of all probability distributions of $\xi_1$ satisfying assumption (A1). First, it is easy to see that all zero-mean Gaussian or double-exponential distributions belong to $D_n$. Furthermore, $D_n$ contains all stable distributions. However, since non-Gaussian stable distributions do not have second order moments, they do not satisfy (4). One can also check that the convolution of two distributions from $D_n$ belongs to $D_n$. Finally, note that the intersection $D=\bigcap_{n\ge1}D_n$ is included in the set of all infinitely divisible distributions and is called the L-class (see [19], Theorem 3.6, p. 102).
However, some basic distributions such as the uniform or the Bernoulli distribution do not belong to $D_n$. To show this, recall that the characteristic function of the uniform distribution on $[-a,a]$ is given by $\varphi(t)=\sin(at)/(at)$. For this function, $\varphi((n+1)t)/\varphi(nt)$ is equal to infinity at the points where $\sin(nat)$ vanishes (unless $n=1$). Therefore, it cannot be a characteristic function. A similar argument shows that the centered Bernoulli and centered binomial distributions do not belong to $D_n$.
We now discuss two important cases of Theorem 1 where the errors $\xi_i$ are either Gaussian or double exponential.
Proposition 1. Assume that $\sup_{\lambda\in\Lambda}\|f-f_\lambda\|_n\le L<\infty$. If the random variables $\xi_i$ are i.i.d. Gaussian $N(0,\sigma^2)$, $\sigma^2>0$, then for every $\beta\ge(4+2/n)\sigma^2+2L^2$ the aggregate $\hat f_n$ satisfies inequality (5).
Proof. If $\xi_i\sim N(0,\sigma^2)$, assumption (A) is fulfilled with random variables $\zeta_i$ having the Gaussian distribution $N(0,(2n+1)\sigma^2/n^2)$. Using the Laplace transform of the Gaussian distribution, we get $L_\zeta(u)=\exp\big(\sigma^2u^2(2n+1)/(2n^2)\big)$. Therefore, take
$$\Psi_\beta(\mu,\mu')=\exp\Big(\frac{2\sigma^2(2n+1)\|f_\mu-f_{\mu'}\|_n^2}{n\beta^2}-\frac{\|f_\mu-f\|_n^2-\|f_{\mu'}-f\|_n^2}{\beta}\Big).$$
This functional satisfies $\Psi_\beta(\mu',\mu')=1$, and it is not hard to see that the mapping $\mu\mapsto\Psi_\beta(\mu,\mu')$ is continuous in the total variation norm. Finally, this mapping is concave for every $\beta\ge(4+2/n)\sigma^2+2\sup_\lambda\|f-f_\lambda\|_n^2$ by virtue of Lemma 3 in the Appendix. Therefore, assumption (B) is fulfilled and the desired result follows from Theorem 1.
Assume now that the $\xi_i$ are distributed with the double exponential density $f_\xi(x)=(\sqrt2\,\sigma)^{-1}e^{-\sqrt2|x|/\sigma}$.

Aggregation under this assumption is discussed in [28], where it is recommended to modify the shape of the aggregate in order to match the shape of the distribution of the errors. The next proposition shows that sharp risk bounds can be obtained without modifying the algorithm.
Proposition 2. Assume that $\sup_{\lambda\in\Lambda}\|f-f_\lambda\|_n\le L<\infty$ and $\sup_{i,\lambda}|f_\lambda(x_i)|\le\bar L<\infty$. Let the random variables $\xi_i$ be i.i.d. double exponential with variance $\sigma^2>0$. Then for any $\beta$ larger than
$$\max\Big(\Big(8+\frac4n\Big)\sigma^2+2L^2,\ 4\sigma\Big(1+\frac1n\Big)\bar L\Big)$$
the aggregate $\hat f_n$ satisfies inequality (5).
Proof. We apply Theorem 1. The characteristic function of the double exponential density is $\varphi(t)=2/(2+\sigma^2t^2)$. Solving $\varphi(t)\varphi_\zeta(t)=\varphi((n+1)t/n)$, we get the characteristic function $\varphi_\zeta$ of $\zeta_1$. The corresponding Laplace transform in this case is $L_\zeta(t)=\varphi_\zeta(-it)$, which yields
$$L_\zeta(t)=1+\frac{(2n+1)\sigma^2t^2}{2n^2-(n+1)^2\sigma^2t^2}.$$
Therefore
$$\log L_\zeta(t)\le(2n+1)\Big(\frac{\sigma t}{n}\Big)^2,\qquad|t|\le\frac{n}{(n+1)\sigma}.$$
We now use this inequality to check assumption (B). For all $\mu,\mu'\in P_\Lambda$ we have
$$\big|2\big(f_\mu(x_i)-f_{\mu'}(x_i)\big)/\beta\big|\le4\bar L/\beta,\qquad i=1,\dots,n.$$
Therefore, for $\beta\ge4\sigma(1+1/n)\bar L$ we get
$$\log L_\zeta\Big(\frac{2(f_\mu(x_i)-f_{\mu'}(x_i))}{\beta}\Big)\le\frac{4\sigma^2(2n+1)\big(f_\mu(x_i)-f_{\mu'}(x_i)\big)^2}{n^2\beta^2},\qquad i=1,\dots,n.$$
Thus, we get a functional of the same form as in the proof of Proposition 1, with the only difference that $\sigma^2$ is now replaced by $2\sigma^2$. Therefore, it suffices to repeat the reasoning of the proof of Proposition 1 to complete the proof.
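The three distributional facts used in this section can be checked numerically:
- the variance $(2n+1)\sigma^2/n^2$ of $\zeta_1$ in the Gaussian case;
- the closed form of $L_\zeta$ and the bound $\log L_\zeta(t)\le(2n+1)(\sigma t/n)^2$ in the double-exponential case;
- the blow-up of $\varphi((n+1)t)/\varphi(nt)$ for the uniform law.

The numerical values of $n$, $\sigma$ and $a$ below are arbitrary illustrative choices. The double-exponential Laplace transform $2/(2-\sigma^2t^2)$ is the standard one for that law.

```python
import math


def gaussian_zeta_variance(sigma2, n):
    """Var(zeta_1) when xi_1 ~ N(0, sigma2): ((1 + 1/n)^2 - 1) * sigma2."""
    return ((1 + 1 / n) ** 2 - 1) * sigma2


def laplace_zeta_mgf(t, sigma, n):
    """L_zeta(t) = L_xi((1 + 1/n) t) / L_xi(t) for double-exponential xi with
    variance sigma^2, using L_xi(t) = 2 / (2 - sigma^2 t^2) for |t| < sqrt(2)/sigma."""
    c = (n + 1) / n
    s = (sigma * t) ** 2
    return (2 - s) / (2 - c * c * s)


def uniform_ratio(t, a, n):
    """phi((n+1)t) / phi(nt) for the uniform law on [-a, a], phi(u) = sin(au)/(au)."""
    def phi(u):
        return math.sin(a * u) / (a * u)
    return phi((n + 1) * t) / phi(n * t)


n, sigma2, sigma, a = 10, 1.7, 1.3, 1.0

# Gaussian errors: zeta_1 ~ N(0, (2n+1) sigma^2 / n^2)
assert math.isclose(gaussian_zeta_variance(sigma2, n), (2 * n + 1) * sigma2 / n**2)

# Double-exponential errors: closed form of L_zeta and the bound of Proposition 2
t_max = n / ((n + 1) * sigma)
for k in range(1, 101):
    t = t_max * k / 100
    closed = 1 + (2 * n + 1) * sigma**2 * t**2 / (2 * n**2 - (n + 1) ** 2 * sigma**2 * t**2)
    assert math.isclose(laplace_zeta_mgf(t, sigma, n), closed, rel_tol=1e-12)
    assert math.log(laplace_zeta_mgf(t, sigma, n)) <= (2 * n + 1) * (sigma * t / n) ** 2

# Uniform errors: the ratio is huge just below t = pi/(n a), so it is no characteristic function
print(abs(uniform_ratio(math.pi / (n * a) * (1 - 1e-6), a, n)))
```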
Risk bounds for general distributions of errors
As discussed above, assumption (A) restricts the application of Theorem 1 to models with n-divisible errors. We now show that this limitation can be dropped. Recall that the main idea of the proof of Theorem 1 consists in an artificial introduction of the dummy random vector $\zeta$ independent of $\xi$. However, the independence property is too strong compared with what we really need in the proof of Theorem 1. Below we come to a weaker condition invoking a version of the Skorokhod embedding (a detailed survey on this subject can be found in [18]). For simplicity we assume that the errors $\xi_i$ are symmetric, i.e., $P(\xi_i>a)=P(\xi_i<-a)$ for all $a\in\mathbb R$. The argument can be adapted to the asymmetric case as well, but we do not discuss it here. We now describe a version of Skorokhod's construction that will be used below, cf. [20, Proposition II.3.8].
Lemma 1. Let $\xi_1,\dots,\xi_n$ be i.i.d. symmetric random variables on $(\Omega,\mathcal F,P)$. Then there exist i.i.d. random variables $\zeta_1,\dots,\zeta_n$ defined on an enlargement of the probability space $(\Omega,\mathcal F,P)$ such that
(a) $\xi+\zeta$ has the same distribution as $(1+1/n)\xi$,
(b) $E(\zeta_i\,|\,\xi_i)=0$, $i=1,\dots,n$,
(c) for any $\tau>0$ and for any $i=1,\dots,n$, we have $E(e^{\tau\zeta_i}\,|\,\xi_i)\le e^{(\tau\xi_i)^2(n+1)/n^2}$.

Proof. Define $\zeta_i$ as a random variable such that, given $\xi_i$, it takes the values $\xi_i/n$ or $-2\xi_i-\xi_i/n$ with conditional probabilities $P(\zeta_i=\xi_i/n\,|\,\xi_i)=(2n+1)/(2n+2)$ and $P(\zeta_i=-2\xi_i-\xi_i/n\,|\,\xi_i)=1/(2n+2)$. Then properties (a) and (b) are straightforward. Property (c) follows from the relation
$$E(e^{\tau\zeta_i}\,|\,\xi_i)=e^{\tau\xi_i/n}\Big(1+\frac{e^{-2\tau\xi_i(1+1/n)}-1}{2n+2}\Big)$$
and Lemma 2 in the Appendix with $x=\tau\xi_i/n$ and $\alpha=2n+2$.
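The two-point construction in the proof of Lemma 1 can be verified exactly with rational arithmetic. The sketch assumes a discrete symmetric law for $\xi_i$, chosen arbitrarily for illustration. It checks properties (a) and (b) and the closed form of $E(e^{\tau\zeta_i}|\xi_i)$ displayed above. It does not test the inequality in (c) itself.

```python
import math
from collections import defaultdict
from fractions import Fraction


def zeta_given_xi(xi, n):
    """Conditional law of zeta_i given xi_i = xi, returned as {value: probability}."""
    xi = Fraction(xi)
    p_small = Fraction(2 * n + 1, 2 * n + 2)
    law = defaultdict(Fraction)
    law[xi / n] += p_small
    law[-2 * xi - xi / n] += 1 - p_small
    return dict(law)


def law_of_sum(xi_law, n):
    """Law of xi + zeta, where xi has the law xi_law = {value: probability}."""
    out = defaultdict(Fraction)
    for x, px in xi_law.items():
        for z, pz in zeta_given_xi(x, n).items():
            out[Fraction(x) + z] += Fraction(px) * pz
    return dict(out)


def scaled_law(xi_law, c):
    """Law of c * xi."""
    out = defaultdict(Fraction)
    for x, px in xi_law.items():
        out[c * Fraction(x)] += Fraction(px)
    return dict(out)


def cond_mgf(tau, xi, n):
    """E(exp(tau * zeta_i) | xi_i = xi), computed directly from the two-point law."""
    return sum(float(p) * math.exp(tau * float(z)) for z, p in zeta_given_xi(xi, n).items())


n = 5
xi_law = {-2: Fraction(1, 6), -1: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 6)}

for x in xi_law:  # property (b): E(zeta_i | xi_i) = 0 exactly
    assert sum(z * p for z, p in zeta_given_xi(x, n).items()) == 0

# property (a): xi + zeta has the law of (1 + 1/n) xi (this uses the symmetry of xi)
assert law_of_sum(xi_law, n) == scaled_law(xi_law, 1 + Fraction(1, n))

# closed form of the conditional Laplace transform quoted in the proof
for tau in (0.1, 0.5, 1.0):
    for x in (-2, -1, 1, 2):
        closed = math.exp(tau * x / n) * (1 + (math.exp(-2 * tau * x * (1 + 1 / n)) - 1) / (2 * n + 2))
        assert math.isclose(cond_mgf(tau, x, n), closed, rel_tol=1e-12)
```

Symmetry of $\xi_i$ is what makes (a) work: the rare branch flips the sign of $(1+1/n)\xi_i$, which leaves a symmetric law unchanged.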
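We now state the main result of this section.
Theorem 3. Fix some $\bar\sigma>0$ and assume that $\sup_{\lambda\in\Lambda}\|f-f_\lambda\|_n\le L$ for a finite constant $L$. If the errors $\xi_i$ are symmetric and have a finite second moment $E(\xi_i^2)$, then for any $\beta\ge4\bar\sigma^2(1+1/n)+2L^2$ we have
$$E\|\hat f_n-f\|_n^2\le\int_\Lambda\|f_\lambda-f\|_n^2\,p(d\lambda)+\frac{\beta K(p,\pi)}{n+1}+R_n,\qquad\forall p\in P_\Lambda,$$
where the residual term $R_n$ is given by
$$R_n=E^*\sup_{\lambda\in\Lambda}\sum_{i=1}^n\frac{4(n+1)(\xi_i^2-\bar\sigma^2)\big(f_\lambda(x_i)-\hat f_n(x_i)\big)^2}{n^2\beta}$$
and $E^*$ denotes expectation with respect to the outer probability $P^*$.
Proof. In view of Lemma 1(b), the conditional expectation of the random variable $\zeta_i$ given $\xi$ vanishes. Therefore, with the notation of the proof of Theorem 1, we get $E(\|\hat f_n-f\|_n^2)=S+S_1$. Using Lemma 1(a) and acting exactly as in the proof of Theorem 1, we get that $S$ is bounded as in (9). Finally, as shown in the proof of Theorem 1, the term $S_1$ satisfies
$$S_1\le\beta E\log\int_\Lambda\exp\Big(\frac{\|\hat f_n-f\|_n^2-\|f_\lambda-f\|_n^2}{\beta}\Big)E_\xi\big(e^{2\zeta^\top(h_\lambda-H)/\beta}\big)\,\theta_\lambda\,\pi(d\lambda).$$
According to Lemma 1(c),
$$E_\xi\big(e^{2\zeta^\top(h_\lambda-H)/\beta}\big)\le\exp\Big(\sum_{i=1}^n\frac{4(n+1)\big(f_\lambda(x_i)-\hat f_n(x_i)\big)^2\xi_i^2}{n^2\beta^2}\Big).$$
Therefore, $S_1\le S_2+R_n$, where
$$S_2=\beta E\log\int_\Lambda\exp\Big(\frac{\|\hat f_n-f\|_n^2-\|f_\lambda-f\|_n^2}{\beta}+\frac{4(n+1)\bar\sigma^2\|f_\lambda-\hat f_n\|_n^2}{n\beta^2}\Big)\theta_\lambda\,\pi(d\lambda).$$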

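Finally, we apply Lemma 3 with $s^2=4(n+1)\bar\sigma^2$ and Jensen's inequality to get that $S_2\le0$.
Corollary 1. Let the assumptions of Theorem 3 be satisfied and let $|\xi_i|\le B$ almost surely, where $B$ is a finite constant. Then the aggregate $\hat f_n$ satisfies inequality (5) for any $\beta\ge4B^2(1+1/n)+2L^2$.
Proof. It suffices to note that for $\bar\sigma^2=B^2$ we get $R_n\le0$.
Corollary 2. Let the assumptions of Theorem 3 be satisfied and suppose that $E(e^{t|\xi_i|^\kappa})\le B$ for some finite constants $t>0$, $\kappa>0$, $B>0$. Then for any $n\ge e^{2/\kappa}$ and any $\beta\ge4(1+1/n)\big(2(\log n)/t\big)^{2/\kappa}+2L^2$ we have
$$E\|\hat f_n-f\|_n^2\le\int_\Lambda\|f_\lambda-f\|_n^2\,p(d\lambda)+\frac{\beta K(p,\pi)}{n+1}+\frac{16BL^2(n+1)(2\log n)^{2/\kappa}}{n^2\beta\,t^{2/\kappa}},\qquad\forall p\in P_\Lambda.$$
In particular, if $\Lambda=\{1,\dots,M\}$ and $\pi$ is the uniform measure on $\Lambda$, we get
$$E\|\hat f_n-f\|_n^2\le\min_{j=1,\dots,M}\|f_j-f\|_n^2+\frac{\beta\log M}{n+1}+\frac{16BL^2(n+1)(2\log n)^{2/\kappa}}{n^2\beta\,t^{2/\kappa}}. \qquad (13)$$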
Proof. Set $\bar\sigma=\big(2(\log n)/t\big)^{1/\kappa}$ and note that
$$R_n\le\frac{4(n+1)}{n^2\beta}\sup_{\mu,\mu'\in P_\Lambda}n\|f_\mu-f_{\mu'}\|_n^2\sum_{i=1}^nE(\xi_i^2-\bar\sigma^2)_+\le\frac{16L^2(n+1)}{\beta}E(\xi_1^2-\bar\sigma^2)_+,$$
where $a_+=\max(0,a)$. For any $x\ge(2/(\kappa t))^{1/\kappa}$ the function $x\mapsto x^2e^{-tx^\kappa}$ is decreasing. Therefore, for any $n\ge e^{2/\kappa}$ we have $x^2e^{-tx^\kappa}\le\bar\sigma^2e^{-t\bar\sigma^\kappa}=\bar\sigma^2/n^2$ as soon as $x\ge\bar\sigma$. Hence, $E(\xi_1^2-\bar\sigma^2)_+\le B\bar\sigma^2/n^2$ and the desired inequality follows.
Remark. Corollary 2 shows that if the tails of the errors have exponential decay and $\beta$ is of the order $(\log n)^{2/\kappa}$, which minimizes the remainder term, then the rate of convergence in the oracle inequality (13) is of the order $(\log n)^{2/\kappa}(\log M)/n$. In the case $\kappa=1$, comparing our result with the risk bound obtained in [13] for the averaged algorithm in random design regression, we see that an extra $\log n$ multiplier appears. We conjecture that this deterioration is due to the technique of the proof and can probably be removed.
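The tail computation in the proof of Corollary 2 can be illustrated for standard Laplace errors with $\kappa=1$ and $t=1/2$. In that case $E e^{t|\xi|}=1/(1-t)=2$, and $E(\xi^2-s^2)_+=e^{-s}(2s+2)$ in closed form. These distributional choices are assumptions made only for this sketch, which checks the monotonicity claim and the final estimate $E(\xi_1^2-\bar\sigma^2)_+\le B\bar\sigma^2/n^2$.

```python
import math


def sigma_bar(n, t, kappa):
    """Truncation level from the proof of Corollary 2: (2 log n / t)^(1/kappa)."""
    return (2 * math.log(n) / t) ** (1 / kappa)


def g(x, t, kappa):
    """x^2 exp(-t x^kappa), decreasing for x >= (2/(kappa t))^(1/kappa)."""
    return x * x * math.exp(-t * x ** kappa)


def laplace_excess(s):
    """E(xi^2 - s^2)_+ for a standard Laplace xi (density exp(-|x|)/2), s >= 0."""
    return math.exp(-s) * (2 * s + 2)


t, kappa = 0.5, 1.0
B = 1 / (1 - t)  # E exp(t|xi|) for a standard Laplace xi, valid for t < 1

# monotonicity of g beyond the threshold (2/(kappa t))^(1/kappa)
x0 = (2 / (kappa * t)) ** (1 / kappa)
grid = [x0 + 0.01 * k for k in range(5000)]
assert all(g(b, t, kappa) <= g(a, t, kappa) for a, b in zip(grid, grid[1:]))

# tail estimate of the proof: E(xi^2 - sigma_bar^2)_+ <= B sigma_bar^2 / n^2
n_min = math.ceil(math.exp(2 / kappa))
for n in range(n_min, 2001):
    s = sigma_bar(n, t, kappa)
    assert math.isclose(g(s, t, kappa), s * s / n**2, rel_tol=1e-9)
    assert laplace_excess(s) <= B * s * s / n**2
```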

Sparsity oracle inequality
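Let $\phi_1,\dots,\phi_M$ be some functions from $\mathcal X$ to $\mathbb R$. Consider the case where $\Lambda\subseteq\mathbb R^M$ and $f_\lambda=\sum_j\lambda_j\phi_j$, $\lambda=(\lambda_1,\dots,\lambda_M)$. For $\lambda\in\mathbb R^M$ denote by $J(\lambda)$ the set of indices $j$ such that $\lambda_j\ne0$, and set $M(\lambda)\triangleq\mathrm{Card}(J(\lambda))$. For any $\tau>0$ and $0<L_0\le\infty$, define the probability densities
$$q_0(t)=\frac{3}{2(1+|t|)^4},\qquad t\in\mathbb R,$$
$$q(\lambda)=\frac{1}{C_0}\prod_{j=1}^M\tau^{-1}q_0(\lambda_j/\tau)\,\mathbb 1(\|\lambda\|\le L_0),\qquad\lambda\in\mathbb R^M,$$
where $C_0=C_0(\tau,M,L_0)$ is the normalizing constant and $\|\cdot\|$ stands for the Euclidean norm of $\mathbb R^M$. Sparsity oracle inequalities (SOI) are oracle inequalities bounding the risk in terms of the sparsity index $M(\lambda)$ or similar characteristics. The next theorem provides a general tool to derive SOI from the PAC-Bayesian bound (5). Note that in this theorem $\hat f_n$ is not necessarily defined by (2). It can be any procedure satisfying (5).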
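The proof of Theorem 4 uses three properties of $q_0$:
- unit mass;
- unit second moment;
- mass $1-(1+a)^{-3}$ on $[-a,a]$.

The sketch below checks them by quadrature. The midpoint rule and the substitution $t=u/(1-u)$ are implementation choices for this illustration, not part of the paper.

```python
import math


def q0(t):
    """Heavy-tailed density q_0(t) = 3 / (2 (1 + |t|)^4)."""
    return 3.0 / (2.0 * (1.0 + abs(t)) ** 4)


def half_line_integral(h, steps=20000):
    """Midpoint-rule approximation of the integral of h over [0, inf),
    after the substitution t = u / (1 - u), dt = du / (1 - u)^2."""
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        total += h(u / (1.0 - u)) / (1.0 - u) ** 2
    return total / steps


def interval_integral(h, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(h(lo + (k + 0.5) * width) for k in range(steps))


def central_mass(a):
    """Closed form of the integral of q_0 over [-a, a]: 1 - (1 + a)^(-3)."""
    return 1.0 - (1.0 + a) ** -3


mass = 2 * half_line_integral(q0)                          # should be 1
second = 2 * half_line_integral(lambda t: t * t * q0(t))   # should be 1
assert abs(mass - 1) < 1e-8 and abs(second - 1) < 1e-8
for a in (0.5, 1.0, 3.0, 10.0):
    assert abs(2 * interval_integral(q0, 0.0, a) - central_mass(a)) < 1e-6
```

After the substitution, both half-line integrands become polynomials in $u$. This is why the midpoint rule is very accurate here.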
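Theorem 4. Let $\hat f_n$ satisfy (5) with $\pi(d\lambda)=q(\lambda)\,d\lambda$ and $\tau\le\delta L_0/\sqrt M$, where $0<L_0\le\infty$ and $0<\delta<1$. Assume that $\Lambda$ contains the ball $\{\lambda\in\mathbb R^M:\|\lambda\|\le L_0\}$. Then for all $\lambda^*$ such that $\|\lambda^*\|\le(1-\delta)L_0$ we have
$$E\|\hat f_n-f\|_n^2\le\|f_{\lambda^*}-f\|_n^2+\frac{4\beta}{n+1}\sum_{j\in J(\lambda^*)}\log\big(1+\tau^{-1}|\lambda_j^*|\big)+R(M,\tau,L_0,\beta),$$
where the residual term is
$$R(M,\tau,L_0,\beta)=\tau^2e^{2\tau^3M^{5/2}(\delta L_0)^{-3}}\sum_{j=1}^M\|\phi_j\|_n^2+\frac{2\beta\tau^3M^{5/2}}{(n+1)\delta^3L_0^3}$$
for $L_0<\infty$, and $R(M,\tau,\infty,\beta)=\tau^2\sum_{j=1}^M\|\phi_j\|_n^2$.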
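Proof. We apply (5) with $p(d\lambda)=C_{\lambda^*}\,q(\lambda-\lambda^*)\,\mathbb 1(\|\lambda-\lambda^*\|\le\delta L_0)\,d\lambda$, where $C_{\lambda^*}$ is the normalizing constant. Using the symmetry of $q$ and the fact that $f_\lambda-f_{\lambda^*}=f_{\lambda-\lambda^*}=-f_{\lambda^*-\lambda}$, we get
$$\int_\Lambda\langle f_{\lambda^*}-f,\,f_\lambda-f_{\lambda^*}\rangle_n\,p(d\lambda)=C_{\lambda^*}\int_{\|w\|\le\delta L_0}\langle f_{\lambda^*}-f,\,f_w\rangle_n\,q(w)\,dw=0.$$
Therefore $\int_\Lambda\|f_\lambda-f\|_n^2\,p(d\lambda)=\|f_{\lambda^*}-f\|_n^2+\int_\Lambda\|f_\lambda-f_{\lambda^*}\|_n^2\,p(d\lambda)$. On the other hand, we bound the indicator $\mathbb 1(\|\lambda-\lambda^*\|\le\delta L_0)$ by one and note that the cross terms vanish by the symmetry of $q_0$. Using the identities $\int_{\mathbb R}q_0(t)\,dt=\int_{\mathbb R}t^2q_0(t)\,dt=1$, we obtain
$$\int_\Lambda\|f_\lambda-f_{\lambda^*}\|_n^2\,p(d\lambda)\le\frac{C_{\lambda^*}}{C_0}\sum_{j=1}^M\|\phi_j\|_n^2\int_{\mathbb R}w_j^2\,\tau^{-1}q_0(w_j/\tau)\,dw_j=\frac{C_{\lambda^*}\tau^2}{C_0}\sum_{j=1}^M\|\phi_j\|_n^2.$$
Since $1-x\ge e^{-2x}$ for all $x\in[0,1/2]$, we get
$$\frac{C_0}{C_{\lambda^*}}=\int_{\|w\|\le\delta L_0}\prod_{j=1}^M\tau^{-1}q_0(w_j/\tau)\,dw\ge\prod_{j=1}^M\int_{|w_j|\le\delta L_0/\sqrt M}\tau^{-1}q_0(w_j/\tau)\,dw_j=\Big(1-\int_{\delta L_0/(\tau\sqrt M)}^\infty\frac{3\,dt}{(1+t)^4}\Big)^M$$
$$=\Big(1-\frac{1}{(1+\delta L_0\tau^{-1}M^{-1/2})^3}\Big)^M\ge\exp\Big(-\frac{2M}{(1+\delta L_0\tau^{-1}M^{-1/2})^3}\Big)\ge\exp\big(-2\tau^3M^{5/2}(\delta L_0)^{-3}\big).$$
On the other hand, in view of the inequality $1+|\lambda_j/\tau|\le(1+|\lambda_j^*/\tau|)(1+|\lambda_j-\lambda_j^*|/\tau)$, the Kullback-Leibler divergence between $p$ and $\pi$ is bounded as follows:
$$K(p,\pi)=\int_\Lambda\log\Big(\frac{C_{\lambda^*}\,q(\lambda-\lambda^*)}{q(\lambda)}\Big)p(d\lambda)\le4\sum_{j=1}^M\log\big(1+|\tau^{-1}\lambda_j^*|\big)+\log C_{\lambda^*}.$$
Easy computation yields $C_0\le1$. Therefore $C_{\lambda^*}\le C_{\lambda^*}/C_0\le\exp\big(2\tau^3M^{5/2}(\delta L_0)^{-3}\big)$, and the desired result follows.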
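The chain of bounds on $C_{\lambda^*}/C_0$ can be checked numerically on a grid of parameters satisfying $\tau\le\delta L_0/\sqrt M$. Working with logarithms avoids overflow for large $M$. The grid values are arbitrary, and the function names are hypothetical.

```python
import math


def log_cube_bound(M, tau, L0, delta):
    """log of (1 - (1 + a)^-3)^-M with a = delta*L0/(tau*sqrt(M)): the bound on
    log(C_{lambda*}/C_0) obtained by shrinking the ball to a cube."""
    a = delta * L0 / (tau * math.sqrt(M))
    return -M * math.log1p(-(1.0 + a) ** -3)


def log_final_bound(M, tau, L0, delta):
    """2 tau^3 M^{5/2} / (delta L0)^3, the exponent appearing in R(M, tau, L0, beta)."""
    return 2 * tau**3 * M**2.5 / (delta * L0) ** 3


for M in (1, 2, 5, 10, 100, 1000, 10**5):
    for delta in (0.1, 0.5, 0.9):
        for L0 in (0.5, 1.0, 10.0):
            for shrink in (1.0, 0.5, 0.1):
                tau = shrink * delta * L0 / math.sqrt(M)  # respects tau <= delta*L0/sqrt(M)
                assert log_cube_bound(M, tau, L0, delta) <= log_final_bound(M, tau, L0, delta) * (1 + 1e-12)

# the elementary inequality 1 - x >= exp(-2x) on [0, 1/2]
assert all(1 - x >= math.exp(-2 * x) for x in (k / 1000 * 0.5 for k in range(1001)))
```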
We now discuss a consequence of the obtained inequality in the case where the errors are Gaussian. Let us denote by $\Phi$ the Gram matrix associated with the family $(\phi_j)_{j=1,\dots,M}$, i.e., the $M\times M$ matrix with entries $\Phi_{j,j'}=n^{-1}\sum_{i=1}^n\phi_j(x_i)\phi_{j'}(x_i)$ for every $j,j'\in\{1,\dots,M\}$. We denote by $\lambda_{\max}(\Phi)$ the maximal eigenvalue of $\Phi$. In what follows, for every $x>0$, we write $\log_+x=(\log x)_+$.
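The proof of Corollary 3 uses two facts about the Gram matrix $\Phi$: $\|f_\lambda\|_n^2=\lambda^\top\Phi\lambda$, and $\mathrm{Tr}(\Phi)=\sum_j\|\phi_j\|_n^2$. The following sketch computes $\Phi$ for a hypothetical dictionary (cosines on an equispaced design) and checks both facts. The eigenvalue $\lambda_{\max}(\Phi)$ itself could be obtained with a linear-algebra routine such as `numpy.linalg.eigvalsh`. It is not used here to keep the example dependency-free. Instead the sketch uses the cruder bound $\lambda_{\max}(\Phi)\le\mathrm{Tr}(\Phi)$, valid for positive semidefinite matrices.

```python
import math


def gram(phis, xs):
    """Gram matrix Phi[j][k] = (1/n) * sum_i phi_j(x_i) * phi_k(x_i)."""
    n = len(xs)
    vals = [[phi(x) for x in xs] for phi in phis]
    return [[sum(vj[i] * vk[i] for i in range(n)) / n for vk in vals] for vj in vals]


def sq_norm_n(values):
    """||g||_n^2 = (1/n) * sum_i g(x_i)^2 for the vector of values g(x_i)."""
    return sum(v * v for v in values) / len(values)


def quad_form(A, v):
    """v^T A v."""
    return sum(v[j] * A[j][k] * v[k] for j in range(len(v)) for k in range(len(v)))


# Hypothetical dictionary: cosines on an equispaced design in [0, 1].
n, M = 50, 4
xs = [(i + 0.5) / n for i in range(n)]
phis = [lambda x, j=j: math.cos(math.pi * j * x) for j in range(M)]
Phi = gram(phis, xs)

lam = [0.7, -1.2, 0.0, 2.5]
f_lam = [sum(l * phi(x) for l, phi in zip(lam, phis)) for x in xs]

# ||f_lambda||_n^2 = lambda^T Phi lambda, and trace(Phi) = sum_j ||phi_j||_n^2
assert math.isclose(sq_norm_n(f_lam), quad_form(Phi, lam), rel_tol=1e-12)
trace = sum(Phi[j][j] for j in range(M))
assert math.isclose(trace, sum(sq_norm_n([phi(x) for x in xs]) for phi in phis))
# lambda_max(Phi) <= trace(Phi) for a PSD matrix, hence ||f_lambda||_n^2 <= trace * ||lambda||^2
assert quad_form(Phi, lam) <= trace * sum(l * l for l in lam)
```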
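Corollary 3. Let $\hat f_n$ be defined by (2) with $\pi(d\lambda)=q(\lambda)\,d\lambda$ and let $\tau=\delta L_0/(M\sqrt n)$ with $0<L_0<\infty$, $0<\delta<1$. Let $\xi_i$ be i.i.d. Gaussian $N(0,\sigma^2)$ with $\sigma^2>0$, $\lambda_{\max}(\Phi)\le K^2$, $\|f\|_n\le\bar L$, and let $\beta\ge(4+2n^{-1})\sigma^2+2L^2$ with $L=\bar L+L_0K$. Then for all $\lambda^*\in\mathbb R^M$ such that $\|\lambda^*\|\le(1-\delta)L_0$ we have
$$E\|\hat f_n-f\|_n^2\le\|f_{\lambda^*}-f\|_n^2+\frac{4\beta}{n+1}\Big\{M(\lambda^*)\Big(1+\log_+\frac{M\sqrt n}{\delta L_0}\Big)+\sum_{j\in J(\lambda^*)}\log_+|\lambda_j^*|\Big\}+\frac{C}{nM^{1/2}\min(M^{1/2},n^{3/2})},$$
where $C$ is a positive constant independent of $n$, $M$ and $\lambda^*$.
Proof. We apply Theorem 4 with $\Lambda=\{\lambda\in\mathbb R^M:\|\lambda\|\le L_0\}$. We need to check that $\hat f_n$ satisfies (5). This is indeed the case in view of Proposition 1 and the inequalities $\|f-f_\lambda\|_n\le\|f\|_n+\|f_\lambda\|_n\le\bar L+K\|\lambda\|\le L$. Thus we have
$$E\|\hat f_n-f\|_n^2\le\|f_{\lambda^*}-f\|_n^2+\frac{4\beta}{n+1}\sum_{j\in J(\lambda^*)}\log\big(1+\tau^{-1}|\lambda_j^*|\big)+R(M,\tau,L_0,\beta)$$
with $R(M,\tau,L_0,\beta)$ as in Theorem 4. One easily checks that $\log(1+\tau^{-1}|\lambda_j^*|)\le1+\log_+(\tau^{-1}|\lambda_j^*|)\le1+\log_+(\tau^{-1})+\log_+(|\lambda_j^*|)$. Hence, the desired inequality follows from
$$R(M,\tau,L_0,\beta)=\frac{(\delta L_0)^2}{M^2n}\,e^{\frac{2M^{5/2}}{M^3n^{3/2}}}\sum_{j=1}^M\|\phi_j\|_n^2+\frac{2\beta M^{5/2}}{(n+1)M^3n^{3/2}}\le\frac{(\delta L_0)^2MK^2e^2}{M^2n}+\frac{2\beta}{(n+1)M^{1/2}n^{3/2}}\le\frac{C}{nM^{1/2}\min(M^{1/2},n^{3/2})}.$$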
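The final bound in the proof of Corollary 3 can be traced numerically. Take $\tau=\delta L_0/(M\sqrt n)$ and use $\sum_j\|\phi_j\|_n^2=\mathrm{Tr}(\Phi)\le MK^2$. Then one admissible constant is $C=(\delta L_0)^2K^2e^2+2\beta$. This is our choice for the sketch; the text only asserts that some such $C$ exists. The parameter values are illustrative. The last line checks the elementary inequality $\log(1+x)\le1+\log_+x$ used above.

```python
import math


def log_plus(x):
    """log_+ x = max(log x, 0) for x > 0."""
    return max(math.log(x), 0.0)


def residual_R(M, tau, L0, delta, beta, n, sum_phi_sq):
    """R(M, tau, L0, beta) of Theorem 4 for finite L0."""
    expo = 2 * tau**3 * M**2.5 / (delta * L0) ** 3
    return (tau**2 * math.exp(expo) * sum_phi_sq
            + 2 * beta * tau**3 * M**2.5 / ((n + 1) * (delta * L0) ** 3))


def corollary3_rate(M, n, C):
    """C / (n M^{1/2} min(M^{1/2}, n^{3/2}))."""
    return C / (n * math.sqrt(M) * min(math.sqrt(M), n**1.5))


delta, L0, K, beta = 0.5, 3.0, 2.0, 10.0               # illustrative values only
C = (delta * L0) ** 2 * K**2 * math.e**2 + 2 * beta    # one admissible constant
for M in (1, 3, 10, 100, 10**4):
    for n in (1, 5, 50, 1000):
        tau = delta * L0 / (M * math.sqrt(n))          # choice of tau in Corollary 3
        R = residual_R(M, tau, L0, delta, beta, n, sum_phi_sq=M * K**2)  # worst case M K^2
        assert R <= corollary3_rate(M, n, C) * (1 + 1e-12)

# log(1 + x) <= 1 + log_+ x, used to pass from Theorem 4 to Corollary 3
assert all(math.log1p(x) <= 1 + log_plus(x) for x in (10 ** (k / 10) for k in range(-60, 61)))
```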
Remark. The result of Corollary 3 can be compared with the SOI obtained for other procedures [5–7]. These papers impose heavy restrictions on the Gram matrix $\Phi$, either in terms of the coherence introduced in [12] or in terms of analogous local characteristics. Our result is not of that kind: we only need the maximal eigenvalue of $\Phi$ to be bounded. On the other hand, we assume that the oracle vector $\lambda^*$ belongs to a ball of radius $<L_0$ in $\ell_2$ with known $L_0$. This assumption is not very restrictive in the sense that the $\ell_2$ constraint is weaker than the $\ell_1$ constraint that is frequently imposed. Moreover, the structure of our oracle inequality is such that we can consider slowly growing $L_0$ without seriously damaging the result.

Appendix

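Lemma 2. For any $x\in\mathbb R$ and any $\alpha>0$,
$$x+\log\Big(1+\frac{e^{-\alpha x}-1}{\alpha}\Big)\le\frac{\alpha x^2}{2}.$$
Proof. On the interval $(-\infty,0]$, the function $x\mapsto x+\log\big(1+\alpha^{-1}(e^{-\alpha x}-1)\big)$ is increasing; therefore it is bounded by its value at 0, that is, by 0. For positive values of $x$, we combine the inequalities $e^{-y}\le1-y+y^2/2$ (with $y=\alpha x$) and $\log(1+y)\le y$ (with $y=\alpha^{-1}(e^{-\alpha x}-1)$).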
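Lemma 3. For any $\beta\ge s^2/n+2\sup_{\lambda\in\Lambda}\|f-f_\lambda\|_n^2$ and for every $\mu'\in P_\Lambda$, the function
$$\mu\mapsto\exp\Big(\frac{s^2\|f_\mu-f_{\mu'}\|_n^2}{n\beta^2}-\frac{\|f_\mu-f\|_n^2}{\beta}\Big)$$
is concave.
Proof. Consider first the case where $\mathrm{Card}(\Lambda)=m<\infty$. Then every element of $P_\Lambda$ can be viewed as a vector from $\mathbb R^m$. Set
$$Q(\mu)=(1-\gamma)\|f_\mu-f\|_n^2+2\gamma\langle f_\mu-f,\,f_{\mu'}-f\rangle_n-\gamma\|f_{\mu'}-f\|_n^2=(1-\gamma)\mu^\top H_n^\top H_n\mu+2\gamma\mu^\top H_n^\top H_n\mu'-\gamma\mu'^\top H_n^\top H_n\mu',$$
where $\gamma=s^2/(n\beta)$ and $H_n$ is the $n\times m$ matrix with entries $(f_\lambda(x_i)-f(x_i))/\sqrt n$. The statement of the lemma is equivalent to the concavity of $e^{-Q(\mu)/\beta}$ as a function of $\mu\in P_\Lambda$. This holds if and only if the matrix $\beta\nabla^2Q(\mu)-\nabla Q(\mu)\nabla Q(\mu)^\top$ is positive semidefinite. Simple algebra shows that $\nabla^2Q(\mu)=2(1-\gamma)H_n^\top H_n$ and $\nabla Q(\mu)=2H_n^\top[(1-\gamma)H_n\mu+\gamma H_n\mu']$. Therefore, $\nabla Q(\mu)\nabla Q(\mu)^\top=H_n^\top\mathcal M H_n$, where $\mathcal M=4H_n\nu\nu^\top H_n^\top$ with $\nu=(1-\gamma)\mu+\gamma\mu'$. Under our assumptions, $\beta$ is larger than $s^2/n$, which ensures that $\nu\in P_\Lambda$. Clearly, $\mathcal M$ is a symmetric positive semidefinite matrix. Moreover,
$$\lambda_{\max}(\mathcal M)\le\mathrm{Tr}(\mathcal M)=4\|H_n\nu\|^2=\frac4n\sum_{i=1}^n\Big(\sum_{\lambda\in\Lambda}\nu_\lambda\big(f_\lambda(x_i)-f(x_i)\big)\Big)^2\le\frac4n\sum_{i=1}^n\sum_{\lambda\in\Lambda}\nu_\lambda\big(f_\lambda(x_i)-f(x_i)\big)^2=4\sum_{\lambda\in\Lambda}\nu_\lambda\|f_\lambda-f\|_n^2\le4\max_{\lambda\in\Lambda}\|f_\lambda-f\|_n^2,$$
where $\lambda_{\max}(\mathcal M)$ is the largest eigenvalue of $\mathcal M$ and $\mathrm{Tr}(\mathcal M)$ is its trace. This estimate yields the matrix inequality
$$\nabla Q(\mu)\nabla Q(\mu)^\top\preceq4\max_{\lambda\in\Lambda}\|f_\lambda-f\|_n^2\,H_n^\top H_n.$$
Hence, the function $e^{-Q(\mu)/\beta}$ is concave as soon as $4\max_\lambda\|f_\lambda-f\|_n^2\le2\beta(1-\gamma)$. The last inequality holds for every $\beta\ge n^{-1}s^2+2\max_\lambda\|f_\lambda-f\|_n^2$.
The general case can be reduced to the case of finite $\Lambda$ as follows. The concavity of the functional
$$G(\mu)=\exp\Big(\frac{s^2\|f_\mu-f_{\mu'}\|_n^2}{n\beta^2}-\frac{\|f_\mu-f\|_n^2}{\beta}\Big)$$
is equivalent to the validity of the inequality
$$G\Big(\frac{\mu+\tilde\mu}{2}\Big)\ge\frac{G(\mu)+G(\tilde\mu)}{2},\qquad\forall\mu,\tilde\mu\in P_\Lambda. \qquad (14)$$
Fix now arbitrary $\mu,\tilde\mu\in P_\Lambda$. Take $\tilde\Lambda=\{1,2,3\}$ and consider the set of functions $\{\tilde f_\lambda,\lambda\in\tilde\Lambda\}=\{f_\mu,f_{\tilde\mu},f_{\mu'}\}$. Since $\tilde\Lambda$ is finite, the first part of the proof applies. Hence the functional
$$\tilde G(\nu)=\exp\Big(\frac{s^2\|\tilde f_\nu-f_{\mu'}\|_n^2}{n\beta^2}-\frac{\|\tilde f_\nu-f\|_n^2}{\beta}\Big)$$
is concave on $P_{\tilde\Lambda}$ as soon as $\beta\ge s^2/n+2\max_{\lambda\in\tilde\Lambda}\|\tilde f_\lambda-f\|_n^2$, and therefore for every $\beta\ge s^2/n+2\sup_{\lambda\in\Lambda}\|f_\lambda-f\|_n^2$ as well. (Indeed, by Jensen's inequality, for any measure $\mu\in P_\Lambda$ we have $\|f_\mu-f\|_n^2\le\int\|f_\lambda-f\|_n^2\,\mu(d\lambda)\le\sup_\lambda\|f_\lambda-f\|_n^2$.) This leads to
$$\tilde G\Big(\frac{\nu+\tilde\nu}{2}\Big)\ge\frac{\tilde G(\nu)+\tilde G(\tilde\nu)}{2},\qquad\forall\nu,\tilde\nu\in P_{\tilde\Lambda}.$$
Taking here the Dirac measures $\nu$ and $\tilde\nu$ defined by $\nu(\lambda=j)=\mathbb 1(j=1)$ and $\tilde\nu(\lambda=j)=\mathbb 1(j=2)$, $j=1,2,3$, we arrive at (14). This completes the proof of the lemma.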
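For finite $\Lambda$, Lemma 3 can be sanity-checked by testing the midpoint inequality (14) at the threshold value $\beta=s^2/n+2\max_\lambda\|f_\lambda-f\|_n^2$. The dictionary values, the target $f$ and $s^2$ below are random or arbitrary, i.e., hypothetical. The check is illustrative and does not replace the proof.

```python
import math
import random


def sq_norm_n(u):
    """||u||_n^2 = (1/n) * sum_i u_i^2."""
    return sum(x * x for x in u) / len(u)


def mix(mu, F):
    """f_mu = sum_lambda mu_lambda f_lambda, evaluated at x_1..x_n."""
    return [sum(w * fl[i] for w, fl in zip(mu, F)) for i in range(len(F[0]))]


def G(mu, mu_prime, F, f, s2, beta):
    """exp(s^2 ||f_mu - f_mu'||_n^2 / (n beta^2) - ||f_mu - f||_n^2 / beta)."""
    n = len(f)
    fm, fp = mix(mu, F), mix(mu_prime, F)
    a = sq_norm_n([x - y for x, y in zip(fm, fp)])
    b = sq_norm_n([x - y for x, y in zip(fm, f)])
    return math.exp(s2 * a / (n * beta**2) - b / beta)


def random_simplex(m, rng):
    """A random point of the probability simplex in R^m."""
    w = [rng.expovariate(1.0) for _ in range(m)]
    z = sum(w)
    return [x / z for x in w]


rng = random.Random(2)
n, m, s2 = 6, 4, 3.0
F = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # hypothetical f_lambda values
f = [rng.gauss(0.0, 1.0) for _ in range(n)]                       # hypothetical target f
beta = s2 / n + 2 * max(sq_norm_n([a - b for a, b in zip(fl, f)]) for fl in F)  # threshold of Lemma 3

for _ in range(2000):
    mu, nu, mu_p = (random_simplex(m, rng) for _ in range(3))
    mid = [(a + b) / 2 for a, b in zip(mu, nu)]
    lhs = G(mid, mu_p, F, f, s2, beta)
    rhs = (G(mu, mu_p, F, f, s2, beta) + G(nu, mu_p, F, f, s2, beta)) / 2
    assert lhs >= rhs * (1 - 1e-12), (lhs, rhs)
```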

References

1. Audibert, J.-Y.: Une approche PAC-bayésienne de la théorie statistique de l'apprentissage. PhD Thesis. University of Paris 6. 2004.
2. Audibert, J.-Y.: A randomized online learning algorithm for better variance control. Proceedings of the 19th Annual Conference on Learning Theory, COLT 2006. Lecture Notes in Artificial Intelligence 4005 (2006) 392–407. Springer-Verlag, Heidelberg.
3. Bunea, F. and Nobel, A.B.: Sequential Procedures for Aggregating Arbitrary Estimators of a Conditional Mean. Preprint Florida State University, 2005. www.stat.fsu.edu/flori
4. Bunea, F., Tsybakov, A.B. and Wegkamp, M.H.: Aggregation and sparsity via ℓ1-penalized least squares. Proceedings of the 19th Annual Conference on Learning Theory, COLT 2006. Lecture Notes in Artificial Intelligence 4005 (2006) 379–391. Springer-Verlag, Heidelberg.
5. Bunea, F., Tsybakov, A.B. and Wegkamp, M.H.: Aggregation for Gaussian regression. Annals of Statistics, to appear (2007). http://www.stat.fsu.edu/wegkamp
6. Bunea, F., Tsybakov, A.B. and Wegkamp, M.H.: Sparsity oracle inequalities for the Lasso, 2006. Submitted.
7. Candès, E. and Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, to appear (2007).
8. Catoni, O.: Universal aggregation rules with exact bias bounds. Preprint n.510, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7, 1999. http://www.proba.jussieu.fr/mathdoc/preprints/index.html#1999
9. Catoni, O.: Statistical Learning Theory and Stochastic Optimization. École d'été de Probabilités de Saint-Flour 2001, Lecture Notes in Mathematics. Springer, N.Y., 2004.
10. Cesa-Bianchi, N., Conconi, A. and Gentile, G.: On the generalization ability of online learning algorithms. IEEE Trans. on Information Theory 50 (2004) 2050–2057.
11. Cesa-Bianchi, N. and Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York, 2006.
12. Donoho, D.L., Elad, M. and Temlyakov, V.: Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise. IEEE Trans. on Information Theory 52 (2006) 6–18.
13. Juditsky, A., Rigollet, P. and Tsybakov, A.: Learning by mirror averaging. Preprint n.1034, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7, 2005. https://hal.ccsd.cnrs.fr/ccsd-00014097
14. Juditsky, A.B., Nazin, A.V., Tsybakov, A.B. and Vayatis, N.: Recursive aggregation of estimators via the Mirror Descent Algorithm with averaging. Problems of Information Transmission 41 (2005) 368–384.
15. Koltchinskii, V.: Sparsity in penalized empirical risk minimization, 2006. Submitted.
16. Leung, G. and Barron, A.: Information theory and mixing least-square regressions. IEEE Transactions on Information Theory 52 (2006) 3396–3410.
17. Littlestone, N. and Warmuth, M.K.: The weighted majority algorithm. Information and Computation 108 (1994) 212–261.
18. Obłój, J.: The Skorokhod embedding problem and its offspring. Probability Surveys 1 (2004) 321–392.
19. Petrov, V.V.: Limit Theorems of Probability Theory. Clarendon Press, Oxford, 1995.
20. Revuz, D. and Yor, M.: Continuous Martingales and Brownian Motion. Springer-Verlag, 1999.
21. Tsybakov, A.B.: Optimal rates of aggregation. Computational Learning Theory and Kernel Machines. B. Schölkopf and M. Warmuth, eds. Lecture Notes in Artificial Intelligence 2777 (2003) 303–313. Springer, Heidelberg.
22. Tsybakov, A.B.: Regularization, boosting and mirror averaging. Comments on "Regularization in Statistics", by P. Bickel and B. Li. Test 15 (2006) 303–310.
23. van de Geer, S.A.: High dimensional generalized linear models and the Lasso. Research report No. 133. Seminar für Statistik, ETH, Zürich, 2006.
24. Vovk, V.: Aggregating Strategies. In: Proceedings of the 3rd Annual Workshop on Computational Learning Theory, COLT 1990, CA: Morgan Kaufmann (1990), 371–386.
25. Vovk, V.: Competitive on-line statistics. International Statistical Review 69 (2001), 213–248.
26. Yang, Y.: Combining different procedures for adaptive regression. Journal of Multivariate Analysis 74 (2000) 135–161.
27. Yang, Y.: Adaptive regression by mixing. Journal of the American Statistical Association 96 (2001) 574–588.
28. Yang, Y.: Regression with multiple candidate models: selecting or mixing? Statist. Sinica 13 (2003) 783–809.
29. Zhang, T.: From epsilon-entropy to KL-complexity: analysis of minimum information complexity density estimation. Annals of Statistics (2007), to appear.
30. Zhang, T.: Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory (2007), to appear.

 

Tags

Txpf42G20S Office GP-1650D MP9500 Review Live HUB PT-AX100 Civilian Panda VA-10 VGP-BMS10 Optio E70 PVM-1342Q 15 2 Scenarist Snaptune ONE Cl2510TG Wrtp54G-NA CG400 5000 VA JBL L150 BQ-321 D7200 KTM MXC WD-1049C Lexmark X544 1 8 SA-HT680 GXT650VP1 B120AH APX 6 5500N ZEN-soft01-V4 Suunto M2 UA40C5000QM Faxjet 325 PSR-300-PSR-200 HL 10 HB38056H Nitro CIC RZ-23LZ55 Nexstar 5I EX-P700 Patrol 97280 Ericsson K320 Wl-357 MN-100-L- AQV09vban Pspb3 CDX-530 AEG-electrolux N24 Z5 FD Series SP2014N-KIT Vito CD Fidelio 4 KX-TS85 TX-930 DMR-EH595 Nokia 8910 KCA-BT100 SC-HT730 Newforce R1 Videostudio 9 CJ-N76CL 1002FX M-J310 MC115 KDL-20S2500 ES-8016 DSL-2640B SH-FX50 2005 HR7745 30PF9975 GH1000 58840 FDF105 Wl-322 HPI-6S Soundstream SVX4 HQ8240 EN108TP Optio L20 29PT8607 12 DS14DFL TX-28PK2E Mouse PFT3540B 7416CG Tower 1 SD-YD250 LE32B541p7W PSR-540 CQ-RDP200 For MAC Mobile FC8716 02 UX-45 67 HS 10

 


 
