BIC Model T-3
Documents

The ABC of Model Selection: AIC, BIC and the New CIC
Carlos C. Rodríguez
Department of Mathematics and Statistics, The University at Albany, SUNY, Albany, NY
Abstract. The geometric theory of ignorance [1] suggests new criteria for model selection. One example is to choose the model M minimizing
$$\mathrm{CIC} = -\sum_{i=1}^{N}\log \hat p(x_i) + \frac{d}{2}\log\frac{N}{2\pi} + \log V + \frac{R}{N\log(d+1)}$$
where $(x_1, \dots, x_N)$ is a sample of N iid observations, $\hat p \in M$ is the mle, $d = \dim(M)$ is the dimension of the model M, $V = \mathrm{Vol}(M)$ is its information volume and $R = \mathrm{Ricci}(M)$ is the Ricci scalar evaluated at the mle. I study the performance of CIC for the problem of segmentation of bit streams, defined as follows: find n from N iid samples of a complete dag of n bits. The CIC criterion outperforms AIC and BIC by orders of magnitude when n > 3 and is merely better for the cases n = 2, 3.
Keywords: Geometric Theory of Ignorance, Information Geometry, Model Selection, Segmentation of Bitstreams, Statistical Ignorance.
PACS: 02.50.Tt, 02.40.Ky
INTRODUCTION
Consider the following decision problem: given a finite sequence of bits, $x = b_1 b_2 \dots b_k$, choose one M among competing statistical models (i.e., explanations for x) $M_1, M_2, \dots$. For example, $M_j$ may explain x as $N = k/j$ independent chunks of j bits generated by a graphical model of j binary variables with a given structure but unspecified parameters. We allow the models $M_j$ to have different dimensions for both the data and the parameter spaces for different values of j. This is a standard decision problem requiring a loss function and a prior for its solution. The CIC formula defined in the abstract is an approximation to the Bayes rule for 0-1 loss and uniform priors. By uniform priors we mean that the bits are generated by first choosing M uniformly at random among the available $M_j$'s, followed by a random choice of a probability distribution $p \in M$, and finally producing $x = x_1 x_2 \dots x_N$ as a random sample of size N from p. The first three terms of CIC are easy to obtain. Under 0-1 loss the Bayes action is the mode of the posterior distribution, so we only need to search for the model M with the highest posterior probability $P(M \mid x)$. By Bayes' theorem,
$$P(M \mid x) = \frac{P(M)}{P(x)} \int_M p(x)\,\frac{dV}{V}$$
August 4, 2005
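where dV is the information volume element in M and $V = \mathrm{Vol}(M)$ is the total volume of M. Taking logs, noticing that $P(M)/P(x)$ does not depend on M, and using a parameterization $M \subset \mathbb{R}^d$, we obtain,
$$\log P(M \mid x_1, \dots, x_N) = \log \int e^{N L_N(\theta)}\, dV(\theta) - \log V + C$$
where in the parameterization the volume element is $dV = \sqrt{\det I(\theta)}\, d\theta$ and the average log likelihood is $L_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} \log p(x_i \mid \theta)$. Here $I(\theta)$ is the Fisher information matrix at $\theta$. Expanding $L_N$ about the mle $\hat\theta$, noticing that $\nabla L_N(\hat\theta) = 0$ and that by the LLN (Law of Large Numbers) $-\nabla^2 L_N(\hat\theta) \to I(\hat\theta)$, we can write,
$$N L_N(\theta) = N L_N(\hat\theta) - \frac{N}{2}(\theta - \hat\theta)^T I(\hat\theta)(\theta - \hat\theta) + o(N|\theta - \hat\theta|^2)$$
Thus, the Bayes action (as $N \to \infty$) is the model that maximizes
$$N L_N(\hat\theta) + \frac{d}{2}\log\frac{2\pi}{N} - \log V$$
where the second term is obtained by noticing that, for large N, by the mean value theorem for integrals and the formulas for dV and the normalizing constant of a d-dimensional Gaussian,
$$\int e^{-\frac{N}{2}(\theta - \hat\theta)^T I(\hat\theta)(\theta - \hat\theta)}\, dV \approx |I(\hat\theta)|^{1/2} \left(\frac{2\pi}{N}\right)^{d/2} |I(\hat\theta)|^{-1/2} = \left(\frac{2\pi}{N}\right)^{d/2}$$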
The first three terms of CIC are then just a simple consequence of the large-sample properties of mles. The last term, involving the Ricci scalar R at the mle, was obtained semi-empirically by simulation, guided by the more rigorous analysis in [1].
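The Laplace argument above can be checked numerically in a case where the evidence integral has a closed form. The sketch below is our own illustration, not the author's code: it assumes a Bernoulli model (d = 1), whose Fisher information is $1/(\theta(1-\theta))$, so that $dV = d\theta/\sqrt{\theta(1-\theta)}$, $V = \pi$ and $R = 0$. With s ones in N trials, $\int p(x)\,dV/V = B(s+\frac12, N-s+\frac12)/\pi$, and the exact value of $-\log$ of this integral can be compared with the first three terms of CIC. The function names are hypothetical.

```python
import math


def exact_neg_log_evidence(s: int, n: int) -> float:
    """-log of the integral of p(x|theta) dV/V for n Bernoulli trials with s ones.

    With dV = dtheta / sqrt(theta * (1 - theta)) and V = pi, the integral equals
    Beta(s + 1/2, n - s + 1/2) / pi.
    """
    log_beta = math.lgamma(s + 0.5) + math.lgamma(n - s + 0.5) - math.lgamma(n + 1)
    return -log_beta + math.log(math.pi)


def cic_first_three_terms(s: int, n: int) -> float:
    """-N L_N(theta_hat) + (d/2) log(N / 2 pi) + log V with d = 1 and V = pi.

    Requires 0 < s < n so that the mle lies in the interior of the model.
    """
    theta_hat = s / n
    neg_loglik = -(s * math.log(theta_hat) + (n - s) * math.log(1 - theta_hat))
    return neg_loglik + 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        s = 3 * n // 10  # 30% ones
        print(n, exact_neg_log_evidence(s, n), cic_first_three_terms(s, n))
```

The two columns should agree more closely as N grows; by Stirling's formula the gap is of order 1/N in this example.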
TESTING CIC
To test the performance of CIC as a criterion for model selection we compared it with AIC (An Information Criterion, of Akaike [2]) and with BIC (the Bayesian Information Criterion of Schwarz [3]). All three criteria search, among a list of possible models for the data, for the one minimizing the AIC, BIC or CIC expressions defined by,
$$\mathrm{AIC} = -N L_N(\hat\theta) + d \tag{1}$$
$$\mathrm{BIC} = -N L_N(\hat\theta) + \frac{d}{2}\log N \tag{2}$$
$$\mathrm{CIC} = -N L_N(\hat\theta) + \frac{d}{2}\log\frac{N}{2\pi} + \log V + \frac{R}{N\log(d+1)} \tag{3}$$
A Note About the ABC ICs
AIC
Initially, Akaike justified AIC by maximum entropy. He asked: "If there are competing models with a, possibly different, number of free parameters, how should the sample $x^N$ be used to choose one of them?" He reasoned: if we knew the true distribution t, then we could take the model that maximizes the entropy relative to this t. Translation: solve $\arg\min_M I(t : M)$, where $I(t : M)$ is the Kullback distance from t to M. Let $p_M \in M$ be the I-projection of t onto M, i.e. $I(t : M) = I(t : p_M)$. Thus, AIC attempts to find,
$$M_a = \arg\min_M I(t : p_M) = \arg\min_M \left(-\int t(x)\log p_M(x)\,dx\right) = \arg\min_M \left(-E_t \log p_M(X)\right)$$
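The problem is that t and $p_M$ are unknown and need to be estimated from the data $x^N$. By the LLN and the consistency of the mle (i.e. $\hat p \to p_M$) we have,
$$E_t \log p_M(X) \approx E_t \log \hat p(X) \approx \frac{1}{N}\sum_{i=1}^{N} \log \hat p(X_i)$$
but this naïve mle estimate is biased for finite N. It could then be argued that, after collecting the data $x^N$, it is the term in the middle, $E_t \log \hat p$, that should quantify the loss, and not the original $E_t \log p_M$. The asymptotic bias (with respect to $E_t \log \hat p$) can be obtained by the following (tricky) considerations. By the (generalized) Pythagoras theorem (see [4]) we have $I(t : p_M) + I(p_M : \hat p) = I(t : \hat p)$, and rearranging terms we get,
$$E_t \log\frac{p_M(X)}{\hat p(X)} = I(p_M : \hat p).$$
Thus,
$$2N\, E_t \log\frac{p_M(X)}{\hat p(X)} \approx \left\|\sqrt{N}(\hat\theta - \theta_0)\right\|^2 \xrightarrow{d} \chi^2_d$$
where $\theta_0$ is the parameter associated to $p_M$, the norm is the one induced by the Fisher metric $I(\theta_0)$, and we have used the consistency and asymptotic normality of the mle (i.e., $\hat\theta \to \theta_0$ and $\sqrt{N}(\hat\theta - \theta_0) \to N(0, I^{-1}(\theta_0))$) to arrive at the asymptotic chi-square with d degrees of freedom. With this we can write (the term $\sum_i \log p_M(X_i) - N E_t \log p_M(X)$ has mean zero, so it drops out):
$$\lim E\left[\sum_{i=1}^{N} \log \hat p(X_i) - N E_t \log \hat p(X)\right] = \lim E\left[\sum_{i=1}^{N} \log\frac{\hat p(X_i)}{p_M(X_i)} + N E_t \log\frac{p_M(X)}{\hat p(X)}\right] = \frac{d}{2} + \frac{d}{2} = d.$$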
where we have used the fact that twice the sum involving the likelihood ratio (the first term in the bracket) converges in law to a chi-square with d degrees of freedom. Hence, AIC is just the naïve mle estimate corrected to be asymptotically unbiased as an estimator of $E_t \log \hat p$. These arguments try to justify the definition of AIC in (1). Nevertheless, I don't find AIC defensible as a general criterion for model selection.
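A small Monte Carlo experiment can illustrate the claim that the optimism $\sum_i \log\frac{\hat p(X_i)}{p_M(X_i)} + N E_t \log\frac{p_M(X)}{\hat p(X)}$ has expectation tending to d. The sketch assumes that M is the full multinomial family on k cells and that the true distribution t lies in M, so $p_M = t$ and $d = k - 1$; under these assumptions the first sum equals $N\,I(\hat p : t)$ and the second term equals $N\,I(t : \hat p)$. The distribution `t`, the sample size and the function name are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def mean_optimism(t, n_obs, reps, seed=0):
    """Monte Carlo mean of sum_i log(p_hat/p_M)(X_i) + N * E_t log(p_M/p_hat)(X).

    Assumes the model is the full multinomial on len(t) cells, so p_M = t.
    """
    rng = random.Random(seed)
    k = len(t)
    total = 0.0
    for _ in range(reps):
        counts = Counter(rng.choices(range(k), weights=t, k=n_obs))
        p_hat = [counts[j] / n_obs for j in range(k)]  # mle of the cell probabilities
        # Log-likelihood ratio sum_i log(p_hat(X_i) / t(X_i)); empty cells contribute nothing.
        llr = sum(counts[j] * math.log(p_hat[j] / t[j]) for j in range(k) if counts[j])
        # N * E_t log(t(X) / p_hat(X)); requires every cell to be observed at least once.
        kl_term = n_obs * sum(t[j] * math.log(t[j] / p_hat[j]) for j in range(k))
        total += llr + kl_term
    return total / reps


if __name__ == "__main__":
    t = [0.1, 0.2, 0.3, 0.4]  # hypothetical true distribution, d = 3
    print(mean_optimism(t, n_obs=500, reps=1000))
```

With N = 500 and 1000 repetitions the average should land near d = 3. The code divides by each $\hat p_j$, so it assumes every cell is observed at least once, which is overwhelmingly likely for these settings.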
BIC and CIC
The BIC of Schwarz can be obtained by following the derivation of the first three terms of CIC given in the introduction of this paper. However, instead of the uniform prior on M, use a fixed, arbitrary, positive prior on M and neglect the terms of order $N^0 = 1$ to arrive at (2). The problem with BIC is that the neglected terms, involving the volume and the curvature of M, can become the leading terms. In fact, this happens for the important class of multinomial models studied in this paper.
The Simulations
We repeatedly played (100 repetitions per sample size) the standard game of generating a sample of size N from a chosen true distribution. Then, acting as if we didn't know this true distribution, we let AIC, BIC, and CIC guess a model for the simulated data and counted the proportion of correct guesses for each criterion. The underlying true distributions were chosen from the set $\{M_2, M_3, \dots, M_9\}$, where $M_n$ is the complete dag of n binary variables. The observed sequence of bits was created by concatenating a random sample of size N from $M_n$ with random values for the parameters. The dimension d, volume V, and scalar curvature R for the complete dag $M_n$ were computed in [5] as,
$$d = 2^n - 1 \tag{4}$$
$$V = \frac{\pi^k}{(k-1)!}, \quad \text{where } k = 2^{n-1} \tag{5}$$
$$R = \frac{1}{4}\,d(d-1) \tag{6}$$
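To see why the terms neglected by BIC matter for complete dags, the sketch below evaluates (4)-(6) and the penalty parts (everything except $-N L_N(\hat\theta)$) of AIC, BIC and CIC for n = 2, ..., 9 at N = 100. It is an illustration under stated assumptions, not the code behind Figure 1: it takes the CIC curvature term in the form $R/(N\log(d+1))$ and computes $\log((k-1)!)$ with `math.lgamma` to avoid overflow. The function names are hypothetical.

```python
import math


def complete_dag_geometry(n):
    """Dimension d, log-volume log V and Ricci scalar R of the complete dag of n bits."""
    d = 2 ** n - 1
    k = 2 ** (n - 1)
    log_v = k * math.log(math.pi) - math.lgamma(k)  # log(pi^k / (k-1)!)
    r = d * (d - 1) / 4
    return d, log_v, r


def penalties(n, n_obs):
    """Penalty terms (everything except -N L_N) of AIC, BIC and CIC."""
    d, log_v, r = complete_dag_geometry(n)
    return {
        "AIC": d,
        "BIC": d / 2 * math.log(n_obs),
        "CIC": d / 2 * math.log(n_obs / (2 * math.pi)) + log_v + r / (n_obs * math.log(d + 1)),
    }


if __name__ == "__main__":
    for n in range(2, 10):
        p = penalties(n, n_obs=100)
        print(n, {name: round(value, 1) for name, value in p.items()})
```

For n = 9, log V is about -869, which is enough to make the CIC penalty negative while the BIC penalty exceeds 1000. A criterion that drops log V therefore charges large complete dags far more than CIC does, which is consistent with AIC and BIC never choosing n = 9 in the simulations.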
The simulations are summarized in Figure 1. The graphs show that CIC is orders of magnitude better than its two competitors (AIC and BIC) as a criterion for determining the size of complete bitnets. When n = 9 the graph seems to show only the CIC curve. The two other curves (for AIC and BIC) are in fact there but are never different from 0! For example, with 100 observations from complete bitnets of 9 bits, CIC chose the correct size n = 9 100 times out of 100, but AIC and BIC failed all 100 times. When the sample size was increased to 200, CIC chose the correct n = 9 all 100 times, but AIC and BIC still failed all 100 trials.
FIGURE 1. Proportion of successes vs. sample size. AIC (dotted), BIC (dashed), CIC (solid).
We should emphasize that each sample was chosen with random values for the parameters. Thus, the superiority of CIC over AIC and BIC appears to be independent of the actual values of the parameters of the complete bitnet. This is compatible with the homogeneous (constant curvature) geometry of complete bitnets. The results of these simulations also agree with, reinforce, and validate the findings of Eitel Laura in [6, 7]. Laura's Monte Carlo experiments show conclusively that adding the approximation for log V (see [5]) to the BIC formula (2) (i.e., essentially using the first three terms of CIC) outperforms plain BIC in the much more difficult task of identifying the full structure of a bitnet of n (fixed) bits, and that the performance increases with the maximum number of parents in the true bitnet.
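A scaled-down version of the simulation game can be sketched as follows. The assumptions are ours, not the paper's: candidates j = 1, ..., 6 are scored on the same bit string, whose length is chosen divisible by every candidate so that each model explains all the bits; the true 3-bit distribution is a fixed, strongly dependent one rather than a random draw; and CIC uses the form $-N L_N + \frac{d}{2}\log\frac{N}{2\pi} + \log V + \frac{R}{N\log(d+1)}$ with d, V and R from (4)-(6) evaluated for $M_j$. All names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_bits(pmf, n, n_chunks, rng):
    """Concatenate n_chunks iid draws from a complete dag of n bits (pmf over 2**n patterns)."""
    bits = []
    for pattern in rng.choices(range(2 ** n), weights=pmf, k=n_chunks):
        bits.extend((pattern >> (n - 1 - b)) & 1 for b in range(n))
    return bits


def max_loglik(bits, j):
    """Maximized log-likelihood of M_j: iid j-bit chunks with a free pmf over 2**j patterns."""
    n_obs = len(bits) // j
    counts = Counter(tuple(bits[c * j:(c + 1) * j]) for c in range(n_obs))
    return sum(m * math.log(m / n_obs) for m in counts.values()), n_obs


def scores(bits, j):
    """AIC, BIC and CIC of M_j, with d, V and R taken from the complete dag of j bits."""
    loglik, n_obs = max_loglik(bits, j)
    d, k = 2 ** j - 1, 2 ** (j - 1)
    log_v = k * math.log(math.pi) - math.lgamma(k)
    r = d * (d - 1) / 4
    return {
        "AIC": -loglik + d,
        "BIC": -loglik + d / 2 * math.log(n_obs),
        "CIC": -loglik + d / 2 * math.log(n_obs / (2 * math.pi)) + log_v
        + r / (n_obs * math.log(d + 1)),
    }


def choose_n(bits, candidates):
    """Return, for each criterion, the chunk size j with the smallest score."""
    table = {j: scores(bits, j) for j in candidates}
    return {name: min(candidates, key=lambda j: table[j][name]) for name in ("AIC", "BIC", "CIC")}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 3-bit distribution with strongly dependent bits (000 and 111 dominate).
    pmf = [0.4, 0.02, 0.02, 0.06, 0.06, 0.02, 0.02, 0.4]
    bits = sample_bits(pmf, n=3, n_chunks=2400, rng=rng)  # 7200 bits, divisible by 1..6
    print(choose_n(bits, range(1, 7)))
```

In this easy setting CIC should select j = 3. Reproducing the failures of AIC and BIC reported above requires larger n with small N, where the number of possible patterns, $2^n$, grows quickly.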
FIGURE 2. Segmentation of concatenated files by CIC (figure panels include text from The Iliad, Book I).
SEGMENTING REAL DATA
To demonstrate the sensitivity of CIC for recognizing model changes along a bitstream, we created a long sequence of bits by concatenating pieces of files of different types. The data x was
$$x = 10\text{k}(.aiff) + 20\text{k}(.jpg) + 5\text{k}(.txt) + 5\text{k}(.gz)$$
i.e., ten thousand bytes from a sound file (.aiff type), followed by 20k bytes from an image file (.jpg type), followed by 5k bytes from an ASCII file (.txt type), followed by 5k bytes from the same ASCII file but compressed with gzip (.gz type). Figure 2 shows the value of CIC for a sliding window of 640 bits traversing the boundaries between files of different types. The horizontal lines indicate the average values of CIC for the different segments: end of sound, sky, ground, beginning of text, end of text, compressed data.
FOR MODEL SELECTION IGNORANCE IS BLISS
Take the blue pill, the story ends. You wake up in your bed and believe whatever you want to believe. You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes. Remember, all I'm offering is the truth, nothing more. Morpheus (holding out two pills), The Matrix.
Let M be a manifold of homogeneous theories, i.e. M is a standard regular, parametric statistical model. M is Riemannian with the metric induced from the Hellinger distance (i.e. with Fisher information as the metric) and therefore carries a notion of volume element $dV = \sqrt{\det I(\theta)}\, d\theta$. Consider the following two ways of generating (Data, Theory):
1) The Informative Prior Way. Pick Theory $p \in M$ with prior probability scalar density $\pi(p)$. Then observe Data $x^\alpha = x_1 x_2 \dots x_\alpha$, i.e. with probability $p(x^\alpha) = p(x_1)p(x_2)\cdots p(x_\alpha)$. Here (Data, Theory) are dependent: $\mathrm{Prob}(x^\alpha, p) = p(x^\alpha)\,\pi(p)$.
2) The Ignorant Prior Way. Pick Theory $p \in M$ uniformly at random, i.e. with constant scalar density $\pi(p)$ (assume M has finite volume). Then observe Data $x^\alpha = x_1 x_2 \dots x_\alpha$ from the true distribution, i.e. with probability $t(x^\alpha) = t(x_1)t(x_2)\cdots t(x_\alpha)$. Here (Data, Theory) are independent: $\mathrm{Prob}(x^\alpha, p) = t(x^\alpha)\,\pi(p)$.
Ignorance is self-similar
Notice that in the ignorant way above, data are assumed to come from the true (t) distribution. If you happen to know this t, then you have complete knowledge: that's all there is to know about the distribution of the data, and you have arrived at the true theory of everything. Enjoy! If, on the other hand, all you know about t is the manifold M of guesses, then the ignorant generative model (way 2 above) preserves that prior state of knowledge a posteriori. The prior state of indifference (uniform $\pi(p)$) about the elements of M does not change after observing the data $x^\alpha$, for all $\alpha > 0$. The posterior is the same as the prior, since Data and Theory are independent.
LIPREM: The Red Pill
The notion of ignorance sketched above produces LIPREM, the Least Informative Prior RElative to M, where
$$\pi^* = \arg\min_{\pi} \mathrm{Dist}(1 : 2)$$
and Dist is any statistically meaningful measure of separation between the joint distributions of (Data, Theory) specified by way 1 and by the ignorant way 2 above. The class of all the statistically meaningful notions of separation between unnormalized probability distributions can be shown to be generated by the $\delta$-information deviations $I_\delta$ (see [1] and the references there), where $\delta \in [0, 1]$. If we let $A_\delta(M) = I_\delta(1 : 2)$ be the total information in M, then
$$M^* = \arg\max_{M} I_\delta(1 : 2)$$
is the minimax ignorant (or maximin informative) model. The CIC is just an estimate of $A_\delta$ for $\delta = 0$, $\alpha = N$, $t = \hat p$ (the mle) that keeps the first terms of an asymptotic expansion in $\alpha = N$ (see [1]). LIPREM and its extensions produce all the statistically meaningful actions for model selection. The standard maximum posterior probability model is just one special case.
CONCLUSION: MORE GEOMETRY
It is natural to decompose AIC, BIC and CIC as the sum of two terms: the term providing the fit of the data to the model (common to all three criteria) plus the rest. That rest is obviously a penalty on the complexity of the model. In retrospect, it is to be expected that the complexity of a model M should involve some (or all?) of its geometric and topological invariants, like dimension, volume and curvature, as CIC does. But we need to keep in mind that CIC, like AIC and BIC, is only an approximation. It would be much better to be able to show that useful models spring from the optimization of a global topological quantity, like the total (or mean?) scalar curvature of M. In fact, we already know that this is precisely the case in classical physics. I would like to show that it is also the case for the whole of inference.
REFERENCES
1. C. Rodríguez, A geometric theory of ignorance, Tech. rep., SUNY Albany, Dept. of Mathematics (2003), http://omega.albany.edu:8008/ignorance.
2. H. Akaike, IEEE Transactions on Automatic Control, pp. 716-723 (1974).
3. G. Schwarz, Annals of Statistics 6, 461-464 (1978).
4. S.-i. Amari, Differential-Geometrical Methods in Statistics, vol. 28 of Lecture Notes in Statistics, Springer-Verlag (1985).
5. C. Rodríguez, The Volume of Bitnets, in Maximum Entropy and Bayesian Methods, edited by R. Fischer, R. Preuss, and U. von Toussaint, vol. 735 of AIP Conf. Proc., pp. 555-564 (2004), http://omega.albany.edu:8008/bitnets.
6. E. Laura, Learning structure and parameters of Bayesian Belief Networks, Ph.D. thesis, The University at Albany, SUNY, School of Information Science (2003), http://omega.albany.edu:8008/bitnets/references/.
7. E. Laura, Learning the structure of a Bayesian network, in Maximum Entropy and Bayesian Methods, AIP Conf. Proc. (2005), these Proceedings.

Statistics 203: Introduction to Regression and Analysis of Variance
Model Selection: General Techniques
Jonathan Taylor
Today
- Outlier detection / simultaneous inference.
- Goals of model selection.
- Criteria to compare models.
- (Some) model selection.
Crude outlier detection test
If the studentized residuals are large, the observation may be an outlier.
Problem: if n is large and we threshold at $t_{1-\alpha/2,\,n-p-1}$, we will get many outliers by chance even if the model is correct.
Solution: Bonferroni correction, threshold at $t_{1-\alpha/(2n),\,n-p-1}$.
Bonferroni correction
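If we are doing many t (or other) tests, say m > 1, we can control the overall false positive rate at $\alpha$ by testing each one at level $\alpha/m$.
Proof:
$$P(\text{at least one false positive}) = P\left(\bigcup_{i=1}^{m} \left\{|T_i| \ge t_{1-\alpha/(2m),\,n-p-1}\right\}\right) \le \sum_{i=1}^{m} P\left(|T_i| \ge t_{1-\alpha/(2m),\,n-p-1}\right) = \sum_{i=1}^{m}\frac{\alpha}{m} = \alpha.$$
This is known as simultaneous inference: controlling the overall false positive rate at $\alpha$ while performing many tests.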
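The Bonferroni bound can be checked by simulation. The sketch below swaps the t tests for two-sided z tests (known variance) so that it needs only Python's standard library; `familywise_error` is a hypothetical helper. It draws m independent null test statistics per replication and records whether any of them crosses the cutoff.

```python
import random
from statistics import NormalDist


def familywise_error(m, alpha, reps, bonferroni, seed=0):
    """Fraction of replications with at least one false positive among m null z-tests."""
    level = alpha / m if bonferroni else alpha
    cutoff = NormalDist().inv_cdf(1 - level / 2)  # two-sided critical value
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if any(abs(rng.gauss(0.0, 1.0)) >= cutoff for _ in range(m)):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("unadjusted:", familywise_error(20, 0.05, 20_000, bonferroni=False))
    print("Bonferroni:", familywise_error(20, 0.05, 20_000, bonferroni=True))
```

With m = 20 and α = 0.05, testing each statistic at level α gives a family-wise error rate near $1 - 0.95^{20} \approx 0.64$, while testing at α/m keeps it just under 0.05. Independence is assumed here only for convenience; the union bound in the proof does not need it.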
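Simultaneous inference for β
Another common situation in which simultaneous inference occurs is simultaneous inference for β. Using the facts that
$$\hat\beta \sim N\left(\beta, \sigma^2 (X^t X)^{-1}\right), \qquad \hat\sigma^2 \sim \sigma^2\,\frac{\chi^2_{n-p}}{n-p},$$
along with the independence of $\hat\beta$ and $\hat\sigma^2$, leads to
$$\frac{(\hat\beta - \beta)^t (X^t X)(\hat\beta - \beta)/p}{\hat\sigma^2} \sim \frac{\chi^2_p/p}{\chi^2_{n-p}/(n-p)} \sim F_{p,\,n-p}.$$
$(1-\alpha)\cdot 100\%$ simultaneous confidence region:
$$\left\{\beta : (\hat\beta - \beta)^t (X^t X)(\hat\beta - \beta) \le p\,\hat\sigma^2 F_{p,\,n-p,\,1-\alpha}\right\}$$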
Model selection: goals
When we have many predictors (with many possible interactions), it can be difficult to find a good model. Which main effects do we include? Which interactions do we include? Model selection tries to simplify this task.
Model selection: general
This is an unsolved problem in statistics: there are no magic procedures to get you the best model. In some sense, model selection is data mining. Data miners / machine learners often work with very many predictors.
Model selection: strategies
To implement this, we need:
- a criterion or benchmark to compare two models;
- a search strategy.
With a limited number of predictors, it is possible to search all possible models.
Possible criteria
- $R^2$: not a good criterion. It always increases with model size, so the "optimum" is to take the biggest model.
- Adjusted $R^2$: better. It penalizes bigger models.
- Mallows' $C_p$.
- Akaike's Information Criterion (AIC), Schwarz's BIC.
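Mallows' Cp
$$C_p(M) = \frac{SSE(M)}{\hat\sigma^2} - n + 2\,p(M)$$
- $\hat\sigma^2 = SSE(F)/df_F$ is the best estimate of $\sigma^2$ we have (use the fullest model).
- $SSE(M) = \|Y - \hat Y_M\|^2$ is the SSE of the model M.
- $p(M)$ is the number of predictors in M, or the degrees of freedom used up by the model.
- $C_p$ is based on an estimate of
$$\frac{1}{\sigma^2}\sum_{i=1}^{n} E\left(\hat Y_i - E(Y_i)\right)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[\left(E(\hat Y_i) - E(Y_i)\right)^2 + \mathrm{Var}(\hat Y_i)\right]$$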
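A direct transcription of the $C_p$ formula, as a hypothetical helper; the inputs are the SSEs and degrees of freedom that a regression routine would supply. One useful check follows from the definition: for the full model F, $SSE(F)/\hat\sigma^2 = df_F = n - p(F)$, so $C_p(F) = p(F)$ exactly.

```python
def mallows_cp(sse_m, p_m, sse_full, df_full, n):
    """Cp(M) = SSE(M) / sigma2_hat - n + 2 p(M), with sigma2_hat = SSE(F) / df_F."""
    sigma2_hat = sse_full / df_full  # error variance estimated from the fullest model
    return sse_m / sigma2_hat - n + 2 * p_m


if __name__ == "__main__":
    # Hypothetical numbers: n = 50, full model with p(F) = 6 (df_F = 44) and SSE(F) = 88.
    print(mallows_cp(sse_m=110.0, p_m=3, sse_full=88.0, df_full=44, n=50))
    print(mallows_cp(sse_m=88.0, p_m=6, sse_full=88.0, df_full=44, n=50))
```

Here $\hat\sigma^2 = 2$, so the submodel with p(M) = 3 and SSE(M) = 110 gets $C_p = 55 - 50 + 6 = 11$, and the full model gets $C_p = 6 = p(F)$. A $C_p$ well above p(M), as for the submodel, is usually read as a sign of bias.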
AIC & BIC
Mallows' $C_p$ is (almost) a special case of the Akaike Information Criterion (AIC):
$$AIC(M) = -2\log L(M) + 2\,p(M).$$
$L(M)$ is the likelihood function of the parameters in model M evaluated at the MLE (Maximum Likelihood Estimators).
Schwarz's Bayesian Information Criterion (BIC):
$$BIC(M) = -2\log L(M) + p(M)\log n$$
Maximum likelihood estimation
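If the model is correct then the log-likelihood of $(\beta, \sigma^2)$ is
$$\log L(\beta, \sigma^2 \mid X, Y) = -\frac{n}{2}\left(\log(2\pi) + \log\sigma^2\right) - \frac{1}{2\sigma^2}\|Y - X\beta\|^2$$
where Y is the vector of observed responses.
The MLE for β in this case is the same as the least squares estimate, because the first term does not depend on β.
MLE for $\sigma^2$:
$$\frac{\partial}{\partial\sigma^2}\log L(\beta, \sigma^2)\Big|_{\hat\beta,\,\hat\sigma^2} = -\frac{n}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}\|Y - X\hat\beta\|^2 = 0$$
Solving for $\sigma^2$:
$$\hat\sigma^2_{MLE} = \frac{1}{n}\|Y - X\hat\beta\|^2 = \frac{1}{n}SSE(M)$$
Note that the MLE is biased.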
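The bias of $\hat\sigma^2_{MLE}$ shows up clearly in a simulation of the simplest linear model, an intercept only (one coefficient), where $E[\hat\sigma^2_{MLE}] = \frac{n-1}{n}\sigma^2$. The sketch uses only the standard library; the function name, the true mean 5.0 and the settings are hypothetical.

```python
import random


def mean_sigma2_mle(n, sigma, reps, seed=0):
    """Average of SSE/n over simulated intercept-only fits (one coefficient)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(5.0, sigma) for _ in range(n)]
        y_bar = sum(y) / n  # least squares estimate = MLE of the intercept
        sse = sum((yi - y_bar) ** 2 for yi in y)
        total += sse / n  # MLE of sigma^2
    return total / reps


if __name__ == "__main__":
    print(mean_sigma2_mle(n=5, sigma=1.0, reps=20_000))
```

With n = 5 and σ = 1 the average should come out near 0.8 rather than 1. More generally, with p(M) coefficients and a correct model, $E[SSE(M)/n] = \frac{n - p(M)}{n}\sigma^2$, which is why the unbiased estimate divides by $n - p(M)$ instead.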
AIC for a linear model
Using
$$\hat\sigma^2_{MLE} = \frac{1}{n}SSE(M)$$
we see that the AIC of a multiple linear regression model is
$$AIC(M) = n\left(\log(2\pi) + \log SSE(M) - \log n\right) + n + 2\,(p(M) + 1)$$
If $\sigma^2$ is known, then
$$AIC(M) = n\left(\log(2\pi) + \log\sigma^2\right) + \frac{SSE(M)}{\sigma^2} + 2\,p(M)$$
which is almost $C_p(M) + K_n$.
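The AIC of a linear model can be checked by plugging $\hat\sigma^2 = SSE(M)/n$ into the Gaussian log-likelihood: $-2\log L = n(\log(2\pi) + \log SSE(M) - \log n) + n$, and AIC adds $2(p(M) + 1)$ for the p(M) coefficients plus $\sigma^2$. The sketch below checks that identity numerically. The residual vector and function names are hypothetical, and the identity is purely algebraic, so any residuals will do. Additive constants such as n do not change which model has the smallest AIC among models fitted to the same n observations.

```python
import math


def gaussian_loglik(residuals, sigma2):
    """Gaussian log-likelihood of a linear model, given its residuals and a variance."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -n / 2 * (math.log(2 * math.pi) + math.log(sigma2)) - rss / (2 * sigma2)


def aic_from_sse(sse, n, p):
    """AIC of a linear model with p regression coefficients and sigma^2 estimated by SSE/n."""
    return n * (math.log(2 * math.pi) + math.log(sse) - math.log(n)) + n + 2 * (p + 1)


if __name__ == "__main__":
    residuals = [0.5, -1.0, 0.25, 0.75, -0.5]  # hypothetical residuals, p = 2
    sse = sum(r * r for r in residuals)
    direct = -2 * gaussian_loglik(residuals, sse / len(residuals)) + 2 * (2 + 1)
    print(aic_from_sse(sse, len(residuals), 2), direct)
```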
Search strategies
Best subset: search all possible models and take the one with the highest $R^2_a$ or the lowest $C_p$.
Stepwise (forward, backward or both): useful when the number of predictors is large. Choose an initial model and be greedy.
Greedy means always take the biggest jump (up or down) in your selected criterion.
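Both search strategies can be sketched in standard-library Python. Assumptions: the criterion is Mallows' $C_p$ with $\hat\sigma^2$ taken from the full model; p(M) counts the intercept; the stepwise search is forward only (the slides also allow backward or both); and least squares is solved through the normal equations, which is fine for a toy example but less stable than a QR-based solver. The data and all function names are hypothetical.

```python
import itertools
import random


def sse_ols(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the predictors in cols."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    q = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(q)] + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(q)
    ]
    for i in range(q):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(i, q), key=lambda row: abs(a[row][i]))
        a[i], a[piv] = a[piv], a[i]
        for row in range(q):
            if row != i:
                f = a[row][i] / a[i][i]
                a[row] = [v - f * w for v, w in zip(a[row], a[i])]
    beta = [a[i][q] / a[i][i] for i in range(q)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def cp_subset(X, y, cols):
    """Mallows' Cp of intercept + cols; sigma^2 is estimated from the full model."""
    n, n_pred = len(y), len(X[0])
    sigma2_hat = sse_ols(X, y, range(n_pred)) / (n - (n_pred + 1))
    return sse_ols(X, y, cols) / sigma2_hat - n + 2 * (len(cols) + 1)


def best_subset(X, y):
    """Exhaustive search over all 2**P subsets of predictors."""
    P = len(X[0])
    subsets = (c for size in range(P + 1) for c in itertools.combinations(range(P), size))
    return min(subsets, key=lambda c: cp_subset(X, y, c))


def forward_stepwise(X, y):
    """Greedy forward search: add the predictor that lowers Cp most; stop when none does."""
    P = len(X[0])
    current, current_cp = (), cp_subset(X, y, ())
    while len(current) < P:
        candidates = [tuple(sorted(current + (j,))) for j in range(P) if j not in current]
        best = min(candidates, key=lambda c: cp_subset(X, y, c))
        best_cp = cp_subset(X, y, best)
        if best_cp >= current_cp:
            break
        current, current_cp = best, best_cp
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    y = [1 + 2 * x[0] - 1.5 * x[2] + rng.gauss(0, 1) for x in X]
    print("best subset:", best_subset(X, y))
    print("forward:    ", forward_stepwise(X, y))
```

Best subset is exhaustive, so its $C_p$ can never be worse than that of the greedy search; the greedy search fits far fewer models (at most about $P^2/2$ instead of $2^P$), which is what makes it usable when the number of predictors is large.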
Implementations in R
Best subset: use the function leaps. Works only for multiple linear regression models. Stepwise: use the function step. Works for any model with Akaike Information Criterion (AIC). In multiple linear regression, AIC is (almost) a linear function of Cp. Here is an example.
Caveats
Many other criteria have been proposed. Some work well for some types of data, others for different data. These criteria are not direct measures of predictive power. Later we will see cross-validation which is an estimate of predictive power.