Tukey and Data Analysis

David C. Hoaglin

Abstract. From the time that John W. Tukey started to do serious work in statistics, he was interested in problems and techniques of data analysis. Some people know him best for exploratory data analysis, which he pioneered, but he also made key contributions in analysis of variance, in regression and through a wide range of applications.
This paper reviews illustrative contributions in these areas.

Key words and phrases: Analysis of variance, exploratory data analysis, regression.

Indeed, I don't think it would be an exaggeration to say that most of John's contributions to statistics involved or grew out of problems in data analysis. Even if one focuses on data analysis itself, the number is large. It would be reasonable to include the volume on Graphics, and parts of other volumes surely count as well.
And I haven't mentioned the applications, where the object was to analyze the data themselves; nearly all sets of data have some distinctive features, and it's hard to imagine that an analysis with John as a participant would be routine. This brief account cannot hope to cover more than a small fraction of such a corpus. Thus I offer a selection of topics, chosen to highlight several major areas: exploratory data analysis, of course, and also analysis of variance, regression and applications.
In some instances I illustrate the way that John developed techniques and refined them over a period of years. A more comprehensive account would surely include time series and spectrum analysis; fortunately, Brillinger covers that area in depth.
A key influence was the biometrician and data analyst Charles P. Winsor, from whom, as John said in the dedication of EDA, he learned much that could not have been learned elsewhere. For some of the roots of John's attitude toward data analysis, however, it helps to look a little farther back. When he came to Princeton as a graduate student, he was in the Chemistry Department; he had completed his undergraduate education and taken a master's degree in chemistry at Brown University.
In his foreword to the philosophy volumes of CWJWT, John explained: A respectable physical-science education (officially in chemistry, but with large doses of physics and substantial doses of geology) probably helped me a lot in understanding the character of the problems to which data brought to me were intended to be relevant.
A purely mathematical background would, I believe, have left me at a severe disadvantage. Given a reasonable sensitivity to the underlying issues, seeing many sets of data seems to have made it natural to try to think about techniques in terms of the needs they might fill and the gaps. The philosophy that appears in these two volumes is far more based on a bottom-up approach than on a top-down one. This background helps me to understand his approach.
So does a discussion in the last part of The Future of Data Analysis (Tukey, a): If we are to make progress in data analysis, as it is important that we should, we need to pay attention to our tools and our attitudes. If these are adequate, our goals will take care of themselves.
We dare not neglect any of the tools that have proved useful in the past. But equally we dare not find ourselves confined to their use. If algebra and analysis cannot help us, we must press on just the same, making as good use of intuition and originality as we know how. In particular we must give very much more attention to what specific techniques and procedures do when the hypotheses on which they are customarily developed do not hold. And in doing this we must take a positive attitude, not a negative one.
It is not sufficient to start with what it is supposed to be desired to estimate, and to study how well an estimator succeeds in doing this. We must give even more attention to starting with an estimator and discovering what is a reasonable estimand, to discovering what it is reasonable to think of the estimator as estimating. To those who hold the ossified view that statistics is optimization such a study is hindside before, but to those who believe that the purpose of data analysis is to analyze data better it is clearly wise to learn what a procedure really seems to be telling us about.
It would be hard to overemphasize the importance of this approach as a tool in clarifying situations. A page or two later in that paper John turns to attitudes: Almost all the most vital attitudes can be described in a type form: willingness to face up to X. He discusses a number of X's (quotation marks omitted), including more realistic problems, the necessarily approximate nature of useful results in data analysis, the need for collecting the results of actual experience with specific data-analytic techniques, the need for iterative procedures, free use of ad hoc and informal procedures, and the fact that data analysis is intrinsically an empirical science.
In the area of attitudes, and with plenty of intuition and originality, John practiced what he preached. Some of these attitudes contribute to flexibility, an important theme in John's approach to data analysis.
The separation between exploratory data analysis and confirmatory data analysis allowed exploratory data analysis to proceed freely, without adherence to a unified framework, and assigned to confirmatory data analysis the systematic task of assessing the strength of the evidence.
In what he wrote, John gave much more attention to exploratory. As he explained in the foreword to the philosophy volumes, there is little doubt that exploratory gets more attention here than would be its fair share, if these two volumes were to be one's only reference and guide.
Emphasis, however, was rightly placed where the need for more attention was greatest. Exploratory data analysis has, to my joy, been receiving more and more attention, but the pendulum of relative attention has not yet reached the balance point, through which it will, no doubt, overswing. In commenting on Bayesian analysis he mentioned a natural, but dangerous desire for a unified approach and remarked that the greatest danger I see from Bayesian analysis stems from the belief that everything that is important can be stuffed into a single quantitative framework.
For me these comments illustrate his avoidance of frameworks and unification for data analysis more generally. The limited preliminary edition of the book came out in three xeroxed volumes (Tukey, c, d, a), and, after further development, the first edition followed (Tukey, a). A few years later the two volumes of The Statistician's Guide to Exploratory Data Analysis (Hoaglin, Mosteller and Tukey, b, b) provided conceptual and logical support for selected techniques and explained connections with classical statistical theory.
From the publication of the limited preliminary edition, EDA received an enthusiastic welcome, especially in fields that analyze data and apply statistics. By now a number of its techniques have become part of statistical instruction at all levels. So, at the level of tools, the impact of EDA has been broad and lasting. I am not so sure about the attitudes, which require more effort to teach and more reflection, but I am hopeful that they will continue to spread and have a positive impact.
In presenting the techniques that illustrate and apply these themes, John gave us a torrent of terminology, much of it newly coined: for example, stem-and-leaf display, hinges and other letter values, box-and-whisker plot, the bulging rule, running medians, wandering schematic plot, median polish, two-way plot, diagnostic plot, froot and flog, reroughing, double root, product-ratio analysis and pseudospreads. Some of the basic ideas appear much earlier in John's work, though not always in publications.
For example, he was interested in resistance and robustness early on, as well as re-expression in One Degree of Freedom for Non-Additivity (Tukey, h), which I discuss below. From the EDA contributions I have selected a few that illustrate how John developed techniques over a period of time. In these instances his approach was to devise a technique (building on insight and experience), use it on diverse data, and modify or fine-tune it or, perhaps, scrap it.
This approach is a natural application of some of the attitudes that I quoted earlier from The Future of Data Analysis. The fences are based on the hinges, H_L and H_U, which are approximate quartiles of the batch. The aim was not to have a formal rule for declaring an observation an outlier, but to call attention to such data for further investigation.
The values of k have remained at 1.5 for the inner fences and 3 for the outer fences. The most frequent application is in boxplots, as illustrated in Figure 1. The inner fences determine which data values should be plotted individually at the ends of a boxplot, and thus how far out the whiskers extend. John did not arrive at 1.5 immediately.

Figure 1. Example of a boxplot. The box extends from the lower hinge to the upper hinge and has a line across it at the median. The whiskers show the extent of the data inside the inner fences, and four observations are outside at the upper end.
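The hinge-and-fence arithmetic can be sketched in a few lines. This is a minimal illustration, not code from EDA; the depth-based definition of the hinges and the multipliers 1.5 (inner) and 3 (outer) are the conventional choices assumed here.

```python
import math

def hinges_and_fences(batch, k_inner=1.5, k_outer=3.0):
    """Hinges (approximate quartiles, via depths) and the fences
    derived from them: hinge -/+ k times the H-spread."""
    xs = sorted(batch)
    n = len(xs)
    d_median = (n + 1) / 2
    d_hinge = (math.floor(d_median) + 1) / 2   # depth of each hinge

    def at_depth(d):
        # Average the two order statistics straddling a half-integer depth.
        return (xs[math.floor(d) - 1] + xs[math.ceil(d) - 1]) / 2

    h_lo = at_depth(d_hinge)
    h_hi = at_depth(n + 1 - d_hinge)           # same depth from the top
    spread = h_hi - h_lo                       # the H-spread
    return {"hinges": (h_lo, h_hi),
            "inner": (h_lo - k_inner * spread, h_hi + k_inner * spread),
            "outer": (h_lo - k_outer * spread, h_hi + k_outer * spread)}
```

Data outside the inner fences would be plotted individually at the ends of a boxplot; data outside the outer fences would draw even more attention.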
John put together a favored smoother from a number of building blocks:

- Running medians of 3.
- Repeated smoothing, or resmoothing (R): use the output of a smoothing operation (most often running medians of 3) as input to that same smoothing operation, and continue until no changes occur.
- An end-value rule to handle y_1 and y_T.
- A component to handle steadily increasing sequences: hanning (H, after von Hann), a weighted average with weights 1/4, 1/2, 1/4.
- Residuals: the rough.
- Extracting an additional smooth from the rough: reroughing.

If both smoothers are the same, reroughing is called twicing. Theory related to such smoothers and their component operations was subsequently developed by Mallows and by Velleman. The process of experimentation continued, and in time John settled on two simpler choices: 3R, and 3R followed by 3pR. The operation 3p handles sequences in which two consecutive values are equal; thus 3p replaced the earlier operation of splitting 2-point peaks and valleys, useful because a running median of 3 leaves them unchanged.
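A few of the building blocks above can be sketched directly. This is a simplified illustration under stated assumptions: end values are simply left unchanged (a placeholder for John's end-value rules), and the 3p operation and splitting are omitted.

```python
def med3(y):
    """One pass of running medians of 3; end values left unchanged."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = sorted(y[i - 1:i + 2])[1]
    return out

def smooth_3R(y):
    """3R: repeat running medians of 3 until nothing changes."""
    prev, cur = None, list(y)
    while cur != prev:
        prev, cur = cur, med3(cur)
    return cur

def hanning(y):
    """Hanning (H): weighted average with weights 1/4, 1/2, 1/4."""
    out = list(y)
    for i in range(1, len(y) - 1):
        out[i] = 0.25 * y[i - 1] + 0.5 * y[i] + 0.25 * y[i + 1]
    return out

def twice(y, smoother):
    """Twicing: smooth, then smooth the rough and add that smooth back."""
    smooth = smoother(y)
    rough = [a - b for a, b in zip(y, smooth)]
    return [s + r for s, r in zip(smooth, smoother(rough))]
```

Running medians of 3 resist isolated outliers, which a moving average would smear into its neighbors; hanning then softens the flat steps that medians leave behind.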
John sometimes found that he wanted something a bit more complicated than the simplest re-expressions, so he expanded his toolkit by adding hybrids of two re-expressions. These use one re-expression to the left of x_0 and the other to the right of x_0, matching their value and slope at x_0. Here the two re-expressions are matched at the median, M, and the hybrid re-expression is matched to the data at M.
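The value-and-slope matching can be written generically: rescale the right-hand re-expression so that it agrees with the left-hand one in both value and derivative at the join point. The particular log/square-root pairing in the example is my illustrative assumption, not one of Tukey's specific hybrids.

```python
import math

def make_hybrid(f, fprime, g, gprime, x0):
    """Splice re-expression g (used right of x0) onto f (used left of x0),
    choosing a + b*g so that value and slope match f at x0."""
    b = fprime(x0) / gprime(x0)      # match slopes at x0
    a = f(x0) - b * g(x0)            # then match values at x0
    def hybrid(x):
        return f(x) if x <= x0 else a + b * g(x)
    return hybrid

# Illustrative assumption: log to the left of x0 = 1,
# a rescaled square root to the right.
hybrid = make_hybrid(math.log, lambda x: 1.0 / x,
                     math.sqrt, lambda x: 0.5 / math.sqrt(x), 1.0)
```

Because both value and slope agree at x_0, the spliced curve is smooth there; in the text's setting x_0 would be the median M of the data.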
More experience from use by others would be instructive. Thus alerted, it is easy for me to see the data-analytic motivation for the pigeonhole model (Cornfield and Tukey, f), but the thrust of that paper and related ones is more methodological. Nonconstancy of variability and nonnormality had received considerable attention in the literature. John noted that he had more often needed to be concerned with nonadditivity, and he showed how, in a row-by-column table, to isolate a one-degree-of-freedom piece from the residual sum of squares, with the expectation that this piece would capture discrepant observations, or systematic behavior associated with analyzing the data in a scale where the effects for rows and columns are not additive.
John indicated how to use this information in choosing a transformation to reduce or remove the nonadditivity, but he did not take this aspect very far, because of limited experience.
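The one-degree-of-freedom piece can be sketched for a means-based additive fit. This is a standard textbook formulation of the computation, not code from the paper: with row effects a_i, column effects b_j, the piece is (sum of a_i b_j y_ij)^2 divided by (sum a_i^2)(sum b_j^2).

```python
def one_df_nonadditivity(table):
    """One-degree-of-freedom sum of squares for non-additivity
    in a row-by-column table, using a means-based additive fit."""
    nr, nc = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (nr * nc)
    row_eff = [sum(row) / nc - grand for row in table]
    col_eff = [sum(table[i][j] for i in range(nr)) / nr - grand
               for j in range(nc)]
    # Numerator: sum over cells of a_i * b_j * y_ij.
    num = sum(row_eff[i] * col_eff[j] * table[i][j]
              for i in range(nr) for j in range(nc))
    denom = sum(a * a for a in row_eff) * sum(b * b for b in col_eff)
    return num * num / denom
```

For a purely multiplicative table this single degree of freedom absorbs the entire residual sum of squares of the additive fit, which is exactly the kind of systematic nonadditivity the piece was designed to capture.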
The ability to distinguish systematic nonadditivity from discrepant observations was much enhanced by obtaining the fit and residuals from median polish (Tukey, a, Section 11A). Looking back on this line of development in the foreword to Volume VII of CWJWT (pages l-li), he listed four branches of extensions from ODOFFNA: higher-order single degrees of freedom; the recognition that a purely multiplicative fit differs from an additive fit by an odoffna single degree of freedom [the PLUS-one fit]; breakdowns into low-rank but not single-degree-of-freedom constituents [as in the vacuum cleaner, higher-rank fits in McNeil and Tukey (c), and work by John Mandel, Ruben Gabriel and others, conveniently summarized by Emerson and Wong]; and graphical replacement of odoffna by diagnostic plots.
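Median polish itself is easy to sketch. The version below is a minimal illustration (a fixed number of full sweeps, no convergence test), not Tukey's own procedure in every detail.

```python
import statistics

def median_polish(table, n_iter=10):
    """Alternately sweep row medians and column medians out of a two-way
    table, accumulating a common value, row effects and column effects;
    what remains in the table is the residuals."""
    res = [row[:] for row in table]
    nr, nc = len(res), len(res[0])
    row_eff, col_eff, common = [0.0] * nr, [0.0] * nc, 0.0
    for _ in range(n_iter):
        for i in range(nr):                       # sweep row medians
            m = statistics.median(res[i])
            row_eff[i] += m
            res[i] = [v - m for v in res[i]]
        m = statistics.median(col_eff)            # recenter column effects
        common += m
        col_eff = [v - m for v in col_eff]
        for j in range(nc):                       # sweep column medians
            m = statistics.median(res[i][j] for i in range(nr))
            col_eff[j] += m
            for i in range(nr):
                res[i][j] -= m
        m = statistics.median(row_eff)            # recenter row effects
        common += m
        row_eff = [v - m for v in row_eff]
    return common, row_eff, col_eff, res
```

Because medians rather than means are swept out, a few discrepant cells end up isolated in the residuals instead of distorting the row and column effects, which is what makes the fit useful for separating systematic nonadditivity from outliers.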
I am grateful to Bert Green for some of the background details. The initial stimulus was a set of data analyzed by Johnson and Tsao and published by Johnson. The work began when John recruited Bert, then a second-year graduate student in psychometrics, to help with reanalyzing the data. As that analysis unfolded and each iteration suggested the next, Bert did the calculations on an electromechanical calculator!
The complexity of the example was genuine: the data layout was extensive, and the initial ANOVA table had 39 lines. The dataset itself is of interest, especially because John used it as a source of examples over the years. The experiment, from psychophysics, measured difference limens for weights by a method of continuous change. An aluminum pail was attached by a lever system to a ring on the subject's finger.