Name:
Contemporary Statistical Models for the Plant and Soil Sciences PDF
Published Date:
11/13/2001
Status:
[ Active ]
Publisher:
CRC Press Books
Preface
To the Reader
Statistics is essentially a discipline of the twentieth century, and for several decades it was keenly involved with problems of interpreting and analyzing empirical data that originate in agronomic investigations. The vernacular of experimental design in use today bears evidence of the agricultural connection and origin of this body of theory. Omnipresent terms, such as block or split-plot, emanated from descriptions of blocks of land and experimental plots in agronomic field designs. The theory of randomization in experimental work was developed by Fisher to neutralize in particular the spatial effects among experimental units he realized existed among field plots. Despite its many origins in agronomic problems, statistics today is often unrecognizable in this context. Numerous recent methodological approaches and advances originated in other subject-matter areas and agronomists frequently find it difficult to see their immediate relation to questions that their disciplines raise. On the other hand, statisticians often fail to recognize the riches of challenging data analytical problems contemporary plant and soil science provides. One could gain the impressions that
• statistical methods of concern to plant and soil scientists are completely developed and understood;
• the analytical tools of classical statistical analysis learned in a one- or two-semester course for non-statistics majors are sufficient to cope with data analytical problems;
• recent methodological work in statistics applies to other disciplines such as human health, sociology, or economics, and has no bearing on the work of the agronomist;
• there is no need to consider contemporary statistical methods and no gain in doing so.
These impressions are incorrect. Data collected in many investigations and the circumstances under which they are accrued often bear little resemblance to classically designed experiments. Much of the data analysis in the plant and soil sciences is nevertheless viewed in the experimental design framework. Ground and remote sensing technology, yield monitoring, and geographic information systems are but a few examples where analysis cannot necessarily be cast, nor should it be coerced, into a standard analysis of variance framework. As our understanding of the biological/physical/environmental/ecological mechanisms increases, we are more and more interested in what some have termed the space/time dynamics of the processes we observe or set into motion by experimentation. It is one thing to collect data in space and/or over time; it is another matter to apply the appropriate statistical tools to infer what the data are trying to tell us. While many of the advances in statistical methodologies in past decades have not explicitly focused on agronomic applications, it would be incorrect to assume that these methods are not fruitfully applied there. Geostatistical methods, mixed models for repeated measures and longitudinal data, generalized linear models for non-normal (= non-Gaussian) data, and nonlinear models are cases in point.
The dedication of time, funds, labor, and technology to study design and data accrual often outstrips the efforts devoted to the analysis of the data. Does it not behoove us to make the most of the data, extract the most information, and apply the most appropriate techniques? Data sets are becoming richer and richer and there is no end in sight to the opportunities for data collection. Continuous time monitoring of experimental conditions is already a reality in biomedical studies where wristwatch-like devices report patient responses in a continuous stream. Through sensing technologies, variables that would have been observed only occasionally and on a whole-field level can now be observed routinely and in a spatially explicit manner. As one colleague put it: "What do you do the day you receive your first five million observations?" We do not have (all) the answers for data analysis needs in the information technology age. We subscribe wholeheartedly, however, to its emerging philosophy: Do not be afraid to get started, do not be afraid to stop, and apply the best available methods along the way.
In the course of many consulting sessions with students and researchers from the life sciences, we realized that the statistical tools covered in a one- or two-semester statistical methods course are insufficient to cope successfully with the complexity of empirical research data. Correlated, clustered, and spatial data, non-Gaussian (non-Normal) data and nonlinear responses are common in practice. The complexity of these data structures tends to outpace the basic curriculum. Most studies do not collect just one data structure, however. Remotely sensed leaf area index, repeated measures of plant yield, ordinal responses of plant injury, the presence/absence of disease and random sampling of soil properties, for example, may all be part of one study and comprise the threads from which scientific conclusions must be woven. Diverse data structures call for diverse tools. This text is an attempt to squeeze between two covers many statistical methods pertinent to research in the life sciences. Any one of the main chapters (§4 to 9) could have easily been expanded to the size of the entire text, and there are several excellent textbooks and monographs that do so. Invariably, we are guilty of omission.
To the User
Contemporary statistical models cannot be appreciated to their full potential without a good understanding of theory. Hence, we place emphasis on that. They also cannot be applied to their full potential without the aid of statistical software. Hence, we place emphasis on that. The main chapters are roughly equally divided between coverage of essential theory and applications. Additional theoretical derivations and mathematical details needed to develop a deeper understanding of the models can be found on the companion CD-ROM. The choice to focus on The SAS® System for calculations was simple. It is, in our opinion, the most powerful statistical computing platform and the most widely available and accepted computing environment for statistical problems in academia, industry, and government. In the rare cases when procedures in SAS® were not available and macros were too cumbersome, we employed the S-PLUS® package, in particular the S+SpatialStats® module. The important portions of the executed computer code are shown in the text along with the output. All data sets and SAS® or S-PLUS® codes are contained on the CD-ROM.
To the Instructor
This text is both a reference and a textbook. It was developed with a reader in mind who has had a first course in statistics covering simple and multiple linear regression and analysis of variance, who is familiar with the principles of experimental design, and who is willing to absorb a few concepts from linear algebra necessary to discuss the theory. A graduate-level course in statistics may focus on the theory in the main text and the mathematical details appendix. A graduate-level service course in statistical methods may focus on the theory and applications in the main chapters. A graduate-level course in the life sciences can focus on the applications and through them develop an appreciation of the theory. Chapters 1 and 2 introduce statistical models and the key data structures covered in the text. The notion of clustering in data is a recurring theme of the text. Chapter 3 discusses requisite linear algebra tools, which are indispensable to the discussion of statistical models beyond simple analysis of variance and regression. Depending on the audience's previous exposure to basic linear algebra, this chapter can be skipped. Several course concentrations are possible. For example,
1. A course on linear models beyond the basic stats-methods course: §1, 2, (3), 4
2. A course on modeling nonlinear response: §1, 2, (3), 5, 6, parts of 8
3. A course on correlated data: §1, 2, (3), 7, 8, parts of 9
4. A course on mixed models: §1, 2, (3), parts of 4, 7, 8
5. A course on spatial data analysis: §1, 2, (3), 9
In a statistics curriculum the coverage of §4 to 9 should include the mathematical details and special topics sections §A4 to A9.
We did not include exercises in this text; the book can be used in various types of courses at different levels of technical difficulty. We did not want to suggest a particular type or level through exercises. Although the applications (case studies) in §5 to 9 are lengthy, they do not constitute the final word on any particular data. Some data sets, such as the Hessian fly experiment or the Poppy count data, are visited repeatedly in different chapters and can be tackled with different tools. We encourage comparative analyses for other data sets. If the applications leave the reader wanting to try out a different approach, to tackle the data from a new angle, and to improve upon our analysis, we wronged enough to get that right.
This text would not have been possible without the help and support of others. Data were kindly made available by A.M. Blackmer, R.E. Byers, R. Calhoun, J.R. Craig, D. Gilstrap, C.A. Gotway Crawford, J.R. Harris, L.P. Hart, D. Holshouser, D.E. Karcher, J.J. Kells, J. Kelly, D. Loftis, R. Mead, G. A. Milliken, P. Mou, T.G. Mueller, N.L. Powell, R. Reed, J. D. Rimstidt, R. Witmer, J. Walters, and L.W. Zelazny. Dr. J.R. Davenport (Washington State University-IRAEC) kindly provided the aerial photo of the potato circle for the cover. Several graduate students at Virginia Tech reviewed the manuscript in various stages and provided valuable insights and corrections. We are grateful in particular to S.K. Clark, S. Dorai-Raj, and M.J. Waterman. Our thanks to C.E. Watson (Mississippi State University) for a detailed review and to Simon L. Smith for EXP®. Without drawing on the statistical expertise of J.B. Birch (Virginia Tech) and T.G. Gregoire (Yale University), this text would have been more difficult to finalize. Without the loving support of our families it would have been impossible. Finally, the fine editorial staff at CRC Press LLC, and in particular our editor, Mr. John Sulzycki, brought their skills to bear to make this project a reality. We thank all of these individuals for contributing to the parts of the book that are right. Its flaws are our responsibility.
| Edition : | 01 |
| Number of Pages : | 762 |
| Published : | 11/13/2001 |
| isbn : | 978-1-58488-1 |