# Basic Statistics I – Definitions of commonly used Statistical Terms

When we talk about statistics, we come across a number of simple terms again and again. These are the basic statistical terms, and we meet them day in and day out whenever we work with data. Without them, we cannot make sense of a statistical problem. Many of them are also used in daily life, though sometimes under a different name.

• Average – Also called the mean, it is the arithmetic average of all of the sample values. It is calculated by adding all of the sample values together and dividing by the number of elements (n) in the sample.
• Central Tendency – A measure of the point about which a group of values is clustered; common measures of central tendency are the mean, the median and the mode.
• Characteristic – A process input or output which can be measured and monitored.
• Cycle Time – The total amount of elapsed time expended from the time a task, product or service is started until it is completed.
• Long-term Variation – The observed variation of an input or output characteristic which has had the opportunity to experience the majority of the variation effects that influence it.
• Median – The middle value of a data set when the values are arranged in either ascending or descending order.
• Mode – The value that occurs most frequently in a data set.
• Lower Control Limit (LCL) – For control charts, the lower limit above which a process statistic must remain for the process to be in control. Typically this value is 3 standard deviations below the center line.
• Lower Specification Limit (LSL) – The lowest value of a characteristic which is acceptable.
• Range – A measure of the variability in a data set. It is the difference between the largest and smallest values in a data set.
• Specification Limits – The bounds of acceptable performance for a characteristic.
• Standard Deviation – One of the most common measures of variability in a data set or in a population. It is the square root of the variance.
• Trend – A gradual, systematic change over time or some other variable.
• Upper Control Limit (UCL) – For control charts, the upper limit below which a process statistic must remain for the process to be in control. Typically this value is 3 standard deviations above the center line.
• Upper Specification Limit (USL) – The highest value of a characteristic which is acceptable.
• Variability – A generic term that refers to the property of a characteristic, process or system to take on different values when it is repeated.
• Variables – Quantities which are subject to change or variability.
• Variable Data – Data that are continuous and can be meaningfully subdivided, i.e. they can take decimal values.
• Variance – A specifically defined mathematical measure of variability in a data set or population. It is the square of the standard deviation.
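Several of the terms above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the cycle-time values are made up for illustration. The control limits shown are a simplification (mean plus or minus 3 sample standard deviations); real control charts usually estimate the spread from within-subgroup variation.

```python
import statistics

# Hypothetical sample: cycle times in minutes for ten completed tasks.
cycle_times = [12.0, 15.0, 11.0, 15.0, 14.0, 13.0, 15.0, 12.0, 16.0, 17.0]

mean = statistics.mean(cycle_times)          # sum of values divided by n
median = statistics.median(cycle_times)      # middle value of the sorted data
mode = statistics.mode(cycle_times)          # most frequent value
data_range = max(cycle_times) - min(cycle_times)  # largest minus smallest

# Sample variance (divides by n - 1) and its square root, the standard deviation.
variance = statistics.variance(cycle_times)
std_dev = statistics.stdev(cycle_times)

# Simplified control limits: 3 standard deviations either side of the mean.
# Assumption: sigma is taken from the overall sample, not from subgroups.
lcl = mean - 3 * std_dev
ucl = mean + 3 * std_dev

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}")
print(f"LCL={lcl:.3f}, UCL={ucl:.3f}")
```

Note that the standard deviation squared gives back the variance, which is the relationship stated in the two definitions above.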

Originally posted 2011-10-20 12:53:00.

# Types of Data

The first step of any statistical enquiry is the collection of relevant numerical data. Data used for statistical purposes are mainly classified as primary data and secondary data.

#### Variable

A characteristic of a population that can take different values (e.g., defects, processing time).

#### Data

Data are measurements collected on a variable.

#### Primary Data

Data collected for the purpose of the given enquiry are called primary data. They are collected by the enquirer, either personally or through an agency set up for the purpose, directly from the field of enquiry. This type of data can be used with greater confidence because the enquirer decides the coverage of the data and the definitions to be used, and so has greater control over the reliability of the data.

#### Secondary Data

Data already collected by some other agency, or for some other purpose, and available in published or unpublished form are known as secondary data. The user has to be particularly careful when using such data, and must clearly understand the nature of the data, their coverage, the definitions used and their reliability.
Secondary data are generally preferred when the conditions mentioned above are clear and acceptable, because using them reduces both the time and the cost of the analysis.

#### Discrete Data

A count or frequency of occurrence.

#### Attribute Data

Data which take one of a set of discrete values, such as pass or fail, yes or no.

#### Continuous Data

Measurements that can be meaningfully divided into finer and finer increments of precision.

#### Usage of Sampling

The big question is whether data should be collected from the complete population or from a sample. If a sample is used, care should be taken that it is representative of the complete population. A sample designed with care can produce results that are sufficiently accurate for the purpose of the enquiry, and it can save a lot of time and money.
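As a rough illustration of how a sample can stand in for a population, here is a minimal sketch of simple random sampling with Python's standard library. The population is simulated (hypothetical processing times), and the sample size of 200 is an arbitrary choice for the example, not a recommendation.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: processing times (minutes) for 10,000 orders.
population = [random.gauss(30.0, 5.0) for _ in range(10_000)]

# Simple random sample without replacement:
# every unit in the population has the same chance of being selected.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# How far the sample estimate is from the true population value.
sampling_error = abs(sample_mean - population_mean)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:.2f}")
```

Only 2% of the population is measured here, yet the sample mean should land close to the population mean. A sample that is not drawn at random (for example, only the first 200 orders of the day) carries no such assurance.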

#### Methods of Data Collection

The methods used to collect data are the questionnaire method, the interview method and the direct observation method. Any one of these, or a combination of them, may be used.

#### Usage of Data

The data collected should be subjected to thorough scrutiny to see whether they can be considered correct. The success of the analysis depends on the reliability of the data. However excellent the statistical methods of analysis may be, they cannot bring out useful and reliable information from faulty, unreliable or mistaken data. This applies especially when secondary data are used.

Like this? Go on to visit the next column: Statistics – 3: Presentations and Organization

Originally posted 2011-10-01 11:14:00.

# Statistics – 1 – What is Statistics ?

Statistics can be described as a quantitative method of scientific investigation.
Used as a plural noun, 'statistics' means the numerical data arising out of any sphere of human experience.
Used as a singular noun, 'statistics' is the name for the body of scientific methods used for the collection, organization, analysis and interpretation of numerical data.
According to the American Statistical Association, statistics is 'the scientific application of mathematical principles to the collection, analysis, and presentation of numerical data'.
The word 'statistic' also has a separate meaning within the subject. In this sense, a 'statistic' is a numerical value produced by a calculation on the data. The standard deviation, the mean and so on are 'statistics' in this sense.
Statistics is a branch of mathematics that is used extensively in almost every field. It has become an important tool in many academic disciplines such as medicine, psychology, education, sociology, engineering and physics, to name just a few. It is also important in many aspects of society such as business, industry and government. Because of the increasing use of statistics in so many areas of our lives, it has become very desirable to understand and practice statistical thinking. This is important even if you do not use statistical methods directly.
Even with so many uses, there is some public mistrust of the subject, because people misuse figures for their own convenience. During the introduction to the course I joined, this statement was used: 'There are three types of lies: 1 – lies, 2 – damned lies, 3 – statistics. We will teach you the third part here.'
Used properly, statistics can help with a great many of the problems faced by the world, and it can be a tremendous tool for the growth of any organization.
Visit the next post: Data Collection – Types of Data

Originally posted 2011-09-30 20:00:00.

# What are the 7 Basic Quality Tools

These are the most widely used basic tools for solving quality-related problems. They are suitable for people with little or no formal training in statistics. They are seven basic graphical techniques which help in solving the vast majority of problems.
The history of these tools is interesting.
In 1950, a few years after the Second World War, Japan was concentrating on rebuilding. One of the initiatives was an invitation from JUSE (the Union of Japanese Scientists and Engineers) to the legendary American quality guru W. Edwards Deming to come to Japan and train hundreds of Japanese engineers, managers and scholars in statistical process control. Throughout the hundreds of lectures Deming delivered, the emphasis was on the basic tools available for process control.
Taking his cue from these lectures, Kaoru Ishikawa, at the time an associate professor at the University of Tokyo and a member of JUSE, developed these tools. His chief desire was to democratize quality, i.e. he wanted to make quality control comprehensible to all workers, and, inspired by Deming's lectures, he formalized the Seven Basic Tools of Quality Control. He believed that 90% of a company's problems could be solved using these seven tools, and that they could easily be taught to any member of the organization. This ease of use, combined with their graphical nature, makes statistical analysis interesting to all.
The tools are listed below.

1. Check Sheets – A generic tool which can be used for the collection and analysis of data. A check sheet is a structured, prepared form that can be adapted to a wide variety of issues.
2. Control Charts – A graphical technique which can be used to study how a process changes over time.
3. Pareto Chart – Another graphical technique, which can be used to identify the relative significance of individual factors.
4. Scatter Chart – This is used to identify the relationship between two variables by plotting pairs of numerical data, with one variable on each axis. If the variables are related, the points will fall along a line or a curve.
5. Cause and Effect Diagram (also called an Ishikawa diagram or fishbone diagram) – This can be used to structure brainstorming sessions. It sorts ideas into useful categories: many possible causes of a stated problem are identified, along with their effect on the problem.
6. Stratification (some lists substitute a flow chart here) – This tool is used when data collected from multiple sources have been clubbed together. Separating the data by source reveals patterns that give meaning to a large body of data.
7. Histogram – It looks very much like a bar chart. It shows how frequently each value, or range of values, of a variable occurs in a set of data.
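To make two of these tools concrete, the sketch below tallies observations the way a check sheet would and then builds the table behind a Pareto chart: categories sorted from most to least frequent, with a running cumulative percentage. The defect categories and counts are invented for illustration, and the plotting step is left out.

```python
from collections import Counter

# Hypothetical check-sheet observations: one entry per defect found.
defects = (
    ["scratch"] * 42
    + ["dent"] * 27
    + ["misalignment"] * 18
    + ["discoloration"] * 8
    + ["other"] * 5
)

# Check sheet: tally how often each category occurs.
counts = Counter(defects)
total = sum(counts.values())

# Pareto table: categories in descending order of frequency,
# each with the cumulative share of all defects up to that row.
pareto_rows = []
cumulative = 0
for category, count in counts.most_common():
    cumulative += count
    cumulative_pct = round(100 * cumulative / total, 1)
    pareto_rows.append((category, count, cumulative_pct))

for category, count, cumulative_pct in pareto_rows:
    print(f"{category:<15}{count:>5}{cumulative_pct:>8}%")
```

Reading down the cumulative column shows how few categories account for most of the defects, which is the question a Pareto chart is meant to answer.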