Business Accounting and Bookkeeping



This video discusses how to manage the books of account of a business.

  1.  The term bookkeeping means different things to different people:

  1. Some people think that bookkeeping is the same as accounting. They assume that keeping a company's books and preparing its financial statements and tax reports are all part of bookkeeping. Accountants do not share this view.
  2. Some people see bookkeeping as limited to recording transactions in journals or daybooks and then posting the amounts to accounts in ledgers. After the amounts are posted, the bookkeeping has ended and an accountant with a college degree takes over. The accountant will make adjusting entries and then prepare the financial statements and other reports.
  3. The past distinctions between bookkeeping and accounting have become blurred with the use of computers and accounting software. For example, a person with little bookkeeping training can use the accounting software to record vendor invoices, prepare sales invoices, etc. and the software will update the accounts in the general ledger automatically. Once the format of the financial statements has been established, the software will be able to generate the financial statements with the click of a button.
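The recording-then-posting workflow described in point 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular accounting package works: the `JournalEntry` class, the account names and the amounts are hypothetical, and each entry is simplified to exactly one debit and one credit. Debits are stored as positive balances and credits as negative ones, so a correctly posted ledger always sums to zero.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JournalEntry:
    """One transaction as recorded in the journal (daybook). Hypothetical structure."""
    date: str
    debit_account: str
    credit_account: str
    amount: float


def post_to_ledger(journal):
    """Post every journal entry to the ledger accounts.

    Sign convention (an assumption for this sketch): a debit increases an
    account's balance and a credit decreases it.
    """
    ledger = defaultdict(float)
    for entry in journal:
        ledger[entry.debit_account] += entry.amount
        ledger[entry.credit_account] -= entry.amount
    return dict(ledger)


# Hypothetical transactions for a small business.
journal = [
    JournalEntry("2024-01-02", "Cash", "Owner's Capital", 10000),
    JournalEntry("2024-01-05", "Office Supplies", "Cash", 250),
    JournalEntry("2024-01-09", "Accounts Receivable", "Sales Revenue", 1200),
]

ledger = post_to_ledger(journal)
for account, balance in ledger.items():
    print(f"{account:<20} {balance:>10.2f}")

# Every debit has an equal credit, so the balances must sum to zero.
print("Trial balance difference:", sum(ledger.values()))
```

Accounting software performs a comparable posting step automatically whenever a transaction is entered, which is one reason the boundary between bookkeeping and accounting has blurred.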


Marketing and distribution




Marketing comprises the activities of a company associated with buying and selling a product or service. It includes advertising, selling and delivering products to people. People who work in the marketing departments of companies try to get the attention of target audiences by using slogans, packaging design, celebrity endorsements and general media exposure.



IT Services for business



Information technology (IT) is the study and application of computers and related devices, such as mobile phones and tablets, to store, study, retrieve, transmit, and manipulate data or information, often in the context of a business or other enterprise. IT is considered a subset of information and communications technology (ICT).

Product Designed for Marketing





BUSINESS PLAN



Business: an organization or economic system in which goods and services are exchanged for one another or for money. Businesses are divided into three types: sole proprietorship, partnership, and company. Companies are further divided into more types, which will be discussed next time.

HOW TO START NEW BUSINESS






NONRANDOM SAMPLING

‘Nonrandom sampling’ refers to sampling in which the population units are drawn into the sample by using one’s personal judgment. This type of sampling is also known as purposive sampling. Within this category, one very important type of sampling is known as Quota Sampling.

QUOTA SAMPLING

In this type of sampling, the selection of the sampling unit from the population is no longer dictated by chance. A sampling frame is not used at all, and the choice of the actual sample units to be interviewed is left to the discretion of the interviewer. However, the interviewer is restricted by quota controls. For example, one particular interviewer may be told to interview ten married women between thirty and forty years of age living in town X, whose husbands are professional workers, and five unmarried professional women of the same age living in the same town. Quota sampling is often used in commercial surveys such as consumer market-research. Also, it is often used in public opinion polls.
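The way quota controls restrict the interviewer's otherwise free choice can be sketched in Python. This is a minimal illustration only: the category labels, the respondents and the scaled-down quotas (2 and 1 instead of 10 and 5) are hypothetical. Respondents are taken in the order the interviewer happens to meet them; there is no sampling frame and nothing is drawn at random.

```python
def quota_sample(respondents, quotas):
    """Fill each quota from respondents in the order they are encountered.

    respondents: iterable of (name, category) pairs, in order of encounter.
    quotas: dict mapping category -> number of interviews required.
    Returns the selected respondents and the quotas still unfilled.
    """
    remaining = dict(quotas)
    selected = []
    for name, category in respondents:
        # Interview this person only if their category's quota is still open.
        if remaining.get(category, 0) > 0:
            selected.append((name, category))
            remaining[category] -= 1
        # Stop as soon as every quota has been filled.
        if all(count == 0 for count in remaining.values()):
            break
    return selected, remaining


# Hypothetical encounters in town X; "other" matches no quota and is skipped.
met = [
    ("A", "married, 30-40"),
    ("B", "other"),
    ("C", "unmarried professional, 30-40"),
    ("D", "married, 30-40"),
    ("E", "married, 30-40"),
]
chosen, left = quota_sample(
    met, {"married, 30-40": 2, "unmarried professional, 30-40": 1}
)
print(chosen)
print(left)
```

Note that respondent E is never interviewed simply because the quotas were already full when she was met, which is exactly why the result depends on the interviewer's route and timing rather than on chance.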

ADVANTAGES OF QUOTA SAMPLING

· There is no need to construct a frame.
· It is a very quick form of investigation.
· Cost reduction.

SIMPLE RANDOM SAMPLING

In this type of sampling, the chance of any one element of the parent population being included in the sample is the same as for any other element. By extension, it follows that, in simple random sampling, the chance of any one sample appearing is the same as for any other. There is quite a lot of misconception regarding the concept of random sampling. Many a time, haphazard selection is considered to be equivalent to simple random sampling. For example, a market research interviewer may select women shoppers to find their attitude to brand X of a product by stopping one and then another as they pass along a busy shopping area, and he may think that he has accomplished simple random sampling!
Actually, there is a strong possibility of bias, as the interviewer may tend to ask his questions of young attractive women rather than older housewives, or he may stop women who have packets of brand X prominently on show in their shopping bags!
In this example, there is no suggestion of INTENTIONAL bias! From experience, it is known that the human being is a poor random selector --- one who is very subject to bias.
Fundamental psychological traits prevent complete objectivity, and no amount of training or conscious effort can eradicate them. As stated earlier, random sampling is that in which population units are selected by the lottery method. As you know, the traditional method of writing people’s names on small pieces of paper, folding these pieces of paper and shuffling them is very cumbersome!
A much more convenient alternative is the use of RANDOM NUMBERS TABLES.
A random number table is a page full of digits from 0 to 9. These digits are printed on the page in a TOTALLY random manner, i.e. there is no systematic pattern of printing these digits on the page.
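The random-number-table method can be mimicked in Python. In this sketch a seeded generator stands in for a printed table (an assumption; with a real table the digits would be read off the page), the population units are numbered 001 to 100, and the table is read three digits at a time, skipping numbers that are out of range or already chosen. The standard library's `random.sample` does the same job in a single call.

```python
import random


def random_digit_table(n_digits, seed=None):
    """Stand-in for a printed random number table: digits 0-9 in random order."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(n_digits)]


def select_with_table(digits, population_size, n):
    """Read the table three digits at a time and keep numbers 1..population_size.

    Out-of-range numbers and repeats are skipped, so the result is a simple
    random sample drawn without replacement. Assumes population_size <= 999.
    """
    chosen = []
    for i in range(0, len(digits) - 2, 3):
        number = digits[i] * 100 + digits[i + 1] * 10 + digits[i + 2]
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen


table = random_digit_table(3000, seed=7)
print(select_with_table(table, population_size=100, n=10))

# Library equivalent: each unit has the same chance of being selected.
print(random.Random(7).sample(range(1, 101), 10))
```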

OTHER TYPES OF RANDOM SAMPLING

· Stratified sampling (if the population is heterogeneous)
· Systematic sampling (practically, more convenient than simple random sampling)
· Cluster sampling (sometimes the sampling units exist in natural clusters)
· Multi-stage sampling
All these designs rest upon random or quasi-random sampling. They are various forms of PROBABILITY sampling ---
that in which each sampling unit has a known (but not necessarily equal) probability of being selected.
Because of this knowledge, there exist methods by which the precision and the reliability of the estimates can be calculated OBJECTIVELY.
It should be realized that in practice, several sampling techniques are incorporated into each survey design, and only rarely will a simple random sample be used, or a multi-stage design be employed, without stratification. The point to remember is that whatever method is adopted, care should be exercised at every step so as to make the results as reliable as possible.
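Two of the designs listed above, systematic and stratified sampling, can be sketched in Python as follows. This is a minimal illustration: the frame, the stratum names and the proportional allocation (6 urban and 4 rural units in a sample of 10) are hypothetical choices.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first k units, then take every k-th unit.

    k = N // n is the sampling interval; the slice is trimmed to n units in
    case N is not an exact multiple of n.
    """
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]


def stratified_sample(strata, allocation, seed=None):
    """Draw an independent simple random sample from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


frame = list(range(1, 101))  # hypothetical units numbered 1..100
print(systematic_sample(frame, 10, seed=3))

strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
# Proportional allocation: each stratum's share of the sample equals its share of the population.
print(stratified_sample(strata, {"urban": 6, "rural": 4}, seed=3))
```

Both designs keep a known probability of selection for every unit (1/k for the systematic sample, n_h/N_h within each stratum), which is what places them in the probability-sampling family.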

SAMPLING & NON-SAMPLING ERRORS

1. SAMPLING ERROR

The difference between the estimate derived from the sample (i.e. the statistic) and the true population value (i.e. the parameter) is technically called the sampling error. For example,
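Sampling error = $\bar{X} - \mu$, where $\bar{X}$ is the sample mean (the statistic) and $\mu$ is the population mean (the parameter).
Sampling error arises because a sample cannot exactly represent the population, even if it is drawn in a correct manner.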
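The definition can be checked numerically. In this Python sketch the population is a hypothetical list of the values 1 to 100, so the parameter $\mu = 50.5$ is known exactly; the sampling error is the sample mean minus $\mu$, and it vanishes when the "sample" is the whole population.

```python
import random
from statistics import mean


def sampling_error(sample, population):
    """Sampling error = sample mean (statistic) - population mean (parameter)."""
    return mean(sample) - mean(population)


population = list(range(1, 101))  # hypothetical population
sample = random.Random(11).sample(population, 10)

print("mu    =", mean(population))
print("x-bar =", mean(sample))
print("error =", sampling_error(sample, population))

# A complete enumeration (census) has no sampling error.
print("census error =", sampling_error(population, population))
```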

2. NON-SAMPLING ERROR

Besides sampling errors, there are certain errors which are not attributable to sampling but arise in the process of data collection, even if a complete count is carried out.
Main sources of non-sampling errors are:
· The defect in the sampling frame.
· Faulty reporting of facts due to personal preferences.
· Negligence or indifference of the investigators
· Non-response to mail questionnaires.
These (non-sampling) errors can be avoided through:
· Following up the non-response.
· Proper training of the investigators.
· Correct manipulation of the collected information.
Let us now consider exactly what is meant by ‘non-response’. There are two types of non-response: partial non-response and total non-response. ‘Partial non-response’ implies that the respondent refuses to answer some of the questions. On the other hand, ‘total non-response’ implies that the respondent refuses to answer any of the questions. Of course, the problem of late returns and non-response of the kind just mentioned occurs in the case of HUMAN populations. Although refusal of sample units to cooperate is encountered in interview surveys, it is far more of a problem in mail surveys. It is not uncommon to find the response rate to mail questionnaires as low as 15 or 20%. Providing INFORMATION ABOUT THE PURPOSE OF THE SURVEY helps in stimulating interest, thus increasing the chances of greater response, particularly if it can be shown that the work will be to the ADVANTAGE of the respondent IN THE LONG RUN.
Similarly, the respondent will be encouraged to reply if a pre-paid and addressed ENVELOPE is sent out with the questionnaire. But in spite of these ways of reducing non-response, we are bound to have some amount of non-response.
Hence, a decision has to be taken about how many RECALLS should be made.
The term ‘recall’ implies that we approach the respondent more than once in order to persuade him to respond to our queries.
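The two kinds of non-response can be told apart mechanically once questionnaires come back. In this Python sketch a returned questionnaire is a dictionary of answers with `None` for an unanswered question, and a questionnaire that never came back is `None` itself, which is treated here as total non-response (an assumption). The question names and the returns are hypothetical.

```python
def classify_response(answers):
    """Return 'complete', 'partial non-response' or 'total non-response'."""
    if answers is None or all(a is None for a in answers.values()):
        return "total non-response"
    if any(a is None for a in answers.values()):
        return "partial non-response"
    return "complete"


def response_rate(returns):
    """Share of questionnaires with at least one question answered."""
    responded = sum(1 for r in returns if classify_response(r) != "total non-response")
    return responded / len(returns)


# Hypothetical returns from five mailed questionnaires.
returns = [
    {"uses_brand_x": "yes", "age_group": "30-40"},
    {"uses_brand_x": "no", "age_group": None},
    None,  # never returned
    {"uses_brand_x": None, "age_group": None},
    None,  # never returned
]

for r in returns:
    print(classify_response(r))
print(f"Response rate: {response_rate(returns):.0%}")
```

After each recall, the classification can simply be rerun on the updated returns to see how much the response rate has improved.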
Another point worth considering is:
How long should the process of data collection be continued? Obviously, no such process can be carried out for an indefinite period of time! In fact, the longer the time period over which the survey is conducted, the greater will be the potential VARIATIONS in the attitudes and opinions of the respondents. Hence, a well-defined cut-off date generally needs to be established. Let us now look at the various ways in which we can select a sample from our population. We begin by looking at the difference between non-random and RANDOM sampling. First of all, what do we mean by nonrandom sampling?

SAMPLING FRAME

A sampling frame is a complete list of all the elements in the population. For example:

  • The complete list of the BCS students of Virtual University of Pakistan on February 15, 2003

Speaking of the sampling frame, it must be kept in mind that, as far as possible, our frame should be free from various types of defects. A good frame:

  • does not contain inaccurate elements
  • is not incomplete
  • is free from duplication, and
  • is not out of date.
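Some of these defects can be checked mechanically when an up-to-date reference list is available. The Python sketch below assumes such a list exists (often it does not, which is why frames are defective in the first place); the student IDs are hypothetical.

```python
from collections import Counter


def frame_defects(frame, reference):
    """Compare a sampling frame against an up-to-date reference list."""
    counts = Counter(frame)
    return {
        # the same unit listed more than once
        "duplicates": sorted(unit for unit, c in counts.items() if c > 1),
        # units in the population but missing from the frame (incomplete frame)
        "missing": sorted(set(reference) - set(frame)),
        # entries that are not population units (inaccurate or out of date)
        "extraneous": sorted(set(frame) - set(reference)),
    }


frame = ["BC001", "BC002", "BC002", "BC004", "BC009"]
reference = ["BC001", "BC002", "BC003", "BC004"]
print(frame_defects(frame, reference))
```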

Next, let’s talk about the sample that we are going to draw from this population.
As you all know, a sample is only a part of a statistical population, and hence it can represent the population only to some extent. Of course, it is intuitively logical that the larger the sample, the more likely it is to represent the population. Obviously, the limiting case is that when the sample size tends to the population size, the sample will tend to be identical to the population. But, of course, in general, the sample is much smaller than the population.
The point is that, in general, statistical sampling seeks to determine how accurate a description of the population the sample and its properties will provide. We may have to compromise on accuracy, but there are certain advantages of sampling because of which it has an extremely important place in data-based research studies.

ADVANTAGES OF SAMPLING

1. Savings in time and money.
· Although cost per unit in a sample is greater than in a complete investigation, the total cost will be less (because the sample will be so much smaller than the statistical population from which it has been drawn).
· A sample survey can be completed faster than a full investigation, so that variations from sample unit to sample unit over time will largely be eliminated.
· Also, the results can be processed and analyzed with increased speed and precision because there are fewer of them.
2. More detailed information may be obtained from each sample unit.
3. Possibility of follow-up:
(After detailed checking, queries and omissions can be followed up, a procedure which might prove impossible in a complete survey.)
4. Sampling is the only feasible possibility where tests to destruction are undertaken or where the population is effectively infinite.
The next two important concepts that need to be considered are those of sampling and non-sampling errors.

‘POPULATION’

A statistical population is the collection of every member of a group possessing the same basic and defined characteristic, but varying in amount or quality from one member to another.

EXAMPLES

· Finite population:
IQs of all children in a school.
· Infinite population:
Barometric pressure
(there is an indefinitely large number of points on the surface of the earth).
A flight of migrating ducks in Canada
(many finite populations are so large that they can be treated as effectively infinite).
The examples that we have just considered are those of existent populations.
A hypothetical population can be defined as the aggregate of all the conceivable ways in which a specified event can happen.

For Example:


  • All the possible outcomes from the throw of a die: however long we throw the die and record the results, we could always continue to do so for a still longer period. This is a theoretical concept, one which has no existence in reality.


  •  The number of ways in which a football team of 11 players can be selected from the 16 possible members named by the Club Manager.
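If the order in which players are picked does not matter (an assumption: playing positions are ignored), this count is the number of combinations of 16 players taken 11 at a time, $\binom{16}{11} = \frac{16!}{11!\,5!} = 4368$. A short Python check using `math.comb` (Python 3.8 or later) and a brute-force count:

```python
from itertools import combinations
from math import comb

squad_size, team_size = 16, 11

# Formula: 16! / (11! * 5!)
print(comb(squad_size, team_size))  # 4368

# Brute-force check: enumerate every possible team and count them.
print(sum(1 for _ in combinations(range(squad_size), team_size)))  # 4368
```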

We also need to differentiate between the sampled population and the target population. The sampled population is that from which a sample is chosen, whereas the population about which information is sought is called the target population. For example, if we wish to study the students of the colleges of the Punjab, our population will consist of all the students in all the colleges in the Punjab.
Suppose that, on account of shortage of resources or of time, we are able to conduct such a survey in only 5 colleges scattered throughout the province. In this case, the students of all the colleges will constitute the target population, whereas the students of those 5 colleges from which the sample of students will be selected will constitute the sampled population. From the above discussion regarding the population, you must have realized how important it is to have a very well-defined population.
The next question is: How will we draw a sample from our population?
The answer is that: In order to draw a random sample from a finite population, the first thing that we need is the complete list of all the elements in our population.
This list is technically called the FRAME.

COLLECTION OF SECONDARY DATA

The secondary data may be obtained from the following sources:
· Official, e.g. the publications of the Statistical Division, Ministry of Finance, the Federal and Provincial Bureaus of Statistics, Ministries of Food, Agriculture, Industry, Labour, etc.
· Semi-Official, e.g. State Bank of Pakistan, Railway Board, Central Cotton Committee, Boards of Economic Inquiry, District Councils, Municipalities, etc.
· Publications of Trade Associations, Chambers of Commerce, etc.
· Technical and Trade Journals and Newspapers
· Research Organizations such as universities and other institutions
Let us now consider the POPULATION from which we will be collecting our data. In this context, the first important question is: Why do we have to resort to sampling?
The answer is that if we had available to us every value of the variable under study, that would be an ideal and perfect situation. But the problem is that this ideal situation is very rarely available: very rarely do we have access to the entire population.
The census is an exercise in which an attempt is made to cover the entire population. But, as you might know, even the most developed countries of the world cannot afford to conduct such a huge exercise on an annual basis!
More often than not, we have to conduct our research study on a sample basis. In fact, the goal of the science of Statistics is to draw conclusions about large populations on the basis of information contained in small samples.

COLLECTION OF PRIMARY DATA

One or more of the following methods are employed to collect primary data:

  • Direct Personal Investigation
  • Indirect Investigation
  • Collection through Questionnaires
  • Collection through Enumerators
  • Collection through Local Sources

DIRECT PERSONAL INVESTIGATION

In this method, an investigator collects the information personally from the individuals concerned. Since he interviews the informants himself, the information collected is generally considered quite accurate and complete. This method may prove very costly and time-consuming when the area to be covered is vast. However, it is useful for laboratory experiments or localized inquiries. Errors are likely to enter the results due to personal bias of the investigator.

INDIRECT INVESTIGATION

Sometimes the direct sources do not exist or the informants hesitate to respond for some reason or other. In such a case, third parties or witnesses having information are interviewed. Moreover, due allowance is to be made for the personal bias. This method is useful when the information desired is complex or there is reluctance or indifference on the part of the informants. It can be adopted for extensive inquiries.

COLLECTION THROUGH QUESTIONNAIRES

A questionnaire is an inquiry form comprising a number of pertinent questions with space for entering the information asked for. The questionnaires are usually sent by mail, and the informants are requested to return the questionnaires to the investigator after doing the needful within a certain period. This method is cheap, fairly expeditious and good for extensive inquiries. But the difficulty is that the majority of the respondents (i.e. persons who are required to answer the questions) do not care to fill the questionnaires in and return them to the investigators. Sometimes, the questionnaires are returned incomplete and full of errors. Students, in spite of these drawbacks, this method is considered the STANDARD method for routine business and administrative inquiries. It is important to note that the questions should be few, brief, very simple, easy for all respondents to answer, clearly worded and not offensive to any respondent.

COLLECTION THROUGH ENUMERATORS

Under this method, the information is gathered by employing trained enumerators who assist the informants in making the entries in the schedules or questionnaires correctly. This method gives the most reliable information if the enumerator is well-trained, experienced and tactful. Students, it is considered the BEST method when a large-scale governmental inquiry is to be conducted. This method generally cannot be adopted by a private individual or institution, as its cost would be prohibitive for them.

COLLECTION THROUGH LOCAL SOURCES

In this method, there is no formal collection of data but the agents or local correspondents are directed to collect and send the required information, using their own judgment as to the best way of obtaining it. This method is cheap and expeditious, but gives only the estimates.

PRIMARY AND SECONDARY DATA

Data that have been originally collected (raw data) and have not undergone any sort of statistical treatment, are called PRIMARY data. Data that have undergone any sort of treatment by statistical methods at least ONCE, i.e. the data that have been collected, classified, tabulated or presented in some form for a certain purpose, are called SECONDARY data.


DEFINITION OF COLLECTION OF DATA

The most important part of statistical work is perhaps the collection of data. Statistical data are collected either by a COMPLETE enumeration of the whole field, called a CENSUS, which in many cases would be too costly and too time-consuming as it requires a large number of enumerators and supervisory staff, or by a PARTIAL enumeration associated with a SAMPLE, which saves much time and money.

STEPS INVOLVED IN ANY STATISTICAL RESEARCH


  • Topic and significance of the study
  • Objective of your study
  • Methodology for data-collection

As far as the objectives of your research are concerned, they should be stated in such a way that you are absolutely clear about the goal of your study: EXACTLY WHAT IS IT THAT YOU ARE TRYING TO FIND OUT? As far as the methodology for DATA-COLLECTION is concerned, you need to consider:

  • Source of your data (the statistical population)
  • Sampling Methodology
  • Instrument for collecting data

MEASUREMENT SCALES

By measurement, we usually mean the assigning of numbers to observations or objects, and scaling is a process of measuring. The four scales of measurement are briefly described below:

NOMINAL SCALE:

The classification or grouping of the observations into mutually exclusive qualitative categories or classes is said to constitute a nominal scale. For example, students are classified as male and female. The numbers 1 and 2 may also be used to identify these two categories. Similarly, rainfall may be classified as heavy, moderate and light. We may use the numbers 1, 2 and 3 to denote the three classes of rainfall. The numbers, when they are used only to identify the categories of the given scale, carry no numerical significance, and there is no particular order for the grouping.

ORDINAL OR RANKING SCALE

It includes the characteristic of a nominal scale and in addition has the property of ordering or ranking of measurements. For example, the performance of students (or players) is rated as excellent, good, fair or poor, etc. The numbers 1, 2, 3, 4, etc. are also used to indicate ranks. The only relation that holds between any pair of categories is that of “greater than” (or more preferred).

INTERVAL SCALE

A measurement scale possessing a constant interval size (distance) but not a true zero point is called an interval scale. Temperature measured on either the Celsius or the Fahrenheit scale is an outstanding example of an interval scale, because the same difference exists between 20 °C (68 °F) and 30 °C (86 °F) as between 5 °C (41 °F) and 15 °C (59 °F). It cannot be said that a temperature of 40 degrees is twice as hot as a temperature of 20 degrees, i.e. the ratio 40/20 has no meaning. The arithmetic operations of addition and subtraction are, however, meaningful.
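Converting the temperatures in the paragraph from Celsius to Fahrenheit shows why differences, but not ratios, are meaningful on an interval scale: equal Celsius differences stay equal after conversion, whereas the ratio 40/20 changes with the choice of scale. A short Python check using the standard conversion F = 9C/5 + 32:

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit (F = 9C/5 + 32)."""
    return c * 9 / 5 + 32


# Equal intervals stay equal: 20 -> 30 and 5 -> 15 are both 10 C, and both 18 F.
print(celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))  # 18.0
print(celsius_to_fahrenheit(15) - celsius_to_fahrenheit(5))   # 18.0

# Ratios do not survive a change of scale: 2.0 in Celsius, about 1.53 in Fahrenheit.
print(40 / 20)
print(celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20))
```

Because the zero point of each scale is arbitrary, a ratio computed on one scale is not preserved on the other, while a difference is preserved up to the fixed factor 9/5.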

RATIO SCALE

It is a special kind of interval scale where the scale of measurement has a true zero point as its origin. The ratio scale is used to measure weight, volume, distance, money, etc. The key to differentiating interval and ratio scales is that the zero point is meaningful for a ratio scale.

ERRORS OF MEASUREMENT

Experience has shown that a continuous variable can never be measured with perfect fineness because of certain habits and practices, methods of measurement, instruments used, etc. The measurements are thus always recorded correct to the nearest units and hence are of limited accuracy. The actual or true values are, however, assumed to exist. For example, if a student’s weight is recorded as 60 kg (correct to the nearest kilogram), his true weight in fact lies between 59.5 kg and 60.5 kg, whereas a weight recorded as 60.00 kg means the true weight is known to lie between 59.995 and 60.005 kg. Thus there is a difference, however small it may be, between the measured value and the true value. This sort of departure from the true value is technically known as the error of measurement. In other words, if the observed value and the true value of a variable are denoted by $x$ and $x + e$ respectively, then the difference $(x + e) - x$, i.e. $e$, is the error. This error involves the unit of measurement of $x$ and is therefore called an absolute error. An absolute error divided by the true value is called the relative error. Thus the relative error is $\frac{e}{x + e}$, which, when multiplied by 100, is the percentage error. The relative and percentage errors are independent of the units of measurement of $x$. It ought to be noted that an error has both magnitude and direction and that the word error in statistics does not mean mistake, which is a chance inaccuracy.
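The three error measures follow directly from the definitions above, with $x$ the observed value and $x + e$ the true value. The figures in this Python sketch are hypothetical: a weight recorded as 48 kg when the true weight is 50 kg.

```python
def measurement_errors(observed, true):
    """Return (absolute, relative, percentage) error.

    Following the text: observed = x and true = x + e, so e = true - observed.
    """
    absolute = true - observed      # e, in the units of x (kg here)
    relative = absolute / true      # e / (x + e), unit-free
    percentage = 100 * relative     # relative error expressed in percent
    return absolute, relative, percentage


e, rel, pct = measurement_errors(observed=48, true=50)
print(f"absolute error   = {e} kg")
print(f"relative error   = {rel}")
print(f"percentage error = {pct:.1f}%")
```

Recording the same weights in grams would change the absolute error to 2000 g but leave the relative and percentage errors unchanged, which is the unit-independence noted above.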

BIASED AND RANDOM ERRORS

An error is said to be biased when the observed value is consistently and constantly higher or lower than the true value. Biased errors arise from the personal limitations of the observer, the imperfection in the instruments used or some other conditions which control the measurements. These errors are not revealed by repeating the measurements. They are cumulative in nature, that is, the greater the number of measurements, the greater would be the magnitude of error. They are thus more troublesome. These errors are also called cumulative or systematic errors. An error, on the other hand, is said to be unbiased when the deviations, i.e. the excesses and defects, from the true value tend to occur equally often. Unbiased errors are revealed when measurements are repeated, and they tend to cancel out in the long run. These errors are therefore compensating and are also known as random errors or accidental errors.
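A small simulation illustrates the difference. In this Python sketch (all figures hypothetical) a quantity with true value 50 is measured 10,000 times. Random errors are drawn uniformly between -0.5 and +0.5, while the biased instrument is assumed to read 0.3 too high on top of the same kind of random scatter. Averaging the repeated readings removes the random error but not the bias, and the total biased error grows roughly in proportion to the number of measurements.

```python
import random
from statistics import mean

rng = random.Random(0)
true_value = 50.0
n = 10_000

# Random (compensating) errors: excesses and defects equally likely.
random_errors = [rng.uniform(-0.5, 0.5) for _ in range(n)]
# Biased (systematic) errors: an assumed constant +0.3 offset plus random scatter.
biased_errors = [0.3 + rng.uniform(-0.5, 0.5) for _ in range(n)]

unbiased_readings = [true_value + err for err in random_errors]
biased_readings = [true_value + err for err in biased_errors]

print("mean of unbiased readings:", round(mean(unbiased_readings), 3))  # close to 50
print("mean of biased readings:  ", round(mean(biased_readings), 3))    # close to 50.3
print("total random error:", round(sum(random_errors), 1))  # small relative to n
print("total biased error:", round(sum(biased_errors), 1))  # roughly 0.3 * n
```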

OBSERVATIONS AND VARIABLES


In statistics, an observation often means any sort of numerical recording of information, whether it is a physical measurement such as height or weight; a classification such as heads or tails, or an answer to a question such as yes or no.

VARIABLES:

A characteristic that varies with an individual or an object is called a variable. For example, age is a variable as it varies from person to person. A variable can assume a number of values. The given set of all possible values from which the variable takes on a value is called its Domain. If for a given problem, the domain of a variable contains only one value, then the variable is referred to as a constant.

QUANTITATIVE AND QUALITATIVE VARIABLES:

Variables may be classified into quantitative and qualitative according to the form of the characteristic of interest. A variable is called a quantitative variable when a characteristic can be expressed numerically, such as age, weight, income or number of children. On the other hand, if the characteristic is non-numerical, such as education, sex, eye colour, quality, intelligence, poverty, satisfaction, etc., the variable is referred to as a qualitative variable. A qualitative characteristic is also called an attribute. An individual or an object with such a characteristic can be counted or enumerated after having been assigned to one of the several mutually exclusive classes or categories.

DISCRETE AND CONTINUOUS VARIABLES:

A quantitative variable may be classified as discrete or continuous. A discrete variable is one that can take only a discrete set of integers or whole numbers, that is, the values are taken by jumps or breaks. A discrete variable represents count data such as the number of persons in a family, the number of rooms in a house, the number of deaths in an accident, the income of an individual, etc.
A variable is called a continuous variable if it can take on any value, fractional or integral, within a given interval, i.e. its domain is an interval with all possible values without gaps. A continuous variable represents measurement data such as the age of a person, the height of a plant, the weight of a commodity, the temperature at a place, etc.
A variable, whether countable or measurable, is generally denoted by some symbol such as X or Y, and Xi or Xj represents the ith or jth value of the variable. The subscript i or j is replaced by a number such as 1, 2, 3, … when referring to a particular value.

THE MEANING OF DATA


The word “data” appears in many contexts and frequently is used in ordinary conversation. Although the word carries something of an aura of scientific mystique, its meaning is quite simple and mundane. It is Latin for “those that are given” (the singular form is “datum”). Data may therefore be thought of as the results of observation.

EXAMPLES OF DATA


  • Data are collected in many aspects of everyday life.
  • Statements given to a police officer or physician or psychologist during an interview are data.
  • So are the correct and incorrect answers given by a student on a final examination.
  • Almost any athletic event produces data.
  • The time required by a runner to complete a marathon,
  • The number of errors committed by a baseball team in nine innings of play.
  • And, of course, data are obtained in the course of scientific inquiry:
  • The positions of artifacts and fossils in an archaeological site,
  • The number of interactions between two members of an animal colony during a period of observation,
  • The spectral composition of light emitted by a star.


THE WAY STATISTICS WORKS

As it is such an important area of knowledge, it is definitely useful to have a fairly good idea about the way in which it works, and this is exactly the purpose of this introductory course.
The following points indicate some of the main functions of this science:

  •  Statistics assists in summarizing large sets of data in a form that is easily understandable.
  • Statistics assists in the efficient design of laboratory and field experiments as well as surveys.
  • Statistics assists in sound and effective planning in any field of inquiry.
  • Statistics assists in drawing general conclusions and in making predictions of how much of a thing will happen under given conditions.

IMPORTANCE OF STATISTICS IN VARIOUS FIELDS:

As stated earlier, Statistics is a discipline that finds application in the most diverse fields of activity. It is perhaps a subject that should be used by everybody. Statistical techniques, being powerful tools for analyzing numerical data, are used in almost every branch of learning. In all areas, statistical techniques are being increasingly used, and are developing very rapidly.

  •  A modern administrator whether in public or private sector leans on statistical data to provide a factual basis for decision.
  • A politician uses statistics advantageously to lend support and credence to his arguments while elucidating the problems he handles.
  • A businessman, an industrialist and a research worker all employ statistical methods in their work. Banks, insurance companies and governments all have their statistics departments.
  • A social scientist uses statistical methods in various areas of the socio-economic life of a nation. It is sometimes said that “a social scientist without an adequate understanding of statistics is often like the blind man groping in a dark room for a black cat that is not there”.

WHAT IS STATISTICS?

  • That science which enables us to draw conclusions about various phenomena on the basis of real data collected on a sample basis
  • A tool for data-based research
  • Also known as Quantitative Analysis
  • Has a lot of applications in a wide variety of disciplines: Agriculture, Anthropology, Astronomy, Biology, Economics, Engineering, Environment, Geology, Genetics, Medicine, Physics, Psychology, Sociology, Zoology... virtually every single subject from Anthropology to Zoology, A to Z!
  • In any scientific enquiry in which you would like to base your conclusions and decisions on real-life data, you need to employ statistical techniques!
  • Nowadays, in the developed countries of the world, there is an active movement for Statistical Literacy.

THE NATURE OF THIS DISCIPLINE

Descriptive Statistics, Probability, Inferential Statistics

MEANINGS OF ‘STATISTICS’:
The word “Statistics”, which comes from the Latin word status, meaning a political state, originally meant information useful to the state, for example, information about the sizes of populations and armed forces. But this word has now acquired different meanings.

  • In the first place, the word statistics refers to “numerical facts systematically arranged”. In this sense, the word statistics is always used in the plural. We have, for instance, statistics of prices, statistics of road accidents, statistics of crimes, statistics of births, statistics of educational institutions, etc. In all these examples, the word statistics denotes a set of numerical data in the respective fields. This is the meaning the man in the street gives to the word statistics, and most people usually use the word data instead.
  • In the second place, the word statistics is defined as a discipline that includes procedures and techniques used to collect, process and analyze numerical data to make inferences and to reach decisions in the face of uncertainty. It should of course be borne in mind that uncertainty does not imply ignorance; it refers to the incompleteness and the instability of the data available. In this sense, the word statistics is used in the singular. As it embodies more or less all stages of the general process of learning, sometimes called the scientific method, statistics is characterized as a science. Thus the word statistics used in the plural refers to a set of numerical information, and used in the singular it denotes the science of basing decisions on numerical data. It should be noted that statistics as a subject is mathematical in character.
  • Thirdly, the word statistics refers to numerical quantities calculated from sample observations; a single quantity calculated in this way is called a statistic. The mean of a sample, for instance, is a statistic. The word statistics is plural when used in this sense.
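To make the third meaning concrete, here is a minimal Python sketch that computes two statistics from a small sample. The sample values are hypothetical, chosen only for illustration, and the standard library `statistics` module does the arithmetic.

```python
from statistics import mean, stdev

# Hypothetical sample observations (illustrative values only).
sample = [4.0, 7.0, 5.0, 6.0, 8.0]

sample_mean = mean(sample)  # a statistic: the sample mean
sample_sd = stdev(sample)   # another statistic: sample standard deviation (n - 1 divisor)

print(sample_mean, round(sample_sd, 3))
```

Each number printed is a single statistic; together they are statistics in the third sense described above.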

CHARACTERISTICS OF THE SCIENCE OF STATISTICS

Statistics is a discipline in its own right. It is therefore desirable to know the characteristic features of statistics in order to appreciate and understand its general nature. Some of its important characteristics are given below:

  •  Statistics deals with the behaviour of aggregates or large groups of data. It has nothing to do with what is happening to a particular individual or object of the aggregate.
  • Statistics deals with aggregates of observations of the same kind rather than isolated figures.
  • Statistics deals with variability that obscures underlying patterns. No two objects in this universe are exactly alike. If they were, there would have been no statistical problem.
  • Statistics deals with uncertainties, as every process of getting observations, whether controlled or uncontrolled, involves deficiencies or chance variation. That is why we have to talk in terms of probability.
  • Statistics deals with those characteristics or aspects of things which can be described numerically either by counts or by measurements.
  • Statistics deals with those aggregates which are subject to a number of random causes, e.g. the heights of persons are subject to a number of causes such as race, ancestry, age, diet, habits, climate and so forth.
  • Statistical laws are valid on the average or in the long run. There is no guarantee that a certain law will hold in all cases. Statistical inference is therefore made in the face of uncertainty.
  • Statistical results might be misleading or incorrect if sufficient care is not exercised in collecting, processing and interpreting the data, or if the statistical data are handled by a person who is not well versed in the subject matter of statistics.

What Is the Standard Bell Curve?

Bell curves show up throughout statistics. Diverse measurements such as diameters of seeds, lengths of fish fins, scores on the SAT and weights of individual sheets of a ream of paper all form bell curves when they are graphed. The general shape of all of these curves is the same. But all of these curves are different, because it is highly unlikely that any two of them share the same mean or standard deviation. Bell curves with large standard deviations are wide, and bell curves with small standard deviations are skinny. Bell curves with larger means are shifted more to the right than those with smaller means.

An Example

To make this a little more concrete, let’s pretend that we measure the diameters of 500 kernels of corn. Then we record, analyze and graph the data. It is found that the data set is shaped like a bell curve and has a mean of 1.2 cm with a standard deviation of .4 cm. Now suppose that we do the same thing with 500 beans, and we find that they have a mean diameter of .8 cm with a standard deviation of .04 cm.
The bell curves from both of these data sets are plotted above. The red curve corresponds to the corn data and the green curve corresponds to the bean data. As we can see, the centers and spreads of these two curves are different.
These are clearly two different bell curves. They are different because their means and standard deviations don’t match. Since the data sets we come across can have any positive number as a standard deviation and any number as a mean, we’re really just scratching the surface of an infinite number of bell curves. That’s far too many curves to deal with one at a time. What’s the solution?

A Very Special Bell Curve

One goal of mathematics is to generalize things whenever possible. Sometimes several individual problems are special cases of a single problem. This situation involving bell curves is a great illustration of that. Rather than deal with an infinite number of bell curves, we can relate all of them to a single curve. This special bell curve is called the standard bell curve or standard normal distribution.
The standard bell curve has a mean of zero and a standard deviation of one. Any other bell curve can be compared to this standard by means of a straightforward calculation.

Features of the Standard Bell Curve

All of the properties of any bell curve hold for the standard bell curve.
  • The standard bell curve has not only a mean of zero, but also a median and a mode of zero. This is the center of the curve.
  • The standard bell curve shows mirror symmetry at zero. Half of the curve is to the left of zero and half of the curve is to the right. If the curve were folded along a vertical line at zero, both halves would match up perfectly.
  • The standard bell curve follows the 68-95-99.7 rule, which gives us an easy way to estimate the following:
    • Approximately 68% of all of the data is between -1 and 1.
    • Approximately 95% of all the data is between -2 and 2.
    • Approximately 99.7% of the data is between -3 and 3.
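The three percentages can be checked numerically. The sketch below relies on the standard identity that the area between −k and k under the standard bell curve equals erf(k/√2), and uses Python's `math.erf`; `prob_within` is a name chosen for illustration.

```python
from math import erf, sqrt

def prob_within(k):
    """Area under the standard bell curve between -k and k."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973: the 68-95-99.7 rule.
    print(f"between -{k} and {k}: {prob_within(k):.4f}")
```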

Why Do We Care?

At this point we may be asking, “Why bother with a standard bell curve?” It may seem like a needless complication, but the standard bell curve will be beneficial as we continue on in statistics.
We will find that one type of problem in statistics requires us to find areas underneath portions of any bell curve that we encounter. The bell curve is not a nice shape for areas. It’s not like a rectangle or a right triangle, which have easy area formulas. Finding areas of parts of a bell curve is tricky, so hard in fact that it requires calculus. If we don’t standardize our bell curves, we would need to do some calculus every time we want to find an area. If we standardize our curves, all the work of calculating areas has already been done for us, because the areas under the standard bell curve have been computed once and collected in a table.
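The following Python sketch illustrates why standardizing saves work: one function for the area to the left of a z-score on the standard bell curve (written with `math.erf`, which does the calculus for us) is enough to find areas under any bell curve, once each endpoint is converted with z = (x − μ)/σ. The function names are hypothetical, and the corn figures (mean 1.2 cm, standard deviation 0.4 cm) come from the example above.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard bell curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_between(a, b, mu, sigma):
    """Area between a and b under a bell curve with mean mu and
    standard deviation sigma, found by standardizing both endpoints."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

# Corn kernels between 0.8 cm and 1.6 cm, i.e. within one standard deviation.
print(round(area_between(0.8, 1.6, 1.2, 0.4), 4))
```

The result, about 0.6827, is the same area as between −1 and 1 on the standard bell curve, which is the point of standardizing.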

An Introduction to the Bell Curve

After I give a test in any of my classes, one thing that I like to do is to make a graph of all the scores. I typically write down 10 point ranges such as 60-69, 70-79, and 80-89, then put a tally mark for each test score in that range. Almost every time I do this, a familiar shape emerges. A few students do very well and a few do very poorly. A bunch of scores end up clumped around the mean score. Different tests may result in different means and standard deviations, but the shape of the graph is nearly always the same. This shape is commonly called the bell curve.
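The tallying procedure described above is easy to automate. This Python sketch groups scores into 10-point ranges and prints one tally mark per score; the scores are made up for illustration and are not real class data.

```python
from collections import Counter

# Hypothetical test scores, invented for illustration only.
scores = [55, 62, 68, 71, 73, 74, 77, 78, 79, 81, 84, 86, 93]

def tally(scores, width=10):
    """Count scores in ranges such as 60-69, 70-79, 80-89 and print tally marks."""
    counts = Counter((s // width) * width for s in scores)
    for low in sorted(counts):
        print(f"{low}-{low + width - 1}: {'|' * counts[low]}")
    return counts

tally(scores)
```

With these made-up scores the 70-79 range gets the most marks and the counts thin out on either side, the rough bell shape the paragraph describes.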
Why call it a bell curve? The bell curve gets its name quite simply because its shape resembles that of a bell. These curves appear throughout the study of statistics, and their importance cannot be overemphasized.

What Is a Bell Curve?

To be technical, the kinds of bell curves that we care about the most in statistics are actually called normal probability distributions. For what follows we’ll just assume the bell curves we’re talking about are normal probability distributions. Despite the name “bell curve,” these curves are not defined by their shape. Instead, an intimidating-looking formula is used as the formal definition of bell curves.
But we really don’t need to worry too much about the formula. The only two numbers that we care about in it are the mean and the standard deviation. The bell curve for a given set of data has its center located at the mean. This is where the highest point of the curve, or “top of the bell,” is located. A data set’s standard deviation determines how spread out the bell curve is. The larger the standard deviation, the more spread out the curve.
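For readers who want to see the formula at work, the sketch below evaluates the normal density f(x) = e^(−(x − μ)²/(2σ²)) / (σ√(2π)) in Python. It only illustrates the two points made above: the curve peaks at the mean, and a larger standard deviation gives a lower, wider curve. The function name `bell_curve` is our own.

```python
from math import exp, pi, sqrt

def bell_curve(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# The top of the bell sits at the mean (corn example: mean 1.2, sd 0.4).
print(bell_curve(1.2, 1.2, 0.4) > bell_curve(1.5, 1.2, 0.4))

# Doubling the standard deviation halves the peak height...
print(round(bell_curve(0, 0, 1), 4), round(bell_curve(0, 0, 2), 4))
# ...and leaves more height far from the mean.
print(bell_curve(3, 0, 2) > bell_curve(3, 0, 1))
```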

Important Features of a Bell Curve

There are several features of bell curves that are important and distinguish them from other curves in statistics:
  • A bell curve has one mode, which coincides with the mean and median. This is the center of the curve where it is at its highest.
  • A bell curve is symmetric. If it were folded along a vertical line at the mean, both halves would match perfectly because they are mirror images of each other.
  • A bell curve follows the 68-95-99.7 rule, which provides a convenient way to carry out estimated calculations:
    • Approximately 68% of all of the data lies within one standard deviation of the mean.
    • Approximately 95% of all the data is within two standard deviations of the mean.
    • Approximately 99.7% of the data is within three standard deviations of the mean.

An Example

If we know that a bell curve models our data, we can use the above features of the bell curve to say quite a bit. Going back to the test example, suppose we have 100 students who took a statistics test with mean score of 70 and standard deviation of 10.
The standard deviation is 10. Subtract and add 10 to the mean. This gives us 60 and 80. By the 68-95-99.7 rule we would expect about 68% of 100, or 68 students to score between 60 and 80 on the test.
Two times the standard deviation is 20. If we subtract and add 20 to the mean we have 50 and 90. We would expect about 95% of 100, or 95 students to score between 50 and 90 on the test.
A similar calculation tells us that effectively everyone scored between 40 and 100 on the test.
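The arithmetic in this example can be written out as a short Python loop. It applies the rounded 68-95-99.7 percentages to the class of 100 students with mean 70 and standard deviation 10; it is a rule-of-thumb estimate, not an exact probability calculation.

```python
mean, sd, students = 70, 10, 100
rule = {1: 0.68, 2: 0.95, 3: 0.997}  # share of data within k standard deviations

results = []
for k, share in rule.items():
    low, high = mean - k * sd, mean + k * sd
    expected = share * students
    results.append((low, high, expected))
    print(f"between {low} and {high}: about {expected:g} students")
```

The last line reports about 99.7 of the 100 students between 40 and 100, which is why effectively everyone falls in that range.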

Uses of the Bell Curve

There are many applications for bell curves. They are important in statistics because they model a wide variety of real-world data. As mentioned above, test results are one place where they pop up. Here are some others:
  • Repeated measurements of a piece of equipment
  • Measurements of characteristics in biology
  • Approximating chance events such as flipping a coin several times
  • Heights of students at a particular grade level in a school district

When Not to Use the Bell Curve

Even though there are countless applications of bell curves, they are not appropriate in all situations. Some statistical data sets, such as equipment failure times or income distributions, have different shapes and are not symmetric. Other times there can be two or more modes, such as when several students do very well and several do very poorly on a test. These applications require the use of other curves that are defined differently than the bell curve. Knowing how the data in question were obtained can help to determine whether a bell curve should be used to represent them.

Z-Score Formula

To convert any bell curve into a standard bell curve, we use the following formula. Let x be any number on our bell curve with mean μ (mu) and standard deviation σ (sigma). The formula produces a z-score on the standard bell curve:

$$z = \frac{x - \mu}{\sigma}$$
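As a sketch of the conversion, the Python function below implements z = (x − μ)/σ. The corn and bean means and standard deviations come from the earlier example; the measurements 2.0 cm and 0.88 cm are hypothetical values picked for illustration.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Corn: mean 1.2 cm, standard deviation 0.4 cm.
corn_z = z_score(2.0, 1.2, 0.4)
# Beans: mean 0.8 cm, standard deviation 0.04 cm.
bean_z = z_score(0.88, 0.8, 0.04)

print(round(corn_z, 2), round(bean_z, 2))  # 2.0 2.0
```

Both measurements land at z = 2 on the standard bell curve, so a 2.0 cm kernel is exactly as unusual among corn as a 0.88 cm bean is among beans, even though the two original bell curves look quite different.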

The Normal Distribution or Bell Curve

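The normal distribution, commonly known as the bell curve, occurs throughout statistics. Here is the equation for all bell curves, with mean μ and standard deviation σ:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$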

Standard Normal Distribution Table


The table found below is a compilation of areas from the standard normal distribution, more commonly known as the bell curve. The table provides the area of the region located under the bell curve and to the left of a given z-score. These areas represent probabilities and have numerous applications throughout statistics.
Anytime that a normal distribution is being used, a table such as this one can be consulted to perform important calculations. If you need help reading the table, begin with the value of your z score. In order to use this particular table, the value should be rounded to the nearest hundredth. Find the appropriate entry in the table by reading down the first column for the ones and tenths places of your number, and along the top row for the hundredths place.
For example, if z=1.67, then you would split this number into 1.67 = 1.6 + .07. The number located in the 1.6 row and .07 column is .953. Thus 95.3% of the area under the bell curve is to the left of z=1.67.
The table may also be used to find the area to the left of a negative z-score. To do this, drop the negative sign and look up the appropriate entry in the table. Because the bell curve is symmetric, the area to the left of −z equals the area to the right of z, so subtract the table entry from 1. For example, the area to the left of z = −1.67 is 1 − .953 = .047.
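The table entries can be reproduced, to three decimals, with the error function from Python's standard library. This is a sketch that assumes three-decimal agreement is all we need; `area_left_of` is a hypothetical name. It also shows the symmetry rule for negative z-scores.

```python
from math import erf, sqrt

def area_left_of(z):
    """Area under the standard bell curve to the left of z (what the table lists)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(area_left_of(1.67), 3))      # 0.953, row 1.6, column .07
print(round(1 - area_left_of(1.67), 3))  # 0.047, table entry subtracted from 1
print(round(area_left_of(-1.67), 3))     # 0.047, computed directly
```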

Standard Normal Distribution Table

z     0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
0.0   .500  .504  .508  .512  .516  .520  .524  .528  .532  .536
0.1   .540  .544  .548  .552  .556  .560  .564  .568  .571  .575
0.2   .580  .583  .587  .591  .595  .599  .603  .606  .610  .614
0.3   .618  .622  .626  .630  .633  .637  .641  .644  .648  .652
0.4   .655  .659  .663  .666  .670  .674  .677  .681  .684  .688
0.5   .692  .695  .699  .702  .705  .709  .712  .716  .719  .722
0.6   .726  .729  .732  .736  .740  .742  .745  .749  .752  .755
0.7   .758  .761  .764  .767  .770  .773  .776  .779  .782  .785
0.8   .788  .791  .794  .797  .800  .802  .805  .808  .811  .813
0.9   .816  .819  .821  .824  .826  .829  .832  .834  .837  .839
1.0   .841  .844  .846  .849  .851  .853  .855  .858  .860  .862
1.1   .864  .867  .869  .871  .873  .875  .877  .879  .881  .883
1.2   .885  .887  .889  .891  .893  .894  .896  .898  .900  .902
1.3   .903  .905  .907  .908  .910  .912  .913  .915  .916  .918
1.4   .919  .921  .922  .924  .925  .927  .928  .929  .931  .932
1.5   .933  .935  .936  .937  .938  .939  .941  .942  .943  .944
1.6   .945  .946  .947  .948  .950  .951  .952  .953  .954  .955
1.7   .955  .956  .957  .958  .959  .960  .961  .962  .963  .963
1.8   .964  .965  .966  .966  .967  .968  .969  .969  .970  .971
1.9   .971  .972  .973  .973  .974  .974  .975  .976  .976  .977
2.0   .977  .978  .978  .979  .979  .980  .980  .981  .981  .982
2.1   .982  .983  .983  .983  .984  .984  .985  .985  .985  .986
2.2   .986  .986  .987  .987  .988  .988  .988  .988  .989  .989
2.3   .989  .990  .990  .990  .990  .991  .991  .991  .991  .992
2.4   .992  .992  .992  .993  .993  .993  .993  .993  .993  .994
2.5   .994  .994  .994  .994  .995  .995  .995  .995  .995  .995
2.6   .995  .996  .996  .996  .996  .996  .996  .996  .996  .996
2.7   .997  .997  .997  .997  .997  .997  .997  .997  .997  .997

How to Calculate the Margin of Error

Political polls and other applications of statistics often state their results with a margin of error. It is not uncommon to see an opinion poll state that an issue or candidate has the support of a certain percentage of respondents, plus or minus a certain percentage. This plus-or-minus term is the margin of error. But how is the margin of error calculated? For a simple random sample from a sufficiently large population, the margin of error is really just a restatement of the size of the sample and the level of confidence being used.

The Formula for the Margin of Error

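In what follows we will use the formula for the margin of error. We will plan for the worst possible case, in which we have no idea what the true level of support is for the issues in our poll. If we did have some idea about this number, possibly through previous polling data, we would end up with a smaller margin of error. The worst case is a true proportion of 50%: the margin of error for a proportion p is $z_{\alpha/2}\sqrt{p(1-p)/n}$, and the product p(1 − p) is largest when p = 0.5, where it equals 1/4, so the square root becomes $1/(2\sqrt{n})$.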
The formula we will use is:

$$E = \frac{z_{\alpha/2}}{2\sqrt{n}}$$

The Level of Confidence

The first piece of information we need to calculate the margin of error is to determine what level of confidence we desire. This number can be any percentage less than 100%, but the most common levels of confidence are 90%, 95%, and 99%. Of these three the 95% level is used most frequently.
If we subtract the level of confidence, written as a decimal, from one, we obtain the value of alpha, written as α, that is needed for the formula.

The Critical Value

The next step in calculating the margin of error is to find the appropriate critical value. This is indicated by the term $z_{\alpha/2}$ in the above formula. Since we have assumed a simple random sample from a large population, we can use the standard normal distribution of z-scores.
Suppose that we are working with a 95% level of confidence. We want to look up the z-score z* for which the area between −z* and z* is 0.95. From the table we see that this critical value is 1.96.
We could also have found the critical value in the following way. Thinking in terms of α/2: since α = 1 − 0.95 = 0.05, we have α/2 = 0.025. We now look for the z-score with an area of 0.025 to its right, which is the same as an area of 1 − 0.025 = 0.975 to its left. The entry .975 appears in row 1.9, column .06 of the table, so we end up with the same critical value of 1.96.
Other levels of confidence give different critical values. The greater the level of confidence, the higher the critical value. The critical value for a 90% level of confidence, with corresponding α value of 0.10, is 1.64 (more precisely 1.645). The critical value for a 99% level of confidence, with corresponding α value of 0.01, is 2.58 (more precisely 2.576).
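Instead of reading the table, critical values can be computed with the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist().inv_cdf` from Python's standard library (Python 3.8+); `critical_value` is a hypothetical helper name. It finds the z-score with area α/2 to its right, as described above.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the area between -z* and z* equals the confidence level."""
    alpha = 1 - confidence
    # Area alpha/2 to the right of z* means area 1 - alpha/2 to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_value(level):.3f}")
```

This prints 1.645, 1.960 and 2.576 for the three levels, matching the rounded values used in the text.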

Sample Size

The only other number that we need to use in the formula to calculate the margin of error is the sample size, denoted by n in the formula. We then take the square root of this number.
Because of where this number appears in the formula, the larger the sample size, the smaller the margin of error. Large samples are therefore preferable to small ones. However, since statistical sampling requires time and money, there are constraints on how much we can increase the sample size. The presence of the square root in the formula means that quadrupling the sample size will only halve the margin of error.

A Few Examples

To make sense of the formula, let’s look at a couple of examples.
  1. What is the margin of error for a simple random sample of 900 people at a 95% level of confidence?
     By use of the table we have a critical value of 1.96, and so the margin of error is 1.96/(2√900) = 1.96/60 ≈ 0.0327, or about 3.3%.
  2. What is the margin of error for a simple random sample of 1600 people at a 95% level of confidence?
     At the same level of confidence as the first example, increasing the sample size to 1600 gives a margin of error of 1.96/(2√1600) = 1.96/80 = 0.0245, or about 2.5%.
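Both examples can be checked with a one-line Python function for the worst-case formula E = z_{α/2}/(2√n). The function name is our own; the critical value 1.96 is the 95% value from the table.

```python
from math import sqrt

def margin_of_error(z_crit, n):
    """Worst-case margin of error for a proportion: E = z / (2 * sqrt(n))."""
    return z_crit / (2 * sqrt(n))

for n in (900, 1600, 3600):
    print(n, round(margin_of_error(1.96, n), 4))
```

Going from 900 to 3600 people quadruples the sample size and cuts the margin of error in half, from about 3.3% to about 1.6%, which is the square-root effect described earlier.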

Statistics and Political Polls

At any given time throughout a political campaign, the media may want to know what the public at large thinks about policies or candidates. One solution would be to ask everyone who they would vote for. This would be costly, time-consuming and infeasible. Another way to determine voter preference is to use a statistical sample. Rather than ask every voter to state his or her preference in candidates, polling research companies ask a relatively small number of people who their favorite candidate is. The members of the statistical sample help to determine the preferences of the entire population. There are good polls and not-so-good polls, so it is important to ask the following questions when reading any results.

Who Was Polled?

A candidate makes his or her appeal to the voters because the voters are the ones who cast ballots. Consider the following groups of people:
  • Adults
  • Registered voters
  • Likely voters
To discern the mood of the public, any of these groups may be sampled. However, if the intent of the poll is to predict the winner of an election, the sample should consist of registered voters or likely voters.
The political composition of the sample sometimes plays a role in interpreting poll results. A sample consisting entirely of registered Republicans would not be good if someone wanted to ask a question about the electorate at large. Since the electorate rarely splits into exactly 50% registered Republicans and 50% registered Democrats, even a sample balanced this way may not be the best to use.

When Was the Poll Conducted?

Politics can be fast-paced. Within a matter of days an issue arises, alters the political landscape, and is then forgotten by most when some new issue surfaces. What people were talking about on Monday sometimes seems a distant memory by Friday. News moves faster than ever; however, good polling takes time to conduct. Major events can take several days to show up in poll results. The dates when a poll was conducted should be noted to determine whether current events have had time to affect the poll's numbers.

What Methods Were Used?

Suppose that Congress is considering a bill that deals with gun control. Read the following two scenarios and ask which is more likely to accurately determine the public sentiment.
  • A blog asks its readers to click on a box to show their support of the bill. A total of 5000 participate and there is overwhelming rejection of the bill.
  • A polling firm randomly calls 1000 registered voters and asks them about their support of the bill. The firm finds that their respondents are more or less evenly split for and against the bill.
Although the first poll has more respondents, they are self-selected. It is likely that the people who would participate are those who have strong opinions. It could even be that the readers of the blog are very like-minded in their opinions (perhaps it is a blog about hunting). The second sample is random, and an independent party has selected the sample. Even though the first poll has a larger sample size, the second sample would be better.

How Large Is the Sample?

As the discussion above shows, a poll with a larger sample size is not necessarily the better poll. On the other hand, a sample size may be too small to state anything meaningful about public opinion. A random sample of 20 likely voters is too small to determine the direction that the entire U.S. population is leaning on an issue. But how large should the sample be?
Associated with the size of the sample is the margin of error. The larger the sample size, the smaller the margin of error. Surprisingly, sample sizes as small as 1000 to 2000 are typically used for polls such as Presidential approval, whose margin of error is within a couple of percentage points. The margin of error could be made as small as desired by using a larger sample; however, this would make the poll more expensive to conduct.

Bringing It All Together

The answers to the above questions should help in assessing the accuracy of results in political polls. Not all polls are created equal. Details are often buried in footnotes or omitted entirely in news articles that quote the poll. Be informed about how a poll was designed.