Organization of data in statistics



Organization of data refers to the systematic arrangement of collected figures (raw data) so that the data become easy to understand and more convenient for further statistical treatment.

Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts.

Characteristics of classification.

1. Homogeneity​



4. Flexibility​

5. Diversification


A variable is a characteristic that can be measured and whose value can change from time to time.

Basis of classification: ​

1. Chronological classification: Data are classified in ascending or descending order with reference to time, such as years, quarters, months, weeks, etc.

2. Geographical/Spatial classification: Data are classified with reference to geographical location or place, such as countries, states, cities, districts, blocks, etc.

3. Qualitative classification: Data are classified with reference to descriptive characteristics such as sex, caste, religion, literacy, etc.

4. Quantitative classification: Data are classified on the basis of measurable characteristics such as height, age, weight, income, or marks of students.

5. Conditional classification: Data are classified with respect to a condition.
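The first four bases above can be sketched in a few lines of Python. This is only an illustration: the records, field names, and the 20-year age bands are invented for the example and do not come from any real data set.

```python
from collections import defaultdict

# Hypothetical records, invented purely for illustration.
records = [
    {"year": 2021, "city": "Delhi", "literate": True, "age": 34},
    {"year": 2019, "city": "Mumbai", "literate": False, "age": 52},
    {"year": 2020, "city": "Delhi", "literate": True, "age": 27},
    {"year": 2019, "city": "Chennai", "literate": True, "age": 45},
]


def classify(items, key):
    """Group items by the value returned by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


# Chronological: arrange the records in ascending order of time.
chronological = sorted(records, key=lambda r: r["year"])

# Geographical: group by place.
by_city = classify(records, lambda r: r["city"])

# Qualitative: group by a descriptive attribute.
by_literacy = classify(
    records, lambda r: "literate" if r["literate"] else "illiterate"
)

# Quantitative: group by a measurable characteristic
# (age, in assumed 20-year bands keyed by the band's lower limit).
by_age_band = classify(records, lambda r: (r["age"] // 20) * 20)

print([r["year"] for r in chronological])
print({city: len(rows) for city, rows in by_city.items()})
```

The same `classify` helper serves three of the bases because only the grouping key changes; chronological classification is an ordering rather than a grouping, so it uses `sorted` instead.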

A mass of data in its original form is called raw data. It is an unorganized mass of various items.

A characteristic that can be measured and whose value changes over time is called a variable. Variables are of two types:

(a) Discrete

(b) Continuous

Discrete variable: Discrete variables increase in jumps, taking only whole-number values and never fractional ones. Ex.: the number of students in a class could be 2, 4, 10, 15, 20, 25, etc.; it cannot take any fractional value between them.

Continuous variable: Continuous variables can take any value, integral or fractional, within a specified interval. Ex.: wages of workers in a factory.

Different values of a variable are distributed in different classes along with their corresponding class frequencies.
Class mid-point (class mark): the middle value of a class. It lies halfway between the lower class limit and the upper class limit of a class and is calculated as follows:
Class mid-point = (upper class limit + lower class limit) / 2
Class frequency: the number of values in a particular class.
Class width: the difference between the upper class limit and the lower class limit.
Class width = upper class limit - lower class limit
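
As a quick check of the two formulas, here is a minimal Python sketch. The function names and the class 10-20 are chosen only for illustration.

```python
def class_mid_point(lower_limit, upper_limit):
    """Middle value of a class: (upper limit + lower limit) / 2."""
    return (upper_limit + lower_limit) / 2


def class_width(lower_limit, upper_limit):
    """Difference between the upper and lower class limits."""
    return upper_limit - lower_limit


# Illustrative class 10-20.
print(class_mid_point(10, 20))  # 15.0
print(class_width(10, 20))      # 10
```

The parentheses in the mid-point formula matter: the two limits are added first and the sum is then divided by 2.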

In contrast to the exclusive method, the inclusive method does not exclude the upper class limit from a class interval; the upper limit is included in the class. Thus both class limits are part of the class interval, e.g., 0-9.
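
The difference between the two methods shows up only at the upper class limit. The sketch below assumes the usual convention for the exclusive method, namely that a value equal to the upper limit is counted in the next class; the function names and the classes 0-10 and 10-20 are illustrative.

```python
def in_class_exclusive(value, lower, upper):
    """Exclusive method: the upper class limit is NOT part of the class."""
    return lower <= value < upper


def in_class_inclusive(value, lower, upper):
    """Inclusive method: both class limits are part of the class."""
    return lower <= value <= upper


# Exclusive classes 0-10 and 10-20: the value 10 falls in 10-20.
print(in_class_exclusive(10, 0, 10))   # False
print(in_class_exclusive(10, 10, 20))  # True

# Inclusive class 0-9: the upper limit 9 belongs to the class.
print(in_class_inclusive(9, 0, 9))     # True
```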

Types of series
1. Individual series
2. Frequency series
a. Discrete series or frequency array
b. Frequency distribution or continuous series

Individual series
Individual series are those series in which the items are listed singly. For example:​​​

   6    15000
Total income: 900000
Average income: 15000
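
An individual series can be held as a plain list, one entry per item. The sketch below uses made-up incomes (not the figures from the table above) to show how a total and an average are obtained from such a series.

```python
# Hypothetical individual series: one income per person, listed singly.
incomes = [12000, 18000, 15000, 9000, 21000]

total_income = sum(incomes)
average_income = total_income / len(incomes)

print(total_income)    # 75000
print(average_income)  # 15000.0
```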

Discrete series or frequency array
A discrete series or frequency array is a series in which data are presented so that the exact measurements of items are clearly shown. The following table illustrates a frequency array.

Town    Distance from town (f)
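
A frequency array can be built by counting how often each exact value occurs. The sketch below is a minimal illustration using Python's `collections.Counter`; the observations are invented values of a discrete variable, not data from the table above.

```python
from collections import Counter

# Hypothetical observations of a discrete variable
# (for example, number of children per family).
observations = [2, 3, 2, 1, 3, 2, 4, 1, 2, 3]

# Count how many times each exact value occurs.
frequency_array = Counter(observations)

# Print each value with its frequency (f), in ascending order of value.
for value in sorted(frequency_array):
    print(value, frequency_array[value])
```

Each distinct value appears once with its frequency, and the frequencies add up to the number of observations.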

Continuous series
A continuous series is one in which items cannot be exactly measured. The items assume a range of values and are placed within class limits. In other words, data are classified into different classes, each with a range; these ranges are called class intervals.

Class interval (C I)    No. of students (f)
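
A continuous series can be produced by tallying raw values into class intervals. The sketch below assumes exclusive classes of width 10 from 0 to 50 and uses invented marks; none of these choices comes from the table above.

```python
# Hypothetical raw marks of ten students.
marks = [5, 12, 18, 25, 27, 31, 39, 40, 44, 48]

# Assumed class intervals: 0-10, 10-20, ..., 40-50 (exclusive method).
width = 10
classes = [(lower, lower + width) for lower in range(0, 50, width)]

# Tally each mark into the class whose range contains it.
frequency = {interval: 0 for interval in classes}
for mark in marks:
    for lower, upper in classes:
        if lower <= mark < upper:  # upper limit excluded
            frequency[(lower, upper)] += 1
            break

for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper}", count)
```

With the exclusive method, the mark 40 is counted in the class 40-50 rather than 30-40.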
