Data Analysis and machine learning notebook: Data analysis of crops production and area in India

India is an agriculture-based country, where more than 50% of the population depends on agriculture as its main source of income. Agriculture's contribution to national income is so large that it is often called the backbone of the Indian economy. In the initial two decades, agriculture contributed between 48% and 60% of total national output. By 2001-2002, this contribution had declined to around 26%.
This project analyses the production of different crops in India: the state-wise contribution of each crop to the Indian market, the ratio of production to area for each crop, season-wise crop production, and so on.
The Indian economy is based mainly on agriculture. Different crops are produced every year, the quantity produced runs to lakhs of tonnes, and the area under cultivation is also very large. Summarizing this large body of data is the main problem statement of our project. India has 29 states and 720 districts, and evaluating crop production for individual states and districts is another part of the problem statement.
The objectives can be summarized as:
• Finding how much of each crop is produced per year in India.
• Evaluating state-wise production of different crops.
• Evaluating district-wise production of different crops.
• Evaluating season-wise production of different crops.
• Evaluating the yearly production growth of different crops in India.
• Analysing the ratio of crop production to area under cultivation.
• Creating a user input field to show the analysis of crop production by state, district and season.
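The production-to-area ratio named in the objectives can be sketched as follows. This is only an illustration, not the notebook's own code: the rows are made-up values using the dataset's column names, and `yield_per_hectare` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical sample rows using the dataset's column names; the values are made up.
sample = pd.DataFrame({
    "Crop": ["Rice", "Rice", "Wheat", "Wheat"],
    "Crop_Year": [2010, 2011, 2010, 2011],
    "Area": [100.0, 200.0, 50.0, 0.0],
    "Production": [250.0, 600.0, 100.0, 0.0],
})

def yield_per_hectare(df):
    """Return total production divided by total area for each crop."""
    totals = df.groupby("Crop")[["Production", "Area"]].sum()
    # Avoid dividing by zero: a crop with no recorded area gets NaN instead of inf.
    area = totals["Area"].where(totals["Area"] > 0)
    totals["Yield"] = totals["Production"] / area
    return totals

result = yield_per_hectare(sample)
print(result)
```

Summing production and area before dividing gives the overall ratio for a crop, which is usually more stable than averaging the ratios of individual rows.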

The methodology used is described below:
• Creating an environment for Python coding using Anaconda Navigator.
• Opening Jupyter Notebook on localhost.
• Importing helpful packages for the analysis in Jupyter Notebook, e.g. pandas, NumPy, matplotlib.pyplot and seaborn.
• Reading the data and binding it into a data frame.
• Cleaning the data, introducing factors and omitting all NA values.
• Extracting the important fields from the data and removing redundant data.
• Grouping by different columns of the data as required by the problem statement.
• Getting the groups for the columns mentioned in the problem statement.
• Sorting the values of production and area.
• Analysing the fields according to the problem statement.
• Analysing the ratio of production to area for different crops.
• Plotting the analysed data.
• Setting plot parameters such as kind, figsize, logy, color, grid and stacked.
• Using different kinds of plots, e.g. bar plot, barh plot and line plot.
• Labelling the x-axis and y-axis of each plot.
• Creating a user input field so that users can request data for a chosen state, district, etc.
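The read, clean, group and sort steps listed above can be sketched in a few lines of pandas. This is a minimal sketch under stated assumptions: the CSV text is a made-up stand-in for the real file, and `load_and_clean` and `totals_by` are hypothetical helper names that do not appear in the notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: same column names, made-up values.
CSV_TEXT = """State_Name,District_Name,Crop_Year,Season,Crop,Area,Production
Assam,Baksa,2010,Kharif     ,Rice,100,200
Assam,Baksa,2011,Kharif     ,Rice,120,
Assam,Barpeta,2010,Kharif     ,Rice,80,160
Kerala,Idukki,2010,Whole Year ,Coconut,50,500
"""

def load_and_clean(source):
    """Read the CSV, tidy the Season labels and drop rows with missing values."""
    df = pd.read_csv(source)
    df["Season"] = df["Season"].str.strip()
    return df.dropna()

def totals_by(df, column):
    """Sum Production and Area per value of `column`, largest production first."""
    grouped = df.groupby(column)[["Production", "Area"]].sum()
    return grouped.sort_values(by="Production", ascending=False)

clean = load_and_clean(io.StringIO(CSV_TEXT))
state_totals = totals_by(clean, "State_Name")
print(state_totals)
```

Passing `"District_Name"`, `"Season"` or `"Crop_Year"` to `totals_by` gives the other groupings in the same way, and the resulting frame can be handed to `.plot(kind='bar')` for the plotting steps.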

The dataset used is described briefly below.
We are using a dataset of Indian crop production published by the Government of India and collected from data.gov.in. It is in .csv file format.
We are using Jupyter Notebook and Python 3.0 for the coding. Anaconda Navigator is used to create the environment in which the code runs.
The file consists of the following columns:
1. State_Name: The name of the Indian state.
2. District_Name: The name of the district within the state.
3. Crop_Year: The year in which the crop was produced.
4. Season: The season in which the crop was produced.
5. Crop: The name of the crop.
6. Area: The area under cultivation for the crop, in hectares.
7. Production: The quantity of the crop produced, in tonnes.
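Before running any analysis it can help to confirm that a loaded file really has the seven columns described above. The check below is a hedged sketch: `EXPECTED_COLUMNS`, `missing_columns` and `non_numeric_measures` are hypothetical names, and the sample frame is made up to show what a failed check looks like.

```python
import pandas as pd

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "State_Name", "District_Name", "Crop_Year",
    "Season", "Crop", "Area", "Production",
]

def missing_columns(df):
    """Return the expected column names that are absent from df."""
    return [name for name in EXPECTED_COLUMNS if name not in df.columns]

def non_numeric_measures(df):
    """Return those of Area and Production that are present but not numeric."""
    return [
        name for name in ("Area", "Production")
        if name in df.columns and not pd.api.types.is_numeric_dtype(df[name])
    ]

# Made-up frame: the Season column is missing and Production holds text.
incomplete = pd.DataFrame({
    "State_Name": ["Assam", "Kerala"],
    "District_Name": ["Baksa", "Idukki"],
    "Crop_Year": [2010, 2010],
    "Crop": ["Rice", "Coconut"],
    "Area": [100.0, 50.0],
    "Production": ["200", "n/a"],
})

print(missing_columns(incomplete))
print(non_numeric_measures(incomplete))
```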

Given below is the Python code for data visualization driven by user queries.

#IMPORTING IMPORTANT LIBRARIES
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
%matplotlib inline

#DATA PREPROCESSING
crops_prod_data = pd.read_csv("/Users/Acer/agricultural/apy.csv")
crops_prod_data['Season'] = crops_prod_data['Season'].str.rstrip()
crops_prod_data['Crop_Year']=crops_prod_data['Crop_Year'].astype(str)

#FUNCTION DEFINITIONS
def overall():
    # Year-wise production and area of one crop across the whole of India
    cultivation_data = crops_prod_data[['Crop_Year', 'Crop', 'Area', 'Production']]
    y = input("Enter the name of the crop : ")
    cultivation_data = cultivation_data.groupby('Crop').get_group(y)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    print(cultivation_data)
    print("Bar plot of the above data")
    cultivation_data.dropna().plot(kind='bar', figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'])
    print("Line plot of the above data")
    cultivation_data.dropna().plot(figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'], linestyle='solid', marker='o', alpha=0.8, markersize=8)

def statewise():
    # Year-wise production and area of one crop in one state
    x = input("Enter the name of the state : ")
    y = input("Enter the name of the crop : ")
    cultivation_data = crops_prod_data[['State_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data = cultivation_data.groupby('State_Name').get_group(x)
    cultivation_data = cultivation_data.groupby('Crop').get_group(y)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    print(cultivation_data)
    print("Bar plot of the above data")
    cultivation_data.dropna().plot(kind='bar', figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'])
    print("Line plot of the above data")
    cultivation_data.dropna().plot(figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'], linestyle='solid', marker='o', alpha=0.8, markersize=8)

def districtwise():
    # Year-wise production and area of one crop in one district of a state
    x = input("Enter the name of the state : ")
    z = input("Enter the name of the district : ")
    y = input("Enter the name of the crop : ")
    cultivation_data = crops_prod_data[['State_Name', 'District_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data = cultivation_data.groupby('State_Name').get_group(x)
    cultivation_data = cultivation_data.groupby('Crop').get_group(y)
    cultivation_data = cultivation_data.groupby('District_Name').get_group(z)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    print(cultivation_data)
    print("Bar plot of the above data")
    cultivation_data.dropna().plot(kind='bar', figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'])
    print("Line plot of the above data")
    cultivation_data.dropna().plot(figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'], linestyle='solid', marker='o', alpha=0.8, markersize=8)

def seasonwise():
    # Year-wise production and area of one crop in one season
    x = input("Enter the name of the season : ")
    z = input("Enter the name of the crop : ")
    cultivation_data = crops_prod_data[['Season', 'Crop_Year', 'Crop', 'Area', 'Production']]
    cultivation_data = cultivation_data.groupby('Season').get_group(x)
    cultivation_data = cultivation_data.groupby('Crop').get_group(z)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    print(cultivation_data)
    print("Bar plot of the above data")
    cultivation_data.dropna().plot(kind='bar', figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'])
    print("Line plot of the above data")
    cultivation_data.dropna().plot(figsize=(20,10), logy=True, color=['dodgerblue', 'aqua'], linestyle='solid', marker='o', alpha=0.8, markersize=8)

def comparingstate():
    # Compare year-wise production and area of a crop between two states
    x = input("Enter the name of the first state : ")
    y = input("Enter the name of the crop for the first state : ")
    cultivation_data = crops_prod_data[['State_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data = cultivation_data.groupby('State_Name').get_group(x)
    cultivation_data = cultivation_data.groupby('Crop').get_group(y)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    a = input("Enter the name of the second state : ")
    b = input("Enter the name of the crop for the second state : ")
    cultivation_data1 = crops_prod_data[['State_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data1 = cultivation_data1.groupby('State_Name').get_group(a)
    cultivation_data1 = cultivation_data1.groupby('Crop').get_group(b)
    cultivation_data1 = cultivation_data1.groupby('Crop_Year')[['Production', 'Area']].sum()

    # Give each state's columns distinct names, then align the two tables on Crop_Year
    cultivation_data = cultivation_data.rename(columns={'Production':'Production_of_state1', 'Area':'Area_of_state1'})
    cultivation_data1 = cultivation_data1.rename(columns={'Production':'Production_of_state2', 'Area':'Area_of_state2'})
    df12 = pd.concat([cultivation_data, cultivation_data1], axis=1)

    print(df12)
    ax = df12.plot(kind='bar', figsize=(20,20), color=['darkorange', 'forestgreen', 'darkgoldenrod', 'limegreen'], grid=True)
    print("COMPARISON OF PRODUCTION")
    ax = df12.plot(y=['Production_of_state1', 'Production_of_state2'], figsize=(20,10), color=['darkorange', 'darkgoldenrod'], grid=True, linestyle='solid', marker='o', alpha=0.8, markersize=8)
    print("COMPARISON OF AREA")
    ax = df12.plot(y=['Area_of_state1', 'Area_of_state2'], figsize=(20,10), color=['forestgreen', 'limegreen'], grid=True, linestyle='solid', marker='o', alpha=0.8, markersize=8)

def comparingdistrict():
    # Compare year-wise production and area of a crop between two districts
    x = input("Enter the name of the first state : ")
    z = input("Enter the name of the first district : ")
    y = input("Enter the name of the crop for the first district : ")
    cultivation_data = crops_prod_data[['State_Name', 'District_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data = cultivation_data.groupby('State_Name').get_group(x)
    cultivation_data = cultivation_data.groupby('District_Name').get_group(z)
    cultivation_data = cultivation_data.groupby('Crop').get_group(y)
    cultivation_data = cultivation_data.groupby('Crop_Year')[['Production', 'Area']].sum()

    a = input("Enter the name of the second state : ")
    c = input("Enter the name of the second district : ")
    b = input("Enter the name of the crop for the second district : ")
    cultivation_data1 = crops_prod_data[['State_Name', 'District_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    cultivation_data1 = cultivation_data1.groupby('State_Name').get_group(a)
    cultivation_data1 = cultivation_data1.groupby('District_Name').get_group(c)
    cultivation_data1 = cultivation_data1.groupby('Crop').get_group(b)
    cultivation_data1 = cultivation_data1.groupby('Crop_Year')[['Production', 'Area']].sum()

    # Give each district's columns distinct names, then align the two tables on Crop_Year
    cultivation_data = cultivation_data.rename(columns={'Production':'Production_of_district1', 'Area':'Area_of_district1'})
    cultivation_data1 = cultivation_data1.rename(columns={'Production':'Production_of_district2', 'Area':'Area_of_district2'})
    df12 = pd.concat([cultivation_data, cultivation_data1], axis=1)

    print(df12)
    ax = df12.plot(kind='bar', figsize=(20,20), color=['darkorange', 'forestgreen', 'darkgoldenrod', 'limegreen'], grid=True)
    print("COMPARISON OF PRODUCTION")
    ax = df12.plot(y=['Production_of_district1', 'Production_of_district2'], figsize=(20,10), color=['darkorange', 'darkgoldenrod'], grid=True, linestyle='solid', marker='o', alpha=0.8, markersize=8)
    print("COMPARISON OF AREA")
    ax = df12.plot(y=['Area_of_district1', 'Area_of_district2'], figsize=(20,10), color=['forestgreen', 'limegreen'], grid=True, linestyle='solid', marker='o', alpha=0.8, markersize=8)

def production_of_state():
    # Top ten states by production of one crop in one year, shown as pie charts
    x = input("Enter the crop year : ")
    y = input("Enter the name of the crop : ")

    dd1 = crops_prod_data[['State_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    # Use the values entered above (Crop_Year was converted to str during preprocessing)
    dd1 = dd1.groupby('Crop_Year').get_group(x)
    dd1 = dd1.groupby('Crop').get_group(y)
    dd1 = dd1.groupby('State_Name')[['Production', 'Area']].sum()
    dd1 = dd1.sort_values(by='Production', ascending=False)
    print(dd1)
    print("Pie chart plot of the data for top ten states")
    dd1[:10].plot(kind='pie', y='Production', figsize=(10,10), autopct='%1.1f%%')
    dd1[:10].plot(kind='pie', y='Area', figsize=(10,10), autopct='%1.1f%%')

def production_of_district():
    # Top ten districts of a state by production of one crop in one year, shown as pie charts
    x = input("Enter the crop year : ")
    y = input("Enter the name of the crop : ")
    z = input("Enter the name of the state : ")

    dd1 = crops_prod_data[['State_Name', 'District_Name', 'Crop_Year', 'Crop', 'Area', 'Production']]

    dd1 = dd1.groupby('Crop_Year').get_group(x)
    dd1 = dd1.groupby('Crop').get_group(y)
    dd1 = dd1.groupby('State_Name').get_group(z)
    dd1 = dd1.groupby('District_Name')[['Production', 'Area']].sum()
    dd1 = dd1.sort_values(by='Production', ascending=False)
    print(dd1)
    print("Pie chart plot of the data for top ten districts")
    dd1[:10].plot(kind='pie', y='Production', figsize=(10,10), autopct='%1.1f%%')
    dd1[:10].plot(kind='pie', y='Area', figsize=(10,10), autopct='%1.1f%%')

print("CROPS PRODUCTION STATISTICS OF INDIA")
print("")
print("")
print("")
print("1. India's overall crop production")
print("2. Year-wise production and area of a crop for a particular state")
print("3. Year-wise production and area of a crop for a particular district")
print("4. Season-wise production and area of a crop")
print("5. Comparing production of a specific crop for two different states")
print("6. Comparing production of a specific crop for two different districts")
print("7. Top ten states and their production and area for a crop in pie charts")
print("8. Top ten districts of a state and their production and area for a crop")
option=input("Choose any one of the options : ")

if option=='1':
    overall()
elif option=='2':
    statewise()
elif option=='3':
    districtwise()
elif option=='4':
    seasonwise()
elif option=='5':
    comparingstate()
elif option=='6':
    comparingdistrict()
elif option=='7':
    production_of_state()
elif option=='8':
    production_of_district()
else:
    print("Invalid input")
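The functions above narrow the data by chaining `groupby(...).get_group(...)`, which raises a `KeyError` when the typed name does not occur in the data. One possible alternative is to filter with boolean masks and report an empty result instead. The sketch below assumes the same column names; `yearly_totals` is a hypothetical helper, not part of the original notebook, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows with the dataset's column names, for illustration only.
sample = pd.DataFrame({
    "State_Name": ["Assam", "Assam", "Assam", "Kerala"],
    "Crop_Year": ["2010", "2010", "2011", "2010"],
    "Crop": ["Rice", "Rice", "Rice", "Coconut"],
    "Area": [100.0, 80.0, 120.0, 50.0],
    "Production": [200.0, 160.0, 260.0, 500.0],
})

def yearly_totals(df, **filters):
    """Keep rows where every given column equals its value, then sum per year.

    Returns None when no row matches, so the caller can print a message
    instead of handling a KeyError.
    """
    subset = df
    for column, value in filters.items():
        subset = subset[subset[column] == value]
    if subset.empty:
        return None
    return subset.groupby("Crop_Year")[["Production", "Area"]].sum()

assam_rice = yearly_totals(sample, State_Name="Assam", Crop="Rice")
print(assam_rice)

no_match = yearly_totals(sample, State_Name="Atlantis", Crop="Rice")
print("No data for that selection" if no_match is None else no_match)
```

Because the filters are keyword arguments, the same helper covers the state-wise, district-wise and season-wise cases, for example `yearly_totals(df, Season="Kharif", Crop="Rice")`.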


Tuesday, 11 December 2018
