Topic 12.3: Data Frame

DataFrame is the most widely used data structure in pandas. A Series works with one-dimensional data, whereas a DataFrame works with two-dimensional data. A DataFrame has two indexes: a row index and a column index. The most common way to create a DataFrame is from a dictionary of equal-length lists, as shown below. Further, spreadsheets and text files are read into pandas as DataFrames, which makes it a very important data structure.

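import pandas as pd

# each key becomes a column; all lists must have the same length
data = {'name': ['Google', 'Facebook', 'IBM', 'Microsoft'],
        'Date': ['19-March-2022', '18-March-2022', '17-March-2022', '16-March-2022'],
        'Share': ['70%', '15%', '2%', '13%'],
        'rank': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df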
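To make the two indexes concrete, the following is a minimal sketch that builds a smaller DataFrame and inspects its row index and column index. The data and the custom labels "a" to "d" are illustrative assumptions, not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data: a dictionary of equal-length lists.
data = {
    "name": ["Google", "Facebook", "IBM", "Microsoft"],
    "rank": [1, 2, 3, 4],
}
df = pd.DataFrame(data)

# The dictionary keys become the column index.
column_labels = list(df.columns)

# With no explicit index, pandas assigns a default row index 0..n-1.
row_labels = list(df.index)

# A custom row index can be supplied through the `index` argument.
df_labeled = pd.DataFrame(data, index=["a", "b", "c", "d"])

print(column_labels)
print(row_labels)
print(df_labeled.loc["c", "name"])
```

Supplying explicit row labels is optional; the default integer index is usually enough for data read from files.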

Indexing and Slicing

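df['name']                # select a single column

df[['name', 'rank']]      # select several columns

df.iloc[0:2]              # first two rows, by position

df.iloc[0:2, 0:2]         # first two rows and first two columns, by position

df.loc[0:2, 'Share']      # rows labeled 0 through 2 of the 'Share' column

df.loc[0, 'rank']         # a single value, by row label and column label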
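One detail worth seeing in isolation is that iloc slices exclude the end position, while loc slices include the end label. The sketch below rebuilds a reduced version of the DataFrame (an assumption made to keep the example self-contained) and compares the two.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Google", "Facebook", "IBM", "Microsoft"],
    "Share": ["70%", "15%", "2%", "13%"],
    "rank": [1, 2, 3, 4],
})

# iloc is position-based; the end of the slice is excluded (rows 0 and 1).
by_position = df.iloc[0:2]

# loc is label-based; the end of the slice is included (rows 0, 1 and 2).
by_label = df.loc[0:2, "Share"]

# A single row label plus a single column label returns one scalar value.
first_rank = df.loc[0, "rank"]

print(len(by_position))
print(len(by_label))
print(first_rank)
```

With the default integer index the labels and positions look identical, which is why the difference in slice endpoints is easy to overlook.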

Reading data from various sources

In this section, two data files are used: 'titles.csv' and 'cast.csv'. The 'titles.csv' file contains the list of movies with their release years, whereas the 'cast.csv' file has columns that store the movie title, release year, cast member, type (actor/actress), character and rating for the actor, as shown below.

cast = pd.read_csv('data/cast.csv', index_col=None)

cast

read_csv : loads the data from the CSV file.

index_col = None : no column is used as the index, i.e. the first column is treated as data.

head() : shows only the first five rows of the DataFrame.

tail() : shows only the last five rows of the DataFrame.

If an error occurs while reading the file because of its encoding, pass the encoding explicitly through the encoding parameter of read_csv.
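For the encoding case, a minimal sketch is given here. It uses an in-memory file rather than the real 'cast.csv', and both the sample row and the choice of Latin-1 are assumptions for illustration; the right encoding depends on how the actual file was saved.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file saved in Latin-1 rather than UTF-8.
raw_bytes = "title,year,name\nAmélie,2001,Audrey Tautou\n".encode("latin-1")

# Passing `encoding` tells read_csv how to decode the bytes.
cast = pd.read_csv(io.BytesIO(raw_bytes), index_col=None, encoding="latin-1")

print(cast.loc[0, "title"])
```

With a file on disk, the same encoding argument would be passed alongside the file path.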

Note: head() and tail() can be used to remind ourselves of the header and contents of the file. By default they show the first and last five rows of the DataFrame, respectively. The number of rows displayed can be changed by passing it as an argument:

cast.head(10)

cast.tail(10)
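The effect of the row-count argument can be checked on a small example. The DataFrame below is a hypothetical stand-in for the cast data, with 20 generated rows.

```python
import pandas as pd

# Hypothetical stand-in for the cast data: 20 numbered rows.
cast = pd.DataFrame({
    "title": [f"Movie {i}" for i in range(20)],
    "year": [2000 + i for i in range(20)],
})

default_head = cast.head()     # no argument: first 5 rows
first_ten = cast.head(10)      # first 10 rows
last_three = cast.tail(3)      # last 3 rows

print(len(default_head), len(first_ten), len(last_three))
print(list(last_three["year"]))
```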

DataFrame properties

dtypes: Return the dtypes in the DataFrame.

ndim: Return an int representing the number of axes / array dimensions.

 shape: Return a tuple representing the dimensionality of the DataFrame.

 size: Return an int representing the number of elements in this object.
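As a quick illustration, the properties listed above can be read from a small DataFrame. The data here is a made-up example with four rows and two columns.

```python
import pandas as pd

# Hypothetical sample data: 4 rows, 2 columns.
df = pd.DataFrame({
    "name": ["Google", "Facebook", "IBM", "Microsoft"],
    "rank": [1, 2, 3, 4],
})

print(df.dtypes)   # one dtype per column
print(df.ndim)     # a DataFrame always has 2 axes
print(df.shape)    # (number of rows, number of columns)
print(df.size)     # rows * columns
```

Note that these are attributes, not methods, so they are accessed without parentheses.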

DataFrame functions

info(): prints a concise summary of the DataFrame, including column names, non-null counts and dtypes.

cast.info()

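describe(): returns summary statistics for the numeric columns: count, mean, std (standard deviation), min, 25%, 50% (median), 75% and max. The examples that follow use a DataFrame named cars, which is assumed to have been loaded beforehand.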

cars.describe()
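The layout of the describe() result can be seen on a small example. The cars DataFrame below is a hypothetical stand-in with invented prices, not the dataset used in the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the cars data.
cars = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "body-style": ["sedan", "hatchback", "sedan", "wagon"],
})

# By default only numeric columns are summarized, so 'body-style' is skipped.
summary = cars.describe()

print(summary)
print(summary.loc["mean", "price"])
print(summary.loc["50%", "price"])
```

The result is itself a DataFrame, so individual statistics can be picked out with loc as shown.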

isnull():

isnull() flags the null values, and sum() then adds up the flags to give the number of null values in each column.

cars.isnull().sum()
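The two steps can be followed separately in this minimal sketch. The cars DataFrame and its missing entries are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cars data, with some missing entries.
cars = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0, np.nan],
    "body-style": ["sedan", "hatchback", None, "sedan"],
})

# Step 1: a DataFrame of booleans, True wherever a value is missing.
null_mask = cars.isnull()

# Step 2: True counts as 1, so summing each column counts its missing values.
null_counts = null_mask.sum()

print(null_counts)
```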

unique(): returns the unique values in a column.

cars['engine-type'].unique()

value_counts(): counts how many times each value occurs in a column.

cars['body-style'].value_counts()
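A small sketch shows how the two results differ. The column values below are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the cars data.
cars = pd.DataFrame({
    "body-style": ["sedan", "hatchback", "sedan", "wagon", "sedan"],
})

# unique() returns each distinct value once, in order of first appearance.
styles = list(cars["body-style"].unique())

# value_counts() returns a Series of counts, most frequent value first.
counts = cars["body-style"].value_counts()

print(styles)
print(counts)
```

unique() answers "which values occur", while value_counts() also answers "how often".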

Mathematical operations:

print(cars['price'].sum())
print(cars['price'].mean())
print(cars['price'].std())
print(cars['price'].median())
print(cars['price'].var())
print(cars['price'].max())
print(cars['price'].min())
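These operations can be checked by hand on a tiny column. The prices below are invented values. Note that pandas computes std() and var() as sample statistics by default (ddof=1), which differs from NumPy's default.

```python
import pandas as pd

# Hypothetical stand-in for the cars data.
cars = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})
price = cars["price"]

total = price.sum()
average = price.mean()
middle = price.median()
highest = price.max()
lowest = price.min()

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = price.var()
# Standard deviation is the square root of the variance.
spread = price.std()

print(total, average, middle, highest, lowest)
print(variance, spread)
```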
