In this article, we will look at a dataset of the top 500 songs of all time and use Python to display data from it.
Code
First, we import the modules we need and then read in Top500Songs.csv, which you can download from the link at the bottom of this article.
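The next step is to look at the data: we check for null values and modify a few columns. The dataset has the columns title, description, appears on, artist, writers, producer, released, streak and position. We strip the first three characters from position, keep the last four characters of released as a new Year column, and drop the last six characters of streak to create a streaklg column.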
import pandas as pd

# Read the dataset using Latin-1 encoding
df = pd.read_csv("Top500Songs.csv", sep=",", encoding="latin-1")
print(df.head())
print(df.tail())

# Check for null values in each column (print so the result shows when run as a script)
print(df.isnull().sum())

# Modify a few columns
df['position'] = df['position'].str[3:]   # drop the first three characters of position
df['Year'] = df['released'].str[-4:]      # keep the last four characters of released (the year)
df['streaklg'] = df['streak'].str[:-6]    # drop the last six characters of streak
print(df.head())
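The slicing above assumes fixed-width text: `str[-4:]` keeps the last four characters of `released`, and `str[:-6]` drops the last six characters of `streak`. A value that does not follow that pattern, such as a hypothetical "1 week" (exactly six characters), would be sliced down to an empty string. A minimal sketch of an alternative, assuming the year is the only four-digit number in `released` and each streak starts with its number of weeks, is to pull the digits out with a regular expression using `Series.str.extract`. The sample rows are hypothetical, not taken from Top500Songs.csv.

```python
import pandas as pd

# Hypothetical sample rows; the real file's formatting may differ.
df = pd.DataFrame({
    "released": ["Dec. 1971", "July 1965", None],
    "streak": ["9 weeks", "12 weeks", "1 week"],
})

# Count missing values per column (the same check as in the article)
print(df.isnull().sum())

# Take the first run of four digits as the year; missing dates stay NaN
df["Year"] = df["released"].str.extract(r"(\d{4})", expand=False)

# Take the leading number in streak and convert it to a number
df["streaklg"] = pd.to_numeric(df["streak"].str.extract(r"^(\d+)", expand=False))

print(df)
```

Unlike fixed slicing, the regular expression does not depend on how long the surrounding text is, at the cost of assuming the digits are always present in that position.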
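Now for some examples. We will look at the number of top songs per year and at the top 10 artists, writers and producers. Each example displays a graph, and for the artists, writers and producers we also print a ranked list. Running the code prints the following rankings: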
The 1 Artist is : The Beatles
The 2 Artist is : Bob Dylan
The 3 Artist is : Elvis Presley
The 4 Artist is : The Rolling Stones
The 5 Artist is : U2
The 6 Artist is : The Beach Boys
The 7 Artist is : Led Zeppelin
The 8 Artist is : James Brown
The 9 Artist is : The Jimi Hendrix Experience
The 10 Artist is : Chuck Berry
The 1 Writer is : John Lennon, Paul McCartney
The 2 Writer is : Dylan
The 3 Writer is : Mick Jagger, Keith Richards
The 4 Writer is : Bono, the Edge, Adam Clayton, Larry Mullen Jr.
The 5 Writer is : Springsteen
The 6 Writer is : Prince
The 7 Writer is : John Fogerty
The 8 Writer is : Berry
The 9 Writer is : Mick Jones, Joe Strummer
The 10 Writer is : Wonder
The 1 Producer is : George Martin
The 2 Producer is : Wilson
The 3 Producer is : Steve Sholes
The 4 Producer is : Bob Johnston
The 5 Producer is : Jimmy Miller
The 6 Producer is : Jerry Wexler
The 7 Producer is : Leonard and Phil Chess
The 8 Producer is : Sam Phillips
The 9 Producer is : Brown
The 10 Producer is : Fogerty
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset using Latin-1 encoding
df = pd.read_csv("Top500Songs.csv", sep=",", encoding="latin-1")
print(df.head())
print(df.tail())

# Check for null values in each column
print(df.isnull().sum())

# Modify a few columns
df['position'] = df['position'].str[3:]
df['Year'] = df['released'].str[-4:]
df['streaklg'] = df['streak'].str[:-6]
print(df.head())

# Top songs: number of songs in the list per release year
df_top = df.groupby('Year').count().reset_index()
plt.figure(figsize=(20, 10))
plt.bar(df_top['Year'], df_top['title'])
plt.xticks(rotation='vertical')
plt.xlabel('Year')
plt.ylabel('Number of songs')
plt.title('Number of Top Songs by Year')
plt.show()

# Top artists: the 10 artists with the most rows (counted by non-null description)
df_artists = df.groupby('artist').count().sort_values(by='description', ascending=False).reset_index()
df_artists = df_artists[:10]
plt.figure(figsize=(15, 8))
plt.pie(df_artists['title'], labels=df_artists['artist'], autopct='%.2f', shadow=True)
plt.title("Top Artists")
for rank, artist in enumerate(df_artists['artist'], start=1):
    print("The", rank, "Artist is :", artist)
plt.show()

# Top writers
df_writers = df.groupby('writers').count().sort_values(by='description', ascending=False).reset_index()
df_writers = df_writers[:10]
plt.figure(figsize=(15, 8))
plt.pie(df_writers['title'], labels=df_writers['writers'], autopct='%.2f', shadow=True)
plt.title("Top Writers")
for rank, writer in enumerate(df_writers['writers'], start=1):
    print("The", rank, "Writer is :", writer)
plt.show()

# Top producers
df_producers = df.groupby('producer').count().sort_values(by='description', ascending=False).reset_index()
df_producers = df_producers[:10]
plt.figure(figsize=(15, 8))
plt.pie(df_producers['title'], labels=df_producers['producer'], autopct='%.2f', shadow=True)
plt.title("Top Producers")
for rank, producer in enumerate(df_producers['producer'], start=1):
    print("The", rank, "Producer is :", producer)
plt.show()
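The `groupby(...).count().sort_values(by='description', ...)` pattern above ranks each group by its number of non-null `description` values. A shorter way to build the same kind of top-10 table, sketched here as an alternative rather than the article's method, is `Series.value_counts`, which counts rows per distinct value and sorts them from most to least frequent. Note that it counts rows instead of non-null descriptions, so the totals can differ if some descriptions are missing. The helper name `top_n` and the sample rows are hypothetical, for illustration only.

```python
import pandas as pd


def top_n(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Hypothetical helper: the n most frequent values of `column` with their counts."""
    # value_counts drops missing values and sorts by count, highest first
    return df[column].value_counts().head(n)


# Hypothetical sample rows, not taken from Top500Songs.csv
songs = pd.DataFrame({
    "artist": ["The Beatles", "Bob Dylan", "The Beatles", "U2", "The Beatles", "Bob Dylan"],
    "Year": ["1965", "1965", "1967", "1987", "1969", "1966"],
})

top_artists = top_n(songs, "artist", n=2)

# Print the ranking in the same format as the article's output
for rank, artist in enumerate(top_artists.index, start=1):
    print("The", rank, "Artist is :", artist)

# Songs per year; sorting the four-digit year strings keeps them in chronological order
songs_per_year = songs["Year"].value_counts().sort_index()
print(songs_per_year)
```

With this helper, the writers and producers rankings become `top_n(df, "writers")` and `top_n(df, "producer")`, and the resulting Series can be passed to `plt.pie` or `plt.bar` directly.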
Links
You can download the dataset and the Python example from:
https://github.com/programmershelp/maxpython/tree/main/Data%20Analysis/top500songs