Analyzing a top 500 songs of all time dataset in python

In this article, we look at a dataset of the top 500 songs of all time and use Python to display data from it.

Code

First of all, we import the modules we need, and then we read in Top500Songs.csv, which you can download from the link at the bottom of this article.

The next step is to look at the data: we check for null values, and then we modify a couple of columns.

The columns are title, description, appears on, artist, writers, producer, released, streak, and position.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Top500Songs.csv", sep=",", encoding="Latin-1")

print(df.head())
print(df.tail())

# check for null values in each column (print it, otherwise a script shows nothing)
print(df.isnull().sum())

# modify a few columns
df['position'] = df['position'].str[3:]    # drop the first 3 characters of the position label
df['Year'] = df['released'].str[-4:]       # keep the last 4 characters of the release date (the year)
df['streaklg'] = df['streak'].str[:-6]     # drop the last 6 characters of the streak text
print(df.head())
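To make the three string slices concrete, here is a small sketch on a handful of made-up rows. The values such as "No.1", "December 1965" and "10 weeks" are hypothetical; the exact formats in Top500Songs.csv are an assumption here. It also shows one optional extra step that is not in the original code: the slices are still strings, so `pd.to_numeric` is used to turn them into numbers.

```python
import pandas as pd

# Hypothetical rows: the real Top500Songs.csv formats are assumed, not confirmed
df = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C"],
    "position": ["No.1", "No.2", None],
    "released": ["December 1965", "July 1971", "1956"],
    "streak": ["10 weeks", "3 weeks", None],
})

# Count missing values per column
print(df.isnull().sum())

# Same slices as the article: drop a 3-character prefix,
# keep the last 4 characters, drop a 6-character suffix
df["position"] = df["position"].str[3:]
df["Year"] = df["released"].str[-4:]
df["streaklg"] = df["streak"].str[:-6]

# The slices are still strings; convert them to numbers for sorting or arithmetic
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
df["streaklg"] = pd.to_numeric(df["streaklg"], errors="coerce")
print(df[["position", "Year", "streaklg"]])
```

Missing values stay missing through the slicing, and `errors="coerce"` turns anything that is not a number into NaN instead of raising an error.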

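Now for some examples.

We show data for top songs, top artists, top writers, and top producers.

For the songs, we plot how many entries come from each release year; for the artists, writers, and producers, we print the top ten and then display a pie chart. The printed rankings look like this: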

The 1 Artist is : The Beatles
The 2 Artist is : Bob Dylan
The 3 Artist is : Elvis Presley
The 4 Artist is : The Rolling Stones
The 5 Artist is : U2
The 6 Artist is : The Beach Boys
The 7 Artist is : Led Zeppelin
The 8 Artist is : James Brown
The 9 Artist is : The Jimi Hendrix Experience
The 10 Artist is : Chuck Berry
The 1 Writer is : John Lennon, Paul McCartney
The 2 Writer is : Dylan
The 3 Writer is : Mick Jagger, Keith Richards
The 4 Writer is : Bono, the Edge, Adam Clayton, Larry Mullen Jr.
The 5 Writer is : Springsteen
The 6 Writer is : Prince
The 7 Writer is : John Fogerty
The 8 Writer is : Berry
The 9 Writer is : Mick Jones, Joe Strummer
The 10 Writer is : Wonder
The 1 Producer is : George Martin
The 2 Producer is : Wilson
The 3 Producer is : Steve Sholes
The 4 Producer is : Bob Johnston
The 5 Producer is : Jimmy Miller
The 6 Producer is : Jerry Wexler
The 7 Producer is : Leonard and Phil Chess
The 8 Producer is : Sam Phillips
The 9 Producer is : Brown
The 10 Producer is : Fogerty
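Notice that the writer ranking treats each combination of credits, such as "John Lennon, Paul McCartney", as a single writer. The sketch below, on hypothetical sample credits, contrasts that with an alternative that is not part of the original code: splitting on ", " and using `explode` to count each person separately. Credits written differently ("Dylan" vs "Bob Dylan", or "Leonard and Phil Chess") would still need manual cleanup.

```python
import pandas as pd

# Hypothetical writer credits in the same comma-separated style as the output above
df = pd.DataFrame({
    "writers": [
        "John Lennon, Paul McCartney",
        "John Lennon, Paul McCartney",
        "Dylan",
        "Mick Jagger, Keith Richards",
        "John Lennon",
    ]
})

# Grouping on the raw string treats each combination of writers as one "writer"
print(df["writers"].value_counts())

# Splitting on ", " and exploding gives one row per individual credit
individual = df["writers"].str.split(", ").explode().str.strip()
print(individual.value_counts())
```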

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Top500Songs.csv", sep=",", encoding="Latin-1")

print(df.head())
print(df.tail())

# check for null values in each column (print it, otherwise a script shows nothing)
print(df.isnull().sum())

# modify a few columns
df['position'] = df['position'].str[3:]    # drop the first 3 characters of the position label
df['Year'] = df['released'].str[-4:]       # keep the last 4 characters of the release date (the year)
df['streaklg'] = df['streak'].str[:-6]     # drop the last 6 characters of the streak text
print(df.head())

# top songs: number of songs in the list per release year
df_top = df.groupby('Year').count().reset_index()
plt.figure(figsize=(20, 10))
plt.bar(df_top['Year'], df_top['title'])
plt.xticks(rotation='vertical')
plt.xlabel('Year')
plt.ylabel('Number of songs')
plt.title('Number of Top Songs by Year')
plt.show()

# top artists: the 10 artists with the most entries in the list
df_artists = df.groupby('artist').count().sort_values(by='description', ascending=False).reset_index()
df_artists = df_artists[:10]

plt.figure(figsize=(15, 8))
plt.pie(df_artists['title'], labels=df_artists['artist'], autopct='%.2f', shadow=True)
plt.title("Top Artists")

# print the ranking, starting at 1
for rank, artist in enumerate(df_artists['artist'], start=1):
    print("The", rank, "Artist is :", artist)
plt.show()

# top writers: the 10 writer credits with the most entries in the list
df_writers = df.groupby('writers').count().sort_values(by='description', ascending=False).reset_index()
df_writers = df_writers[:10]

plt.figure(figsize=(15, 8))
plt.pie(df_writers['title'], labels=df_writers['writers'], autopct='%.2f', shadow=True)
plt.title("Top Writers")

# print the ranking, starting at 1
for rank, writer in enumerate(df_writers['writers'], start=1):
    print("The", rank, "Writer is :", writer)
plt.show()

# top producers: the 10 producer credits with the most entries in the list
df_producers = df.groupby('producer').count().sort_values(by='description', ascending=False).reset_index()
df_producers = df_producers[:10]

plt.figure(figsize=(15, 8))
plt.pie(df_producers['title'], labels=df_producers['producer'], autopct='%.2f', shadow=True)
plt.title("Top Producers")

# print the ranking, starting at 1
for rank, producer in enumerate(df_producers['producer'], start=1):
    print("The", rank, "Producer is :", producer)
plt.show()
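The `groupby(...).count()` pattern above counts the non-null values in every column and then uses one of them. A shorter alternative, not used in the original code, is `value_counts()`, which counts rows per value directly. Here is a minimal sketch on hypothetical sample rows; tied counts may come out in a different order than with the `groupby` version.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from Top500Songs.csv
df = pd.DataFrame({
    "title": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "artist": ["The Beatles", "Bob Dylan", "The Beatles",
               "Elvis Presley", "Bob Dylan", "The Beatles"],
    "released": ["March 1965", "1965", "June 1971", "May 1956", None, "1963"],
})

# Ranking: value_counts() counts rows per artist, most frequent first
top_artists = df["artist"].value_counts().head(10)
for rank, (artist, n) in enumerate(top_artists.items(), start=1):
    print("The", rank, "Artist is :", artist, f"({n} songs)")

# Per-year counts: convert the 4-character year to a number and sort by year
df["Year"] = pd.to_numeric(df["released"].str[-4:], errors="coerce")
per_year = df["Year"].value_counts().sort_index()
print(per_year)
```

Both results can go straight into the charts, for example `plt.bar(per_year.index, per_year.values)` or `plt.pie(top_artists.values, labels=top_artists.index)`. Rows without a release date are dropped from the per-year counts.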

Links

You can download the dataset and the Python example from:

https://github.com/programmershelp/maxpython/tree/main/Data%20Analysis/top500songs
