One of the games in my top 10 is The Elder Scrolls V: Skyrim, so it felt nostalgic to work on a project related to it. Recently I was thinking about a new project and an idea came to me, but I needed a dataset. I searched the internet and found a few, but none of them covered armor.
I found this table of armors on The Elder Scrolls Wiki, and I needed to extract the information it contains. In this simple tutorial, you will extract the table, perform some preprocessing, and finally publish the result to Kaggle Datasets.
Extracting the Tabular Dataset
To perform this step you will need the Pandas library. Pandas offers a huge set of features for data extraction, processing, and much more. The Wiki page contains several tables with armor data, so I used pandas.read_html.
With pd.read_html I can:
- Read an HTML page by passing the link as a string to the function.
- Filter tables with a regular expression (the match parameter).
- Extract links from cells (the extract_links parameter).
Those are just the options I use here; I recommend checking the documentation to discover the others.
Finally, it returns a list of every table it found, each one already a DataFrame.
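As a quick illustration of the two optional parameters above, here is a minimal sketch; the "Light Armor" pattern is just an assumed example, and extract_links requires pandas 1.5 or newer.
import pandas as pd
# Only return tables whose text matches this regular expression
light_tables = pd.read_html("https://elderscrolls.fandom.com/wiki/Armor_(Skyrim)", match="Light Armor")
# Return each cell as a (text, href) tuple so link targets are kept
linked_tables = pd.read_html("https://elderscrolls.fandom.com/wiki/Armor_(Skyrim)", extract_links="body")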
# Import the Pandas library
import pandas as pd
# Read every table from the HTML page
tables_on_page = pd.read_html("https://elderscrolls.fandom.com/wiki/Armor_(Skyrim)")
# Number of tables found
len(tables_on_page) # output: 16
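Since read_html only returns the tables positionally, a short loop (a sketch, not from the original notebook) helps map each index to its armor table:
# Print each table's index and its first few column names
for i, table in enumerate(tables_on_page):
    print(i, list(table.columns)[:4])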
Each table corresponds to a type of armor, so I named them after the headings I found on the website. I won't include every type here, to avoid repeating the same step over and over, but at the end I will leave a link to the project so you can check it out and test it yourself.
headgear_light_armor = tables_on_page[4]
headgear_light_armor
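The remaining tables are assigned the same way. As a sketch, it looks like the lines below; the indices here are assumptions you would confirm by inspecting the page, as done above.
# Hypothetical indices; verify each against the page before using them
headgear_heavy_armor = tables_on_page[3]
boots_light_armor = tables_on_page[6]
# ...and so on for the other armor types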
Preprocessing
I did this step individually for each table, adding a new column with the armor type as a class label. At the end I merged the tables into a single one; since the columns were identical, pd.concat was all that was needed. Then I renamed some columns: because of the link functionality in those cells, read_html could not return their text, leaving them unnamed. I also dropped the Item ID column, since I didn't need it for what I'm building. Finally, I saved the file as a .csv.
# New column: label every row with its armor type
headgear_light_armor['type_armor'] = 'Headgear Light Armor'
headgear_light_armor
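Since this labeling repeats for every table, a more compact alternative (just a sketch, assuming the eight DataFrames used in the pd.concat below already exist) is to map each table to its type and loop:
# Map each DataFrame to its armor type and label them in one pass
armor_tables = {
    'Gauntlets Heavy Armor': gauntlets_heavy_armor,
    'Gauntlets Light Armor': gauntlets_light_armor,
    'Boots Heavy Armor': boots_heavy_armor,
    'Boots Light Armor': boots_light_armor,
    'Cuirasses Heavy Armor': cuirasses_heavy_armor,
    'Cuirasses Light Armor': cuirasses_light_armor,
    'Headgear Heavy Armor': headgear_heavy_armor,
    'Headgear Light Armor': headgear_light_armor,
}
for armor_type, table in armor_tables.items():
    table['type_armor'] = armor_type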
# Unify the datasets into a single DataFrame
df = pd.concat([gauntlets_heavy_armor, gauntlets_light_armor, boots_heavy_armor, boots_light_armor, cuirasses_heavy_armor, cuirasses_light_armor, headgear_heavy_armor, headgear_light_armor], axis=0)
# Rename the columns that read_html returned unnamed because of the links
df.rename(columns={'Unnamed: 1':'Armor','Unnamed: 2':'Encumbrance','Unnamed: 3':'Gold'}, inplace=True)
# Drop the Item ID column, which is not needed here
df.drop(['Item ID'], axis=1, inplace=True)
# Save File
df.to_csv('dataset_armor_skyrim_1.csv', index=False)
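As a quick sanity check (a sketch, not part of the original notebook), you can reload the file and confirm that every armor type made it into the final dataset:
# Reload the saved CSV and inspect its shape and class balance
df_check = pd.read_csv('dataset_armor_skyrim_1.csv')
print(df_check.shape)
print(df_check['type_armor'].value_counts())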
Kaggle Datasets
When I was developing this notebook, I wasn't thinking about publishing it on Kaggle; I only wanted the resulting dataset for my own project. But then I thought: why not share it? I could write a post explaining the process and publish the dataset for other users to use in their notebooks, making a small contribution. And here we are, so let's publish this dataset.
If you don't know much about Kaggle, there's a post of mine where I talk a little about this incredible tool.
After you have created your account on Kaggle, you will land on the dashboard. Navigate to the Datasets section.
Then click on New Dataset.
Now upload the dataset file you created in the notebook. You can drag it in or open the file browser to add it.
Next, add a title to your dataset and, if needed, add more files. Choose between the public and private options and, finally, create the dataset. At this stage it will upload your file and then direct you to the dataset's page.
Finally, your dataset will be available on Kaggle for you to use in your projects or, if the dataset is public, for other users to use as well.
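If you prefer to skip the web interface, the official kaggle Python package can publish a dataset programmatically. This is only a sketch, assuming the package is installed, your API token is configured, and the CSV lives in a folder (the 'skyrim_armor' name is hypothetical) alongside the dataset-metadata.json that dataset_initialize generates as a template.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the API token from ~/.kaggle/kaggle.json

# Create a dataset-metadata.json template in the folder, then edit it
api.dataset_initialize(folder='skyrim_armor')

# Upload the folder (CSV + metadata) as a new private dataset
api.dataset_create_new(folder='skyrim_armor', public=False)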
The Elder Scrolls Skyrim - Armor
Repository
You can find the project via the link below.
About the author:
A little more about me...
I hold a Bachelor's degree in Information Systems; in college I came into contact with many different technologies. Along the way, I took an Artificial Intelligence course, where I had my first contact with machine learning and Python, and learning about this area became my passion. Today I work with machine learning and deep learning, developing communication software. Along the way, I also created a blog where I write posts about the subjects I'm studying and share them to help other users.
I'm currently learning TensorFlow and Computer Vision.
Curiosity: I love coffee