whichanimenext?

Building whichanimenext? gave me practical experience creating and processing datasets, applying a model to them, and serving the results through an API.

The tool returns five similar Anime for a title entered by the user. Each result comes with a similarity match rating generated from a sample of MyAnimeList user ratings.

The backend is an API written in Python, which serves a frontend written in React.

How the recommendations work

The recommendations come from a Cosine Similarity model built on MyAnimeList user ratings: each Anime title is represented by the scores users gave it, and two titles count as similar when those scores line up. It's a somewhat primitive way of generating similarity, because it does not take factors such as genre or the tastes of similar users (as user-user recommendation would) into account.
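To make the measure concrete, here is a minimal from-scratch sketch of cosine similarity between the rating vectors of two titles. The ratings and variable names are made up for illustration, and treating an unrated title as a score of 0 is an assumption of this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A title with no ratings has no direction to compare against.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical scores from four users (0 means "not rated").
title_a = [8, 0, 9, 7]
title_b = [7, 0, 10, 6]   # rated by the same users, with similar scores
title_c = [0, 9, 0, 0]    # rated only by a user who skipped title_a

similar = cosine_similarity(title_a, title_b)
unrelated = cosine_similarity(title_a, title_c)

# Multiply by 100 to express the value as a match percentage.
print(f"a vs b: {similar * 100:.1f}%")
print(f"a vs c: {unrelated * 100:.1f}%")
```

Titles rated by the same users in a similar way score close to 100%, while titles with no raters in common score 0%. That is one reason titles with few ratings tend to produce weak matches.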

The quality of recommendations is not particularly high (for example, try searching for 'One Piece', a popular series that yields low similarity results), and it is worse for smaller titles because there is less data available for them. Another problem is that a search for one Anime might return later seasons of the same Anime, which is not particularly helpful to a user.
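One possible way to hide other seasons is a simple title-prefix check. This is only a sketch of a heuristic and not something the project does. The function name and the (title, score) tuple format are assumptions, and it would wrongly drop unrelated titles that happen to share a prefix.

```python
def drop_same_series(query_title, recommendations):
    """Remove recommendations that look like another season of the query.

    A recommendation is dropped when either title starts with the other,
    compared case-insensitively. `recommendations` is a list of
    (title, score) tuples.
    """
    query = query_title.casefold().strip()
    kept = []
    for title, score in recommendations:
        candidate = title.casefold().strip()
        same_series = candidate.startswith(query) or query.startswith(candidate)
        if not same_series:
            kept.append((title, score))
    return kept


# Hypothetical results for a search on "Naruto".
results = [("Naruto: Shippuuden", 91.2), ("Bleach", 80.5), ("naruto movie 1", 77.0)]
print(drop_same_series("Naruto", results))
```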

The datasets

This project uses two datasets: the match ratings as generated by the Cosine Similarity model, and another dataset which holds other information on Anime such as the title, years aired and thumbnail URL (for use on the frontend).

The original data was taken from these datasets, which hold information scraped from MyAnimeList.

A lot of the data wasn't relevant to the project, so after some processing I was able to reduce the size of the datasets considerably. I also removed some incomplete entries that would have caused problems later if left in.

After this was done, I created a pivot table of Anime titles and the scores that users had given them. From there I converted the pivot table to a sparse matrix for easier computation, and then applied the Cosine Similarity function from the scikit-learn library. The similarity percentages shown in the tool are the values in the resulting dataset multiplied by 100.
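The steps above can be sketched roughly as follows, using pandas, SciPy and scikit-learn. The column names (user_id, title, score), the sample ratings, and the top_matches helper are assumptions for illustration; the linked notebook remains the reference for what was actually done.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: one row per (user, title) pair.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "title": ["Naruto", "Bleach", "Naruto", "Bleach", "Naruto", "Monster", "Bleach"],
    "score": [8, 7, 9, 8, 6, 10, 5],
})

# Pivot: one row per title, one column per user, missing ratings filled with 0.
pivot = ratings.pivot_table(index="title", columns="user_id", values="score").fillna(0)

# A sparse matrix stores only the non-zero ratings.
sparse_ratings = csr_matrix(pivot.values)

# Pairwise cosine similarity between every pair of titles.
similarity = pd.DataFrame(
    cosine_similarity(sparse_ratings),
    index=pivot.index,
    columns=pivot.index,
)


def top_matches(title, n=5):
    """Return the n most similar titles with match percentages."""
    scores = similarity[title].drop(title).sort_values(ascending=False).head(n)
    return (scores * 100).round(1)


print(top_matches("Naruto"))
```

With real data the pivot table is mostly empty, which is where the sparse representation pays off.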

For the extra Anime information dataset, I wrote a small Python script which fetched data on all available anime from the Jikan API. This was then put into a dataframe and exported to a .csv file. If you want to do this yourself, please read through all the rules of using the API so that you do not get banned due to improper use.
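A throttled, page-by-page fetch could look something like the sketch below. It does not call the Jikan API itself: fetch_page is a hypothetical callable you would implement against the API, and the payload keys (items, has_next_page) and field names are assumptions, not Jikan's actual response format. It also writes the .csv with the standard csv module rather than a dataframe, to stay dependency-free.

```python
import csv
import io
import time


def fetch_all_anime(fetch_page, delay_seconds=1.0):
    """Collect rows from every page, pausing between requests.

    `fetch_page` is a hypothetical callable that takes a page number and
    returns a dict with "items" (a list of dicts) and "has_next_page".
    """
    rows = []
    page = 1
    while True:
        payload = fetch_page(page)
        rows.extend(payload["items"])
        if not payload["has_next_page"]:
            break
        page += 1
        # Wait between requests so the API's rate limits are respected.
        time.sleep(delay_seconds)
    return rows


def write_csv(rows, fieldnames, out_file):
    """Write the selected fields of each row to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Fake two-page response standing in for a real API.
fake_pages = {
    1: {"items": [{"title": "Naruto", "year": 2002, "thumbnail_url": "a.jpg"},
                  {"title": "Bleach", "year": 2004, "thumbnail_url": "b.jpg"}],
        "has_next_page": True},
    2: {"items": [{"title": "Monster", "year": 2004, "thumbnail_url": "c.jpg"}],
        "has_next_page": False},
}

rows = fetch_all_anime(lambda page: fake_pages[page], delay_seconds=0)
buffer = io.StringIO()
write_csv(rows, ["title", "year", "thumbnail_url"], buffer)
print(buffer.getvalue())
```

The delay value here is a placeholder; the right interval depends on the API's published limits.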

If you want to see in more detail how I processed and generated the ratings for the recommendation dataset, take a look at this notebook.

FastAPI

To serve the data for each set of recommendations, all I needed was a simple backend that served JSON objects at a few endpoints. For this use case, FastAPI was a perfect, lightweight choice. It took a matter of minutes to set up and serve everything at the endpoints I needed.

Having said this, there are probably security issues that I have overlooked - but for a simple project like this I did not deem it necessary to spend too much time thinking about these things (especially with no database or need for authorisation).

Frontend

Consuming the API did not really call for a framework, but I went with React because I'm always looking for opportunities to practice development with it. For styling, I went with Tailwind CSS because it is fast to style with and because this is not a project I plan to maintain for a long period of time.

For the search bar, I used the react-select library, with its options populated by Anime titles fetched from the API. Because users can only pick from existing titles, I don't need to worry too much about validating or handling free-form input. As there are so many titles to search through, the performance can be slow at times. I implemented a custom filter option for slightly better performance (unfortunately not fuzzy search):

const customFilterOption = (option, rawInput) => {
  const words = rawInput.split(" ");
  return words.reduce(
    (acc, cur) => acc && option.label.toLowerCase().includes(cur.toLowerCase()),
    true
  );
};

Credit for the filter above goes to this GitHub issue.

Otherwise, I used Zustand for some lightweight state management. If you're tired of Redux or similar, I would recommend trying it.

Deployment

I used Heroku to deploy the FastAPI backend. I'm on the free hobby tier, which means that projects go to sleep after 30 minutes of inactivity to save resources. As the frontend needs the API to be running, I counteracted this sleep behaviour by creating a cron job that pings the backend at a specific endpoint every 30 minutes.

For the frontend, I deployed on Vercel as usual and encountered no issues.


In future, I'd like to reconsider the approach for creating recommendations, and also filter out other seasons of the same Anime (to avoid the problem explained earlier).

Below are the GitHub repositories for this project: