Improving an existing manga reader website by scraping and modifying the DOM with Node.js
I stumbled across a manga reader website called readsnk.com. While reading manga on it, I ran into a slightly buggy user experience: when I open the website, it loads all the images simultaneously instead of displaying them sequentially.
Since the images are not loaded sequentially, the 30th page can finish loading first and the first page last. Because of this problem, I decided to make a clone of the website.
My approach is to scrape the HTML with an HTTP request, modify it by adding native browser lazy loading to the images, and serve the modified HTML to the client (the browser). The stack I use is Node.js, Express, and Axios.
TL;DR: comparing the improved manga reader with the original
Scraping the page using Axios
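As a rough sketch of this step, assuming the clone simply forwards each request path to the original site: the helper names `fetchPage`, `makeHandler`, and `startServer`, the port, and the error message are all hypothetical and not taken from the actual app. The HTTP client is passed in as a parameter so the handler can be tried with a fake client; in real use it would be axios.

```javascript
// Fetch the raw HTML of one page. `client` is anything with an axios-style
// get(url, config) that resolves to { data }; in real use, pass axios itself.
async function fetchPage(client, url) {
  const response = await client.get(url, { responseType: "text" });
  return response.data;
}

// Build an Express-style middleware that forwards every request to `origin`,
// runs the fetched HTML through `transform`, and sends the result to the browser.
function makeHandler(client, origin, transform) {
  return async (req, res) => {
    try {
      const html = await fetchPage(client, origin + req.originalUrl);
      res.send(transform(html));
    } catch (err) {
      // Fetching or transforming the page failed.
      res.status(502).send("Could not fetch the original page");
    }
  };
}

// Hypothetical wiring (assumption); needs `npm install express axios`.
function startServer(port = 3000) {
  const express = require("express");
  const axios = require("axios");
  const app = express();
  // Identity transform as a placeholder for the HTML modifications.
  app.use(makeHandler(axios, "https://ww7.readsnk.com", (html) => html));
  return app.listen(port);
}
```

Calling `startServer()` would start the clone on port 3000. The `transform` argument is where the regex replacements from the "Modifying HTML DOM using regex" section would plug in.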
Modifying HTML DOM using regex
You could use a DOM parser like cheerio in Node.js, but that is overkill for a small project like this, as you can see in the image above. I made 4 changes to the DOM.
.replace(/<img/g, "<img loading=\"lazy\" width=\"1600\" height=\"1124\"")
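This appends loading="lazy" to every <img> tag. Note that loading="lazy" will not have the intended effect unless you also set width and height: without dimensions the browser cannot reserve space for images that have not loaded yet, so many of them can sit in or near the viewport at once and get fetched anyway. With these attributes in place, the browser only loads the images in or near your viewport.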
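To make the replacement easier to try in isolation, here is a minimal sketch that wraps it in a function. The name `addLazyLoading` and the idea of passing the dimensions as parameters are assumptions for illustration; the defaults are the 1600 and 1124 values used above.

```javascript
// Add native lazy loading plus explicit dimensions to every <img> tag.
// Uses the same plain regex as above, so it assumes the page's <img> tags
// do not already carry loading, width, or height attributes.
function addLazyLoading(html, width = 1600, height = 1124) {
  return html.replace(
    /<img/g,
    `<img loading="lazy" width="${width}" height="${height}"`
  );
}

// Example: both images get the new attributes in front of their src.
console.log(addLazyLoading('<p><img src="1.jpg"><img src="2.jpg"></p>'));
```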
.replace(/href="\/css/g, 'href="https://ww7.readsnk.com/css').replace(/src="\/js/g, 'src="https://ww7.readsnk.com/js')
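This rewrites the href and src attributes that point to /css and /js from relative paths to absolute paths, so the stylesheets and scripts still load from the original site when the page is served by the clone.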
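The path rewriting can be sketched as a small function as well. The name `absolutizeAssets` and the `origin` parameter are hypothetical; the default is the ww7.readsnk.com host used above. Note that each attribute keeps its own name here: href stays href and src stays src.

```javascript
// Point stylesheet and script references back at the original site.
// Only paths starting with /css and /js are touched, as in the article.
function absolutizeAssets(html, origin = "https://ww7.readsnk.com") {
  return html
    .replace(/href="\/css/g, `href="${origin}/css`)
    .replace(/src="\/js/g, `src="${origin}/js`);
}

// Example: a <link> and a <script> with relative paths.
console.log(
  absolutizeAssets('<link href="/css/app.css"><script src="/js/app.js"></script>')
);
```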
I also added Redis caching to the app; I’ll write about it in the next post.