
scrape using scrapy and splash and execute inner JavaScript / script tags

import scrapy
import pickle


class MySpider(scrapy.Spider):
    name = "youtubesc"
    # Ask the local Splash instance to render the page (including its JavaScript)
    # and return the resulting HTML.
    start_urls = [
        "http://localhost:8050/render.html?url=https://www.youtube.com/channel/UCv1Ybb65DkQmokXqfJn0Eig/channels"
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse)

    def parse(self, response):
        self.log("this program just visited " + response.url)
        print("response")
        print(response.text)
        # print(response.css('a.ux-thumb-wrap.yt-uix-sessionlink .spf-link').extract())
        filename = "pp.html"
        # Pickle the raw rendered body so it can be inspected later.
        with open(filename, 'wb') as f:
            pickle.dump(response.body, f)

        # yield {
        #     'author_name': response.css('small.author::text').extract_first()
        # }
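A quick note on running this: Splash has to be listening on port 8050 first (the scrapinghub/splash Docker image exposes that port by default), and the spider itself can then be started with the usual scrapy crawl youtubesc (inside a project) or scrapy runspider command.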


We are calling the Splash render.html endpoint on localhost directly because the usual methods explained on the website were not working for us.
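For reference, the usual approach documented for the scrapy-splash package looks roughly like the sketch below (this assumes scrapy-splash is installed and is only a sketch, not the exact code from the docs); SplashRequest builds the render call for you instead of hard-coding the localhost URL, and the 'wait' argument gives the page's JavaScript time to run.

# settings.py -- enable the scrapy-splash middlewares
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# spider using SplashRequest instead of a hand-built render.html URL
import scrapy
from scrapy_splash import SplashRequest

class MySplashSpider(scrapy.Spider):
    name = "youtubesc_splash"

    def start_requests(self):
        url = "https://www.youtube.com/channel/UCv1Ybb65DkQmokXqfJn0Eig/channels"
        # wait lets the page's JavaScript finish before the HTML is returned
        yield SplashRequest(url, self.parse, args={'wait': 2})

    def parse(self, response):
        self.log("visited " + response.url)
        print(response.text)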
