
scrape using scrapy and splash and execute inner JavaScript / script tags

import scrapy
import pickle


class MySpider(scrapy.Spider):
    name = "youtubesc"
    # Ask the local Splash instance to render the page (including its JavaScript)
    # and return the resulting HTML.
    start_urls = [
        "http://localhost:8050/render.html?url=https://www.youtube.com/channel/UCv1Ybb65DkQmokXqfJn0Eig/channels"
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse)

    def parse(self, response):
        self.log("this program just visited " + response.url)
        print("response")
        print(response.text)
        # print(response.css('a.ux-thumb-wrap.yt-uix-sessionlink .spf-link').extract())
        filename = "pp.html"
        # Pickle the raw rendered body so it can be inspected later.
        with open(filename, 'wb') as f:
            pickle.dump(response.body, f)

        # yield {
        #     'author_name': response.css('small.author::text').extract_first()
        # }
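A quick note on running this: Splash has to be listening on port 8050 first (the scrapinghub/splash Docker image exposes that port by default), and the spider itself can then be started with the usual scrapy crawl youtubesc (inside a project) or scrapy runspider command.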


We are calling the Splash render.html endpoint on localhost directly because the usual methods explained on the website were not working for us.
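For reference, the usual approach documented for the scrapy-splash package looks roughly like the sketch below (this assumes scrapy-splash is installed and is only a sketch, not the exact code from the docs); SplashRequest builds the render call for you instead of hard-coding the localhost URL, and the 'wait' argument gives the page's JavaScript time to run.

# settings.py -- enable the scrapy-splash middlewares
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# spider using SplashRequest instead of a hand-built render.html URL
import scrapy
from scrapy_splash import SplashRequest

class MySplashSpider(scrapy.Spider):
    name = "youtubesc_splash"

    def start_requests(self):
        url = "https://www.youtube.com/channel/UCv1Ybb65DkQmokXqfJn0Eig/channels"
        # wait lets the page's JavaScript finish before the HTML is returned
        yield SplashRequest(url, self.parse, args={'wait': 2})

    def parse(self, response):
        self.log("visited " + response.url)
        print(response.text)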
