Skip to main content

setup a new scrapy project from cli

scrapy startproject first_scrapy
https://www.tutorialspoint.com/scrapy/scrapy_create_project.htm


sample program
import scrapy  

class firstSpider(scrapy.Spider): 
   name = "first" 
   allowed_domains = ["dmoz.org"] 
   
   start_urls = [ 
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/", 
      "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/" 
   ]  
   def parse(self, response): 
      filename = response.url.split("/")[-2] + '.html' 
      with open(filename, 'wb') as f: 
         f.write(response.body)

to run the program goto project directory and type 
"scrapy crawl 'name'"

name will be whatever is there in the program eg. here it is 'first'
as name = 'first'

Comments

Popular posts from this blog

Gui logging in node js and python

For node.js Use  frontail for logging https://www.npmjs.com/package/frontail For Python -- use Cutelog https://pypi.org/project/cutelog/ In NodeJs for using frontail we need to use log the logs in a file for logging logs to file , we will use winston Using winston https://www.npmjs.com/package/winston Eg. of using winstonconst { createLogger, format, transports } = require('winston'); const { combine, timestamp, label, prettyPrint } = format; const logger = createLogger({   level: 'info',   format: format.json(),   transports: [     //     // - Write to all logs with level `info` and below to `combined.log`      // - Write all logs error (and below) to `error.log`.     //     new transports.File({ filename: 'error.log', level: 'error' }),     new transports.File({ filename: 'combined.log' })   ] }); logger.log({   level: 'info',   message: 'What time is...

opening multiple ports tunnels ngrok in ubuntu

Location for the config yml file /home/example/.ngrok2/ngrok.yml content of config file authtoken: 4nq9771bPxe8ctg7LKr_2ClH7Y15Zqe4bWLWF9p tunnels: app-foo: addr: 80 proto: http host_header: app-foo.dev app-bar: addr: 80 proto: http host_header: app-bar.dev how to start ngrok with considering the config file: ngrok start --all

rename field in elastic Search

https://qiita.com/tkprof/items/e50368eb1473497a16d0 How to Rename an Elasticsearch field from columns: - {name: xxx, type: double} to columns: - {name: yyy, type: double} Pipeline API and reindex create a new Pipeline API : Rename Processor PUT _ingest/pipeline/pipeline_rename_xxx { "description" : "rename xxx", "processors" : [ { "rename": { "field": "xxx", "target_field": "yyy" } } ] } { "acknowledged": true } then reindex POST _reindex { "source": { "index": "source" }, "dest": { "index": "dest", "pipeline": "pipeline_rename_xxx" } }