
Posts

Showing posts from June, 2018

Understanding, testing, and using regex

Understanding regex:
https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

Testing:
https://regex101.com/

Using:
https://stackoverflow.com/questions/40458087/python-extract-text-from-string?rq=1
https://stackoverflow.com/questions/4666973/how-to-extract-a-substring-from-inside-a-string-in-python
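As a quick illustration of the "using" links, here is a minimal sketch of pulling a substring out of a string with Python's re module. The helper name extract_between and the sample string are made up for this example.

```python
import re


def extract_between(text, start, end):
    """Return the shortest text found between start and end, or None."""
    # re.escape lets the delimiters contain regex metacharacters such as "[".
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical sample input, not taken from the linked questions.
sample = "order id: [A-1042] shipped"
print(extract_between(sample, "[", "]"))
```

The non-greedy (.*?) stops at the first closing delimiter, which is usually what you want when the delimiter can appear more than once.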

Scrape using Scrapy and Splash and execute inner JavaScript (script tags)

import pickle

import scrapy


class MySpider(scrapy.Spider):
    name = "youtubesc"
    # The target page is requested through the local Splash render.html endpoint.
    start_urls = [
        "http://localhost:8050/render.html?url=https://www.youtube.com/channel/UCv1Ybb65DkQmokXqfJn0Eig/channels"
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse)

    def parse(self, response):
        self.log("this program just visited " + response.url)
        print("response")
        print(response.text)
        # print(response.css('a.ux-thumb-wrap.yt-uix-sessionlink .spf-link').extract())
        filename = "pp.html"
        # Note: pickle.dump writes a pickled bytes object, not raw HTML.
        # Use f.write(response.body) instead to save plain HTML.
        with open(filename, 'wb') as f:
            pickle.dump(response.body, f)
        # yield {
        #     'author_name': response.css('small.author::text').extract_first()
        # }

We go through a local Splash instance (localhost:8050) because the normal methods explained on the website were not working.
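The Splash URL above is written by hand. A small sketch of building it instead, so that the target URL is properly percent-encoded, might look like this. The helper name splash_render_url is an assumption for illustration; render.html and its url and wait arguments are part of the Splash HTTP API.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=None):
    """Build a Splash render.html URL for target_url (hypothetical helper)."""
    params = {"url": target_url}
    if wait is not None:
        # Seconds Splash should wait after page load so scripts can run.
        params["wait"] = wait
    return splash_base + "/render.html?" + urlencode(params)


print(splash_render_url("https://example.com/a", wait=2))
```

Encoding matters when the target URL has its own query string; without it, the target's parameters would be read as parameters to Splash.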

Set up a new Scrapy project from the CLI

scrapy startproject first_scrapy

https://www.tutorialspoint.com/scrapy/scrapy_create_project.htm

Sample program:

import scrapy


class firstSpider(scrapy.Spider):
    name = "first"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # Use the last path segment of the URL as the file name.
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

To run the program, go to the project directory and type "scrapy crawl <name>", where <name> is the value of name in the spider. Here it is 'first', since name = "first".
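The file name logic in parse can be checked on its own, without running a crawl. This sketch wraps the same expression in a function (the name filename_for is made up here) and applies it to the two start URLs.

```python
def filename_for(url):
    """Mirror the spider's naming rule: second-to-last '/'-separated piece plus .html."""
    # The URLs end with "/", so the last piece is an empty string
    # and the second-to-last piece is the final path segment.
    return url.split("/")[-2] + ".html"


urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
]
for url in urls:
    print(filename_for(url))
```

Note that the rule depends on the trailing slash: for a URL without one, index -2 would pick the parent segment instead.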

Elasticsearch: multi_match query and its equivalent bool query

https://stackoverflow.com/questions/25537322/find-out-which-fields-matched-in-a-multi-match-query

There is a more exact way to find out which field matched the query. Highlighting is a post-processing step, so it is not accurate for this purpose. Use named queries instead of multi_match. For example, take this query:

{
  "multi_match": {
    "query": "query phrase here",
    "fields": [ "name", "tag", "categorys" ],
    "operator": "AND"
  }
}

Translate it into a bool query with named clauses:

"should": [
  { "match": { "name": { "query": "query phrase here", "_name": "name_field" } } },
  { "match": { "tag": { "query": "
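The translation can be sketched as a small Python helper that builds one named match clause per field. The function names and the "<field>_field" naming scheme are assumptions based on the "name_field" example; the _name parameter and the matched_queries list on each hit are standard Elasticsearch features.

```python
def named_should_query(phrase, fields, operator="and"):
    """Build a bool/should query with one named match clause per field."""
    should = [
        {
            "match": {
                field: {
                    "query": phrase,
                    "operator": operator,
                    # The name is echoed back in each hit's matched_queries.
                    "_name": field + "_field",
                }
            }
        }
        for field in fields
    ]
    return {"query": {"bool": {"should": should}}}


def matched_fields(hit):
    """Return the clause names that matched for one search hit."""
    return hit.get("matched_queries", [])


body = named_should_query("query phrase here", ["name", "tag", "categorys"])
print(body)
```

Each hit in the response then lists the names of the clauses it satisfied, which tells you which field matched without relying on highlighting.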

Elastic Search highlight matched Text

GET testindividualvideos/_search
{
  "query": {
    "match": {
      "dialog": {
        "query": "wise",
        "analyzer": "synonyms"
      }
    }
  },
  "highlight": {
    "fields": {
      "dialog": {}
    }
  }
}

The empty object for "dialog" under "highlight" highlights all the matches in the dialog field.
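By default Elasticsearch wraps each highlighted match in <em> tags. A minimal sketch of pulling the matched terms out of a hit follows; the sample hit is invented for illustration and is not real output from the index above.

```python
import re

# Default highlighter tags; adjust if pre_tags/post_tags are customised.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field):
    """Collect the text inside <em>...</em> for one field of a search hit."""
    fragments = hit.get("highlight", {}).get(field, [])
    terms = []
    for fragment in fragments:
        terms.extend(EM_TAG.findall(fragment))
    return terms


# Hypothetical hit, shaped like one entry of hits.hits in a search response.
sample_hit = {
    "highlight": {
        "dialog": ["a <em>wise</em> man", "be <em>smart</em> and <em>wise</em>"]
    }
}
print(highlighted_terms(sample_hit, "dialog"))
```

With a synonym analyzer, this is a handy way to see which synonym actually matched for a given query word.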

Adding wordnet synonym for elasticSearch

The settings for the WordNet synonym filter:

PUT /individualvideos
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms": {
          "filter": [
            "lowercase",
            "synonym_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      }
    }
  }
}

Query:

GET allvideos/_search
{
  "query": {
    "match": {
      "description": {
        "query": "tell me about relationship",
        "analyzer": "synonyms"
      }
    }
  }
}
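To get a feel for what the wn_s.pl file contributes, here is a rough sketch that groups words by synset. It assumes the WordNet prolog line format s(synset_id, w_num, 'word', ss_type, sense_number, tag_count). and uses made-up sample lines with fake synset ids; it is not how Elasticsearch itself loads the file.

```python
import re
from collections import defaultdict

# One s(...) fact per line; a quote inside a word is written as two quotes.
LINE = re.compile(r"^s\((\d+),\d+,'(.*)',[a-z],\d+,\d+\)\.$")


def synonym_groups(lines):
    """Map synset id -> list of words, keeping only synsets with 2+ words."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            word = m.group(2).replace("''", "'")
            groups[m.group(1)].append(word)
    return {sid: words for sid, words in groups.items() if len(words) > 1}


# Hypothetical lines in the assumed format (ids are not real WordNet ids).
sample_lines = [
    "s(900000001,1,'wise',a,1,0).",
    "s(900000001,2,'sage',a,1,0).",
    "s(900000002,1,'lonely',a,1,0).",
]
print(synonym_groups(sample_lines))
```

Words that share a synset id are treated as synonyms of each other, which is why a query for "wise" can match documents containing a different word.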

elastic search all queries

PUT twitter/_doc/1
{
  "user": "kimchy",
  "message": "trying out Elasticsearch having running all the way goes down sizing are is the"
}

PUT /my_index1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my": {
          "type": "standard",
          "stopwords": [ "is", "having" ]
        }
      }
    }
  }
}

GET /_search?q=having

GET /_analyze
{
  "analyzer": "standard",
  "text": "trying out Elasticsearch having running all the way goes down sizing are is the"
}

GET /_analyze?tokenizer=whitespace
{"You're the 1st runner home!"}

POST _analyze
{
  "analyzer": "my_analyzer",
  "text": "The quick brown fox."
}

PUT /my_index
{
  "mappings": {
    "blog": {
      "properti
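As a rough mental model of what a standard analyzer with custom stopwords produces, the sketch below lowercases the text, splits it into word tokens, and drops the stopwords. It is only an approximation written for illustration (the real tokenizer follows Unicode segmentation rules and handles cases like apostrophes differently), and the function name is made up.

```python
import re


def approx_standard_analyze(text, stopwords=()):
    """Approximate: lowercase, split on non-word characters, drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    stop = set(stopwords)
    return [token for token in tokens if token not in stop]


message = "trying out Elasticsearch having running all the way goes down sizing are is the"
# Same stopword list as the "my" analyzer defined for my_index1.
print(approx_standard_analyze(message, stopwords=["is", "having"]))
```

This shows why a search for "having" finds nothing in an index that uses the custom analyzer: the token is removed before it is ever indexed.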

Elasticsearch query for giving synonyms a lower score in search

The first sample working query; it will be modified as per the use case.

PUT my_index4
{
  "settings": {
    "analysis": {
      "filter": {
        "arabic_stop": {
          "type": "stop",
          "stopwords": "_arabic_"
        },
        "arabic_keywords": {
          "type": "keyword_marker",
          "keywords": ["مثال"]
        },
        "arabic_stemmer": {
          "type": "stemmer",
          "language": "arabic"
        }
      },
      "analyzer": {
        "stemming_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "arabic_stop",
            "arabic_normalization",
            "arabic_keywords",

Transliteration with the Google API in Node.js (Hindi example)

var fs = require('fs');
var googleTransliterate = require('google-transliterate');

// Transliterate English text ('en') to Hindi ('hi').
googleTransliterate.transliterate('what is this buddha', 'en', 'hi', function(err, transliteration) {
  console.log(transliteration);
  console.log(transliteration[0]['hws']);
  // Save the result for the first word to a file.
  fs.writeFile("./pk.txt", transliteration[0]['hws'], function(err) {
    if (err) {
      return console.log(err);
    }
    console.log("The file was saved!");
  });
});

Run Python from inside Node.js

function callName(req, res) {
    // Use the spawn method from the child_process module
    // and assign it to the variable spawn.
    var spawn = require("child_process").spawn;

    // Parameters passed to spawn:
    // 1. the type of script
    // 2. a list containing the path of the script
    //    and the arguments for the script

    // E.g.: http://localhost:3000/name?firstname=Mike&lastname=Will
    // so first name = Mike and last name = Will.
    // Named pythonProcess to avoid shadowing Node's global process object.
    var pythonProcess = spawn('python', ["./hello.py",
                                         req.query.firstname,
                                         req.query.lastname]);

    // Take the stdout data from the script, which was executed
    // with the arguments, and send it to the res object.
    pythonProcess.stdout.on('data', function(data) {
        res.send(data.toString());
    });
}

https://www.geeksforgeeks.org/run-python-script-node-js-us
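The Node.js side expects a hello.py that reads two command-line arguments and prints to stdout. The excerpt does not show that script, so the following is only a guess at a minimal version; the greeting text and the greet function are assumptions.

```python
import sys


def greet(first_name, last_name):
    """Build the line that the Node.js handler will forward to the client."""
    return "Hello " + first_name + " " + last_name


# spawn passes ["./hello.py", firstname, lastname], so the two names
# arrive as sys.argv[1] and sys.argv[2].
if __name__ == "__main__" and len(sys.argv) >= 3:
    print(greet(sys.argv[1], sys.argv[2]))
    # Flush so the data event in Node.js fires promptly.
    sys.stdout.flush()
```

Running python hello.py Mike Will from a shell is a quick way to check the script before wiring it into the route.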