
Posts

Showing posts from December, 2018

scrapy notes

Use in the shell

1. scrapy shell url

CSS selector:
    response.css('small.author')

The matched HTML will look something like this:
    <small class="author">william</small>

To get the actual HTML data:
    response.css('small.author').extract()

To drop the surrounding HTML tag and keep only the text:
    response.css('small.author::text').extract()

To get the result as a string rather than a list, select the first element:
    response.css('small.author::text')[0].extract()
or
    response.css('small.author::text').extract_first()

To load a new website while already in the shell, instead of running scrapy shell website again, use:
    fetch(url)

To create a new request:
    scrapy.Request(url)

python commands pandas

Series

You can convert a list, NumPy array, or dictionary to a Series:

    import numpy as np
    import pandas as pd
    from numpy.random import randn

    labels = ['a','b','c']
    my_list = [10,20,30]
    arr = np.array([10,20,30])
    d = {'a':10,'b':20,'c':30}

    pd.Series(data=my_list)
    pd.Series(data=my_list,index=labels)
    pd.Series(d)
    # A Series can even hold functions, though it is very unlikely that we will use it that way

Using an index

    ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])

    USA        1
    Germany    2
    USSR       3
    Japan      4
    dtype: int64

    ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])
    ser1 + ser2

    Germany    4.0
    Italy      NaN
    Japan      8.0
    USA        2.0
    USSR       NaN

Data Frames

We can think of a DataFrame as a bunch of Series objects that share the same index.

    df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

    W X Y Z
    A 2.706850 0.6...
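The NaN entries in ser1 + ser2 come from index alignment: a label present in only one Series has no partner to add to. The sketch below reproduces that result and, as an addition not in the original notes, shows `Series.add` with `fill_value=0` as one possible way to treat a missing label as zero.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Addition aligns on index labels; labels found in only one Series give NaN
total = ser1 + ser2

# Assumption for illustration: treat a missing label as 0 instead of NaN
filled = ser1.add(ser2, fill_value=0)

print(total)
print(filled)
```

The result is float rather than int because NaN cannot be stored in an integer Series.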

python useful commands numpy

Create an array of the integers from 10 to 50:
    np.arange(10,51)

Create an array of all the even integers from 10 to 50:
    np.arange(10,51,2)

Create a 3x3 matrix with values ranging from 0 to 8:
    np.arange(9).reshape(3,3)

Create a 3x3 identity matrix:
    np.eye(3)

Use NumPy to generate a random number between 0 and 1:
    np.random.rand(1)

Use NumPy to generate an array of 25 random numbers sampled from a standard normal distribution:
    np.random.randn(25)

Create an array of 20 linearly spaced points between 0 and 1:
    np.linspace(0,1,20)

In the following, mat is a 2D array.

Get the sum of all the values in mat:
    mat.sum()

Get the standard deviation of the values in mat:
    mat.std()

Get the sum of all the columns in mat:
    mat.sum(axis=0)
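The commands above leave mat undefined, so here is a small runnable sketch that assumes mat is the 3x3 matrix built with np.arange(9).reshape(3,3). The row sum with axis=1 is added only for contrast with the column sum.

```python
import numpy as np

# Assumption: use the 3x3 matrix of 0..8 from the notes as mat
mat = np.arange(9).reshape(3, 3)

total = mat.sum()           # sum of every value in the matrix
col_sums = mat.sum(axis=0)  # one sum per column (collapses the rows)
row_sums = mat.sum(axis=1)  # one sum per row (collapses the columns)
std = mat.std()             # population standard deviation of all values

evens = np.arange(10, 51, 2)     # stop value 51 is exclusive, so 50 is included
points = np.linspace(0, 1, 20)   # both endpoints are included

print(total, col_sums, row_sums, std)
print(evens)
print(points)
```

Note the difference in endpoint handling: np.arange excludes its stop value, while np.linspace includes it by default.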

mapping and analyzers code in elasticsearch

GET physicsscript/_search
{
  "query": {
    "match": {
      "title": "energy"
    }
  }
}

#_______________________
# test mapping in small

PUT twitter/_doc/2
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "running very fast loving everything rocks pea"
}

GET /twitter/_search?q='try'

GET /twitter/_mapping/

______________________________

PUT /twitter6
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "l...