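While tidying up my files I came across some small things I built a while back, so I'm revisiting them as a refresher; whatever it is, you forget it easily once you stop using it for long enough. This one is a small crawler for tracking product prices. It scrapes data from jd, since that's where I do most of my online shopping…
Here's the code.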
    import json
    import os
    import re
    import time

    import requests


    def fun():
        userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
        itemId = '7234518'
        postUrl = "http://item.jd.com/" + itemId + ".html"
        priceUrl = "http://p.3.cn/prices/mgets?skuIds=J_" + itemId
        # Send the userAgent variable itself, not the literal string "userAgent".
        header = {"User-Agent": userAgent}

        resName = requests.get(postUrl, headers=header)
        resPrice = requests.get(priceUrl, headers=header)

        # The product name is taken from the "sku-name" element of the item page.
        htName = resName.text
        reName = re.search('"sku-name">(.*?)</', htName, re.S)
        try:
            name = reName.group(1).strip()
        except AttributeError:
            # re.search returned None: the page has no "sku-name" element.
            print("no have this item")
            return

        # The price endpoint returns a JSON list; field 'p' holds the price.
        htPrice = json.loads(resPrice.text)
        price = str(htPrice[0]['p'])

        itime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())

        print('[time]\n')
        print(itime)
        print('-------------------------------')
        print('[name]\n')
        print(name)
        print('-------------------------------')
        print('[price]\n')
        # A negative price is treated as "no stock".
        if float(price) >= 0:
            print(price)
        else:
            print("no stock")
        print('-------------------------------')

        # Read the price field of the last row in the history file (needs a POSIX shell).
        fileN = 'itemId_' + itemId + '.txt'
        lastP = os.popen("[ -f %s ] && tail -n 1 %s|cut -d, -f1" % (fileN, fileN)).read()
        if lastP == price + '\n':
            print('is same')
        elif lastP != '':
            print('is change')

        # Append "price,timestamp" to the history file.
        with open(fileN, 'a+') as f:
            f.write(price + ',' + itime + '\n')


    if __name__ == "__main__":
        print("===============================")
        fun()
        print("===============================")
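The script reads the previous price by shelling out to `tail` and `cut`, which only works where a POSIX shell is available. As a rough alternative, the same lookup can be done in plain Python. This is a minimal sketch, not part of the original xjd.py: the helper name `read_last_price` is made up for illustration, and it assumes the history file keeps the `price,timestamp` row format shown in this post.

```python
import os
import tempfile


def read_last_price(path):
    """Return the price field of the last non-empty row in `path`, or None.

    Hypothetical helper: assumes each row looks like "price,timestamp".
    """
    if not os.path.isfile(path):
        return None
    last = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                last = line
    if last is None:
        return None
    # The price is the first comma-separated field of the row.
    return last.split(",", 1)[0]


if __name__ == "__main__":
    # Demo on a throwaway file so the real history is not touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "itemId_7234518.txt")
        print(read_last_price(demo))  # file does not exist yet
        with open(demo, "a+", encoding="utf-8") as f:
            f.write("1299.00,2018-08-19 22:07:39\n")
            f.write("1299.00,2018-08-19 23:30:07\n")
        print(read_last_price(demo))
```

With a helper like this the comparison would become `lastP == price` rather than `lastP == price + '\n'`, since the trailing newline is already stripped, and a missing file yields `None` instead of an empty string.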
Here's what a run looks like:
    [root@localhost xjd]#./xjd.py
    ===============================
    [time]

    2018-08-19 23:30:07
    -------------------------------
    [name]

    三星(SAMSUNG) 970 EVO 500G M.2 NVMe 固态硬盘(MZ-V7E500BW)
    -------------------------------
    [price]

    1299.00
    -------------------------------
    is same
    ===============================
    [root@localhost xjd]# cat itemId_7234518.txt
    1299.00,2018-08-19 22:07:39
    1299.00,2018-08-19 23:30:07
    [root@localhost xjd]#
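The jd item id should be fairly stable, which is what makes long-term price tracking possible. Finally, crontab can be used to run the script periodically.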
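Once the script has been running on a schedule for a while, the history file becomes the interesting part. Here is a small sketch of how the accumulated rows might be summarized. The function names `parse_history` and `price_changes` are hypothetical and not part of xjd.py; the code assumes the `price,timestamp` row format the script writes and treats negative prices as "no stock", as the script does.

```python
from datetime import datetime


def parse_history(lines):
    """Parse "price,timestamp" rows into (float price, datetime) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        price, stamp = line.split(",", 1)
        rows.append((float(price), datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")))
    return rows


def price_changes(rows):
    """Keep the first row and every row whose price differs from the previous row."""
    changes = []
    for price, when in rows:
        if not changes or changes[-1][0] != price:
            changes.append((price, when))
    return changes


if __name__ == "__main__":
    # The two rows shown in the sample run above; normally these would be
    # read from the itemId_<id>.txt file.
    sample = [
        "1299.00,2018-08-19 22:07:39",
        "1299.00,2018-08-19 23:30:07",
    ]
    rows = parse_history(sample)
    print(len(rows), "rows,", len(price_changes(rows)), "price run(s)")
    # Skip negative prices, which the script treats as "no stock".
    in_stock = [p for p, _ in rows if p >= 0]
    if in_stock:
        print("lowest:", min(in_stock))
```

Collapsing repeated prices keeps only the moments where the price actually moved, which is usually what matters when the script runs many times a day.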
File link: xjd.py