Python Basic Syntax 1.4 -- A First Look at Web Crawlers
Published: 2019-06-12


import requests
import time
import xml.etree.ElementTree as ET
from multiprocessing.dummy import Pool as ThreadPool
from xml.parsers.expat import ParserCreate


class DefaultSaxHandler(object):
    """Collects (title, href) pairs from the elements inside the map block."""

    def __init__(self, provinces):
        self.provinces = provinces

    def start_element(self, name, attrs):
        # Skip the enclosing 'map' element; every other element is expected
        # to carry the province name in 'title' and its link in 'href'.
        if name != 'map':
            name = attrs['title']
            number = attrs['href']
            self.provinces.append((name, number))

    def end_element(self, name):
        pass

    def char_data(self, text):
        pass


def get_provinces(url):
    # The page is served in GB2312, so decode the raw bytes explicitly.
    content = requests.get(url).content.decode('gb2312')
    # Assumption: the two search strings are empty in the source; '<map' and
    # '</map>' are inferred from the handler's check on the 'map' element name.
    start = content.find('<map')
    end = content.find('</map>')
    content = content[start:end + len('</map>')].strip()
    print(content)
    provinces = []
    handler = DefaultSaxHandler(provinces)
    parser = ParserCreate()
    parser.StartElementHandler = handler.start_element
    parser.EndElementHandler = handler.end_element
    parser.CharacterDataHandler = handler.char_data
    parser.Parse(content)
    return provinces


provinces = get_provinces('http://www.ip138.com/post')
print(provinces)
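The handler logic above can be tried without any network access. The following is a minimal sketch, assuming the page's image map looks roughly like the inline sample. `SAMPLE`, its attribute values, and the helper name `collect_areas` are hypothetical and used only for illustration; the real markup on the site may differ.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical sample markup standing in for the page's image map.
SAMPLE = (
    '<map name="demo">'
    '<area title="Beijing" href="/10/"/>'
    '<area title="Shanghai" href="/20/"/>'
    '</map>'
)


def collect_areas(xml_text):
    """Return (title, href) pairs for every non-map element that has both attributes."""
    found = []

    def start_element(name, attrs):
        # attrs is a dict; .get() lets elements without these attributes be skipped
        title, href = attrs.get('title'), attrs.get('href')
        if name != 'map' and title is not None and href is not None:
            found.append((title, href))

    parser = ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return found


areas = collect_areas(SAMPLE)
print(areas)
```

Unlike the handler in the post, this sketch uses `attrs.get(...)` instead of indexing, so an element that lacks `title` or `href` is skipped rather than raising `KeyError`. Note that expat is a strict XML parser: real-world HTML that is not well-formed (for example, unclosed `area` tags) would make `Parse` raise `ExpatError`.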

 

Reposted from: https://www.cnblogs.com/xiaoyingying/p/7689841.html
