This is the Python source I wrote and used while putting together the "List of XX inside the Democratic Party" series of posts.
I built the basic skeleton through ChatGPT and then polished it only slightly afterward.
If you just look over keywords, yesterday, and today, you should be able to adapt it for your own use.
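For example, the script as posted covers roughly the last two days; to widen the window, only the timedelta needs to change. A minimal sketch:

```python
import datetime

# Cover the past 7 days instead of 2; everything else stays the same
today = datetime.datetime.now()
yesterday = today - datetime.timedelta(days=7)
start_date = yesterday.strftime("%Y.%m.%d")  # e.g. 2024.05.14
end_date = today.strftime("%Y.%m.%d")
```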
import requests
from bs4 import BeautifulSoup
import datetime
import urllib.parse
import time


def korean_to_url(input_text):
    # Percent-encode the UTF-8 bytes of the keyword for use in a URL query
    return urllib.parse.quote(input_text.encode("utf-8"))


if __name__ == "__main__":
    # search keywords (comma-separated)
    keywords = "이원욱,조응천,김종민,이낙연,오영환,이상민,설훈,홍영표,박용진,송갑석"
    keywords_list = keywords.split(",")

    # search window: from two days ago until now
    today = datetime.datetime.now()
    yesterday = today - datetime.timedelta(days=2)
    start_date = yesterday.strftime("%Y.%m.%d")  # e.g. 2024.05.14
    end_date = today.strftime("%Y.%m.%d")
    start_date_d = start_date.replace(".", "")   # e.g. 20240514
    end_date_d = end_date.replace(".", "")

    space4 = "    "  # indentation for the HTML report

    with open("view_" + start_date.replace(".", "_") + ".html", "w", encoding="utf-8") as file:
        print("\nSTART naver search")
        for keyword in keywords_list:
            file.write("> " + keyword + " (naver) <br>")
            file.write(space4 + "---------------------------------------<br>")
            print("---->" + keyword)
            url_encoded_text = korean_to_url(keyword)
            url = (
                "https://search.naver.com/search.naver?where=news&sm=tab_pge"
                f"&query={url_encoded_text}&field=0&pd=1&ds={start_date_d}&de={end_date_d}"
                "&mynews=0&office_type=0&office_section_code=0&news_office_checked="
                "&office_category=0&service_area=0&nso=so:dd,p:1w,a:all&start=11"
            )
            response = requests.get(url)
            soup = BeautifulSoup(response.text, "html.parser")
            for result in soup.select(".news_tit"):
                # write each result anchor (headline + link) into the report
                file.write(space4 + str(result) + "<br>")
            time.sleep(0.1)
        file.write("<br><br>")
        file.write(space4 + "=========================================")
        file.write("<br><br>")
        print("END naver search\n")

        print("\nSTART daum search")
        for keyword in keywords_list:
            file.write("> " + keyword + " (daum) <br>")
            file.write(space4 + "---------------------------------------<br>")
            print("---->" + keyword)
            url_encoded_text = korean_to_url(keyword)
            # page 1: most recent results
            url = (
                "https://search.daum.net/search?DA=STC&cluster=y&cluster_page=1"
                f"&ed=&enc=utf8&nil_search=btn&period=w&q={url_encoded_text}"
                "&sd=&w=news&sort=recency"
            )
            response = requests.get(url)
            soup = BeautifulSoup(response.text, "html.parser")
            for result in soup.select(".item-title"):
                file.write(space4 + str(result))
            # pages 2-4: date-bounded result pages
            for page in (2, 3, 4):
                url = (
                    "https://search.daum.net/search?w=news&nil_search=btn&DA=STC"
                    f"&enc=utf8&cluster=y&cluster_page={page}&q={url_encoded_text}"
                    f"&sd={start_date}010100&ed={end_date}125959&period=w"
                )
                response = requests.get(url)
                soup = BeautifulSoup(response.text, "html.parser")
                # both classes sit on the same element, so the selector needs a
                # dot, not a space: ".tit_main fn_tit_u" matches nothing
                for result in soup.select(".tit_main.fn_tit_u"):
                    file.write(space4 + str(result) + "<br>")
            file.write(space4 + "---------------------------------------")
            file.write("<br><br>")
            time.sleep(0.1)
        file.write("<br><br>")
        file.write(space4 + "=========================================")
        file.write("<br><br>")
        print("END daum search\n")
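The script writes each matched tag verbatim with str(result), which dumps the portal's full markup into the report. If you only want the headline and the link, a variant like this could be used instead; a minimal sketch, using a hypothetical sample fragment in place of a live page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a live Naver result page;
# the real pages carry the same news_tit class on result anchors.
html = '<a href="https://news.example.com/1" class="news_tit">Sample headline</a>'
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("a.news_tit"):
    # keep just the href and the headline text, not the whole tag
    links.append(f'<a href="{a["href"]}">{a.get_text()}</a><br>')
```

Inside the script, those two lines would take the place of the str(result) write. One more practical note: if either portal starts returning empty pages, the default python-requests User-Agent may be getting rejected, and passing a browser-like header, e.g. requests.get(url, headers={"User-Agent": "Mozilla/5.0"}), is a common workaround.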
The end.