Bug of Gerapy Auto Extractor: wrong results when crawling forum posts #4

Open
bowu678 opened this issue Jul 12, 2020 · 1 comment
Labels: bug (Something isn't working)

Comments


bowu678 commented Jul 12, 2020

The page being crawled is: https://www.19lou.com/forum-269-1.html
The data returned by extract_list is:
[
  {
    "title": "19楼帮帮团维权月来啦!7月维权主题汽车类",
    "url": "http://www.19lou.com/forum-79-thread-42261592790646553-1-1.html"
  },
  {
    "title": "19楼帮帮团来咯,求助维权攻略请收下!",
    "url": "http://www.19lou.com/forum-79-thread-82281589267909116-1-1.html"
  },
  {
    "title": "【19楼帮帮团】每日诈骗连载!少点套路,多点幸福",
    "url": "http://www.19lou.com/forum-79-thread-82681592968362354-1-1.html"
  },
  {
    "title": "杭州人杭州事,你要知道的都在19楼",
    "url": "http://www.19lou.com/forum-269-thread-63421567731405299-1-1.html"
  },
  {
    "title": "楼外楼:杭州事【总版规】(本版不支持一切形式广告)",
    "url": "http://www.19lou.com/forum-269-thread-31532348-1-1.html"
  }
]
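Three of the five items above point to board forum-79, not to forum-269, the board that was crawled. As a stopgap until forum pages are better supported, one could drop items whose URL is on a different board. This is a minimal sketch, not part of Gerapy Auto Extractor. It assumes 19lou thread URLs follow the forum-<board id>-... path pattern seen above, and the helper names board_id_from_url and filter_same_board are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: 19lou paths look like /forum-269-1.html (list page) or
# /forum-269-thread-31532348-1-1.html (thread), with the board id after "forum-".
BOARD_RE = re.compile(r"^/forum-(\d+)-")


def board_id_from_url(url):
    """Hypothetical helper: return the board id as a string, or None if the path does not match."""
    match = BOARD_RE.match(urlparse(url).path)
    return match.group(1) if match else None


def filter_same_board(items, list_url):
    """Hypothetical helper: keep only items whose URL is on the same board as list_url."""
    board = board_id_from_url(list_url)
    if board is None:
        # Unknown URL layout: leave the extractor output untouched.
        return list(items)
    return [item for item in items if board_id_from_url(item["url"]) == board]


# The extract_list output reported in this issue.
extracted = [
    {"title": "19楼帮帮团维权月来啦!7月维权主题汽车类",
     "url": "http://www.19lou.com/forum-79-thread-42261592790646553-1-1.html"},
    {"title": "19楼帮帮团来咯,求助维权攻略请收下!",
     "url": "http://www.19lou.com/forum-79-thread-82281589267909116-1-1.html"},
    {"title": "【19楼帮帮团】每日诈骗连载!少点套路,多点幸福",
     "url": "http://www.19lou.com/forum-79-thread-82681592968362354-1-1.html"},
    {"title": "杭州人杭州事,你要知道的都在19楼",
     "url": "http://www.19lou.com/forum-269-thread-63421567731405299-1-1.html"},
    {"title": "楼外楼:杭州事【总版规】(本版不支持一切形式广告)",
     "url": "http://www.19lou.com/forum-269-thread-31532348-1-1.html"},
]

same_board = filter_same_board(extracted, "https://www.19lou.com/forum-269-1.html")
for item in same_board:
    print(item["url"])
```

This only removes cross-board links. The two remaining items include the board's rules thread (总版规), so ordinary posts are still missing. Fixing that requires better list detection in the extractor itself.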

The result from extract_detail is:
{
  "title": "",
  "datetime": "2020-07-12 00:55:56",
  "content": "浙公网安备 33010002000029号"
}
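The detail result has an empty title, and its content is only the site's public-security filing number (浙公网安备), which is page-footer text and not a post body. A caller could flag results like this before using them. The following is a minimal heuristic sketch, not part of Gerapy Auto Extractor. The function name detail_problems and the 50-character threshold are hypothetical choices.

```python
def detail_problems(detail, min_content_chars=50):
    """Hypothetical heuristic: list reasons an extract_detail result may not be a post body."""
    title = (detail.get("title") or "").strip()
    content = (detail.get("content") or "").strip()
    problems = []
    if not title:
        problems.append("empty title")
    # Assumption: a real post body is longer than min_content_chars characters.
    if len(content) < min_content_chars:
        problems.append(f"content shorter than {min_content_chars} characters")
    return problems


# The extract_detail output reported in this issue.
reported = {
    "title": "",
    "datetime": "2020-07-12 00:55:56",
    "content": "浙公网安备 33010002000029号",
}

print(detail_problems(reported))
# ['empty title', 'content shorter than 50 characters']
```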

None of this is the data I want. What I want is each post's title, its link, and the post's body text.

bowu678 added the bug (Something isn't working) label Jul 12, 2020
Germey (Member) commented Jul 12, 2020

Thanks. The current optimizations all target news pages, so forum support may be poor. I will keep improving it.
