How do I crawl the pages behind a website's links in C#?

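I want to enter a URL and then crawl every page of that website. How can I do this? It has to be written in C#, without off-the-shelf tools. My guess is that I fetch all the links on the home page and then work down level by level. Is that right? I also need to extract only specific content, so some of the information has to be filtered out. How would I implement this, and what is the general approach? Any help is appreciated, thanks.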

shshshdy
2011-11-07

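1. Read the page source of the site.
2. Use a regular expression to extract all the hyperlinks.
3. Loop over the extracted links and repeat steps 1 and 2 for each one. This time, add the logic in step 2 that picks out the data you want.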
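
Step 2 could be sketched roughly as below. This is only an illustration, not the answerer's code: the class name `LinkExtractor`, the method `ExtractLinks`, and the regex pattern are all assumptions. A regex only approximates HTML parsing (it misses unquoted attributes and links built by scripts), so treat it as a starting point. Relative links are resolved against the page URL with `System.Uri`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkExtractor
{
    // Matches href="..." or href='...' inside an <a ...> tag (an approximation).
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""([^""]*)""|'([^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the absolute http/https links found in html, without duplicates.
    public static List<Uri> ExtractLinks(Uri pageUrl, string html)
    {
        var seen = new HashSet<string>();
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // Group 1 holds a double-quoted value, group 2 a single-quoted one.
            string raw = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            string href = WebUtility.HtmlDecode(raw).Trim(); // turn &amp; back into &
            Uri absolute;
            // Resolve relative links such as "/about" against the page URL.
            if (href.Length == 0 || !Uri.TryCreate(pageUrl, href, out absolute))
                continue;
            // Skip mailto:, javascript: and other non-web schemes.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
                continue;
            // Drop the #fragment so the same page is not listed twice.
            Uri clean = new Uri(absolute.GetLeftPart(UriPartial.Query));
            if (seen.Add(clean.AbsoluteUri))
                links.Add(clean);
        }
        return links;
    }
}
```

Calling `LinkExtractor.ExtractLinks(new Uri(pageUrl), html)` on a downloaded page gives the list of URLs to loop over in step 3.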

--- Reading the page source ---
using System;
using System.IO;
using System.Net;
using System.Text;

// The two methods below belong inside an ASP.NET Web Forms code-behind class.
protected void Page_Load(object sender, EventArgs e)
{
    string strtemp = GetURLContent("http://go.microsoft.com/fwlink/?LinkId=25817", "utf-8");
    //Response.ContentType = "application/x-www-form-urlencoded";
    Response.Write(strtemp);
}

// Downloads the page at url and decodes it with the named encoding.
// Returns null if the server does not answer with 200 OK.
string GetURLContent(string url, string encodingType)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.AllowAutoRedirect = true;
    // The using blocks close the response and the stream even if reading throws.
    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        if (resp.StatusCode != HttpStatusCode.OK)
        {
            return null;
        }
        Encoding encode = Encoding.GetEncoding(encodingType);
        using (StreamReader readStream = new StreamReader(resp.GetResponseStream(), encode))
        {
            // ReadToEnd replaces the manual 256-character read loop and
            // avoids repeated string concatenation.
            return readStream.ReadToEnd();
        }
    }
}

This answer was recommended by the asker.

baisedebing
2011-11-08

Regular expressions plus recursion.
I built one a while ago when I had nothing better to do.

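Follow-up question:

I'd like to do that too, but it's hard to implement. Do you have any related source code?

Follow-up answer:

I have something similar, but without the recursion.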

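getHTML(string path)
{
    // download the page at path, then use a regex to match each link m
    getHTML(m)   // recurse into every matched link
}
It should look roughly like this.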
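
The pseudocode above recurses forever as soon as two pages link to each other, so a working version needs to remember which URLs it has already visited. Here is one possible sketch of that idea. The class name `SiteCrawler`, the three delegates, and the depth limit are all assumptions made for illustration: `fetch` downloads a page (returning null on failure), `getLinks` returns the absolute links found in it, and `handlePage` is where you would filter out the content you actually want.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of "regex plus recursion"; all names are illustrative.
public class SiteCrawler
{
    private readonly Func<Uri, string> fetch;                      // downloads a page, or returns null
    private readonly Func<Uri, string, IEnumerable<Uri>> getLinks; // extracts absolute links from a page
    private readonly Action<Uri, string> handlePage;               // filters or stores the wanted data
    private readonly HashSet<string> visited = new HashSet<string>();

    public SiteCrawler(Func<Uri, string> fetch,
                       Func<Uri, string, IEnumerable<Uri>> getLinks,
                       Action<Uri, string> handlePage)
    {
        this.fetch = fetch;
        this.getLinks = getLinks;
        this.handlePage = handlePage;
    }

    // Visits url and, recursively, every same-host page reachable within maxDepth hops.
    public void Crawl(Uri url, int maxDepth)
    {
        // Stop at the depth limit, and never fetch the same URL twice.
        if (maxDepth < 0 || !visited.Add(url.AbsoluteUri))
            return;

        string html = fetch(url);
        if (html == null)
            return;

        handlePage(url, html);

        foreach (Uri link in getLinks(url, html))
        {
            // Stay on the starting site: follow only links with the same host.
            if (string.Equals(link.Host, url.Host, StringComparison.OrdinalIgnoreCase))
                Crawl(link, maxDepth - 1);
        }
    }
}
```

Passing the download and link-extraction routines in as delegates keeps the recursion itself independent of how pages are fetched or parsed. For a large site, a queue (breadth-first) in place of recursion would avoid deep call stacks, and a polite crawler would also pause between requests.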

tianshasoft
2011-11-07

Look for a C# HTML parsing library. With a library like that you can parse the tags out of the HTML.
