ASP code for extracting links and titles from a web page
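ASP program:
Extract all hyperlink URLs and titles from a web page. For example, given a page that contains
<a href="../a.htm">Link 1</a>
<a href="../b.htm">Link 2</a>
<a href="../c.htm">Link 3</a>
<a href="../d.htm">Link 4</a>
extract these links.
Please provide some code.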
1 answer
This can be done with regular expressions.
Study the code below and work through it yourself:
<%
' Fetch the Sina news index page, list every <li> entry,
' and route each link through ShowSinaNews.asp.
Call newsSina

Sub newsSina
    Dim Content, url, ShowContent, re, matches, Match
    url = "http://news.sina.com.cn/news1000/index.shtml"
    Content = GetNewsContent(url)
    Set re = New RegExp
    're.Pattern = "\[.*?\)"
    ' "." does not match a line feed, so each match runs from "<li>" to the end of its line
    re.Pattern = "<li>.*"
    re.Global = True
    re.IgnoreCase = True
    Set matches = re.Execute(Content)
    For Each Match In matches
        ' Drop the leading "<li>" (4 characters), then rewrite the URLs in the rest
        ShowContent = ShowContent & ChangeURL(Mid(Match.Value, 5)) & "<br>"
    Next
    ShowContent = ShowContent & "<br><a href=""#"" onclick=""history.back();return false;"">Back to home page</a>"
    Response.Write "<font size=2>" & ShowContent & "</font>"
    ' Sample input for trying ChangeURL on its own:
    'str = "[财经] <a href=http://finance.sina.com.cn/g/20050411/11251505530.shtml target=_blank>CCTV经济半小时:黄河还能活几年</a><FONT class=rq> (2005/04/11 11:25)"
    'Response.Write ChangeURL(str)
End Sub

Function ChangeURL(str)
    ' Prefix every absolute URL in str with "ShowSinaNews.asp?url=".
    ' The pattern stops at whitespace, quotes, or angle brackets, so only the URL
    ' itself is captured; input that contains no URL is returned unchanged.
    Dim Rep
    Set Rep = New RegExp
    Rep.Pattern = "(https?://[^\s""'<>]+)"
    Rep.Global = True
    Rep.IgnoreCase = True
    ChangeURL = Rep.Replace(str, "ShowSinaNews.asp?url=$1")
End Function

Function GetNewsContent(URL)
    ' Download the page synchronously and decode the raw response bytes as text
    Dim objHttp
    Set objHttp = Server.CreateObject("Microsoft.XMLHTTP")
    objHttp.Open "GET", URL, False
    objHttp.Send
    GetNewsContent = B2B(objHttp.responseBody)
    Set objHttp = Nothing
End Function
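
Function B2B(body)
    ' Convert a byte array to a GB2312 string by way of ADODB.Stream
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = 1          ' adTypeBinary
    objStream.Mode = 3          ' adModeReadWrite
    objStream.Open
    objStream.Write body
    objStream.Position = 0
    objStream.Type = 2          ' adTypeText
    objStream.Charset = "gb2312"
    B2B = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function
%>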
The key piece is the Server.CreateObject("Microsoft.XMLHttp") component, which downloads the remote page.
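As a possible variation on GetNewsContent, the sketch below uses MSXML2.ServerXMLHTTP, the MSXML object intended for server-side requests, and checks the HTTP status before decoding the body. The function name FetchPage, its charset parameter, and the timeout values are assumptions made for illustration; they are not part of the original answer.

```vbscript
<%
' Hypothetical helper: return the page text, or "" if the request fails.
Function FetchPage(url, charset)
    Dim http, stream
    FetchPage = ""
    Set http = Server.CreateObject("MSXML2.ServerXMLHTTP")
    ' Timeouts in milliseconds: resolve, connect, send, receive (assumed values)
    http.setTimeouts 5000, 5000, 10000, 10000

    On Error Resume Next
    http.Open "GET", url, False
    http.Send
    If Err.Number <> 0 Then
        ' Network-level failure (DNS, connection refused, timeout)
        Err.Clear
        On Error GoTo 0
        Set http = Nothing
        Exit Function
    End If
    On Error GoTo 0

    If http.Status = 200 Then
        ' Decode the raw bytes with the requested charset
        Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = 1             ' adTypeBinary
        stream.Mode = 3             ' adModeReadWrite
        stream.Open
        stream.Write http.responseBody
        stream.Position = 0
        stream.Type = 2             ' adTypeText
        stream.Charset = charset
        FetchPage = stream.ReadText
        stream.Close
        Set stream = Nothing
    End If
    Set http = Nothing
End Function

Dim html
html = FetchPage("http://news.sina.com.cn/news1000/index.shtml", "gb2312")
If html = "" Then
    Response.Write "Could not fetch the page."
End If
%>
```

Returning an empty string on failure keeps the caller simple; a real page might prefer to log the status code instead.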
To extract more complex content, you will also need regular expressions.
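For the links and titles asked about in the question, a minimal sketch along these lines may be a starting point. It uses the VBScript RegExp object's SubMatches collection to capture the href value and the link text separately. The name ListLinks and the sample string are assumptions for illustration. A regular expression only approximates HTML parsing, so unusual markup may be missed, and relative URLs such as ../a.htm are printed as found rather than resolved.

```vbscript
<%
' Hypothetical helper: write the URL and title of every <a> element in html.
Sub ListLinks(html)
    Dim re, tagRe, matches, m, url, title, q
    Set re = New RegExp
    ' SubMatches(0): href value (double-quoted, single-quoted, or unquoted)
    ' SubMatches(1): everything between the opening tag and </a>
    re.Pattern = "<a\b[^>]*?\bhref\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)[^>]*>([\s\S]*?)</a>"
    re.Global = True
    re.IgnoreCase = True

    ' Strips nested tags such as <b> or <img> from the link text
    Set tagRe = New RegExp
    tagRe.Pattern = "<[^>]+>"
    tagRe.Global = True

    Set matches = re.Execute(html)
    For Each m In matches
        url = m.SubMatches(0)
        ' Remove the surrounding quotes, if the value is quoted
        q = Left(url, 1)
        If Len(url) >= 2 And (q = """" Or q = "'") And Right(url, 1) = q Then
            url = Mid(url, 2, Len(url) - 2)
        End If
        title = Trim(tagRe.Replace(m.SubMatches(1), ""))
        Response.Write Server.HTMLEncode(url) & " - " & Server.HTMLEncode(title) & "<br>"
    Next
End Sub

Dim sample
sample = "<a href=""../a.htm"">Link 1</a>" & vbCrLf & _
         "<a href=""../b.htm"">Link 2</a>"
ListLinks sample
%>
```

In practice the argument would be the text returned by GetNewsContent rather than a hard-coded sample, and each URL and title could be collected into an array or written to a database instead of being printed.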