Automated data download

Discussion:


Anton Klepec

2008-10-27 13:01:41 UTC

I tried the code below and it works, but what I get is the HTML page,
complete with HTML code. Can I get just the text, without the HTML? Below
is the code I use.

**********************
locarr = new array(2)
locarr[1] = "AKC"
*locarr[2] = "11526400"
locarr[2]=""
URLA="http://www.mf.gov.si/slov/dav_car/obr_mera_povez_osebe.htm"
urlb=""
*urlb= "&cb_00060=on&begin_date=2008-01-01&format=rdb"
DownloadToFile(urla+locarr[2]+urlb, Locarr[1]+"4.txt")

function DownloadToFile
// Get data from URL via automated download
//"Rich Autotracker posted this in the dBase newsgroups"
parameters URL, cFile
if file("&cfile.")
erase "&cfile."
endif
if type('URLDownloadToFile') # "FP"
extern culong URLDownloadToFile;
( cptr, cstring, cstring, culong,;
culong) URLMON.DLL from "URLDownloadToFileA"
endif
hResult = itoh(URLDownloadToFile( null, URL, cFile, null, null))
return iif(file("&cfile.") and hResult=0,true,false)
****** END DownLoadToFile ******
*************************
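URLDownloadToFile saves exactly what the server sends, so for an .htm page the file will always contain HTML; the text has to be extracted afterwards. Below is a minimal, untested sketch of such a post-processing step. It assumes the VBScript.RegExp object is available (it is the same object Geoff uses later in this thread), and HtmlFileToText is a hypothetical helper name, not an existing dBASE function.

```dbase
// Hypothetical helper (untested sketch): reads an HTML file saved by
// DownloadToFile and writes a copy with every HTML tag removed.
function HtmlFileToText( cHtmlFile, cTextFile )
   local f, cHtml, oRegExp
   if not file(cHtmlFile)
      return false
   endif
   // Read the whole downloaded file into one string
   f = new File()
   f.open(cHtmlFile, "R")
   cHtml = f.read(f.size(cHtmlFile))
   f.close()
   // VBScript regular expression object, created through OLE automation
   oRegExp = new OleAutoClient("VBScript.RegExp")
   oRegExp.Global = true         // replace every tag, not only the first
   oRegExp.Pattern = "<[^>]*>"   // "<", any characters except ">", then ">"
   // Write the tag-free text to the output file
   f = new File()
   f.create(cTextFile, "RW")
   f.write(oRegExp.Replace(cHtml, ""))
   f.close()
return true
```

It could be called right after DownloadToFile succeeds, saving the raw page as, say, AKC4.htm and the text as AKC4.txt. Entities such as &nbsp; and the contents of script blocks are left in place by this simple pattern.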

Thanks for any help.

Anton Klepec

I have developed two types of procedures for automated downloads of text
data from USGS sites. I am interested in advice about which method might be
preferable, and any other suggestions.
E.L.
The first method uses the function DownloadToFile, developed by Rich
"Autotracker". DownloadToFile calls the API function URLDownloadToFile
from URLMON.DLL. It works OK for me, but I have seen some cautions about
it in web forums.
The second method uses the XMLHttp object through OLE automation. I worked
this up from some JavaScript code in the article "XMLHttp Requests for
Ajax" by Nicholas C. Zakas, found at
http://www.wrox.com/WileyCDA/Section/id-291289.html. I'm not trying to do
Ajax, just to retrieve a text file from an automated web site. This works
OK for me too.
Here are excerpts from both of my procs.
****** With DownLoadToFile ******
locarr = new array(2)
locarr[1] = "NorthFork"
locarr[2] = "11526400"
urla= "http://Waterdata.usgs.gov/nwis/dv?site_no="
urlb= "&cb_00060=on&begin_date=2008-01-01&format=rdb"
DownloadToFile(urla+locarr[2]+urlb, Locarr[1]+"CFS2008.txt")
function DownloadToFile
// Get data from URL via automated download
//"Rich Autotracker posted this in the dBase newsgroups"
parameters URL, cFile
if file("&cfile.")
erase "&cfile."
endif
if type('URLDownloadToFile') # "FP"
extern culong URLDownloadToFile;
( cptr, cstring, cstring, culong,;
culong) URLMON.DLL from "URLDownloadToFileA"
endif
hResult = itoh(URLDownloadToFile( null, URL, cFile, null, null))
return iif(file("&cfile.") and hResult=0,true,false)
****** END DownLoadToFile ******
****** With XMLHTTP ******
oXmlHttp = createXMLHttp() // proc below
// false = synchronous: send() returns only after the whole response has
// arrived, so readyState, status and responseText are valid afterwards
oXmlHttp.open("GET", MkURL(), false)
oXmlHttp.send(null)
if oXmlHttp.readyState == 4 and oXmlHttp.status == 200
msgbox("Data returned") // is: " + oXmlHttp.responseText)
else
msgbox("An error occurred: " + oXmlHttp.statusText)
return
endif
// Save the response body to disk
f = new file()
f.create("NorthForkCFS2008.txt","RW")
f.puts(oXmlHttp.responseText)
f.close()
oXmlHttp.abort()
oXmlHttp = null
proc createXMLHttp
// Try the newest MSXML version first, falling back to older ones
aVersions = {"MSXML2.XMLHttp.5.0","MSXML2.XMLHttp.4.0","MSXML2.XMLHttp.3.0","MSXML2.XMLHttp","Microsoft.XMLHttp"}
for i = 1 to aVersions.size
try
oXmlHttp = new oleautoclient(aVersions[i])
return oXmlHttp // return if no error
catch (Exception e)
// this version is not installed; try the next one
endtry
next
msgbox("MSXML is not installed.")
return null
Proc MkURL
stcode = "11526400"
urla= "http://Waterdata.usgs.gov/nwis/dv?site_no="
urlb= "&cb_00060=on&begin_date=2008-01-01&format=rdb"
return urla+stcode+urlb
****** End XMLHTTP ******

Geoff Wass [dBVIPS]

2008-10-28 04:58:55 UTC


Post by Anton Klepec
I tried to use that code below and it works fine but I get html page with
html code. Can I get just text without html code? Below is the code that I
use.

Anton,

Look at the "Example No. 3" here:

http://www.jpmartel.com/bu20_b.htm

--
Geoff Wass [dBVIPS]
Montréal, Québec, Canada

.|.|.| dBASE info at http://geocities.com/geoff_wass |.|.|.
.|.|.| ---------------------------------------------------------- |.|.|.
.|.|.| IT Consultant http://Geoff_Wass.com |.|.|.

Anton Klepec

2008-10-30 12:00:02 UTC


Thanks, Geoff. It is pretty complicated. Is there a simple function, like
the one in Internet Explorer, that saves an HTML page as a text file?
--
Best regards!

Anton Klepec

AKC d.o.o.
Kavškova 13
1000 Ljubljana

tel. 01 515 3212

www.akc.si


Geoff Wass [dBVIPS]

2008-10-31 04:55:51 UTC


Post by Geoff Wass [dBVIPS]
http://www.jpmartel.com/bu20_b.htm

Anton,

There is more than one "Example No.3" in that article, sorry. Here is
the code you would need:

oRegExp = new OleAutoClient("VBScript.RegExp")
oRegExp.Global = true
oRegExp.IgnoreCase = true
cString = "If you comply with regular expression, "
cString += "go to www.regexp.com or www.re.com."
oRegExp.Pattern = "www.\w+\.\w+"

aMatches = oRegExp.Execute(cString)
for i = aMatches.count to 1 step -1
cTemp = cString.left(aMatches.item(i-1).FirstIndex)
cTemp += "http://" + aMatches.item(i-1).Value
cTemp += cString.substring( aMatches.item(i-1).FirstIndex + ;
aMatches.item(i-1).length, ;
cString.length)

/*
// Or using the dbase string functions
cTemp = left(cString, aMatches.item(i-1).FirstIndex)
cTemp += "http://" + aMatches.item(i-1).Value
cTemp += substr(cString,aMatches.item(i-1).FirstIndex + ;
aMatches.item(i-1).length+1)
*/

cString = cTemp
next
msgbox( cString )

You should be able to make a function out of this (untested).

function htmlToText( cInputString )

local cString, oRegExp, aMatches, i, cTemp

cString = cInputString
oRegExp = new OleAutoClient("VBScript.RegExp")
oRegExp.Global = true
oRegExp.IgnoreCase = true
// Match any HTML tag: "<", then everything up to the next ">"
oRegExp.Pattern = "<[^>]*>"

aMatches = oRegExp.Execute(cString)

// Walk the matches from last to first so that cutting out one tag
// does not shift the FirstIndex of the tags still to be removed
for i = aMatches.count to 1 step -1
cTemp = cString.left(aMatches.item(i-1).FirstIndex)
cTemp += cString.substring( aMatches.item(i-1).FirstIndex + ;
aMatches.item(i-1).length, ;
cString.length)

cString = cTemp
next

return( cString )
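The RegExp object also has a Replace method which, with Global set to true, substitutes every match in a single call, so the match-and-splice loop is not strictly required. The untested sketch below uses it; htmlToPlainText is a hypothetical name, it assumes VBScript 5.5 or later (for the lazy *? quantifier and the \1 backreference), and it decodes only four common entities. The "&" of each entity is built with chr(38) so that dBASE macro substitution cannot act on names like amp or lt inside string literals.

```dbase
// Untested sketch: converts an HTML string to plain text using the
// Replace method of VBScript.RegExp (VBScript 5.5 or later assumed).
function htmlToPlainText( cHtml )
   local oRegExp, cText, cAmp
   cAmp = chr(38)              // "&", kept out of literals to avoid macro expansion
   oRegExp = new OleAutoClient("VBScript.RegExp")
   oRegExp.Global = true       // act on every match, not only the first
   oRegExp.IgnoreCase = true   // <BR> and <br> are the same tag
   // 1. Drop <script> and <style> blocks; their contents are not page text
   oRegExp.Pattern = "<(script|style)[^>]*>[\s\S]*?</\1>"
   cText = oRegExp.Replace(cHtml, "")
   // 2. Turn line-ending tags into CR/LF so rows and paragraphs stay apart
   oRegExp.Pattern = "<(br|/p|/tr|/div|/h[1-6])[^>]*>"
   cText = oRegExp.Replace(cText, chr(13) + chr(10))
   // 3. Remove every remaining tag
   oRegExp.Pattern = "<[^>]*>"
   cText = oRegExp.Replace(cText, "")
   // 4. Decode a few common entities; &amp; goes last so that
   //    "&amp;lt;" ends up as "&lt;" rather than "<"
   oRegExp.Pattern = cAmp + "nbsp;"
   cText = oRegExp.Replace(cText, " ")
   oRegExp.Pattern = cAmp + "lt;"
   cText = oRegExp.Replace(cText, "<")
   oRegExp.Pattern = cAmp + "gt;"
   cText = oRegExp.Replace(cText, ">")
   oRegExp.Pattern = cAmp + "amp;"
   cText = oRegExp.Replace(cText, cAmp)
return cText
```

For example, htmlToPlainText("<b>Hello</b> " + chr(38) + "amp; bye") should return "Hello & bye". Stripping tags with regular expressions is a heuristic: a ">" inside an attribute value, or markup inside an HTML comment, can confuse it, so it is worth checking the output against the page.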