FoxPro Programming
Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  All
  Jan 4, 2019 @ 12:47pm

Hi all,

Just finished reading Rick's blog post on > 16 mb files and then tried the .lUseLargePostBuffer property in wwHttp to post a > 16 mb file to the web server. It works (how long has this been in there? I did not see it until recently). So I can change my code that uploads files to their document server to allow > 16 mb.

But, not sure if I can download these large files. Before I spend a lot of time trying some test code, any idea if this is possible:

  • file is an xml stream which has the xml header and then tags for some meta data and then the actual binary is contained within tags eg.
<docid>12345...</docid>
<doctitle>Something warm and fuzzy.doc</doctitle>
<blob>...base64 encoded data...</blob>

So theoretically, if I brought down a string > 16mb, followed the rules outlined in the article, and maybe dumped the string into a file, do you think I could pull out the blob and decode it? Or would pulling out the blob part with STREXTRACT() and decoding it with STRCONV() not work?

Just thought I would ask instead of hitting my head against a wall...(the server api also has code to block downloads > 16 mb so harder to test).

Albert

re: Possible to receive and convert > 16 mb file from web server
  Rick Strahl
  Albert Gostick
  Jan 5, 2019 @ 01:13pm

You can download directly to a file to get the content down. Once it's in a file, I believe you can use the MSXML parser to get the content out, but I'm not sure if that would work with > 16 meg strings. I think it should, as long as it's a single value that's directly assigned to a variable. You should actually be able to pull this into a blob (type Q).

Haven't tried this but should be possible.

Worst case scenario I would build an intermediate .NET component that can pull the content in full size, process it and then either pass it back to you or store it into whatever data storage you need to put it. Using wwDotnetBridge that would be an easy workaround.
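
As a rough illustration of the extract-and-decode step discussed above, here is a minimal sketch that uses STREXTRACT() and STRCONV() rather than MSXML. It has not been tested against > 16 meg strings, which is exactly the open question here. The sample XML, the wrapping <doc> element, and the variable names are made up for the example; in the real case lcXml would come from FILETOSTR() on the file the response was downloaded to.

```foxpro
* Minimal sketch: pull a base64 blob out of an XML string and decode it.
* The sample document is built in memory so the snippet runs on its own.
LOCAL lcXml, lcBlob, lcBinary

lcXml = "<doc><docid>12345</docid>" + ;
        "<doctitle>sample.txt</doctitle>" + ;
        "<blob>" + STRCONV("Hello large file", 13) + "</blob></doc>"

* In the real scenario (assumption): lcXml = FILETOSTR("c:\temp\response.xml")

* Everything between the blob tags is the base64 payload
lcBlob = STREXTRACT(lcXml, "<blob>", "</blob>")

* STRCONV() with 14 decodes base64 (13 encodes)
lcBinary = STRCONV(lcBlob, 14)

? lcBinary
```

Whether both function calls still succeed once lcXml goes past 16 megs would need to be verified with a real file.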

+++ Rick ---

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 3, 2019 @ 02:45pm

Hi Rick, Had some downtime today to mess around with the large file thing. I incorporated the new property .lUseLargePostBuffer into my code so that I could upload files larger than 16mb. It worked so I am happy with that.

I am now waiting for the web developer to remove the restriction regarding downloading strings larger than 16 mb - I had him put this in so that I did not have to handle an exception if the web sent down a larger string - he just returned an error code to me instead.

Once he makes that change, can you tell me if your code in httpget() is able to accept a string larger than 16mb (without reading through all the code)? Or is there a switch to have the output go directly to a file and then from there I can pick it up and parse it?

BTW, re-read your blog post and one correction (if I read you right) - if I create a large string with FILETOSTR() over 16 mb, I can create a new var from this e.g.

lcTest = FILETOSTR("c:\temp\BiggaFile.pdf")  && some file > 16 mb

lcTest2 = lcTest

I also found that you can append something to a large string, e.g.

lcTest = lctest + "blather"

but you cannot prepend something to the var

lcTest = "pre-blather" + lctest

as it must have to create an intermediate internal var when prepending. Something to add to your notes.

Thanks, Albert

re: Possible to receive and convert > 16 mb file from web server
  Rick Strahl
  Albert Gostick
  Apr 3, 2019 @ 06:55pm

Sooo....

I actually took a look at the code in wwHttp to see why it wasn't working with large strings and it turns out I was able to fix it so it does now.

The issue was that although the content is built up with tcBuffer = tcBuffer + lcReadContent, which in theory should work for strings larger than 16mb, it did not work in wwHttp because the variable was passed in by reference. I added a second dereferenced variable lcBuffer to accumulate the data, then assign it back to tcBuffer when it's loaded and... voila it works. > 16meg strings from wwHttp calls!
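
The pattern described above can be sketched roughly as follows. This is not the actual wwHttp code: the procedure name, the chunk count, and the REPLICATE() call standing in for a network read are all assumptions for illustration, and the sample stays far below 16mb so it runs quickly.

```foxpro
* Sketch: accumulate into a local variable, then assign the result
* to the by-reference parameter once at the end.
LOCAL lcResult
lcResult = ""
=AccumulateChunks(@lcResult, 5)
? LEN(lcResult)     && 327680 (5 chunks of 65536 bytes)

PROCEDURE AccumulateChunks(tcBuffer, tnChunks)
LOCAL lcBuffer, lnX
lcBuffer = ""
FOR lnX = 1 TO tnChunks
   * REPLICATE() is a stand-in for one chunk read from the connection
   lcBuffer = lcBuffer + REPLICATE("x", 65536)
ENDFOR
* Single assignment back to the by-reference parameter
tcBuffer = lcBuffer
ENDPROC
```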

I've updated the wwHttp.prg in https://west-wind.com/files/WebConnectionExperimental.zip that has this change.

That said - if you know you're receiving large files, it'll be more efficient to just dump the data directly to file. Especially if you're dealing with binary content anyway. It's quicker and saves the additional memory overhead.

While in there I also bumped up the default buffer size to 65k from 4k which should make larger downloads drastically faster. A 20mb download was taking nearly a minute before and now takes 5 seconds.

+++ Rick ---

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 5, 2019 @ 10:31am

Hi Rick, The new classes work. Before I put them into production: I am on ver 6.21. Are there any dependencies that would require me to get the rest of the updated classes (i.e. 7.x)? Not that I mind paying for them; I don't think my client will object to an upgrade, as it is probably a couple of years since they last paid for one (you should be able to tell me).

Also, the download from the server for a 33mb file is really slow (about 2 minutes). I tracked this down in your code to the fact that it seems the whole string comes down to some Windows temp memory and then your code takes it in chunks and adds it to the resulting lcBuffer string. I think creating this string on large files just takes too much processing because of the number of loops required for say an 18mb file.

I then wondered if I could increase the buffer size to decrease the number of loops; I looked at httpGet() and noticed a tnBufferSize var but it only gets passed in if lcUserName is passed in as numeric - which I already use for username so cannot go that route; btw, I noticed tnBuffersize never gets declared as local - I think it might have been local var szHead as that does not appear to be used in the function.

So then I outputted to file and that is a lot faster (a few secs) so I will rework my code to dump out the string to a temp file and then I will pull that back into my class as a string as I have to parse it for tags and decode it etc. before my final dump out to file.

If you have any other suggestions for speeding up the download instead of outputting to file (maybe there is a switch or property some place that I have not yet seen?), I'd like to hear them as I would rather not dump it out to file just to pick it up and put it back into a string.

Thanks, Albert

re: Possible to receive and convert > 16 mb file from web server
  Rick Strahl
  Albert Gostick
  Apr 5, 2019 @ 01:03pm

Check the nHttpWorkBufferSize property. This was really small (4096 bytes) but I've bumped this to 65535 and that makes a huge difference. You can set the buffer before the download and if you know it's large make it even bigger.

I updated this default in wwHttp but it may have been after I posted the file in the zip.

With the 64k buffer a 30 meg download took about 5 seconds. With the 4k buffer it took about a minute. So the buffer definitely makes a difference (not sure why it's this much though). I think it's because with the largish buffers VFP is re-allocating memory each and every time and as the memory size goes up that gets slow. For 4k that's a lot of allocations, much less so with the larger buffers.
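
To put rough numbers on that, here is a small back-of-the-envelope calculation. It assumes the download is read one full buffer at a time and that each read results in one append to the string; the 30 meg size is taken from the timing above.

```foxpro
* Number of reads (and string appends) for a 30 meg download
LOCAL lnBytes
lnBytes = 30 * 1024 * 1024

? CEILING(lnBytes / 4096)     && 7680 reads with a 4k buffer
? CEILING(lnBytes / 65536)    && 480 reads with a 64k buffer
```

That is 16 times fewer appends with the 64k buffer, which fits the idea that repeated re-allocation of a growing string is where the time goes.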

The way my latest code works it uses the value in nHttpWorkBufferSize as the max buffer size, and it sets the buffer size during the download based on the content size. If the content is smaller, it only allocates the actual content size. If it's larger, it maxes out at the value you set.

And again - if you're downloading a file, you should just directly save it to a file and remove the need to store into a string at all. The buffer size will still matter for this as it affects the number of reads and writes to the file, but probably less because there's no string allocation overhead.

As to version update - it's probably fine, but probably a good idea to upgrade to get these enhancements as they come along...

+++ Rick ---

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 8, 2019 @ 08:39am

Hi Rick,

The version you gave me has .nhttpWorkBufferSize set to 65556 (btw, does not really matter but 64k is 65536).

Traced it down and this is your code that changes it:

*** Dynamically set the buffersize up to the max buffer size
IF THIS.nContentSize > 0
    LOCAL lnHttpBufferSize
    lnHttpBufferSize = THIS.nContentSize
    IF THIS.nHttpWorkBufferSize > 0 AND lnHttpBufferSize > THIS.nHttpWorkBufferSize
        lnHttpBufferSize = THIS.nHttpWorkBufferSize
    ENDIF
    THIS.nHttpWorkBufferSize = lnHttpBufferSize
ENDIF

What I figured out is that before my actual download, I have a call to fetch some other info from the server and this was setting it to 338 bytes even though the class instantiated to 64k. All subsequent calls use the 338 buffer size. Not sure if this is what you intended - I would have expected it to default back to the 64k limit that the object originally had for this property.

I guess your intent is that I set this before every call to an appropriate size (for documents, I don't know the size of the document downloads ahead of time as my only other api call to the server just returns a list of documents and time stamps).

I can easily work around this as I have a ResetObject() method that gets called for all the function calls in my class so that I don't have to remember to reset every property that I might potentially change - I can put a line in there to always set it back to 64k. But I wonder if your code could capture the original value and reset it before exiting httpGetEx().

Thanks, Albert

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Albert Gostick
  Apr 8, 2019 @ 12:12pm

Just an update on some testing regarding outputting straight to file vs. outputting to the string; I wanted to check times if always going to string as it seemed for some strange reason once I got above 40mb downloads that outputting direct to file "chokes" - don't know why but it does. As well, I wanted to try out some different buffer sizes when setting nhttpworkbuffersize. Here are my results. Note that I initially was only looking at larger files (18mb and up) but then I went back to check the smaller sized files (but I did not do all buffersizes for them).

Speed test results line is as follows: size of blob - time when outputting direct to file then times when outputting to string at various buffer sizes

4.5MB - to file: 0.52 secs; 1024kb: 0.47 secs

9MB - to file: 0.96 secs; 1024kb: 0.88 secs

18MB - to file: 2.70 secs; 64kb: 4.09 secs, 256kb: 2.29 secs; 512kb: 2.18 secs; 1024kb: 2.20 secs

32MB - to file: 6.10 secs; 64kb: 11.55 secs, 256kb: 4.97 secs; 512kb: 4.25 secs; 1024kb: 3.66 secs

40MB - to file: 6.47 secs; 64kb: 16.65 secs, 256kb: 6.95 secs; 512kb: 5.33 secs; 1024kb: 4.36 secs

52MB - to file*: 39.5 secs; 64kb: 30.15 secs, 256kb: 10.65 secs; 512kb: 7.69 secs; 1024kb: 6.38 secs

*note: but often fails to download properly even with timeout set to 60 secs

Conclusions: "to file" is only faster when buffersize is 64kb or smaller; once buffer size gets above 64kb, "to string" is faster (but only significantly so if file size is > 18mb); for smaller "normal" sized files (the bulk of which this program deals with), the difference is not significant. But because "to file" really chokes on large files (somewhat above 40mb), I will have to use "to string" in this procedure.

Reasons for difference: when outputting to file, although the initial response gets dumped into a file, I then have to move this content back into a string to parse it, as the actual document data is within tags. So this adds some overhead to push into a file and pull back out before I can continue parsing. If I did not need to further parse and decode the string, the times might have been closer (but I would always need to at least decode the string even if it were not buried between tags, so I would still have to pull it out to decode, or decode into a 2nd file).

Not sure why "to file" chokes above 40mb. At first I thought it was the server having trouble encoding larger files but then once I switched to outputting to a string, the problem went away.

Hope this helps you or someone else.

Albert

re: Possible to receive and convert > 16 mb file from web server
  Rick Strahl
  Albert Gostick
  Apr 8, 2019 @ 12:51pm

Hmmm... looks like I should be resetting the property along with other properties after each request.

However - it's recommended you use a new wwHttp instance for every request anyway. wwHttp is pretty lightweight and there's no reason not to create a new instance for each stateless Http request.

+++ Rick ---

re: Possible to receive and convert > 16 mb file from web server
  Rick Strahl
  Albert Gostick
  Apr 8, 2019 @ 01:12pm

Also tried playing around with file downloads to see if there's a problem. I can't see why bigger file downloads would fail - files are just buffers that are written out to disk one chunk at a time - VFP's file functions are just a slight step above raw API calls so they should be quick and efficient.

So I did this a few times:

CLEAR
DO WCONNECT

LOCAL loHttp as wwhttp
loHttp = CREATEOBJECT("wwhttp")
loHttp.nHttpWORKBUFFERSIZE = 500000

* 150mb download .NET SDK
lcOutput = loHttp.Get("https://download.visualstudio.microsoft.com/download/pr/4145b8a6-dfd9-4677-9a88-416e546fc30b/95a010d11c01c1013dc3943ced53de74/dotnet-sdk-2.2.202-win-gs-x64.exe", ;
                      "","","c:\temp\dotnet.exe")

? loHttp.nError
? loHttp.cErrorMsg

This works for me without issue. So there might be something else going on on your end with the bigger download failing...

Note that the buffer size also applies to file downloads as each chunk written to disk is based on this buffer size.

I've also fixed the file buffer size to be localized in the HttpGetEx call, meaning the value of nHttpWorkBufferSize is not actually updated in the request. So that should fix the value being set to an unexpected value when you reuse the object.
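
The idea of working with a local value rather than writing back to the property can be sketched like this. It is not the actual HttpGetEx code; the function name and parameters are hypothetical, and it only mirrors the buffer size rule described earlier in this thread (use the content size, capped at the configured maximum).

```foxpro
* Sketch: compute the buffer size for one request without
* changing the configured maximum.
? EffectiveBufferSize(338, 65536)        && 338   (small response)
? EffectiveBufferSize(33000000, 65536)   && 65536 (capped at the maximum)
? EffectiveBufferSize(0, 65536)          && 65536 (content size unknown)

FUNCTION EffectiveBufferSize(tnContentSize, tnMaxBufferSize)
LOCAL lnSize
lnSize = tnMaxBufferSize
IF tnContentSize > 0
   lnSize = tnContentSize
   IF tnMaxBufferSize > 0 AND lnSize > tnMaxBufferSize
      lnSize = tnMaxBufferSize
   ENDIF
ENDIF
RETURN lnSize
ENDFUNC
```

Because the result is returned rather than stored, a small first request no longer shrinks the buffer for later requests on the same object.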

+++ Rick ---

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 8, 2019 @ 01:46pm

Yeah, I will have to ask the web-side guy who customized the web api to see if there is something he is doing that chokes above 40mb. Time taken goes up threefold from 40 to 52 mb when it should only be about 30% longer.

As well, this testing is run against a test server which is older than the production server. By rights they are supposed to have the same configuration. The test server is an older physical server with fixed hard drives whereas the production server is virtualized with really fast SSD-only drives. So things might improve once in production.

Albert

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 9, 2019 @ 10:43am

Hi Rick, Tried your test code against dotnet.exe and download time of 7 seconds. So yeah, download times should be fast. Will wait and see what it is like in production on a fast server and see if the probs persist. Thanks, Albert

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Rick Strahl
  Apr 9, 2019 @ 11:19am

Just downloaded 7.0.4 client tools. I know it is pwd protected but I have not had a version of pkzip for some time since Windows can now extract zip files. But it seems that under Win 10, there is no way to put in the password - at least I cannot find the option.

Do you know if there is a freeware version that will unzip this file and prompt for a password?

Thanks, Albert

re: Possible to receive and convert > 16 mb file from web server
  Tore Bleken
  Albert Gostick
  Apr 9, 2019 @ 12:32pm

I recommend 7-zip.

re: Possible to receive and convert > 16 mb file from web server
  Albert Gostick
  Tore Bleken
  Apr 10, 2019 @ 09:08am

Perfect. Did the trick. Thanks.

© 1996-2024