
Reading single line at a time from a very large file

Posted: Jul 30th, '17, 21:06
by SimpleSi
I'd like to process a .osm file (the whole of the UK), which is VERY large (19GB).
Is there a readline block somewhere?

Re: Reading single line at a time from a very large file

Posted: Aug 3rd, '17, 22:33
by JohnM
Wow, that's a big file! You'd definitely want to process it one line at a time.

There is not currently a way to read one line (or one character) at a time from a file, but I agree a "general purpose" language should have a way to do that. I'll put this on the wish list.

Re: Reading single line at a time from a very large file

Posted: Aug 9th, '17, 13:46
by JohnM
I've added some primitives to read "filestreams" a byte or a line at a time. There is currently no random access (a.k.a. "seek") functionality; you have to read the file sequentially. I'm not sure if that's good enough for what you have in mind. A possible application would be to extract a subset of the UK street map (e.g. just your city or region) small enough for GP to keep the entire subset in memory, allowing fast searching and access.

Keep in mind that 19GB is a lot of data. Even a simple line filtering operation on the file could take a few hours. Of course, you might not need to process the entire file often if all you're trying to do is to extract a small part of the street map for further processing.

These primitives will be in v74, which I'm about to release. I'll be curious to hear what you do with them.
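
For readers who want to see the sequential line-filtering idea from this post in code, here is a minimal Python sketch (not GP code, and not the v74 primitives themselves). It reads one line at a time and copies only the matching lines to a smaller file. The name `filter_lines` and the substring test in the demo are assumptions made for illustration; a real .osm extract would need proper XML handling and a bounding-box check.

```python
import os
import tempfile


def filter_lines(src_path, dst_path, keep):
    """Copy the lines of src_path for which keep(line) is true into dst_path.

    Only one line is held in memory at a time, so the input can be far
    larger than available RAM. Returns the number of lines written.
    """
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # the file object yields one line per iteration
            if keep(line):
                dst.write(line)
                kept += 1
    return kept


if __name__ == "__main__":
    # Tiny made-up stand-in for a huge file, just to exercise the function.
    folder = tempfile.mkdtemp()
    big = os.path.join(folder, "big.txt")
    small = os.path.join(folder, "small.txt")
    with open(big, "w", encoding="utf-8") as f:
        f.write("node A\nway B\nnode C\n")
    print(filter_lines(big, small, lambda line: "node" in line))
```

The same loop works unchanged on a multi-gigabyte file; only the running time grows.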

Re: Reading single line at a time from a very large file

Posted: Aug 14th, '17, 22:06
by SimpleSi
Readline will be fine for what I wanted to do. (I found an old dataset I had of the UK coastline that was readable in one chunk, so I managed to use that.)


But generally, reading one byte or one line at a time is always needed to read any large file in any OS/language; seek would be needed for large database programs (I wrote one on a BBC Micro with 5.25" floppies in the old days :) )
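
To illustrate why seek matters for database-style programs, here is a small Python sketch (not GP code) that stores fixed-length records and jumps directly to record N without reading the ones before it. The record size and the helper names `write_records` and `read_record` are assumptions chosen for the example.

```python
import os
import tempfile

RECORD_SIZE = 16  # bytes per record; an arbitrary size chosen for this example


def write_records(path, records):
    """Write each string as a fixed-width, space-padded ASCII record."""
    with open(path, "wb") as f:
        for text in records:
            data = text.encode("ascii")
            if len(data) > RECORD_SIZE:
                raise ValueError("record too long")
            f.write(data.ljust(RECORD_SIZE))  # pad with spaces to full width


def read_record(path, index):
    """Jump straight to record number `index` (0-based) and return its text."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)  # random access: skip earlier records
        data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:
        raise IndexError("no such record")
    return data.decode("ascii").rstrip()


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "records.bin")
    write_records(db, ["alpha", "beta", "gamma"])
    print(read_record(db, 2))
```

Because every record has the same width, the byte offset of record N is simply N times the record size, which is what makes the lookup independent of file size.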

Re: Reading single line at a time from a very large file

Posted: Aug 14th, '17, 22:32
by SimpleSi
Could you give a hint as to where these primitives are, please? :)

Re: Reading single line at a time from a very large file

Posted: Aug 15th, '17, 17:54
by SimpleSi
Found the primitives but not having much luck in using them

I assumed that I needed to use a variable to hold a file handle, but that doesn't seem to work.
mp.PNG (3.28 KiB)

Re: Reading single line at a time from a very large file

Posted: Sep 1st, '17, 20:59
by SimpleSi
Got my head around these now :)
I've put them in this class so I can just import them into projects
filestreamblocks.gp (426 Bytes)
Could do with an EOF boolean as well, please :)
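
As a rough idea of what an EOF boolean could look like, here is a Python sketch (not GP code, and not a description of how GP's filestream primitives behave). It wraps a text stream and reads one line ahead so that the end-of-file test can be asked before each read. The class name `LineReader` and its methods `at_end` and `read_line` are hypothetical.

```python
import io


class LineReader:
    """Wrap a text stream and expose an explicit end-of-file test."""

    def __init__(self, stream):
        self._stream = stream
        self._next = stream.readline()  # read one line ahead

    def at_end(self):
        """Return True once every line has been handed out."""
        # readline() returns "" only at end of file; a blank line is "\n".
        return self._next == ""

    def read_line(self):
        """Return the next line without its trailing newline."""
        if self.at_end():
            raise EOFError("no more lines")
        line = self._next
        self._next = self._stream.readline()
        return line.rstrip("\n")


if __name__ == "__main__":
    reader = LineReader(io.StringIO("first\n\nlast\n"))
    while not reader.at_end():
        print(repr(reader.read_line()))
```

Reading ahead by one line is what lets blank lines in the middle of the file be told apart from the real end of the file.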