Reading single line at a time from a very large file

Questions about GP commands and how to do things

Moderator: MSandro

SimpleSi
Posts: 330
Joined: Jul 2nd, '17, 13:47

Reading single line at a time from a very large file

Post by SimpleSi » Jul 30th, '17, 21:06

I'd like to process a .osm file (the whole of the UK), which is VERY large (19GB).
Is there a readline block somewhere?

JohnM
Posts: 379
Joined: Sep 11th, '15, 14:42

Re: Reading single line at a time from a very large file

Post by JohnM » Aug 3rd, '17, 22:33

Wow, that's a big file! You'd definitely want to process it one line at a time.

There is not currently a way to read one line (or one character) at a time from a file, but I agree a "general purpose" language should have a way to do that. I'll put this on the wish list.

JohnM

Re: Reading single line at a time from a very large file

Post by JohnM » Aug 9th, '17, 13:46

I've added some primitives to read "filestreams" a byte or a line at a time. There is currently no random access (a.k.a. "seek") functionality; you have to read the file sequentially. I'm not sure if that's good enough for what you have in mind. A possible application would be to extract a subset of the UK street map (e.g. just your city or region) small enough for GP to keep the entire subset in memory, allowing fast searching and access.
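As a rough illustration of the extraction idea above, here is a minimal sketch in Python rather than GP, since the GP primitive names are not given in this thread. It reads the file sequentially, one line at a time, and keeps only node lines inside a bounding box. The helper names (`in_box`, `extract_region`) are hypothetical, and it assumes each OSM `<node ...>` element sits on a single line with `lat` and `lon` attributes.

```python
import re

# Assumption: node lines look like <node id="1" lat="51.5" lon="-0.1"/>
LAT = re.compile(r'\blat="(-?[\d.]+)"')
LON = re.compile(r'\blon="(-?[\d.]+)"')


def in_box(line, south, west, north, east):
    """Return True if the line carries lat/lon attributes inside the box."""
    lat = LAT.search(line)
    lon = LON.search(line)
    if lat is None or lon is None:
        return False
    return (south <= float(lat.group(1)) <= north
            and west <= float(lon.group(1)) <= east)


def extract_region(src_path, dst_path, south, west, north, east):
    """Copy matching node lines to dst_path; return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Iterating over the file object reads one line at a time,
        # so memory use stays small no matter how large the file is.
        for line in src:
            if "<node" in line and in_box(line, south, west, north, east):
                dst.write(line)
                kept += 1
    return kept
```

Only one line is held in memory at a time, so the size of the input does not matter; the output file is the small subset that can then be loaded whole.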

Keep in mind that 19GB is a lot of data. Even a simple line filtering operation on the file could take a few hours. Of course, you might not need to process the entire file often if all you're trying to do is to extract a small part of the street map for further processing.

These primitives will be in v74, which I'm about to release. I'll be curious to hear what you do with them.

SimpleSi

Re: Reading single line at a time from a very large file

Post by SimpleSi » Aug 14th, '17, 22:06

Readline will be fine for what I wanted to do. (I found an old dataset I had of the coastline of the UK which was readable in one chunk, so I managed to use that.)


But generally, reading one byte or one line at a time is always needed to read any large file in any OS/language. Seek would be needed for large database programs (I wrote one on a BBC Micro with 5.25" floppies in the old days :) )
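To show why seek matters for database-style programs, here is a small Python sketch (not GP) of random access to fixed-length records. The record size and the function names `write_records` and `read_record` are assumptions made for illustration only.

```python
RECORD_SIZE = 32  # assumed fixed record length in bytes


def write_records(path, records):
    """Write each string as a space-padded, fixed-length ASCII record."""
    with open(path, "wb") as f:
        for text in records:
            data = text.encode("ascii")
            if len(data) > RECORD_SIZE:
                raise ValueError("record too long")
            f.write(data.ljust(RECORD_SIZE, b" "))


def read_record(path, index):
    """Jump straight to record number `index` without reading earlier ones."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)  # byte offset of the wanted record
        data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:
        raise IndexError("record index out of range")
    return data.rstrip(b" ").decode("ascii")
```

Because every record has the same length, the position of record `n` is simply `n * RECORD_SIZE`, so a lookup costs one seek and one read however large the file is. With sequential access only, the same lookup would mean reading everything before it.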

SimpleSi

Re: Reading single line at a time from a very large file

Post by SimpleSi » Aug 14th, '17, 22:32

Could you give a hint as to where these primitives are, please? :)

SimpleSi

Re: Reading single line at a time from a very large file

Post by SimpleSi » Aug 15th, '17, 17:54

Found the primitives, but I'm not having much luck using them.

I assumed that I needed to use a variable to hold a file handle, but that doesn't seem to work.
[Attachment: mp.PNG (3.28 KiB)]

SimpleSi

Re: Reading single line at a time from a very large file

Post by SimpleSi » Sep 1st, '17, 20:59

Got my head around these now :)
I've put them in this class so I can just import them into projects:
[Attachment: filestreamblocks.gp (426 bytes)]
Could do with an EOF boolean as well, please :)
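For reference, one common way to get an end-of-file boolean on top of a plain read-line call is to read one line ahead. The sketch below is Python, not GP, and the class and method names (`LineStream`, `at_eof`, `read_line`) are hypothetical; it only illustrates the idea and says nothing about how GP's filestream primitives behave.

```python
class LineStream:
    """Sequential line reader that can report EOF before the next read."""

    def __init__(self, path):
        self._file = open(path, encoding="utf-8")
        # Read one line ahead so at_eof() can answer without consuming data.
        self._next = self._file.readline()

    def at_eof(self):
        # readline() returns "" only at end of file; a blank line is "\n".
        return self._next == ""

    def read_line(self):
        """Return the next line without its trailing newline."""
        line = self._next
        self._next = self._file.readline()
        return line.rstrip("\n")

    def close(self):
        self._file.close()
```

A typical loop is then `while not stream.at_eof(): line = stream.read_line()`, which handles blank lines in the middle of the file correctly because they are not mistaken for the end.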
