Stream: Google Code-in

Topic: python script


Jeff Sieu (Dec 10 2017 at 05:41):

For the task that requires a Python script to download GCI data, does the data have to include task submissions like in the example link?

Jeff Sieu (Dec 11 2017 at 03:43):

@Sean Hey Sean, I have finished the knight, but I'm still on the Python script task and I have no clue how to download task submissions with the API. I've read the API docs, and it only returns JSON with info about tasks and task instances, not the files submitted to each task instance.

Jeff Sieu (Dec 11 2017 at 03:43):

Any ideas?

Sean (Dec 11 2017 at 06:21):

For the task that requires a Python script to download GCI data, does the data have to include task submissions like in the example link?

Unless there's strictly no way to get that data -- but yes, the entire point of that task is to download all of the files that have been uploaded.

Jeff Sieu (Dec 11 2017 at 06:23):

Can't seem to find a way to download submission data through the API though, so I've submitted my knight task first.

Sean (Dec 11 2017 at 06:39):

Any ideas?

There was support/contact information for the API, so feel free to contact them on how to get at the files submitted.

Jeff Sieu (Dec 11 2017 at 07:13):

So, I've just contacted the GCI API support, and it turns out the API does not support this.

Sean (Dec 11 2017 at 07:13):

Did Robert give any hints?

Sean (Dec 11 2017 at 07:14):

Is there perhaps a link to the GCI site in the API, maybe a sub-URL?

Sean (Dec 11 2017 at 07:14):

where the links could be scraped?

Sean (Dec 11 2017 at 07:14):

or the whole page for that matter, wget style

Jeff Sieu (Dec 11 2017 at 07:27):

From the past submissions, it does seem like the HTML page was scraped.
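For reference, pulling links out of a fetched HTML page needs only the standard library; a minimal sketch (the sample markup below is invented for illustration, not actual GCI page structure):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag encountered while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<p><a href="/files/a.zip">a</a> <a href="/files/b.zip">b</a></p>')
print(extractor.links)  # → ['/files/a.zip', '/files/b.zip']
```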

Jeff Sieu (Dec 11 2017 at 07:27):

Not much, I only got "The API does not support this.

(And it is highly unlikely to support it this year.)

-R

"

Sean (Dec 11 2017 at 07:28):

the past ones were scraped, but that was a completely different system

Sean (Dec 11 2017 at 07:29):

can you link me to the task description?

Sean (Dec 11 2017 at 07:30):

note that the task was broken into two parts -- first part is the basic instance info

Sean (Dec 11 2017 at 07:30):

second task was to download the data files (somehow)

Sean (Dec 11 2017 at 07:30):

https://codein.withgoogle.com/dashboard/tasks/4537825565343744/

Sean (Dec 11 2017 at 07:32):

I just updated the description

Sean (Dec 11 2017 at 07:33):

@Jeff Sieu so yeah, I see there is task_instance_url so that could be scraped using urllib or wget: https://stackoverflow.com/questions/24346872/python-equivalent-of-a-given-wget-command
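A sketch of that wget-style approach with urllib: fetch one URL (e.g. a task_instance_url) and save it to disk. The function and directory names here are illustrative, not part of the GCI API:

```python
import urllib.request
from pathlib import Path

def filename_from_url(url):
    """Derive a local file name from the last path segment of a URL."""
    return url.rstrip("/").rsplit("/", 1)[-1] or "index.html"

def download(url, dest_dir="downloads"):
    """Fetch a single URL and save its raw bytes under dest_dir."""
    Path(dest_dir).mkdir(exist_ok=True)
    dest = Path(dest_dir) / filename_from_url(url)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```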

Jeff Sieu (Dec 17 2017 at 12:41):

@Sean, seems like the error is probably related to proxies though I've no idea about the specifics

Jeff Sieu (Dec 17 2017 at 12:47):

Since the error is due to max retries exceeded (the default is 0), maybe we can try setting max_retries to a high number (say 200) like this:
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# mount() takes a URL prefix; requests made through s to matching URLs
# will use this adapter and retry failed connections up to 200 times
s.mount(url, HTTPAdapter(max_retries=200))

Jeff Sieu (Dec 17 2017 at 12:48):

Referencing https://stackoverflow.com/questions/15431044/can-i-set-max-retries-for-requests-request
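A variant of that idea, sketched here under the same requests setup: instead of a flat retry count, urllib3's Retry class (which requests uses internally) supports exponential backoff and retrying on specific HTTP status codes. The prefix and numbers below are illustrative:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 5 times with exponential backoff (0.5s, 1s, 2s, ...);
# status_forcelist also retries transient server errors, not just
# connection failures.
retry = Retry(total=5, backoff_factor=0.5,
              status_forcelist=[500, 502, 503, 504])

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))
```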


Last updated: Oct 09 2024 at 00:44 UTC