Commit c5ae2254 authored by Brendan Hoar, committed by GitHub

Better handling of embeddings with two rare, but not unusual, files in them

I have encountered pickled embeddings with a short byteorder file at the top-level, as well as a .data/serialization_id file.

Both load fine after allowing these files in the dataset.

I do not think adding them to the safe unpickle regular expression is likely to be a security risk, but that's for the maintainers to decide.
parent c5b75598
@@ -65,7 +65,7 @@ class RestrictedUnpickler(pickle.Unpickler):
 # Regular expression that accepts 'dirname/version', 'dirname/data.pkl', and 'dirname/data/<number>'
-allowed_zip_names_re = re.compile(r"^([^/]+)/((data/\d+)|version|(data\.pkl))$")
+allowed_zip_names_re = re.compile(r"^([^/]+)/((data/\d+)|byteorder|(\.data\/serialization_id)|version|(data\.pkl))$")
 data_pkl_re = re.compile(r"^([^/]+)/data\.pkl$")
 def check_zip_filenames(filename, names):
...
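A minimal sketch of what the change does in practice, for reviewers weighing the security question. It copies the old and new patterns from the diff and runs both against a few member names. The archive directory name `emb`, the sample names, and the helper `accepted` are hypothetical, chosen only for illustration; this does not reproduce `check_zip_filenames` itself.

```python
import re

# Patterns copied from the diff above (old and new side).
old_re = re.compile(r"^([^/]+)/((data/\d+)|version|(data\.pkl))$")
new_re = re.compile(
    r"^([^/]+)/((data/\d+)|byteorder|(\.data\/serialization_id)|version|(data\.pkl))$"
)

# Hypothetical zip member names; "emb" stands in for the archive's top-level directory.
sample_names = [
    "emb/version",
    "emb/data.pkl",
    "emb/data/0",
    "emb/byteorder",
    "emb/.data/serialization_id",
    "emb/evil.py",         # unrelated file: should stay rejected
    "emb/sub/byteorder",   # nested one level deeper: should stay rejected
]


def accepted(pattern, names):
    """Return the names that the given pattern allows, in input order."""
    return [name for name in names if pattern.match(name)]


old_ok = accepted(old_re, sample_names)
new_ok = accepted(new_re, sample_names)

# Names that only the new pattern lets through.
newly_allowed = [name for name in new_ok if name not in old_ok]
print(newly_allowed)
```

Under these assumptions, the only additional names accepted are the two exact paths named in the commit message. Both patterns are anchored with `^` and `$` and allow only a single leading directory component, so a `byteorder` file nested deeper in the archive is still refused.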