Support alternative alphabets in BaseXX encodings #145980
Closed
Labels
3.15 (new features, bugs and security fixes), extension-modules (C modules in the Modules dir), stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)
Feature or enhancement
RFC 4648 describes two alphabets for Base64 (standard and URL-safe) and two alphabets for Base32 (standard and hexadecimal). Python also implements three variants of Base85 (Ascii85 is more complex than a plain alphabet swap, but it can be built on top of Base85). A number of other formats are based on BaseXX encodings with alternative alphabets.
So, I suggest adding an alphabet parameter to several binascii functions. They can be used in the implementation of the base64 module or directly by users implementing alternative formats.

We can remove the recently added functions b2a_z85() and a2b_z85(): they are equivalent to b2a_base85() and a2b_base85() with an alternative alphabet. Also, Base64 with alternative alphabets will be more efficient for large data. Incidentally, this also fixes #145968.

For encoding functions we can simply pass a bytes object containing all alphabet characters. Decoding functions need a reverse table that maps each byte value to its index in the alphabet, or to a special invalid value. We can provide a function which creates such a table from the alphabet.
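A minimal pure-Python sketch of such a table-building helper follows. The name make_reverse_table() and the sentinel value INVALID are hypothetical and chosen only for illustration; they are not part of binascii, and a real implementation would most likely live in C.

```python
INVALID = 0xFF  # hypothetical sentinel marking bytes outside the alphabet


def make_reverse_table(alphabet: bytes) -> bytes:
    """Return a 256-entry table mapping each byte value to its alphabet index."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet contains duplicate characters")
    if len(alphabet) >= INVALID:
        raise ValueError("alphabet is too long for a one-byte index")
    # Start with every byte value marked invalid, then fill in the alphabet.
    table = bytearray([INVALID]) * 256
    for index, char in enumerate(alphabet):
        table[char] = index
    return bytes(table)


BASE64_ALPHABET = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)
reverse = make_reverse_table(BASE64_ALPHABET)
```

With this table, reverse[ord("A")] is 0, reverse[ord("/")] is 63, and any byte outside the alphabet (such as "=") maps to INVALID.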
Alternatively, we can create the table automatically from the passed alphabet argument and cache the result. This is a less flexible but more user-friendly interface. It adds some overhead for small inputs, because the cache lookup has to hash the 64- or 85-byte alphabet object, but for large data this is insignificant.
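The caching alternative could look roughly like the sketch below, which uses functools.lru_cache keyed on the alphabet. The function names, the 0xFF sentinel, and the cache size are assumptions made for illustration; the actual design in binascii may differ.

```python
from functools import lru_cache


@lru_cache(maxsize=8)  # hypothetical cache size
def cached_reverse_table(alphabet: bytes) -> bytes:
    """Build the 256-entry reverse table once per distinct alphabet."""
    table = bytearray(b"\xff" * 256)  # 0xFF marks an invalid byte (assumption)
    for index, char in enumerate(alphabet):
        table[char] = index
    return bytes(table)


def decode_indices(data: bytes, alphabet: bytes) -> list[int]:
    """Map each input byte to its alphabet index, rejecting unknown bytes."""
    table = cached_reverse_table(alphabet)  # cache lookup keyed on the alphabet
    result = []
    for byte in data:
        index = table[byte]
        if index == 0xFF:
            raise ValueError(f"byte {byte:#04x} is not in the alphabet")
        result.append(index)
    return result


URLSAFE = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
first = decode_indices(b"AB-_", URLSAFE)
second = decode_indices(b"_-BA", URLSAFE)  # reuses the cached table
```

The caller only ever passes the alphabet itself, at the cost of one cache lookup per call.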