This project, Rattlesnake, will be a compiler, and its associated run-time libraries, which translate Python source code into Rust source code.
The initial target will be to translate an arbitrary subset of Python 3.10 into Nightly Rust, but an eventual goal is complete compatibility with up-to-date Python versions and stable Rust, with as low an MSRV (Minimum Supported Rust Version, the oldest Rust release a project commits to compiling on) as possible.
The purpose of Rattlesnake is to allow Python code to be compiled into a native executable, which can be run faster, more memory-efficiently, and without requiring a Python interpreter. It would also allow rapid development of Rust libraries, by writing a rough skeleton in Python, translating it to Rust, and then optimising the generated Rust code. Lastly, it could also allow Python projects to use Rust libraries, by leaving the Rust calls in place when translating the Python source.
An eventual goal of Rattlesnake is to have full coverage of Python 3, and thus to be able to translate any Python 3 code into Rust. However, this is a very ambitious goal, and so for the time being, the aim is to support a subset which removes some particularly tricky-to-translate language features (some reflection, decorators, extension modules, async, threading, etc.). The early versions of Rattlesnake will be written in Python, using the `ast` built-in module to generate an AST (Abstract Syntax Tree, an easy-to-manipulate representation of code), and then compiled into Rust source code; later versions of Rattlesnake will be built in Rust, using the compiled Python script as a base, and will likely replace `ast` with `pest`[1].
The aim of Rattlesnake is to translate Python 3 code into Rust code, as efficiently as possible and with as little loss of functionality as possible. The generated code should also look as readable as possible.
- [...] `ast` module to parse source code.
- `cargo install rattlesnake` will install the compiler binary.
- [...] `pest`) or recompiling the `ast` module (if using Rattlesnake-compiled `ast`) on each major version.
The product will take Python 3 source code as input and translate it into Rust source code, raising an error if the input contains unsupported features. The output Rust code can be compiled into a native executable, which runs without a Python interpreter. The output code will depend on the Rust library `rattlesnake`, which provides various standard Python types and functions.
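To make the translate-or-reject behaviour concrete, here is a minimal sketch built on Python's standard `ast` module. It is not Rattlesnake's actual design: the names `UnsupportedFeature`, `translate_expr`, `translate_function` and `translate` are hypothetical, the supported subset is deliberately tiny, and typing every value as `i64` is a simplifying assumption made only for this illustration.

```python
import ast


class UnsupportedFeature(Exception):
    """Raised when the input uses a construct this sketch cannot translate."""


def translate_expr(node: ast.expr) -> str:
    """Translate a tiny expression subset: names, int literals, and `+`."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return f"({translate_expr(node.left)} + {translate_expr(node.right)})"
    raise UnsupportedFeature(f"line {node.lineno}: {type(node).__name__}")


def translate_function(node: ast.FunctionDef) -> str:
    """Translate `def f(args): return expr`, typing everything as i64 (assumption)."""
    body = node.body
    if (
        node.decorator_list
        or len(body) != 1
        or not isinstance(body[0], ast.Return)
        or body[0].value is None
    ):
        raise UnsupportedFeature(f"line {node.lineno}: unsupported function shape")
    params = ", ".join(f"{arg.arg}: i64" for arg in node.args.args)
    # The translated expression becomes the Rust function's tail expression.
    return f"fn {node.name}({params}) -> i64 {{\n    {translate_expr(body[0].value)}\n}}"


def translate(source: str) -> str:
    """Translate a module containing only simple function definitions."""
    items = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.FunctionDef):
            raise UnsupportedFeature(f"line {stmt.lineno}: {type(stmt).__name__}")
        items.append(translate_function(stmt))
    return "\n\n".join(items)


print(translate("def foo(bar):\n    return bar + 1\n"))
```

Anything outside the subset, such as an `async def`, raises `UnsupportedFeature` rather than producing incorrect Rust. A real translator would need type inference or a dynamic object type in place of the fixed `i64`.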
There will also be a Python library `rattlesnake`, which will provide various standard Rust types, functions, and macros, and can be used by Python code in order to produce more optimised or efficient Rust code.
Rattlesnake is intended to be a tool for Python developers who want the speed and memory-efficiency benefits of Rust. It also serves as a rapid prototyping tool for Rust projects, and a way to simplify the distribution of Python projects as a single executable.
This section focuses on other Python compilers, and how they differ from Rattlesnake.
Pyrex[2] is a Python-like language which compiles to C as a Python extension module. It was developed by Greg Ewing, and was the predecessor to Cython. It is no longer maintained, and has been superseded by Cython.
It only supported extension module targets, and as a result always depended on the CPython interpreter, rather than producing stand-alone executables.
Cython[3] and Cythonize, first released in 2007, are tools which translate Python or Cython code down to C, which can then be compiled either into a native executable or into a Python extension module. Cython is a modified dialect of Python, which changes some syntax to allow for easier translation for the tool.
While Cython and Cythonize share the same high-level goal as Rattlesnake - to make Python code faster - Cython produces code which is 100% equivalent to how CPython would run that code. This has a couple of downsides:
Rattlesnake, on the other hand, will produce Rust code, which is not guaranteed to be 100% equivalent to how CPython would run the code (although its behaviour will be matched on a best-effort basis). This allows for more aggressive optimisations, and means that the generated code will never require a Python interpreter to be present at runtime.
For more information on Cython, see:
RPython[4] is an AOT-compiled dialect of Python. It was developed for use in PyPy[5], and is used to write the PyPy interpreter itself. It has no formal specification (its official definition[4] is ‘RPython is everything that our translation toolchain can accept’!).
RPython has a number of compile targets, split into the two broad categories of ‘C-like memory model’ and ‘object-oriented memory model’. This allows the compiler to make code tailored for its target, leaning more heavily on allocation and caching for C-like targets, and translating somewhat more directly for OO targets.
PyO3[6] is a project which provides Rust bindings for CPython. This is typically used for creating extension modules, but can also be used to run Python code from Rust at run-time.
PyO3’s goals are fundamentally different to Rattlesnake’s, but as it is in the same area, it is worth mentioning.
RustPython[7] is a Python interpreter written in Rust. It is not a compiler, but rather an alternative to CPython; its major selling point is the ability to embed a Python script into a web page, and run it in the browser with WebAssembly. It can also be used to embed Python scripts into Rust programs, similar to PyO3.
Not to be confused with PyPI, the Python Package Index.
PyPy[5] is a JIT-compiling Python interpreter, which is written in RPython, a restricted subset of Python which was developed alongside PyPy. PyPy’s primary goal is to be a drop-in alternative to CPython, and in most cases is significantly faster at running pure-Python code.
This paper (Behnel et al., 2010)[8] describes the advantages of Python for scientific computing, and presents its problems with performance when performing low-level loops as one of the justifications for Cython. It then describes Cython, first noting that it is a fork of Pyrex, and then describing its features.
Next, it goes on to describe a few common optimisations that Cython will perform by default, and then describes ways to further optimise Cython code using its special syntax.
This paper (Lunnikivi et al., 2020)[9] describes the researchers’ attempt to create a semi-automatic Python-to-Rust transpiler, based on the (now-archived) open-source project Pyrs, which was, according to its README, ‘not aimed at producing ready-to-compile code’.
The paper notes the similarities between Python idioms and Rust idioms that Rattlesnake aims to use, and notes an across-the-board speed-up and memory reduction of up to 12x and up to 4x, respectively, when comparing their compiled Rust code to the original Python code.
The researchers’ method required lots of programmer input during the translation step; according to the paper, ‘[after] syntax conversion, the program is unlikely to immediately compile using the Rust compiler and must be manually edited’.
This paper (Bugden and Alahmar, 2022)[10] gives a general description of the Rust language, its history, and its advantages compared to other languages. Among other things, the paper notes Rust’s memory safety, its commitment to zero-overhead abstractions, and its ecosystem.
It goes on to compare Rust to C, C++, Go, Java, and Python in three benchmarks. For memory usage, Rust outperforms every language except C in all three benchmarks. For speed, it outperforms all of the other languages in two of the three benchmarks, and is beaten only by C and C++ (and only by a margin of 0.04s) in the third.
The paper continues by discussing common memory safety issues, why they are common in C and C++ projects, and how Rust protects against them.
Overall, the paper gives a good description of Rust, and explains in detail the reasons why Rust is a sensible choice for a low-level language - and hence, why it is a good choice as the target language for Rattlesnake.
The software development model I have chosen for Rattlesnake is RAD (Rapid Application Development).
RAD is a methodology which focuses on rapid prototyping and iteration. It is a form of agile development, and is particularly suited to projects where the requirements are not fully known at the start of the project, or may change as the project progresses.
I have chosen RAD primarily because it reduces the overhead of other methodologies, such as the mountain of documentation associated with Scrum, or the strict requirements of Waterfall. It also allows for a more flexible timescale, as each iteration can be as long or as short as necessary.
As I am the primary stakeholder for the project at this point, the extra stakeholder involvement of RAD is not an issue.
The requirement gathering method I have chosen for Rattlesnake is the prototyping method. This is a method which focuses primarily on creating candidate improvements to the solution, and then evaluating them with the stakeholders (in this case, me) to see what works well, and what needs improving or replacing.
This method of requirement gathering can be thought of similarly to the gradient descent method used to train machine learning models: trying out potential changes, and deciding the direction to move based on the feedback generated from those changes.
For the testing process, I intend to run the compiler over CPython’s benchmarking suite; the test results will be based on how much of the suite compiles successfully (both with Rattlesnake, and then with Cargo).
Code which does not compile with Rattlesnake will be penalised less harshly than compiled code which cannot be compiled with Cargo; the former implies unsupported features whereas the latter implies a compiler bug.
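One way this weighting could be expressed is sketched below. The outcome names, the penalty values (1 for a Rattlesnake failure, 3 for a Cargo failure), and the normalisation to a score between 0 and 1 are all hypothetical choices for illustration; the only property taken from the text is that a Cargo failure is penalised more harshly than a Rattlesnake failure.

```python
from enum import Enum


class Outcome(Enum):
    """Result of pushing one benchmark program through the pipeline."""
    OK = "compiled by Rattlesnake and by Cargo"
    RATTLESNAKE_FAILED = "rejected by Rattlesnake (unsupported feature)"
    CARGO_FAILED = "generated Rust rejected by Cargo (compiler bug)"


# Hypothetical weights: a compiler bug costs more than an unsupported feature.
PENALTY = {
    Outcome.OK: 0.0,
    Outcome.RATTLESNAKE_FAILED: 1.0,
    Outcome.CARGO_FAILED: 3.0,
}


def suite_score(outcomes: list[Outcome]) -> float:
    """Return a score in [0, 1]; 1.0 means every program compiled end to end."""
    if not outcomes:
        raise ValueError("no benchmark outcomes to score")
    worst_case = max(PENALTY.values()) * len(outcomes)
    total_penalty = sum(PENALTY[outcome] for outcome in outcomes)
    return 1.0 - total_penalty / worst_case


results = [Outcome.OK, Outcome.RATTLESNAKE_FAILED, Outcome.CARGO_FAILED]
print(f"score: {suite_score(results):.2f}")
```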
For the evaluation process, the compiled artefacts will then be benchmarked and compared to the CPython and PyPy benchmark results.
The run-time libraries for Rattlesnake will be implemented in Python and Rust, by necessity. The first version of the compiler will be implemented in Python, and eventually it will be compiled by itself into Rust, then maintained as a Rust project. While the compiler is written in Python, it will use the `ast` module to parse source code; once it has been compiled into Rust, it is likely that I will swap out the `ast` module for `pest`, rather than simply maintaining the compiled version of `ast`.
Rattlesnake, and all associated documentation, will be versioned using Git. The repository will be hosted on GitHub, and will be publicly available once my dissertation is finished.
Contributes to activity #1, #5.
The run-time library for Rust needs to contain basic definitions for Python objects and functions. It will also need to contain a special type, the boxed `PyObject`, which will be used for Python objects of unknown type.
Contributes to activity #2, #5.
- [...] `@no_except` to indicate that a function cannot raise an exception, or `@derive()` to indicate that a class wants to derive standard Rust traits such as `Debug`, `Clone`, or `Copy`)
- `println!()`
- `panic!()`
- `unreachable!()`
- [...] `MACRO_*`, and any call to a function with a name starting with `MACRO_` will be transformed into a macro call in the output Rust code. They will also be poly-filled at run-time when using a standard Python implementation.
- `Vec<T>`
- `HashMap<K, V>` and `BTreeMap<K, V>`
- `HashSet<T>` and `BTreeSet<T>`
- `Instant` and `Duration`
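As an illustration of how such a library might behave under a standard Python interpreter, the sketch below poly-fills a few of the items listed above. The exact function names (`MACRO_println`, `MACRO_panic`, `MACRO_unreachable`), the string arguments accepted by `derive`, the `__rust_derives__` attribute, and the choice of exceptions are all assumptions made for this example; only the `MACRO_` prefix and the decorator names come from the text.

```python
def MACRO_println(template: str = "", *args) -> None:
    """Stand-in for Rust's println!; only plain `{}` placeholders are handled."""
    print(template.format(*args))


def MACRO_panic(template: str = "explicit panic", *args) -> None:
    """Stand-in for Rust's panic!, modelled here as a RuntimeError (assumption)."""
    raise RuntimeError(template.format(*args))


def MACRO_unreachable() -> None:
    """Stand-in for Rust's unreachable!, modelled here as an AssertionError."""
    raise AssertionError("entered unreachable code")


def no_except(func):
    """Marker decorator: a hint for the translator, with no effect in Python."""
    return func


def derive(*traits: str):
    """Record requested Rust traits on the class; no other run-time effect."""
    def decorate(cls):
        cls.__rust_derives__ = traits  # hypothetical attribute name
        return cls
    return decorate


@no_except
def add_one(x: int) -> int:
    return x + 1


@derive("Debug", "Clone")
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


MACRO_println("{} + 1 = {}", 1, add_one(1))  # prints "1 + 1 = 2"
```

Because the decorators return their targets unchanged, the same source file runs normally under CPython while still carrying the hints a translator could read.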
Contributes to activity #3, #4, #5.
The Rattlesnake repository will be structured as follows:
- `compiler`: The Rattlesnake compiler code, and the Rust run-time library.
- `pylib`: The Python run-time library.
- `prose`: All prose documents, including this one, associated with the dissertation project, as Markdown documents.
- `meeting-notes`: Notes from meetings with my project supervisor, as Markdown documents.

The deliverables associated with the Rattlesnake project are as follows:
AST (Abstract Syntax Tree) (n.): A representation of source code which is easy to manipulate. It is often used as an intermediate representation of code in compilers, as it represents the ideas written in the code without complicating details such as whitespace.
For example, the following Python code:
```python
def foo(bar):
    return bar + 1
```

Could be represented with the following AST (in Lisp syntax):

```
(function-definition
  (name "foo")
  (parameters
    (name "bar"))
  (body
    (return-statement
      (+ (name "bar")
         (constant 1)))))
```
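For comparison, the same function can be inspected with Python's built-in `ast` module. This is a small sketch rather than part of Rattlesnake itself; the node names (`FunctionDef`, `Return`, `BinOp`, and so on) are those of the standard library, not of the Lisp-style notation above.

```python
import ast

source = "def foo(bar):\n    return bar + 1\n"
tree = ast.parse(source)

# Pretty-print the whole tree; the exact layout varies between Python versions.
print(ast.dump(tree, indent=4))

# Walk down to the pieces named in the Lisp-style tree above.
func = tree.body[0]            # the function definition
ret = func.body[0]             # the return statement
print(func.name)               # function name
print(func.args.args[0].arg)   # first parameter name
print(type(ret.value.op).__name__, ret.value.left.id, ret.value.right.value)
```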
`cargo` (n.): The standard Rust package manager and build tool.

[...] `cargo` tool can then be used to install these crates, either globally (installing their associated binaries) or locally (as a dependency of the current project).

[...] `pip` tool, and all related tools, to install Python packages.

1. dragostis, jstnlef, CAD97, tomtau, flying-sheep, et al. (2023) pest. The Elegant Parser. Available at: https://pest.rs/ (Accessed: 18/10/2023).
2. Ewing, G. (2010) Pyrex. Available at: https://www.csse.canterbury.ac.nz/greg.ewing/python/Pyrex/ (Accessed: 11/10/2023).
3. Behnel S., Bradshaw R., Woods D., Valo M., Dalcín L., et al. (2023) Cython: C-Extensions for Python. Available at: https://cython.org/ (Accessed: 11/10/2023).
4. The PyPy Project (2022) RPython Language. Available at: https://rpython.readthedocs.io/en/latest/rpython.html (Accessed: 11/10/2023).
5. The PyPy Team (2023) PyPy. Available at: https://www.pypy.org/ (Accessed: 11/10/2023).
6. PyO3 Developers (2023) PyO3: Rust bindings for the Python interpreter. Available at: https://github.com/PyO3/pyo3 (Accessed: 18/10/2023).
7. windelbouwman, coolreader18, palaviv, cthulahoops, youknowone, OddCoincidence, OddBlock, skinny121, rmliddle, jgirardet, et al. (2023) RustPython. Available at: https://rustpython.github.io/ (Accessed: 18/10/2023).
8. Behnel S., Bradshaw R., Citro C., Dalcin L., Seljebotn D.S., Smith K. (2010) Cython: The Best of Both Worlds. Computing in Science & Engineering, 13(2), pp. 31-39. doi: 10.1109/MCSE.2010.118
9. Lunnikivi H., Jylkkä K., Hämäläinen T. (2020) Transpiling Python to Rust for Optimized Performance. Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 127-138. doi: 10.1007/978-3-030-60939-9_9
10. Bugden W., Alahmar A. (2022) Rust: The Programming Language for Safety and Performance. arXiv:2206.05503. doi: 10.48550/arXiv.2206.05503