Cleaned up code, added multiseed training wrapper, PyTorch profiler training option, updated gradio demo, made changes to research paper to match new changes and new training results from adding new training techniques, architecture.md now explains all designs and decisions