Is inference feasible on a single GPU? If so, how much GPU memory is needed? (16 GB apparently isn't enough.) Alternatively, is there a distilled or lighter-weight version of the model?
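For anyone trying to answer this for their own card: a rough lower bound on inference VRAM is just parameter count times bytes per parameter (fp32 = 4, fp16/bf16 = 2, int8 = 1), since the weights usually dominate; the KV cache and activations add overhead on top. A minimal sketch, with the 13B parameter count below chosen purely as a hypothetical example:

```python
def estimate_vram_gb(num_params: float, bytes_per_param: float = 2) -> float:
    """Estimate GPU memory (GB) just to hold the model weights.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit.
    This is a lower bound; KV cache and activations need additional memory.
    """
    return num_params * bytes_per_param / 1e9

# Hypothetical 13B-parameter model:
print(estimate_vram_gb(13e9, 2))  # fp16 weights alone: 26.0 GB, over 16 GB
print(estimate_vram_gb(13e9, 1))  # int8-quantized: 13.0 GB, may fit in 16 GB
```

By this estimate, quantization (int8 or 4-bit) or a smaller distilled checkpoint is usually what makes a model fit on a 16 GB card when the fp16 weights alone exceed it.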