API Reference
Complete reference for all functions and classes in the project.
Data Preprocessing Module
utils.py
String Functions
| Function | Signature | Description |
|---|---|---|
| human_format | (num: float) → str | Format number with K/M/B suffix |
| hamming | (s1: str, s2: str) → int | Hamming distance between strings |
| revcomp | (str: str) → str | Reverse complement of DNA |
| get_qualities | (str: str) → List[int] | ASCII quality to Phred scores |
| contains_Esp3I_site | (str: str) → bool | Check for restriction site |
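The signatures above leave the exact behavior open. As a rough illustration only, the sketch below shows one plausible reading of hamming, revcomp, and get_qualities. The `_sketch` names are hypothetical, and both the equal-length requirement and the Phred+33 offset are assumptions rather than confirmed details of utils.py.

```python
# Illustrative sketches only; the project's utils.py may behave differently.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def hamming_sketch(s1: str, s2: str) -> int:
    """Count mismatching positions (assumes equal-length inputs)."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(s1, s2))


def revcomp_sketch(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def qualities_sketch(qual: str) -> list:
    """Decode a FASTQ quality string, assuming the Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual]


print(hamming_sketch("ACGT", "ACGA"))  # 1
print(revcomp_sketch("AACG"))          # CGTT
print(qualities_sketch("I#"))          # [40, 2]
```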
File I/O
| Function | Signature | Description |
|---|---|---|
| tqdm_readline | (file, pbar) → str | Read line with progress update |
| process_paired_fastq_file | (f1, f2, callback) → int | Process paired FASTQ files |
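To make the callback pattern concrete, here is a minimal sketch of paired FASTQ iteration. It assumes four-line FASTQ records, a callback that receives (r1, r2, r1_q, r2_q) as in the callbacks documented elsewhere in this reference, and a return value equal to the number of read pairs. The name `process_pairs_sketch` is hypothetical, and progress reporting is omitted.

```python
import io


def process_pairs_sketch(f1, f2, callback):
    """Call callback(r1, r2, r1_q, r2_q) per read pair; return the pair count."""
    count = 0
    while True:
        # A FASTQ record is four lines: header, sequence, separator, qualities.
        rec1 = [f1.readline().rstrip("\n") for _ in range(4)]
        rec2 = [f2.readline().rstrip("\n") for _ in range(4)]
        if not rec1[0] or not rec2[0]:  # either file is exhausted
            break
        callback(rec1[1], rec2[1], rec1[3], rec2[3])
        count += 1
    return count


# In-memory files stand in for real FASTQ files in this demo.
demo_r1 = io.StringIO("@read1\nACGT\n+\nIIII\n")
demo_r2 = io.StringIO("@read1\nTTGA\n+\n####\n")
demo_seen = []
demo_count = process_pairs_sketch(demo_r1, demo_r2, lambda *args: demo_seen.append(args))
print(demo_count, demo_seen)
```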
Sequence Features
| Function | Signature | Description |
|---|---|---|
| add_flanking | (nts: str, len: int) → str | Add flanking sequences |
| add_barcode_flanking | (nts: str, len: int) → str | Add barcode flanking |
| nts_to_vector | (nts: str, rna=False) → ndarray | One-hot encode sequence |
| folding_to_vector | (nts: str) → ndarray | One-hot encode structure |
| str_to_vector | (str: str, template: str) → ndarray | Generic one-hot encoding |
| ei_vec | (i: int, len: int) → List[int] | Create one-hot vector |
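One-hot encoding against a template alphabet, as str_to_vector suggests, can be sketched as follows. This is an assumption-laden illustration: the name `one_hot_sketch` is hypothetical, nested lists stand in for the ndarray the real functions return, and mapping unknown symbols to an all-zero row is a guess.

```python
def one_hot_sketch(seq: str, template: str = "ACGT") -> list:
    """Return one row per character of seq, one column per template symbol."""
    index = {symbol: i for i, symbol in enumerate(template)}
    rows = []
    for ch in seq:
        row = [0] * len(template)
        if ch in index:  # unknown symbols (e.g. N) stay all-zero (assumption)
            row[index[ch]] = 1
        rows.append(row)
    return rows


print(one_hot_sketch("ACN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

The same helper covers dot-bracket structures by passing a different template, for example `one_hot_sketch("(.)", template="(.)")`.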
RNA Structure
| Function | Signature | Description |
|---|---|---|
| rna_fold_structs | (seqs, maxBPspan=0) → (structs, mfes) | Predict structures |
| compute_structure | (seqs) → (struct_oh, structs, mfes) | One-hot encoded structures |
| compute_seq_oh | (seqs) → ndarray | One-hot encode sequences |
| compute_wobbles | (seqs, structs) → ndarray | Identify wobble pairs |
| create_input_data | (seqs) → (seq_oh, struct_oh, wobbles) | Complete feature extraction |
Structure Analysis
| Function | Signature | Description |
|---|---|---|
| find_parentheses | (s: str) → Dict[int, int] | Map base pair positions |
| compute_bijection | (s: str) → ndarray | Pairing array |
| compute_wobble_indicator | (seq, struct) → List[int] | Wobble pair flags |
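Mapping base pairs in a dot-bracket string is a standard stack exercise. The sketch below assumes the dictionary maps each opening position to its closing partner and that a wobble pair means G paired with U (or T in DNA notation). Both `pair_map_sketch` and `wobble_flags_sketch` are hypothetical names, not the project's functions.

```python
def pair_map_sketch(structure: str) -> dict:
    """Map each '(' index to the index of its matching ')'."""
    stack = []
    pairs = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def wobble_flags_sketch(seq: str, structure: str) -> list:
    """Flag both positions of every G-U (or G-T) pair with 1."""
    flags = [0] * len(seq)
    for i, j in pair_map_sketch(structure).items():
        if {seq[i], seq[j]} in ({"G", "U"}, {"G", "T"}):
            flags[i] = flags[j] = 1
    return flags


print(pair_map_sketch("((..))"))                # {1: 4, 0: 5}
print(wobble_flags_sketch("GCAAGU", "((..))"))  # [1, 0, 0, 0, 0, 1]
```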
RNAutils.py
| Function | Signature | Description |
|---|---|---|
| RNAfold | (seqs, bin, temp, span, cmd) → List[[str, float]] | MFE structure prediction |
| RNAsubopt | (seq, bin, delta) → List[(str, float)] | Suboptimal structures |
| RNAsample | (seqs, bin, temp, n, span) → List[List[str]] | Boltzmann sampling |
| RNA_partition_function | (seqs, constraints, ...) → List[float] | Partition function |
compute_coupling.py
| Function | Signature | Description |
|---|---|---|
| collect_barcodes | (r1, r2, r1_q, r2_q) → None | Extract barcode-exon pairs |
Global variables: couplings, good_reads, reads_with_N, unidentified_reads
compute_splicing_outcomes.py
| Function | Signature | Description |
|---|---|---|
| identify_splicing_pattern | (r1, r2, r1_q, r2_q) → None | Classify splicing |
Splicing categories: num_exon_inclusion, num_exon_skipping, num_intron_retention, num_splicing_in_exon, num_unknown_splicing
generate_training_data.py
| Function | Signature | Description |
|---|---|---|
| read_dataset | (path, filter=True) → DataFrame | Load filtered CSV |
| to_input_data | (df, flanking=10) → tuple | Create model inputs |
| to_target_data | (df) → ndarray | Compute PSI values |
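PSI (percent spliced in) is conventionally the fraction of reads that support exon inclusion. The sketch below assumes only inclusion and skipping counts enter the ratio; how to_target_data treats the other splicing categories is not stated here, and `psi_sketch` is a hypothetical name.

```python
def psi_sketch(num_inclusion: int, num_skipping: int) -> float:
    """PSI = inclusion / (inclusion + skipping); NaN when there are no reads."""
    total = num_inclusion + num_skipping
    if total == 0:
        return float("nan")
    return num_inclusion / total


print(psi_sketch(30, 10))  # 0.75
```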
Model Training Module
model.py
Custom Layers
| Class | Purpose | Key Parameters |
|---|---|---|
| Selector | Select between inputs | trainable=False |
| ResidualTuner | Residual MLP | hidden_units=100 |
| SumDiff | Energy difference | freeze=False |
| RegularizedBiasLayer | Position bias | Regularization params |
Regularizers
| Class/Function | Purpose |
|---|---|
| MultiRegularizer | Combined regularizer |
| pos_reg | L2 position penalty |
| adj_reg_fo | First-order smoothness |
| adj_reg_so | Second-order smoothness |
Functions
| Function | Signature | Description |
|---|---|---|
| binary_KL | (y_true, y_pred) → scalar | Binary KL divergence loss |
| regularized_act | (x, reg, act) → tensor | Activation with regularization |
| train_model | (model, X, y, file, ...) → history | Train with checkpointing |
| get_model | (**kwargs) → Model | Create model instance |
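For orientation, the binary KL divergence between a target probability y and a prediction p is y·log(y/p) + (1−y)·log((1−y)/(1−p)). The scalar sketch below illustrates that formula only. The clipping constant `eps` and the name `binary_kl_sketch` are assumptions, and the project's binary_KL operates on tensors and may reduce over a batch differently.

```python
import math


def binary_kl_sketch(y_true: float, y_pred: float, eps: float = 1e-7) -> float:
    """KL divergence between Bernoulli(y_true) and Bernoulli(y_pred)."""
    # Clip both probabilities away from 0 and 1 so the logs stay finite.
    y = min(max(y_true, eps), 1.0 - eps)
    p = min(max(y_pred, eps), 1.0 - eps)
    return y * math.log(y / p) + (1.0 - y) * math.log((1.0 - y) / (1.0 - p))


print(binary_kl_sketch(0.5, 0.5))  # 0.0
print(binary_kl_sketch(0.9, 0.5))  # positive: prediction differs from target
```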
get_model() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_length | int | 90 | Sequence length |
| randomized_region | tuple | (10, 80) | Exon position |
| num_filters | int | 20 | Sequence filters |
| num_structure_filters | int | 8 | Structure filters |
| filter_width | int | 6 | Sequence filter size |
| structure_filter_width | int | 30 | Structure filter size |
| dropout_rate | float | 0.01 | Dropout probability |
| activity_regularization | float | 0.0 | Activation L1 |
| position_regularization | float | 2.5e-5 | Position L2 |
| adjacency_regularization | float | 0.0 | First-order smoothness |
| adjacency_regularization_so | float | 0.0 | Second-order smoothness |
| energy_activation | str | "softplus" | Energy activation |
| tune_energy | bool | True | Train energy params |
Figures Module
force_plot.py
| Function | Signature | Description |
|---|---|---|
| get_link_midpoint | (fn, mid, eps, ...) → float | Find sigmoid midpoint |
| collapse_filters | (act_i, act_s, ...) → (df_i, df_s) | Group filter activations |
| create_force_data | (act_i, act_s, ...) → (Series, Series) | Aggregate forces |
| merge_small_forces | (forces, thresh) → Series | Combine small contributions |
| draw_force_plot | (seqs, annots, ...) → Figure | Create visualization |
sequence_logo.py
| Function | Signature | Description |
|---|---|---|
| plot_logo | (df, thresh, ax, colors) → None | Draw sequence logo |
| compute_freqs | (kmers) → DataFrame | Nucleotide frequencies |
| compute_info | (freqs) → ndarray | Information content |
| compute_heights | (freqs) → DataFrame | Logo heights |
| sequence_logo_heights | (df) → DataFrame | Combined calculation |
| draw_floating_logo | (heights, ..., ax) → None | Overlay logo on axes |
| compute_EDLogo_scores | (kmers, normed) → DataFrame | Enrichment/depletion |
| plot_EDLogo | (df, thresh, ax) → None | Draw ED logo |
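The frequency and information-content steps of a classic sequence logo can be sketched as below. This assumes the standard definition (log2 of the alphabet size minus the Shannon entropy per position) with no small-sample correction, which may differ from what compute_info does. The `_sketch` names are hypothetical, and lists of dicts stand in for the DataFrame.

```python
import math
from collections import Counter


def freqs_sketch(kmers, alphabet="ACGT"):
    """Per-position symbol frequencies for equal-length k-mers."""
    table = []
    for pos in range(len(kmers[0])):
        counts = Counter(kmer[pos] for kmer in kmers)
        table.append({nt: counts[nt] / len(kmers) for nt in alphabet})
    return table


def info_sketch(freqs):
    """Information content in bits: log2(alphabet size) minus entropy."""
    info = []
    for column in freqs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        info.append(math.log2(len(column)) - entropy)
    return info


demo_freqs = freqs_sketch(["AC", "AG", "AT", "AA"])
print(info_sketch(demo_freqs))  # [2.0, 0.0]
```

In the classic logo, letter heights are each frequency multiplied by the information content of its position.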
draw_stem_loop.py
| Function | Signature | Description |
|---|---|---|
| draw_line | (d, x1, y1, x2, y2, color) → None | SVG line |
| draw_nucleotide | (d, x, y, nt, color) → None | SVG nucleotide circle |
| draw_oligo | (d, xs, ys, nts, colors) → None | SVG oligonucleotide |
| draw_stem_loop | (nts, stem_len, colors, file) → None | Complete stem-loop SVG |
kl.py
| Function | Signature | Description |
|---|---|---|
| knn_distance | (point, sample, k) → float | k-NN distance |
| verify_sample_shapes | (s1, s2, k) → None | Validate input shapes |
| naive_estimator | (s1, s2, k) → float | Brute-force KL estimate |
| scipy_estimator | (s1, s2, k) → float | KDTree-based KL |
| skl_estimator | (s1, s2, k) → float | sklearn-based KL |
| skl_estimator_efficient | (s1, s2, k) → float | Vectorized KL |
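A brute-force k-nearest-neighbour KL estimate can be sketched as follows. It assumes the commonly used estimator (d/n)·Σ log(ν_k(x_i)/ρ_k(x_i)) + log(m/(n−1)), where ρ_k is the distance from x_i to its k-th neighbour within s1 (excluding itself) and ν_k is the distance to its k-th neighbour in s2. Whether kl.py uses exactly this form is not confirmed here, the `_sketch` names are hypothetical, and the sketch assumes no duplicate points.

```python
import math


def knn_distance_sketch(point, sample, k):
    """Euclidean distance from point to its k-th nearest neighbour in sample."""
    dists = sorted(math.dist(point, other) for other in sample)
    return dists[k - 1]


def naive_kl_sketch(s1, s2, k=1):
    """Estimate KL(P || Q) from samples s1 ~ P and s2 ~ Q (lists of points)."""
    n, m = len(s1), len(s2)
    d = len(s1[0])  # dimensionality of each point
    total = 0.0
    for point in s1:
        # Within s1 the point itself sits at distance 0, so ask for k + 1.
        rho = knn_distance_sketch(point, s1, k + 1)
        nu = knn_distance_sketch(point, s2, k)
        total += math.log(nu / rho)
    return (d / n) * total + math.log(m / (n - 1))


demo_s1 = [[0.0], [1.0], [2.0], [3.0]]
demo_s2 = [[0.5], [1.5], [2.5], [3.5]]
print(naive_kl_sketch(demo_s1, demo_s2, k=1))
```

Estimates from small samples are noisy and can be negative even though the true divergence never is.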
generate_custom_model.py
| Function | Signature | Description |
|---|---|---|
| lanczos_kernel | (x, order) → ndarray | Lanczos interpolation kernel |
| lanczos_interpolate | (arr, positions, order) → ndarray | Interpolate at positions |
| lanczos_resampling | (arr, new_len, order) → ndarray | Resample to new length |
| resample_one_positional_bias | (weights, len, pad) → ndarray | Resample position bias |
| resample_positional_bias_weights | (weights, len, pad) → ndarray | Resample all biases |
| generate_custom_model | (new_len, delta_basal) → Model | Create modified model |
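The Lanczos kernel of order a is sinc(x)·sinc(x/a) for |x| < a and zero elsewhere. A scalar sketch of the kernel and of interpolation at a single position follows. The `_sketch` names are hypothetical, and clamping indices at the array edges is an assumption, since the padding the project applies is not described here.

```python
import math


def lanczos_kernel_sketch(x: float, order: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / order) inside |x| < order."""
    if x == 0:
        return 1.0
    if abs(x) >= order:
        return 0.0
    px = math.pi * x
    return order * math.sin(px) * math.sin(px / order) / (px * px)


def lanczos_interpolate_sketch(arr, position: float, order: int = 3) -> float:
    """Interpolate arr at a fractional position using 2 * order neighbours."""
    base = math.floor(position)
    total = 0.0
    for i in range(base - order + 1, base + order + 1):
        j = min(max(i, 0), len(arr) - 1)  # clamp at the edges (assumption)
        total += arr[j] * lanczos_kernel_sketch(position - i, order)
    return total


demo_values = [1.0, 2.0, 3.0, 4.0]
print(lanczos_interpolate_sketch(demo_values, 2))    # close to 3.0
print(lanczos_interpolate_sketch(demo_values, 1.5))  # between 2.0 and 3.0
```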
figutils.py
| Function | Signature | Description |
|---|---|---|
| subsample_points | (x, y, max) → (x, y) | Random subsampling |
| scatter_with_kde | (x, y, ax, alpha) → None | Density scatter plot |
| safelog | (x, tol) → ndarray | Numerically safe log |
| bin_kl | (y_true, y_pred) → ndarray | Binary KL divergence |
| flatten_dict | (d) → (keys, values) | Flatten nested dict |
| insert_motif_in_middle_of_sequence | (seq, motif) → str | Insert motif |
| insert_motif_in_middle_of_sequences | (seqs, motif) → Dict | Batch insert |
| landing_pads_to_sw_exons | (mers, motif, pre, post) → List | Create landing pads |
| all_seqs | (length) → List[str] | Generate all k-mers |
| extract_str_patches | (lst, n) → List[List[str]] | Extract n-grams |
| compute_activations_simple_conv | (layer, window) → Dict | k-mer activations |
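Enumerating all k-mers and extracting n-grams are both short with the standard library. The sketch below is one plausible reading of all_seqs and extract_str_patches; the `_sketch` names, the DNA alphabet, and the overlapping-window behavior are assumptions.

```python
from itertools import product


def all_kmers_sketch(length: int, alphabet: str = "ACGT") -> list:
    """Every string of the given length over the alphabet, in lexical order."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=length)]


def patches_sketch(seqs, n: int) -> list:
    """For each sequence, list its overlapping length-n substrings."""
    return [[seq[i:i + n] for i in range(len(seq) - n + 1)] for seq in seqs]


print(len(all_kmers_sketch(3)))     # 64
print(patches_sketch(["ACGT"], 2))  # [['AC', 'CG', 'GT']]
```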
Usage Examples
Making Predictions
from model_training.model import binary_KL, Selector, ResidualTuner, SumDiff, RegularizedBiasLayer
import tensorflow as tf
from joblib import load
model = tf.keras.models.load_model(
    'output/custom_adjacency_regularizer_20210731_124_step3.h5',
    custom_objects={
        'binary_KL': binary_KL,
        'Selector': Selector,
        'ResidualTuner': ResidualTuner,
        'SumDiff': SumDiff,
        'RegularizedBiasLayer': RegularizedBiasLayer,
    }
)
xTe = load('data/xTe_ES7_HeLa_ABC.pkl.gz')
yTe = load('data/yTe_ES7_HeLa_ABC.pkl.gz')
predictions = model.predict(xTe)
Creating Force Plots
import sys
sys.path.append('figures')
from force_plot import draw_force_plot
fig = draw_force_plot(
    sequences=['ATGC...' * 22 + 'AT'],
    annotations=['My Sequence'],
)
fig.savefig('my_force_plot.pdf')
Processing New Sequences
from data_preprocessing.utils import add_flanking, create_input_data
exon = 'ACGT' * 17 + 'AC'
full_seq = add_flanking(exon, 10)
seq_oh, struct_oh, wobbles = create_input_data([full_seq])
X = [seq_oh, struct_oh, wobbles]
# `model` is the Keras model loaded under "Making Predictions"
psi = model.predict(X)[0, 0]
print(f"Predicted PSI: {psi:.3f}")