
API Reference

Complete reference for all functions and classes in the project.


Data Preprocessing Module

utils.py

String Functions

| Function | Signature | Description |
| --- | --- | --- |
| `human_format` | `(num: float) → str` | Format number with K/M/B suffix |
| `hamming` | `(s1: str, s2: str) → int` | Hamming distance between strings |
| `revcomp` | `(str: str) → str` | Reverse complement of DNA |
| `get_qualities` | `(str: str) → List[int]` | ASCII quality to Phred scores |
| `contains_Esp3I_site` | `(str: str) → bool` | Check for restriction site |
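
As a rough illustration of what these helpers compute, here is a minimal standalone sketch. It is not the project's code: the Phred+33 offset, the handling of `N`, and the choice to check both strands for the Esp3I recognition sequence (`CGTCTC`) are assumptions made for the example.

```python
from typing import List

# Translation table for complementing DNA bases (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def hamming(s1: str, s2: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def revcomp(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def get_qualities(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def contains_Esp3I_site(seq: str) -> bool:
    """Check both strands for the Esp3I recognition sequence CGTCTC."""
    site = "CGTCTC"
    return site in seq or revcomp(site) in seq
```

For example, `hamming("ACGT", "ACGA")` gives 1 and `revcomp("AACG")` gives `"CGTT"`.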

File I/O

| Function | Signature | Description |
| --- | --- | --- |
| `tqdm_readline` | `(file, pbar) → str` | Read line with progress update |
| `process_paired_fastq_file` | `(f1, f2, callback) → int` | Process paired FASTQ files |
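
The callback-driven pattern behind paired FASTQ processing can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: the hypothetical `process_paired_fastq` below takes already-open text handles, skips progress reporting, and passes `(r1, r2, r1_q, r2_q)` to the callback, mirroring the callback signatures listed elsewhere in this reference.

```python
import io
from typing import Callable, TextIO


def process_paired_fastq(
    f1: TextIO, f2: TextIO, callback: Callable[[str, str, str, str], None]
) -> int:
    """Walk two FASTQ streams in lockstep; return the number of read pairs."""
    n_pairs = 0
    while True:
        # A FASTQ record is four lines: header, sequence, '+', qualities.
        rec1 = [f1.readline().rstrip("\n") for _ in range(4)]
        rec2 = [f2.readline().rstrip("\n") for _ in range(4)]
        if not rec1[0] or not rec2[0]:
            break  # one of the streams is exhausted
        callback(rec1[1], rec2[1], rec1[3], rec2[3])
        n_pairs += 1
    return n_pairs


# Usage with in-memory data standing in for real files.
seen = []
n = process_paired_fastq(
    io.StringIO("@read1\nACGT\n+\nIIII\n"),
    io.StringIO("@read1\nTTTT\n+\n!!!!\n"),
    lambda r1, r2, q1, q2: seen.append((r1, r2, q1, q2)),
)
```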

Sequence Features

| Function | Signature | Description |
| --- | --- | --- |
| `add_flanking` | `(nts: str, len: int) → str` | Add flanking sequences |
| `add_barcode_flanking` | `(nts: str, len: int) → str` | Add barcode flanking |
| `nts_to_vector` | `(nts: str, rna=False) → ndarray` | One-hot encode sequence |
| `folding_to_vector` | `(nts: str) → ndarray` | One-hot encode structure |
| `str_to_vector` | `(str: str, template: str) → ndarray` | Generic one-hot encoding |
| `ei_vec` | `(i: int, len: int) → List[int]` | Create one-hot vector |
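
The one-hot helpers plausibly build on each other along these lines. The sketch below is illustrative only: it returns plain Python lists rather than NumPy arrays to stay dependency-free, and the alphabet orders (`ACGT` / `ACGU`) are assumptions, not confirmed by the source.

```python
from typing import List


def ei_vec(i: int, length: int) -> List[int]:
    """Unit vector of the given length with a 1 at index i."""
    return [1 if j == i else 0 for j in range(length)]


def str_to_vector(s: str, template: str) -> List[List[int]]:
    """One-hot encode s against the alphabet spelled out by template."""
    return [ei_vec(template.index(ch), len(template)) for ch in s]


def nts_to_vector(nts: str, rna: bool = False) -> List[List[int]]:
    """One-hot encode a nucleotide string (alphabet order is an assumption)."""
    return str_to_vector(nts, "ACGU" if rna else "ACGT")
```

With this layout, a sequence of length L becomes an L × 4 matrix, one row per nucleotide.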

RNA Structure

| Function | Signature | Description |
| --- | --- | --- |
| `rna_fold_structs` | `(seqs, maxBPspan=0) → (structs, mfes)` | Predict structures |
| `compute_structure` | `(seqs) → (struct_oh, structs, mfes)` | One-hot encoded structures |
| `compute_seq_oh` | `(seqs) → ndarray` | One-hot encode sequences |
| `compute_wobbles` | `(seqs, structs) → ndarray` | Identify wobble pairs |
| `create_input_data` | `(seqs) → (seq_oh, struct_oh, wobbles)` | Complete feature extraction |

Structure Analysis

| Function | Signature | Description |
| --- | --- | --- |
| `find_parentheses` | `(s: str) → Dict[int, int]` | Map base pair positions |
| `compute_bijection` | `(s: str) → ndarray` | Pairing array |
| `compute_wobble_indicator` | `(seq, struct) → List[int]` | Wobble pair flags |
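
To make the structure-analysis helpers concrete, the following sketch parses a dot-bracket string with a stack and flags G-U wobble pairs. It is a hedged reconstruction from the descriptions above; the actual return conventions (for example, whether both partners of a pair are flagged) are assumptions.

```python
from typing import Dict, List


def find_parentheses(s: str) -> Dict[int, int]:
    """Map each opening-bracket index to its matching closing-bracket index."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


# G-U wobble pairs; T is accepted so DNA-alphabet input also works.
_WOBBLE = {("G", "U"), ("U", "G"), ("G", "T"), ("T", "G")}


def compute_wobble_indicator(seq: str, struct: str) -> List[int]:
    """Return 1 at every position that takes part in a wobble pair, else 0."""
    flags = [0] * len(seq)
    for i, j in find_parentheses(struct).items():
        if (seq[i].upper(), seq[j].upper()) in _WOBBLE:
            flags[i] = flags[j] = 1
    return flags
```

For the hairpin `GCAAGU` with structure `((..))`, the outer G-U pair is flagged and the inner C-G pair is not.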

RNAutils.py

| Function | Signature | Description |
| --- | --- | --- |
| `RNAfold` | `(seqs, bin, temp, span, cmd) → List[[str, float]]` | MFE structure prediction |
| `RNAsubopt` | `(seq, bin, delta) → List[(str, float)]` | Suboptimal structures |
| `RNAsample` | `(seqs, bin, temp, n, span) → List[List[str]]` | Boltzmann sampling |
| `RNA_partition_function` | `(seqs, constraints, ...) → List[float]` | Partition function |

compute_coupling.py

| Function | Signature | Description |
| --- | --- | --- |
| `collect_barcodes` | `(r1, r2, r1_q, r2_q) → None` | Extract barcode-exon pairs |

Global variables: couplings, good_reads, reads_with_N, unidentified_reads


compute_splicing_outcomes.py

| Function | Signature | Description |
| --- | --- | --- |
| `identify_splicing_pattern` | `(r1, r2, r1_q, r2_q) → None` | Classify splicing |

Splicing categories: num_exon_inclusion, num_exon_skipping, num_intron_retention, num_splicing_in_exon, num_unknown_splicing


generate_training_data.py

| Function | Signature | Description |
| --- | --- | --- |
| `read_dataset` | `(path, filter=True) → DataFrame` | Load filtered CSV |
| `to_input_data` | `(df, flanking=10) → tuple` | Create model inputs |
| `to_target_data` | `(df) → ndarray` | Compute PSI values |
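
PSI (percent spliced in) is conventionally the fraction of reads supporting exon inclusion. The sketch below shows that standard ratio on plain integers; whether `to_target_data` uses exactly these two counts, or applies pseudocounts or filtering, is not stated here, so treat the helper name and arguments as hypothetical.

```python
def psi_from_counts(inclusion: int, skipping: int) -> float:
    """Standard PSI: inclusion reads divided by inclusion plus skipping reads."""
    total = inclusion + skipping
    if total == 0:
        raise ValueError("PSI is undefined when there are no reads")
    return inclusion / total
```

For instance, 30 inclusion reads and 10 skipping reads give a PSI of 0.75.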

Model Training Module

model.py

Custom Layers

| Class | Purpose | Key Parameters |
| --- | --- | --- |
| `Selector` | Select between inputs | `trainable=False` |
| `ResidualTuner` | Residual MLP | `hidden_units=100` |
| `SumDiff` | Energy difference | `freeze=False` |
| `RegularizedBiasLayer` | Position bias | Regularization params |

Regularizers

| Class/Function | Purpose |
| --- | --- |
| `MultiRegularizer` | Combined regularizer |
| `pos_reg` | L2 position penalty |
| `adj_reg_fo` | First-order smoothness |
| `adj_reg_so` | Second-order smoothness |

Functions

| Function | Signature | Description |
| --- | --- | --- |
| `binary_KL` | `(y_true, y_pred) → scalar` | Binary KL divergence loss |
| `regularized_act` | `(x, reg, act) → tensor` | Activation with regularization |
| `train_model` | `(model, X, y, file, ...) → history` | Train with checkpointing |
| `get_model` | `(**kwargs) → Model` | Create model instance |
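
For reference, the binary KL divergence between a target probability and a predicted probability can be written out on scalars as below. This is the textbook formula in plain Python, not the project's TensorFlow loss; the clipping constant `eps` is an assumption added to keep the logarithms finite.

```python
import math


def binary_kl_scalar(y_true: float, y_pred: float, eps: float = 1e-7) -> float:
    """KL(Bernoulli(y_true) || Bernoulli(y_pred)) with clipping to (eps, 1 - eps)."""
    y = min(max(y_true, eps), 1.0 - eps)
    p = min(max(y_pred, eps), 1.0 - eps)
    return y * math.log(y / p) + (1.0 - y) * math.log((1.0 - y) / (1.0 - p))
```

The value is zero when prediction and target agree and grows as they diverge, which is why it suits PSI targets that lie in [0, 1].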

get_model() Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `input_length` | int | `90` | Sequence length |
| `randomized_region` | tuple | `(10, 80)` | Exon position |
| `num_filters` | int | `20` | Sequence filters |
| `num_structure_filters` | int | `8` | Structure filters |
| `filter_width` | int | `6` | Sequence filter size |
| `structure_filter_width` | int | `30` | Structure filter size |
| `dropout_rate` | float | `0.01` | Dropout probability |
| `activity_regularization` | float | `0.0` | Activation L1 |
| `position_regularization` | float | `2.5e-5` | Position L2 |
| `adjacency_regularization` | float | `0.0` | First-order smoothness |
| `adjacency_regularization_so` | float | `0.0` | Second-order smoothness |
| `energy_activation` | str | `"softplus"` | Energy activation |
| `tune_energy` | bool | `True` | Train energy params |

Figures Module

force_plot.py

| Function | Signature | Description |
| --- | --- | --- |
| `get_link_midpoint` | `(fn, mid, eps, ...) → float` | Find sigmoid midpoint |
| `collapse_filters` | `(act_i, act_s, ...) → (df_i, df_s)` | Group filter activations |
| `create_force_data` | `(act_i, act_s, ...) → (Series, Series)` | Aggregate forces |
| `merge_small_forces` | `(forces, thresh) → Series` | Combine small contributions |
| `draw_force_plot` | `(seqs, annots, ...) → Figure` | Create visualization |

sequence_logo.py

| Function | Signature | Description |
| --- | --- | --- |
| `plot_logo` | `(df, thresh, ax, colors) → None` | Draw sequence logo |
| `compute_freqs` | `(kmers) → DataFrame` | Nucleotide frequencies |
| `compute_info` | `(freqs) → ndarray` | Information content |
| `compute_heights` | `(freqs) → DataFrame` | Logo heights |
| `sequence_logo_heights` | `(df) → DataFrame` | Combined calculation |
| `draw_floating_logo` | `(heights, ..., ax) → None` | Overlay logo on axes |
| `compute_EDLogo_scores` | `(kmers, normed) → DataFrame` | Enrichment/depletion |
| `plot_EDLogo` | `(df, thresh, ax) → None` | Draw ED logo |
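
The frequency, information-content, and height steps of a classic sequence logo can be sketched as follows. This uses the standard definition (information = log2 of alphabet size minus Shannon entropy, letter height = frequency × information) with plain lists and dicts in place of DataFrames; small-sample corrections, if the project applies any, are not modeled.

```python
import math
from collections import Counter
from typing import Dict, List


def compute_freqs(kmers: List[str], alphabet: str = "ACGT") -> List[Dict[str, float]]:
    """Per-position nucleotide frequencies over equal-length k-mers."""
    freqs = []
    for pos in range(len(kmers[0])):
        counts = Counter(kmer[pos] for kmer in kmers)
        freqs.append({nt: counts[nt] / len(kmers) for nt in alphabet})
    return freqs


def compute_info(freqs: List[Dict[str, float]]) -> List[float]:
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    info = []
    for column in freqs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        info.append(math.log2(len(column)) - entropy)
    return info


def compute_heights(freqs: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Letter heights: each frequency scaled by its column's information."""
    return [
        {nt: p * ic for nt, p in column.items()}
        for column, ic in zip(freqs, compute_info(freqs))
    ]
```

A fully conserved position carries 2 bits, while a uniform position carries 0 bits and so contributes no letter height.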

draw_stem_loop.py

| Function | Signature | Description |
| --- | --- | --- |
| `draw_line` | `(d, x1, y1, x2, y2, color) → None` | SVG line |
| `draw_nucleotide` | `(d, x, y, nt, color) → None` | SVG nucleotide circle |
| `draw_oligo` | `(d, xs, ys, nts, colors) → None` | SVG oligonucleotide |
| `draw_stem_loop` | `(nts, stem_len, colors, file) → None` | Complete stem-loop SVG |

kl.py

| Function | Signature | Description |
| --- | --- | --- |
| `knn_distance` | `(point, sample, k) → float` | k-NN distance |
| `verify_sample_shapes` | `(s1, s2, k) → None` | Validate input shapes |
| `naive_estimator` | `(s1, s2, k) → float` | Brute-force KL estimate |
| `scipy_estimator` | `(s1, s2, k) → float` | KDTree-based KL |
| `skl_estimator` | `(s1, s2, k) → float` | sklearn-based KL |
| `skl_estimator_efficient` | `(s1, s2, k) → float` | Vectorized KL |
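
A brute-force k-nearest-neighbor KL estimate can be sketched as below. It follows the widely used k-NN estimator, D ≈ (d/n) Σ log(ν_k / ρ_k) + log(m / (n − 1)), where ρ_k is the k-th neighbor distance within the first sample and ν_k the k-th neighbor distance to the second sample. Whether `kl.py` uses exactly this form is an assumption; the sketch also assumes no duplicate points, since a zero distance would break the logarithm.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_distance(point: Point, sample: List[Point], k: int) -> float:
    """Euclidean distance from point to its k-th nearest neighbor in sample."""
    return sorted(math.dist(point, other) for other in sample)[k - 1]


def naive_estimator(s1: List[Point], s2: List[Point], k: int = 1) -> float:
    """Brute-force k-NN estimate of KL(P || Q) from samples s1 ~ P and s2 ~ Q."""
    n, m, d = len(s1), len(s2), len(s1[0])
    total = 0.0
    for i, p in enumerate(s1):
        rho = knn_distance(p, s1[:i] + s1[i + 1:], k)  # exclude the point itself
        nu = knn_distance(p, s2, k)
        total += math.log(nu / rho)
    return (d / n) * total + math.log(m / (n - 1))
```

The tree-based variants listed above would replace the sorted scan in `knn_distance` with a spatial index; the estimate itself is unchanged.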

generate_custom_model.py

| Function | Signature | Description |
| --- | --- | --- |
| `lanczos_kernel` | `(x, order) → ndarray` | Lanczos interpolation kernel |
| `lanczos_interpolate` | `(arr, positions, order) → ndarray` | Interpolate at positions |
| `lanczos_resampling` | `(arr, new_len, order) → ndarray` | Resample to new length |
| `resample_one_positional_bias` | `(weights, len, pad) → ndarray` | Resample position bias |
| `resample_positional_bias_weights` | `(weights, len, pad) → ndarray` | Resample all biases |
| `generate_custom_model` | `(new_len, delta_basal) → Model` | Create modified model |
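
Lanczos resampling is how a learned positional bias could be stretched to a new input length. The sketch below implements the standard Lanczos kernel, L(x) = sinc(x)·sinc(x/a) for |x| < a, on plain Python lists. The default order of 3, the edge clamping, and the endpoint-aligned sampling grid are assumptions and may differ from the project's choices.

```python
import math
from typing import List, Sequence


def lanczos_kernel(x: float, order: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / order) inside |x| < order, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= order:
        return 0.0
    px = math.pi * x
    return order * math.sin(px) * math.sin(px / order) / (px * px)


def lanczos_interpolate(
    arr: Sequence[float], positions: Sequence[float], order: int = 3
) -> List[float]:
    """Evaluate arr at fractional positions; out-of-range taps clamp to the edges."""
    out = []
    for pos in positions:
        center = math.floor(pos)
        total = 0.0
        for i in range(center - order + 1, center + order + 1):
            j = min(max(i, 0), len(arr) - 1)
            total += arr[j] * lanczos_kernel(pos - i, order)
        out.append(total)
    return out


def lanczos_resampling(arr: Sequence[float], new_len: int, order: int = 3) -> List[float]:
    """Resample arr to new_len points on an endpoint-aligned grid."""
    if new_len == 1:
        return lanczos_interpolate(arr, [(len(arr) - 1) / 2], order)
    step = (len(arr) - 1) / (new_len - 1)
    return lanczos_interpolate(arr, [k * step for k in range(new_len)], order)
```

At integer positions the kernel reduces to a delta, so resampling to the original length reproduces the input up to floating-point error.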

figutils.py

Function Signature Description
subsample_points (x, y, max) β†’ (x, y) Random subsampling
scatter_with_kde (x, y, ax, alpha) β†’ None Density scatter plot
safelog (x, tol) β†’ ndarray Numerically safe log
bin_kl (y_true, y_pred) β†’ ndarray Binary KL divergence
flatten_dict (d) β†’ (keys, values) Flatten nested dict
insert_motif_in_middle_of_sequence (seq, motif) β†’ str Insert motif
insert_motif_in_middle_of_sequences (seqs, motif) β†’ Dict Batch insert
landing_pads_to_sw_exons (mers, motif, pre, post) β†’ List Create landing pads
all_seqs (length) β†’ List[str] Generate all k-mers
extract_str_patches (lst, n) β†’ List[List[str]] Extract n-grams
compute_activations_simple_conv (layer, window) β†’ Dict k-mer activations
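
A few of the string utilities are simple enough to sketch directly. These are illustrative reimplementations based only on the descriptions above. In particular, the assumption that motif insertion overwrites the central bases (keeping the sequence length fixed) is a guess, and the `alphabet` argument is added for the example.

```python
from itertools import product
from typing import List


def all_seqs(length: int, alphabet: str = "ACGT") -> List[str]:
    """Enumerate every k-mer of the given length in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=length)]


def extract_str_patches(lst: List[str], n: int) -> List[List[str]]:
    """For each string, list all overlapping substrings of length n."""
    return [[s[i:i + n] for i in range(len(s) - n + 1)] for s in lst]


def insert_motif_in_middle_of_sequence(seq: str, motif: str) -> str:
    """Overwrite the central bases of seq with motif (length is preserved)."""
    start = (len(seq) - len(motif)) // 2
    return seq[:start] + motif + seq[start + len(motif):]
```

`all_seqs(6)` yields 4^6 = 4096 hexamers, which matches the default sequence filter width of 6 documented for `get_model()`.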

Usage Examples

Making Predictions

from model_training.model import binary_KL, Selector, ResidualTuner, SumDiff, RegularizedBiasLayer
import tensorflow as tf
from joblib import load

# Load model
model = tf.keras.models.load_model(
    'output/custom_adjacency_regularizer_20210731_124_step3.h5',
    custom_objects={
        'binary_KL': binary_KL,
        'Selector': Selector,
        'ResidualTuner': ResidualTuner,
        'SumDiff': SumDiff,
        'RegularizedBiasLayer': RegularizedBiasLayer,
    }
)

# Load test data
xTe = load('data/xTe_ES7_HeLa_ABC.pkl.gz')
yTe = load('data/yTe_ES7_HeLa_ABC.pkl.gz')

# Predict
predictions = model.predict(xTe)

Creating Force Plots

import sys
sys.path.append('figures')
from force_plot import draw_force_plot

fig = draw_force_plot(
    sequences=['ATGC' * 22 + 'AT'],  # 4 * 22 + 2 = 90 nt
    annotations=['My Sequence'],
)
fig.savefig('my_force_plot.pdf')

Processing New Sequences

from data_preprocessing.utils import add_flanking, create_input_data

exon = 'ACGT' * 17 + 'AC'  # 70 nt
full_seq = add_flanking(exon, 10)  # 90 nt

seq_oh, struct_oh, wobbles = create_input_data([full_seq])

# Now use with the model loaded in "Making Predictions" above
X = [seq_oh, struct_oh, wobbles]
psi = model.predict(X)[0, 0]
print(f"Predicted PSI: {psi:.3f}")