Python module

The interface is simple: create an Annotator with the chain types and numbering scheme you need (and, optionally, a min_confidence value), then call number() or segment() on each sequence.

Numbering assigns an IMGT or Kabat position label to every residue:

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")

result = annotator.number(
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"
)
print(result.chain)       # "H"
print(result.scheme)      # "IMGT"
print(result.numbering["1"])  # "Q"
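
Once positions are labelled, residues can be selected by position range. The sketch below is one possible way to do that, not part of immunum: `residues_in_range` is a hypothetical helper, and it assumes that `result.numbering` behaves like a dict keyed by position strings, that insertion positions (if any) are written as a number followed by a letter, and that entries come back in sequence order. The toy mapping stands in for real output.

```python
import re


def residues_in_range(numbering, start, end):
    """Join residues whose numeric position lies within [start, end].

    Hypothetical helper: `numbering` is assumed to map position labels
    (e.g. "27", or "111A" for an insertion) to one-letter residues.
    """
    picked = []
    for label, residue in numbering.items():
        match = re.match(r"\d+", label)  # leading digits of the label
        if match and start <= int(match.group()) <= end:
            picked.append(residue)
    return "".join(picked)


# Toy stand-in for result.numbering (made-up values, not real output).
toy = {"26": "S", "27": "G", "28": "G", "38": "A", "39": "L"}

# Under IMGT, CDR1 spans positions 27 to 38.
print(residues_in_range(toy, 27, 38))  # GGA
```

With a real result, the call would be `residues_in_range(result.numbering, 27, 38)`, provided the assumptions above hold for the installed version.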

Segmentation splits the sequence into FR1–FR4 and CDR1–CDR3 regions plus prefix/postfix:

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")
result = annotator.segment(
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"
)
print(result.cdr3)  # "AREGTTGKPIGAFAH"
print(result.fr4)   # "WGQGTLVTVSS"
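
For downstream processing it can be convenient to collect the regions into a dict. The following is a minimal sketch under assumptions: only `cdr3` and `fr4` appear in the example above, so the remaining attribute names (`fr1`, `cdr1`, `fr2`, `cdr2`, `fr3`) are guessed by analogy, and `regions_as_dict` is a hypothetical helper. A stub object stands in for the segmentation result.

```python
from types import SimpleNamespace

# Attribute names assumed by analogy with result.cdr3 and result.fr4.
REGION_NAMES = ("fr1", "cdr1", "fr2", "cdr2", "fr3", "cdr3", "fr4")


def regions_as_dict(result):
    """Collect the seven framework/CDR regions in N- to C-terminal order."""
    return {name: getattr(result, name) for name in REGION_NAMES}


# Stub standing in for annotator.segment(...) output (made-up values).
stub = SimpleNamespace(
    fr1="QVQ", cdr1="GG", fr2="WV", cdr2="IP", fr3="NY", cdr3="AR", fr4="WG"
)

regions = regions_as_dict(stub)
# Joining the regions in order gives back the segmented part of the sequence.
print("".join(regions.values()))  # QVQGGWVIPNYARWG
```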

By default, sequences with an alignment confidence below 0.5 raise a ValueError. Pass min_confidence=0.0 to disable this check, or raise the threshold to filter non-immunoglobulin sequences more aggressively.
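
When annotating many sequences, one option is to catch the ValueError and set rejected sequences aside rather than stopping at the first failure. The sketch below shows that pattern; `number_all` is a hypothetical helper, and `StubAnnotator` is a stand-in used only so the example runs without immunum installed. It relies solely on the behaviour described above: `number()` raises ValueError for low-confidence input.

```python
def number_all(annotator, sequences):
    """Number each sequence; return (results, rejected).

    `rejected` holds the sequences for which number() raised ValueError.
    """
    results, rejected = [], []
    for seq in sequences:
        try:
            results.append(annotator.number(seq))
        except ValueError:
            rejected.append(seq)
    return results, rejected


class StubAnnotator:
    """Hypothetical stand-in, not part of immunum."""

    def number(self, seq):
        # Pretend that very short inputs fall below the confidence threshold.
        if len(seq) < 5:
            raise ValueError("alignment confidence below threshold")
        return seq.upper()


results, rejected = number_all(StubAnnotator(), ["qvqlvq", "abc"])
print(len(results), "numbered;", len(rejected), "rejected")
```

With the real library, an `Annotator` instance would be passed in place of the stub.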