immunum

immunum.Annotator

Annotates antibody and T-cell receptor sequences with IMGT or Kabat position numbers.

Parameters:

Name Type Description Default
chains list[str]

Chain types to consider during auto-detection. Each entry is a case-insensitive string. Accepted values:

  • Antibody heavy chain: "IGH" / "H" / "heavy"
  • Antibody kappa chain: "IGK" / "K" / "kappa"
  • Antibody lambda chain: "IGL" / "L" / "lambda"
  • TCR alpha chain: "TRA" / "A" / "alpha"
  • TCR beta chain: "TRB" / "B" / "beta"
  • TCR gamma chain: "TRG" / "G" / "gamma"
  • TCR delta chain: "TRD" / "D" / "delta"

Pass every chain you want considered; the annotator scores each one and picks the best match. To consider every supported chain, pass all seven values.

required
scheme str

Numbering scheme to use for output positions. Accepted values (case-insensitive):

  • "IMGT" / "i" — IMGT numbering (recommended; used internally)
  • "Kabat" / "k" — Kabat numbering (derived from IMGT)

Note: Kabat is only supported for antibody chains (IGH, IGK, IGL).

required
min_confidence float | None

Minimum alignment confidence threshold in the range [0, 1]. Sequences scoring below this value raise a ValueError. When left as None, the threshold defaults to 0.5, which filters non-immunoglobulin sequences while retaining all validated antibody sequences. Pass 0.0 to disable filtering.

None
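
The alias table above can be read as a case-insensitive lookup. The sketch below is a stand-in, not the library's code: the private `_normalize_chains` helper is not shown on this page, so the function name `normalize_chains`, the canonical return values, and the de-duplication step are all assumptions made for illustration.

```python
from __future__ import annotations

# Alias table copied from the parameter description; canonical keys are an assumption.
_CHAIN_ALIASES = {
    "IGH": ("igh", "h", "heavy"),
    "IGK": ("igk", "k", "kappa"),
    "IGL": ("igl", "l", "lambda"),
    "TRA": ("tra", "a", "alpha"),
    "TRB": ("trb", "b", "beta"),
    "TRG": ("trg", "g", "gamma"),
    "TRD": ("trd", "d", "delta"),
}

# Flatten to alias -> canonical name for O(1) lookup.
_LOOKUP = {
    alias: canonical
    for canonical, aliases in _CHAIN_ALIASES.items()
    for alias in aliases
}


def normalize_chains(chains: list[str]) -> list[str]:
    """Hypothetical helper: map aliases to canonical names, ignoring case."""
    normalized: list[str] = []
    for chain in chains:
        key = chain.lower()
        if key not in _LOOKUP:
            raise ValueError(f"unrecognised chain: {chain!r}")
        canonical = _LOOKUP[key]
        if canonical not in normalized:  # keep first occurrence only
            normalized.append(canonical)
    return normalized


# All seven supported chains, in the order listed above.
ALL_CHAINS = list(_CHAIN_ALIASES)

print(normalize_chains(["H", "kappa", "igl"]))  # ['IGH', 'IGK', 'IGL']
```

A list such as `ALL_CHAINS` is one way to spell out "all seven values" once and reuse it when every supported chain should be considered.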
Source code in immunum/__init__.py
class Annotator:
    """Annotates antibody and T-cell receptor sequences with IMGT or Kabat position numbers.

    Args:
        chains: Chain types to consider during auto-detection. Each entry is a
            case-insensitive string. Accepted values:

            - Antibody heavy chain: ``"IGH"`` / ``"H"`` / ``"heavy"``
            - Antibody kappa chain: ``"IGK"`` / ``"K"`` / ``"kappa"``
            - Antibody lambda chain: ``"IGL"`` / ``"L"`` / ``"lambda"``
            - TCR alpha chain:       ``"TRA"`` / ``"A"`` / ``"alpha"``
            - TCR beta chain:        ``"TRB"`` / ``"B"`` / ``"beta"``
            - TCR gamma chain:       ``"TRG"`` / ``"G"`` / ``"gamma"``
            - TCR delta chain:       ``"TRD"`` / ``"D"`` / ``"delta"``

            Pass every chain you want considered; the annotator scores each one and picks
            the best match. To consider every supported chain, pass all seven values.

        scheme: Numbering scheme to use for output positions. Accepted values
            (case-insensitive):

            - ``"IMGT"`` / ``"i"`` — IMGT numbering (recommended; used internally)
            - ``"Kabat"`` / ``"k"`` — Kabat numbering (derived from IMGT)

            Note: Kabat is only supported for antibody chains (IGH, IGK, IGL).

        min_confidence: Minimum alignment confidence threshold in the range ``[0, 1]``.
            Sequences scoring below this value raise a ``ValueError``. When left as
            ``None``, the threshold defaults to ``0.5``, which filters non-immunoglobulin
            sequences while retaining all validated antibody sequences. Pass ``0.0`` to
            disable filtering.
    """

    def __init__(
        self,
        chains: list[str],
        scheme: str,
        min_confidence: float | None = None,
    ):
        """Create an Annotator.

        Args:
            chains: Chain types to consider. See class docstring for accepted values.
            scheme: Numbering scheme, ``"imgt"`` or ``"kabat"``. Required; there is no default.
            min_confidence: Reject sequences with alignment confidence below this
                threshold. Defaults to ``0.5``; pass ``0.0`` to disable.

        Raises:
            ValueError: If any chain or scheme value is unrecognised, if Kabat is
                requested for TCR chains, or if ``min_confidence`` is outside ``[0, 1]``.
        """
        if min_confidence is not None and not (0 <= min_confidence <= 1.0):
            raise ValueError(
                f"min_confidence should be in [0, 1], got {min_confidence=}"
            )
        self._annotator = _Annotator(
            chains=_normalize_chains(chains),
            scheme=_normalize_scheme(scheme),
            min_confidence=min_confidence,
        )

    def number(self, sequence: str) -> NumberingResult:
        """Assign IMGT or Kabat position numbers to every residue in a sequence.

        Args:
            sequence: Amino-acid sequence string (single-letter codes).

        Returns:
            A `NumberingResult` with the detected chain, scheme, confidence score,
            and a ``{position: residue}`` numbering dict.

        Raises:
            ValueError: If the sequence is empty or scores below ``min_confidence``.
        """
        return NumberingResult(**self._annotator.number(sequence))

    def segment(self, sequence: str) -> SegmenationResult:
        """Split a sequence into FR/CDR regions.

        Args:
            sequence: Amino-acid sequence string (single-letter codes).

        Returns:
            A `SegmenationResult` with ``fr1``–``fr4``, ``cdr1``–``cdr3``,
            and any unaligned ``prefix``/``postfix`` residues.

        Raises:
            ValueError: If the sequence is empty or scores below ``min_confidence``.
        """
        return SegmenationResult(**self._annotator.segment(sequence))

_annotator instance-attribute

__init__(chains, scheme, min_confidence=None)

Create an Annotator.

Parameters:

Name Type Description Default
chains list[str]

Chain types to consider. See class docstring for accepted values.

required
scheme str

Numbering scheme, "imgt" or "kabat". Required; there is no default.

required
min_confidence float | None

Reject sequences with alignment confidence below this threshold. When left as None, the threshold defaults to 0.5; pass 0.0 to disable.

None

Raises:

Type Description
ValueError

If any chain or scheme value is unrecognised, if Kabat is requested for TCR chains, or if min_confidence is outside [0, 1].
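
The range check on `min_confidence` can be tried in isolation. The sketch below copies the condition from the `__init__` source on this page into a free function; the name `check_min_confidence` is hypothetical and exists only for this illustration.

```python
from __future__ import annotations


def check_min_confidence(min_confidence: float | None) -> float | None:
    """Same condition as in Annotator.__init__; the helper name is hypothetical."""
    if min_confidence is not None and not (0 <= min_confidence <= 1.0):
        raise ValueError(
            f"min_confidence should be in [0, 1], got {min_confidence=}"
        )
    return min_confidence


# None and both interval endpoints are accepted.
for value in (None, 0.0, 0.5, 1.0):
    check_min_confidence(value)

# Anything outside [0, 1] is rejected before the annotator is built.
try:
    check_min_confidence(1.5)
except ValueError as err:
    message = str(err)
    print(message)  # min_confidence should be in [0, 1], got min_confidence=1.5
```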

Source code in immunum/__init__.py
def __init__(
    self,
    chains: list[str],
    scheme: str,
    min_confidence: float | None = None,
):
    """Create an Annotator.

    Args:
        chains: Chain types to consider. See class docstring for accepted values.
        scheme: Numbering scheme, ``"imgt"`` or ``"kabat"``. Required; there is no default.
        min_confidence: Reject sequences with alignment confidence below this
            threshold. Defaults to ``0.5``; pass ``0.0`` to disable.

    Raises:
        ValueError: If any chain or scheme value is unrecognised, if Kabat is
            requested for TCR chains, or if ``min_confidence`` is outside ``[0, 1]``.
    """
    if min_confidence is not None and not (0 <= min_confidence <= 1.0):
        raise ValueError(
            f"min_confidence should be in [0, 1], got {min_confidence=}"
        )
    self._annotator = _Annotator(
        chains=_normalize_chains(chains),
        scheme=_normalize_scheme(scheme),
        min_confidence=min_confidence,
    )

number(sequence)

Assign IMGT or Kabat position numbers to every residue in a sequence.

Parameters:

Name Type Description Default
sequence str

Amino-acid sequence string (single-letter codes).

required

Returns:

Type Description
NumberingResult

A NumberingResult with the detected chain, scheme, confidence score, and a {position: residue} numbering dict.

Raises:

Type Description
ValueError

If the sequence is empty or scores below min_confidence.
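
Because `number` raises `ValueError` for empty or low-confidence input, batch callers may prefer to collect failures rather than stop at the first one. The sketch below shows one possible pattern. `number_many` is a hypothetical helper, and `_StubAnnotator` is a stand-in used only so the example runs without the library; any object with a `number(sequence)` method, such as an `Annotator`, could be passed instead.

```python
from __future__ import annotations

from typing import Any


def number_many(annotator: Any, sequences: list[str]) -> tuple[dict, dict]:
    """Hypothetical helper: number each sequence, separating results from errors."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for sequence in sequences:
        try:
            results[sequence] = annotator.number(sequence)
        except ValueError as err:
            # Empty input or a score below min_confidence ends up here.
            errors[sequence] = str(err)
    return results, errors


class _StubAnnotator:
    """Stand-in for illustration only; it does not perform any alignment."""

    def number(self, sequence: str) -> dict[str, int]:
        if not sequence:
            raise ValueError("empty sequence")
        return {"length": len(sequence)}


ok, failed = number_many(_StubAnnotator(), ["QVQLVQ", ""])
print(ok)      # {'QVQLVQ': {'length': 6}}
print(failed)  # {'': 'empty sequence'}
```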

Source code in immunum/__init__.py
def number(self, sequence: str) -> NumberingResult:
    """Assign IMGT or Kabat position numbers to every residue in a sequence.

    Args:
        sequence: Amino-acid sequence string (single-letter codes).

    Returns:
        A `NumberingResult` with the detected chain, scheme, confidence score,
        and a ``{position: residue}`` numbering dict.

    Raises:
        ValueError: If the sequence is empty or scores below ``min_confidence``.
    """
    return NumberingResult(**self._annotator.number(sequence))

segment(sequence)

Split a sequence into FR/CDR regions.

Parameters:

Name Type Description Default
sequence str

Amino-acid sequence string (single-letter codes).

required

Returns:

Type Description
SegmenationResult

A SegmenationResult with fr1–fr4, cdr1–cdr3, and any unaligned prefix/postfix residues.

Raises:

Type Description
ValueError

If the sequence is empty or scores below min_confidence.

Source code in immunum/__init__.py
def segment(self, sequence: str) -> SegmenationResult:
    """Split a sequence into FR/CDR regions.

    Args:
        sequence: Amino-acid sequence string (single-letter codes).

    Returns:
        A `SegmenationResult` with ``fr1``–``fr4``, ``cdr1``–``cdr3``,
        and any unaligned ``prefix``/``postfix`` residues.

    Raises:
        ValueError: If the sequence is empty or scores below ``min_confidence``.
    """
    return SegmenationResult(**self._annotator.segment(sequence))

immunum.NumberingResult dataclass

Python dataclass containing numbering results. Allows for direct attribute access via result.chain, result.numbering, etc.:

from immunum import Annotator

annotator = Annotator(
    chains=["H", "K", "L"],
    scheme="imgt",
)

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.number(sequence)
assert result.chain == "H"
assert result.scheme == "IMGT"
assert isinstance(
    result.confidence, float
)
assert result.numbering["1"] == "Q"

for (
    position,
    amino_acid,
) in result.numbering.items():
    print(f"{position}: {amino_acid}")

# 1: Q
# 2: V
# 3: Q
# ...
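
Since `numbering` maps string positions to residues, a region can be pulled out by filtering on the numeric part of each key. The sketch below uses mock data rather than real library output. The helper name `residues_in_range`, the `"111A"` style of insertion-code key, and the reliance on dict order following sequence order are assumptions. The range 105 to 117 is the IMGT definition of CDR3.

```python
from __future__ import annotations

import re


def residues_in_range(numbering: dict[str, str], start: int, end: int) -> str:
    """Join residues whose numeric position lies in [start, end] (hypothetical helper)."""
    picked = []
    for position, residue in numbering.items():
        # Read the leading digits so keys such as "111A" (assumed format) still match.
        match = re.match(r"\d+", position)
        if match and start <= int(match.group()) <= end:
            picked.append(residue)
    return "".join(picked)


# Mock numbering dict for illustration; not produced by the library.
mock_numbering = {
    "104": "C",
    "105": "A",
    "106": "R",
    "111": "G",
    "111A": "T",
    "117": "H",
    "118": "W",
}

print(residues_in_range(mock_numbering, 105, 117))  # ARGTH
```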
Source code in immunum/__init__.py
@dataclass(frozen=True)
class NumberingResult:
    """Python dataclass containing numbering results. Allows for direct attribute access
    via `result.chain`, `result.numbering`, etc.:

    ```python
    from immunum import Annotator

    annotator = Annotator(
        chains=["H", "K", "L"],
        scheme="imgt",
    )

    sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

    result = annotator.number(sequence)
    assert result.chain == "H"
    assert result.scheme == "IMGT"
    assert isinstance(
        result.confidence, float
    )
    assert result.numbering["1"] == "Q"

    for (
        position,
        amino_acid,
    ) in result.numbering.items():
        print(f"{position}: {amino_acid}")

    # 1: Q
    # 2: V
    # 3: Q
    # ...
    ```
    """

    chain: str
    scheme: str
    confidence: float
    numbering: dict[str, str]

chain instance-attribute

scheme instance-attribute

confidence instance-attribute

numbering instance-attribute

__init__(chain, scheme, confidence, numbering)
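
The class is declared with `@dataclass(frozen=True)`, so attributes cannot be reassigned after construction. The sketch below demonstrates that behaviour on a local stand-in with the same four fields; `NumberingResultSketch` and its field values are made up for the example and are not library output.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, asdict, dataclass, replace


@dataclass(frozen=True)
class NumberingResultSketch:
    """Local stand-in mirroring the documented fields."""

    chain: str
    scheme: str
    confidence: float
    numbering: dict[str, str]


result = NumberingResultSketch(
    chain="H", scheme="IMGT", confidence=0.9, numbering={"1": "Q"}
)

# Reassigning a field on a frozen dataclass raises FrozenInstanceError.
try:
    result.chain = "K"
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace builds a modified copy and leaves the original untouched.
updated = replace(result, confidence=1.0)

# asdict converts the instance to a plain dict, e.g. for JSON serialisation.
as_plain_dict = asdict(result)
print(mutated, updated.confidence, as_plain_dict["numbering"])
```

Note that `frozen=True` only blocks attribute reassignment; the contents of the `numbering` dict itself remain mutable.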

immunum.SegmenationResult dataclass

Python dataclass containing segmentation results. Allows direct attribute access via result.fr1, and iteration over the segments via as_dict():

from immunum import Annotator

annotator = Annotator(
    chains=["H", "K", "L"],
    scheme="imgt",
)

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.segment(sequence)
assert (
    result.fr1
    == "QVQLVQSGAEVKRPGSSVTVSCKAS"
)
assert result.cdr1 == "GGSFSTYA"
assert result.fr2 == "LSWVRQAPGRGLEWMGG"
assert result.cdr2 == "VIPLLTIT"
assert (
    result.fr3
    == "NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC"
)
assert result.cdr3 == "AREGTTGKPIGAFAH"
assert result.fr4 == "WGQGTLVTVSS"

for (
    segment,
    aminoacids,
) in result.as_dict().items():
    print(f"{segment}: {aminoacids}")

# fr1: QVQLVQSGAEVKRPGSSVTVSCKAS
# cdr1: GGSFSTYA
# fr2: LSWVRQAPGRGLEWMGG
# cdr2: VIPLLTIT
# fr3: NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC
# cdr3: AREGTTGKPIGAFAH
# fr4: WGQGTLVTVSS
# prefix:
# postfix:
Source code in immunum/__init__.py
@dataclass(frozen=True)
class SegmenationResult:
    """
    Python dataclass containing segmentation results. Allows direct attribute access
    via `result.fr1`, and iteration over the segments via `as_dict()`:

    ```python
    from immunum import Annotator

    annotator = Annotator(
        chains=["H", "K", "L"],
        scheme="imgt",
    )

    sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

    result = annotator.segment(sequence)
    assert (
        result.fr1
        == "QVQLVQSGAEVKRPGSSVTVSCKAS"
    )
    assert result.cdr1 == "GGSFSTYA"
    assert result.fr2 == "LSWVRQAPGRGLEWMGG"
    assert result.cdr2 == "VIPLLTIT"
    assert (
        result.fr3
        == "NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC"
    )
    assert result.cdr3 == "AREGTTGKPIGAFAH"
    assert result.fr4 == "WGQGTLVTVSS"

    for (
        segment,
        aminoacids,
    ) in result.as_dict().items():
        print(f"{segment}: {aminoacids}")

    # fr1: QVQLVQSGAEVKRPGSSVTVSCKAS
    # cdr1: GGSFSTYA
    # fr2: LSWVRQAPGRGLEWMGG
    # cdr2: VIPLLTIT
    # fr3: NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC
    # cdr3: AREGTTGKPIGAFAH
    # fr4: WGQGTLVTVSS
    # prefix:
    # postfix:
    ```
    """

    fr1: str
    cdr1: str
    fr2: str
    cdr2: str
    fr3: str
    cdr3: str
    fr4: str
    prefix: str
    postfix: str

    def as_dict(self) -> dict[str, str]:
        """Return a dict mapping segment names to sequences.

        Returns:
            dict[str, str]: maps segment names ('fr1', 'cdr1', ..., 'postfix') to their amino-acid sequences
        """
        return {
            "fr1": self.fr1,
            "cdr1": self.cdr1,
            "fr2": self.fr2,
            "cdr2": self.cdr2,
            "fr3": self.fr3,
            "cdr3": self.cdr3,
            "fr4": self.fr4,
            "prefix": self.prefix,
            "postfix": self.postfix,
        }

fr1 instance-attribute

cdr1 instance-attribute

fr2 instance-attribute

cdr2 instance-attribute

fr3 instance-attribute

cdr3 instance-attribute

fr4 instance-attribute

prefix instance-attribute

postfix instance-attribute

__init__(fr1, cdr1, fr2, cdr2, fr3, cdr3, fr4, prefix, postfix)

as_dict()

Return dict mapping segment names to sequences

Returns:

Type Description
dict[str, str]

A dict mapping segment names ('fr1', 'cdr1', ..., 'postfix') to their amino-acid sequences.
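
`as_dict()` lists `prefix` and `postfix` last, so joining its values in dict order would not give back the input sequence whenever a prefix is present. The sketch below reassembles the segments in sequence order, using the segment values from the documented example as a plain dict. It assumes that `prefix` precedes `fr1` and `postfix` follows `fr4`; the name `reassemble` is hypothetical.

```python
from __future__ import annotations

# Assumed sequence order: unaligned prefix first, unaligned postfix last.
SEQUENCE_ORDER = (
    "prefix", "fr1", "cdr1", "fr2", "cdr2", "fr3", "cdr3", "fr4", "postfix",
)


def reassemble(segments: dict[str, str]) -> str:
    """Concatenate segments in sequence order (hypothetical helper)."""
    return "".join(segments[name] for name in SEQUENCE_ORDER)


# Segment values from the documented example, in as_dict() order.
segments = {
    "fr1": "QVQLVQSGAEVKRPGSSVTVSCKAS",
    "cdr1": "GGSFSTYA",
    "fr2": "LSWVRQAPGRGLEWMGG",
    "cdr2": "VIPLLTIT",
    "fr3": "NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC",
    "cdr3": "AREGTTGKPIGAFAH",
    "fr4": "WGQGTLVTVSS",
    "prefix": "",
    "postfix": "",
}

sequence = (
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRS"
    "TSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"
)

# For the documented example the segments tile the input exactly.
print(reassemble(segments) == sequence)  # True
```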

Source code in immunum/__init__.py
def as_dict(self) -> dict[str, str]:
    """Return a dict mapping segment names to sequences.

    Returns:
        dict[str, str]: maps segment names ('fr1', 'cdr1', ..., 'postfix') to their amino-acid sequences
    """
    return {
        "fr1": self.fr1,
        "cdr1": self.cdr1,
        "fr2": self.fr2,
        "cdr2": self.cdr2,
        "fr3": self.fr3,
        "cdr3": self.cdr3,
        "fr4": self.fr4,
        "prefix": self.prefix,
        "postfix": self.postfix,
    }