Integrations¶
Duffy provides native integrations with LangChain, LlamaIndex, NetworkX, and PyTorch Geometric (PyG).
LangChain¶
DuffyVectorStore¶
LangChain-compatible VectorStore backed by pgvector. Stores documents with text, metadata, and embedding columns in a SQL table.
from duffy._integrations.langchain import DuffyVectorStore
vs = DuffyVectorStore(
driver,
embedding, # LangChain Embeddings instance
table="documents", # table name (default: "langchain_documents")
create_table=True, # auto-create table if missing
embedding_dimensions=768, # vector dimensions (for table creation)
)
Adding documents:
ids = vs.add_texts(
["First document", "Second document"],
metadatas=[{"source": "a"}, {"source": "b"}],
)
Searching:
# By text query (embeds automatically)
docs = vs.similarity_search("search query", k=4)
# With scores
docs_with_scores = vs.similarity_search_with_score("search query", k=4)
# By pre-computed vector
docs = vs.similarity_search_by_vector(embedding_vector, k=4)
Deleting:
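A minimal sketch, assuming the standard LangChain `VectorStore.delete(ids=...)` signature (the IDs are those returned by `add_texts`):

```python
vs.delete(ids=ids)            # delete the documents added above
vs.delete(ids=["custom-id"])  # or pass explicit IDs
```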
Factory method:
vs = DuffyVectorStore.from_texts(
texts=["doc1", "doc2"],
embedding=embedding,
driver=driver,
table="my_docs",
)
DuffyGraphStore (LangChain)¶
LangChain-compatible GraphStore backed by Apache AGE. Use with LangChain's GraphCypherQAChain or knowledge graph pipelines.
from duffy._integrations.langchain import DuffyGraphStore
gs = DuffyGraphStore(driver, graph="my_graph")
Adding graph documents:
Accepts LangChain GraphDocument objects. Nodes are upserted with MERGE (matched on id), and relationships are created between the matched nodes. All labels are validated against [A-Za-z_][A-Za-z0-9_]*, and all values are passed as query parameters.
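A sketch using LangChain's GraphDocument types (the import paths below are LangChain's, not Duffy's, and may vary with your LangChain version):

```python
from langchain_community.graphs.graph_document import (
    GraphDocument,
    Node,
    Relationship,
)
from langchain_core.documents import Document

alice = Node(id="alice", type="Person")
acme = Node(id="acme", type="Company")
doc = GraphDocument(
    nodes=[alice, acme],
    relationships=[Relationship(source=alice, target=acme, type="WORKS_AT")],
    source=Document(page_content="Alice works at Acme."),
)
gs.add_graph_documents([doc])
```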
Querying:
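The LangChain GraphStore interface exposes a `query` method for raw Cypher; a sketch assuming that standard signature:

```python
rows = gs.query("MATCH (p:Person)-[:WORKS_AT]->(c:Company) RETURN p, c")
```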
Schema:
gs.get_schema # string representation
gs.get_structured_schema # dict with node_props, rel_props, relationships
gs.refresh_schema() # refresh cached schema from database
LlamaIndex¶
Duffy provides two LlamaIndex integrations: a simple triplet-based GraphStore and a full-featured PropertyGraphStore.
DuffyGraphStore (LlamaIndex)¶
Simple triplet-based store. Stores knowledge as (:Entity)-[:PREDICATE]->(:Entity) triples.
from duffy._integrations.llamaindex import DuffyGraphStore
store = DuffyGraphStore(driver, graph="my_graph")
Upserting triplets:
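A sketch using the standard LlamaIndex GraphStore triplet signature, (subject, relation, object):

```python
store.upsert_triplet("Alice", "knows", "Bob")
store.upsert_triplet("Alice", "works_at", "Acme")
```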
Querying:
# Get triplets for a subject
triplets = store.get("Alice", depth=2, limit=30)
# [["Alice", "knows", "Bob"], ["Alice", "works_at", "Acme"]]
# Get relationship map for multiple subjects
rel_map = store.get_rel_map(subjs=["Alice", "Bob"])
# {"Alice": [["knows", "Bob"], ["works_at", "Acme"]], "Bob": [...]}
Deleting:
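A sketch, again assuming the standard LlamaIndex `delete(subj, rel, obj)` signature:

```python
store.delete("Alice", "knows", "Bob")  # removes the matching triple
```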
Schema and raw queries:
schema_str = store.get_schema(refresh=True)
result_str = store.query("MATCH (n:Entity) RETURN n.name")
DuffyPropertyGraphStore (LlamaIndex)¶
Modern labeled property graph store. Supports typed nodes, typed relations, structured queries, and vector queries.
Architecture: graph structure lives in AGE (vertices/edges), embeddings live in a companion pgvector table ({graph}_embeddings).
from duffy._integrations.llamaindex_property import DuffyPropertyGraphStore
store = DuffyPropertyGraphStore(driver, graph="my_graph")
Upserting nodes:
from llama_index.core.graph_stores.types import EntityNode, ChunkNode
nodes = [
EntityNode(id_="alice", label="Person", properties={"age": 30}),
ChunkNode(id_="doc1", text="Some document text"),
]
store.upsert_nodes(nodes)
Upserting relations:
from llama_index.core.graph_stores.types import Relation
rels = [
Relation(source_id="alice", target_id="bob", label="KNOWS"),
]
store.upsert_relations(rels)
Querying:
# Get nodes by ID
nodes = store.get(ids=["alice"])
# Get nodes by properties
nodes = store.get(properties={"age": 30})
# Get triplets
triplets = store.get_triplets(entity_names=["Alice"])
# Get relationship map
rel_map = store.get_rel_map(subjs=["Alice"], depth=2)
# Structured Cypher query
nodes = store.structured_query(
"MATCH (n:Person) WHERE n.age > 25 RETURN n",
param_map={}
)
Vector queries:
from llama_index.core.vector_stores.types import VectorStoreQuery
vq = VectorStoreQuery(query_embedding=[0.1, 0.2, ...], similarity_top_k=5)
result = store.vector_query(vq)
# Returns (nodes, scores)
Deleting:
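The LlamaIndex PropertyGraphStore interface deletes by IDs, entity names, relation names, or property match; a hedged sketch assuming that standard signature:

```python
store.delete(ids=["alice"])             # by node ID
store.delete(entity_names=["Bob"])      # by entity name
store.delete(relation_names=["KNOWS"])  # by relation label
store.delete(properties={"age": 30})    # by property match
```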
Schema:
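A sketch assuming the standard PropertyGraphStore schema accessors:

```python
schema = store.get_schema(refresh=True)  # structured schema
schema_str = store.get_schema_str()      # string form, e.g. for LLM prompts
```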
NetworkX¶
Convert AGE graphs to NetworkX for analysis and visualization.
to_networkx¶
G = db.to_networkx() # full graph as a DiGraph
# Filter by label/type
G = db.to_networkx(labels=["Person"], rel_types=["KNOWS"])
# Undirected
G = db.to_networkx(directed=False)
# Without properties (faster for large graphs)
G = db.to_networkx(node_properties=False, edge_properties=False)
Graph algorithms¶
Built-in algorithms use NetworkX under the hood:
# PageRank
pr_df = db.pagerank(label="Person", rel_type="KNOWS", damping=0.85)
# Community detection (Louvain)
comm_df = db.communities(method="louvain")
# Shortest path (returns list of node IDs)
path = db.shortest_path(source_id, target_id, weight="distance")
# Centrality measures
cent_df = db.centrality(measure="degree") # also: betweenness, closeness
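Since these helpers wrap NetworkX, a roughly equivalent manual pipeline looks like the following (a sketch; the parameter mapping inside Duffy is assumed, but the NetworkX calls are its public API):

```python
import networkx as nx

G = db.to_networkx(labels=["Person"], rel_types=["KNOWS"])
pr = nx.pagerank(G, alpha=0.85)  # damping maps to NetworkX's alpha
communities = nx.community.louvain_communities(G.to_undirected())
```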
PyTorch Geometric (PyG)¶
Train GNNs on AGE graph data. Duffy implements PyG's FeatureStore and GraphStore interfaces.
DuffyFeatureStore¶
Stores node feature tensors in PostgreSQL (serialized as bytea).
from duffy._integrations.pyg import DuffyFeatureStore
from torch_geometric.data.feature_store import TensorAttr
fs = DuffyFeatureStore(driver, graph="my_graph")
# Store features
fs._put_tensor(tensor, TensorAttr("Person", "x"))
# Retrieve features
t = fs._get_tensor(TensorAttr("Person", "x"))
# List stored attributes
attrs = fs.get_all_tensor_attrs()
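The bytea round trip can be pictured with a plain-Python sketch. Duffy's actual wire format is not documented here; this illustration assumes a flat little-endian float32 layout:

```python
import struct

def tensor_to_bytes(values):
    """Pack a flat list of floats as little-endian float32 (illustrative)."""
    return struct.pack(f"<{len(values)}f", *values)

def bytes_to_tensor(blob):
    """Unpack little-endian float32 bytes back into a list of floats."""
    n = len(blob) // 4
    return list(struct.unpack(f"<{n}f", blob))

# Round trip: 1.0, 2.5, and -3.0 are exactly representable in float32
blob = tensor_to_bytes([1.0, 2.5, -3.0])
assert bytes_to_tensor(blob) == [1.0, 2.5, -3.0]
```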
DuffyGraphStore (PyG)¶
Extracts edge topology in COO format from AGE. Node IDs are remapped from AGE's 64-bit IDs to contiguous 0-based indices as required by PyG.
from duffy._integrations.pyg import DuffyGraphStore
from torch_geometric.data.graph_store import EdgeAttr
gs = DuffyGraphStore(driver, graph="my_graph")
# Get edge index [2, num_edges] for a specific edge type
edge_index = gs._get_edge_index(
EdgeAttr(edge_type=("Person", "KNOWS", "Person"), layout="coo")
)
# List all edge types
edge_attrs = gs.get_all_edge_attrs()
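The ID remapping described above can be sketched in plain Python (illustrative only; the helper name and internals are hypothetical, not Duffy's implementation):

```python
def remap_to_coo(edges):
    """Map (src, dst) pairs with arbitrary 64-bit IDs to a [2, num_edges]
    COO edge list over contiguous 0-based indices, as PyG expects."""
    index = {}  # original ID -> contiguous index, in first-seen order
    for src, dst in edges:
        for node in (src, dst):
            index.setdefault(node, len(index))
    rows = [index[s] for s, _ in edges]
    cols = [index[d] for _, d in edges]
    return [rows, cols], index

coo, index = remap_to_coo([
    (844424930131969, 844424930131970),
    (844424930131970, 844424930131971),
])
# coo == [[0, 1], [1, 2]]
```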
Typical PyG workflow¶
import duffy
from duffy._integrations.pyg import DuffyFeatureStore, DuffyGraphStore
db = duffy.connect("postgresql://...", graph="social")
# 1. Extract edge topology
gs = DuffyGraphStore(db, graph="social")
edge_index = gs._get_edge_index(
EdgeAttr(edge_type=("Person", "KNOWS", "Person"), layout="coo")
)
# 2. Load/compute node features
fs = DuffyFeatureStore(db, graph="social")
# ... compute embeddings, store with fs._put_tensor(...)
# 3. Build PyG Data object
from torch_geometric.data import Data
data = Data(x=node_features, edge_index=edge_index)
# 4. Train your GNN
# ...