Quick Start

Get up and running with Duffy in 5 minutes.

Prerequisites

Python 3.10+
PostgreSQL with Apache AGE and pgvector extensions installed

Install

pip install duffy

# With pandas support (recommended)
pip install "duffy[pandas]"

Connect

import duffy

db = duffy.connect("postgresql://localhost:5432/mydb", graph="my_graph")

If the graph named by the graph parameter doesn't exist, connect() creates it. You can also connect without a default graph and set one later:

db = duffy.connect("postgresql://localhost:5432/mydb")
db.set_graph("my_graph")
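Real deployments usually need credentials in the connection string. The following is a minimal sketch that assembles a PostgreSQL URI from the standard libpq environment variables, percent-encoding the user and password. The build_dsn helper is hypothetical and not part of Duffy, and it assumes connect() accepts a standard PostgreSQL URI with embedded credentials.

```python
import os
from urllib.parse import quote


def build_dsn(env=os.environ):
    """Assemble a postgresql:// URI from libpq-style environment variables.

    build_dsn is a hypothetical helper for illustration, not a Duffy function.
    """
    user = env.get("PGUSER", "")
    password = env.get("PGPASSWORD", "")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    dbname = env.get("PGDATABASE", "mydb")

    auth = ""
    if user:
        # Percent-encode so characters such as '@' or '/' cannot break the URI.
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"postgresql://{auth}{host}:{port}/{dbname}"


dsn = build_dsn({"PGUSER": "alice", "PGPASSWORD": "p@ss/word"})
print(dsn)  # postgresql://alice:p%40ss%2Fword@localhost:5432/mydb

# Assumption: the resulting URI can be passed straight to connect().
# db = duffy.connect(dsn, graph="my_graph")
```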

Create nodes and relationships

# Create nodes
db.cypher("CREATE (:Person {name: 'Alice', age: 30})")
db.cypher("CREATE (:Person {name: 'Bob', age: 25})")

# Create a relationship
db.cypher("""
    MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
    CREATE (a)-[:KNOWS {since: 2020}]->(b)
""")

db.commit()

Query the graph

# Cypher query → DataFrame
df = db.cypher("MATCH (n:Person) RETURN n.name, n.age").to_df()
print(df)
#     n.name  n.age
# 0   Alice     30
# 1     Bob     25

# Traverse relationships
df = db.cypher("""
    MATCH (a:Person)-[r:KNOWS]->(b:Person)
    RETURN a.name AS source, b.name AS target, r.since
""").to_df()

Parameterized queries

Use %s placeholders for safe parameterization:

result = db.cypher(
    "MATCH (n:Person {name: %s}) RETURN n.name, n.age",
    params=("Alice",),
)
record = result.single()
print(record["n.name"])  # Alice
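To see why placeholders matter, compare them with string interpolation. The sketch below uses only plain Python strings, so it runs without a database. The db.cypher call is left as a comment and mirrors the example above.

```python
# A value containing a single quote, which is common in real data.
name = "O'Brien"

# Risky: interpolation places the value's quote inside the Cypher string
# literal, leaving the query with an unbalanced quote.
unsafe_query = f"MATCH (n:Person {{name: '{name}'}}) RETURN n.name"
print(unsafe_query)  # MATCH (n:Person {name: 'O'Brien'}) RETURN n.name

# Safer: the query text stays constant and the value travels separately.
query = "MATCH (n:Person {name: %s}) RETURN n.name, n.age"
params = (name,)
# result = db.cypher(query, params=params)
```

The same reasoning applies to any value that originates from user input.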

Vector search

Search for similar vectors using pgvector:

query_vec = [0.1, 0.2, 0.3, ...]  # your query embedding

results = db.vector_search(
    "documents",       # table name
    "embedding",       # vector column
    query_vec,
    k=10,              # number of results
    metric="cosine",   # cosine, l2, or inner_product
)
df = results.to_df()
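The choice of metric changes which rows count as "closest". As a rough illustration, the sketch below computes the three quantities in plain Python, following pgvector's definitions: L2 distance, cosine distance (1 minus cosine similarity), and inner product (pgvector's `<#>` operator returns its negative, so that smaller still means closer). How Duffy's metric names map onto these operators is an assumption here.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on direction, not magnitude.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction as the query, twice the length
orthogonal = [0.0, 1.0]       # unrelated direction

for label, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(label, l2_distance(query, vec), cosine_distance(query, vec),
          inner_product(query, vec))
```

Cosine distance treats same_direction as identical to the query, while L2 distance still reports a gap of 1.0 because the lengths differ. This is one reason cosine is a common choice for text embeddings.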

Create an index for faster searches:

db.create_vector_index("documents", "embedding", method="hnsw", metric="cosine")
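For orientation, pgvector builds such an index with a CREATE INDEX statement whose operator class depends on the metric. The sketch below assembles that statement as a string. index_sql is a hypothetical helper, and whether Duffy issues exactly this SQL is an assumption.

```python
# pgvector operator classes for the vector type, keyed by metric name.
# The metric names follow the vector_search example above.
OPCLASS = {
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}


def index_sql(table, column, method="hnsw", metric="cosine"):
    """Return the pgvector CREATE INDEX statement for the given settings.

    Hypothetical helper for illustration. Identifiers are interpolated
    unquoted, so only use it with trusted table and column names.
    """
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method}")
    return f"CREATE INDEX ON {table} USING {method} ({column} {OPCLASS[metric]})"


sql = index_sql("documents", "embedding", method="hnsw", metric="cosine")
print(sql)  # CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
```

An index only speeds up queries that use the same metric it was built with, so the metric passed here should match the one used in vector_search.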

Hybrid search

Combine graph traversal with vector similarity by finding nodes via Cypher and then ranking them by embedding distance:

results = db.hybrid_search(
    cypher="MATCH (p:Paper)-[:CITES]->(cited) RETURN cited",
    vector_table="papers",
    vector_column="abstract_embedding",
    query_vector=query_vec,
    k=10,
)
df = results.to_df()
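Conceptually, hybrid search is a filter followed by a ranking. The sketch below imitates the two steps in plain Python with toy data. It is not Duffy's implementation. The rank_candidates helper, the paper IDs, and the two-dimensional embeddings are all made up for illustration.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(candidate_ids, embeddings, query_vector, k):
    """Rank only the graph-selected candidates by distance to the query."""
    scored = [
        (cosine_distance(embeddings[cid], query_vector), cid)
        for cid in candidate_ids
        if cid in embeddings
    ]
    scored.sort()
    return [(cid, dist) for dist, cid in scored[:k]]


# Toy embedding table (stands in for the vector table).
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.0, 1.0],
    "paper_c": [1.0, 1.0],
    "paper_d": [1.0, 0.1],  # close to the query, but never cited
}

# Step 1: IDs returned by the graph traversal (stands in for the Cypher query).
cited_ids = ["paper_a", "paper_b", "paper_c"]

# Step 2: rank those candidates by embedding distance.
top = rank_candidates(cited_ids, embeddings, [1.0, 0.0], k=2)
print([cid for cid, _ in top])  # ['paper_a', 'paper_c']
```

paper_d is near the query vector yet never appears in the result, because the graph step excluded it before ranking began.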

Output formats

Every query returns a Result object with multiple output options:

result = db.cypher("MATCH (n:Person) RETURN n.name, n.age")

result.to_df()       # pandas DataFrame
result.to_dicts()    # list of dicts
result.to_arrow()    # PyArrow Table
result.records       # list of Record objects

# Expand Vertex/Edge objects into flat columns
result.to_df(expand=True)
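To give a feel for what expanding a vertex into flat columns can mean, the sketch below flattens a vertex-shaped dict (id, label, properties, as Apache AGE represents vertices) into one flat row. The flatten_vertex helper and the "column.key" naming are assumptions for illustration. The columns Duffy actually produces may be named differently.

```python
def flatten_vertex(column, vertex):
    """Flatten one vertex-shaped dict into 'column.key' entries.

    Hypothetical helper; the naming scheme is an assumption.
    """
    flat = {
        f"{column}.id": vertex["id"],
        f"{column}.label": vertex["label"],
    }
    for key, value in vertex["properties"].items():
        flat[f"{column}.{key}"] = value
    return flat


# A single result row whose column "n" holds a whole vertex.
row = {"n": {"id": 1, "label": "Person", "properties": {"name": "Alice", "age": 30}}}

flat_row = {}
for column, value in row.items():
    flat_row.update(flatten_vertex(column, value))

print(flat_row)
# {'n.id': 1, 'n.label': 'Person', 'n.name': 'Alice', 'n.age': 30}
```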

Transactions

Group operations in a transaction that auto-commits on success and rolls back on exception:

with db.transaction():
    db.cypher("CREATE (:Person {name: 'Carol'})")
    db.cypher("CREATE (:Person {name: 'Dave'})")
# auto-committed here
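The commit-on-success, rollback-on-exception behaviour can be sketched with a small context manager. The example below uses the standard library's sqlite3 module instead of Duffy so that it runs anywhere. The transaction function here is a stand-in that illustrates the pattern, not Duffy's implementation.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block finishes normally, roll back if it raises."""
    try:
        yield conn
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
conn.commit()

# Succeeds: the insert is committed when the block exits.
with transaction(conn):
    conn.execute("INSERT INTO person VALUES ('Carol')")

# Fails: the insert is rolled back and the exception propagates.
try:
    with transaction(conn):
        conn.execute("INSERT INTO person VALUES ('Dave')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM person")]
print(names)  # ['Carol']
```

Only Carol survives: the second block raised before reaching its end, so its insert was undone.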

Context manager

Use connect() as a context manager for automatic cleanup:

with duffy.connect("postgresql://localhost:5432/mydb", graph="g") as db:
    df = db.cypher("MATCH (n) RETURN n").to_df()
# connection closed automatically

Next steps

API Reference: full method signatures and parameters
Integrations: LangChain, LlamaIndex, NetworkX, PyG