When building backend applications with Python, choosing the right database is one of the most critical decisions you will make. MongoDB, one of the most popular NoSQL databases, has become a go-to choice for applications that demand flexibility, scalability, and performance. This guide walks you through integrating Python with MongoDB using the official PyMongo driver, from installation to a fully functional project.
What is MongoDB?
MongoDB is a document-oriented NoSQL database. Unlike traditional relational databases such as MySQL or PostgreSQL, which store data in fixed rows and columns, MongoDB stores data in flexible BSON documents (a binary extension of JSON). Each document can have a different structure, making MongoDB ideal for applications with dynamic schemas, rapid prototyping, and integration with data from multiple sources.
Common MongoDB use cases include product catalogs, e-commerce platforms, content management systems, IoT applications, real-time analytics, and log storage. Major companies like Forbes, Uber, and EA Sports rely on MongoDB in production. You can explore more details in the official MongoDB documentation.
Why Use Python with MongoDB?
The Python + MongoDB combination is natural and powerful for several reasons. First, MongoDB's document structure closely resembles Python dictionaries, making the conversion between them nearly seamless. Second, the PyMongo driver is mature, well-documented, and follows Python's best practices. Third, MongoDB's schema flexibility allows you to evolve your application without migrating entire databases.
Additionally, the Python ecosystem offers complementary libraries that integrate smoothly with MongoDB, such as pandas for data analysis and Flask or FastAPI for building APIs. If you are just getting started with Python, check out our guide on dependency management with pip before proceeding.
Installation and Setup
Installing MongoDB
Before writing any code, you need a running MongoDB server. There are two main options: install it locally or use a cloud instance with MongoDB Atlas. For local development, visit the MongoDB Community Server download page and pick the version for your operating system. On Windows, the installer can set MongoDB up as a Windows service. On Linux, add MongoDB's official package repository and install it with your distribution's package manager. On macOS, Homebrew simplifies installation: run brew tap mongodb/brew, then brew install mongodb-community.
If you prefer not to install anything locally, MongoDB Atlas offers a free tier (M0) with 512 MB of storage, perfect for learning and personal projects. Just create an account, configure a free cluster, and get your connection string.
Installing PyMongo
With Python and MongoDB ready, install the official PyMongo driver using pip:
pip install pymongo
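Optional features ship as extras. snappy and zstd add wire-protocol compression, gssapi adds Kerberos authentication, and ocsp adds OCSP certificate-revocation checks. Quote the argument so that shells such as zsh do not interpret the brackets:
pip install "pymongo[snappy,zstd,gssapi,ocsp]"
TLS/SSL needs no extra, because PyMongo uses Python's built-in ssl module. Recent PyMongo releases also install dnspython automatically for mongodb+srv:// connection strings.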
The library is available on PyPI, and the full documentation is on the official PyMongo documentation page.
Verifying the Installation
To confirm everything works, open a Python terminal and run:
import pymongo
print(pymongo.__version__)
If it returns the installed version (e.g., 4.9.1), the installation was successful.
Connecting to MongoDB
Connect to MongoDB using the MongoClient class. The connection string varies depending on your environment:
from pymongo import MongoClient
# Local connection (default)
client = MongoClient("mongodb://localhost:27017/")
# MongoDB Atlas connection (replace the placeholders with your cluster's values)
client = MongoClient("mongodb+srv://<user>:<password>@<cluster-host>/")
# Accessing a database
db = client["my_database"]
# Accessing a collection
collection = db["my_collection"]
MongoClient is lazy: the constructor returns immediately and connects in the background, so an unreachable server is only reported when the first operation runs. MongoClient also manages a connection pool internally. You should therefore create a single global instance and reuse it throughout your application, rather than creating a new client for each operation.
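Always handle connection exceptions. Use try/except to catch pymongo.errors.ConnectionFailure and pymongo.errors.ServerSelectionTimeoutError. The latter is a subclass of the former, so catching ConnectionFailure alone would also work. A ping command is a cheap way to force a round trip. By default, PyMongo waits up to 30 seconds for a suitable server before raising ServerSelectionTimeoutError: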
from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError
try:
    client.admin.command("ping")
    print("Connected to MongoDB!")
except (ConnectionFailure, ServerSelectionTimeoutError) as e:
    print(f"Connection error: {e}")
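The Atlas connection string shown above embeds the username and password directly in the URI. If either contains reserved characters such as @, : or /, PyMongo requires them to be percent-escaped. The standard library's urllib.parse.quote_plus does exactly that. This is a minimal sketch: the function name build_srv_uri and the example credentials and host are hypothetical.

```python
from urllib.parse import quote_plus


def build_srv_uri(user: str, password: str, host: str) -> str:
    """Build a mongodb+srv:// URI with percent-escaped credentials (hypothetical helper)."""
    # quote_plus escapes reserved characters such as "@", ":" and "/",
    # which would otherwise be parsed as part of the URI structure.
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical credentials and host, for illustration only.
print(build_srv_uri("ana", "p@ss:word/1", "cluster0.example.mongodb.net"))
# mongodb+srv://ana:p%40ss%3Aword%2F1@cluster0.example.mongodb.net/
```

You can pass the result straight to MongoClient. In real code, read the credentials from the environment rather than from literals.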
CRUD Operations with PyMongo
CRUD (Create, Read, Update, Delete) operations form the foundation of any database-driven application. Let us explore each one with practical examples.
Create: Inserting Documents
Use insert_one or insert_many to insert documents:
# Inserting a single document
user = {
    "name": "Ana Silva",
    "email": "ana.silva@example.com",
    "age": 28,
    "skills": ["Python", "MongoDB", "FastAPI"],
    "active": True
}
result = collection.insert_one(user)
print(f"Inserted document ID: {result.inserted_id}")
# Inserting multiple documents
users = [
    {"name": "Carlos", "email": "carlos@example.com", "age": 35},
    {"name": "Maria", "email": "maria@example.com", "age": 42},
    {"name": "John", "email": "john@example.com", "age": 25}
]
results = collection.insert_many(users)
print(f"Inserted IDs: {results.inserted_ids}")
If you do not specify an _id field, MongoDB automatically generates a unique ObjectId. You can also set your own _id if you need a custom identifier.
Read: Querying Documents
Use find_one (returns one document) or find (returns an iterable cursor) to query data:
# Find a single document by field (find_one returns None if nothing matches)
user = collection.find_one({"email": "ana.silva@example.com"})
if user is not None:
    print(user["name"])
# Find all documents with a filter
active_users = collection.find({"active": True})
for user in active_users:
    print(user["name"])
# Projection: select specific fields
users = collection.find(
    {"age": {"$gte": 30}},
    {"name": 1, "email": 1, "_id": 0}
)
# Sorting
sorted_users = collection.find().sort("name", 1)  # 1 ascending, -1 descending
# Limit and skip (pagination)
page = collection.find().sort("name", 1).skip(0).limit(10)
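The skip/limit line above generalizes into a small page helper. This is a sketch: the function names are hypothetical, collection is any PyMongo Collection, and pages are 1-based. The server still walks past every skipped document, so deep pages get progressively slower. For large collections, range-based pagination (filtering on the last value seen of an indexed field) is the usual alternative.

```python
def page_offset(page: int, page_size: int) -> int:
    """Number of documents to skip for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size


def fetch_page(collection, page: int, page_size: int = 10, sort_field: str = "name") -> list:
    """Return one page of documents from a PyMongo collection (hypothetical helper).

    Sorting on _id as a tiebreaker keeps page boundaries stable when
    several documents share the same sort_field value.
    """
    cursor = (
        collection.find()
        .sort([(sort_field, 1), ("_id", 1)])
        .skip(page_offset(page, page_size))
        .limit(page_size)
    )
    return list(cursor)
```

For example, fetch_page(collection, 3, 20) skips the first 40 documents and returns the next 20.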
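MongoDB offers a rich set of query operators such as $gt, $lt, $in, $regex, $exists, and many more. See the query operator reference in the official MongoDB documentation for the full list. Multi-stage processing is the job of the aggregation pipeline, covered later in this guide.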
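Query operators compose as nested dictionaries. The sketch below combines a range, a set-membership test, a field-existence check, and a case-insensitive prefix match. The function name and its parameters are illustrative. re.escape keeps user-supplied text from being interpreted as regex syntax.

```python
import re


def build_user_filter(min_age: int, max_age: int, skills: list[str], name_prefix: str) -> dict:
    """Build a find() filter from several common query operators (hypothetical helper)."""
    return {
        # $gte / $lt: numeric range on "age"
        "age": {"$gte": min_age, "$lt": max_age},
        # $in: matches if the "skills" array contains any of the listed values
        "skills": {"$in": skills},
        # $exists: only documents that actually have an "email" field
        "email": {"$exists": True},
        # $regex: anchored, escaped prefix; $options "i" makes it case-insensitive
        "name": {"$regex": f"^{re.escape(name_prefix)}", "$options": "i"},
    }
```

Usage: collection.find(build_user_filter(30, 65, ["Python", "MongoDB"], "an")).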
Update: Modifying Documents
Use update_one or update_many to modify documents:
# Update a specific document
collection.update_one(
    {"email": "ana.silva@example.com"},
    {"$set": {"age": 29, "active": True}}
)
# Increment a numeric value
collection.update_one(
    {"name": "Carlos"},
    {"$inc": {"age": 1}}
)
# Add an item to an array
collection.update_one(
    {"name": "Ana Silva"},
    {"$push": {"skills": "Docker"}}
)
# Update multiple documents
collection.update_many(
    {"active": False},
    {"$set": {"active": True}}
)
Update operators like $set, $unset, $inc, $push, $pull, and $addToSet allow precise document manipulation without sending the entire document over the network.
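Several update operators can be combined in one update document, provided no two of them target the same field (the server rejects such conflicts). The sketch below mixes $set, $unset and $addToSet, and checks for conflicts up front. The function name and the skills field are illustrative.

```python
from typing import Optional


def build_profile_update(changes: dict, drop_fields: list[str], new_skill: Optional[str] = None) -> dict:
    """Combine $set, $unset and $addToSet into one update document (hypothetical helper)."""
    # Two operators may not target the same field in a single update: reject that early.
    targets = list(changes) + list(drop_fields) + (["skills"] if new_skill is not None else [])
    conflicts = sorted({field for field in targets if targets.count(field) > 1})
    if conflicts:
        raise ValueError(f"fields targeted by more than one operator: {conflicts}")
    update: dict = {}
    if changes:
        update["$set"] = dict(changes)
    if drop_fields:
        # $unset removes each field; the value given ("") is ignored by the server.
        update["$unset"] = {field: "" for field in drop_fields}
    if new_skill is not None:
        # Unlike $push, $addToSet leaves the array unchanged if the value is already present.
        update["$addToSet"] = {"skills": new_skill}
    return update
```

Pass the result to update_one or update_many. Skip the call when the function returns an empty dict, because PyMongo rejects an empty update document.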
Delete: Removing Documents
Use delete_one or delete_many to remove documents:
# Remove a single document
collection.delete_one({"email": "john@example.com"})
# Remove multiple documents
collection.delete_many({"active": False})
# Remove all documents from a collection (careful!)
collection.delete_many({})
# Drop the entire collection, including its indexes
collection.drop()
Aggregation Pipeline
The Aggregation Pipeline is one of MongoDB's most powerful features. It processes documents through sequential stages, similar to Unix command pipelines. Each stage transforms the incoming documents and passes the result to the next stage.
# Assumes the documents have "city" and "age" fields
pipeline = [
    {"$match": {"age": {"$gte": 25}}},
    {"$group": {
        "_id": "$city",
        "total": {"$sum": 1},
        "avg_age": {"$avg": "$age"}
    }},
    {"$sort": {"total": -1}},
    {"$limit": 5}
]
results = collection.aggregate(pipeline)
for result in results:
    print(f"City: {result['_id']}, Total: {result['total']}, Avg age: {result['avg_age']:.1f}")
Common stages include $match (filtering), $group (grouping), $sort (sorting), $project (projection/transformation), $lookup (cross-collection joins), and $unwind (array deconstruction). Mastering the aggregation pipeline is essential for extracting advanced insights from your data.
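To make the stage semantics concrete, here is a pure-Python emulation of the pipeline above, run over a list of dictionaries. It is for illustration only, since the server runs the real pipeline far more efficiently. Like the pipeline, it assumes documents carry city and age fields.

```python
from collections import defaultdict


def emulate_pipeline(docs: list[dict], min_age: int = 25, limit: int = 5) -> list[dict]:
    """Pure-Python mimic of the $match -> $group -> $sort -> $limit pipeline above."""
    # $match: documents without an "age" field never satisfy {"$gte": ...}
    matched = [d for d in docs if "age" in d and d["age"] >= min_age]
    # $group: a missing "city" groups under None, as MongoDB groups it under null
    ages_by_city: defaultdict = defaultdict(list)
    for d in matched:
        ages_by_city[d.get("city")].append(d["age"])
    grouped = [
        {"_id": city, "total": len(ages), "avg_age": sum(ages) / len(ages)}
        for city, ages in ages_by_city.items()
    ]
    # $sort: {"total": -1}
    grouped.sort(key=lambda g: g["total"], reverse=True)
    # $limit
    return grouped[:limit]
```

Each stage only sees the output of the previous one. This is why placing $match first, as in the pipeline above, shrinks the data that $group has to process.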
Indexes and Performance
Indexes are critical for database performance as data grows. Without indexes, MongoDB must scan every document in a collection (collection scan) to find matching documents, which becomes impractical with millions of records.
# Create a single-field index
collection.create_index("age")
# Create a compound index
collection.create_index([("city", 1), ("age", -1)])
# Create a unique index (inserting a repeated email now raises DuplicateKeyError)
collection.create_index("email", unique=True)
# Create a text index for full-text search (a collection can have only one)
collection.create_index([("name", "text"), ("bio", "text")])
# List all indexes
for index in collection.list_indexes():
    print(index)
# Drop an index by name (the default name is <field>_<direction>)
collection.drop_index("email_1")
Use the explain() method to analyze how MongoDB executes your queries and verify that indexes are being used. The official MongoDB indexes documentation explains each index type in detail.
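The document returned by explain() is nested, and its exact shape varies between server versions. A robust check is therefore to walk the winning plan and collect every stage name: a COLLSCAN stage means the query read the whole collection. This is a sketch, and the helper names are hypothetical.

```python
def plan_stages(node) -> list[str]:
    """Recursively collect every "stage" name in part of an explain() document."""
    stages: list[str] = []
    if isinstance(node, dict):
        if isinstance(node.get("stage"), str):
            stages.append(node["stage"])
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages


def has_collection_scan(explain_doc: dict) -> bool:
    """True if the winning plan reads the whole collection (COLLSCAN)."""
    # Only the winning plan matters; rejectedPlans may mention COLLSCAN harmlessly.
    winning_plan = explain_doc.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning_plan)
```

Usage: has_collection_scan(collection.find({"age": {"$gte": 30}}).explain()). An IXSCAN in plan_stages(...) instead indicates that an index was used.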
PyMongo Best Practices
1. Connection Pooling
Create a single MongoClient instance and reuse it throughout your application. The client manages a connection pool internally. Configure maxPoolSize and minPoolSize according to your expected application load.
2. Exception Handling
Always wrap database operations in try/except blocks. In addition to connection exceptions, catch DuplicateKeyError for unique index violations and BulkWriteError for batch operations.
3. Field Projection
Always use projection to return only the fields you need. This reduces network traffic and improves query performance.
# Instead of fetching the entire document...
user = collection.find_one({"email": "ana.silva@example.com"})
# ...project only the required fields (_id is included unless excluded with "_id": 0)
user = collection.find_one(
    {"email": "ana.silva@example.com"},
    {"name": 1, "email": 1}
)
4. Use bulk_write for Batch Operations
When inserting, updating, or deleting many documents, prefer batch operations with bulk_write over individual calls inside a loop. This dramatically reduces the number of round trips to the database.
5. Proper Indexing
Analyze your most frequent queries and create indexes that cover them. Use explain() to identify slow queries and MongoDB Compass for visual performance monitoring.
6. Connection String and Authentication
Always use environment variables or a secure vault to store your MongoDB connection string; never hardcode credentials in your source code. For Atlas connections, use the SRV format (mongodb+srv://), which enables TLS by default. Configure authentication using SCRAM, x.509 certificates, or LDAP, depending on your security requirements.
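A minimal way to follow this advice is to read the whole connection string from an environment variable and fail fast when it is missing. The variable name MONGODB_URI is an assumption, not a MongoDB convention. The optional mapping parameter only exists to make the function easy to test.

```python
import os
from typing import Mapping, Optional


def get_mongo_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the MongoDB connection string from the environment.

    MONGODB_URI is a hypothetical variable name; adjust it to your deployment.
    """
    source = os.environ if env is None else env
    uri = source.get("MONGODB_URI")
    if not uri:
        # Fail loudly instead of silently falling back to hardcoded credentials.
        raise RuntimeError("MONGODB_URI is not set")
    return uri
```

Usage: client = MongoClient(get_mongo_uri()).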
7. Monitoring and Logging
Enable slow query logging in MongoDB to identify bottlenecks. Configure slowms and profile levels to capture queries that exceed a certain threshold. Integrate Python's logging module with PyMongo's event listeners to monitor driver behavior and diagnose issues in production.
import logging
from pymongo import monitoring
logging.basicConfig(level=logging.INFO)
class CommandLogger(monitoring.CommandListener):
    def started(self, event):
        logging.info(f"Command {event.command_name} started on {event.database_name}")
    def succeeded(self, event):
        logging.info(f"Command {event.command_name} succeeded in {event.duration_micros}µs")
    def failed(self, event):
        logging.error(f"Command {event.command_name} failed: {event.failure}")
# Register before creating MongoClient; clients created earlier do not pick up the listener
monitoring.register(CommandLogger())
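On the server side, you can switch on the profiler mentioned above from Python with the profile database command. This sketch assumes a self-managed mongod; managed or sharded deployments may restrict the command. At level 1, the profiler records only operations slower than slowms milliseconds, writing them to that database's system.profile collection. The helper names and the 100 ms threshold are illustrative.

```python
def enable_slow_query_profiling(db, slowms: int = 100) -> dict:
    """Turn on profiler level 1 (slow operations only) for one database.

    `db` is a pymongo Database; the 100 ms default is an arbitrary example.
    """
    # Sends {"profile": 1, "slowms": slowms}; the reply's "was" field is the previous level.
    return db.command("profile", 1, slowms=slowms)


def slowest_operations(db, limit: int = 5) -> list:
    """Read the slowest profiled operations, longest first."""
    return list(db["system.profile"].find().sort("millis", -1).limit(limit))
```

Profiling adds overhead. Set the level back to 0 with db.command("profile", 0) once you have gathered enough data.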
8. Data Validation and Schemas
Even though MongoDB does not require a fixed schema, you can enforce document structure using Schema Validation directly in the database. Define JSON Schema rules per collection to ensure data quality at the database level, complementing validation at the application layer with libraries like Pydantic.
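As a sketch of database-level validation, the function below returns a $jsonSchema validator for the user documents used earlier in this guide. The required fields and bounds are assumptions chosen for illustration.

```python
def user_validator() -> dict:
    """Return a $jsonSchema validator for user documents (illustrative rules)."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "email"],
            "properties": {
                "name": {"bsonType": "string", "description": "must be a string"},
                # Very loose email check; real validation belongs in the application too
                "email": {"bsonType": "string", "pattern": "^.+@.+$"},
                # Python ints are stored as int32 or int64 depending on size
                "age": {"bsonType": ["int", "long"], "minimum": 0, "maximum": 150},
                "skills": {"bsonType": "array", "items": {"bsonType": "string"}},
                "active": {"bsonType": "bool"},
            },
        }
    }
```

PyMongo forwards extra keyword arguments of create_collection to the server's create command. You can therefore attach the validator with db.create_collection("users", validator=user_validator()), or apply it to an existing collection with db.command("collMod", "users", validator=user_validator()). Inserts that violate the schema then fail with pymongo.errors.WriteError.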
Data Modeling with MongoDB
One of the most important decisions when working with MongoDB is how to model your data. Unlike relational databases where normalization is the rule, MongoDB encourages embedding related documents within a single document whenever possible. This reduces the need for joins and improves read performance.
For example, in an e-commerce system, instead of having separate orders and items tables, you can embed items directly in the order document:
from datetime import datetime, timezone
from bson import ObjectId  # bson ships with PyMongo
order = {
    "_id": ObjectId(),
    "customer": "Maria Souza",
    "date": datetime.now(timezone.utc),
    "total": 250.00,
    "items": [
        {"product": "Mechanical Keyboard", "qty": 1, "price": 200.00},
        {"product": "Mouse Pad", "qty": 1, "price": 50.00}
    ],
    "shipping_address": {
        "street": "Av. Paulista",
        "number": 1000,
        "city": "São Paulo",
        "zip": "01310-100"
    }
}
The general rule is: embed when the relationship is a "contains" type (one-to-few) and reference when it is many-to-many or when the related data is large and accessed independently. MongoDB also supports $lookup for joins between collections when needed.
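Here is a sketch of the referencing side of that rule. Orders store only the customer's _id, and a $lookup stage joins the customer document back in at read time. The customers collection and the customer_id field are hypothetical.

```python
def order_with_customer_pipeline(order_id) -> list[dict]:
    """Build an aggregation pipeline that joins one order with its customer."""
    return [
        {"$match": {"_id": order_id}},
        {
            "$lookup": {
                "from": "customers",          # collection holding the referenced documents
                "localField": "customer_id",  # field in the order document
                "foreignField": "_id",        # field in the customer document
                "as": "customer",             # output field (always an array)
            }
        },
        # Turn the one-element array into a single embedded document
        {"$unwind": "$customer"},
    ]
```

Usage: list(db["orders"].aggregate(order_with_customer_pipeline(order_id))). Note that $unwind drops orders with no matching customer unless you pass {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}}.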
Transactions and Atomicity
Although MongoDB is known for its flexibility, it also supports multi-document transactions for scenarios that require atomicity. A transaction ensures that multiple operations execute as a single unit: either all are applied, or none are. This is essential in financial systems, booking platforms, and any application requiring strict consistency.
With PyMongo, you can use multi-document transactions on replica sets starting from MongoDB 4.0, and on sharded clusters starting from MongoDB 4.2. Standalone servers do not support them:
from pymongo import MongoClient
# Transactions need a replica set or sharded cluster, not a standalone mongod
client = MongoClient()
db = client["bank"]
accounts = db["accounts"]
transactions = db["transactions"]
with client.start_session() as session:
    with session.start_transaction():
        accounts.update_one(
            {"account": "A"}, {"$inc": {"balance": -100}}, session=session
        )
        accounts.update_one(
            {"account": "B"}, {"$inc": {"balance": 100}}, session=session
        )
        transactions.insert_one({
            "from": "A", "to": "B", "amount": 100
        }, session=session)
        # If any operation raises, the transaction is aborted and nothing is applied;
        # leaving the block normally commits it
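PyMongo also offers ClientSession.with_transaction. It runs a callback inside a transaction and retries on transient errors, such as write conflicts, before committing. Here is a sketch of the same transfer in that style. The helper name make_transfer is hypothetical, and it still requires a replica set or sharded cluster.

```python
def make_transfer(accounts, ledger, src: str, dst: str, amount: int):
    """Return a callback suitable for ClientSession.with_transaction (hypothetical helper)."""

    def callback(session):
        # Every operation must receive the session in order to join the transaction.
        accounts.update_one({"account": src}, {"$inc": {"balance": -amount}}, session=session)
        accounts.update_one({"account": dst}, {"$inc": {"balance": amount}}, session=session)
        ledger.insert_one({"from": src, "to": dst, "amount": amount}, session=session)

    return callback
```

Inside a with client.start_session() as session: block, call session.with_transaction(make_transfer(accounts, transactions, "A", "B", 100)). Because the callback may be run more than once, it should not have side effects outside the database.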
Use transactions sparingly, as they come with a performance cost. Prefer atomic operations on a single document whenever possible. MongoDB guarantees document-level atomicity by default, which covers most use cases.
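As an example of relying on single-document atomicity, a withdrawal can check and decrement the balance in one update_one call. The filter only matches when the funds are sufficient, so there is no window between reading and writing. This is a sketch: the function name is hypothetical, and accounts is a PyMongo Collection.

```python
def withdraw(accounts, account: str, amount: int) -> bool:
    """Atomically decrement a balance only if it covers the amount (hypothetical helper)."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    result = accounts.update_one(
        # The balance condition and the $inc are applied together, atomically.
        {"account": account, "balance": {"$gte": amount}},
        {"$inc": {"balance": -amount}},
    )
    # matched_count is 0 when the account is missing or the balance is too low.
    return result.matched_count == 1
```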
MongoDB Ecosystem Tools
The MongoDB ecosystem offers several tools that simplify database development and administration. MongoDB Compass is the official GUI for visualizing and manipulating data, creating indexes, and analyzing query performance visually. MongoDB Shell (mongosh) is the interactive terminal for advanced administration. mongod is the database server process itself, while mongos is the query router that sits between applications and the shards of a sharded cluster.
For larger projects, MongoDB Charts lets you create data visualizations directly from the database. MongoDB Realm (later renamed Atlas Device Sync, which MongoDB has since deprecated) provided real-time synchronization for mobile applications. In production environments, configure monitoring with MongoDB Ops Manager or Atlas Monitoring to track metrics like memory usage, operations per second, and query latency.
Beyond PyMongo, there are other Python libraries that abstract MongoDB interaction. MongoEngine is an ODM (Object Document Mapper), similar to an ORM, that lets you define schemas with Python classes. Beanie is an async ODM based on Pydantic that integrates seamlessly with FastAPI. For data analysis, you can load query results into a pandas DataFrame with pd.DataFrame(list(collection.find())).
# MongoDB integration with pandas
import pandas as pd
from pymongo import MongoClient
client = MongoClient()
db = client["sales"]
collection = db["orders"]
# Load MongoDB data into a DataFrame (excluding _id, which would become an ObjectId column)
df = pd.DataFrame(list(collection.find({"status": "delivered"}, {"_id": 0})))
if not df.empty:
    print(f"Total sales: ${df['total'].sum():.2f}")
    print(f"Average per order: ${df['total'].mean():.2f}")
Practical Project: Library Management System
Let us consolidate everything we have learned with a practical project: a simple library management system with operations for adding books, recording loans, and generating reports.
from datetime import datetime, timedelta, timezone
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError
class Library:
    def __init__(self):
        self.client = MongoClient("mongodb://localhost:27017/")
        self.db = self.client["library"]
        self.books = self.db["books"]
        self.loans = self.db["loans"]
        self._create_indexes()
    def _create_indexes(self):
        # create_index is a no-op if an identical index already exists
        self.books.create_index("isbn", unique=True)
        self.books.create_index([("title", "text"), ("author", "text")])
        self.loans.create_index("book_id")
    def add_book(self, title, author, isbn, year, copies=1):
        book = {
            "title": title,
            "author": author,
            "isbn": isbn,
            "year": year,
            "copies_available": copies,
            "copies_total": copies
        }
        try:
            self.books.insert_one(book)
            return f"Book '{title}' added successfully."
        except DuplicateKeyError:
            return f"ISBN {isbn} already exists in the system."
    def record_loan(self, isbn, user):
        book = self.books.find_one({"isbn": isbn})
        if not book or book["copies_available"] <= 0:
            return "Book is not available for loan."
        now = datetime.now(timezone.utc)
        loan = {
            "book_id": book["_id"],
            "user": user,
            "loan_date": now,
            "due_date": now + timedelta(days=14),
            "returned": False
        }
        self.loans.insert_one(loan)
        self.books.update_one(
            {"_id": book["_id"]},
            {"$inc": {"copies_available": -1}}
        )
        return f"Loan recorded for {user}. Due in 14 days."
    def most_loaned_books(self, limit=5):
        pipeline = [
            {"$group": {
                "_id": "$book_id",
                "total_loans": {"$sum": 1}
            }},
            {"$sort": {"total_loans": -1}},
            {"$limit": limit},
            {"$lookup": {
                "from": "books",
                "localField": "_id",
                "foreignField": "_id",
                "as": "book"
            }},
            {"$unwind": "$book"},
            {"$project": {
                "title": "$book.title",
                "author": "$book.author",
                "total_loans": 1
            }}
        ]
        return list(self.loans.aggregate(pipeline))
# Example usage
library = Library()
library.add_book("Python for Data Analysis", "Wes McKinney", "978-1-449-35901-7", 2023, 3)
library.add_book("MongoDB: The Definitive Guide", "Kristina Chodorow", "978-1-449-34480-9", 2022, 2)
print(library.record_loan("978-1-449-35901-7", "Carlos"))
print(library.most_loaned_books())
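One weakness of record_loan above is that it reads the book and decrements copies_available in two separate steps, so two simultaneous requests could both claim the last copy. Below is a sketch of an atomic variant using find_one_and_update, which checks availability and decrements in a single operation. The function name is hypothetical; it takes the library's books and loans collections as arguments.

```python
from datetime import datetime, timedelta, timezone


def record_loan_atomic(books, loans, isbn: str, user: str, days: int = 14) -> str:
    """Reserve a copy and record the loan without a read-then-write race."""
    # Matches only if a copy is available, and decrements in the same operation.
    # By default the document is returned as it was before the update.
    book = books.find_one_and_update(
        {"isbn": isbn, "copies_available": {"$gt": 0}},
        {"$inc": {"copies_available": -1}},
    )
    if book is None:
        return "Book is not available for loan."
    now = datetime.now(timezone.utc)
    loans.insert_one({
        "book_id": book["_id"],
        "user": user,
        "loan_date": now,
        "due_date": now + timedelta(days=days),
        "returned": False,
    })
    return f"Loan recorded for {user}. Due in {days} days."
```

Usage: record_loan_atomic(library.books, library.loans, "978-1-449-35901-7", "Carlos"). The copy reservation and the loan insert are still two writes. If both must succeed or fail together, wrap them in a transaction.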
Conclusion
The integration between Python and MongoDB with PyMongo is a powerful and versatile combination for building modern applications. In this guide, you learned everything from installation to a complete project, covering CRUD operations, aggregation pipeline, indexes, and best practices. The MongoDB ecosystem is vast and offers features like Change Streams, Transactions, and GridFS for large files. To go deeper, take the free courses at MongoDB University and follow the official MongoDB blog.
If you have not set up your Python environment yet, check out our tutorial on virtual environments with venv to keep your dependencies organized. MongoDB is constantly evolving, with new features being added in every major release, so keep an eye on the official changelog and community forums to stay up to date. With these skills, you are ready to build scalable and flexible Python applications with MongoDB!