Large Language Models (LLMs) have radically transformed how we build software. Integrating generative artificial intelligence into Python applications has never been more accessible, nor more in demand. Whether you want to build intelligent chatbots, semantic search systems, virtual assistants, or automated analysis tools, knowing how to work with LLMs in Python is one of the most valuable skills a developer can have right now.
In this complete guide, you'll go from fundamental concepts to the practical implementation of applications built on models like GPT-4 and Llama, plus embedding models, using the core libraries of the Python ecosystem.
What Are Large Language Models?
LLMs are artificial intelligence models trained on massive volumes of text — books, articles, source code, and web pages. They learn statistical patterns of human language and can generate coherent text, translate languages, summarize documents, write code, and much more.
Popular examples include GPT-4 (OpenAI), Claude (Anthropic), Llama (Meta), and Gemini (Google). Most of these models are accessible via API, and Python is the go-to language for interacting with them.
Why Use Python with LLMs?
Python became the standard language for AI and machine learning for three main reasons:
- Mature ecosystem: Libraries like `openai`, `langchain`, and `transformers` make LLM integration straightforward
- Active community: Thousands of tutorials, examples, and ready-to-use packages
- Flexibility: From simple scripts to complex web applications with FastAPI or Django
According to the Stack Overflow Survey 2025, Python continues to be the fastest-growing language among developers, driven precisely by AI demand.
Setting Up the Environment
Before you start, install the essential libraries:
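```bash
# The RAG and agent examples use LangChain 0.x APIs (RetrievalQA, AgentExecutor)
pip install openai "langchain<1.0" langchain-openai langchain-community langchain-text-splitters chromadb tiktoken python-dotenv
# Only needed for the Hugging Face section
pip install transformers torch
```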
Create a .env file in your project root to store your API keys securely:
```
OPENAI_API_KEY=your_key_here
```
Never share your key or commit it to a public repository. Listing `.env` in your `.gitignore` keeps it out of version control. Check the OpenAI official documentation to get your key.
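`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file tends to show up later as a more generic error from the client. A small stdlib-only helper can fail fast with a clear message instead. `require_env` is a hypothetical name, and the sketch assumes `load_dotenv()` has already been called.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

With it, the client setup becomes `client = OpenAI(api_key=require_env("OPENAI_API_KEY"))`.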
First Call to the OpenAI API
Let's make our first request to an LLM using Python:
```python
from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a Python expert assistant."},
        {"role": "user", "content": "Explain what a decorator is in Python."}
    ]
)

print(response.choices[0].message.content)
```
This pattern (a system prompt followed by a user message) is the foundation of all chat model interactions. The `model` parameter selects which model handles the request. Browse all available models in the OpenAI official model catalog.
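Chat models don't remember earlier requests. Each call must carry the whole conversation, so a multi-turn chat is just a growing `messages` list. The sketch below is a minimal version of that idea. `chat_turn` is a hypothetical helper, and the client is passed in as a parameter (any object with an OpenAI-style `chat.completions.create`), which also makes it easy to test with a stand-in.

```python
def chat_turn(client, history, user_message, model="gpt-4"):
    """Send one user turn and record the assistant's reply in `history`.

    `history` is a list of {"role": ..., "content": ...} dicts, usually starting
    with a system message. It is mutated in place so the next call includes
    the full conversation.
    """
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(model=model, messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


# Usage with the client from the previous example:
# history = [{"role": "system", "content": "You are a Python expert assistant."}]
# chat_turn(client, history, "Explain what a decorator is in Python.")
# chat_turn(client, history, "Now write one that times a function.")  # sees turn 1
```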
Working with Embeddings
Embeddings are vector representations of text that capture semantic meaning. They are the foundation for semantic search, recommendation, and clustering systems.
```python
from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

text = "Python is a versatile and powerful programming language"

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text
)

vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")
```
By default, the text-embedding-3-small model generates 1536-dimensional vectors (its `dimensions` parameter can request shorter ones). These vectors can be stored in vector databases like ChromaDB or Pinecone for semantic similarity searches. The Real Python article on embeddings dives deeper into this concept.
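Semantic similarity between two embeddings is usually measured with cosine similarity. The sketch below implements that formula without dependencies. In practice you would compare vectors returned by `client.embeddings.create`, or let the vector database do it for you. OpenAI documents its embeddings as normalized to length 1, so for them a plain dot product gives the same ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```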
Building a RAG System (Retrieval-Augmented Generation)
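RAG is one of the most widely used LLM architectures. It combines information retrieval with text generation: relevant passages are retrieved from your own documents and passed to the model as context. This lets the model answer questions about your company's material even though it was never trained on it.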
```python
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

load_dotenv()  # OpenAIEmbeddings and ChatOpenAI read OPENAI_API_KEY

# Load document
loader = TextLoader("document.txt")
documents = loader.load()

# Split into chunks of up to 1000 characters, overlapping by 200
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = splitter.split_documents(documents)

# Create vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectordb = Chroma.from_documents(chunks, embeddings)

# Set up RAG
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever()
)

# Ask: RetrievalQA returns a dict, and the answer text is under "result"
answer = qa_chain.invoke({"query": "What is the main goal of this document?"})
print(answer["result"])
```
This flow (load, split, embed, store, and query) is the standard pattern for applications that need to answer questions over private documents. The official LangChain RAG documentation provides advanced examples.
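Conceptually, the retrieval step ranks the stored chunks by similarity to the question and inserts the best ones into the prompt. The sketch below shows that idea without dependencies. It is a simplification, not LangChain's actual implementation. `retrieve` and `build_rag_prompt` are hypothetical helpers, and the 2-dimensional vectors stand in for real embeddings.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec.

    `chunks` is a list of (text, vector) pairs, e.g. produced by an embedding model.
    """
    ranked = sorted(chunks, key=lambda item: _cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question, context_chunks):
    """Build chat messages that ask the model to answer only from the given context."""
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system",
         "content": "Answer using only the context below. "
                    "If the answer is not in the context, say you don't know.\n\n"
                    f"Context:\n{context}"},
        {"role": "user", "content": question},
    ]


# Toy data: hypothetical chunks with 2-dimensional vectors instead of real embeddings
store = [
    ("Refunds are processed within 5 business days.", [0.9, 0.1]),
    ("Our office is located in Lisbon.", [0.1, 0.9]),
    ("Refund requests require an order number.", [0.8, 0.3]),
]
top = retrieve([1.0, 0.2], store, k=2)
messages = build_rag_prompt("How do refunds work?", top)
```

The resulting `messages` list can be sent to `client.chat.completions.create` like any other conversation.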
Using Open-Source Models with Hugging Face
You don't always need paid APIs. Open-weight models like Llama 3, Mistral, and Gemma can run locally with the `transformers` library:
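```python
from transformers import pipeline

# Llama models on the Hugging Face Hub are gated: accept the license on the
# model page and authenticate (e.g. `huggingface-cli login`) before downloading
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B",
    device=0  # GPU index; use -1 for CPU
)

result = generator(
    "Explain the concept of object-oriented programming:",
    max_new_tokens=200,  # limit on generated tokens, excluding the prompt
    do_sample=True,      # sampling must be enabled for temperature to apply
    temperature=0.7
)
print(result[0]["generated_text"])
```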
The Hugging Face Transformers ecosystem offers thousands of pre-trained models for classification, generation, translation, and more.
Streaming Responses
For a more natural experience, stream model responses and display text as it's being generated:
```python
from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Write a poem about Python."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
Streaming is essential for chatbots and virtual assistants, where perceived latency matters: users see the first words almost immediately instead of waiting for the full response. The official OpenAI Python SDK supports streaming natively.
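In a real chatbot you usually also need the complete reply, for example to append it to the conversation history. The sketch below prints each fragment as it arrives and returns the joined text. `collect_stream` is a hypothetical helper. It skips fragments with no text, such as the final chunk. It also skips chunks with an empty `choices` list, which OpenAI sends as the last chunk when usage reporting is enabled for streams.

```python
def collect_stream(stream, echo=True):
    """Print streamed text fragments as they arrive and return the full reply.

    `stream` is any iterable of chat-completion chunks shaped like the
    OpenAI SDK's (chunk.choices[0].delta.content).
    """
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a usage-only chunk
            continue
        text = chunk.choices[0].delta.content
        if text:  # skip None and empty strings
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)
```

With the `stream` from the example above, `full_reply = collect_stream(stream)` both displays and keeps the answer.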
Function Calling with LLMs
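Function calling (also called tool calling) lets an LLM request that your Python functions be run based on user input. The model doesn't execute anything itself. It returns the name of a function and JSON-encoded arguments, and your code performs the call. Instead of just generating text, the model can therefore decide when to call APIs, query databases, or perform specific actions: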
```python
from openai import OpenAI
import json
import os
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_weather(city):
    # Weather lookup simulation
    return {"city": city, "temperature": 22, "unit": "celsius"}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the temperature in New York?"}],
    tools=tools,
    tool_choice="auto"
)

# The model only requests the call; our code executes it
if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    if call.function.name == "get_weather":
        args = json.loads(call.function.arguments)
        result = get_weather(args["city"])
        print(result)
```
This pattern turns LLMs into true system orchestrators, allowing the model to dynamically decide which tools to use to complete a task. It is the foundation for building autonomous agents that interact with the real world.
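The example above stops after running the function. To get a natural-language answer, the usual next step has two parts. First, send the function's result back as a `tool` message that references the call's `id`. Then request a second completion. The sketch below follows those steps. `answer_with_tools` and the `registry` dict are hypothetical names, and the client is passed in so it works with the `client`, `tools`, and `get_weather` defined above.

```python
import json


def answer_with_tools(client, messages, tools, registry, model="gpt-4"):
    """Run one round of tool calls and return the model's final text answer.

    registry maps tool names to Python callables, e.g. {"get_weather": get_weather}.
    Only functions listed there can be executed.
    """
    first = client.chat.completions.create(
        model=model, messages=messages, tools=tools, tool_choice="auto"
    )
    message = first.choices[0].message
    if not message.tool_calls:
        return message.content  # the model answered directly

    # Echo the assistant's tool-call request back into the conversation
    messages.append({
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {
                "id": call.id,
                "type": "function",
                "function": {"name": call.function.name,
                             "arguments": call.function.arguments},
            }
            for call in message.tool_calls
        ],
    })
    # Execute each requested function and attach its result
    for call in message.tool_calls:
        func = registry[call.function.name]
        result = func(**json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })

    second = client.chat.completions.create(model=model, messages=messages, tools=tools)
    return second.choices[0].message.content
```

Usage: `answer_with_tools(client, [{"role": "user", "content": "What is the temperature in New York?"}], tools, {"get_weather": get_weather})`.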
Agents with LangChain
LangChain takes the agent concept to a higher level, providing complete infrastructure for creating autonomous systems with access to multiple tools, memory, and planning capabilities:
```python
from datetime import date

from dotenv import load_dotenv
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

load_dotenv()

@tool
def calculate_age(name: str, birth_year: int) -> str:
    """Calculates a person's age."""
    # Use the current year instead of a hard-coded one
    age = date.today().year - birth_year
    return f"{name} is {age} years old."

@tool
def greeting(name: str) -> str:
    """Generates a personalized greeting."""
    return f"Hello, {name}! Welcome to Universo Python."

tools = [calculate_age, greeting]
llm = ChatOpenAI(model="gpt-4", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    # Where the agent's intermediate tool calls and results are inserted
    ("placeholder", "{agent_scratchpad}")
])

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "Calculate the age of John who was born in 1990 and then say hello to him."
})
print(result["output"])
```
LangChain agents can power anything from simple assistants to complex multi-agent systems, where each agent has a specialty and the agents collaborate to solve problems.
Best Practices with LLMs
1. Efficient System Prompts
The system prompt defines model behavior. Be specific: set the tone, response format, and clear constraints.
2. Temperature Control
The temperature parameter (0.0 to 2.0 in the OpenAI API) controls how random the model's sampling is. Use low values (0.0-0.3) for factual tasks and higher values (0.7-1.0) for creative ones.
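In the usual formulation, temperature divides the model's raw scores (logits) before a softmax turns them into probabilities. Low values sharpen the distribution toward the top token, and high values flatten it. The sketch below uses made-up logits for three candidate tokens, not values from any real model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature (must be > 0).

    A temperature of 0 is typically treated as greedy decoding (always pick
    the top token) rather than computed by dividing by zero.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```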
3. Token Management
Longer inputs consume more tokens, which raises both cost and latency. Use chunking to split large texts, and trim chat history so it stays within the model's context window.
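One common way to keep chat history within the context window works like this: always keep the system message, then drop the oldest turns until the rest fits a token budget. The sketch below is a minimal version under stated assumptions. `trim_history` is a hypothetical helper, and the default counter uses the rough "about four characters per token" rule of thumb for English. For exact counts with OpenAI models, you could pass a `tiktoken`-based counter instead, e.g. `enc = tiktoken.encoding_for_model("gpt-4")` with `lambda s: len(enc.encode(s))`. Per-message formatting overhead is ignored here.

```python
def approx_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens, count_tokens=approx_tokens):
    """Keep the first system message plus the most recent messages within max_tokens.

    Assumes every message has string content. Walks the conversation from
    newest to oldest and stops at the first message that would exceed the
    budget, so the kept turns stay contiguous.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)

    kept = []
    for message in reversed(rest):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))
```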
4. Embedding Caching
Avoid generating repeated embeddings for the same text. Store vectors in a persistent vector database like ChromaDB or Qdrant.
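A simple first step, before or alongside a vector database, is a cache keyed by a hash of the model name and the text. With it, the embedding API is only called for text it hasn't seen yet. The sketch below is a minimal version. `CachedEmbedder` is a hypothetical class, and `embed_fn` stands for whatever function calls your embedding model. To survive restarts, the same keys could index a persistent store instead of the in-memory dict.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Wrap an embedding function so identical texts are embedded only once."""

    def __init__(self, embed_fn: Callable[[str], list[float]], model: str):
        self.embed_fn = embed_fn
        self.model = model
        self._cache: dict[str, list[float]] = {}
        self.misses = 0  # number of real calls to embed_fn

    def _key(self, text: str) -> str:
        # Include the model name: the same text embeds differently per model
        return hashlib.sha256(f"{self.model}\x00{text}".encode("utf-8")).hexdigest()

    def embed(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._cache:
            self._cache[key] = self.embed_fn(text)
            self.misses += 1
        return self._cache[key]


# Example wrapper for the OpenAI client shown earlier:
# def openai_embed(text):
#     return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
# embedder = CachedEmbedder(openai_embed, model="text-embedding-3-small")
```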
5. Input and Output Validation
Always validate what the model receives and produces. Use guardrails to detect prompt injection attempts and inappropriate content, keeping in mind that no filter catches everything. The TensorFlow tutorials and other ML tools can help validate AI pipelines.
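A concrete form of output validation is to ask the model for JSON and reject any reply that doesn't parse or match the expected shape. OpenAI's `response_format={"type": "json_object"}` option, on models that support it, makes valid JSON more likely, but checking the structure is still up to your code. The sketch below uses only the standard library. The function name, field names, and label set are illustrative assumptions. Libraries such as Pydantic offer richer schema validation.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}  # illustrative label set


def validate_review_analysis(raw: str) -> dict:
    """Parse the model's raw reply and check it matches the expected structure.

    Expected shape (an assumption for this example):
    {"sentiment": "positive" | "negative" | "neutral", "summary": "<string>"}
    Raises ValueError instead of passing malformed output downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment label: {sentiment!r}")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("Missing or empty 'summary'")
    return {"sentiment": sentiment, "summary": summary.strip()}
```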
Practical Applications
Customer Support Chatbot
Use RAG with support documents to build a chatbot that answers accurately about your company's products or services.
Sentiment Analyzer
With well-structured prompts, LLMs can classify sentiment in reviews, comments, and social media with high accuracy.
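A "well-structured prompt" here usually means two things. The first is a specific system prompt that fixes the label set and output format. The second is one or two worked examples (few-shot prompting). The sketch below shows how such a message list might be assembled. The wording, labels, examples, and function names are assumptions. The resulting list can be passed as `messages` to `client.chat.completions.create`.

```python
SYSTEM_PROMPT = (
    "You are a sentiment classifier for product reviews. "
    "Reply with exactly one word: positive, negative, or neutral."
)

# Hypothetical few-shot examples showing the expected input/output pairs
EXAMPLES = [
    ("The battery lasts all day, love it!", "positive"),
    ("Stopped working after a week.", "negative"),
]


def build_sentiment_messages(review: str) -> list[dict]:
    """Assemble system prompt, few-shot examples, and the review to classify."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for text, label in EXAMPLES:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": review})
    return messages


def normalize_label(reply: str) -> str:
    """Map the model's reply to a known label, or 'unknown' if it drifts off-format."""
    word = reply.strip().lower().rstrip(".!")
    return word if word in {"positive", "negative", "neutral"} else "unknown"
```

Returning `"unknown"` rather than guessing a label makes off-format replies easy to count and review.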
Code Assistant
Build a tool that helps developers write, review, and document code using models like GPT-4 or Llama.
Intelligent Translator
LLMs can outperform traditional machine translation systems in specific contexts, preserving tone, style, and cultural nuances.
These applications can be served through web APIs built with FastAPI, combining the capabilities of LLMs with the performance of an asynchronous framework.
LLMs vs Traditional ML Models
LLMs don't replace all machine learning models. For tasks like image classification, specialized models such as convolutional networks are often still the better choice. For natural language tasks, however, LLMs offer unprecedented flexibility, often eliminating the need to train task-specific models.
A hybrid approach — combining LLMs with libraries like Pandas for preprocessing — is often the most effective in real-world projects. Check our complete Pandas for data analysis guide to complement your AI projects.
Limitations and Caveats
- Hallucinations: LLMs can confidently generate false information. Always validate critical outputs.
- Cost: Large models like GPT-4 have per-token costs. Optimize your prompts to reduce expenses.
- Privacy: Data sent to external APIs may not be completely private. For sensitive data, prefer local models.
- Bias: Models reflect training data biases. Watch for prejudiced or discriminatory results.
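To keep an eye on cost, you can estimate each request's cost from the token counts the API reports. In the OpenAI SDK these are `response.usage.prompt_tokens` and `response.usage.completion_tokens`. The prices below are placeholders, not real rates. Check the provider's current pricing page and substitute actual values.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in USD per 1 million tokens (placeholder values, not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Estimated cost in USD for one request."""
    return (
        prompt_tokens * pricing.input_per_million
        + completion_tokens * pricing.output_per_million
    ) / 1_000_000


# Hypothetical prices for illustration only
example_pricing = Pricing(input_per_million=1.0, output_per_million=2.0)
# With a real response:
# estimate_cost(response.usage.prompt_tokens, response.usage.completion_tokens, example_pricing)
print(f"${estimate_cost(1_200, 300, example_pricing):.4f}")
```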
To understand current limitations further, check the Real Python AI guide, which covers both technical and ethical aspects.
Next Steps
Now that you've mastered the fundamentals of Python with LLMs, explore these directions:
- Build a multi-agent system using LangChain Agents and external tools
- Implement fine-tuning with the OpenAI API for specific tasks
- Create multimodal applications that process text, image, and audio simultaneously
- Deploy your application as a REST API with authentication and rate limiting
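For the fine-tuning direction, OpenAI's chat fine-tuning expects a JSONL file in which each line holds one example conversation under a `messages` key. The sketch below writes such a file. The example conversations are made up, and `write_chat_jsonl` is a hypothetical helper. Check OpenAI's fine-tuning guide for the current requirements, such as the minimum number of examples and which models support fine-tuning.

```python
import json

# Hypothetical training examples: each one is a complete short conversation
examples = [
    [
        {"role": "system", "content": "You answer Python questions in one sentence."},
        {"role": "user", "content": "What does len() return?"},
        {"role": "assistant", "content": "The number of items in a container."},
    ],
    [
        {"role": "system", "content": "You answer Python questions in one sentence."},
        {"role": "user", "content": "What is a list comprehension?"},
        {"role": "assistant", "content": "A concise syntax for building a list from an iterable."},
    ],
]


def write_chat_jsonl(conversations, path):
    """Write one {"messages": [...]} JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for messages in conversations:
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")


# Usage: write_chat_jsonl(examples, "train.jsonl")
```

The file is then uploaded with `client.files.create(..., purpose="fine-tune")` and referenced when creating a job with `client.fine_tuning.jobs.create`.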
Integrating Large Language Models with Python is the most exciting technology frontier today. With the tools and knowledge from this guide, you're ready to build intelligent applications that once seemed like science fiction. Start with small projects, experiment with different models, and most importantly, stay updated — this field evolves in weeks, not years.
Keep following Universo Python for more content on artificial intelligence, web development, and best practices in the most versatile language on the market!