Large Language Models (LLMs) have radically transformed how we build software. Integrating generative artificial intelligence into Python applications has never been more accessible, nor more in demand. Whether you want to build intelligent chatbots, semantic search systems, virtual assistants, or automated analysis tools, working with LLMs in Python is one of the most valuable skills a developer can have right now.

In this complete guide, you'll learn everything from fundamental concepts to practical implementation of applications with models like GPT-4, Llama, and embeddings, using the core libraries of the Python ecosystem.

What Are Large Language Models?

LLMs are artificial intelligence models trained on massive volumes of text — books, articles, source code, and web pages. They learn statistical patterns of human language and can generate coherent text, translate languages, summarize documents, write code, and much more.

Popular examples include GPT-4 (OpenAI), Claude (Anthropic), Llama (Meta), and Gemini (Google). Most of these models are accessible via API, and Python is the go-to language for interacting with them.

Why Use Python with LLMs?

Python became the standard language for AI and machine learning for three main reasons:

  • Mature ecosystem: Libraries like openai, langchain, and transformers make LLM integration straightforward
  • Active community: Thousands of tutorials, examples, and ready-to-use packages
  • Flexibility: From simple scripts to complex web applications with FastAPI or Django

According to the 2025 Stack Overflow Developer Survey, Python usage continues to grow strongly among developers, driven in large part by AI demand.

Setting Up the Environment

Before you start, install the essential libraries:

pip install openai langchain langchain-openai langchain-community chromadb tiktoken python-dotenv transformers torch

Create a .env file in your project root to store your API keys securely:

OPENAI_API_KEY=your_key_here

Never share your key or commit it to public repositories. Check the OpenAI official documentation to get your key.
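One way to catch a missing key early is to check for it at startup instead of letting the first API call fail. The sketch below is only illustrative: require_api_key and mask are hypothetical helpers, not part of the OpenAI SDK or python-dotenv, and the lookup accepts any mapping so it can be tried without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = "OPENAI_API_KEY",
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under `name`, or raise a clear error.

    `env` defaults to os.environ; call load_dotenv() beforehand if the
    value lives in a .env file.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key or key == "your_key_here":  # placeholder from the .env template
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safer to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demonstration with a fake key passed in explicitly.
demo_key = require_api_key(env={"OPENAI_API_KEY": "sk-test-1234"})
print(mask(demo_key))
```

Logging only the masked form keeps the full key out of log files and terminal history.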

First Call to the OpenAI API

Let's make our first request to an LLM using Python:

from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a Python expert assistant."},
        {"role": "user", "content": "Explain what a decorator is in Python."},
    ],
)

print(response.choices[0].message.content)

This pattern — system prompt + user message — is the foundation of all chat model interactions. The model parameter defines which GPT version to use. Browse all available models in the OpenAI official model catalog.
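Because every request repeats the same system-plus-user structure, it can help to assemble the message list in one place. The following is a minimal sketch; build_messages is a hypothetical helper introduced here for illustration, and the optional history argument is an assumption about how you might carry earlier turns.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(system_prompt: str,
                   user_message: str,
                   history: Optional[List[Message]] = None) -> List[Message]:
    """Assemble a chat payload: system prompt, prior turns, then the new user message."""
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages(
    "You are a Python expert assistant.",
    "Explain what a decorator is in Python.",
)
# The result can be passed as the `messages` argument of
# client.chat.completions.create(...).
print(messages)
```

Keeping the system prompt as the first element means later turns can be appended without disturbing the instructions that shape the model's behavior.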

Working with Embeddings

Embeddings are vector representations of text that capture semantic meaning. They are the foundation for semantic search, recommendation, and clustering systems.

from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

text = "Python is a versatile and powerful programming language"

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text,
)

vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")

The text-embedding-3-small model generates 1536-dimensional vectors by default. These vectors can be stored in vector databases like ChromaDB or Pinecone for semantic similarity searches. The Real Python article on embeddings dives deeper into this concept.
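Semantic similarity between two embeddings is commonly measured with cosine similarity. The sketch below implements it in plain Python; the three-dimensional vectors are made-up stand-ins for real embeddings, chosen only to keep the example readable.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used in this sketch for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 1536-dimensional embeddings.
python_doc = [0.9, 0.1, 0.0]
coding_doc = [0.8, 0.2, 0.1]
cooking_doc = [0.0, 0.2, 0.9]

print(cosine_similarity(python_doc, coding_doc))   # close to 1: similar topics
print(cosine_similarity(python_doc, cooking_doc))  # close to 0: unrelated topics
```

Vector databases perform this kind of comparison at scale, usually with approximate indexes rather than an exhaustive loop.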

Building a RAG System (Retrieval-Augmented Generation)

RAG is one of the most powerful architectures with LLMs. It combines information retrieval with text generation, allowing the model to answer based on your specific company documents.

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# Load the document
loader = TextLoader("document.txt")
documents = loader.load()

# Split into overlapping chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(documents)

# Create the vector database
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectordb = Chroma.from_documents(chunks, embeddings)

# Set up the RAG chain
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(),
)

# Ask a question; the chain returns a dict with the answer under "result"
answer = qa_chain.invoke({"query": "What is the main goal of this document?"})
print(answer["result"])

This flow — load, split, embed, store, and query — is the gold standard for applications that need to reason over private documents. The official LangChain RAG documentation provides advanced examples.
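To see what the retrieve-then-generate step does conceptually, here is a library-free sketch: rank stored chunks by cosine similarity to the query vector, then paste the best ones into the prompt. The helper names, the sample texts, and the tiny hand-written vectors are all invented for illustration; this is not how LangChain implements the chain internally.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: _cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Place the retrieved chunks in the prompt as numbered context."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up chunks with toy vectors in place of real embeddings.
store = [
    ("Refunds are processed within 5 business days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.0]),
    ("Refund requests require the order number.", [0.8, 0.0, 0.2]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the question

chunks = top_k(query_vec, store, k=2)
prompt = build_rag_prompt("How do refunds work?", chunks)
print(prompt)
```

Instructing the model to admit when the context lacks an answer is a common way to reduce invented replies, though it does not eliminate them.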

Using Open-Source Models with Hugging Face

You don't always need paid APIs. Open-source models like Llama 3, Mistral, and Gemma can run locally via transformers:

from transformers import pipeline

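generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B",
    device=0,  # use -1 for CPU
)

result = generator(
    "Explain the concept of object-oriented programming:",
    max_new_tokens=200,  # caps generated tokens; max_length would also count the prompt
    do_sample=True,      # sampling must be on for temperature to take effect
    temperature=0.7,
)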

print(result[0]["generated_text"])

The Hugging Face Transformers ecosystem offers thousands of pre-trained models for classification, generation, translation, and more.

Streaming Responses

For a more natural experience, stream model responses and display text as it's being generated:

from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Write a poem about Python."}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Streaming is essential for chatbots and virtual assistants where low latency matters. The official OpenAI Python SDK supports streaming natively.
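In a chatbot you usually want to display the text as it arrives and also keep the complete reply for the conversation history. The sketch below separates those two concerns. iter_text and fake_stream are hypothetical helpers: the fake stream only imitates the chunk shape used in the example above so the code can run without an API call.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, List, Optional


def iter_text(stream: Iterable) -> Iterator[str]:
    """Yield only the text pieces from a stream of chat-completion chunks."""
    for chunk in stream:
        if not chunk.choices:        # defensive: a chunk may carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:                    # skip None and empty strings
            yield piece


def fake_stream(pieces: List[Optional[str]]) -> Iterator[SimpleNamespace]:
    """Simulated stream mimicking chunk.choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


parts = []
for text in iter_text(fake_stream(["Py", "thon ", None, "rocks"])):
    print(text, end="", flush=True)  # show text as soon as it arrives
    parts.append(text)
print()

full_reply = "".join(parts)          # keep the whole reply for chat history
```

With the real SDK, the object returned by client.chat.completions.create(..., stream=True) would take the place of fake_stream.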

Function Calling with LLMs

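Function calling lets an LLM request that your code run a Python function based on user input. The model never executes anything itself: it returns the name of a function and JSON arguments, and your application decides whether to run the call. This allows the model to trigger API calls, database queries, or other specific actions instead of just generating text: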

from openai import OpenAI
import json
import os
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_weather(city):
    # Weather lookup simulation
    return {"city": city, "temperature": 22, "unit": "celsius"}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city",
                    }
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the temperature in New York?"}],
    tools=tools,
    tool_choice="auto",
)

if response.choices[0].message.tool_calls:
    call = response.choices[0].message.tool_calls[0]
    if call.function.name == "get_weather":
        args = json.loads(call.function.arguments)
        result = get_weather(args["city"])
        print(result)

This pattern turns LLMs into true system orchestrators, allowing the model to dynamically decide which tools to use to complete a task. It is the foundation for building autonomous agents that interact with the real world.
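Once there is more than one tool, a chain of if statements gets unwieldy, and the result also has to be reported back to the model. The sketch below shows one possible shape for that: a registry that maps tool names to functions, plus a helper that builds the "tool" role message. TOOL_REGISTRY, run_tool_call, and tool_message are hypothetical names, and the call id is a made-up placeholder.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    """Same simulated lookup as in the example above."""
    return {"city": city, "temperature": 22, "unit": "celsius"}


# Maps the tool names advertised to the model onto real Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> Dict[str, Any]:
    """Run one requested tool call; return an error payload instead of raising."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json)
        return func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        return {"error": f"bad arguments: {exc}"}


def tool_message(tool_call_id: str, result: Dict[str, Any]) -> Dict[str, str]:
    """Build the message that reports a tool result back to the model."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


result = run_tool_call("get_weather", '{"city": "New York"}')
message = tool_message("call_123", result)  # "call_123" is a placeholder id
print(message)
```

In a real exchange you would append the assistant message that contained the tool call, then this tool message, and call the API again so the model can phrase a final answer from the result.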

Agents with LangChain

LangChain takes the agent concept to a higher level, providing complete infrastructure for creating autonomous systems with access to multiple tools, memory, and planning capabilities:

from langchain.agents import create_openai_tools_agent
from langchain.agents import AgentExecutor
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

@tool
def calculate_age(name: str, birth_year: int) -> str:
    """Calculates a person's age."""
    age = 2026 - birth_year
    return f"{name} is {age} years old."

@tool
def greeting(name: str) -> str:
    """Generates a personalized greeting."""
    return f"Hello, {name}! Welcome to Universo Python."

tools = [calculate_age, greeting]

llm = ChatOpenAI(model="gpt-4", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "Calculate the age of John who was born in 1990 and then say hello to him."
})
print(result["output"])

LangChain agents enable creating everything from simple assistants to complex multi-agent systems, where each agent has a specialty and they collaborate to solve problems.

Best Practices with LLMs

1. Efficient System Prompts

The system prompt defines model behavior. Be specific: set the tone, response format, and clear constraints.

2. Temperature Control

The temperature parameter (0.0 to 2.0 in the OpenAI API) controls how random the model's token sampling is. Use low values (0.0-0.3) for factual tasks, where consistent answers matter, and higher values (0.7-1.0) for creative ones.
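The usual textbook way to picture temperature is as a divisor applied to the model's raw token scores before they are converted to probabilities. The sketch below shows that softmax-with-temperature calculation on made-up scores; it illustrates the concept and is not a description of any provider's internal sampler. A temperature of exactly 0 is rejected here because the division is undefined.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability lands on the top-scoring token, while higher values spread it across the alternatives, which is why output becomes more varied.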

3. Token Management

Long inputs consume more tokens, which raises both cost and latency. Use chunking techniques to split large texts and keep chat history within the model's context window.
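One simple way to keep chat history inside a token budget is to always keep the system prompt and then add turns from newest to oldest until the budget runs out. The sketch below does that. trim_history is a hypothetical helper, and rough_token_count relies on a crude rule of thumb of about four characters per token; for real counts you would pass in a tokenizer-based counter such as one built on tiktoken.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages: List[Message],
                 max_tokens: int,
                 count: Callable[[str], int] = rough_token_count) -> List[Message]:
    """Keep system messages plus the most recent turns that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept: List[Message] = []
    for message in reversed(others):         # walk from newest to oldest
        cost = count(message["content"])
        if cost > budget:
            break                            # stop at the first turn that no longer fits
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))     # restore chronological order


history = [
    {"role": "system", "content": "a" * 40},
    {"role": "user", "content": "b" * 40},
    {"role": "assistant", "content": "c" * 40},
    {"role": "user", "content": "d" * 40},
]
trimmed = trim_history(history, max_tokens=30)
print([m["content"][0] for m in trimmed])
```

Dropping the oldest turns is the simplest policy; summarizing them instead is another option when earlier context still matters.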

4. Embedding Caching

Avoid generating repeated embeddings for the same text. Store vectors in a persistent vector database like ChromaDB or Qdrant.
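The idea can be sketched with an in-memory cache keyed by a hash of the text. EmbeddingCache and fake_embed are hypothetical names: fake_embed stands in for a real embeddings API call so the example runs offline, and a production setup would persist the vectors instead of holding them in a dict.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoizes an embedding function so each distinct text is embedded once."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._store: Dict[str, Vector] = {}
        self.misses = 0                      # number of real embedding calls made

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Vector:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]


def fake_embed(text: str) -> Vector:
    """Stand-in for a real embeddings API call (not a meaningful embedding)."""
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
cache.get("Python is versatile")
cache.get("Python is versatile")             # served from the cache
print(f"Embedding calls made: {cache.misses}")
```

Hashing the text keeps the keys short and uniform even when the cached documents are long.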

5. Input and Output Validation

Always validate what the model receives and produces. Use guardrails to prevent prompt injections and inappropriate content. The TensorFlow tutorials and other ML tools can help validate AI pipelines.
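As a starting point, validation can be as simple as rejecting empty or oversized input and checking that a reply expected to be JSON really has the fields you need. The sketch below shows both checks. The helper names and the 4000-character limit are arbitrary choices for illustration, and checks like these do not by themselves stop prompt injection.

```python
import json
from typing import Any, Dict, Iterable

MAX_INPUT_CHARS = 4000  # arbitrary limit chosen for this sketch


def validate_user_input(text: str) -> str:
    """Reject empty or oversized input before it reaches the model."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("input is empty")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError("input is too long")
    return cleaned


def parse_json_reply(raw: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse a model reply that should be a JSON object containing required_keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


question = validate_user_input("  What is a decorator?  ")
reply = parse_json_reply('{"answer": "A function wrapper.", "confidence": 0.9}',
                         required_keys=["answer", "confidence"])
print(question, reply["answer"])
```

Raising ValueError on every mismatch gives the calling code a single place to retry the request or fall back to a safe default.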

Practical Applications

Customer Support Chatbot

Use RAG with support documents to build a chatbot that answers accurately about your company's products or services.

Sentiment Analyzer

With well-structured prompts, LLMs can classify sentiment in reviews, comments, and social media with high accuracy.
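A well-structured prompt for classification typically restricts the model to a fixed set of labels and then normalizes whatever comes back. The following is a rough sketch under that assumption; the label set, the prompt wording, and both helper names are invented for illustration.

```python
from typing import Dict, List

LABELS = ("positive", "negative", "neutral")  # hypothetical label set


def sentiment_messages(review: str) -> List[Dict[str, str]]:
    """Build a chat payload that constrains the model to one of LABELS."""
    system = (
        "You classify the sentiment of customer reviews. "
        f"Reply with exactly one word: {', '.join(LABELS)}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": review},
    ]


def parse_label(reply: str) -> str:
    """Normalize the model's reply; fall back to 'neutral' if it is off-script."""
    cleaned = reply.strip().strip(".!\"'").lower()
    return cleaned if cleaned in LABELS else "neutral"


messages = sentiment_messages("The delivery was late but support was great.")
print(parse_label(" Positive. "))
```

Falling back to "neutral" keeps a pipeline running, but it can hide malformed replies, so logging those cases is worth considering.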

Code Assistant

Build a tool that helps developers write, review, and document code using models like GPT-4 or Llama.

Intelligent Translator

LLMs outperform traditional translators in specific contexts, preserving tone, style, and cultural nuances.

These applications can be served via web APIs built with FastAPI, combining LLM power with asynchronous framework performance.

LLMs vs Traditional ML Models

LLMs don't replace all machine learning models. For tasks like image classification, specialized models like convolutional networks are still superior. For natural language tasks, however, LLMs offer unprecedented flexibility, eliminating the need to train task-specific models.

A hybrid approach — combining LLMs with libraries like Pandas for preprocessing — is often the most effective in real-world projects. Check our complete Pandas for data analysis guide to complement your AI projects.

Limitations and Caveats

  • Hallucinations: LLMs can confidently generate false information. Always validate critical outputs.
  • Cost: Large models like GPT-4 have per-token costs. Optimize your prompts to reduce expenses.
  • Privacy: Data sent to external APIs may not be completely private. For sensitive data, prefer local models.
  • Bias: Models reflect training data biases. Watch for prejudiced or discriminatory results.
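To make the cost point concrete, a per-request estimate can be computed from token counts and per-token prices. The sketch below is a rough illustration: the prices are made-up placeholders, not real rates for any model, and Pricing and estimate_cost are hypothetical names. With the OpenAI SDK, the token counts for a chat completion are reported in response.usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1 million tokens. Placeholder values, not real rates."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Estimate the cost of one request from its token counts."""
    total = (prompt_tokens * pricing.input_per_million
             + completion_tokens * pricing.output_per_million)
    return total / 1_000_000


# Made-up prices; check your provider's pricing page for current numbers.
example_pricing = Pricing(input_per_million=10.0, output_per_million=30.0)
cost = estimate_cost(prompt_tokens=1_200, completion_tokens=300, pricing=example_pricing)
print(f"Estimated cost: ${cost:.4f}")
```

Tracking this estimate per request makes it easier to see whether shortening prompts or switching models has a measurable effect on spend.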

To understand current limitations further, check the Real Python AI guide, which covers both technical and ethical aspects.

Next Steps

Now that you've mastered the fundamentals of Python with LLMs, explore these directions:

  • Build a multi-agent system using LangChain Agents and external tools
  • Implement fine-tuning with the OpenAI API for specific tasks
  • Create multimodal applications that process text, image, and audio simultaneously
  • Deploy your application as a REST API with authentication and rate limiting

Integrating Large Language Models with Python is one of the most exciting technology frontiers today. With the tools and knowledge from this guide, you're ready to build intelligent applications that once seemed like science fiction. Start with small projects, experiment with different models, and most importantly, stay updated: this field evolves in weeks, not years.

Keep following Universo Python for more content on artificial intelligence, web development, and best practices in the most versatile language on the market!