
MLX

This notebook shows how to get started using MLX LLMs as chat models.

In particular, we will:

  1. Utilize the MLXPipeline,
  2. Utilize the ChatMLX class to enable any of these LLMs to interface with LangChain's Chat Messages abstraction, and
  3. Demonstrate how to use an open-source LLM to power a ChatAgent pipeline.
%pip install --upgrade --quiet  mlx-lm transformers huggingface_hub

1. Instantiate an LLM

Load a local MLX model as the underlying LLM via the MLXPipeline class.

from langchain_community.llms.mlx_pipeline import MLXPipeline

llm = MLXPipeline.from_model_id(
    "mlx-community/quantized-gemma-2b-it",
    pipeline_kwargs={"max_tokens": 10, "temp": 0.1},
)
API Reference: MLXPipeline
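
As a quick sanity check, you can invoke the pipeline directly with a plain string (a minimal sketch; the first call downloads the model weights from the Hugging Face Hub, and the sample prompt is ours, not from the original notebook):

# Smoke-test the raw pipeline before wrapping it in a chat model.
print(llm.invoke("The capital of France is"))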

2. Instantiate ChatMLX to apply chat templates

Instantiate the chat model and some messages to pass.

from langchain_community.chat_models.mlx import ChatMLX
from langchain_core.messages import HumanMessage

messages = [
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

chat_model = ChatMLX(llm=llm)
API Reference: ChatMLX | HumanMessage

Inspect how the chat messages are formatted for the LLM call.

chat_model._to_chat_prompt(messages)

Call the model.

res = chat_model.invoke(messages)
print(res.content)
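
ChatMLX also exposes the standard Runnable streaming interface, so a sketch like the following should print tokens as they arrive (falling back to a single chunk if token-level streaming isn't supported by the wrapped pipeline):

# Stream the response incrementally instead of waiting for the full string.
for chunk in chat_model.stream(messages):
    print(chunk.content, end="", flush=True)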

3. Take it for a spin as an agent!

Here we'll test out gemma-2b-it as a zero-shot ReAct Agent. The example below is taken from here.

Note: To run this section, you'll need to have a SerpAPI Token saved as an environment variable: SERPAPI_API_KEY
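
For example, you can set the key in-process before loading the tools (the token below is a placeholder; substitute your own):

import os

# Placeholder token -- replace with your real SerpAPI key.
os.environ.setdefault("SERPAPI_API_KEY", "<your-serpapi-token>")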

from langchain import hub
from langchain.agents import AgentExecutor, load_tools
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import (
    ReActJsonSingleInputOutputParser,
)
from langchain.tools.render import render_text_description
from langchain_community.utilities import SerpAPIWrapper
from langchain_core.prompts import PromptTemplate

Configure the agent with a react-json style prompt and access to a search engine and calculator.

# setup tools
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# setup ReAct-style prompt and remove the system role
human_prompt = """
Answer the following questions as best you can. You have access to the following tools:

{tools}

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are: {tool_names}

The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:

```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Reminder to always use the exact characters `Final Answer` when responding.

{input}

{agent_scratchpad}

"""

# wrap the raw string in a PromptTemplate so partial variables can be applied
prompt = PromptTemplate.from_template(human_prompt).partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)
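
To verify the template renders as expected, you can format it with dummy values before wiring up the agent (a quick sketch; the sample question is ours, and `agent_scratchpad` is empty on the first step):

# Preview the fully rendered prompt with a sample question.
print(prompt.format(input="What is 2 to the 10th power?", agent_scratchpad=""))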

# define the agent
chat_model_with_stop = chat_model.bind(stop=["\nObservation"])
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)
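
Before wrapping the agent in an executor, you can run a single planning step by invoking the chain directly (a sketch, assuming the model emits a well-formed JSON blob; with no intermediate steps yet, the parser should return the first AgentAction):

# One planning step: the parser returns an AgentAction (or AgentFinish).
step = agent.invoke({"input": "What is 2 + 2?", "intermediate_steps": []})
print(step)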

# instantiate AgentExecutor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke(
    {
        "input": "Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?"
    }
)
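
Small quantized models sometimes emit JSON the parser cannot read. If you hit an OutputParserException, AgentExecutor's built-in handle_parsing_errors option feeds the error back to the model as an observation so it can retry:

# More forgiving executor: parsing failures are surfaced to the model
# instead of raising immediately.
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)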
