AI on your phone: Stanford's little octopus brings large models to mobile

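Stanford recently released Octopus v2, an open-source language model developed by Nexa AI. It has 2 billion parameters and is designed for function calling against Android APIs.
By adopting a distinctive functional token strategy, Octopus-v2 delivers performance comparable to GPT-4 in both training and inference while greatly increasing inference speed, which makes it especially well suited to edge computing devices.
Octopus v2 can run on-device on smartphones, cars, personal computers, and similar hardware. It surpasses GPT-4 in accuracy and latency and reduces context length by 95%.
In addition, Octopus v2 is 36 times faster than a Llama7B + RAG approach.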

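Octopus highlights:
Mobile device support: Octopus-V2-2B is designed to run seamlessly on Android devices, extending its usefulness to applications ranging from Android system management to multi-device orchestration.
Functional tokens: Octopus-V2-2B can generate individual, nested, and parallel function calls across a variety of complex scenarios.
This strategy reduces context length by 95%.
The difference between the conventional retrieval-based approach and the proposed model is shown in the figure: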

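Inference speed: In benchmarks, Octopus-V2-2B showed excellent inference speed, running 36 times faster on a single A100 GPU than the "Llama7B + RAG solution" combination.
Moreover, compared with GPT-4-turbo (gpt-4-0125-preview), which relies on clusters of A100/H100 GPUs, Octopus-V2-2B is 168% faster.
This efficiency is attributed to the functional token design.
High accuracy: Octopus-V2-2B excels not only in speed but also in accuracy, beating the "Llama7B + RAG solution" by 31% in function-calling accuracy.
It achieves function-calling accuracy comparable to GPT-4 and RAG + GPT-3.5, scoring between 98% and 100% on the benchmark datasets.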
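To make the functional token idea more concrete, here is a minimal sketch of how an application might turn a model response into an actual function call. It assumes the response looks like `<nexa_0>('front')<nexa_end>`, that is, a token naming the function, a Python-style argument list, and an end token. The token-to-function table and the `take_a_photo` stub are hypothetical and exist only for illustration; they are not the project's own implementation.

```python
import ast
import re


# Hypothetical stand-in for a real Android API wrapper.
def take_a_photo(camera="back"):
    return f"photo taken with {camera} camera"


# Assumed mapping from functional tokens to callables (illustrative only).
TOKEN_TO_FUNCTION = {"<nexa_0>": take_a_photo}

# Assumed response shape: <nexa_i>(args)<nexa_end>
CALL_PATTERN = re.compile(r"(<nexa_\d+>)\((.*?)\)<nexa_end>", re.DOTALL)


def parse_call(response):
    """Return (token, positional args, keyword args), or None if no call is found."""
    match = CALL_PATTERN.search(response)
    if match is None:
        return None
    token, arg_text = match.groups()
    # Parse the argument list as a Python call expression without executing it.
    call = ast.parse(f"f({arg_text})", mode="eval").body
    args = [ast.literal_eval(node) for node in call.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return token, args, kwargs


def dispatch(response):
    """Look up the function named by the token and call it with the parsed arguments."""
    parsed = parse_call(response)
    if parsed is None:
        raise ValueError("no function call found in response")
    token, args, kwargs = parsed
    if token not in TOKEN_TO_FUNCTION:
        raise KeyError(f"unknown functional token: {token}")
    return TOKEN_TO_FUNCTION[token](*args, **kwargs)


print(dispatch("<nexa_0>('front')<nexa_end>"))
```

Because the function is identified by a single token, the prompt does not need to carry every candidate function description, which is the intuition behind the shorter context reported above. Arguments are read with `ast.literal_eval`, so only literal values are accepted and nothing from the model output is executed.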

You can use the following code to run the model on a single GPU.

    from transformers import AutoTokenizer, GemmaForCausalLM
    import torch
    import time

    def inference(input_text):
        start_time = time.time()
        input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
        input_length = input_ids["input_ids"].shape[1]
        # Greedy decoding; max_length counts prompt plus generated tokens.
        outputs = model.generate(
            input_ids=input_ids["input_ids"],
            max_length=1024,
            do_sample=False)
        # Keep only the newly generated tokens, dropping the prompt.
        generated_sequence = outputs[:, input_length:].tolist()
        res = tokenizer.decode(generated_sequence[0])
        end_time = time.time()
        return {"output": res, "latency": end_time - start_time}

    model_id = "NexaAIDev/Octopus-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = GemmaForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    input_text = "Take a selfie for me with front camera"
    # Prompt template wrapped around the user query.
    nexa_query = (
        "Below is the query from the users, please call the correct function "
        "and generate the parameters to call the function.\n\n"
        f"Query: {input_text} \n\nResponse:"
    )
    start_time = time.time()
    print("nexa model result:\n", inference(nexa_query))
    print("latency:", time.time() - start_time, " s")

Model evaluation

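In benchmarks, Octopus-V2-2B shows excellent inference speed, running 36 times faster on a single A100 GPU than the "Llama7B + RAG solution".
Compared with GPT-4-turbo, which relies on clusters of A100/H100 GPUs, Octopus-V2-2B is 168% faster.
This efficiency gain is attributed to Octopus-V2-2B's functional token design.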
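As a small worked example, the two speed figures can be put on the same scale. The sketch below assumes "168% faster" means the baseline latency is 1 + 1.68 = 2.68 times the new latency, and that "36 times faster" means a latency ratio of 36. The 1.0 s baseline is a made-up number for illustration, not a measurement from the benchmark.

```python
def ratio_from_percent_faster(percent_faster):
    """Convert 'N% faster' into a latency ratio (baseline latency / new latency)."""
    return 1.0 + percent_faster / 100.0


def latency_after_speedup(baseline_latency_s, ratio):
    """Latency of the faster system, given the baseline latency and the ratio."""
    return baseline_latency_s / ratio


gpt4_ratio = ratio_from_percent_faster(168)  # about 2.68
rag_ratio = 36.0                             # "36 times faster", taken as a ratio

# Hypothetical 1.0 s baseline, used only to show the arithmetic.
for name, ratio in [("vs GPT-4-turbo", gpt4_ratio), ("vs Llama7B + RAG", rag_ratio)]:
    print(name, "ratio:", round(ratio, 2),
          "latency for a 1.0 s baseline:", round(latency_after_speedup(1.0, ratio), 3), "s")
```

Under this reading, the two claims differ by more than an order of magnitude: roughly 2.7 times faster than GPT-4-turbo versus 36 times faster than the Llama7B + RAG setup.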


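Octopus-V2-2B excels not only in speed but also in accuracy, beating the "Llama7B + RAG solution" by 31% in function-calling accuracy.
Octopus-V2-2B achieves function-calling accuracy comparable to GPT-4 and RAG + GPT-3.5.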

Model training and fine-tuning

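To use a high-quality dataset in the training, validation, and testing phases, and in particular to make training efficient, the research team built the dataset in three key stages:

Generating relevant queries and their associated function call arguments; generating irrelevant queries from suitable function components; and adding binary verification support through Google Gemini.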
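The three stages above can be sketched roughly as follows. This is a simplified illustration, not the team's pipeline: the `Sample` type, the `build_dataset` helper, and the `verify` callback are all hypothetical names, and `verify` merely stands in for the binary (accept or reject) check that the article says is done with Google Gemini.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Sample:
    query: str
    # Expected function call as text; None marks an irrelevant (negative) query.
    function_call: Optional[str]


def build_dataset(
    relevant: Iterable[Sample],
    irrelevant_queries: Iterable[str],
    verify: Callable[[Sample], bool],
) -> List[Sample]:
    """Combine positive and negative samples, keeping only those the verifier accepts."""
    candidates = list(relevant)                                  # stage 1: relevant queries
    candidates += [Sample(q, None) for q in irrelevant_queries]  # stage 2: irrelevant queries
    return [s for s in candidates if verify(s)]                  # stage 3: binary verification


# Toy verifier standing in for the model-based check: reject empty queries.
def toy_verify(sample: Sample) -> bool:
    return bool(sample.query.strip())


dataset = build_dataset(
    relevant=[
        Sample("Take a selfie for me with front camera", "take_a_photo(camera='front')"),
        Sample("   ", "take_a_photo(camera='back')"),
    ],
    irrelevant_queries=["What is the capital of France?"],
    verify=toy_verify,
)
print(len(dataset), "samples kept")
```

Keeping the verifier as a plain callable means the accept or reject step can be swapped for a model-based check without touching the rest of the pipeline.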

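The project team wrote 20 Android API descriptions for training the model.
The Android API implementations and the training data will be released later.
Below is sample code for one Android API: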

    def get_trending_news(category=None, region='US', language='en', max_results=5):
        """
        Fetches trending news articles based on category, region, and language.

        Parameters:
        - category (str, optional): News category to filter by, by default use None for all categories. Optional to provide.
        - region (str, optional): ISO 3166-1 alpha-2 country code for region-specific news, by default, uses 'US'. Optional to provide.
        - language (str, optional): ISO 639-1 language code for article language, by default uses 'en'. Optional to provide.
        - max_results (int, optional): Maximum number of articles to return, by default, uses 5. Optional to provide.

        Returns:
        - list[str]: A list of strings, each representing an article. Each string contains the article's heading and URL.
        """

The project's paper describes the model in great detail, and readers who want to dig deeper are encouraged to read it.

Related materials:
Paper: Octopus v2: On-device language model for super agent
Paper URL: https://arxiv.org/abs/2404.01744
Model page: https://huggingface.co/NexaAIDev/Octopus-v2
