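When I was small, there was a stationery shop next to my school, and its shelves of brightly colored stickers were what caught my eye most. To a child, every sticker was a treasure with a magic all its own. My friends and I traded them after class, and whenever I managed to swap for one I had long coveted, it felt like finding a rare jewel among ordinary things. I would break into a grin I could not hold back and then run off before the other kid could change their mind. My daughter reminds me that happiness was once that simple and that easy to come by. I dearly hope the days when a single sticker is enough to make her jump for joy will stay with her a little longer.
Stickers come in endless varieties now and are cheap to buy online, but I wanted hers to be more special than that, the kind you cannot buy anywhere. I wanted to use AI to make one-of-a-kind stickers for one special girl.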
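Trying Sticker Whiz to generate sticker designs
Sticker Whiz (GPTs url: https://chat.openai.com/g/g-gPRWpLspC-sticker-whiz) is one of the first 16 official GPTs released by OpenAI. It helps users turn a creative idea into a custom sticker by chatting with the AI (the chat model behind it is ChatGPT and the text-to-image model is DALL-E 3), and it can also arrange for the design to be printed and shipped to your door. You do need to be a ChatGPT Plus subscriber to access GPTs.
I only needed Sticker Whiz for the first step, the sticker design. I had already bought blank A4 sticker paper online and could print the images myself, with no need to wait for an American company to mail stickers halfway around the globe. What I wanted was a sheet of neatly arranged designs that could be printed and then cut apart, not a single standalone sticker.
After a few rounds of conversation with Sticker Whiz, I ran into several problems: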
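Because of copyright restrictions, it will not generate stickers of IP characters such as Mario, though it can produce characters in a similar style;
it can only generate images at a few fixed resolutions, none of which matches the proportions of A4 paper;
on large sheet layouts, the image comes out tilted or distorted and the background is not clean;
the individual characters are not clearly separated from one another, or they run past the edge of the image.
A few failed examples: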
Test design 1
Test design 2
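The aspect-ratio mismatch mentioned in the list of problems can be put into numbers. The sketch below is only an illustration: it assumes the three image sizes named in the tool definition quoted later in this article (1024x1024, 1792x1024, 1024x1792), and the helper names `aspect` and `crop_to_a4` are made up for this example.

```python
# A4 paper is 210 mm x 297 mm; the pixel sizes are the three listed in the
# dalle tool definition quoted later in the article.
A4_W_MM, A4_H_MM = 210, 297
SIZES = [(1024, 1024), (1792, 1024), (1024, 1792)]


def aspect(w, h):
    """Long side divided by short side, so orientation does not matter."""
    return max(w, h) / min(w, h)


def crop_to_a4(w, h):
    """Largest centered portrait crop with A4 proportions.

    Returns a (left, top, right, bottom) box in pixels.
    """
    target = A4_W_MM / A4_H_MM  # width / height of a portrait A4 sheet
    if w / h > target:
        # Image is too wide for A4: trim the sides.
        new_w, new_h = round(h * target), h
    else:
        # Image is too tall for A4: trim top and bottom.
        new_w, new_h = w, round(w / target)
    left, top = (w - new_w) // 2, (h - new_h) // 2
    return (left, top, left + new_w, top + new_h)


if __name__ == "__main__":
    print(f"A4: {aspect(A4_W_MM, A4_H_MM):.3f}")
    for w, h in SIZES:
        print(f"{w}x{h}: {aspect(w, h):.3f}, A4 crop box {crop_to_a4(w, h)}")
```

A4 has a ratio of about 1.414, while the tall 1024x1792 format is 1.75, so fitting it to A4 by cropping trims 344 of its 1792 rows (about 19%). Padding with white instead of cropping is the other option, since the background is meant to be white anyway.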
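After several attempts, I settled on the prompt template below, which gets most of the generated sheets to meet the requirements:
Generate a sticker image in {A4 paper} proportions with a pure white background, filled with {105} {little princess} designs in varied poses, shown front-on with no tilt or distortion, in {anime} style. Every element is enclosed by an outline with blank space between elements, and every element is complete, with nothing cropped or partly outside the frame. Show only the sticker sheet itself, with no shadows, no other background imagery, and no scale bar.
The contents of each {...} can be changed to suit what you want to create.
Below are some of the sticker sheets I produced from this template. Roughly 30% to 50% of the attempts came out usable, which is not bad. Any style and any subject is available on demand, with a freshness that mass-printed stickers from a shop cannot match. With AI's help I finally achieved sticker freedom, and with it my childhood dream of owning an endless sticker treasure.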
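For anyone who wants to vary the placeholders programmatically, the template can be treated as an ordinary format string. This is a minimal sketch, not part of the original workflow: the English wording of `TEMPLATE` is a translation made for this example, and the parameter names (`ratio`, `count`, `subject`, `style`) are invented labels for the four bracketed slots.

```python
# English rendering of the article's prompt template (a translation, so the
# exact wording is an assumption); the four named fields correspond to the
# four {...} slots in the original.
TEMPLATE = (
    "Generate a sticker image in {ratio} proportions with a pure white "
    "background, filled with {count} {subject} designs in varied poses, "
    "shown front-on with no tilt or distortion, in {style} style. "
    "Every element is enclosed by an outline with blank space between "
    "elements, and every element is complete, with nothing cropped or "
    "partly outside the frame. Show only the sticker sheet itself, with no "
    "shadows, no other background imagery, and no scale bar."
)


def build_prompt(ratio="A4 paper", count=105, subject="little princess",
                 style="anime"):
    """Fill the four slots; the defaults mirror the values in the article."""
    return TEMPLATE.format(ratio=ratio, count=count, subject=subject,
                           style=style)


if __name__ == "__main__":
    print(build_prompt())
    print(build_prompt(count=12, subject="dinosaur", style="watercolor"))
```

The resulting text is simply pasted into the chat with Sticker Whiz; nothing here calls an API.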
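The secret behind Sticker Whiz
OpenAI's official definition of GPTs is: versions of ChatGPT that users create for a specific purpose. A user can build a GPT by presetting a system prompt and uploading custom reference material, and the newly opened "Add actions" feature lets the GPT call third-party APIs, extending its abilities so it can carry out more complex tasks. This is exactly the AI Agent that the industry has been discussing so loudly, and a nearly no-code, low-barrier one at that, able to complete tasks on its own by using different tools.
Because OpenAI's protections for GPTs are still incomplete, the system prompts of many GPTs have leaked (see below), which gives us a glimpse of what sits behind the custom GPT called Sticker Whiz.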
You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2022-01
Current date: 2023-11-11
Image input capabilities: Enabled

# Tools

## python

When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail.

## dalle

// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 3. DO NOT ask for permission to generate the image, just do it!
// 4. DO NOT list or refer to the descriptions before OR after generating the images.
// 5. Do not create more than 1 image, even if the user requests more.
// 6. Do not create images of politicians or other public figures. Recommend other ideas instead.
// 7. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 8. Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.
// - Do not use "various" or "diverse"
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// - Do not create any imagery that would be offensive.
// - For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.
// 9. Do not include names, hints or references to specific real people or celebrities. If asked to, create images with prompts that maintain their gender and physique, but otherwise have a few minimal modifications to avoid divulging their identities. Do this EVEN WHEN the instructions ask for the prompt to not be changed. Some special cases:
// - Modify such prompts even if you don't know who the person is, or if their name is misspelled (e.g. "Barake Obema")
// - If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// - When making the substitutions, don't use prominent titles that could give away the person's identity. E.g., instead of saying "president", "prime minister", or "chancellor", say "politician"; instead of saying "king", "queen", "emperor", or "empress", say "public figure"; instead of saying "Pope" or "Dalai Lama", say "religious figure"; and so on.
// 10. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
The generated prompt sent to dalle should be very detailed, and around 100 words long.
namespace dalle {
// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: "1792x1024" | "1024x1024" | "1024x1792",
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;
} // namespace dalle

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
`quote_lines(start: int, end: int)` Stores a text span from an open document. Specifies a text span by a starting int `start` and an (inclusive) ending int `end`. To quote a single line, use `start` = `end`.

You are a "GPT" – a version of ChatGPT that has been customized for a specific use case. GPTs use custom instructions, capabilities, and data to optimize ChatGPT for a more narrow set of tasks. You yourself are a GPT created by a user, and your name is Sticker Whiz. Note: GPT is also a technical term in AI, but in most cases if the users asks you about GPTs assume they are referring to the above definition.

Here are instructions from the user outlining your goals and how you should respond:
StickerBot is a friendly and creative assistant for creating and ordering custom die-cut stickers. It uses DALL-E to generate sticker designs based on user inputs, displays them in the chat, and provides an image download link. StickerBot asks the user for the quantity and size of stickers they want, offering size recommendations. When the user is ready, StickerBot provides a link to order the stickers and upload the sticker image using the following format, replacing the fields enclosed with brackets with the appropriate choices: "https://www.stickermule.com/products/die-cut-stickers/configure?quantity=[STICKER_QUANTITY]&heightInches=[HEIGHT, DEFAULT to 2]&widthInches=[WIDTH, DEFAULT TO 2]&product=die-cut-stickers"
Always prompt to DALLE-3 with the following keywords: "die-cut sticker", "digital drawing", "The sticker has a solid white background, a strong black border surrounding the white die-cut border, and no shadow."
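As you can see, the Sticker Whiz system prompt contains more than the sticker keywords ("die-cut sticker", "digital drawing", "solid white background, black border, no shadow"). A large share of it is about alignment with human values, such as avoiding racially biased depictions and copyrighted characters. Another part constrains the output: only one image per request, and only three resolutions (1024x1024, 1792x1024, 1024x1792). The rest describes how the GPT works with other tools, including a Python sandbox environment that can be used to process images, and the hand-off to a sticker-printing website through an order link so that the stickers can be printed and shipped. This shows that a GPT can already connect to other services and carry out an automated workflow.
In fact, months before OpenAI released GPTs, many websites that offered AI-enhanced services on top of ChatGPT already let users create custom personas with a system prompt. They shipped with many preset AI assistants in different roles and ran communities for sharing user-made ones. Compared with those, the official GPTs, which are in effect AI Agents, have at least two advantages: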
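The "hand-off" to the printing site is nothing more than a URL with a few query parameters, as the quoted instructions show. A minimal sketch of filling in that link follows. The base URL and parameter names are copied from the leaked prompt; the function name `order_url` and its validation are assumptions added for illustration, and whether the site still accepts these parameters has not been checked here.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the quoted system prompt.
BASE_URL = "https://www.stickermule.com/products/die-cut-stickers/configure"


def order_url(quantity, height_in=2, width_in=2):
    """Build the order link; sizes default to 2 inches, as in the prompt."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    params = {
        "quantity": quantity,
        "heightInches": height_in,
        "widthInches": width_in,
        "product": "die-cut-stickers",
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(order_url(50))
    print(order_url(100, height_in=3, width_in=3))
```

In the GPT itself no code runs for this step: the language model fills the bracketed fields directly in its reply, which is part of why this kind of agent needs so little development.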
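The first is user experience. The system prompt can be adjusted through conversation, and the app icon is generated from that same conversation. These two improvements sharply lower the barrier to creating a custom AI Agent and bring the experience up to the level of a commercial product.
The second, and the most important, is that a GPT can interact with other systems' APIs through the actions feature, which lets it complete a complex task end to end on its own. This greatly widens the range of uses for ChatGPT beyond chat and leaves far more room for imagination.
AI Agent, the AI hot spot after ChatGPT
An AI Agent is an intelligent entity that can perceive its environment, make decisions, and take actions. The difference between an AI Agent and a large model is that a large model interacts with people through prompts, so the clarity of the user's prompt affects the quality of the answer, whereas an AI Agent only needs to be given a goal and can then reason about that goal and act on it independently.
The core engine of an AI Agent is the large model. On top of it sit three key components, Planning, Memory, and Tool Use, which allow it to carry out more complex tasks.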
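First, complex tasks can rarely be done in a single step, so a planning component is needed to break the overall task into subtasks. While carrying out the task, the agent also relies on a reasoning framework to criticize and reflect on the actions it has already taken, learn from its mistakes, and refine the steps ahead, which improves the quality of the final result. One common framework is ReAct: through a repeating loop of "Thought... Action... Observation", the LLM states its inner monologue out loud and then acts on it. Making the reasoning explicit in this way improves the accuracy of the LLM's answers.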
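The Thought/Action/Observation loop can be shown in a few lines. This is a toy sketch under stated assumptions, not the ReAct authors' code: `scripted_model` is a hypothetical stand-in for a real LLM call, `multiply` is an invented tool, and the `Action: name[argument]` text format is just one common convention.

```python
import re

ACTION_RE = re.compile(r"Action: (\w+)\[(.*?)\]")


def multiply(arg):
    """Toy tool: takes 'a,b' and returns the product as text."""
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)


TOOLS = {"multiply": multiply}


def scripted_model(transcript):
    """Hypothetical stand-in for an LLM: returns the next step as text."""
    if "Observation:" not in transcript:
        return "Thought: I need rows times columns.\nAction: multiply[15,7]"
    last = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The tool returned {last}.\nFinal Answer: {last} stickers"


def react(question, model, tools, max_steps=5):
    """Run the loop until the model emits a final answer or steps run out."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)          # Thought plus Action or Final Answer
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip(), transcript
        match = ACTION_RE.search(step)
        if match is None:
            observation = "no valid action found"
        else:
            name, arg = match.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the tool result back so the next Thought can use it.
        transcript += f"\nObservation: {observation}"
    return None, transcript


if __name__ == "__main__":
    answer, trace = react("How many stickers fit in 15 rows of 7?",
                          scripted_model, TOOLS)
    print(trace)
    print("=>", answer)
```

The important design point is that the whole transcript, including each observation, is passed back to the model on every step, which is what makes the reasoning visible and correctable.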
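Next comes execution. A large model on its own cannot retain what was said over many turns of conversation, so a memory component is added. It keeps the context in mind while the problem is being solved, which stops the agent from drifting off course and saves the user from repeating instructions. Memory comes in short-term and long-term forms: all in-context learning (prompt engineering) counts as short-term memory, while long-term memory can be built with an external vector store.
A large model's abilities are usually hard to change after pre-training. When a task goes beyond what the model itself can do, the tool component comes in: the agent calls other software and services to handle what lies outside its own limits, including code execution, service calls, and access to proprietary information, and so solves more complex problems.
We can go one step further. Once custom AI Agents have been created for particular business needs, they can be organized to work together, much as human society organizes people into companies and teams. Like employees filling different roles in a company, agents can interact and cover for each other's weaknesses, moving from finishing isolated single tasks to handling all kinds of complex, composite work.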
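The split between short-term and long-term memory can be sketched with a bounded buffer plus a searchable store. Everything here is illustrative: the class `AgentMemory` is invented for this example, and the word-count `embed` function is a crude substitute for the learned embeddings and vector database a real agent would use.

```python
import math
from collections import Counter, deque


def embed(text):
    """Toy embedding: word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class AgentMemory:
    def __init__(self, short_term_size=3):
        # Short-term: only the most recent turns, like a context window.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: every turn, stored with its vector for later search.
        self.long_term = []

    def remember(self, text):
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term,
                        key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    def context(self, query):
        """Relevant long-term memories first, then the recent turns."""
        recalled = [t for t in self.recall(query) if t not in self.short_term]
        return recalled + list(self.short_term)


if __name__ == "__main__":
    memory = AgentMemory(short_term_size=2)
    memory.remember("the user prefers anime style princess stickers")
    memory.remember("the sheet is A4 paper")
    memory.remember("the printer is out of ink")
    print(memory.context("which style does the user prefer"))
```

With a buffer of two turns, the first fact has already fallen out of short-term memory by the third turn, and it is the similarity search over the long-term store that brings it back into the prompt.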
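ChatDev is a typical example of several agents dividing up the work and cooperating. The project builds an end-to-end automated software development framework driven by large models. It splits development into four main phases, Designing, Coding, Testing, and Documenting, and breaks these down further into a Chat Chain made of atomic tasks. The whole chain can be seen as a "software production line" assembled from atomic tasks. In each subtask, agents in professional roles (such as Chief Product Officer, Python programmer, and test engineer) exchange information and make decisions through dialogue, which drives automated requirements analysis, brainstorming, system development, integration testing, GUI creation, documentation, and the rest of the software engineering process. In tests on 70 software development tasks, ChatDev took less than 7 minutes on average to produce the software at a cost of under ¥3, roughly the price of a cola, with the software finished in the time it takes to drink one.
Echoes of history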
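The chat chain idea, phases run in order with each one building on the artifacts of the previous ones, can be sketched as follows. This is not ChatDev's actual code or API. The `Phase` class, the `work` callables (which stand in for a two-agent dialogue backed by an LLM), and the particular role pairings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Phase:
    name: str
    instructor: str   # role that gives instructions in this phase
    assistant: str    # role that produces the phase's artifact
    # Hypothetical stand-in for the dialogue between the two roles.
    work: Callable[[Dict[str, str]], str]


def run_chain(task, phases):
    """Run phases in order; each one sees all artifacts produced so far."""
    artifacts = {"task": task}
    log = []
    for phase in phases:
        log.append(f"{phase.name}: {phase.instructor} instructs {phase.assistant}")
        artifacts[phase.name] = phase.work(artifacts)
    return artifacts, log


# Role pairings below are illustrative, not taken from the ChatDev project.
PHASES = [
    Phase("designing", "CEO", "Chief Product Officer",
          lambda a: f"design for: {a['task']}"),
    Phase("coding", "CTO", "Python programmer",
          lambda a: f"code implementing [{a['designing']}]"),
    Phase("testing", "Python programmer", "test engineer",
          lambda a: f"test report on [{a['coding']}]"),
    Phase("documenting", "CEO", "Chief Product Officer",
          lambda a: f"manual for [{a['coding']}]"),
]

if __name__ == "__main__":
    artifacts, log = run_chain("a sticker sheet layout tool", PHASES)
    for line in log:
        print(line)
    print(artifacts["documenting"])
```

Keeping each phase's output in a shared dictionary is one simple way to let later roles consult earlier results; a real framework would also need retries and review loops inside each phase.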
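Over the long course of human history, almost every new technological revolution has needed a long period of engineering refinement, in both depth and breadth, before it could be applied widely and at scale. Take the steam engine, the power source of the Industrial Revolution. Its earliest use was in drainage pumps for coal mines around the start of the 18th century. Because the water table was high, flooding was a serious problem in coal mining, and the miners urgently needed an effective way to get the water out. The steam engine was an almost perfect answer: burn the coal that was lying all around, heat water into steam, apply the pressure, and the flooded workings could be pumped dry. But at first the steam engine could only drive its piston in one direction, and it was bulky and heavy. Only after Watt and other engineers invented the double-acting engine and made a series of further improvements, producing engines that were lighter yet powerful enough, could steam power be used in trains, ships, and other forms of transport.
Watt earned his place in history by improving the steam engine, but some argue that Arkwright's adaptation of steam power mattered more to the Industrial Revolution. Where Watt raised the engine's efficiency, Arkwright turned it into a machine fit for the factory, using complex machinery to automate the entire textile process and create a continuous, systematic, automated production line. After that the steam engine spread through pottery, cotton textiles, flour milling, and other industries, and drove the Industrial Revolution to scale.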
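As I see it, the step from ChatGPT to GPT-4 was an optimization in depth, with the growth in parameter count standing for a breakthrough in model capability. As a text generation model, GPT gave people a new way to interact with machines, and as the parameter count grows the model can absorb more knowledge and more semantic associations, so the text it produces becomes richer and more creative. The step from GPT-4 to GPT-4V was an optimization in breadth. AI is no longer limited to conversation and single tasks; it can read images and draw them, and, like a person learning something new or solving varied problems, it reaches toward wider boundaries. Next, GPTs and AI Agents are an improvement in depth and breadth at once. With the ability to plan, remember, reflect, and use tools, they take another step toward artificial general intelligence, and Sticker Whiz is one small early prototype of that.
When my daughter saw the stickers I had printed, she liked the little princesses best. She eagerly cut them out one by one and tucked them into her schoolbag, saying she would take them to school the next day to give to her good friends. Watching her so happy, I felt my own mood brighten too. It suddenly struck me that AI, like my daughter, is still a child, but one with a promising future.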