Analysis of Core Banking Market in Australia and New Zealand

The core banking software market in Australia is sizable and expanding at a healthy clip. In 2024, Australia’s core banking software market was about US$480 million, and it is projected to reach roughly US$960 million by 2030, growing at a ~12.7% CAGR (2025–2030) (Australia Core Banking Software Market Size & Outlook, 2030). New Zealand’s market is smaller (reflecting its population and banking sector size) but follows a similar trajectory of steady growth. Both countries have high banking penetration and mature financial systems, so growth is driven largely by technology upgrades and replacements of legacy systems rather than new bank formation.
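As a quick sanity check on these figures, compounding the 2024 base at the quoted rate roughly reproduces the 2030 projection. This is a minimal sketch; the source report's exact compounding-period convention is an assumption here (six annual periods, 2024 to 2030).

```python
def project_market_size(base: float, cagr: float, years: int) -> float:
    """Compound a base market size forward at a constant annual growth rate."""
    return base * (1 + cagr) ** years

# Australia's 2024 market (~US$480M) compounded at 12.7% over six years
# lands near the cited ~US$960M figure (slightly above, which suggests the
# report's CAGR is rounded or measured over a slightly different window).
projected = project_market_size(480, 0.127, 6)
print(f"Projected 2030 market size: US${projected:.0f}M")  # prints roughly US$984M
```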

Several key trends shape the ANZ core banking market:

  • Core Modernization & Cloud Migration: Banks are modernizing decades-old core systems to enable real-time processing, agility, and product innovation (Australia’s Judo Bank Goes Live with Thought Machine’s Vault Core | The Fintech Times). Many core banking transformations involve shifting from on-premise mainframes to cloud-based cores or SaaS platforms for better scalability and resilience. For example, ANZ Bank New Zealand selected a cloud-native core (FIS’s Modern Banking Platform on Azure) to upgrade its legacy core (ANZ New Zealand selects FIS for core banking upgrade) – the first such deployment outside the US. Similarly, Commonwealth Bank of Australia undertook a A$1+ billion core overhaul with SAP to achieve real-time, channel-agnostic banking (CBA unfazed by non-exclusive core banking deal - iTnews).

  • Digital Banking & Neobanks: The rise of digital-only banks and fintechs has spurred incumbents to accelerate core upgrades. Australia saw a wave of neobanks (e.g. 86 400, Volt Bank, Judo Bank, etc.) around 2018–2020 that built modern cores from scratch. For instance, neobank 86 400 adopted a cloud-native core from local provider Data Action, prioritizing open APIs and cost efficiency (How 86 400 built a cloud-native bank – Computerworld). Although some challengers were acquired or closed, they left a legacy of innovation that big banks are following (e.g. Bendigo Bank launching a digital bank “Up”). In New Zealand, traditional banks like Westpac NZ began modernizing via new core platforms (Infosys Finacle in Westpac’s case (Westpac NZ selects Infosys Finacle for Core Banking)) to keep pace with digital challengers.

  • Open Banking and API Integration: Australia’s Consumer Data Right (open banking) regime (launched mid-2020) has increased interconnection between banks and fintechs, pressuring banks to have core systems that can expose services via APIs (Australian banking market ready for core systems change - Pismo). Banks need flexible cores to share data securely and support fintech partnerships. This trend, along with real-time payments (e.g. Australia’s NPP), demands core systems with 24/7 availability and modular, API-driven architectures.

  • Regulatory Compliance & Security: Regulatory factors also drive core upgrades. Banks must comply with ever-evolving rules on data, resilience, and risk (APRA in Australia, RBNZ in NZ). Modern cores can help meet stringent security and uptime requirements. For example, Kiwibank (NZ) attempted a core replacement to improve compliance and innovation but faced delays and cost overruns with an SAP core project (Kiwibank’s SAP core banking system overhaul faces delays and budget increase), underscoring the challenge but also the regulatory expectation for robust systems.
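The "wrap the legacy core with an API layer" pattern that open banking encourages can be sketched as a thin translation facade: parse the legacy system's record format, then expose it in an open, JSON-shaped payload. This is a hypothetical illustration – the fixed-width record layout and the JSON field names are invented for the example, not any real Hogan or CDR schema.

```python
import json
from dataclasses import dataclass

# Hypothetical fixed-width record as exported by a legacy core;
# field positions are illustrative only.
LEGACY_RECORD = "0001234567AUD 000001050075SAVINGS   "

@dataclass
class Account:
    account_id: str
    currency: str
    balance: float      # dollars
    product_type: str

def parse_legacy_record(record: str) -> Account:
    """Translate a legacy fixed-width record into a structured account."""
    return Account(
        account_id=record[0:10].lstrip("0") or "0",
        currency=record[10:14].strip(),
        balance=int(record[14:26]) / 100,   # legacy core stores cents
        product_type=record[26:].strip(),
    )

def to_open_api_payload(acct: Account) -> str:
    """Expose the account in an open-banking-style JSON shape (names illustrative)."""
    return json.dumps({
        "accountId": acct.account_id,
        "currency": acct.currency,
        "balance": {"amount": f"{acct.balance:.2f}"},
        "productCategory": acct.product_type,
    })

print(to_open_api_payload(parse_legacy_record(LEGACY_RECORD)))
```

In practice this translation layer sits behind an API gateway with consent and authentication handling; the point of the sketch is that the legacy core itself stays untouched while the bank meets data-sharing obligations.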

Competitive Landscape

The competitive landscape for core banking technology in Australia and New Zealand is bifurcated: long-established global vendors on one side and newer cloud-native entrants on the other, all vying for a limited number of bank clients. Most of the big four Australian banks historically built or bought proprietary or big-vendor cores (e.g. CBA with SAP, NAB with Oracle, Westpac and ANZ on older Hogan/COBOL systems). This means large deals are rare and hotly contested. Meanwhile, dozens of smaller institutions (regional banks, mutual banks, credit unions) provide a broad base of opportunities for vendors, though each individual deal is smaller.

  • Global Vendors Dominate: Traditional core banking providers like Temenos, Oracle FSS, Finastra, FIS, Fiserv, TCS, and Infosys Finacle have a strong presence. Many incumbent banks run one of these systems or a heavily customized variant. For example, Temenos is a popular choice in APAC and has implementations in the region (Temenos is a Leader in the IDC 2024 APAC Core Banking MarketScape) ([PDF] Asia/Pacific Digital Core Banking Platforms 2024 Vendor Assessment) (10x named as leader in IDC MarketScape for Asia/Pacific Digital ...). Oracle’s Flexcube was selected by NAB for its “NextGen” program and by others in the region (End is nigh for NAB core banking revamp). These established vendors compete on track record and breadth of functionality, but some struggle to shake a “legacy” image unless they offer new cloud versions.

  • Neo/Core Challenger Entrants: In recent years, cloud-native core providers (“neo cores”) have entered ANZ, promising faster implementation and flexibility. Examples include 10x Banking, Thought Machine (with its Vault core), and Mambu. They are gaining traction especially with challenger banks and mid-tier institutions. Australia’s Judo Bank (an SME-focused challenger) migrated its lending operations to Thought Machine’s Vault core in 2024, citing the need to be free from “constraints of legacy systems” (Australia’s Judo Bank Goes Live with Thought Machine’s Vault Core | The Fintech Times). 10x Banking (a UK-based SaaS core) formed an alliance with Deloitte Australia to modernize mutual banks’ cores (10x and Deloitte deliver digital transformation to mutuals in Australia). These new players increase competition for the incumbent vendors, often competing on cloud technology, product flexibility, and speed to market rather than decades of references.

Overall, ANZ banks have a rich vendor choice, making the landscape competitive. However, switching core providers is a massive undertaking – so vendor “wins” usually come when a bank finally decides to replace a legacy system (a decision sometimes delayed for years). Notably, ANZ Bank’s group CIO has even said their old Hogan core isn’t yet a “hindrance,” with no immediate replacement plans (indicating the inertia and lengthy timelines in this market) (ANZ CIO says old core banking system “not a hindrance”). This suggests that while many vendors compete, the sales cycle is long and relationships/trust matter greatly.

Major Core Banking System Providers in ANZ

Both traditional core system providers and neo core banking platforms operate in Australia and New Zealand. Below is an overview of the major players in each category and their footprint:

Traditional Core Platform Vendors

Overall, traditional vendors in ANZ compete on reliability and comprehensive features. Many banks stick with incumbents or their in-house legacy due to the risk of change. This is why, for example, ANZ and Westpac still run decades-old Hogan mainframes with no immediate plans to swap (ANZ CIO says old core banking system “not a hindrance”). But as those systems age, the above vendors position themselves to capture the next replacement cycle.

Neo Core Banking Providers (Cloud-Native)

In the last few years, neo core providers – cloud-native platforms often provided by fintech start-ups – have gained attention in ANZ. These systems are typically offered as SaaS, built on modern microservices architecture, and promise faster time-to-market for new products.

  • 10x Banking (UK) – A cloud-native core founded by ex-Barclays CEO Antony Jenkins. 10x entered Australia via a partnership with Westpac in 2019 to build a Banking-as-a-Service platform (Westpac partners with 10x Future Technologies to build new platform). Westpac also invested in 10x, indicating strong interest in its technology. More recently (2024), 10x and Deloitte formed an alliance to target Australia’s mutual banks with a SaaS core solution (10x and Deloitte deliver digital transformation to mutuals in Australia). While 10x hasn’t yet announced a major Australian bank as a full core client, it’s viewed as a serious contender for banks looking to modernize incrementally or launch digital subsidiaries.

  • Thought Machine (UK) – Creator of the Vault core banking platform. Thought Machine has established a Sydney office and is actively serving the ANZ market (Australia’s Judo Bank Goes Live with Thought Machine’s Vault Core | The Fintech Times). A high-profile client is Judo Bank, which selected Vault for its lending business and went live in 2024 (Australia’s Judo Bank Goes Live with Thought Machine’s Vault Core | The Fintech Times). Thought Machine’s Vault is also behind Singapore’s Trust Bank, a new digital bank launched in 2022 (Singapore’s Trust Bank taps Thought Machine for core banking tech). Its technology, emphasizing flexibility and real-time capabilities, appeals to institutions that want to build products rapidly. In Australia and New Zealand, Tier-2 banks and digital bank startups are rumored to be evaluating Vault. Thought Machine’s success with Standard Chartered’s digital banks in Asia (e.g. Mox in Hong Kong) adds credibility in the region (Singapore’s Trust Bank taps Thought Machine for core banking tech).

  • Mambu (Germany) – A SaaS banking engine that’s API-driven and widely used by fintech lenders and neobanks worldwide. Mambu has been active in Australia’s fintech scene: for instance, it was reportedly used by Volt Bank for deposit accounts and by other non-bank lenders. In 2021, Mambu won core banking deals in Vietnam and Colombia (2021: Top five core banking deals - FinTech Futures), showing its global reach. Its sweet spot is fast deployment for digital lending, deposit and payment products, making it a popular choice for greenfield digital banks or finance companies in SE Asia. Australian financial institutions that don’t require the full feature-set of a Temenos might opt for Mambu to launch specific products quickly.

  • Banking-as-a-Service Platforms: In addition to pure core vendors, some technology players offer “banking platform” services that include core functionality. SAP’s cloud banking offering, while not new, can be considered a modern approach and is used at CBA; regional fintechs such as Vault Payment Solutions (not to be confused with Thought Machine’s product) also operate here. Microsoft and AWS also partner with core providers (ANZ NZ’s FIS core is on Azure (ANZ New Zealand selects FIS for core banking upgrade), and many new cores run on AWS by default).

  • Other Notables: Finxact (US, now owned by Fiserv) and Vault Core (distinct from Vault Payments) are in early discussions in APAC. Starling Bank’s Engine (from the UK) gained one Australian client, the fintech Salt Money (Outdated systems holding you back? Back in… | Mambu), showing that even challenger bank tech can enter the fray. These players are still emerging.

The presence of neo-core providers is significant because they introduce new competition and innovation. They often emphasize componentized cores, open APIs, microservice design, and faster upgrade cycles compared to the traditional core systems. Australian and NZ banks are evaluating these for either replacing specific modules or launching sidecars alongside the main core (as Westpac did with 10x BaaS). Going forward, the core banking market in ANZ is expected to be a blend – large banks might stick with proven vendors (possibly their new cloud versions), whereas smaller banks and new entrants could leapfrog to the neo solutions for agility.

System Integrators for Core Banking in ANZ

Implementing or replacing a core banking system is a complex, multi-year project, and system integrators (SIs) play a crucial role in this space. In Australia and New Zealand, banks typically enlist experienced consulting and IT services firms to help select, customize, and integrate core banking platforms. Below we identify key SIs specializing in core banking integration, along with the opportunities they are pursuing and competitive dynamics:

  • Accenture: A leading integrator in core banking globally and in ANZ. Accenture has been involved in landmark projects like CBA’s core modernization (as prime integrator alongside SAP) – CBA contracted Accenture for its A$580M core overhaul in 2008 (CBA unfazed by non-exclusive core banking deal - iTnews). Accenture’s Financial Services practice also has experience with Temenos, Finacle, and Oracle implementations. The firm often leads large-scale transformations, offering end-to-end services (from consulting to coding to change management). In ANZ, Accenture’s opportunity lies in the big banks’ eventual core replacements and major upgrades, as well as smaller banks that want a top-tier firm to de-risk their projects. Competitors to Accenture include the “Big 4” consultancies and global IT firms (and occasionally the bank’s own internal IT if they choose to self-manage).

  • Deloitte: Deloitte has a strong banking tech consulting arm in Australia/NZ and has recently made core banking modernization a focus, as seen by its alliance with 10x Banking (10x and Deloitte deliver digital transformation to mutuals in Australia). Deloitte often provides strategy, selection advice, and project assurance for core projects. They have led core system integration for some regional banks and were advisors on projects like TISA’s Flexcube deployment in PNG (June 2024: Top five core banking stories of the month). Deloitte’s opportunity is to leverage its global fintech partnerships (like with 10x and AWS) to capture mid-tier bank core transformations and the new wave of mutual bank upgrades. Competitively, Deloitte goes up against Accenture for big projects and against EY/PwC on advisory-led deals.

  • Capgemini: Capgemini and its acquired entity (IGATE) have implemented core banking systems (especially Finacle and Temenos) in Asia. In Australia, Capgemini helped some smaller institutions and was involved in parts of NAB’s Oracle-based program in the 2010s. Capgemini also has a delivery center in APAC that can support lower-cost development. They aim for opportunities in mid-size banks or as a vendor’s implementation partner. Capgemini competes with TCS and Infosys when those firms implement their own products, and with other multinational SIs.

  • IBM Consulting (IBM iX): IBM has historically been integrator for many bank IT systems. While not as frequently leading new core package implementations now, IBM was integral in maintaining older cores (like IBM’s mainframe systems) and has provided custom core solutions for some smaller banks. They also bring cloud infrastructure expertise for banks moving core to cloud. IBM’s opportunity is in hybrid projects – e.g. helping a bank modernize around a legacy core (APIs, middleware) or migrate to IBM Cloud. Competitors are the cloud-native specialists and other global SIs.

  • TCS, Infosys & Wipro: These India-headquartered IT services firms often implement their own core products (TCS BaNCS, Infosys Finacle) – for example, Infosys likely supported Westpac NZ’s Finacle rollout. They also serve as system integrators for third-party cores in some cases. TCS’s local Australian arm has a long history in banking (including insurance and stock exchange systems). Wipro and Tech Mahindra have delivered Temenos and Finastra projects in APAC as well. These firms provide strong technical teams and cost advantages, which is an opportunity for cost-conscious banks. However, they often compete with the bank’s preference for a more local presence or with the product vendor’s own professional services.

  • DXC Technology (formerly CSC): DXC owns some legacy core systems (the Hogan system still used by ANZ Bank was originally from CSC). DXC provides core banking outsourcing for some smaller banks and continues to maintain legacy cores in the region. While not a frontrunner for new modern core projects, DXC’s role as custodian of old cores means it competes to keep banks on those systems rather than see them move to a new vendor. It also offers integration services around its own cores.

  • Specialist Fintech Integrators: A number of niche Australian firms focus on banking tech integration. For instance, Rubik Financial was an Australian company that provided core banking and channel solutions – Temenos acquired Rubik in 2017 to strengthen its local delivery (Temenos to acquire Australian partner Rubik for $50m). XPT/Xpert Digital implements digital banking front-ends and has Temenos expertise (Xpert Digital (XD) partners with Police Bank and Border Bank to ...). Such specialists often partner with core software vendors to implement mid-size projects. They compete on deep product knowledge and agility, but may be limited in scale for the largest transformations.

Opportunities: The core banking integration market in ANZ is poised for significant activity, as many banks are reaching the limits of their legacy platforms. Each major core replacement (e.g. if ANZ or Westpac decide to replace their core, or when mid-tier banks like BOQ, Kiwibank, etc. undertake projects) represents a huge opportunity for SIs – typically multi-year contracts worth tens or hundreds of millions. Additionally, the rise of digital banking (both new entrants and digital offshoots of incumbents) creates demand for smaller-scale core deployments, which SIs can support in a more modular, agile fashion. Even upgrades of existing core installations (e.g. moving an on-prem core to cloud, or adding new modules) require integration expertise.

Competitive Dynamics: Competition among SIs is intense. Global firms (Accenture, Deloitte, etc.) often leverage their strategic relationships and end-to-end capability to win prime contractor roles. Meanwhile, vendor-aligned integrators (TCS, Infosys, etc.) leverage their product know-how for faster delivery. We also see collaborations – for example, a big 4 consultancy might do project management while a tech firm handles configuration. Banks tend to invite multiple SIs to bid; selection factors include cost, experience with the chosen software, and ability to commit resources onshore. Notably, sometimes core vendors themselves have services teams that act as integrators (e.g. Temenos and SAP both provided engineers for CBA’s project, alongside Accenture (CBA unfazed by non-exclusive core banking deal - iTnews)). Thus, SIs also compete with the software vendors’ professional services and support units.

In summary, system integrators are key enablers of core banking change in ANZ. With many core projects expected in the coming 5–10 years, there is a substantial pipeline of opportunities for those firms – but winning and successfully delivering these projects requires strong credentials and partnership across the banking ecosystem.

Future Outlook: Technology, Regulation, and Market Dynamics

Looking ahead, the core banking market in Australia and New Zealand is set to evolve under the influence of new technologies, regulatory changes, and shifting market dynamics. Below are insights into the future outlook:

  • Cloud-Native and Modular Architectures: Future core banking systems will almost universally be cloud-enabled, whether as SaaS or private cloud deployments. Both incumbent vendors and new players are re-engineering their solutions to be modular (composed of microservices) and easily integrable. For banks, this means the possibility of a gradual core renewal – for example, implementing a new core for a subset of products or customers first (a “progressive renovation” strategy) rather than big-bang replacements. We can expect more ANZ banks to adopt hybrid core environments, where parts of the business run on a new cloud core (for agility) while legacy parts are phased out. The end-state target is often a composable banking architecture, where the core is one component plugged into an ecosystem of best-of-breed services (payments, fraud, analytics etc.). Technologies like containerization and Kubernetes will underpin many core deployments to ensure scalability. As an indicator, Vietnam’s regulator recently green-lit running core banking in the public cloud (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium) – a trend we anticipate in ANZ as APRA becomes more comfortable with cloud for critical systems.

  • Advanced Analytics and AI Integration: While core systems themselves handle transactions, the next-gen cores are being built with real-time data and analytics capabilities in mind. This includes feeding data to AI engines for personalized offers, and using machine learning for credit decisions or fraud detection at the core level. Australian banks are investing in data warehouses and AI; a modern core can provide richer, real-time data streams. We might see cores that have built-in AI ops for self-healing or that integrate with AI-based code tools (for instance, Accenture’s use of AI to interpret legacy code for core modernization (Core banking modernization: Unlocking legacy code with generative ...)). Over the next 5 years, AI could also assist in migration (automating data mapping from old to new systems) and in testing core systems.

  • Regulatory Factors: Regulators in both countries will heavily influence core banking trends. Australia’s APRA is focused on operational resilience – it has guidelines (CPS 230 etc.) that effectively require banks to ensure their core systems are robust and recoverable. This pushes banks toward active-active core setups, cloud DR, and updated software. Additionally, Open Banking compliance means banks must have systems that can expose data in standard formats on demand; older cores often struggle here, so banks may either wrap them with API layers or upgrade to more open cores. New Zealand’s RBNZ has been encouraging tech modernization as well, albeit through moral suasion more than formal mandates. Both countries also emphasize competition in banking: Australia’s licensing of new digital banks (and NZ’s consideration of fintech charters) creates an environment where incumbents know they must innovate or lose ground. Upcoming regulations on data privacy and security could also drive core upgrades (for better encryption, audit trails, etc.).

  • Market Dynamics and Competition: We anticipate a continued blurring of lines between incumbents and challengers. Incumbent banks are launching digital subsidiaries or brands (e.g. NAB’s UBank and the acquired 86 400 platform, Westpac’s planned digital bank via 10x) to defend market share. These initiatives often involve new core platforms, meaning more business for core vendors and integrators. The failure of some early neobanks (like Xinja and Volt in Australia) has tempered the market, but their technology approach (cloud-first core) has been validated by the success of others such as Judo and 86 400 (now UBank). Going forward, competitive dynamics will likely force all banks – large and small – to modernize their core to enable faster product rollout and seamless digital experiences. The competitive landscape of vendors will also shift: big vendors are acquiring smaller ones (e.g. Temenos buying Australian firm Rubik, Fiserv buying Finxact) to bolster their cloud offerings, while Big Tech companies (like AWS, Microsoft) deepen partnerships in core banking solutions, potentially even offering their own frameworks in the future.

  • Innovation: New Products & Services: With modern core systems, banks can more easily launch innovative products (such as buy-now-pay-later style loans, digital wallets, cryptocurrency custody, etc.). Australian and NZ banks are exploring these, and a flexible core is essential to support such innovation. For example, some banks are looking at blockchain for certain ledger functions or at least ensuring the core can integrate with distributed ledgers if needed (for trade finance or asset tokenization). While blockchain is not mainstream in core banking yet, future-ready cores are being designed to accommodate digital assets. Also, Banking-as-a-Service (BaaS) is emerging: big banks might use their core to offer services to fintechs (Westpac’s 10x platform is one case). This means cores must handle multi-tenant environments and open APIs, a trend that core vendors are embracing.
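The "progressive renovation" strategy mentioned in the outlook above – migrating product lines one at a time while a routing layer keeps both the legacy and the new core in play – can be sketched as a strangler-fig style router. All class and product names here are hypothetical, invented for illustration.

```python
# Product lines already migrated to the new cloud core; the rest stay
# on the legacy system until their migration wave completes.
MIGRATED_PRODUCTS = {"personal_savings", "term_deposit"}

class LegacyCore:
    def post_transaction(self, product: str, amount: float) -> str:
        return f"LEGACY posted {amount} to {product}"

class CloudCore:
    def post_transaction(self, product: str, amount: float) -> str:
        return f"CLOUD posted {amount} to {product}"

class CoreRouter:
    """Route each request to the system of record for its product line."""

    def __init__(self) -> None:
        self.legacy = LegacyCore()
        self.cloud = CloudCore()

    def post_transaction(self, product: str, amount: float) -> str:
        core = self.cloud if product in MIGRATED_PRODUCTS else self.legacy
        return core.post_transaction(product, amount)

router = CoreRouter()
print(router.post_transaction("personal_savings", 250.0))  # handled by new core
print(router.post_transaction("home_loan", 1200.0))        # still on legacy
```

Expanding `MIGRATED_PRODUCTS` wave by wave is what makes the renovation "progressive": each product migration is a bounded project, and the legacy core is only retired once the set covers every product line.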

In summary, the future of core banking in Australia and New Zealand will likely see accelerated modernization as banks respond to digital consumer expectations and competitive pressures. Cloud-native cores, implemented in phases to mitigate risk, will become the norm. Banks that successfully upgrade will gain agility in launching services, whereas those that delay could find themselves hampered by legacy constraints (e.g., slow time to market, high IT costs, and even customer attrition). The regulatory environment – promoting competition and operational excellence – acts as both carrot and stick to encourage this evolution.

Australia/New Zealand vs. Southeast Asia: Market Growth Comparison

When comparing the core banking market outlook in Australia/New Zealand with that of Southeast Asia (focusing on Singapore, Thailand, and Vietnam), several contrasts emerge in terms of growth potential and drivers. Both regions are experiencing core banking transformations, but Southeast Asia’s market is generally in a higher-growth phase relative to the mature ANZ market. Below is a comparative analysis:

Market Maturity: Australia and New Zealand are highly mature banking markets – almost every adult has a bank account and the banking sector is dominated by a few large incumbents. Core banking activity is largely replacement and enhancement of existing systems. By contrast, Southeast Asia is more diverse: Singapore is mature (like ANZ, dominated by big banks), whereas Thailand and Vietnam are emerging markets with expanding banking sectors. In Vietnam, for example, banking penetration has been rising and new players are emerging alongside state-owned banks (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium). This means SEA has an element of greenfield growth (new banks, new customers) in addition to modernization of incumbents.

Growth Rates: The ANZ core banking tech market is growing steadily but modestly. As noted, Australia’s core banking software market is forecast to grow at a ~12.7% CAGR through 2030 (Australia Core Banking Software Market Size & Outlook, 2030) – a robust rate for a developed market, driven by major upgrade cycles. New Zealand’s growth is likely similar in percentage terms (if from a smaller base). In Southeast Asia, growth rates are generally higher. Many banks in SEA are on the cusp of core replacements or first-time core implementations (for digital banks), which suggests double-digit growth that could exceed ANZ’s. For instance, the global core banking market CAGR is estimated ~18% (Core Banking Market Size & Share Analysis - Mordor Intelligence), with emerging Asia-Pacific countries contributing strongly to that uptick. Specifically, Vietnam is witnessing aggressive modernization – an overwhelming 94% of Vietnamese bank execs in one survey said slow tech transformation cost them customers (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium), reflecting urgency to invest. We can infer Vietnam’s spending on core tech will grow rapidly in the coming years. Thailand is introducing new virtual banks by 2025–2026, which will spur fresh core banking projects. Singapore, while saturated with incumbent tech, is still seeing growth via its new digital banks and incumbents adopting cloud – albeit growth is more incremental there (as many Singapore banks already modernized to some degree).

Key Drivers: In ANZ, core banking investment is driven by the need to replace aging systems, improve efficiency, meet regulatory mandates, and support digital channels for an already digitally-active customer base. The driver is often internal (bank strategy and cost) and regulatory (compliance). In Southeast Asia, drivers include financial inclusion and competition: regulators are issuing new licenses to increase competition (e.g. Singapore granted digital bank licenses in 2020, Thailand approving virtual banks in 2025 (Thailand Greenlights Three Digital Banks in FinTech Shake-Up), Vietnam encouraging digital-only banks via new guidelines). These moves require banks (new and old) to deploy modern core systems to serve new customer segments (underserved populations, SMEs, etc.) (Thailand Greenlights Three Digital Banks in FinTech Shake-Up) (Thailand Greenlights Three Digital Banks in FinTech Shake-Up). Additionally, consumer demand for digital banking is soaring in SEA with its young, mobile-first population – Vietnam has over 70% of people under 35 and high smartphone adoption (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium), fueling demand for cutting-edge digital banking services underpinned by flexible cores. Another driver in SEA is that many banks historically had outdated or patchwork cores (some ASEAN banks run 20+ year-old systems, or multiple systems per product) and now see an opportunity to leapfrog straight to cloud-native cores, whereas Australian banks often have one core but need to modernize it for agility.

Technology Adoption: Both regions are embracing cloud tech, but Southeast Asia may actually move faster in some respects because many banks there can adopt latest-gen systems without as much legacy baggage. For example, Vietnam’s VIB bank became the first in that country to run a core banking system fully on AWS cloud in 2023 (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium) – something no major Australian bank has done yet for their core (due to stricter regulatory posture historically). Also, new digital banks in Singapore and Thailand are architecting everything on cloud from day one. Australia/NZ banks are also moving to cloud, but mainly in hybrid mode and still ensuring compliance with stricter data standards. The net effect is SEA could see faster innovation cycles in core banking (new features, rapid scaling) as banks there may be less tied down by older infrastructure.

Regulatory Environment: Interestingly, regulators in Southeast Asia are in some cases more explicitly pushing core banking innovation. As mentioned, the State Bank of Vietnam has shown openness to cloud and modern tech. The Bank of Thailand’s virtual bank framework even evaluates applicants on their technology plans for reaching the unbanked (Bank of Thailand sticks to 3 virtual bank licences - Bangkok Post) (Thailand Greenlights Three Digital Banks in FinTech Shake-Up). In Singapore, the Monetary Authority (MAS) fostered an environment for digital banks to emerge with modern tech (e.g., requiring strong technology risk management but supporting cloud adoption). In Australia, regulators encourage modernization indirectly via operational risk guidelines and the open banking mandate, but they did not explicitly force core system changes – it’s been more market-driven. Therefore, regulation in SEA often acts as a catalyst for new core systems (through new licenses or explicit innovation agendas), whereas in ANZ it’s more of a nudge (ensuring systems meet standards, but not dictating how banks achieve that).

Competitive Landscape & Market Potential: In ANZ, the number of potential core deals is limited by the number of banks (the big four plus a handful of regionals hold most of the market share). Once those are modernized, the market may plateau until next refresh cycle many years later. Southeast Asia, however, has a large number of banks across various sizes (from giant state banks to small rural banks), and consolidation is still happening. There’s significant market potential for vendors to sell cores to many institutions. For example, Vietnam has dozens of joint-stock banks all upgrading in stages – Temenos alone commands ~37% of that market and still sees room to grow (Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape | by Victor Leung | Apr, 2025 | Medium). Similarly, Thailand’s mid-tier banks and new entrants will be shopping for cores in coming years. Southeast Asia also has foreign banks expanding (e.g. Chinese and Japanese banks setting up operations, requiring new systems), adding to demand.

In summary, Australia/New Zealand’s core banking market is in a mature, replacement-driven growth phase (steady but not explosive), whereas Southeast Asia’s is more dynamic with higher growth potential, fueled by financial sector expansion and digital entrants. ANZ banks benefit from strong existing infrastructure and are focusing on modernization for efficiency and product agility. Southeast Asian banks, on the other hand, are often building new capabilities outright – catching up or even leapfrogging – which translates to potentially faster growth in core banking investments.

The table below encapsulates some of the comparative points between the two regions:

| Factor | Australia & New Zealand | Southeast Asia (Singapore, Thailand, Vietnam) |
| --- | --- | --- |
| Market Maturity | Very high – nearly 100% banked population, few new banks forming. Core projects are mainly replacements or upgrades in established banks. | Mixed – ranges from mature (Singapore) to developing (Vietnam). New banks are being licensed (e.g. virtual banks), adding greenfield core implementations. |
| Core Market Growth | Moderate 12–13% CAGR in software spend (Australia) (Australia Core Banking Software Market Size & Outlook, 2030); growth driven by tech refresh cycles. Total market size relatively small (hundreds of $M annually). | Generally higher growth trajectory. Emerging markets show strong double-digit growth as many banks invest for the first time. Vietnam and others are modernizing aggressively – 94% of banks cite urgency (Core Banking Market in Vietnam – Victor Leung, Medium, Apr 2025). |
| Key Growth Drivers | Legacy replacement (aging mainframes -> modern cores), digital channel demands from customers, and regulatory compliance (open banking, resilience). Competition is primarily incumbent vs incumbent, so efficiency and CX are the drivers. | Financial inclusion and competition – regulators enabling new entrants (digital banks in SG (Singapore's Trust Bank taps Thought Machine for core banking tech) and TH (Thailand Greenlights Three Digital Banks in FinTech Shake-Up)) pushing incumbents to upgrade. Also high customer growth in emerging economies and a desire to leapfrog to digital-first services. |
| Technology Adoption | Moving steadily to cloud/hybrid cloud cores, but often incrementally. Emphasis on integrating new modules (e.g. real-time payments) with stable legacy cores in the interim. Cautious approach due to system criticality. | Some banks skipping legacy tech entirely, going straight to cloud-native cores. Regulators increasingly open to cloud deployments (Core Banking Market in Vietnam – Victor Leung, Medium, Apr 2025). |
| Regulatory Environment | Strong oversight (APRA, RBNZ) focused on stability. Open Banking mandated in AU (since 2020) drives API capabilities (Australian banking market ready for core systems change - Pismo). No direct mandate to replace cores, but implicit pressure via operational risk standards. | Proactive stance to boost innovation: new licences come with the expectation of innovative tech. E.g. Thai virtual banks must use innovative tech to reach the underbanked (Bank of Thailand sticks to 3 virtual bank licences - Bangkok Post). Regulators encourage modernization to support digital economy goals. |
| Vendor/Integrator Opportunity | Limited number of large banks – each big core deal is huge but infrequent. Vendors face long sales cycles; SIs compete for a few big projects (e.g. one Big Four core replacement can be a once-in-decades event). The smaller-bank segment provides continuous but smaller opportunities. | Many banks at various stages of core upgrade – a broad base of opportunities. Multiple mid-sized banks and new banks seeking solutions simultaneously. Vendors can win many smaller deals that add up. SIs can partner across countries; local tech talent gaps mean outside integrators are welcomed. |

Both regions will continue to invest in core banking transformation, but Southeast Asia’s banking market is expected to grow faster in terms of new core system adoptions. Australia and New Zealand, while growing more slowly, will still see significant modernization given the critical importance of banking (and the need to keep up with global digital banking standards). In fact, ANZ banks often observe the SEA experiments – for instance, seeing Singapore’s successful digital bank launches on cloud cores provides a valuable case study that may eventually encourage more aggressive moves in Australia’s big banks. Conversely, the experience of Australia’s large banks in executing massive core projects (CBA’s success, NAB’s challenges) offers lessons to banks in developing markets.

In conclusion, Australia and New Zealand present a stable but innovation-focused core banking market, whereas Southeast Asia offers a rapidly expanding and evolving landscape. A vendor or integrator evaluating these markets would find higher immediate growth potential in Southeast Asia, but also must navigate diverse requirements country by country. Meanwhile, the ANZ market, though slower, cannot be ignored – the deals there are large and the banks are often regional trendsetters in banking technology. Both regions are ultimately converging toward the same vision: modern, flexible core banking systems enabling the digital banking era, but they are starting from different points on the curve and moving at different speeds.

Sources:

  1. Grand View Research – Australia Core Banking Software Market Size & Outlook, 2030
  2. FinTech Futures – ANZ New Zealand selects FIS for core banking upgrade; ANZ CIO says old core banking system "not a hindrance" (Hogan)
  3. FinTech Futures – June 2024: Top five core banking stories of the month (Flexcube replacing Ultracs at TISA)
  4. iTnews – CBA unfazed by non-exclusive core banking deal (SAP + Accenture)
  5. Thought Machine – Singapore's Trust Bank taps Thought Machine for core banking tech
  6. The Fintech Times – Australia's Judo Bank goes live with Thought Machine's Vault Core
  7. 10x Banking – 10x and Deloitte deliver digital transformation to mutuals in Australia
  8. Apps Run The World – Westpac NZ selects Infosys Finacle for core banking (2020)
  9. FinTech Futures – TCS BaNCS wins AU$13.6m core banking system contract with Reserve Bank of Australia
  10. FinTech Futures – Kiwibank's SAP core banking system overhaul faces delays and budget increase
  11. Victor Leung (Medium) – Core Banking Market in Vietnam, Marketing Strategies and Competitive Landscape (Apr 2025)
  12. Nation Thailand – Thailand greenlights three virtual banks (2025)
  13. Pismo – Australian banking market ready for core systems change (Open Banking arrival in 2020)
  14. Computerworld – How 86 400 built a cloud-native bank (Data Action core)

Research Report on the Core Banking Market in Australia and New Zealand

Australia's core banking software market was worth about US$480 million in 2024 and is projected to grow to roughly US$960 million by 2030, a compound annual growth rate (CAGR) of about 12.7%. New Zealand's market is smaller but follows a similar growth trend. Banking penetration in both countries is extremely high, so growth is driven mainly by technology upgrades and the replacement of aging legacy systems.

Key Trends

  • Core modernization and cloud migration
  • The rise of digital banks and neobanks
  • The advance of open banking (Consumer Data Right)
  • Heightened regulatory compliance and cybersecurity requirements

Competitive Landscape

The market features established vendors coexisting with emerging cloud-native providers. Large banks run long core-replacement cycles, while small and mid-sized banks offer a steady stream of opportunities.

Major Core Banking System Vendors in Australia and New Zealand

Traditional Core Platform Vendors

  • Temenos: high market share across ANZ and Southeast Asia; actively driving its SaaS transition.
  • Oracle FSS (Flexcube): a mainstay vendor for large-bank core upgrades.
  • Finastra: strong in payments and among smaller banks.
  • FIS (Modern Banking Platform): expanding into Asia-Pacific; adopted by ANZ NZ.
  • Infosys Finacle: strong in digital channel integration; supporting Westpac NZ's core upgrade.
  • TCS BaNCS: chosen by mid-sized banks and central banks (such as the RBA).
  • Local vendors Ultradata and Data Action: serving small and mid-sized financial institutions.

Emerging Cloud-Native Core Banking Platforms

  • 10x Banking: partnering with Westpac and Deloitte; focused on BaaS and the mutual bank market.
  • Thought Machine (Vault Core): serving Judo Bank and other neobanks.
  • Mambu: a rapidly deployable SaaS core supporting fintech startups.

Core Banking System Integrators in Australia and New Zealand

Major System Integrators

  • Accenture: first choice for large programmes, such as CBA's core transformation.
  • Deloitte: 10x Banking's partner; active among credit unions and mid-tier banks.
  • Capgemini: supports implementations of Finacle, Temenos, and other systems.
  • IBM Consulting: legacy mainframe maintenance and middleware upgrades.
  • TCS, Infosys, Wipro: implement their own products and provide third-party integration services.
  • DXC Technology: Hogan core maintenance and outsourcing services.
  • Specialist fintech integrators (Rubik Financial, Xpert Digital): experts in the small and mid-sized bank market.

Market Opportunities and Competitive Dynamics

  • Large core replacement projects (Big Four banks) are enormous on a per-deal basis.
  • Small and mid-sized institutions generate steady, ongoing upgrade demand.
  • Global SIs compete intensely with local specialists; delivery capability and localized support are the keys to winning work.

Future Outlook: Technology, Regulation, and Market Dynamics

Technology Trends

  • Full cloud adoption and microservices architectures
  • Real-time data processing and AI integration
  • Core systems compatible with distributed ledgers and digital assets

Regulatory Trends

  • APRA's operational resilience push in Australia (CPS 230)
  • Consumer Data Right (open banking APIs)
  • Strengthened data privacy and cybersecurity requirements

Market Dynamics

  • Traditional banks and emerging digital banks advancing on parallel tracks
  • Intensifying competition among vendors and integrators
  • Core system modernization has become the market consensus

Australia/New Zealand vs Southeast Asia: Market Growth Comparison

| Factor | Australia & New Zealand | Southeast Asia (Singapore, Thailand, Vietnam) |
| --- | --- | --- |
| Market maturity | Highly mature; core upgrades dominate | Growing; many greenfield opportunities |
| Growth rate | ~12.7% CAGR | ~18%+ CAGR |
| Growth drivers | Aging system replacement, digital channel demand | Financial inclusion, new virtual bank launches |
| Technology adoption | Hybrid cloud, measured upgrades | Rapid cloud-native adoption |
| Regulatory policy | Indirect push toward digitalization (open banking) | Active promotion of innovation and financial inclusion |
| Vendor/integrator opportunity | A few large projects, intense competition | Many mid-sized projects across multiple countries |

Vibe Coding - A New Era of AI-Accelerated Software Development

Software development is undergoing a major transformation. With the rise of large language models (LLMs), developers are adopting a new methodology called Vibe Coding — a conversational, iterative process where AI plays a central role in moving ideas into working software efficiently. At its core, Vibe Coding emphasizes logical planning, leveraging AI frameworks, continuous debugging, checkpointing, and providing clear context to AI tools. It focuses on speed, experimentation, and AI-human collaboration.

Vibe Coding, or vibecoding, is a modern approach to software development that uses natural language prompts to instruct AI systems to generate code. The term was coined by computer scientist Andrej Karpathy in February 2025 and quickly gained widespread adoption across the tech industry. Vibe Coding aims to minimize manual coding by relying heavily on AI coding assistants like ChatGPT, Claude, Copilot, and Cursor.

In practice, users describe the desired functionality in plain language. AI interprets these prompts and generates code automatically. Users test the output, troubleshoot by interacting with the AI, and iterate until the software operates as expected. This highly conversational approach centers around collaboration with AI, with Karpathy summarizing the experience as: "I just see things, say things, run things, and copy-paste things, and it mostly works."

Several key principles define the Vibe Coding mindset. It prioritizes natural language input over manual code writing, trusts the AI to handle the majority of development work, and favors rapid prototyping over immediate code perfection. The goal is to build a working version first, refine only when necessary, and accept that some imperfection is tolerable — particularly for non-critical or experimental projects. Vibe Coding also lowers the barrier to entry, making it possible for even beginners to create functional software.

Typical use cases for Vibe Coding include rapid prototyping of new ideas, building small personal productivity tools, learning new frameworks or programming languages with AI guidance, and accelerating minimum viable product (MVP) development for startups and small teams. However, it also carries limitations. AI-generated code may be messy or inefficient. Debugging can be more difficult when the user doesn't deeply understand the AI-written code. Vibe Coding is not recommended for production-grade systems that require high reliability, security, and maintainability. Overreliance on AI outputs without human review can introduce significant risks.

Compared to traditional AI-assisted programming, Vibe Coding involves deeper trust in the AI system. In Vibe Coding, users allow the AI to generate most or all of the code, perform minimal code review, and focus primarily on achieving working results quickly. In traditional AI-assisted coding, the human developer remains in control, uses AI mainly as a helper, conducts thorough reviews, and maintains responsibility for the final product. While Vibe Coding suits fast-moving projects and non-critical applications, traditional coding remains essential for production systems.

To succeed with Vibe Coding, developers need several core skills. Logical planning is crucial — clearly structuring what needs to be built before starting prompts. Awareness of AI-friendly frameworks like Rails, Django, and Next.js enables faster development. Frequent checkpointing using Git or cloud snapshots ensures stability and reduces the risk of irreversible mistakes. Developers must maintain discipline in debugging, often resetting to clean baselines to prevent technical debt. Context management is equally critical: providing the AI with full project context, documentation, and environment details significantly improves code generation accuracy.
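The checkpointing habit is easy to automate. A minimal sketch, assuming git is installed and the working directory is already a repository (the helper name and commit-message convention here are ours):

```python
import subprocess

def checkpoint(message: str) -> None:
    """Stage all changes and record a checkpoint commit."""
    subprocess.run(["git", "add", "-A"], check=True)
    # --allow-empty lets you checkpoint even when nothing changed
    subprocess.run(
        ["git", "commit", "--allow-empty", "-m", f"checkpoint: {message}"],
        check=True,
    )
```

Calling checkpoint("before big refactor") ahead of each risky prompt gives you a clean baseline to return to with git reset --hard.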

Selecting the right tools also plays a major role. Cursor offers a deep AI integration experience inside a professional, local environment ideal for more serious projects. Windsurf is optimized for rapid prototyping and fast-paced prompting. Replit provides instant online coding, strong multiplayer capabilities, and is perfect for collaborative experiments and demos.

Tom Blomfield, a partner at Y Combinator, shares advanced Vibe Coding techniques that emphasize planning, testing, and modularity. Developers are encouraged to plan project structures in markdown before coding, prioritize integration tests over unit tests, and use AI across the stack for tasks like hosting and asset generation. When encountering problems, switching between LLMs (such as Gemini, Claude, or Sonnet) can be highly effective. Voice input and screenshots can accelerate communication with AI, and keeping the code modular — with small, clean files — supports easier collaboration between humans and AI. Regular refactoring is necessary to maintain code quality even as prototypes grow.

The Vibe Coding workflow is straightforward: describe the intended functionality clearly to the AI, generate the implementation, test the output, debug collaboratively if needed, save progress, and repeat. This iterative loop enables developers to build complex applications faster without being constrained by traditional coding bottlenecks.

Vibe Coding is reshaping the software development landscape by making building software faster, more accessible, and more experimental. It enables quick exploration of ideas at low cost but demands careful oversight to ensure that quality, security, and maintainability are not compromised. While Vibe Coding is highly effective for rapid prototyping, side projects, learning exercises, and early-stage MVPs, traditional coding practices remain indispensable for mission-critical and enterprise-grade applications. By mastering both the advantages and limitations of Vibe Coding, developers can unlock new levels of productivity and innovation in modern software development.

Building Code Agents with Hugging Face smolagents

In the fast-evolving world of AI, agents have emerged as one of the most exciting frontiers. Thanks to projects like Hugging Face's smolagents, building specialized, secure, and powerful code agents has never been easier. In this post, we'll walk through the journey of agent development, explore how to build code agents, discuss secure execution strategies, learn how to monitor and evaluate them, and finally, design a deep research agent.

A Brief History of Agents

Agents have evolved dramatically over the past few years. Early LLM applications were static: users asked a question; models generated an answer. No memory, no decision-making, no real "agency."

But researchers dreamed of more: systems that could plan, decide, adapt, and act autonomously.

We can think of agency on a continuum:

  • Level 0: Stateless response (classic chatbots)
  • Level 1: Short-term memory and reasoning (ReAct pattern)
  • Level 2: Long-term memory, dynamic tool use
  • Level 3: Recursive self-improvement, autonomous goal setting (still experimental)

Early attempts at agency faced an "S-curve" of effectiveness. Initially, more agency added more confusion than benefit. But with improvements in prompting, tool use, and memory architectures, we're now climbing the second slope: agents are finally becoming truly effective.

Today, with frameworks like smolagents, you can build capable agents that write, execute, and even debug code in a secure and monitored environment.

Introduction to Code Agents

Code agents are agents specialized to generate and execute code to achieve a goal. Instead of just answering, they act programmatically.

Let's build a basic code agent with Hugging Face's smolagents (a sketch against the smolagents API, where CodeAgent supplies its own code-first system prompt and HfApiModel, renamed InferenceClientModel in later releases, calls the Hugging Face Inference API, so a valid HF token is assumed):

from smolagents import CodeAgent, HfApiModel

# CodeAgent ships with a built-in code-writing system prompt;
# we supply the model backend and an (empty) tool list.
agent = CodeAgent(tools=[], model=HfApiModel())

response = agent.run("Write a function that calculates the factorial of a number.")

print(response)

What's happening?

  • We initialize a CodeAgent with a model backend; its built-in system prompt steers it to solve tasks by writing Python code.
  • We run a user query.
  • The agent responds by writing and executing Python code.

Sample Output:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

Secure Code Execution

Running arbitrary code is risky. Even a well-meaning agent could:

  • Try to use undefined commands.
  • Import dangerous modules.
  • Enter infinite loops.

To build safe agents, we must:

  1. Capture Exceptions: wrap execution in a try/except block.

try:
    exec(agent_code)
except Exception as e:
    print(f"Error occurred: {e}")

  2. Filter Non-Defined Commands: use a restricted execution environment, e.g. exec with sanitized globals and locals dictionaries.

  3. Prevent OS Imports: scan code for forbidden keywords like os, subprocess, etc., or disable built-ins selectively.

  4. Handle Infinite Loops: run code in a separate thread or process with timeouts.

  5. Sandbox Execution: use Python's multiprocessing or even Docker-based isolation for truly critical applications.
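The keyword scan in step 3 is more reliable on the parsed syntax tree than on raw strings. A minimal sketch with Python's ast module (the forbidden-module list is illustrative, and this is not a complete sandbox by itself):

```python
import ast

FORBIDDEN = {"os", "sys", "subprocess", "shutil", "socket"}

def scan_imports(code: str) -> list:
    """Return the forbidden modules that the given code tries to import."""
    hits = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names
                     if a.name.split(".")[0] in FORBIDDEN]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in FORBIDDEN:
                hits.append(node.module)
    return hits

print(scan_imports("import os\nfrom subprocess import run"))  # ['os', 'subprocess']
```

Note that determined code can still dodge static scans (for example via __import__), which is why the process-level isolation in step 5 remains the stronger guarantee.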

Example Secure Exec:

import multiprocessing

def safe_exec(code, timeout=2):
    """Run untrusted code in a child process with a stripped-down
    set of built-ins and a hard timeout."""
    def target():
        try:
            # Expose only an explicit allow-list of built-ins
            exec(code, {"__builtins__": {"print": print, "range": range}})
        except Exception as e:
            print(f"Execution error: {e}")

    p = multiprocessing.Process(target=target)
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()
        print("Terminated due to timeout!")

# Note: the nested target function relies on the "fork" start method
# (the Linux default); under "spawn" (Windows, recent macOS) the target
# must be a module-level function so it can be pickled.

Monitoring and Evaluating the Agent

Good agents aren't just built; they are monitored and improved over time.

Enter Phoenix.otel, the OpenTelemetry entry point of Arize Phoenix, an open-source tool for monitoring LLM applications.

Key Metrics to Track:

  • Latency (response time)
  • Success/Error rates
  • Token usage
  • User feedback

Integration Example (a sketch assuming the arize-phoenix package: register() wires up an OpenTelemetry tracer provider, and an instrumentation package such as openinference-instrumentation-smolagents is what actually hooks the agent calls):

from phoenix.otel import register
from openinference.instrumentation.smolagents import SmolagentsInstrumentor

# Point traces at a Phoenix project and instrument smolagents calls
tracer_provider = register(project_name="code_agent")
SmolagentsInstrumentor().instrument(tracer_provider=tracer_provider)

# Your agent code here
agent.run("Write a quicksort algorithm.")

With this, every agent interaction is automatically traced and sent to your telemetry backend.

You can visualize execution traces, errors, and resource usage to continuously fine-tune the agent.

Building a Deep Research Agent

Sometimes, writing code isn't enough — agents need to research, retrieve information, and act based on live data.

We can supercharge our code agent with Tavily, a web search API that gives agents retrieval-augmented generation (RAG) style access to live information on the web.

Example (an illustrative sketch: TavilyBrowser is a stand-in class name, since the official tavily-python package exposes a TavilyClient that you would wrap as a smolagents tool):

from smolagents import CodeAgent, HfApiModel
from tavily import TavilyBrowser  # stand-in for a Tavily-backed browsing tool

browser = TavilyBrowser()
agent = CodeAgent(
    tools=[browser],
    model=HfApiModel(),
)

response = agent.run("Find the latest algorithm for fast matrix multiplication and implement it.")
print(response)

Now your agent can:

  • Search academic papers.
  • Extract up-to-date methods.
  • Code the solution dynamically.

Building agents that combine reasoning, execution, and real-world retrieval unlocks a whole new level of capability.

Final Thoughts

We are entering a new era where agents can autonomously reason, code, research, and improve.

Thanks to lightweight frameworks like Hugging Face's smolagents, powerful browsing tools like Tavily, and robust monitoring with Phoenix.otel, building secure, powerful, and monitored code agents is now within reach for any developer.

The frontier of autonomous programming is wide open.

What will you build?

LangSmith - Visibility While Building with Tracing

As the complexity of LLM-powered applications increases, understanding what’s happening under the hood becomes crucial—not just for debugging but for continuous optimization and ensuring system reliability. This is where LangSmith shines, providing developers with powerful tools to trace, visualize, and debug their AI workflows.

In this post, we'll explore how LangSmith enables deep observability in your applications through tracing, allowing for a more efficient and transparent development process.

Tracing with @traceable

The cornerstone of LangSmith’s tracing capabilities is the @traceable decorator. This decorator is a simple and effective way to log detailed traces from your Python functions.

How it Works

By applying @traceable to a function, LangSmith automatically generates a run tree each time the function is called. This tree links all function calls to the current trace, capturing essential information such as:

  • Function inputs
  • Function name
  • Execution metadata

Furthermore, if the function raises an error or returns a response, LangSmith captures this and adds it to the trace. The result is sent to LangSmith in real-time, allowing you to monitor the health of your application. Importantly, this happens in a background thread, ensuring that your app’s performance remains unaffected.

This method is invaluable when debugging or identifying the root cause of an issue. The detailed trace data allows you to trace errors back to their source and quickly rectify problems in your codebase.

Code Example: Using @traceable

from langsmith import traceable

# Apply the @traceable decorator to the function you want to trace
@traceable
def process_transaction(transaction_id, amount):
    """
    Simulates processing a financial transaction.
    Amounts above 1,500 fail, so the demo below is deterministic.
    """
    if amount > 1500:
        raise ValueError(f"Transaction {transaction_id} failed due to insufficient funds.")
    return f"Transaction {transaction_id} processed with amount {amount}."

# Call the function
try:
    print(process_transaction(101, 1000))  # Succeeds
    print(process_transaction(102, 2000))  # Raises an error
except ValueError as e:
    print(e)

Explanation:
  • The @traceable decorator logs detailed traces each time the process_transaction function is called.
  • Inputs such as transaction_id and amount are automatically captured.
  • Execution metadata, such as the function name, is also logged.
  • If an error occurs (as in the second transaction), LangSmith captures the error and associates it with the trace.

Adding Metadata for Richer Traces

LangSmith allows you to send arbitrary metadata along with each trace. This metadata is a set of key-value pairs that can be attached to your function runs, providing additional context. Some examples include:

  • Version of the application that generated the run
  • Environment in which the run occurred (e.g., development, staging, production)
  • Custom data relevant to the trace

Metadata is especially useful when you need to filter or group runs in the LangSmith UI for more granular analysis. For instance, you could group traces by version to monitor how specific changes are impacting your system.

Code Example: Adding Metadata

from langsmith import traceable

@traceable(metadata={"app_version": "1.2.3", "environment": "production"})
def process_order(order_id, user_id, amount):
    """
    Processes an order and simulates transaction completion.
    """
    # Simulate order processing logic
    if amount <= 0:
        raise ValueError("Invalid order amount")
    return f"Order {order_id} processed for user {user_id} with amount {amount}"

try:
    print(process_order(101, 1001, 150))
    print(process_order(102, 1002, -10))  # This will raise an error
except ValueError as e:
    print(f"Error: {e}")

Explanation:
  • The metadata parameter is added to the decorator, including the app version and environment.
  • This metadata will be logged with the trace, allowing you to filter and group runs by these values in LangSmith’s UI.

LLM Runs for Chat Models

LangSmith offers special processing and rendering for LLM traces. To make full use of this feature, you need to log LLM traces in a specific format.

Input Format

For chat-based models, inputs should be logged as a list of messages, formatted in an OpenAI-compatible style. Each message must contain:

  • role: the role of the message sender (e.g., user, assistant)
  • content: the content of the message

Output Format

Outputs from your LLM can be logged in various formats:

  1. A dictionary containing choices, which is a list of dictionaries. Each dictionary must contain a message key with the message object (role and content).
  2. A dictionary containing a message key, which maps to the message object.
  3. A tuple/array with the role as the first element and content as the second element.
  4. A dictionary with role and content directly.

Additionally, LangSmith allows for the inclusion of metadata such as:

  • ls_provider: the model provider (e.g., "openai", "anthropic")
  • ls_model_name: the model name (e.g., "gpt-4o-mini", "claude-3-opus")

These fields help LangSmith identify the model and compute associated costs, ensuring that the tracking is precise.
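To make the shapes above concrete, here they are as plain Python data — no SDK calls, and the reply text is invented:

```python
# OpenAI-style chat input: a list of role/content messages
inputs = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ]
}

# The four accepted output shapes, all carrying the same invented reply:
output_choices = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
output_message = {"message": {"role": "assistant", "content": "Hello!"}}
output_tuple = ("assistant", "Hello!")
output_direct = {"role": "assistant", "content": "Hello!"}

# Model-identification metadata LangSmith looks for:
llm_metadata = {"ls_provider": "openai", "ls_model_name": "gpt-4o-mini"}
```

Logging the input list together with any one of the four output shapes is enough for LangSmith to render the run as a chat exchange.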

LangChain and LangGraph Integration

LangSmith integrates seamlessly with LangChain and LangGraph, enabling advanced functionality in your AI workflows. LangChain provides powerful tools for composing chains of LLM calls, while LangGraph lets you structure agentic workflows as stateful graphs. Combined with LangSmith's tracing tools, you can gain deep insight into how your chains and graphs are performing, making optimization easier.
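Enabling that integration is configuration rather than code: with tracing switched on, LangChain and LangGraph components are traced automatically. A minimal sketch — the API key and project name below are placeholders:

```python
import os

# Turn on LangSmith tracing for any LangChain/LangGraph code in this process
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "my-app"  # optional: group runs under a project
```

These variables can equally be set in your shell or deployment environment instead of in code.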

Tracing Context Manager

Sometimes, you might want more control over the tracing process. This is where the Tracing Context Manager comes in. The context manager gives you the flexibility to log traces for specific blocks of code, especially when it's not feasible to use a decorator or wrapper.

Using the context manager, you can control the inputs, outputs, and other trace attributes within a specific scope. It integrates smoothly with the @traceable decorator and other wrappers, allowing you to mix and match tracing strategies depending on your use case.

Code Example: Using the Tracing Context Manager

from langsmith import trace

def complex_function(data):
    # Start tracing a specific block of code
    with trace(name="complex_function", run_type="chain",
               inputs={"data": data},
               metadata={"data_size": len(data), "processing_method": "sum"}) as run:
        # Simulate processing logic
        result = sum(data)
        run.end(outputs={"result": result})
        return result

# Call the function
print(complex_function([1, 2, 3, 4, 5]))
Explanation:
  • The trace context manager opens a run for a specific block of code (in this case, summing a list of numbers).
  • Inputs and metadata are supplied when the context is opened, and run.end() records the block's outputs on the run.
  • This method gives you fine-grained control over where and when traces are logged, providing flexibility when you cannot use the @traceable decorator.

Conversational Threads

In many LLM applications, especially chatbots, tracking conversations across multiple turns is critical. LangSmith’s Threads feature allows you to group traces into a single conversation, maintaining context as the conversation progresses.

Grouping Traces

To link traces together, you’ll need to pass a special metadata key (session_id, thread_id, or conversation_id) with a unique value (usually a UUID). This key ensures that all traces related to a particular conversation are grouped together, making it easy to track the progression of each interaction.
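A minimal sketch of the pattern — the handler name answer_turn is hypothetical, while session_id and the langsmith_extra keyword are the documented hooks:

```python
import uuid

# Generate one id per conversation and reuse it for every call in that thread
thread_id = str(uuid.uuid4())
thread_metadata = {"session_id": thread_id}  # "thread_id" or "conversation_id" also work

# With a @traceable-decorated handler, the metadata can be supplied per call:
#   answer_turn(user_message, langsmith_extra={"metadata": thread_metadata})
```

As long as every turn of the conversation sends the same id, LangSmith groups the traces into one thread.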

Summary

LangSmith empowers developers with unparalleled visibility into their applications, especially when working with LLMs. By leveraging the @traceable decorator, adding rich metadata, and using advanced features like tracing context managers and conversational threads, you can optimize the performance, reliability, and transparency of your AI applications.

Whether you're building complex chat applications, debugging deep-seated issues, or simply monitoring your system’s health, LangSmith provides the tools necessary to ensure a smooth development process. Happy coding!

LangChain - From Simple Prompts to Autonomous Agents

As large language models (LLMs) like OpenAI’s GPT-4 continue to evolve, so do the frameworks and techniques that make them easier to use and integrate into real-world applications. Whether you're building a chatbot, automating document analysis, or creating intelligent agents that can reason and use tools, understanding how to interact with LLMs is key. This post walks through a practical journey of using both the OpenAI API and LangChain — exploring everything from basic prompt engineering to building modular, structured, and even parallelized chains of functionality.

Sending Basic Prompts with OpenAI and LangChain

The first step in any LLM-powered app is learning how to send a prompt and receive a response.

Using the OpenAI API directly (the pre-1.0 openai SDK interface):

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response['choices'][0]['message']['content'])

Using LangChain with OpenAI under the hood:

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-4")
response = chat([HumanMessage(content="Explain quantum computing in simple terms.")])
print(response.content)

LangChain abstracts away boilerplate while enabling advanced functionality.

Streaming and Batch Processing with LangChain

LangChain simplifies both streaming and batch processing:

Streaming Responses:

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    model_name="gpt-4"
)

chat([HumanMessage(content="Tell me a long story about a brave cat.")])

Batch Processing:

messages = [
    [HumanMessage(content="What is AI?")],
    [HumanMessage(content="Define machine learning.")],
]

responses = chat.batch(messages)
for r in responses:
    print(r.content)

Iterative Prompt Engineering

Prompt engineering is not a one-and-done task. It's an iterative process of experimentation and improvement.

Start simple:

"Summarize this article."

Then refine:

"Summarize this article in bullet points, emphasizing key technical insights and potential implications for developers."

Observe results. Adjust tone, structure, examples, or context as needed. LangChain allows quick iteration by swapping prompt templates or changing message context.

Prompt Templates for Reuse and Abstraction

LangChain provides prompt templates to create reusable, parameterized prompts.

from langchain.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_template("Translate '{text}' to {language}")
prompt = template.format_messages(text="Hello", language="Spanish")

This modularity is essential as your application grows more complex.

LangChain Expression Language (LCEL)

LCEL enables you to compose reusable, declarative chains like functional pipelines.

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
llm = ChatOpenAI(model="gpt-4")
parser = StrOutputParser()

chain = prompt | llm | parser
print(chain.invoke({"topic": "AI"}))

You can compose chains in a clean, modular way using LCEL's pipe operator.

Custom Runnables for Extensibility

Sometimes, you need to insert custom logic into a chain. LangChain allows this with custom runnables.

from langchain_core.runnables import RunnableLambda

def uppercase(text: str) -> str:
    return text.upper()

uppercase_runnable = RunnableLambda(uppercase)
# Run it on the parsed string output of the model:
chain = prompt | llm | parser | uppercase_runnable

Perfect for injecting business logic or data preprocessing into a flow.

Composing Chains and Running in Parallel

Chains can be composed to run sequentially or in parallel:

Parallel example:

from langchain.schema.runnable import RunnableParallel

joke_chain = prompt | llm | parser  # "Tell me a joke about {topic}"

parallel_chain = RunnableParallel({
    "english": joke_chain,
    "spanish": ChatPromptTemplate.from_template(
        "Cuéntame un chiste sobre {topic}") | llm | parser,
})

result = parallel_chain.invoke({"topic": "cats"})
print(result)  # {"english": "...", "spanish": "..."}

This is great for multi-lingual output, comparison tasks, or speeding up multiple independent calls.

Understanding Chat Message Types

Working with system, user, and assistant roles allows for nuanced conversations.

messages = [
    {"role": "system", "content": "You are a kind tutor."},
    {"role": "user", "content": "Help me understand Newton's laws."}
]

You can experiment with few-shot examples, chain-of-thought reasoning, or tightly controlling behavior via the system message.
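For instance, a few-shot setup is nothing more than worked user/assistant pairs placed ahead of the real question (the translation pairs here are illustrative):

```python
few_shot_messages = [
    {"role": "system", "content": "You translate English words to French."},
    {"role": "user", "content": "cheese"},        # worked example 1
    {"role": "assistant", "content": "fromage"},
    {"role": "user", "content": "sea"},           # worked example 2
    {"role": "assistant", "content": "mer"},
    {"role": "user", "content": "bread"},         # the actual query
]
```

The model infers the task and answer format from the seeded pairs before it ever sees the final user turn.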

Storing Messages: Conversation History for Chatbots

Use LangChain’s ConversationBufferMemory to track chat history:

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=chat, memory=memory)

conversation.predict(input="Hello!")
conversation.predict(input="Can you remember what I just said?")

This enables persistent, context-aware chatbot behavior.

Structured Output from LLMs

LangChain helps enforce response schemas:

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Info(BaseModel):
    topic: str
    summary: str

parser = PydanticOutputParser(pydantic_object=Info)

# Embed the parser's format instructions so the model emits matching JSON
info_prompt = ChatPromptTemplate.from_template(
    "Summarize {topic}.\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())

chain = info_prompt | llm | parser
result = chain.invoke({"topic": "cloud computing"})

You get structured, type-safe data instead of freeform text.

Analyzing and Tagging Long Documents

LangChain supports splitting and analyzing long documents:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_text)

# Process each chunk with a summarization chain

Apply tagging, summarization, sentiment analysis, and more at scale.

Augmenting LLMs with Custom Tools

To overcome the limits of LLMs, you can give them access to tools like search, databases, or calculators.

from langchain.agents import load_tools, initialize_agent

tools = load_tools(["serpapi", "llm-math"], llm=chat)
agent = initialize_agent(tools, chat, agent="zero-shot-react-description")

agent.run("What is the weather in Singapore and what is 3*7?")

LLMs can now act based on real-world data and logic.

Creating Autonomous Agents with Tool Use

Agents go a step further: they reason about when to use tools and how to combine outputs.

LangChain’s agent framework lets you build intelligent systems that think step-by-step and make decisions, improving user experience and application power.
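Schematically, that decision loop looks like the sketch below. This illustrates the ReAct pattern, not LangChain's actual implementation, and decide stands in for the model's reasoning step:

```python
def run_agent(decide, tools, question, max_steps=5):
    """Schematic ReAct-style loop: at each step the model either picks a tool
    or returns a final answer that incorporates earlier observations."""
    observations = []
    for _ in range(max_steps):
        step = decide(question, observations)
        if step["action"] == "final":
            return step["answer"]
        # Run the chosen tool and feed its result back into the next decision
        observations.append(tools[step["action"]](step["input"]))
    return "No answer within step budget"

# Stub decision function and tool, just to show the control flow:
def decide(question, observations):
    if not observations:
        return {"action": "calculator", "input": "3*7"}
    return {"action": "final", "answer": f"3*7 = {observations[-1]}"}

print(run_agent(decide, {"calculator": lambda expr: 21}, "What is 3*7?"))
# → 3*7 = 21
```

In a real agent, decide is an LLM call that reads the question plus prior observations and emits the next action; the bounded loop keeps a confused model from running forever.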

Final Thoughts

We started with simple prompts and ended up creating parallelized, structured, tool-augmented LLM pipelines — all thanks to the power of OpenAI's API and LangChain. Whether you're building a smart assistant, document analyzer, or fully autonomous agent, mastering these tools and patterns gives you a strong foundation to push the boundaries of what’s possible with LLMs.
