As an end user you have access to various websites and web applications. You log in and can then look up data, submit requests, and manipulate existing records. In many cases, however, there is no API you can invoke, so you cannot do programmatically what you do manually. Having such an API is convenient for automating workflows, and even more so in the age of AI-powered agents: an API can readily be wrapped as an MCP Server, which in turn lets agents use that same functionality.
This article describes a trick: an approach that gives you an API for anything you can do manually. It is a workaround with a few moving pieces, the most important of which is a Chromium browser extension.
With the approach outlined below, you send an HTTP request to a REST API to execute any action you could perform manually in a web application, and you get back any results and data from that application. Some examples: send an email through the Outlook Web Client connected to your company email account, create an expense report in your company’s HR system, collect the top news stories from a paywalled site you have an account on, check customer status in Salesforce, create a cloud resource through the cloud provider’s portal.
The image shows the flow in the application.
The image shows (in light-colored rectangles) the Chromium browser extension that is at the heart of the implementation. The background.js script runs as a service worker. It receives the SSE endpoint (exposed by server.js) from settings.js and then registers as an SSE client with server.js. Then it sits back and waits. For every supported website or web application, a tailor-made content.js script is created and configured in the browser extension. This script performs DOM manipulation to mimic the manual actions performed by the user. (This is very similar to what automated browser testing tools such as Playwright, Selenium and Puppeteer do.)
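How background.js reads the event stream is not shown in this article. As a rough sketch, assuming it reads the stream with fetch and that server.js puts a JSON document in each event's data field (both are assumptions, as are the names parseSseBuffer and listen), the parsing side could look something like this:

```javascript
// Sketch only: split raw SSE text into complete events plus an unfinished remainder.
// Assumes lines end with "\n" and that each data field carries JSON (an assumption
// about what server.js sends).
function parseSseBuffer(buffer) {
  const events = [];
  // Events are separated by a blank line.
  const blocks = buffer.split('\n\n');
  // The last piece may be an incomplete event; hand it back to the caller.
  const rest = blocks.pop();
  for (const block of blocks) {
    const dataLines = [];
    let eventName = 'message';
    for (const line of block.split('\n')) {
      if (line.startsWith(':')) continue; // comment line, often used as keep-alive
      if (line.startsWith('data:')) dataLines.push(line.slice(5).replace(/^ /, ''));
      else if (line.startsWith('event:')) eventName = line.slice(6).trim();
    }
    if (dataLines.length === 0) continue;
    events.push({ event: eventName, data: JSON.parse(dataLines.join('\n')) });
  }
  return { events, rest };
}

// Hypothetical usage inside the service worker: read the stream and pass every
// parsed instruction to a handler.
async function listen(sseEndpoint, onInstruction) {
  const response = await fetch(sseEndpoint, { headers: { Accept: 'text/event-stream' } });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let pending = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    pending += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseBuffer(pending);
    pending = rest;
    events.forEach((e) => onInstruction(e.data));
  }
}
```

Keeping the parser a pure function makes it easy to test outside the browser; reconnecting after the stream ends is left out of this sketch.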
When an application or agent calls the API, server.js publishes a Server-Sent Event (SSE) that carries the details of the API call, a request identifier and a callback endpoint. Background.js receives the event and gets to work. It determines in which web application the requested action must be performed and checks whether a tab for that application is currently open in the browser. If there is, it sends a message to the content.js script that has been injected into that web application. This script takes its instructions from the message, performs the actions through DOM manipulation (doing programmatically anything the web client allows the user to do) and collects the results. It then responds to the message from background.js. Background.js receives the web application’s response from content.js; if a callback URL was included in the SSE event that started this whole sequence, background.js sends a message with the outcome of the action to that URL. At this point, server.js can complete the API request and send the appropriate response.
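The server side of this round trip boils down to correlating an incoming API call with the callback that arrives later. The following is a minimal sketch of that bookkeeping, not the code from the repo: the class name ActionBridge, the callbackUrl field name and the flat event layout are assumptions for illustration.

```javascript
// Sketch only: correlates API calls with callbacks from the browser extension.
class ActionBridge {
  constructor(callbackUrl) {
    this.callbackUrl = callbackUrl; // where background.js should report the outcome
    this.clients = new Set();       // open SSE responses (anything with a write method)
    this.pending = new Map();       // requestId -> function that completes the API call
    this.counter = 0;
  }

  addClient(res) { this.clients.add(res); }
  removeClient(res) { this.clients.delete(res); }

  // Called by the API handler: publish the instruction, then wait for the callback.
  submit(action, data, timeoutMs = 30000) {
    const requestId = `req-${++this.counter}`;
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        resolve({ success: false, error: 'Timed out waiting for the browser', requestId });
      }, timeoutMs);
      this.pending.set(requestId, (result) => {
        clearTimeout(timer);
        this.pending.delete(requestId);
        resolve(result);
      });
      // SSE wire format: a "data:" line followed by a blank line.
      const payload = { action, requestId, callbackUrl: this.callbackUrl, ...data };
      const frame = `data: ${JSON.stringify(payload)}\n\n`;
      for (const client of this.clients) client.write(frame);
    });
  }

  // Called by the callback handler when background.js reports the outcome.
  // Returns false when the requestId is unknown (for example after a timeout).
  complete(result) {
    const finish = this.pending.get(result.requestId);
    if (!finish) return false;
    finish(result);
    return true;
  }
}
```

In a Node.js HTTP server, the SSE route would call addClient with the response object, the API route would await submit before answering, and the callback route would call complete with the posted JSON.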
This approach lets you expose what you can do with your personal accounts, in web applications you can access on your device, as an API on that same device, so that applications, scripts and agents can perform whatever you can perform. It hinges on a running browser in which tabs are open and logged in for every application needed to handle the API requests.
An example: API for sending emails through my corporate Outlook Email account
I spend a lot of time sending emails. Many of them are quite similar: messages to dozens of colleagues who attend sessions and workshops I organize. Such emails can easily be automated, especially with the aid of an LLM. However, sending the email still requires an API (or MCP Server) and I do not at present have access to the Microsoft 365 Graph API that is required for sending emails. I will now show you how I automated email sending through an API (one that is powered by a local browser running a tab with the Outlook Web Client loaded and authenticated).
The end result:
In this example I run a simple Node.js command-line script. It asks me for the details of the email to be sent. It then makes the HTTP call to the API. The API server publishes the SSE event with the email details. The background.js script consumes the SSE event, unpacks it and determines the action to take: which tab to target and which message to send. The message it sends, with the “sendEmail” action, is consumed by the content.js script, which manipulates the DOM of the Outlook client to create the email and send it. It reports back to background.js, which in turn reports back to the callback URL (also on the API server).
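The HTTP call made by such a script could be as small as the sketch below. The URL, the function names and the exact JSON field layout are assumptions for illustration; the real route is defined by server.js in the repo.

```javascript
// Hypothetical endpoint: adjust host, port and path to what server.js exposes.
const API_URL = 'http://localhost:3000/api/send-email';

// Build the request separately from sending it, so it can be inspected or tested.
function buildSendEmailRequest({ to, subject, body }) {
  if (!to || !subject || !body) {
    throw new Error('to, subject and body are all required');
  }
  return {
    url: API_URL,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ to, subject, body }),
    },
  };
}

// Sends the request and resolves with the JSON outcome reported by the server.
async function sendEmailViaApi(details) {
  const { url, options } = buildSendEmailRequest(details);
  const response = await fetch(url, options); // global fetch is available in Node.js 18+
  if (!response.ok) {
    throw new Error(`API responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

A caller would then use something like `sendEmailViaApi({ to, subject, body }).then(console.log)`; the same three fields are the ones background.js validates before it forwards the instruction.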
This whole sequence took around 5 seconds, which puts the capacity of the email sending API at about 12 messages per minute, provided that the Outlook Web Client is not busy handling other API calls or being used by the human user. Not superfast, obviously, but still a helluvalot faster than I could do it manually.
Note that the code can easily be extended to support multiple recipients, cc, bcc, images, attachments and scheduled send. Perhaps we can even enlist Copilot in the Outlook web client to rewrite the mail body.
The code for this prototype is in this GitHub Repo. Instructions on how to use it are in the readme file.
More details on the inner workings.
The extension is loaded into the browser as an unpacked extension from the local folder that contains the manifest.json file. The extension can be inspected and configured from within the browser.
The image shows the extension after it has been loaded. The link to the extension's Options has been clicked. This brings up the web page where the SSE endpoint can be configured. It refers to the local endpoint published by the server.js script, the Node.js application.
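An options page like this typically validates the value before storing it. As a rough sketch, assuming the endpoint is kept in chrome.storage.sync under a key named sseEndpoint (the key, the storage area and both function names are assumptions, not taken from the repo):

```javascript
// Sketch only: check that the configured value is a usable http(s) URL.
function normalizeSseEndpoint(input) {
  let url;
  try {
    url = new URL(String(input).trim());
  } catch {
    return { ok: false, error: 'Not a valid URL' };
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, error: 'Only http and https endpoints are supported' };
  }
  return { ok: true, value: url.href };
}

// Hypothetical save handler for the options page. The storage area is injectable
// so the function can be exercised outside the browser.
async function saveSseEndpoint(input, storage = chrome.storage.sync) {
  const result = normalizeSseEndpoint(input);
  if (result.ok) {
    await storage.set({ sseEndpoint: result.value });
  }
  return result;
}
```

Rejecting anything that is not http or https catches a common slip such as entering `localhost:3000/events` without a scheme.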
The essential code in the content script — the script that is injected as part of the extension into the Outlook Web Client — for composing and sending the email looks like this:
async function sendEmail(data) {
  try {
    // Click the New Message button
    const newMessageButton = await waitForElement('button[aria-label="New message"], button[aria-label="New mail"]');
    newMessageButton.click();

    // Wait for compose form to appear
    const toField = await waitForElement('div[aria-label="To"], input[aria-label="To"]');

    // Give the compose form time to fully initialize
    await sleep(1000);

    // Fill in recipient
    await fillRecipient(toField, data.to);

    // Fill in subject
    const subjectField = await waitForElement('input[aria-label="Subject"]');
    subjectField.focus();
    subjectField.value = data.subject;
    subjectField.dispatchEvent(new Event('input', { bubbles: true }));

    // Fill in body - find the editable div for the email body
    const bodyField = await waitForElement('div[role="textbox"][aria-label="Message body, press Alt+F10 to exit"], div[contenteditable="true"][aria-label="Message body, press Alt+F10 to exit"], div[role="textbox"][aria-label="Message body"], div[contenteditable="true"][aria-label="Message body"]');

    // Set HTML content
    bodyField.innerHTML = data.body;
    bodyField.dispatchEvent(new Event('input', { bubbles: true }));

    // Click Send button
    const sendButton = await waitForElement('button[aria-label="Send"], button[name="Send"]');
    sendButton.click();

    return { success: true };
  } catch (error) {
    console.error('Error sending email:', error);
    // The excerpt is cut off at this point; returning the failure is an assumed completion
    return { success: false, error: error.message };
  }
}
There is much more — but this piece emulates the end user — clicking on the New Message button, populating the recipient and subject and writing the body before hitting the send button.
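The sendEmail function leans on two helpers that are not shown here, waitForElement and sleep. Their actual implementation lives in the repo; a minimal polling-based sketch, with the options object and its defaults being assumptions, could look like this:

```javascript
// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Sketch only: poll root.querySelector until a match appears or the timeout passes.
// The options (root, timeoutMs, intervalMs) are hypothetical; by default the page's
// own document is searched.
async function waitForElement(selector, { root = document, timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const element = root.querySelector(selector);
    if (element) return element;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for element: ${selector}`);
    }
    await sleep(intervalMs);
  }
}
```

Polling keeps the sketch short. A MutationObserver would react faster to DOM changes, but for a flow that takes seconds anyway the difference is small.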
The other essential piece is the bridge between background.js (the service worker that receives the SSE event) and content.js (the injected script in the Outlook Web Client that does the email-specific work).
First the background.js script:
// Process email instructions received from SSE
async function processEmailInstruction(instruction) {
  // Validate the instruction
  if (!instruction.to || !instruction.subject || !instruction.body) {
    console.error('Invalid email instruction:', instruction);
    return { success: false, error: 'Invalid email instruction', requestId: instruction.requestId };
  }

  try {
    // Find or create an Outlook tab
    const outlookTabResult = await activateOrCreateOutlookTab();
    if (!outlookTabResult.success) {
      throw new Error('Failed to open Outlook tab: ' + outlookTabResult.error);
    }

    // Give the tab time to load if it's new
    const delay = outlookTabResult.existing ? 500 : 3000;
    await new Promise(resolve => setTimeout(resolve, delay));

    // ==>>> Send the email instruction to the content script
    return new Promise((resolve) => {
      chrome.tabs.sendMessage(outlookTabResult.tabId, {
        action: 'sendEmail',
        data: {
          to: instruction.to,
          subject: instruction.subject,
          body: instruction.body
        }
      }, (response) => { // <<======
        if (chrome.runtime.lastError) {
          console.error('Error sending message to content script:', chrome.runtime.lastError);
          resolve({
            success: false,
            error: chrome.runtime.lastError.message,
            requestId: instruction.requestId
          });
          return;
        }
        if (response && response.success) {
          console.log('Email sent successfully via SSE instruction');
          resolve({
            success: true,
            requestId: instruction.requestId
          });
        } else {
          // The excerpt is cut off at this point; the rest is an assumed completion
          resolve({
            success: false,
            error: (response && response.error) || 'Unknown error from content script',
            requestId: instruction.requestId
          });
        }
      });
    });
  } catch (error) {
    console.error('Error processing email instruction:', error);
    return { success: false, error: error.message, requestId: instruction.requestId };
  }
}
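The background script relies on activateOrCreateOutlookTab to locate the tab to talk to, and expects it to return an object with success, tabId and existing. That function is not shown here; a minimal sketch using the promise-based chrome.tabs API of Manifest V3 might look as follows. The URL patterns, the start URL and the injectable tabsApi parameter are assumptions.

```javascript
// Assumed URL patterns; adjust to the Outlook host your organization uses.
const OUTLOOK_URL_PATTERNS = ['https://outlook.office.com/*', 'https://outlook.office365.com/*'];
const OUTLOOK_START_URL = 'https://outlook.office.com/mail/';

// Sketch only: reuse an open Outlook tab or open a new one.
// Matching tabs by URL requires the "tabs" permission or matching host permissions.
async function activateOrCreateOutlookTab(tabsApi = chrome.tabs) {
  try {
    const [existingTab] = await tabsApi.query({ url: OUTLOOK_URL_PATTERNS });
    if (existingTab) {
      // Bring the existing tab to the foreground within its window.
      await tabsApi.update(existingTab.id, { active: true });
      return { success: true, tabId: existingTab.id, existing: true };
    }
    const created = await tabsApi.create({ url: OUTLOOK_START_URL, active: true });
    return { success: true, tabId: created.id, existing: false };
  } catch (error) {
    return { success: false, error: error.message };
  }
}
```

The existing flag is what lets the caller choose between a short and a long wait before messaging the content script.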
The other end of this collaboration, the content.js script:
/**
 * Content script for interacting with Outlook Web Client
 * This file serves as the entry point and message router for the extension
 */

// Import functionality from separate modules
// Note: Chrome loads content scripts as classic scripts, so this static import
// assumes a bundling step that produces a single content script file
import { sendEmail } from './email.js';

// Listen for messages from the extension's background script
chrome.runtime.onMessage.addListener(function (request, sender, sendResponse) {
  // Handle email-related messages
  if (request.action === 'sendEmail') {
    sendEmail(request.data)
      .then(result => sendResponse(result))
      .catch(error => sendResponse({ success: false, error: error.message }));
    return true; // Required for async sendResponse
  }

  // If no handler matched, return false
  return false;
});
Summary
Anything you can do through a web browser can be turned into an API. That allows automation of all manual tasks — in traditional scripts and applications and also as part of agentic workflows. What is exposed through an API can just as easily be exposed as an MCP Server.
The really hard work in all of this is writing the code that performs the proper DOM manipulation.