The Dreaming Wolf's Freeloading Dev Diary

Table of Contents

[TOC]

1. Overview

Source repository:
https://gitlab.ntiple.com/developers/loadbalance-node-launcher

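On this project we adopted Next.js (v14), and during performance testing I learned for the first time
that a node server normally runs single-threaded.

We were running it behind a WebtoB server (as a reverse proxy), and the complaint was that throughput was falling behind
(even though CPU and memory usage peaked at only around 20%..)

Honestly, I had been burned by WebtoB plenty of times before, so we went back and forth over whether WebtoB was the real bottleneck

~~WebToB Shit!!!~~

Damn it, even tutorial code for a hand-rolled Java web server takes multithreading into account.
Surely they didn't build this thing single-threaded?

With that thought, I ran npm run start and tested it from a browser.

At least it is not a fully serialized single thread (in the graph, green is the server processing the request and blue is the response being downloaded)

~~It seems to handle a peak of about 6 threads at most~~

It depends on the network and other conditions, but measured with JMeter, a single instance handles up to about 200 TPS.

In fact, on a previous project I remember seeing several lines at once in the browser log and thinking, "Ah! So multithreading does work...."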
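
A quick sketch of why that browser graph can show overlapping requests even though node runs JavaScript on one thread: waits on I/O overlap on the event loop, while CPU-bound work still runs one piece at a time. The names ioTask, cpuTask and measure are made up for illustration, and the timers only simulate I/O; this is not a measurement of Next.js itself.

```javascript
// Simulated I/O-bound handler: waits on a timer, leaving the event loop free.
const ioTask = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Simulated CPU-bound handler: busy-loops and blocks the single JS thread.
function cpuTask(ms) {
  const end = Date.now() + ms
  while (Date.now() < end) { /* spin */ }
}

// Run three tasks of each kind and report how long each batch took.
async function measure() {
  let started = Date.now()
  await Promise.all([ioTask(100), ioTask(100), ioTask(100)])
  const ioElapsed = Date.now() - started

  started = Date.now()
  cpuTask(100)
  cpuTask(100)
  cpuTask(100)
  const cpuElapsed = Date.now() - started

  return { ioElapsed, cpuElapsed }
}
```

Calling measure() should report the three timer waits finishing in roughly the time of one, while the three busy loops take the sum of all three. That fits the low CPU usage above: one node process can only keep one core busy.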

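Part of me was just a tiny bit tempted to dig into the core source,

but anyway! The point is that this is too shaky to run as a production service as-is.

And I had to come up with a complete fix right away, within that day's working hours.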

2. So..

Well, what else could I do... I decided to start by gathering information online.

The very first hit was

https://stackoverflow.com/questions/72848978/is-next-js-backend-multithreaded-by-default

which roughly seemed to say: use something called PM2.

https://pm2.keymetrics.io/docs/usage/quick-start/

...

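Okay... let me.. take a look~~~~

...

Argh! I don't know! It's complicated!!!!..

Since everything had to be finished that same day, I was in too much of a hurry to read anything properly.

So let's install it first.. (luckily pm2 was available in the customer's Nexus) and just try running it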

$ mkdir launcher
$ cd launcher
$ echo "{}" > package.json
$ npm install pm2

added 138 packages, and audited 139 packages in 5s
13 packages are looking for funding
  run `npm fund` for details
found 0 vulnerabilities

$ npx pm2 start

[PM2][ERROR] File ecosystem.config.js not found

It wants me to write an ecosystem.config.js.. (to run a node app that lives in a different path, you need to set the cwd option.)

module.exports = {
  apps : [{
    name: 'app',
    cwd: '/home/coder/documents/study2',
    script: 'npm',
    args: 'start',
    exec_mode: 'cluster',
    env: { },
  }, {
    name: 'app2',
    cwd: '/home/coder/documents/study2',
    script: 'npm',
    args: 'start',
    exec_mode: 'cluster',
    env: { },
  }]
}

Now let's run it again

$ npx pm2 start

[PM2] Spawning PM2 daemon with pm2_home=/home/coder/.pm2
[PM2] PM2 Successfully daemonized
[PM2][WARN] Applications app, app2 not running, starting...
[PM2] App [app] launched (1 instances)
[PM2] App [app2] launched (1 instances)
┌────┬─────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name    │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼─────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ app     │ default     │ N/A     │ cluster │ 2925574  │ 0s     │ 0    │ online    │ 0%       │ 49.5mb   │ coder    │ disabled │
│ 1  │ app2    │ default     │ N/A     │ cluster │ 2925575  │ 0s     │ 0    │ online    │ 0%       │ 49.9mb   │ coder    │ disabled │
└────┴─────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘

$ npx pm2 list

┌────┬─────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name    │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼─────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ app     │ default     │ N/A     │ cluster │ 0        │ 0      │ 15   │ errored   │ 0%       │ 0b       │ coder    │ disabled │
│ 1  │ app2    │ default     │ N/A     │ cluster │ 2925575  │ 40s    │ 0    │ online    │ 0%       │ 63.2mb   │ coder    │ disabled │
└────┴─────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘

Something is off... two instances should be running, but only one is.... Let's check the logs

$ npx pm2 log

... (omitted) ...
 ⨯ Failed to start server
Error: listen EADDRINUSE: address already in use :::3000
... (omitted) ...

A conflict on port 3000....
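
The same error can be reproduced without PM2 or Next.js at all. A minimal sketch, assuming plain node:http; listenTwice is a hypothetical helper that binds one server to a free port and then tries to bind a second server to that same port.

```javascript
const http = require('node:http')

// Bind one server to a free port, then try to bind a second one to the same port.
// Resolves with the error code of the second listen attempt.
function listenTwice() {
  return new Promise((resolve) => {
    const first = http.createServer()
    first.listen(0, '127.0.0.1', () => {
      const { port } = first.address()
      const second = http.createServer()
      // The second bind is expected to fail because the port is already taken.
      second.once('error', (err) => {
        first.close(() => resolve(err.code))
      })
      // Fallback in case the platform unexpectedly allows the second bind.
      second.once('listening', () => {
        second.close(() => first.close(() => resolve('NO_ERROR')))
      })
      second.listen(port, '127.0.0.1')
    })
  })
}
```

Both apps in the config above run npm start with no PORT set, so both try to take the default port 3000 and whichever comes second gets exactly this error.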

Wait!!!!!!!!!!! Wasn't PM2 supposed to run it multithreaded and take care of everything?!!!!

I went back and carefully reread the post I hadn't had time to read....

... (omitted) ...

Instead of multithreading and clustering you can run Next.js on multiple instances, for example containers, behind a load balancer. It gives you horizontal scaling.

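... (omitted) ...

...

It says multiple instances (as in, run several of them..).

And behind a load balancer at that (as in, bring them up on several ports and put them behind a load balancer..).

...

In other words, something like this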

stateDiagram-v2
direction TB
USER --> LOAD_BALANCER
PM2 --> LOAD_BALANCER: process management
PM2 --> INSTANCE_1: process management
PM2 --> INSTANCE_2: process management
PM2 --> INSTANCE_3: process management
PM2 --> INSTANCE_N: process management
LOAD_BALANCER --> INSTANCE_1: load balancing
LOAD_BALANCER --> INSTANCE_2: load balancing
LOAD_BALANCER --> INSTANCE_3: load balancing
LOAD_BALANCER --> INSTANCE_N: load balancing
note left of LOAD_BALANCER : Runs on port 3000
note right of INSTANCE_1 : Runs on port 3001
note right of INSTANCE_2 : Runs on port 3002
note right of INSTANCE_3 : Runs on port 3003
note right of INSTANCE_N : Runs on port 300n

...

Argh! I really have no time!!!!!!!!!!!

Wondering whether there was some other way to use it, I decided to ask ChatGPT for help (warning: Konglish question ahead!!)

[ Me ]
show me pm2 config for nextjs-14 using 3 thread port number 3000

[ CHAT-GPT ]
Here's a sample ecosystem.config.js configuration file for PM2 to run a Next.js 14 application using 3 instances (threads) on port 3000:

module.exports = {
  apps: [
    {
      name: 'nextjs-app',
      script: 'npm',
      args: 'start',
      instances: 3,
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production'
      },
      env_port_0: {
        PORT: 3000
      },
      env_port_1: {
        PORT: 3001
      },
      env_port_2: {
        PORT: 3002
      }
    }
  ]
};

[ CHAT-GPT ]
To use this configuration:
Save the above content to a file named ecosystem.config.js in the root directory of your Next.js project.
... (omitted) ...
Configure Nginx to balance traffic across the different ports:
... (omitted) ...

[ Me ]
pm2 cannot proxy ?

[ CHAT-GPT ]
PM2 itself is primarily a process manager for Node.js applications and does not include built-in proxying capabilities. However, you can use PM2 to manage a Node.js application that acts as a proxy. Here’s how you can set up a proxy server using PM2 with either http-proxy or express-http-proxy:
... (omitted) ...
Create a proxy server script (proxy-server.js):

const express = require('express');
const proxy = require('express-http-proxy');

const app = express();

const targets = [
  'http://localhost:3001',
  'http://localhost:3002',
  'http://localhost:3003'
];

app.use((req, res, next) => {
  const target = targets[Math.floor(Math.random() * targets.length)];
  proxy(target)(req, res, next);
});

const port = 3000;
app.listen(port, () => {
  console.log(`Proxy server listening on port ${port}`);
});

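To sum it up roughly: if you want node.js to behave like a multithreaded server, bring up several instances on separate ports with PM2 and load-balance them with something like nginx; if you can't use nginx, build a balancer yourself.

PM2 has no such thing as a multithreaded proxy feature.

That's what it comes down to....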
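
For a sense of what "build one yourself" can look like at its smallest, here is a hedged sketch that uses only node:http, with no health checks, retries or timeouts. createBalancer is a hypothetical name and the backends are assumed to listen on 127.0.0.1; this is not the code that was actually deployed.

```javascript
const http = require('node:http')

// Create a round-robin reverse proxy in front of the given backend ports.
function createBalancer(ports) {
  let next = 0
  return http.createServer((req, res) => {
    // Pick the next backend in turn.
    const port = ports[next]
    next = (next + 1) % ports.length

    // Forward the incoming request to the chosen backend as-is.
    const upstream = http.request(
      { host: '127.0.0.1', port, path: req.url, method: req.method, headers: req.headers },
      (upRes) => {
        // Relay the backend's status, headers and body to the client.
        res.writeHead(upRes.statusCode, upRes.headers)
        upRes.pipe(res)
      }
    )
    // If the backend cannot be reached, answer instead of leaving the client waiting.
    upstream.on('error', () => {
      if (!res.headersSent) { res.writeHead(502, { 'Content-Type': 'text/plain' }) }
      res.end('Bad Gateway')
    })
    req.pipe(upstream)
  })
}
```

Something like createBalancer([3001, 3002, 3003]).listen(3000) would match the diagram above. A dead instance here simply yields a 502 on every third request, which is the gap that health checking has to fill.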

...

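In the end I modified the source above a little and somehow got it working.

Performance was nothing amazing, but it passed the performance test at the "good enough to use" level...

With a hair's breadth to spare, inside working hours...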

...

3. And after that..

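I had thrown something together and installed it at the customer site, but the hard-coded settings kept bothering me,

so I went over it again at home.

First, I decided to lean on env so that nothing is hard-coded in the ecosystem.config.js settings,

and I changed loadbalancer.js to configure itself automatically by reading ecosystem.config.js.

Write the ecosystem.config.js file. (It requires the dotenv package, and loadbalancer.js further down requires http-proxy, so both need to be installed alongside pm2.)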

const dotenv = require('dotenv')
const apps = [ ]
dotenv.config({ path: `${__dirname}/.env` })
/** Path of the node server to run as multiple instances (the path where npm run build was run) */
const path = process.env.INSTANCE_PATH || __dirname
/** Number of instances */
const instances = Number(process.env.INSTANCES || 3)
/** Instance name (node project name) */
const INSTANCE_NAME = process.env.INSTANCE_NAME || 'node'
/** Profile to run with */
const PROFILE = process.env.PROFILE || 'local'
/** Port number to run on */
const PORT = Number(process.env.PORT || 3000)
/** Content URI used for the PING check */
const PING_URI = process.env.PING_URI || '/ping.html'

/** Instance 0 is the load balancer */
apps.push({
  name: '##load-balancer##',
  cwd: __dirname,
  script: 'loadbalancer.js',
  exec_mode: 'cluster',
  env: {
    PROFILE,
    INSTANCE_NAME,
    PING_URI,
    PORT
  }
})

for (let inx = 0; inx < instances; inx++) {
  apps.push({
    name: INSTANCE_NAME,
    cwd: path,
    script: 'npm',
    args: 'start',
    exec_mode: 'cluster',
    env: {
      PROFILE,
      PORT: (PORT + 1 + inx),
    }
  })
}

module.exports = { apps: apps }
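
One detail worth spelling out: values read from .env arrive as strings, which is why the config wraps them in Number() before computing PORT + 1 + inx (otherwise '3000' + 1 would become '30001'). Below is a simplified sketch of the same port assignment as a pure function, so it can be checked without starting pm2. buildApps is a hypothetical name, and fields such as exec_mode and PROFILE are left out on purpose.

```javascript
// Build a pm2-style app list from an env-like object (all values are strings).
function buildApps(env, baseDir) {
  const instances = Number(env.INSTANCES || 3)
  const name = env.INSTANCE_NAME || 'node'
  const port = Number(env.PORT || 3000)

  // The first entry is the load balancer, listening on the base port.
  const apps = [{
    name: '##load-balancer##',
    cwd: baseDir,
    script: 'loadbalancer.js',
    env: { PORT: port }
  }]

  // Each instance gets the next port after the base port.
  for (let inx = 0; inx < instances; inx++) {
    apps.push({
      name,
      cwd: env.INSTANCE_PATH || baseDir,
      script: 'npm',
      args: 'start',
      env: { PORT: port + 1 + inx }
    })
  }
  return apps
}
```

With PORT set to '3000' and INSTANCES set to '2', the load balancer would get 3000 and the two instances 3001 and 3002.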

Write the loadbalancer.js file. (the load balancer)

const http = require('http')
const { createProxyServer } = require('http-proxy')
const pm2apps = require('./ecosystem.config')

/** PING check timeout (ms) */
const PING_TIME = 50
/** Time until the next PING check when unhealthy (30 seconds) */
const NEXT_PING_INVALID = 1000 * 30
/** Time until the next PING check when healthy (2 seconds) */
const NEXT_PING_VALID = 1000 * 2
/** Health check interval (5 seconds) */
const HEARTBEAT_INTERVAL = 1000 * 5
/** HTTP request timeout (ms) */
const REQUEST_TIMEOUT = 3000
/** Content URI used for the PING check */
const PING_URI = pm2apps.apps[0].env.PING_URI
/** Instance name (node project name) */
const INSTANCE_NAME = pm2apps.apps[0].env.INSTANCE_NAME

/** Server pointer (round-robin) */
let svrinx = 0
/** Handle of the health-check timer */
let hndHeartbeat = null

const proxy = proxyServer()

/** Write an empty schema entry so the editor can autocomplete, then remove it */
const servers = [{
    target: '',
    alive: true,
    nextping: 0,
    proxy: proxy
}]
servers.splice(0, 1)

/** Build the proxy settings from the server instances listed in the pm2 config */
for (const itm of pm2apps.apps) {
  if (itm.name !== INSTANCE_NAME) { continue }
  const target = `http://localhost:${itm.env.PORT}`
  servers.push({
    target: target,
    alive: true,
    nextping: 0,
    proxy: proxy })
  console.log('CREATE-PROXY:', servers.length, itm.name, target)
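  /** Give the server about 1 second to start before the first PING; a failed first ping is ignored and left to checkAlive() */
  setTimeout(() => ping(target, { timeout: 5000 }).catch(() => { }), 1000)
}

/** PING check; uses the HEAD method to keep the overhead minimal */
function ping(target, opt) {
  return fetch(`${target}${PING_URI}`, {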
    method: 'HEAD',
    keepalive: true,
    signal: AbortSignal.timeout(opt?.timeout || PING_TIME)
  })
}

/** Periodically monitor whether every server is up (runs in the background as an agent) */
async function checkAlive() {
  if (hndHeartbeat) { clearTimeout(hndHeartbeat) }
  for (let svrinx = 0; svrinx < servers.length; svrinx++) {
    const curtime = new Date().getTime()
    const server = servers[svrinx]
    try {
      const res = await ping(server.target, { timeout: 1000 })
      if (res.status === 200) {
        serverValid(server, curtime)
      } else {
        serverInvalid(server, curtime)
      }
    } catch (e) {
      console.log('E:', svrinx)
      serverInvalid(server, curtime)
    }
  }
  console.log('CHECK-ALIVE..', servers.map((v, i) => v.alive))
  hndHeartbeat = setTimeout(checkAlive, HEARTBEAT_INTERVAL)
}
setTimeout(checkAlive, 2000)

/** Mark a server as healthy */
function serverValid(server, curtime) {
  server.alive = true
  server.nextping = curtime + NEXT_PING_VALID
  if (!server.proxy) { server.proxy = proxyServer() }
}

/** Mark a server as unhealthy */
function serverInvalid(server, curtime) {
  server.alive = false
  server.nextping = curtime + NEXT_PING_INVALID
  /** If the proxy is not removed, requests keep going through the old proxy and HANG */
  server.proxy = undefined
}

/** Load balancing */
const loadbalancer = http.createServer(async (req, res) => {
  /** Retry as many times as there are servers */
  RETRY_LOOP: for (let retry = 0; retry < servers.length + 1; retry++) {
    const curtime = new Date().getTime()
    /** Keep a local copy of the index: the shared pointer may move while we await the ping */
    const inx = (svrinx = (svrinx + 1) % servers.length)
    const server = servers[inx]
    if (server.nextping == 0) { server.nextping = curtime }
    if (retry > 0) { console.log('SERVER:', inx, retry) }
    /** The server is down and its next ping time has not arrived yet: move on to the next server */
    if (!server.alive && server.nextping > curtime) { continue RETRY_LOOP }
    /** The next ping time has passed: ping now */
    if (server.nextping <= curtime) {
      try {
        const pingRes = await ping(server.target)
        // console.log('PING:', inx, pingRes.status, server.nextping, curtime)
        /** Skip the server if the ping did not come back healthy */
        if (pingRes.status !== 200) {
          serverInvalid(server, curtime)
          continue RETRY_LOOP
        }
        serverValid(server, curtime)
      } catch (e) {
        console.log('E:', inx, retry)
        serverInvalid(server, curtime)
        continue RETRY_LOOP
      }
    }
    /** Once the server is judged healthy, send the actual request through the proxy */
    if (server.alive && server.proxy?.web) {
      req.headers['x-svrinx'] = inx
      req.rawHeaders['x-svrinx'] = inx
      server.proxy.web(req, res, { target: server.target, proxyTimeout: REQUEST_TIMEOUT, timeout: REQUEST_TIMEOUT })
      return
    }
  }
  /** No instance could take the request: answer instead of leaving the client hanging */
  res.writeHead(503, { 'Content-Type': 'text/plain' })
  res.end('Service Unavailable')
})
/** Load-balancing server settings */
/** (listenerCount is an EventEmitter method, so it must not be overwritten with a number) */
Object.assign(loadbalancer, {
  timeout: REQUEST_TIMEOUT,
  maxRequestsPerSocket: 1000,
  maxConnections: 1000,
  keepAliveTimeout: 1000 * 10
})

process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err)
})

function proxyServer() {
  const proxy = createProxyServer({})
  /** When an error occurs while a server instance is handling the request */
  proxy.on('error', (err, req, res) => {
    const curtime = new Date().getTime()
    const svrinx = req.rawHeaders['x-svrinx']
    if (servers[svrinx]) { serverInvalid(servers[svrinx], curtime) }
    console.log('ERROR:', svrinx, err)
    res.writeHead(500, { 'Content-Type': 'text/plain' })
    res.end('Internal Server Error')
  })
  proxy.on('proxyRes', (pres, req, res) => {
    const svrinx = req.rawHeaders['x-svrinx']
    const path = req.url
    console.log('SVR-INX:', svrinx, path)
    res.setHeader('x-svrinx', svrinx)
  })
  return proxy
}

const PORT = pm2apps.apps[0].env.PORT
loadbalancer.listen(PORT, () => { console.log(`Load Balancer running on port ${PORT}`) })
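
The server-picking rule buried in the request handler (skip an instance that is marked dead until its next ping time comes around, and re-ping any instance whose ping time has passed) can be looked at in isolation. A hedged sketch as a pure function; pickServer and its return shape are hypothetical, and the actual ping and proxy calls are left out.

```javascript
// Pick the next usable server after `cursor` (round-robin).
// Returns { index, needsPing }, or null when every server is still benched.
function pickServer(servers, cursor, now) {
  for (let step = 1; step <= servers.length; step++) {
    const index = (cursor + step) % servers.length
    const server = servers[index]
    // Dead and not yet due for another ping: skip it.
    if (!server.alive && server.nextping > now) { continue }
    // Usable candidate; the caller should ping first if the ping time has passed.
    return { index, needsPing: server.nextping <= now }
  }
  return null
}
```

A dead instance is therefore only retried once its 30-second penalty has run out, and a null result is the case where the balancer has nobody to hand the request to.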

Write a .env file, using the example below as a reference.

PORT = 3000
PING_URI = /ping.html
PROFILE = local
INSTANCE_NAME = 'nextjs-app'
INSTANCE_PATH = '/project/nextapp'
INSTANCES = 3

Now let's run it

$ npx pm2 start

[PM2] Spawning PM2 daemon with pm2_home=/home/coder/.pm2
[PM2] PM2 Successfully daemonized
[PM2] App [##load-balancer##] launched (1 instances)
[PM2] App [nextjs-app] launched (1 instances)
[PM2] App [nextjs-app] launched (1 instances)
[PM2] App [nextjs-app] launched (1 instances)
┌────┬──────────────────────┬─────────────┬─────────┬─────────┬──────────┬────────┬──────┬───────────┬──────────┬──────────┬──────────┬──────────┐
│ id │ name                 │ namespace   │ version │ mode    │ pid      │ uptime │ ↺    │ status    │ cpu      │ mem      │ user     │ watching │
├────┼──────────────────────┼─────────────┼─────────┼─────────┼──────────┼────────┼──────┼───────────┼──────────┼──────────┼──────────┼──────────┤
│ 0  │ ##load-balancer##    │ default     │ N/A     │ cluster │ 2958875  │ 0s     │ 0    │ online    │ 0%       │ 56.9mb   │ coder    │ disabled │
│ 1  │ nextjs-app           │ default     │ N/A     │ cluster │ 2958876  │ 0s     │ 0    │ online    │ 0%       │ 58.5mb   │ coder    │ disabled │
│ 2  │ nextjs-app           │ default     │ N/A     │ cluster │ 2958889  │ 0s     │ 0    │ online    │ 0%       │ 56.3mb   │ coder    │ disabled │
│ 3  │ nextjs-app           │ default     │ N/A     │ cluster │ 2958890  │ 0s     │ 0    │ online    │ 0%       │ 54.7mb   │ coder    │ disabled │
└────┴──────────────────────┴─────────────┴─────────┴─────────┴──────────┴────────┴──────┴───────────┴──────────┴──────────┴──────────┴──────────┘

Now open it in a browser and check the logs

$ npx pm2 log

0|##load-balancer##  | CHECK-ALIVE.. [ true, true, true ]
0|##load-balancer##  | SVR-INX: 2 /
0|##load-balancer##  | SVR-INX: 0 /_next/static/css/9900d685c9f61b99.css
0|##load-balancer##  | SVR-INX: 1 /_next/static/chunks/webpack-1c0ab6009f69a96d.js
0|##load-balancer##  | SVR-INX: 2 /_next/static/chunks/framework-ecc4130bc7a58a64.js
0|##load-balancer##  | SVR-INX: 0 /_next/static/chunks/main-422e8890d5985d09.js
0|##load-balancer##  | SVR-INX: 1 /_next/static/chunks/pages/_app-f2eb11ca359c9c09.js
0|##load-balancer##  | SVR-INX: 2 /_next/static/chunks/29107295-2a9ca7f07685bf37.js
0|##load-balancer##  | SVR-INX: 0 /_next/static/chunks/520c8229-ad81e311dd7f6294.js
0|##load-balancer##  | SVR-INX: 1 /_next/static/chunks/fc83e031-5b0d22286ebec8c3.js
0|##load-balancer##  | SVR-INX: 2 /_next/static/LGdLo9cheEXBhYpZbPXvO/_buildManifest.js
0|##load-balancer##  | SVR-INX: 0 /_next/static/chunks/pages/index-459e1c1b837ad300.js
0|##load-balancer##  | SVR-INX: 1 /_next/static/chunks/715-2abeef8269a89727.js
0|##load-balancer##  | SVR-INX: 2 /_next/static/LGdLo9cheEXBhYpZbPXvO/_ssgManifest.js
0|##load-balancer##  | SVR-INX: 0 /favicon.ico

You can see that the requests are spread across the servers file by file.
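
To check the spread with numbers rather than by eye, the SVR-INX lines can be tallied per instance. A small sketch, assuming the log format shown above; tallyByInstance is a hypothetical helper and is not part of the repository.

```javascript
// Count how many requests each instance index served, based on 'SVR-INX: <n> <path>' log lines.
function tallyByInstance(logLines) {
  const counts = {}
  for (const line of logLines) {
    const match = /SVR-INX:\s+(\d+)\s+(\S+)/.exec(line)
    // Ignore lines that are not request logs (e.g. CHECK-ALIVE).
    if (!match) { continue }
    counts[match[1]] = (counts[match[1]] || 0) + 1
  }
  return counts
}
```

Feeding it the output of npx pm2 log split into lines gives one counter per instance index; with round-robin the counters should differ by at most one.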

Note, though, that with only a modest number of concurrent users it can actually be slower

4. Today's lesson

Even when you're short on time, read the documentation carefully!

~~Read! The Fucking! Manual !!!~~