
Deploying Transformer Models

Reposted from Medium
Published on Mar 25, 2023

Using GCP, FastAPI, Docker, and Hugging Face to deploy state-of-the-art language models

Initial Set Up

app/
    main.py
    nlp.py
requirements/
    requirements.txt
docker-compose.yml
Dockerfile
Pipfile
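The Dockerfile below installs from requirements/requirements.txt, so that file needs to list the stack. A minimal version might look like this (pinning versions is a good idea in practice; the exact pins here are left to you):

```
fastapi
uvicorn
transformers
torch
```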

NLP


from transformers import (
    pipeline,
    GPT2LMHeadModel,
    GPT2Tokenizer
)

class NLP:
    def __init__(self):
        # Load the GPT-2 model and tokenizer once, at startup
        self.gen_model = GPT2LMHeadModel.from_pretrained('gpt2')
        self.gen_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
        # Build the sentiment pipeline here too, so its model is not
        # reloaded on every request
        self.sentiment_pipeline = pipeline("sentiment-analysis")

    def generate(self, prompt="The epistemological limit"):
        # Tokenize the prompt and record the character length of its decoded form
        inputs = self.gen_tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
        prompt_length = len(self.gen_tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
        # Sample up to 200 tokens with nucleus (top-p) and top-k filtering
        outputs = self.gen_model.generate(inputs, max_length=200, do_sample=True, top_p=0.95, top_k=60)
        # Drop the re-decoded copy of the prompt and re-attach the original verbatim
        generated = prompt + self.gen_tokenizer.decode(outputs[0])[prompt_length:]
        return generated

    def sentiments(self, text: str):
        result = self.sentiment_pipeline(text)[0]
        return f"label: {result['label']}, with score: {round(result['score'], 4)}"

This is a very simple class that abstracts the code for text generation and sentiment analysis. In generate, the prompt is tokenized, the character length of the decoded prompt is captured, and a continuation is sampled; we then decode the output and return it as the generated text. Using the class looks something like:

>>> from nlp import NLP
>>> nlp = NLP()
>>> print(nlp.sentiments("A bee sting is not cool"))
'label: NEGATIVE, with score: 0.9998'
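The character-slicing in generate deserves a note: decoding can re-space the prompt, so the method measures the decoded prompt's length in characters and splices the original prompt back onto the continuation. A minimal sketch of that logic with plain strings (the decoded text below is a made-up stand-in for the model output):

```python
# Sketch of the splice in generate(): the decoded model output starts with
# a (possibly re-spaced) copy of the prompt, so we cut off the first
# prompt_length characters and re-attach the original prompt verbatim.
decoded = "The epistemological limit of science is not yet known."  # stand-in for gen_tokenizer.decode(outputs[0])
prompt = "The epistemological limit"
prompt_length = len(prompt)  # in the real method, measured on the decoded prompt tokens

generated = prompt + decoded[prompt_length:]
print(generated)  # → The epistemological limit of science is not yet known.
```

When decoding does not change the prompt's spacing, the splice is a no-op; when it does, this keeps the user's original prompt intact in the returned text.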

API


from typing import Optional

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from app.nlp import NLP

class Message(BaseModel):
    input: str
    output: Optional[str] = None

app = FastAPI()
nlp = NLP()

origins = [
    "http://localhost",
    "http://localhost:3000",
    "http://127.0.0.1:3000"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["POST"],
    allow_headers=["*"],
)

@app.post("/generative/")
async def generate(message: Message):
    message.output = nlp.generate(prompt=message.input)
    return {"output": message.output}

@app.post("/sentiment/")
async def sentiment_analysis(message: Message):
    message.output = str(nlp.sentiments(message.input))
    return {"output": message.output}

The API has four sections. The first is a pydantic model that handles typing for request messages, which gives us input validation and clear error messages for malformed input. Since the endpoints should only handle strings, this is a perfect fit. We then create instances of the FastAPI app and the NLP class. Next, we list the request origins our app is allowed to communicate with. If you do not set dedicated origin addresses the code will still run, but anyone can access your API, so be careful. After that, we add the CORS middleware to control origins, request methods, headers, and cookies; I have taken most of this from the FastAPI docs, but it is extensible to your particular security needs. The last part contains two async POST request handlers that call the NLP class and return the generated text or sentiment analysis.

To run the API locally with hot reloading:

uvicorn app.main:app --reload

Containerization

FROM python:3.7

COPY ./requirements/requirements.txt ./requirements/requirements.txt
RUN pip3 install -r requirements/requirements.txt

COPY ./app /app
RUN useradd -m myuser
USER myuser

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

The Dockerfile copies the requirements into the container, installs them, copies over the files from the app directory, creates a non-root user (optional, but good practice), and runs the API in the container. Yes, it is that simple. With plain docker commands we would have to stop the container, rebuild the image, and restart the container after every change. To avoid this, add the following to your docker-compose.yml. Note that the service and container names can be whatever you want.

version: '3'
services: 
    chatsume:
        build: .
        container_name: "chsme"
        ports: 
            - "8000:8080"
        volumes:
            - ./app/:/app

This allows us to rebuild the image and spin up the container in two lines:

docker-compose build 
docker-compose up -d

Deployment

Tag the local image and push it to Google Container Registry (fast_hug is the GCP project ID here):

docker tag nlp_api gcr.io/fast_hug/nlp_api:latest
docker push gcr.io/fast_hug/nlp_api:latest
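From there the image can be served however you like. As one hypothetical example, deploying the pushed image to Cloud Run would look like the following; the service name, region, and flags are illustrative placeholders:

```shell
# Hypothetical Cloud Run deployment of the image pushed above.
# Adjust the service name and region to your project.
gcloud run deploy nlp-api \
    --image gcr.io/fast_hug/nlp_api:latest \
    --region us-central1 \
    --allow-unauthenticated
```

Cloud Run routes traffic to port 8080 by default, which matches the port the Dockerfile's CMD binds uvicorn to.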

Conclusion
