Case Study: Translation and Question-Answering System for a 125-page German Novel

Anote
3 min read · Jul 9, 2023

Background / Problem Statement

One of our customers was looking to translate a lengthy 125-page German novel owned by a royal duke and a town bearer. The document’s length was the main challenge: we aimed not only to provide an accurate translation but also to let readers interact with the novel in both English and German. We also wanted a question-answering system that could extract key information from the translated document. The task therefore required managing the translation of a long text efficiently and supporting meaningful conversations with the novel’s content.

Approach

To overcome these challenges, we adopted a comprehensive approach that encompassed the following steps:

Translation Process:

For the translation, we used the OpenAI API together with prompt engineering: we prompted the model to translate the German novel into English. This step gave us an English version of the document, which served as the foundation for the subsequent analysis.
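
A minimal sketch of how such a prompt-driven translation could be organized for a long document is shown below. It is an illustration under stated assumptions, not our exact pipeline: the passage size, the prompt wording, and the names split_into_passages, build_translation_prompt, and translate_document are all hypothetical, and the model call is passed in as a plain function (complete) so the sketch does not depend on any particular API client.

```python
def split_into_passages(text, max_chars=2000):
    """Group paragraphs into passages of roughly max_chars characters.

    Assumption: paragraphs are separated by blank lines. A single paragraph
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    passages, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # The current passage is full: store it and start a new one.
            passages.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        passages.append(current)
    return passages


def build_translation_prompt(passage):
    """Hypothetical prompt wording; the real prompt would need tuning."""
    return (
        "Translate the following passage of a German novel into English. "
        "Preserve names, tone, and paragraph breaks.\n\n" + passage
    )


def translate_document(text, complete, max_chars=2000):
    """Translate passage by passage.

    complete is any callable that takes a prompt string and returns the
    model's text output, for example a thin wrapper around an API request.
    """
    passages = split_into_passages(text, max_chars)
    return "\n\n".join(complete(build_translation_prompt(p)) for p in passages)
```

Passing the model call in as a function keeps the splitting and prompt logic testable offline, and makes it easy to add retries or caching around the request later.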

Question-Answering System Development:

To enable users to interact with the translated document and ask questions, we developed a custom-built module. The module incorporated various steps for efficient information retrieval:

  • Text Chunking: To manage the extensive document, we divided the translated text into manageable chunks with defined token lengths. This approach allowed us to process the document in sections, facilitating more effective analysis.
  • Embedding Conversion: Each text chunk underwent a transformation into embeddings, which are numerical representations capturing the semantic information of the text. This conversion enabled consistent comparison between the document chunks and user queries.
  • User Query Processing: When users posed questions, we converted their queries into embeddings using the same embedding method as for the document chunks. Because chunks and queries then live in the same vector space, they can be compared directly.
  • Cosine Similarity Analysis: Using cosine similarity, we scored the user query embedding against the embedding of each document chunk. The highest-scoring chunk was taken as the one most closely related to the user’s question.
  • Answer Formulation: Based on the most similar embedding identified in the previous step, we extracted the corresponding document chunk and formulated an answer to the user’s question using the information within that chunk.
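
The retrieval steps above can be sketched in a few lines of Python. This is a simplified illustration, not the production module: whitespace-separated words stand in for tokens, and embed is a toy bag-of-words vector used only so the example runs without an external service, where the real system would call an embedding model instead. The function names and the wording in build_answer_prompt are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_tokens=50):
    """Split text into chunks of at most max_tokens words.

    Assumption: words approximate tokens; a real tokenizer would differ.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]


def embed(text):
    """Toy embedding: sparse word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_chunk(question, chunks):
    """Return (score, chunk) for the chunk closest to the question."""
    query_vec = embed(question)
    scored = [(cosine_similarity(query_vec, embed(c)), c) for c in chunks]
    return max(scored, key=lambda pair: pair[0])


def build_answer_prompt(question, chunk):
    """Hypothetical prompt for the answer step, grounded in the chosen chunk."""
    return (
        "Answer the question using only the excerpt below.\n\n"
        f"Excerpt:\n{chunk}\n\nQuestion: {question}"
    )


chunks = [
    "The duke rode to the castle at dawn.",
    "The town bearer carried the letter to the market.",
]
score, best = most_similar_chunk("Who carried the letter?", chunks)
print(best)
```

In practice the chunk embeddings would be computed once and stored, so that each question costs only one new embedding plus the similarity comparisons.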

Solution

Our solution comprised the following key components and techniques:

  • Translation: By leveraging the capabilities of the OpenAI API, we successfully translated the 125-page German novel into English. This step provided us with an accurate English version of the document for further analysis and exploration.
  • Question-Answering Module: We developed a custom module that empowered users to engage with the translated document and ask questions. By implementing text chunking, embedding conversion, user query processing, and cosine similarity analysis, we enabled the system to identify the most relevant document chunk and provide precise answers to user queries.

Conclusion

In conclusion, our translation and question-answering system addressed both parts of the challenge: translating a lengthy 125-page German novel into English and extracting key information from it. Prompt engineering with the OpenAI API handled the translation, while text chunking, embeddings, and cosine similarity supported the question-answering capabilities.


Anote

General Purpose Artificial Intelligence. Like our product, our medium articles are written by novel generative AI models, with human feedback on the edge cases.