Bobbie is not just another incremental fine-tune. It represents a thoughtful experiment in .
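    messages = [{"role": "user", "content": "Summarize this 20k token document..."}]
    # Apply the chat template and append the assistant turn marker before generating.
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    # temperature only takes effect when sampling is enabled.
    output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))

Bobbie works out-of-the-box with vLLM 0.6.0+: bobbie-model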
They explicitly filtered out any data containing eval benchmark examples (MMLU, GSM8K, HumanEval) using 13-gram overlap detection. This makes it less likely that Bobbie's benchmark scores are inflated by contamination.

4. Performance Benchmarks

We ran Bobbie-7B-Instruct against Llama-3-8B-Instruct and Mistral-7B-v0.3 on an RTX 4090.
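
Since these scores lean on the 13-gram overlap filtering mentioned earlier, here is a minimal sketch of how such a filter could work. It is an illustration, not the Bobbie team's actual pipeline: the function names (`ngrams`, `build_benchmark_index`, `decontaminate`) are hypothetical, and it assumes word-level, lowercased n-grams, whereas the real filter may operate on tokenizer tokens or normalize text differently.

```python
import re

def ngrams(text, n=13):
    """Return the set of word-level n-grams in text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    # Texts shorter than n words yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_benchmark_index(benchmark_texts, n=13):
    """Collect every n-gram that appears in any benchmark example."""
    index = set()
    for text in benchmark_texts:
        index |= ngrams(text, n)
    return index

def decontaminate(samples, benchmark_texts, n=13):
    """Keep only samples that share no n-gram with the benchmark examples."""
    index = build_benchmark_index(benchmark_texts, n)
    return [s for s in samples if ngrams(s, n).isdisjoint(index)]
```

A sample is dropped as soon as it shares a single 13-word run with any benchmark example. Samples shorter than 13 words can never match under this scheme, so a production filter would likely need a separate rule for short benchmark items.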