Your Next Apple Device Could Detect What You’re Doing, Whether You’re Cooking or Exercising
Big tech companies continue to explore the limits of artificial intelligence to better understand the context in which users interact with their devices.
Now, a team of Apple researchers has published a study showing how language models can accurately determine what physical task a person is performing simply by combining motion sensor data with processed sound information.
Instead of feeding the AI raw audio recordings, which could pose privacy risks, the system uses text-based sound descriptions. These captions, generated by smaller models, are combined with accelerometer and gyroscope data, allowing the language model to interpret the overall situation.
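To make the idea concrete, here is a minimal sketch of how such a prompt might be assembled. The helper names, data fields, and example values are illustrative assumptions, not details from Apple's paper; the key point is that only text (a sound caption plus summarized motion statistics) reaches the language model, never raw audio.

```python
# Sketch of the fusion idea: a sound caption (text) plus summarized IMU data
# are combined into a single text prompt for a language model.
# All names and values below are hypothetical, for illustration only.

from dataclasses import dataclass
from typing import List


@dataclass
class IMUWindow:
    """A short window of motion data summarized as simple statistics."""
    accel_mean: List[float]   # mean acceleration per axis (x, y, z), in g
    accel_std: List[float]    # acceleration variability per axis
    gyro_mean: List[float]    # mean angular velocity per axis, in deg/s


def describe_imu(window: IMUWindow) -> str:
    """Turn raw sensor statistics into plain text the language model can read."""
    return (
        f"Accelerometer mean (g): {window.accel_mean}, std: {window.accel_std}. "
        f"Gyroscope mean (deg/s): {window.gyro_mean}."
    )


def build_prompt(sound_caption: str, window: IMUWindow, activities: List[str]) -> str:
    """Combine the text-based sound description with motion features into one
    zero-shot classification prompt. No raw audio is ever included."""
    return (
        "You are given a text description of ambient sound and summarized "
        "motion-sensor data from a wearable device.\n"
        f"Sound description: {sound_caption}\n"
        f"Motion data: {describe_imu(window)}\n"
        f"Which one of these activities is the person most likely doing? "
        f"{', '.join(activities)}\n"
        "Answer with a single activity name."
    )


# Example usage with made-up values:
window = IMUWindow(accel_mean=[0.02, 0.98, 0.05],
                   accel_std=[0.15, 0.20, 0.12],
                   gyro_mean=[3.1, 1.4, 0.8])
prompt = build_prompt(
    sound_caption="running water and clinking dishes",
    window=window,
    activities=["washing dishes", "cooking", "vacuuming", "lifting weights"],
)
print(prompt)
```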
Apple’s AI Can Deduce Human Actions Without Prior Training
To test this approach, Apple engineers used the large Ego4D dataset, which includes first-person recordings of real-life situations. The experiment focused on identifying 12 everyday activities, ranging from household tasks like washing dishes, cooking, and vacuuming, to leisure and fitness activities like reading, playing with pets, or lifting weights.
What stands out is how effective the tested models, such as Gemini and Qwen, turned out to be. They correctly identified activities in a zero-shot setting, meaning they had not been specifically trained to recognize those patterns beforehand. The models inferred the action from the combined data alone, and accuracy improved further when they were given just one example for reference.
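The one-example variant is essentially one-shot prompting: a single labeled example is prepended to the same text prompt. The sketch below assumes the same illustrative setup as the earlier snippet; the example captions and labels are invented for demonstration and are not taken from the study.

```python
# Sketch of the one-shot variant: identical task, but with one labeled example
# prepended so the model can calibrate its answer. Values are illustrative only.

from typing import List


def build_one_shot_prompt(example_caption: str, example_label: str,
                          sound_caption: str, motion_text: str,
                          activities: List[str]) -> str:
    """Prepend one worked example before asking for the classification."""
    return (
        "Classify the activity from sound and motion descriptions.\n"
        f"Example -> Sound: {example_caption}. Activity: {example_label}\n"
        f"Now classify -> Sound: {sound_caption}. Motion: {motion_text}\n"
        f"Choose one of: {', '.join(activities)}"
    )


prompt = build_one_shot_prompt(
    example_caption="rhythmic clanking of metal plates and heavy breathing",
    example_label="lifting weights",
    sound_caption="sizzling and a range hood fan",
    motion_text="low overall movement, small wrist rotations",
    activities=["washing dishes", "cooking", "vacuuming", "lifting weights"],
)
print(prompt)
```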
This progress suggests that future smartwatches or iPhones could provide far more detailed health and activity tracking. By processing these signals through a language model, devices could understand complex contexts where traditional sensors struggle. This approach also saves memory and resources since it doesn’t require large, specialized models for each type of human activity.

















